Introduction to Supervised Learning
Supervised learning is a category of machine learning in which algorithms learn from a labeled dataset. It is used for predictive analytics: the outcome, known as the dependent variable, is predicted from the values of the independent variables. The model is fitted to a training dataset and typically improves over training iterations. Supervised learning has two main categories: regression and classification. It is applied in many real-world scenarios, such as predicting next quarter's sales of a particular product for a retail organization.
Working of Supervised Machine Learning
Let us understand supervised machine learning with the help of an example. Suppose we have a basket filled with different kinds of fruit. Our job is to sort the fruits by type.
In our case, we consider four types of fruit: apples, bananas, grapes, and oranges.
We first list the characteristics that distinguish each fruit.
|Size|Color|Shape|Fruit Name|
|---|---|---|---|
|Small|Green|Round to oval, bunch shape cylindrical|Grapes|
|Big|Red|Rounded shape with a depression at the top|Apple|
|Big|Yellow|Long curving cylinder|Banana|
Now suppose you pick a fruit from the basket and look at its features: its shape, size, and color. You observe that the fruit is red, its size is big, and its shape is rounded with a depression at the top; hence, it is an apple.
- Likewise, you do the same for all the remaining fruits.
- The rightmost column (“Fruit Name”) is known as the response variable.
- This is how we formulate a supervised learning model. Once it has learned from these labeled examples, anyone new (say, a robot or an alien) can use the given properties to group fruits of the same type together.
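The fruit example can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard algorithm from any library: the names `TRAINING_DATA` and `predict` are hypothetical, the labels for the grape and banana rows are inferred from their descriptions, and the "most matching features" rule is an assumption chosen for simplicity.

```python
# Labeled examples taken from the feature rows above: (size, color, shape) -> fruit.
# The fruit names are the response variable the model learns to predict.
TRAINING_DATA = [
    (("small", "green", "round to oval"), "grapes"),
    (("big", "red", "rounded with a depression at the top"), "apple"),
    (("big", "yellow", "long curving cylinder"), "banana"),
]


def predict(features):
    """Return the label of the training example that shares the most feature values."""
    def overlap(example):
        example_features, _label = example
        return sum(a == b for a, b in zip(example_features, features))

    _features, label = max(TRAINING_DATA, key=overlap)
    return label


print(predict(("big", "red", "rounded with a depression at the top")))  # apple
print(predict(("big", "yellow", "round to oval")))  # banana (two of three features match)
```

The second call shows that the model still returns its closest match when a new fruit does not agree with any training example exactly.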
Types of Supervised Machine Learning Algorithms
Let us look at the different types of supervised machine learning algorithms:
Regression is used to predict a single continuous output value using the training dataset. The output value is called the dependent variable, while the inputs are the independent variables.
There are different types of regression in supervised learning.
- Linear Regression: In its simplest form, only one independent variable is used to predict the output, i.e., the dependent variable.
- Multiple Regression: Here, more than one independent variable is used to predict the output, i.e., the dependent variable.
- Polynomial Regression: Here, the relationship between the dependent and independent variables follows a polynomial function. For example, memory at first improves with age, reaches a peak at a certain age, and then starts to decline as we grow old.
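As a minimal sketch of regression with one independent variable, the line can be fitted with the ordinary least squares formulas. The function name `fit_line` and the toy data are assumptions made for illustration; real data is rarely this clean.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy training set generated from y = 2x + 1 (independent variable x, dependent variable y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept  # predict the output for an unseen input x = 6
print(slope, intercept, prediction)  # 2.0 1.0 13.0
```

Multiple and polynomial regression follow the same idea with more input columns (for polynomial regression, powers of x), at which point a numerical library is usually more practical than hand-written formulas.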
Classification algorithms in supervised learning are used to assign objects to distinct classes.
- Binary classification: If the algorithm assigns objects to one of 2 distinct classes, it is called binary classification.
- Multiclass classification: If the algorithm assigns objects to one of more than 2 classes, it is called multiclass classification.
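- Strength: Classification algorithms often perform well when enough representative labeled data is available.
- Drawbacks: They are prone to overfitting, especially when the model is left unconstrained. A typical example of a classification task is an email spam classifier.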
- Logistic regression/classification: When the Y variable is binary categorical (i.e., 0 or 1), we use logistic regression for the prediction. For example, predicting whether a given credit card transaction is fraudulent or not.
- Naive Bayes classifiers: The Naive Bayes classifier is based on Bayes' theorem. It is often well suited to problems where the dimensionality of the inputs is high. It can be represented as a directed acyclic graph with one parent node (the class) and many child nodes (the features). The child nodes are assumed to be independent of each other given the parent.
- Decision trees: A decision tree is a flowchart-like tree structure that consists of internal nodes (each a test on an attribute), branches that denote the outcomes of the test, and leaf nodes that represent the class labels or the distribution of classes. The root node is the topmost node. It is a very widely used technique for classification.
- Support vector machine: A support vector machine (SVM) performs classification by finding the hyperplane that maximizes the margin between 2 classes. SVMs are often combined with kernel functions, which let them separate classes that are not linearly separable in the original input space. SVMs are used extensively in fields such as biometrics and pattern recognition.
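To make binary classification concrete, here is a small sketch of logistic regression trained with plain gradient descent on a single feature. The data, the function names, and the hyperparameters (`learning_rate`, `epochs`) are all assumptions chosen for illustration; the feature could stand for something like a hypothetical transaction risk score.

```python
import math


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.3, epochs=5000):
    """Fit weight w and bias b by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals: predicted probability minus true label (0 or 1).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(x, w, b):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Toy labeled data: low feature values belong to class 0, high values to class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(xs, ys)
print([predict(x, w, b) for x in xs])
```

The same predict-a-probability-then-threshold pattern extends to several features by replacing `w * x` with a weighted sum over all inputs.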
Below are some of the advantages of supervised machine learning models:
- Prior experience, in the form of labeled examples, can be used to optimize the performance of models.
- It lets you collect data or produce outputs based on previous experience.
- Supervised machine learning algorithms can be used to solve a number of real-world problems.
The following are the disadvantages:
- Training supervised machine learning models may take a lot of time if the dataset is large.
- Classifying big data can sometimes be a challenge.
- One may have to deal with the problem of overfitting.
- The model needs many good examples during training if we want the classifier to perform well.
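Overfitting can be illustrated with a deliberately extreme sketch: a model that simply memorizes its training examples scores perfectly on them but does poorly on unseen inputs, while a simpler rule generalizes. The two toy models and the data below are hypothetical and exist only to show the gap between training and test accuracy.

```python
from collections import Counter


def accuracy(model, examples):
    """Fraction of (input, label) pairs that the model predicts correctly."""
    return sum(1 for x, y in examples if model(x) == y) / len(examples)


def fit_memorizer(train):
    """Overfit on purpose: store every training pair, guess the majority label otherwise."""
    table = dict(train)
    fallback = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: table.get(x, fallback)


def fit_threshold(train):
    """Simpler model: pick the threshold t that makes 'x >= t means class 1' fit best."""
    candidates = sorted(x for x, _ in train)
    best_t = min(candidates, key=lambda t: sum((x >= t) != bool(y) for x, y in train))
    return lambda x: int(x >= best_t)


# Toy data: the true rule is "label is 1 when x >= 10".
data = [(x, int(x >= 10)) for x in range(20)]
train = data[0::2]  # even x values are used for training
test = data[1::2]   # odd x values are held out and never seen during training

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

print(accuracy(memorizer, train), accuracy(memorizer, test))  # 1.0 0.5
print(accuracy(threshold, train), accuracy(threshold, test))  # 1.0 1.0
```

Both models look equally good on the training set; only the held-out test set reveals that the memorizer has not learned the underlying rule.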
Good Practices while Building Learning Models
The following are good practices while building machine learning models:
- Before building a machine learning model, the data must be preprocessed.
- One must decide which algorithm is best suited to the given problem.
- We need to decide what type of data will be used for the training set.
- We need to decide on the structure of the algorithm and the function.
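Two of these practices, preprocessing the data and choosing the training set, can be sketched as follows. Min-max scaling is just one common preprocessing step, and the helper names, the 25% test fraction, and the fixed seed are assumptions made for this example.

```python
import random


def min_max_scale(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = min_max_scale(heights_cm)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]

rows = list(range(8))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 6 2
```

In practice the scaling parameters (`lo` and `hi` here) should be computed on the training rows only and then reused for the test rows, so that no information from the test set leaks into training.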
This guide covered the concepts of supervised learning, how it works, its types, and its advantages and disadvantages.