Introduction to Machine Learning Algorithms
Machine Learning algorithms are the algorithms used to train models. They fall into three types: Supervised Learning (the dataset is labeled, and Regression and Classification techniques are used), Unsupervised Learning (the dataset is not labeled, and techniques such as Dimensionality Reduction and Clustering are used), and Reinforcement Learning (the model learns from the outcome of each action it takes). These algorithms are used to build machine learning solutions for applications such as Customer Retention, Image Classification, Skill Acquisition, Customer Segmentation, Game AI, Weather Forecasting, Market Forecasting, and Diagnostics.
Categories of Machine Learning Algorithms
The field of Machine Learning Algorithms could be categorized into:
- Supervised Learning: In Supervised Learning, the data set is labeled, i.e., for every observation (a set of features or independent variables), there is a corresponding target value, which we use to train the model.
- Unsupervised Learning: Unlike in Supervised Learning, the data set is not labeled in this case. Instead, techniques such as clustering are used to group the data points so that points in the same group are similar to each other.
- Reinforcement Learning: A special type of Machine Learning where the model learns from each action taken. The model is rewarded for any correct decision made and penalized for any wrong decision, which allows it to learn the patterns and make more accurate decisions on unknown data.
Division of Machine Learning Algorithms
The problems in Machine Learning Algorithms could be divided into:
- Regression: The target variable is continuous and numeric in nature, while the independent variables could be numeric or categorical. The model learns the relationship between the independent variables and the target.
- Classification: The most common problem statement you would find in the real world is classifying a data point into some binary, multinomial, or ordinal class. In the Binary Classification problem, the target variable has only two outcomes (Yes/No, 0/1, True/False). In the Multinomial Classification problem, there are multiple classes in the target variable (Apple/Orange/Mango, and so on). In the Ordinal Classification problem, the target variable is ordered (e.g., the grade of students).
To solve these kinds of problems, programmers and scientists have developed algorithms that could be used on the data to make predictions. These algorithms could be divided into linear and non-linear or tree-based algorithms. Linear algorithms like Linear Regression and Logistic Regression are generally used when there is a linear relationship between the features and the target variable, whereas when the data exhibits non-linear patterns, tree-based methods such as Decision Tree, Random Forest, Gradient Boosting, etc., are preferred.
There are numerous Machine Learning algorithms available currently, and the number is only going to increase considering the amount of research done in this field. Linear and Logistic Regression are generally the first algorithms you learn as a Data Scientist, followed by more advanced algorithms.
Below are some of the Machine Learning algorithms, along with sample code snippets in Python:
1. Linear Regression
As the name suggests, this algorithm could be used in cases where the target variable, which is continuous in nature, is linearly dependent on the independent variables.
It is represented by:
y = a*x + b + e, where y is the target variable we are trying to predict, a is the slope, b is the intercept, x is the independent variable used to make the prediction, and e is the error term. This is a Simple Linear Regression as there is only one independent variable.
In the case of Multiple Linear Regression, the equation would be:
y = a1*x1 + a2*x2 + ... + a(n)*x(n) + b + e
Here, e is the error term, b is the intercept, and a1, a2, ..., a(n) are the coefficients of the independent variables.
A metric is used to evaluate the model’s performance. A common choice is Root Mean Square Error (RMSE), which is the square root of the mean of the squared differences between the actual and the predicted values.
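As a rough illustration of the Root Mean Square Error calculation, here is a minimal sketch using NumPy. The helper name rmse and the sample values are assumptions made up for this example, not part of any particular library.

```python
import numpy as np

def rmse(actual, predicted):
    """Square root of the mean of the squared differences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical values, chosen only for illustration.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

# Differences are 1, 0 and -2, so the squared differences are 1, 0 and 4.
error = rmse(actual, predicted)
print(error)
```

A lower value means the predictions sit closer to the actual values, and the result is expressed in the same units as the target variable.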
The goal of Linear Regression is to find the best fit line which would minimize the difference between the actual and the predicted data points.
Linear Regression could be written in Python as below:
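The snippet below is a minimal sketch using scikit-learn's LinearRegression. The data is synthetic and generated only for illustration (a slope of 3 and an intercept of 2 with a little noise are assumptions), so the fitted values should land close to those numbers rather than match them exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 3*x + 2 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Fit the best fit line by ordinary least squares.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # coefficient of x
intercept = model.intercept_  # constant term
predictions = model.predict(np.array([[4.0]]))

print(slope, intercept, predictions)
```

For Multiple Linear Regression, the same class can be used with more than one column in X, in which case coef_ holds one coefficient per feature.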
2. Logistic Regression
In terms of assuming a linear relationship, it is similar to Linear Regression: the log of the odds is modeled as a linear function of the independent variables. However, unlike in Linear Regression, the target variable in Logistic Regression is categorical, i.e., binary, multinomial or ordinal in nature. Moreover, the choice of the activation function is important in Logistic Regression. For binary classification problems, the sigmoid function, which is the inverse of the log of odds (logit), is used to map the linear output to a probability between 0 and 1.
In the case of a multi-class problem, the softmax function is preferred, as it generalizes the sigmoid to more than two classes and produces probabilities that sum to 1 across all classes.
The metric used to evaluate a classification problem is generally Accuracy or the area under the ROC curve (AUC). The larger the area under the ROC curve, the better the model. For example, a random classifier would have an AUC of 0.5. A value of 1 indicates that every positive is ranked above every negative, whereas a value of 0 indicates that every prediction is ranked the wrong way round.
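To make the two activation functions concrete, here is a small NumPy sketch. The function names and the input scores are assumptions picked for illustration; libraries such as scikit-learn apply these functions internally.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    """Map a vector of scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# A score of 0 corresponds to even odds in the binary case.
p_binary = sigmoid(0.0)

# Hypothetical scores for three classes; the largest score gets the largest probability.
p_multi = softmax(np.array([2.0, 1.0, 0.1]))

print(p_binary, p_multi)
```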
Logistic Regression could be written in sklearn as:
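A minimal sketch with scikit-learn follows. The dataset is synthetic (generated with make_classification purely for illustration), and the split sizes and max_iter value are assumptions, so the scores it prints say nothing about real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption for illustration).
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Accuracy uses the predicted classes; AUC uses the predicted probabilities.
accuracy = accuracy_score(y_test, clf.predict(X_test))
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(accuracy, auc)
```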
3. K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm could be used for both classification and regression problems. The idea behind the KNN method is that it predicts the value of a new data point based on its K Nearest Neighbors. K is generally preferred as an odd number to avoid ties in the vote. While classifying any new data point, the most frequent class (the mode) among the neighbors is assigned. For a regression problem, the mean of the neighbors' values is taken as the prediction.
In sklearn, KNN is written as:
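Here is a minimal sketch using scikit-learn's KNeighborsClassifier on the bundled Iris dataset. The choice of K = 5 and the train/test split are assumptions for illustration; in practice K is usually tuned.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Each test point is assigned the most frequent class among its 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(accuracy)
```

For a regression problem, KNeighborsRegressor from the same module averages the neighbors' target values instead of voting.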
KNN is used in building a recommendation engine.
4. Support Vector Machines
A classification algorithm where a hyperplane separates the two classes. In a binary classification problem, the data points from each class that lie closest to the boundary are known as the support vectors, and the hyperplane is drawn so that its distance from the support vectors (the margin) is maximum.
In the simplest two-dimensional case, a single line separates the two classes. However, in most cases, the data would not be perfect, and a simple hyperplane would not be able to separate the classes. Hence, you need to tune parameters such as Regularization, Kernel, Gamma, and so on.
The kernel could be linear or non-linear (for example, polynomial or RBF), depending on how the data is separated. When a straight line separates the classes, a linear kernel is sufficient. In the case of Regularization, you need to choose an optimum value of C, as a high value could lead to overfitting while a small value could underfit the model. Gamma defines how far the influence of a single training example reaches. With a high gamma, only points close to the decision boundary are considered, whereas with a low gamma, points farther away are also taken into account.
In sklearn, SVM is written as:
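The following is a minimal sketch using scikit-learn's SVC. The two-cluster data is synthetic, and the values of C and gamma are assumptions chosen for illustration rather than tuned settings.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two synthetic, well-separated clusters (assumption for illustration).
X, y = make_blobs(
    n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0
)

# Linear kernel: C controls the regularization strength.
svm_linear = SVC(kernel="linear", C=1.0)
svm_linear.fit(X, y)

# RBF kernel: gamma controls how far the influence of one example reaches.
svm_rbf = SVC(kernel="rbf", C=1.0, gamma=0.5)
svm_rbf.fit(X, y)

n_support = svm_linear.support_vectors_.shape[0]
linear_accuracy = svm_linear.score(X, y)
rbf_accuracy = svm_rbf.score(X, y)

print(n_support, linear_accuracy, rbf_accuracy)
```

Note that the accuracies here are measured on the training data, so they only show that the model fits; a held-out set would be needed to judge overfitting.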
5. Naive Bayes
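It works on the principle of Bayes Theorem, which finds the probability of an event given that some condition is known to be true.
Bayes Theorem is represented as:
P(A|B) = P(B|A) * P(A) / P(B), where P(A|B) is the probability of event A given that B is true, P(B|A) is the probability of B given that A is true, and P(A) and P(B) are the probabilities of A and B on their own.
The algorithm is called Naive because it assumes all variables are independent, and the presence of one variable doesn’t have any relation to the other variables, which is rarely the case in real life. Despite this simplifying assumption, Naive Bayes works well in practice for tasks such as Email Spam classification and text classification.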
Naïve Bayes code in Python:
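Below is a minimal sketch of spam classification with scikit-learn's MultinomialNB. The four training messages and the two test messages are made up for illustration; a real spam filter would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus, invented only for this example.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting schedule for monday",
    "project report attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Convert each message to word counts, then apply Naive Bayes to the counts.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(train_texts, train_labels)

predictions = spam_filter.predict(["free money prize", "monday project meeting"])
print(predictions)
```

For continuous features, GaussianNB from the same module is the usual alternative.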
6. Decision Tree
Used for classification and regression problems, the Decision Tree algorithm is one of the simplest and most easily interpretable Machine Learning algorithms. Moreover, it is relatively robust to outliers in the data, some implementations can handle missing values, and it could capture the non-linear relationships between the dependent and the independent variables.
To build a Decision Tree, all features are considered at first, and the feature with the maximum information gain is taken as the root node, based on which the successive splitting is done. This splitting continues on the child nodes based on the maximum information gain criterion, and it stops when all the instances have been classified or the data cannot be split further. Decision Trees are often prone to overfitting, and thus it is necessary to tune hyperparameters like the maximum depth, minimum samples per leaf, minimum samples per split, maximum features, and so on. One approach is to set constraints up front and greedily choose the best possible split at each step within those constraints. Another approach is Pruning, where the tree is first built up to a certain pre-defined depth, and then, starting from the bottom, nodes are removed if they do not improve the model.
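The information gain used to choose a split can be sketched in a few lines of NumPy. The helper names and the small yes/no example are assumptions for illustration; entropy is used here as the impurity measure.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical split of 10 samples into two child nodes of 5 samples each.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
print(gain)
```

The feature whose split gives the largest gain would be chosen at that node.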
In sklearn, Decision Trees are coded as:
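A minimal sketch with scikit-learn's DecisionTreeClassifier on the bundled Iris dataset is shown here. The hyperparameter values are assumptions chosen to illustrate how the tree can be constrained, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Constrain the tree to limit overfitting: splits are chosen by entropy
# (information gain), depth is capped, and each leaf keeps at least 5 samples.
tree = DecisionTreeClassifier(
    criterion="entropy", max_depth=3, min_samples_leaf=5, random_state=0
)
tree.fit(X_train, y_train)

depth = tree.get_depth()
accuracy = tree.score(X_test, y_test)
print(depth, accuracy)
```

scikit-learn also offers cost-complexity pruning through the ccp_alpha parameter, which removes branches after the tree has been grown.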
7. Random Forest
To reduce overfitting in the Decision Tree, it is required to reduce the variance of the model, and thus the concept of bagging came into place. Bagging is a technique where the outputs of several models, each trained on a different random sample of the data, are combined to form the final output. Random Forest is one such bagging method where the dataset is sampled with replacement into multiple datasets, and a random subset of the features is considered at each split. Then on each sampled dataset, the Decision Tree algorithm is applied to get the output from each model. In the case of a Regression problem, the mean of the output of all the models is taken, whereas, in the case of classification problems, the class which gets the maximum vote is considered to classify the data point. Random Forest is relatively robust to outliers, and its feature importance scores can help in feature selection. However, it is less interpretable than a single Decision Tree, which is a drawback for Random Forest.
In Python, you could code Random Forest as:
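Here is a minimal sketch with scikit-learn's RandomForestClassifier on the bundled Iris dataset. The number of trees and the other settings are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 100 trees, each trained on a bootstrap sample of the training data and
# considering a random subset of the features at each split.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
importances = forest.feature_importances_  # one score per feature
print(accuracy, importances)
```

For a regression problem, RandomForestRegressor averages the outputs of the trees instead of taking a vote.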
8. K-means Clustering
So far, we have worked with supervised learning problems where there is a corresponding output for every input. Now, we would learn about unsupervised learning, where the data is unlabelled and needs to be clustered into specific groups. There are several clustering techniques available. However, the most common of them is K-means clustering. In k-means, k refers to the number of clusters, which needs to be set before the algorithm is run. Once k is set, the centroids are initialized. The centroids are then adjusted repeatedly so that the distance between the data points and the centroid of their own cluster is minimum and the distance between separate clusters is maximum. Euclidean distance, Manhattan distance, etc., are some of the distance formulas used for this purpose.
The value of k could be found using the elbow method, which plots the within-cluster sum of squared distances against k and picks the point where the curve bends.
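The elbow method can be sketched as follows with scikit-learn. The data is synthetic with three well-separated clusters (an assumption for illustration), so the drop in the within-cluster sum of squares should flatten out after k = 3.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three clusters (assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# inertia_ is the within-cluster sum of squared distances for a fitted model.
inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    inertias.append(km.inertia_)

# How much the inertia falls each time k is increased by one.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
print(inertias)
print(drops)
```

Plotting inertias against k would show the bend; the value of k just before the drops become small is the usual choice.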
K-means clustering is used in e-commerce industries where customers are grouped together based on their behavioral patterns. It could also be used in Risk Analytics.
Below is the Python code:
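A minimal sketch with scikit-learn's KMeans follows. The data is synthetic with three clusters, and k = 3 is assumed to be known in advance for the purpose of the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three clusters (assumption for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # cluster index assigned to each point
centroids = kmeans.cluster_centers_

# Assign a new, unseen point to the nearest centroid.
new_label = kmeans.predict(np.array([[0.5, -0.5]]))[0]
print(centroids)
print(new_label)
```

The cluster indices themselves are arbitrary; only the grouping of the points is meaningful.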
Data Scientist has been called the sexiest job of the 21st century, and Machine Learning is certainly one of its key areas of expertise. To be a Data Scientist, one needs to possess an in-depth understanding of all these algorithms and also several other newer techniques such as Deep Learning.
This has been a guide to Machine Learning Algorithms. Here we have discussed the basic concepts, categories, problem types, and different algorithms of machine learning.