EDUCBA

Machine Learning Algorithms


Introduction to Machine Learning Algorithms

Machine Learning algorithms are the algorithms used to train models. Machine learning is divided into three types: Supervised Learning (the dataset is labeled, and Regression and Classification techniques are used), Unsupervised Learning (the dataset is not labeled, and techniques such as Dimensionality Reduction and Clustering are used), and Reinforcement Learning (the model learns from each of its actions). These are used to develop machine learning solutions for applications such as Customer Retention, Image Classification, Skill Acquisition, Customer Segmentation, Game AI, Weather Forecasting, Market Forecasting, Diagnostics, etc.


Categories of Machine Learning Algorithms

The field of Machine Learning Algorithms could be categorized into –

  • Supervised Learning – In Supervised Learning, the data set is labeled, i.e., for every set of feature (independent variable) values, there is a corresponding target value, which we use to train the model.
  • Unsupervised Learning – Unlike in Supervised Learning, the data set is not labeled in this case. Thus, techniques such as clustering are used to group the data points based on their similarity to one another.
  • Reinforcement Learning – A special type of Machine Learning where the model learns from each action taken. The model is rewarded for any correct decision and penalized for any wrong decision, which allows it to learn the patterns and make more accurate decisions on unknown data.

Division of Machine Learning Algorithms

The problems in Machine Learning Algorithms could be divided into –

  • Regression – There is a continuous relationship between the dependent and the independent variables. The target variable is numeric in nature, while the independent variables could be numeric or categorical.
  • Classification – One of the most common problem statements you would find in the real world is classifying a data point into some binary, multinomial or ordinal class. In the Binary Classification problem, the target variable has only two outcomes (Yes/No, 0/1, True/False). In the Multinomial Classification problem, there are multiple classes in the target variable (Apple/Orange/Mango, and so on). In the Ordinal Classification problem, the target variable is ordered (e.g., the grades of students).

To solve these kinds of problems, programmers and scientists have developed algorithms that can be applied to the data to make predictions. These algorithms can be divided into linear algorithms and non-linear or tree-based algorithms. Linear algorithms such as Linear Regression and Logistic Regression are generally used when there is a linear relationship between the features and the target variable, whereas for data that exhibits non-linear patterns, tree-based methods such as Decision Tree, Random Forest, and Gradient Boosting are preferred.

So far, we have built a brief intuition about Machine Learning. Next, you will learn about some of its ready-made algorithms that you could use in your next project.

Algorithms

There are numerous Machine Learning algorithms available today, and the number is only going to increase considering the amount of research being done in this field. Linear and Logistic Regression are generally the first algorithms you learn as a Data Scientist, followed by more advanced algorithms.


Below are some of the Machine Learning algorithms along with sample code snippets in Python.

1. Linear Regression

As the name suggests, this algorithm could be used in cases where the target variable, which is continuous in nature, is linearly dependent on the independent variables. It is represented by –

y = a*x + b + e, where y is the target variable we are trying to predict, a is the slope, b is the intercept, e is the error term, and x is the independent variable used to make the prediction. This is a Simple Linear Regression as there is only one independent variable. In the case of Multiple Linear Regression, the equation would be –

y = a1*x1 + a2*x2 + … + a(n)*x(n) + b + e

Here, e is the error term and a1, a2, …, a(n) are the coefficients of the independent variables.

To evaluate the performance of the model, a metric is used, which in this case could be the Root Mean Square Error (RMSE): the square root of the mean of the squared differences between the actual and the predicted values.

RMSE = sqrt((1/n) * Σ (y(i) - ŷ(i))^2), where y(i) is the actual value, ŷ(i) is the predicted value, and n is the number of data points.
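The RMSE definition can be checked by hand on a few numbers. The snippet below is a minimal sketch in plain Python; the function name `rmse` and the sample values are illustrative assumptions, not part of the original article.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Square each difference, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values (assumed): the squared errors are 1, 0, 1 and 4.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
score = rmse(actual, predicted)
print(score)  # sqrt(6 / 4) = sqrt(1.5), roughly 1.2247
```

Because the errors are squared before averaging, one large miss (here the last point) contributes more to the score than several small ones.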

The goal of Linear Regression is to find the best fit line which would minimize the difference between the actual and the predicted data points.

[Figure: actual and predicted data points]

Linear Regression could be written in Python as below –
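A minimal sketch of such code, assuming scikit-learn and NumPy are installed. The synthetic data (generated from y = 3*x + 2 plus noise) and all variable names are assumptions made for illustration, not the article's original snippet.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 3*x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out 25% of the rows to evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate with RMSE: the square root of the mean squared error.
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("RMSE:", rmse)
```

On this data the fitted slope and intercept should land close to the values used to generate it, and the RMSE should be close to the noise level.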


2. Logistic Regression

In terms of maintaining a linear relationship, it is the same as Linear Regression. However, unlike in Linear Regression, the target variable in Logistic Regression is categorical, i.e., binary, multinomial or ordinal in nature. The choice of the activation function is important in Logistic Regression: for binary classification problems, the sigmoid function, which is the inverse of the log of odds (logit), is used.


In the case of a multi-class problem, the softmax function is preferred, as it generalizes the sigmoid function to more than two classes and yields class probabilities that sum to 1.
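The two activation functions can be sketched in a few lines of plain Python. The function names and the example scores below are illustrative assumptions; the formulas are the standard ones, sigmoid(z) = 1 / (1 + e^(-z)) and softmax(z)(i) = e^(z(i)) / Σ e^(z(j)).

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

p = sigmoid(0.0)                   # a score of 0 corresponds to even odds
probs = softmax([2.0, 1.0, 0.1])   # three hypothetical class scores

print(p)
print(probs, sum(probs))
```

With exactly two classes, softmax over the scores [z, 0] gives the same probability for the first class as sigmoid(z), which is why the sigmoid is sufficient for the binary case.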


The metric used to evaluate a classification problem is generally Accuracy or the area under the ROC curve (AUC). The larger the area under the ROC curve, the better the model. A random classifier would have an AUC of 0.5. A value of 1 indicates a perfect classifier, whereas a value of 0 indicates one whose predictions are consistently inverted.

[Figure: ROC curve; x-axis: False Positive Rate]

Logistic Regression could be written in sklearn as –
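A minimal sketch of such code, assuming scikit-learn is installed. The two-class synthetic dataset from `make_blobs`, the chosen cluster centers, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumed): two Gaussian clusters, one per class.
X, y = make_blobs(
    n_samples=400,
    centers=[[-1.5, -1.5], [1.5, 1.5]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Hard class predictions are used for accuracy;
# the predicted probability of class 1 is used for the ROC AUC.
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print("accuracy:", accuracy)
print("ROC AUC:", auc)
```

Note that the AUC is computed from the predicted probabilities rather than from the hard labels, since the ROC curve is traced out by varying the decision threshold.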


3. K-Nearest Neighbors

K-Nearest Neighbors (KNN) can be used for both classification and regression problems. The idea behind the KNN method is that it predicts the value of a new data point based on its K nearest neighbors. K is generally chosen as an odd number to avoid ties in the vote. When classifying a new data point, the most frequent class (the mode) among the neighbors is assigned. For a regression problem, the mean of the neighbors' values is used.

In sklearn, KNN is written as –
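A minimal sketch of a KNN classifier, assuming scikit-learn is installed. The synthetic two-cluster data, the choice of K = 5, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumed): class 0 around (-2, -2), class 1 around (2, 2).
X, y = make_blobs(
    n_samples=300,
    centers=[[-2, -2], [2, 2]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# An odd K avoids tied votes in a two-class problem.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)

# A new point is assigned the most frequent class among its 5 nearest neighbors.
new_point = [[2.0, 2.0]]
predicted_class = knn.predict(new_point)[0]

print("accuracy:", accuracy)
print("predicted class for", new_point, "->", predicted_class)
```

For a regression target, `KNeighborsRegressor` from the same module averages the neighbors' values instead of taking a vote.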


KNN is often used in building recommendation engines.

4. Support Vector Machines

A classification algorithm in which a hyperplane separates the two classes. In a binary classification problem, the data points from the two classes that lie closest to the boundary are known as the support vectors, and the hyperplane is drawn at the maximum distance from them.

In the simplest case, a single line separates the two classes. However, in most cases, the data would not be so perfect and a simple hyperplane would not be able to separate the classes. Hence, you need to tune parameters such as Regularization, Kernel, Gamma, and so on.

The kernel could be linear or polynomial depending on how the data is separated. For the linearly separable case above, a linear kernel suffices. In the case of Regularization, you need to choose an optimum value of C, as a high value could lead to overfitting while a small value could underfit the model. The influence of a single training example is defined by Gamma: with a high gamma, only points close to the line are considered, whereas with a low gamma, points far from the line are considered as well.

In sklearn, SVM is written as –
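A minimal sketch of such code, assuming scikit-learn is installed. The synthetic data, the value C = 1.0, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (assumed): two roughly linearly separable clusters.
X, y = make_blobs(
    n_samples=300,
    centers=[[-2, -2], [2, 2]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Linear kernel with regularization strength C.
# gamma is ignored by the linear kernel; it only matters for kernels
# such as "rbf" or "poly", e.g. SVC(kernel="rbf", C=1.0, gamma="scale").
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)

accuracy = svm.score(X_test, y_test)

print("accuracy:", accuracy)
print("support vectors per class:", svm.n_support_)
```

Only the support vectors determine the fitted hyperplane, so `n_support_` is usually much smaller than the number of training points when the classes are well separated.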


5. Naive Bayes

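It works on the principle of Bayes' Theorem, which finds the probability of an event given that some condition is true. Bayes' Theorem is represented as –

P(A|B) = P(B|A) * P(A) / P(B), where P(A|B) is the probability of event A given that B is true, P(B|A) is the probability of B given A, and P(A) and P(B) are the probabilities of A and B on their own.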
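A small worked example may make the theorem concrete. The probabilities below are made-up numbers for a hypothetical spam filter, and the function name `posterior` is an assumption for illustration.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Made-up numbers (assumed for illustration):
p_spam = 0.2              # P(spam): 20% of all emails are spam
p_word_given_spam = 0.5   # P(word | spam): the word appears in half of spam emails
p_word_given_ham = 0.05   # P(word | not spam)

# P(word) by the law of total probability: 0.5 * 0.2 + 0.05 * 0.8 = 0.14
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(spam | word) = 0.5 * 0.2 / 0.14 = 5/7
p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(p_spam_given_word)  # roughly 0.714
```

Seeing the word raises the probability of spam from the prior of 0.2 to about 0.71, because the word is ten times more likely in spam than in other email.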

The algorithm is called Naive because it assumes that all variables are independent of one another given the class, i.e., that the presence of one variable has no relation to the other variables, which is rarely the case in real life. Naive Bayes could be used in email spam classification and in text classification.

Naïve Bayes code in Python –
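A minimal sketch of spam classification with Naive Bayes, assuming scikit-learn is installed. The tiny corpus, its labels, and the choice of `MultinomialNB` over word counts are assumptions made for illustration, not the article's original snippet.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus (assumed for illustration).
train_texts = [
    "win money now",
    "free prize win",
    "claim free money",
    "meeting at noon",
    "project report attached",
    "lunch meeting tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each text into word counts;
# MultinomialNB treats each word count as independent given the class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["free money prize", "project meeting"])
print(list(predictions))
```

For continuous features, `GaussianNB` from the same module is the usual counterpart; it models each feature with a per-class normal distribution instead of word counts.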


6. Decision Tree

Used for both classification and regression problems, the Decision Tree algorithm is one of the simplest and most easily interpretable Machine Learning algorithms. It is relatively robust to outliers and, depending on the implementation, to missing values in the data, and it can capture non-linear relationships between the dependent and the independent variables.

To build a Decision Tree, all features are considered at first, and the feature with the maximum information gain is taken as the root node, from which the successive splitting is done. This splitting continues on the child nodes based on the maximum information gain criterion, and it stops when all the instances have been classified or the data cannot be split further. Decision Trees are often prone to overfitting, and thus it is necessary to tune hyperparameters such as the maximum depth, the number of leaf nodes, the minimum number of samples, the maximum number of features and so on. To reduce overfitting, there is a greedy approach that sets constraints at each step and chooses the best possible criterion for that split. Another, often better, approach is called Pruning, where the tree is first built up to a certain pre-defined depth and then, starting from the bottom, nodes are removed if they do not improve the model.
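Information gain can be computed by hand as the entropy of the parent node minus the weighted entropy of its children. The sketch below uses plain Python; the function names and the yes/no labels are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Made-up node with 5 "yes" and 5 "no" labels (entropy of 1 bit).
parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely recovers the full bit.
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split that barely changes the class mix gains almost nothing.
poor_split = [
    ["yes", "yes", "yes", "no", "no"],
    ["yes", "yes", "no", "no", "no"],
]

gain_perfect = information_gain(parent, perfect_split)
gain_poor = information_gain(parent, poor_split)
print(gain_perfect, gain_poor)
```

A tree builder evaluates every candidate split this way and keeps the one with the highest gain, which is why the first split would be chosen over the second here.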

In sklearn, Decision Trees are coded as –
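A minimal sketch of such code, assuming scikit-learn is installed. The synthetic data, the hyperparameter values, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumed): two Gaussian clusters, one per class.
X, y = make_blobs(
    n_samples=300,
    centers=[[-2, -2], [2, 2]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# criterion="entropy" splits on information gain; max_depth and
# min_samples_leaf constrain the tree to limit overfitting.
tree = DecisionTreeClassifier(
    criterion="entropy",
    max_depth=3,
    min_samples_leaf=5,
    random_state=0,
)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)
print("accuracy:", accuracy)
print("depth of fitted tree:", tree.get_depth())
```

For a regression target, `DecisionTreeRegressor` from the same module takes the same depth and leaf-size constraints.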


7. Random Forest

To reduce overfitting in a Decision Tree, the variance of the model needs to be reduced, and this is where the concept of bagging comes in. Bagging is a technique where the outputs of several classifiers are combined to form the final output. Random Forest is one such bagging method, in which the dataset is sampled into multiple datasets and the features are selected at random for each set. The Decision Tree algorithm is then applied to each sampled dataset to get the output from each model. In the case of a regression problem, the mean of the outputs of all the models is taken, whereas in the case of a classification problem, the class that gets the most votes is assigned to the data point. Random Forest is relatively robust to outliers and missing values in the data, and it can also help with dimensionality reduction. However, it is not easily interpretable, which is a drawback of Random Forest.

In Python, you could code Random Forest as –
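A minimal sketch of such code, assuming scikit-learn is installed. The synthetic data (two informative features plus two noise features), the hyperparameter values, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (assumed): the classes differ only in the first two
# features; the last two features are pure noise.
X, y = make_blobs(
    n_samples=400,
    centers=[[-2, -2, 0, 0], [2, 2, 0, 0]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 100 trees, each fit on a bootstrap sample of the rows; each split
# considers a random subset of sqrt(n_features) features.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    random_state=0,
)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
importances = forest.feature_importances_

print("accuracy:", accuracy)
print("feature importances:", importances)
```

The feature importances give a rough ranking that can guide which features to drop; here the two noise features should score well below the informative ones. For a regression target, `RandomForestRegressor` averages the trees' outputs instead of taking a vote.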


8. K-means Clustering

So far, we have worked with supervised learning problems, where for every input there is a corresponding output. Now, we will learn about unsupervised learning, where the data is unlabelled and needs to be clustered into specific groups. There are several clustering techniques available; the most common of them is K-means clustering. In k-means, k refers to the number of clusters, which needs to be set in advance. Once k is set, the centroids are initialized. The centroids are then adjusted repeatedly so that the distance between the data points within a cluster is minimal and the distance between two separate clusters is maximal. Euclidean distance, Manhattan distance, etc., are some of the distance formulas used for this purpose.

The value of k could be found from the elbow method.


K-means clustering is used in e-commerce industries where customers are grouped together based on their behavioral patterns. It could also be used in Risk Analytics. Below is the Python code –
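A minimal sketch of such code, assuming scikit-learn is installed. The synthetic data with three well-separated groups, the range of k values tried, and the variable names are assumptions made for illustration. scikit-learn's `KMeans` uses Euclidean distance.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumed): three well-separated groups of points.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.8,
    random_state=0,
)

# Elbow method: fit k-means for several k and record the inertia
# (the sum of squared distances from each point to its centroid).
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))

# Fit the final model with the k at which the inertia curve bends.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
centroids = kmeans.cluster_centers_
print(centroids)
```

The inertia always decreases as k grows, so the elbow method looks for the k after which the decrease becomes small; on this data the drop should flatten after k = 3.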


Conclusion

Data Scientist has been called the sexiest job of the 21st century, and Machine Learning is certainly one of its key areas of expertise. To be a Data Scientist, one needs an in-depth understanding of all these algorithms as well as of newer techniques such as Deep Learning.

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
