Data Science Algorithms

By Priya Pedamkar

Introduction to Data Science Algorithms

This article gives a high-level description of the essential algorithms used in data science. As you may already know, data science is a field of study in which decisions are based on insights drawn from data rather than on classic rule-based, deterministic approaches. Typically, we can divide a machine learning task into three parts:

  • Obtaining the data and mapping the business problem
  • Applying machine learning techniques and observing the performance metric
  • Testing and deploying the model

Throughout this life cycle, we use various data science algorithms to solve the task at hand. This article groups the most commonly used algorithms by their learning type and discusses each of them at a high level.

Types of Data Science Algorithms

We can simply divide machine learning or data science algorithms into the following types based on the learning methodologies.

  1. Supervised Algorithms
  2. Unsupervised Algorithms

1. Supervised Algorithms

As the name suggests, supervised algorithms are a class of machine learning algorithms where the model is trained with labelled data. For example, based on historical data, you may want to predict whether a customer will default on a loan or not. After preprocessing and feature engineering of the labelled data, a supervised algorithm is trained on the structured data and then tested on a new data point, in this case to predict whether a new customer will default. Let’s dive into the most popular supervised machine learning algorithms.

K Nearest Neighbors

K nearest neighbours (KNN) is one of the simplest yet most powerful machine learning algorithms. It is a supervised algorithm where classification is based on the k nearest data points. The idea behind KNN is that similar points lie close together, so by examining the labels of the nearest data points we can classify a test data point. For example, suppose we are solving a standard binary classification problem where we want to predict whether a data point belongs to class A or class B. Let k=3; we then look at the 3 nearest neighbours of the test data point. If at least two of them belong to class A, we declare the test data point as class A, otherwise class B. The right value of k is found through cross-validation. With a brute-force search, prediction time grows linearly with the size of the training set, so KNN is often unsuitable for low-latency applications.
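The voting procedure described above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not a tuned implementation; the function name `knn_predict`, the toy points, and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: measure the distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, query=(2.0, 2.0), k=3))
```

Because every prediction scans the whole training set, the cost per query grows with the number of training points, which is the latency concern mentioned above.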

Linear Regression

Linear regression is a supervised data science algorithm where the output variable is continuous. The idea is to find the hyperplane that best fits the data points, typically by minimizing the sum of squared errors between the predicted and actual values. For example, predicting the amount of rainfall is a standard regression problem where linear regression can be used. Linear regression assumes that the relationship between the independent and dependent variables is linear and that there is very little or no multicollinearity.
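As a small illustration, the sketch below fits a straight line to one input variable using ordinary least squares, one common way of fitting a linear regression. The function name `fit_line` and the toy data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = covariance(x, y) / variance(x).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With several input variables the same idea applies, but the fitted object is a hyperplane rather than a line.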

Logistic Regression

Though the name says regression, logistic regression is a supervised classification algorithm. The geometric intuition is that we can separate different class labels using a linear decision boundary. The output variable of logistic regression is categorical. Note that mean squared error is not used as the cost function for logistic regression because, combined with the sigmoid function, it is non-convex; log loss (cross-entropy) is used instead.
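The sketch below shows one way to train a logistic regression with a single feature by gradient descent on log loss, a convex cost for this model. The function names, the learning rate, the number of epochs, and the centred toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that x belongs to class 1 under the linear model w*x + b."""
    return sigmoid(w * x + b)


def log_loss(w, b, xs, ys):
    """Mean log loss (cross-entropy) over the data set."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict_proba(w, b, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train_logistic(xs, ys, learning_rate=0.1, epochs=500):
    """Plain batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(w, b, x) - y  # derivative of log loss w.r.t. z
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical centred data: negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
print(predict_proba(w, b, -1.0), predict_proba(w, b, 1.0))
```

The decision boundary is the point where the predicted probability crosses 0.5, which for this model is where `w * x + b` equals zero.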

Support Vector Machine

In logistic regression, our main goal was to find a separating linear surface. We can consider the support vector machine (SVM) as an extension of this idea: it finds the hyperplane that maximizes the margin. But what is a margin? For a decision surface defined by a weight vector W, we draw two parallel hyperplanes, one on each side, each passing through the closest points of a class. The distance between these two hyperplanes is called the margin. In its basic (hard-margin) form, SVM assumes the data is linearly separable, though we can also use SVM for nonlinear data with the kernel trick.
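To make the margin concrete, the sketch below assumes the usual canonical scaling, in which the two parallel hyperplanes are w·x + b = +1 and w·x + b = -1, so the margin width is 2 / ||w||. The weight vectors shown are hypothetical and were not learned from data; the example only compares the margins of two candidate decision surfaces.

```python
import math


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def decision_value(w, b, x):
    """Signed score of x relative to the decision surface w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


# Two hypothetical candidate weight vectors (assumed, not trained).
w_narrow = (3.0, 4.0)   # norm 5, so the margin is narrow
w_wide = (0.6, 0.8)     # norm 1, so the margin is wide

print(margin_width(w_narrow), margin_width(w_wide))
print(classify(w_wide, -1.0, (3.0, 3.0)), classify(w_wide, -1.0, (0.0, 0.0)))
```

An SVM prefers the candidate with the smaller norm of w, since that corresponds to the larger margin, provided the training points still fall on the correct side of their hyperplane.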

Decision Tree

A decision tree is a nested if-else based classifier that uses a tree-like graph structure to make decisions. Decision trees are popular and among the most widely used supervised machine learning algorithms in data science. They are easy to interpret, perform well in many cases, and are robust to outliers. The output variable of a decision tree is usually categorical, but decision trees can also be used to solve regression problems.
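The sketch below shows how a single if-else node of such a tree might be chosen. It uses Gini impurity, one common splitting criterion that the article itself does not name, and searches for the best threshold on one feature. The income values, labels, and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical loan data: yearly income (in thousands) and the observed outcome.
incomes = [20, 25, 30, 60, 70, 80]
outcomes = ["default", "default", "default", "repay", "repay", "repay"]

threshold, impurity = best_threshold(incomes, outcomes)


def predict(income):
    # The learned node is just an if-else on the chosen threshold.
    if income <= threshold:
        return "default"
    return "repay"


print(threshold, impurity, predict(28), predict(65))
```

A full tree repeats this search recursively on the left and right subsets until the nodes are pure or a depth limit is reached.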

Ensembles

Ensembles are a popular category of data science algorithms where multiple models are combined to improve performance. If you are familiar with Kaggle (a platform owned by Google for practising and competing in data science challenges), you will find that many winning solutions use some form of ensemble.

We can roughly divide ensembles into the following categories.

  • Bagging
  • Boosting
  • Stacking
  • Cascading

Random Forest (a bagging method) and Gradient Boosted Decision Trees (a boosting method) are examples of popular ensemble algorithms.
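As an illustration of the first category, the sketch below implements a bare-bones form of bagging: each base model sees a bootstrap sample of the data, and the ensemble predicts by majority vote. The base learner here is a 1-nearest-neighbour rule on a single feature, chosen only to keep the example short; the data, the function names, and the number of models are assumptions.

```python
import random
from collections import Counter


def nearest_neighbor_label(sample, query):
    """Base learner: label of the closest point in `sample` (1-nearest neighbour)."""
    _, label = min(sample, key=lambda pair: abs(pair[0] - query))
    return label


def bagging_predict(data, query, n_models=25, seed=0):
    """Majority vote over base learners, each using its own bootstrap sample."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        votes[nearest_neighbor_label(sample, query)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical 1D data: class A near 0..4, class B near 10..14.
data = [(x, "A") for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
data += [(x, "B") for x in (10.0, 11.0, 12.0, 13.0, 14.0)]

print(bagging_predict(data, 1.5), bagging_predict(data, 12.5))
```

Random Forest follows the same pattern with decision trees as base learners and additional randomness in the features considered at each split.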

2. Unsupervised Algorithms

Unsupervised algorithms are used for tasks where the data is unlabelled. The most popular use case of unsupervised algorithms is clustering, the task of grouping similar data points together without manual intervention. Let’s discuss some of the popular unsupervised machine learning algorithms.

K Means

K Means is a randomized unsupervised algorithm used for clustering. It follows the steps below.

  1. Initialize K centroids randomly (c1, c2, ..., ck).
  2. For each point xi in the data set, find the nearest centroid cj (j = 1, 2, ..., k) and assign xi to the cluster of cj.
  3. Recompute each centroid as the mean of the points assigned to its cluster, which minimizes the intra-cluster squared distance.
  4. Repeat steps 2 and 3 until the assignments no longer change.
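The steps above can be sketched as follows. This is a minimal illustration using Euclidean distance; the function names, the optional `initial_centroids` argument (added so the example is reproducible), and the toy points are assumptions.

```python
import math
import random


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, initial_centroids=None, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K starting centroids (random data points unless provided).
    centroids = list(initial_centroids) if initial_centroids else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two compact groups in 2D.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

An empty cluster keeps its previous centroid in this sketch; other implementations may re-seed it instead.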

K Means++

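The initialization step in K means is purely random, and the resulting clusters can change drastically depending on it. K means++ addresses this problem by choosing the initial centroids probabilistically instead of purely at random: after the first centroid is picked, each further centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen, so the initial centroids tend to be spread out. K means++ is more stable than classic K means.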
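A minimal sketch of this style of seeding is shown below: the first centroid is picked uniformly at random, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function name and toy data are assumptions, and the sketch assumes the data contains at least k distinct points.

```python
import math
import random


def k_means_pp_init(points, k, seed=0):
    """Choose k initial centroids with distance-squared weighted sampling."""
    rng = random.Random(seed)
    # The first centroid is a uniformly random data point.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest chosen centroid.
        weights = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Far-away points are more likely to become the next centroid.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Hypothetical toy data: three loose groups in 2D.
points = [
    (1.0, 1.0), (1.5, 1.2), (0.8, 1.4),
    (8.0, 8.0), (8.5, 7.6), (7.9, 8.3),
    (1.0, 9.0), (1.3, 8.6), (0.7, 9.2),
]
initial = k_means_pp_init(points, k=3)
print(initial)
```

The selected points are then used as the starting centroids for the usual K means iterations.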

K Medoids

K medoids is also a clustering algorithm based on K means. The main difference between the two is that the centroids of K means do not necessarily exist in the data set, whereas the centres of K medoids (the medoids) are always actual data points. K medoids therefore offers better interpretability of clusters. K means minimizes the total squared error, while K medoids minimizes the dissimilarity between the points in a cluster and its medoid.
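The difference between a centroid and a medoid can be seen on a single cluster. In the sketch below, the medoid is taken to be the data point with the smallest total Euclidean distance to the other points; the function names and the three toy points are assumptions made for the example.

```python
import math


def centroid(points):
    """Coordinate-wise mean, as used by K means; need not be a data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """Data point with the smallest total distance to all points in the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


# Hypothetical cluster with one far-away point.
cluster = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]

print(centroid(cluster))  # the mean, which is not one of the three points
print(medoid(cluster))    # always one of the original points
```

Because the medoid is a real observation, it can be shown to a stakeholder as a representative example of its cluster.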

Conclusion

In this article, we discussed the most popular machine learning algorithms used in data science. After all this, a question may come to mind: ‘Which algorithm is the best?’ Clearly, there is no winner here. It depends solely on the task at hand and the business requirements. As a best practice, always start with the simplest algorithm and increase the complexity gradually.
