EDUCBA

K-Means Clustering Algorithm

By Priya Pedamkar

Introduction to K-Means Clustering Algorithm

K-Means clustering is an unsupervised learning method that iteratively groups a dataset into k predefined, non-overlapping clusters or subgroups. It tries to make the points within each cluster as similar as possible while keeping the clusters themselves as distinct as possible. Each data point is allocated to a cluster so that the sum of the squared distances between the data points and their cluster’s centroid is at a minimum, where the centroid of a cluster is the arithmetic mean of the data points in that cluster.

Understanding K-Means Clustering Algorithm

K-Means is an iterative algorithm that partitions a dataset, according to its features, into K predefined, non-overlapping clusters or subgroups. It makes the data points within each cluster as similar as possible and tries to keep different clusters as far apart as possible. It assigns data points to clusters so that the sum of the squared distances between each cluster’s centroid and its data points is at a minimum, where a cluster’s centroid is the arithmetic mean of the data points in that cluster. Less variation within a cluster means that its data points are more similar, or homogeneous.
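
The quantity being minimized can be made concrete with a small sketch. The function names and the toy data below are hypothetical and chosen only for illustration; the code simply adds up the squared Euclidean distance from each point to the centroid of the cluster it is assigned to.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_sum_of_squares(points, labels, centroids):
    """Sum, over all points, of the squared distance to the assigned centroid."""
    return sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))


# Hypothetical toy data: two tight groups on a line.
points = [(1.0, 0.0), (3.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]                    # cluster index of each point
centroids = [(2.0, 0.0), (11.0, 0.0)]    # arithmetic mean of each group

wss = within_cluster_sum_of_squares(points, labels, centroids)
print(wss)
```

Each point here lies one unit from its centroid, so the total is small; assigning a point to the other cluster would raise it sharply.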

How Does the K-Means Clustering Algorithm Work?

The K-Means clustering algorithm needs the following inputs:

  • K = number of subgroups or clusters
  • Sample or training set = {x1, x2, x3, ..., xn}

Now let us assume we have a data set that is unlabeled, and we need to divide it into clusters.

Input 1

Now we need to find the number of clusters.

This can be done by two methods:

  • Elbow Method
  • Purpose-Based Method

1. Elbow Method

In this method, the within-cluster sum of squares (WSS) is plotted against the number of clusters. The resulting curve resembles a human arm, and the method gets its name because the point at the elbow of the curve gives the optimum number of clusters. After the elbow point, WSS decreases only slowly as more clusters are added, so the number of clusters at the elbow is taken as the final value of K.

Input 2
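
As a rough sketch of the elbow method, the code below computes WSS for several values of K on a hypothetical toy data set with three visible groups. The helper names, the data, the number of restarts and the seed are all assumptions made for illustration, and the small k-means routine is a plain-Python stand-in rather than any particular library’s implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans_wss(points, k, rng, max_iter=100):
    """Run one k-means fit from a random start and return its final WSS."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, max_k, restarts=10, seed=0):
    """Best WSS found for each K from 1 to max_k (several random restarts each)."""
    rng = random.Random(seed)
    return {k: min(kmeans_wss(points, k, rng) for _ in range(restarts))
            for k in range(1, max_k + 1)}


# Hypothetical data with three well-separated groups.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
        (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),
        (0.0, 10.0), (1.0, 10.0), (0.0, 11.0)]

wss_by_k = elbow_curve(data, max_k=5)
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

Plotting these values against K should show a steep drop that flattens out; the K at which the curve bends is the candidate the elbow method would pick.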

2. Purpose-Based Method

In this method, the data is divided according to different criteria, and each division is then judged on how well it serves the purpose at hand. For example, the shirts in the men’s clothing department of a mall may be arranged by size, but they could also be arranged by price or by brand. The most suitable arrangement determines the optimal number of clusters, i.e. the value of K.

Now let us get back to the data set given above. We can determine the number of clusters, i.e. the value of K, by using either of the above methods.

How to use the Above Methods?

Now let us see the execution process:

Step 1: Initialization

First, initialize K random points, called the centroids of the clusters. While initializing, take care that the number of centroids is less than the number of training data points. K-Means is an iterative algorithm, so the next two steps are performed repeatedly.

Initialisation
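
One simple way to carry out this step, shown here as a sketch, is to pick K distinct data points at random and use them as the starting centroids. That choice is an assumption; the step above only asks for random points. The function name and the sample data are hypothetical.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids."""
    # As stated in the step above, K must be smaller than the number of points.
    if not 0 < k < len(points):
        raise ValueError("k must be positive and less than the number of points")
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return rng.sample(points, k)


# Hypothetical unlabeled data set.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0), (5.0, 4.0)]
centroids = init_centroids(points, k=2, seed=42)
print(centroids)
```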

Step 2: Cluster Assignment

After initialization, all data points are traversed, and the distance between each data point and every centroid is calculated. Each data point is then assigned to the cluster of its nearest centroid. In this example, the data is divided into two clusters.

Cluster Assignment
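
The assignment step can be sketched as follows, assuming Euclidean distance; the squared distance is used because it gives the same nearest centroid. The function names and sample values are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [sq_dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical data and two current centroids.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_clusters(points, centroids)
print(labels)
```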

Step 3: Moving Centroid

The clusters formed in the above step are not yet optimized, so the centroids need to be moved iteratively to new locations. Take the data points of one cluster, compute their average, and move the centroid of that cluster to this new location. Repeat the same step for all other clusters.

Moving Centroid
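
A minimal sketch of the centroid update is given below. The names are hypothetical, and keeping the old centroid when a cluster receives no points is an assumption added to make the code safe, not something the step above specifies.

```python
def update_centroids(points, labels, centroids):
    """Move each centroid to the arithmetic mean of the points assigned to it."""
    updated = []
    for index, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == index]
        if members:
            updated.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            updated.append(old)  # assumption: leave an empty cluster's centroid in place
    return updated


# Hypothetical data with the assignment from the previous step.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.0), (10.0, 10.0)]

new_centroids = update_centroids(points, labels, centroids)
print(new_centroids)
```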

Step 4: Optimization

The above two steps are repeated until the centroids stop moving, i.e. they no longer change position. Once this happens, the K-Means algorithm is said to have converged.
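
Putting the assignment and update steps into a loop gives the sketch below. It is a plain-Python illustration under stated assumptions: the starting centroids are passed in explicitly so the run is repeatable, convergence is tested by exact equality of the centroids, and `max_iter` is a hypothetical safety cap.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def update_centroids(points, labels, centroids):
    """Mean of each cluster's points; an empty cluster keeps its old centroid."""
    updated = []
    for index, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == index]
        updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                       if members else old)
    return updated


def kmeans(points, initial_centroids, max_iter=100):
    """Repeat assignment and update until the centroids stop moving."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        labels = assign_clusters(points, centroids)
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:   # centroids are static: converged
            break
        centroids = updated
    return centroids, labels


# Hypothetical data; both starting centroids sit in the same group on purpose.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0)]
final_centroids, final_labels = kmeans(points, [(1.0, 1.0), (2.0, 1.0)])
print(final_centroids, final_labels)
```

In practice a small tolerance on centroid movement is often used instead of exact equality, which is a design choice rather than part of the description above.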

Step 5: Convergence

The algorithm has now converged, and distinct clusters are formed and clearly visible. Note that K-Means can give different results depending on how the centroids were initialized in the first step.

K- Means Clustering Algorithm - Convergence
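
The dependence on initialization can be seen in a small sketch. The four points and both sets of starting centroids below are hypothetical, chosen so that the two runs settle on different clusterings. Keeping the run with the lower WSS is a common mitigation, though it is not part of the steps described above.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means from given starting centroids; returns (centroids, labels, wss)."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        updated = []
        for index, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == index]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


# Four corners of a wide rectangle: the natural split is left pair vs right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one centroid on each side.
_, labels_a, wss_a = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
# Start B: both centroids on the left side.
_, labels_b, wss_b = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])

print(labels_a, wss_a)
print(labels_b, wss_b)

# Keep whichever run ended with the lower within-cluster sum of squares.
best_wss = min(wss_a, wss_b)
```

Start B converges to a bottom-row versus top-row split, which is stable but has a much higher WSS than the left versus right split found from start A.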

Applications of K-Means Clustering Algorithm

Below are some of its applications:

  • Market segmentation
  • Document clustering
  • Image segmentation
  • Image compression
  • Vector quantization
  • Cluster analysis
  • Feature learning or dictionary learning
  • Identifying crime-prone areas
  • Insurance fraud detection
  • Public transport data analysis
  • Clustering of IT assets
  • Customer segmentation
  • Identifying cancerous data
  • Used in search engines
  • Drug activity prediction

Advantages of K-Means Clustering Algorithm

Below are some of its advantages:

  • It is fast
  • Robust
  • Easy to understand
  • Comparatively efficient
  • Gives the best results when the clusters in the data are distinct and well separated
  • Produces tighter clusters
  • Cluster assignments adapt each time the centroids are recomputed
  • Flexible
  • Easy to interpret
  • Low computational cost
  • Enhances Accuracy
  • Works better with spherical clusters

Disadvantages of K-Means Clustering Algorithm

Below are some of its disadvantages:

  • The number of cluster centers must be specified in advance
  • If two clusters overlap heavily, it cannot distinguish them or tell that there are two clusters
  • Different representations of the data give different results
  • Euclidean distance can weigh the features unequally
  • It finds only a local optimum of the squared error function
  • Randomly chosen initial centroids sometimes give poor results
  • It can be used only when the mean is defined
  • Cannot handle outliers and noisy data
  • Does not work for non-linear data sets
  • Lacks consistency
  • Sensitive to scale
  • Very large data sets may exhaust the computer’s resources
  • Prediction issues
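
The sensitivity to scale can be illustrated with a short sketch. Because cluster assignment relies on Euclidean distance, a feature with a large numeric range can dominate the result. Z-score standardization, shown below, is one common way to rescale features before clustering; it is an assumption introduced here for illustration and is not described in this article. The function name and the customer data are hypothetical.

```python
from statistics import mean, pstdev


def standardize(points):
    """Rescale every feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*points))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant feature to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds))
            for p in points]


# Hypothetical customers described by (annual income, age).
customers = [(30000.0, 25.0), (32000.0, 60.0), (90000.0, 27.0), (88000.0, 58.0)]

scaled = standardize(customers)
for row in scaled:
    print(row)
```

On the raw values, income differences in the thousands swamp age differences in the tens; after rescaling, both features contribute on a comparable scale.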

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.