EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials Machine Learning Tutorial Hierarchical Clustering Algorithm
Secondary Sidebar
Machine Learning Tutorial
  • Algorithms
    • Machine Learning Algorithms
    • Apriori Algorithm in Machine Learning
    • Types of Machine Learning Algorithms
    • Bayes Theorem
    • AdaBoost Algorithm
    • Classification Algorithms
    • Clustering Algorithm
    • Gradient Boosting Algorithm
    • Mean Shift Algorithm
    • Hierarchical Clustering Algorithm
    • Hierarchical Clustering Agglomerative
    • What is a Greedy Algorithm?
    • What is Genetic Algorithm?
    • Random Forest Algorithm
    • Nearest Neighbors Algorithm
    • Weak Law of Large Numbers
    • Ray Tracing Algorithm
    • SVM Algorithm
    • Naive Bayes Algorithm
    • Neural Network Algorithms
    • Boosting Algorithm
    • XGBoost Algorithm
    • Pattern Searching
    • Loss Functions in Machine Learning
    • Decision Tree in Machine Learning
    • Hyperparameter Machine Learning
    • Unsupervised Machine Learning
    • K- Means Clustering Algorithm
    • KNN Algorithm
    • Monty Hall Problem
  • Basic
    • Introduction To Machine Learning
    • What is Machine Learning?
    • Uses of Machine Learning
    • Applications of Machine Learning
    • Naive Bayes in Machine Learning
    • Dataset Labelling
    • DataSet Example
    • Deep Learning Techniques
    • Dataset ZFS
    • Careers in Machine Learning
    • What is Machine Cycle?
    • Machine Learning Feature
    • Machine Learning Programming Languages
    • What is Kernel in Machine Learning
    • Machine Learning Tools
    • Machine Learning Models
    • Machine Learning Platform
    • Machine Learning Libraries
    • Machine Learning Life Cycle
    • Machine Learning System
    • Machine Learning Datasets
    • Machine Learning Certifications
    • Machine Learning Python vs R
    • Optimization for Machine Learning
    • Types of Machine Learning
    • Machine Learning Methods
    • Machine Learning Software
    • Machine Learning Techniques
    • Machine Learning Feature Selection
    • Ensemble Methods in Machine Learning
    • Support Vector Machine in Machine Learning
    • Decision Making Techniques
    • Restricted Boltzmann Machine
    • Regularization Machine Learning
    • What is Regression?
    • What is Linear Regression?
    • Dataset for Linear Regression
    • Decision tree limitations
    • What is Decision Tree?
    • What is Random Forest
  • Supervised
    • What is Supervised Learning
    • Supervised Machine Learning
    • Supervised Machine Learning Algorithms
    • Perceptron Learning Algorithm
    • Simple Linear Regression
    • Polynomial Regression
    • Multivariate Regression
    • Regression in Machine Learning
    • Hierarchical Clustering Analysis
    • Linear Regression Analysis
    • Support Vector Regression
    • Multiple Linear Regression
    • Linear Algebra in Machine Learning
    • Statistics for Machine Learning
    • What is Regression Analysis?
    • Clustering Methods
    • Backward Elimination
    • Ensemble Techniques
    • Bagging and Boosting
    • Linear Regression Modeling
    • What is Reinforcement Learning
  • Classification
    • Kernel Methods in Machine Learning
    • Clustering in Machine Learning
    • Machine Learning Architecture
    • Automation Anywhere Architecture
    • Machine Learning C++ Library
    • Machine Learning Frameworks
    • Data Preprocessing in Machine Learning
    • Data Science Machine Learning
    • Classification of Neural Network
    • Neural Network Machine Learning
    • What is Convolutional Neural Network?
    • Single Layer Neural Network
    • Kernel Methods
    • Forward and Backward Chaining
    • Forward Chaining
    • Backward Chaining
  • Deep Learning
    • What Is Deep learning
    • Overviews Deep Learning
    • Application of Deep Learning
    • Careers in Deep Learnings
    • Deep Learning Frameworks
    • Deep Learning Model
    • Deep Learning Algorithms
    • Deep Learning Technique
    • Deep Learning Networks
    • Deep Learning Libraries
    • Deep Learning Toolbox
    • Types of Neural Networks
    • Convolutional Neural Networks
    • Create Decision Tree
    • Deep Learning for NLP
    • Caffe Deep Learning
    • Deep Learning with TensorFlow
  • RPA
    • What is RPA
    • What is Robotics?
    • Benefits of RPA
    • RPA Applications
    • Types of Robots
    • RPA Tools
    • Line Follower Robot
    • What is Blue Prism?
    • RPA vs BPM
  • Interview Questions
    • Deep Learning Interview Questions And Answer
    • Machine Learning Cheat Sheet

Related Courses

Machine Learning Training

Deep Learning Training

Artificial Intelligence Training

Hierarchical Clustering Algorithm

By Priya PedamkarPriya Pedamkar

Hierarchical Clustering Algorithm

Introduction to Hierarchical Clustering Algorithm

The hierarchical clustering algorithm is an unsupervised Machine Learning technique. It aims at finding natural grouping based on the characteristics of the data. The hierarchical clustering algorithm aims to find nested groups of the data by building the hierarchy. It is similar to the biological taxonomy of the plant or animal kingdom. Hierarchical clusters are generally represented using the hierarchical tree known as a dendrogram.

Types of Hierarchical Clustering Algorithm

Hierarchical clustering algorithms are of 2 types:

  • Divisive
  • Agglomerative

1. Divisive

This is a top-down approach, where it initially considers the entire data as one group, and then iteratively splits the data into subgroups. If the number of a hierarchical clustering algorithm is known, then the process of division stops once the number of clusters is achieved. Else, the process stops when the data can be no more split, which means the subgroup obtained from the current iteration is the same as the one obtained from the previous iteration (one can also consider that the division stops when each data point is a cluster).

2. Agglomerative

It is a bottom-up approach that relies on the merging of clusters. Initially, the data is split into m singleton clusters (where the value of m is the number of samples/data points). Two clusters are merged into one iteratively thus reducing the number of clusters in every iteration. This process of merging clusters stops when all clusters have been merged into one or the number of desired clusters is achieved.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

At level 1, there are m clusters that get reduced to 1 cluster at level m. Those data points which get merged to form a cluster at a lower level stay in the same cluster at the higher levels as well. Eg., suppose there are 8 points x1..x8, so initially, there are 8 clusters at level 1. Suppose points x1 and x2 get merged into a cluster at level 2, then till level 8, they stay in the same cluster.

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,294 ratings)

Hierarchical Clustering Algorithm eg1

The above figure shows a dendrogram representation of the agglomeration clustering approach for 8 data points as well as the similarity scale corresponding to each level. The levels of clusters give us an idea of how similar the data points in the clusters are. As shown in fig 1, earlier the data points get merged into a cluster, the similar they are. So, the data points within a cluster at level 2 (eg. x2 and x3) are more similar than those data points in a cluster at level 6 (eg. x1 and x2).

Hierarchical Clustering Algorithm eg2

The above figure shows the Set or Venn diagram representation of the agglomerative clustering approach of the above-mentioned 8 data points. Both dendrograms and set representations can be used for clustering. However, a dendrogram is usually a preferable asset representation that cannot quantitatively represent the cluster similarities.

Steps for Hierarchical Clustering Algorithm

Let us follow the following steps for the hierarchical clustering algorithm which are given below:

1. Algorithm

Agglomerative hierarchical clustering algorithm.

  1. Begin initialize c, c1 = n, Di = {xi}, i = 1,…,n ‘
  2. Do c1 = c1 – 1
  3. Find nearest clusters, say, Di and Dj
  4. Merge Di and Dj
  5. Until c = c1
  6. Return c clusters
  7. End

This algorithm begins with n clusters initially where each data point is a cluster. With each iteration, the number of clusters reduces by 1 as the 2 nearest clusters get merged. This process continues until the number of clusters reduces to the predefined value c.

How to Decide Which Clusters are Near?

That is defined using several distance metrics which are as follows:

  • The minimum distance between the clusters dmin(Di,Dj). Useful for single.
  • The maximum distance between the clusters dmax(Di,Dj). Useful for complete.
  • The average distance between the clusters davg(Di,Dj).
What is Single Linkage and Complete Linkage?
  • When dmin(di,dj) is used to find the distance between two clusters, and the algorithm terminates if the distance between the nearest clusters exceeds a threshold, then the algorithm is called a single linkage algorithm.
  • When dmax(Di,Dj) is used to find the distance between two clusters, and the algorithm terminates if the distance between the nearest clusters exceeds a threshold, then the algorithm is called a complete linkage algorithm.
  • Let us consider each data point as a node of a graph. There is an edge between two data points if they belong to the same cluster. When two nearest clusters are merged, an edge is added. It is called a single linkage because there exists a unique path from one node to the other.
  • The complete linkage algorithm merges two clusters by minimizing the distance between the two farthest points.
  • A single linkage algorithm generates a spanning tree. However, this algorithm is susceptible to noise. A complete linkage algorithm generates a complete graph.
What is Time Complexity of the Algorithm?

Suppose we have n patterns in d-dimensional space, and we use dmin(Di,Dj) to form c clusters. We need to calculate n(n − 1) inter-point distances — each of which is an O(d2) calculation — and place the results in an inter-point distance table. The space complexity is O(n2). Finding the minimum distance pair (for the first merging) requires that we step through the complete list, keeping the index of the smallest distance.

Thus for the first agglomerative step, the complexity is O(n(n − 1)(d2 + 1)) = O(n2d2). For another arbitrary agglomeration step (i.e., from c1 to c1 − 1), we merely step through the n(n − 1) – c1 “unused” distances in the list and find the smallest for which x and x’ lie in different clusters. This is, again, O(n(n−1)−c1). The total time complexity is thus O(cn2d2), and in typical conditions n >> c.

2. Visualization

Once the data is split into clusters, it is a good practice to visualize the clusters so as to get an idea of how does the grouping look. But visualizing this high-dimensional data is difficult. Hence we use Principal Component Analysis (PCA) for visualization. After PCA, we obtain the data points in the low dimensional space (generally 2D or 3D) which we can plot to see the grouping.

Note: High dimension means a large number of features and not a number of data points.

3. Evaluation

One of the methods for the evaluation of clusters is that the distance of the points between the clusters (inter-cluster distance) should be much more than the distance of the points within the cluster (intracluster distance).

Conclusion

The hierarchical clustering algorithm is used to find nested patterns in data Hierarchical clustering is of 2 types – Divisive and Agglomerative Dendrogram and set/Venn diagram can be used for representation Single linkage merges two clusters by minimizing the minimum distance between them. It forms a spanning Complete linkage merges two clusters by minimizing the maximum distance between It forms a complete graph. The total time complexity of the hierarchical clustering algorithm is O(cn2d2), where c is the predefined number of clusters, n is the number of patterns and d is the d- dimensional space of the n patterns.

Recommended Articles

This is a guide to the Hierarchical Clustering algorithm. Here we discuss the types and steps of hierarchical clustering algorithms. You may also look at the following articles to learn more-

  1. Hierarchical Clustering Analysis
  2. Hierarchical Clustering in R’
  3. Clustering Algorithm
  4. What is Clustering in Data Mining?
Popular Course in this category
Machine Learning Training (20 Courses, 29+ Projects)
  19 Online Courses |  29 Hands-on Projects |  178+ Hours |  Verifiable Certificate of Completion
4.7
Price

View Course

Related Courses

Deep Learning Training (18 Courses, 24+ Projects)4.9
Artificial Intelligence AI Training (5 Courses, 2 Project)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more