Hierarchical Clustering Analysis

By Priya Pedamkar


Overview of Hierarchical Clustering Analysis

Hierarchical clustering analysis is an algorithm used to group data points with similar properties; these groups are termed clusters. The result of hierarchical clustering is a set of clusters that are distinct from each other. This clustering is classified as Agglomerative Clustering (which builds the hierarchy using a bottom-up strategy) and Divisive Clustering (which decomposes the data using a top-down strategy).

There are various types of clustering analysis; one such type is hierarchical clustering.


 Hierarchical Clustering

Hierarchical clustering creates clusters in a proper order/hierarchy. The most common everyday example is how we organize files and folders on a computer into a hierarchy.

Types of Hierarchical Clustering

Hierarchical clustering is further classified into two types: agglomerative clustering and divisive clustering (DIANA).


Agglomerative Clustering

In this type of clustering, the hierarchical decomposition is done with a bottom-up strategy: it starts by creating atomic (small) clusters, each containing a single data object, and then repeatedly merges them to form larger clusters. The procedure iterates until all data points are brought under one single big cluster or the termination conditions are met.

AGNES (AGglomerative NESting) is a type of agglomerative clustering that combines data objects into clusters based on similarity. The result of this algorithm is a tree-based structure called a dendrogram. It uses a distance metric to decide which data points should be combined with which cluster: it constructs a distance matrix, finds the pair of clusters with the smallest distance, and merges them, as sketched below.
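As a minimal sketch of this merge-the-closest-pair procedure, SciPy's scipy.cluster.hierarchy module can build and plot the dendrogram; the toy data and the choice of average linkage below are illustrative assumptions, not part of the original article.

```python
# A minimal AGNES-style example using SciPy (illustrative data).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Small 2-D dataset; in practice this would be your feature matrix.
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.1],
              [5.2, 4.8], [9.0, 9.2], [8.8, 9.1]])

# At each step, 'average' linkage merges the pair of clusters with the
# smallest average inter-point distance; Z records every merge.
Z = linkage(X, method="average")

# Plot the tree-based structure (dendrogram) produced by the merges.
dendrogram(Z)
plt.title("AGNES dendrogram (average linkage)")
plt.show()
```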


Figure: Agglomerative vs. Divisive clustering.

Based on how the distance between clusters is measured, we can have three different methods (compared in the short code sketch after this list):

  • Single linkage: The shortest distance between two points, one from each cluster, is taken as the distance between the clusters.
  • Complete linkage: Here we take the longest distance between the clusters' points as the distance between the clusters.
  • Average linkage: Here we take the average of the distances between every point in one cluster and every point in the other cluster.
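In most libraries the linkage criterion is just a parameter. A small sketch with scikit-learn follows; the toy dataset and the choice of two clusters are illustrative assumptions.

```python
# Comparing the three linkage criteria with scikit-learn (illustrative data).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.3, 0.1], [0.2, 0.4],
              [5.0, 5.0], [5.2, 5.1], [9.0, 0.2]])

for linkage in ("single", "complete", "average"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    labels = model.fit_predict(X)
    print(f"{linkage:>8} linkage -> cluster labels: {labels}")
```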

Now let's discuss the strengths and weaknesses of AGNES. The algorithm has a time complexity of at least O(n²), so it does not scale well. Another major drawback is that a merge can never be undone: if we incorrectly group clusters in an earlier stage of the algorithm, we cannot modify that outcome later. On the bright side, because many smaller clusters are formed along the way, the algorithm can be helpful in exploratory discovery, and it produces an ordering of objects that is very useful for visualization.

Divisive Clustering (DIANA)

DIANA stands for DIvisive ANAlysis. This is the other type of hierarchical clustering; it works on a top-down principle (the inverse of AGNES): the algorithm begins with one big cluster containing all objects and recursively splits the most dissimilar cluster into two, continuing until every data point belongs to its own respective cluster. Divisive algorithms can produce more accurate hierarchies than the agglomerative approach, but they are computationally expensive.
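Neither SciPy nor scikit-learn ships a DIANA implementation, so the following is only a simplified top-down sketch of the idea: it repeatedly splits the cluster with the largest diameter using a k-means bisection, which stands in for DIANA's actual splinter-group procedure. The data and the target number of clusters are illustrative assumptions.

```python
# Simplified top-down (divisive-style) sketch; NOT the exact DIANA algorithm.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def divisive_sketch(X, n_clusters=3):
    """Recursively bisect the cluster with the largest diameter."""
    clusters = [np.arange(len(X))]                 # start with one big cluster
    while len(clusters) < n_clusters:
        # Pick the cluster with the largest diameter (the most "dissimilar" one).
        diameters = [pdist(X[idx]).max() if len(idx) > 1 else 0.0
                     for idx in clusters]
        idx = clusters.pop(int(np.argmax(diameters)))
        # Split it in two (k-means bisection replaces DIANA's splinter step).
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

X = np.random.RandomState(0).rand(30, 2)           # illustrative data
for i, idx in enumerate(divisive_sketch(X)):
    print(f"cluster {i}: {len(idx)} points")
```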

Figure: Divisive clustering, step-by-step process.

Multiphase Hierarchical Clustering

To improve the quality of the clusters generated by the hierarchical clustering techniques above, we can integrate hierarchical clustering with other clustering techniques; this is called multiphase clustering. The main types of multiphase clustering are as follows:

  • BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
  • ROCK (RObust Clustering using links)
  • CHAMELEON

1. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

This method is mainly used for clustering huge quantities of numeric data. It integrates hierarchical (micro-) clustering in the initial phase with macro clustering / iterative partitioning in the later phase. This helps overcome the scalability problem we faced with AGNES, as well as its inability to undo a previous step. BIRCH uses two important concepts in its algorithm:

a. Clustering feature (Helps in summarizing the cluster)

A clustering feature (CF) is defined as <n, LS, SS>, where n is the number of data points in the cluster, LS is the linear sum of the n points, and SS is the squared sum of the n points. Storing the CF of a cluster avoids storing detailed information about its members, and CFs are additive across clusters:

CF1 + CF2 = <n1+n2, LS1+LS2, SS1+SS2>
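As a quick illustrative sketch (the hand-rolled CF tuple below is an assumption for demonstration, not BIRCH's actual data structure), the additivity rule simply adds the triples component-wise:

```python
# Illustrative clustering-feature (CF) sketch showing component-wise additivity.
import numpy as np

def clustering_feature(points):
    """Return the CF triple <n, LS, SS> for an array of points."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    ls = points.sum(axis=0)            # linear sum of the points
    ss = (points ** 2).sum()           # squared sum of the points
    return n, ls, ss

def merge_cf(cf1, cf2):
    """CF1 + CF2 = <n1+n2, LS1+LS2, SS1+SS2>."""
    return cf1[0] + cf2[0], cf1[1] + cf2[1], cf1[2] + cf2[2]

a = clustering_feature([[1.0, 2.0], [2.0, 2.0]])
b = clustering_feature([[8.0, 8.0]])
print(merge_cf(a, b))   # same CF as computing it over all three points at once
```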

b. Clustering feature tree (helps in representing a cluster as a hierarchy)

A CF tree is a height-balanced tree with a branching factor B (the maximum number of children per non-leaf node) and a threshold T (the maximum diameter allowed for the sub-clusters stored in leaf nodes).

The algorithm works in two phases: in phase 1, it scans the database and builds an in-memory CF tree; in phase 2, it applies a clustering algorithm to the leaf entries, removing outliers (sparse clusters) and grouping the densest sub-clusters. The main drawback of this algorithm is that it handles only numeric data.
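scikit-learn provides a BIRCH implementation exposing these same two knobs; the parameter values and synthetic data below are illustrative assumptions.

```python
# BIRCH with scikit-learn (illustrative parameters and data).
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc, 0.2, size=(50, 2)) for loc in (0.0, 3.0, 6.0)])

# threshold ~ T (radius limit for leaf sub-clusters),
# branching_factor ~ B (maximum children per CF-tree node).
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```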

2. ROCK (Robust Clustering using Links)

The link between two objects is defined as the number of common neighbours they share. ROCK is a clustering algorithm that applies this link concept to categorical datasets. Distance-based clustering algorithms generally do not produce high-quality clusters on categorical data, but ROCK also considers the neighbourhoods of the data points: if two data points share many neighbours, they most likely belong in the same cluster. In its first step, the algorithm constructs a sparse graph from the similarity matrix using the neighbourhood concept and a similarity threshold; in its second step, it runs agglomerative hierarchical clustering on that sparse graph.
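The link count itself is easy to sketch. In the snippet below, the Jaccard similarity measure, the threshold value, and the toy categorical records are illustrative assumptions rather than the full ROCK algorithm:

```python
# Counting "links" (common neighbours) between categorical records, ROCK-style sketch.
from itertools import combinations

records = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"},
           {"x", "y"}, {"x", "y", "z"}]
THETA = 0.4   # similarity threshold (illustrative value)

def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2)

# Two records are neighbours if their similarity reaches the threshold.
neighbours = {i: {j for j in range(len(records))
                  if i != j and jaccard(records[i], records[j]) >= THETA}
              for i in range(len(records))}

# link(i, j) = number of common neighbours shared by records i and j.
for i, j in combinations(range(len(records)), 2):
    link = len(neighbours[i] & neighbours[j])
    if link:
        print(f"link({i}, {j}) = {link}")
```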

3. Chameleon

This hierarchical clustering algorithm uses the concept of dynamic modelling. Why is it called dynamic? Because it automatically adapts to the internal characteristics of the clusters by evaluating cluster similarity in terms of both how well connected the data points are within a cluster and how close the clusters are to each other. One drawback of CHAMELEON is its high processing cost: the worst-case time complexity is O(n²) for n objects.

Figure: Framework of CHAMELEON.

Conclusion

In this article, we have learned what a cluster is, what cluster analysis is, the different types of hierarchical clustering techniques, and their advantages and disadvantages. Each of the techniques discussed has its own pluses and minuses; hence, we need to understand our data through proper exploratory data analysis and choose an algorithm with care before applying it.

Recommended Articles

This is a guide to Hierarchical Clustering Analysis. Here we discussed an overview, agglomerative clustering, divisive clustering (DIANA), and multiphase hierarchical clustering. You may also look at the following articles to learn more:

  1. Hierarchical Clustering in R
  2. Clustering Algorithm
  3. Clusters
  4. Clustering Methods
