EDUCBA Logo

EDUCBA

MENUMENU
  • Explore
    • EDUCBA Pro
    • PRO Bundles
    • Featured Skills
    • New & Trending
    • Fresh Entries
    • Finance
    • Data Science
    • Programming and Dev
    • Excel
    • Marketing
    • HR
    • PDP
    • VFX and Design
    • Project Management
    • Exam Prep
    • All Courses
  • Blog
  • Enterprise
  • Free Courses
  • Log in
  • Sign Up
Home Data Science Data Science Tutorials R Programming Tutorial Hierarchical Clustering in R
 

Hierarchical Clustering in R

Afshan Banu
Article byAfshan Banu
EDUCBA
Reviewed byRavi Rathore

Updated March 21, 2023

HIERARCHICAL Clustring in r

 

 

Introduction to Hierarchical Clustering in R

A hierarchical clustering mechanism allows grouping of similar objects into units termed as clusters, and which enables the user to study them separately so as to accomplish an objective, as a part of a research or study of a business problem, and that the algorithmic concept can be very effectively implemented in R programming which provides a robust set of methods including but not limited just to the function hclust() so that the user can specifically study the data in the context of hierarchical nature of clustering technique.

Watch our Demo Courses and Videos

Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.

How does Clustering Work?

Clustering algorithms group a set of similar data points into clusters. The clustering algorithm’s main goal is to create clusters of data points that are similar in the features. In other words, data points within a cluster are similar, and data points in one cluster are dissimilar from data points in another cluster.

There are mainly two-approach uses in the hierarchical clustering algorithm, as given below:

1. Agglomerative

It begins with each observation in a single cluster. Then, the similarity measure in the observation further merges the clusters to make a single cluster until no farther merge possible; this approach is called an agglomerative approach.

2. Divisive

It begins with all observation in a single cluster and farther splits based on the similarity measure or dissimilarity measure cluster until no split possible; this approach is called a divisive method.

Now let’s start hierarchical clustering algorithms; hierarchical clustering can be performed top-down or bottom-up. We start with a bottom-up or agglomerative approach, where we start creating one cluster for each data point and then merge clusters based on some similarity measure in the data points. The next important point is that how we can measure the similarity. There are many distance matrix are available like Euclidean, Jaccard, Manhattan, Canberra, Minkowski etc., to find the dissimilarity measure. The choice of the distance matrix depends on the type of the data set available; for example, if the data set contains continuous numerical values, then the good choice is the Euclidean distance matrix, whereas if the data set contains binary data, the good choice is Jaccard distance matrix and so on.

Implementing Hierarchical Clustering in R

The steps required to perform to implement hierarchical clustering in R are:

1. Install all Required R Packages

We are going to use the below packages, so install all these packages before using:

install.packages ( "cluster" ) # for clustering algorithms
install.packages ( "tidyverse" ) # for data manipulation
install.packages ( "factoextra" ) # for clustering visualization
# includes package in R as –
library ( "cluster" )
library( "tidyverse" )
library( "factoextra" )

2. Data Preparation

The data Prepare for hierarchical cluster analysis; this step is very basic and important; we need to mainly perform two tasks here that are scaling and estimate missing value. The data must be scaled or standardized, or normalized to make variables comparable. The scaled or standardized or normalized is a process of transforming the variables such that they should have a standard deviation of one and mean zero.

If any missing value is present in our data set, then it is very important to impute the missing value or remove the data point itself. There are different options available to impute the missing value like average, mean, median value to estimate the missing value. For example, we use here iris built-in dataset, in which we want to cluster the iris type of plants; the iris data set contain 3 classes for each class 50 instances. It contains 5 features as Sepal. Length, Sepal.Width, Petal.Length, Petal.Width and Species.

The R code:

data <- iris
print(data)
# the sample of data set showing below which contain 1 sample for each class

“Sepal.Length” “Sepal.Width” “Petal.Length” “Petal.Width” “Species”

1      4.9      3.5       1.3      0.2          setosa

51    7.0      3.1       4.5       1.3         Versicolor

101   6.3     3.2       6.0      1.9         Virginia

data <- na.omit(data) # remove missing value
data <- scale(df) # scaling the variables or features

3. Specify which Hierarchical Clustering Algorithms using

The different types of hierarchical clustering algorithms as agglomerative hierarchical clustering and divisive hierarchical clustering, are available in R. The required functions are –

  • Agglomerative hierarchical clustering (HC): hclust function, which is available in stats package, and Agnes function, which is available in cluster package] for
  • Divisive hierarchical clustering: Diana function, which is available in cluster package.

4. Computing Hierarchical Clustering

The distance matrix needs to be calculated, and put the data point to the correct cluster to compute the hierarchical clustering. There are different ways we can calculate the distance between the cluster, as given below:

Complete Linkage: Maximum distance calculates between clusters before merging.

max distance

Single Linkage: Minimum distance calculates between the clusters before merging.

min distance

Average Linkage: Calculates the average distance between clusters before merging.

avg distance

R code:

cluster <- hclust(data, method = "average" )

Centroid Linkage: The distance between the two centroids of the clusters calculates before merging.

R code:

# matrix of Dissimilarity
dis_mat <- dist(data, method = "euclidean")
# Hierarchical clustering using Complete Linkage
cluster <- hclust(data, method = "complete" )
#  or Compute with agnes
cluster <- agnes(data, method = "complete")

5. Dendrograms

The dendrogram is used to manage the number of clusters obtained. It performs the same as in k-means k performs to control a number of clustering. The current function we can use to cut the dendrogram.

R code:

cutree(as.hclust(cluster), k = 3)

The full R code:

ibrary(scatterplot3d)
data <- iris
print( data )
data <- na.omit(data)
data <- scale(data)
# matrix of Dissimilarity
dis_mat <- dist(data, method = "euclidean")
# creating hierarchical clustering with Complete Linkage
cluster <- hclust(data, method = "complete" )
# Dendrogram plot
plot(cluster)
# or agnes can be used to compute hierarchical clustering
Cluster2 <- agnes(data, method = "complete")
# Dendrogram plot
plot(cluster2)

The plot of the above R code:

cluster

Conclusion

There are mainly two types of machine learning algorithms supervised learning algorithms and unsupervised learning algorithms. Clustering algorithms are an example of unsupervised learning algorithms. Clustering algorithms groups a set of similar data points into clusters. There are mainly two-approach uses in the hierarchical clustering algorithm, as given below agglomerative hierarchical clustering and divisive hierarchical clustering.

Recommended Articles

This is a guide to Hierarchical Clustering in R. Here; we discuss how clustering works and implementing hierarchical clustering in R in detail. You can also go through our other related articles to learn more-

  1. Clustering Algorithm
  2. What is Clustering in Data Mining?
  3. Clustering Methods
  4. Machine Learning Methods
  5. Overview of Hierarchical Clustering Analysis
  6. List of R Packages
  7. Guide to Methods of Data Mining Cluster Analysis

Primary Sidebar

Footer

Follow us!
  • EDUCBA FacebookEDUCBA TwitterEDUCBA LinkedINEDUCBA Instagram
  • EDUCBA YoutubeEDUCBA CourseraEDUCBA Udemy
APPS
EDUCBA Android AppEDUCBA iOS App
Blog
  • Blog
  • Free Tutorials
  • About us
  • Contact us
  • Log in
Courses
  • Enterprise Solutions
  • Free Courses
  • Explore Programs
  • All Courses
  • All in One Bundles
  • Sign up
Email
  • [email protected]

ISO 10004:2018 & ISO 9001:2015 Certified

© 2025 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA Login

Forgot Password?

🚀 Limited Time Offer! - ENROLL NOW