What is Cluster Analysis?
Cluster analysis is a statistical technique for classifying objects into groups called clusters, so that objects in the same cluster are more similar to one another than to objects in other clusters. In simple words, cluster analysis divides data into groups that are meaningful and useful. Clustering is used mainly for two purposes: clustering for understanding and clustering for utility.
Applications of cluster analysis
- Cluster analysis is used in many fields, such as machine learning, market research, pattern recognition, data analysis, information retrieval, image processing and data compression.
- Cluster analysis can help marketers identify distinct groups within their customer base.
- Cluster analysis is used in biology to derive plant and animal taxonomies and to categorize genes with similar characteristics.
- Cluster analysis is used in an earth observation database to group the houses in a city according to house type, value and location.
- Clustering can also be used to segment documents on the web based on specific criteria.
- In data mining, cluster analysis is used to gain an in-depth understanding of the characteristics of the data in each cluster.
Clustering methods can be divided into the following categories
- Partitioning method
- Hierarchical Method
- Density based method
- Grid Based Method
- Model Based Method
- Constraint Based Method
Advantages of Cluster Analysis
Given below are the advantages of cluster analysis
- Cluster analysis gives a quick overview of data
- It can be used if there are many groups in data
- Cluster analysis can be used with unusual or custom similarity measures
- Cluster results can be overlaid on ordination plots, and the method works well for identifying nearest neighbours
Approaches to cluster analysis
There are a number of different approaches to cluster analysis, which fall into two broad groups
- Hierarchical methods: agglomerative methods and divisive methods
- Non-hierarchical methods, of which K means clustering is the best known
Cluster Analysis Course Objectives
By the end of this course you will understand
- How to use cluster analysis in data mining
- The various types of clusters
- The marketing applications of cluster analysis
- The implications of a wide variety of clustering techniques
- How to use clustering in statistical analysis
Prerequisites for the Cluster Analysis Course
Basic knowledge of statistics is required. Some familiarity with data analysis is an added advantage, though it is not a necessity.
The target audience for this course is listed below
- Research professionals
- Data Analysts
- Data Miners
- And anyone who is interested in learning about cluster analysis
Cluster Analysis Course Description
Section 1: Introduction
Meaning of Cluster Analysis
The term cluster analysis covers a number of different algorithms and methods for grouping data and objects. It is an exploratory data analysis tool: cluster analysis is used to discover structures in data without explaining why they exist. This section includes a brief introduction to cluster analysis, its history and its benefits.
Understanding of Cluster Analysis
In this section you will learn what makes a good clustering, one that produces high quality clusters, and how to measure the quality of a clustering. The other topics covered in this section are the major clustering approaches and the techniques, basic concepts and algorithms of cluster analysis.
Example of Cluster Analysis
Clustering is used in many aspects of daily life. In this chapter you will see some illustrations and practical applications of cluster analysis in various fields. One example is based on a retail chain with stores across various locations. Another example is based on market segmentation. Finally, a simple numerical example is given which explains the objectives of cluster analysis. Examples from fields such as marketing, land use, biology, psychology, medicine and information retrieval, where cluster analysis is used, are also given in this section.
Section 2: Types of Clustering
Hierarchical method of Clustering
Hierarchical clustering produces a set of nested clusters organized in the form of a tree. It includes several methods for deciding which clusters should be joined at each stage. There are two main types of hierarchical clustering: agglomerative and divisive. The agglomerative clustering algorithm is explained in detail with an example in this section.
The main methods of hierarchical clustering are also explained in brief in this section
- Nearest Neighbour Method (Single Linkage Method)
- Furthest Neighbour Method (Complete Linkage Method)
- Average Linkage Method (Between Groups)
- Centroid Method
- Ward’s Method
Single linkage clustering
The single linkage method is also known as the nearest neighbour method. It defines the distance between two clusters as the smallest distance between any pair of observations, one taken from each cluster. The major topics included in this section are listed below
- Spanning tree
- Contracting Space
- Dendrogram or tree diagram
- Example of nearest neighbour method using diagrams
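As a rough illustration of the nearest neighbour method, the sketch below repeatedly merges the two clusters whose closest members are nearest to each other. It is a minimal, unoptimized version written for readability, not the course's own material; the function name `single_linkage`, the sample points and the use of Euclidean distance are assumptions made for this example.

```python
from math import dist  # Euclidean distance between two points


def single_linkage(points, num_clusters):
    """Agglomerative clustering with single (nearest neighbour) linkage.

    Returns (clusters, merges). Each cluster is a list of point indices.
    merges records (members_a, members_b, distance) for every join, which
    is the information a dendrogram displays.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > num_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = single_linkage(points, num_clusters=3)
```

With these sample points the two tight pairs are joined first, each at distance 1.0, and the isolated point stays in a cluster of its own.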
Linkage methods, Ward’s method
The single linkage method is explained in detail in the previous chapter. This section deals with the other linkage methods, complete linkage and average linkage, along with the centroid method and Ward’s method.
In the complete linkage method, the distance between two clusters is the maximum distance between their members, one taken from each cluster. The formula is explained in this section, and a detailed example is given to make it easy to understand.
In the average linkage method, the distance between two clusters is the average distance over all pairs of observations, one taken from each cluster. This method is explained in detail in this section with an example.
In the centroid method, the mean value of each variable is computed for each cluster, and the distance between these centroids is used to decide which clusters to merge. This method is also explained with an example.
In Ward’s method, each possible pair of clusters is tentatively combined and the sum of squared distances within the resulting clusters is computed. The merge that gives the lowest sum of squares is chosen. This method is widely used. This section contains examples of this method.
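The linkage rules described above differ only in how they measure the distance between two clusters. The sketch below computes each measure for a pair of small clusters; the function names, the sample clusters and the choice of Euclidean distance are assumptions made for illustration, and Ward's criterion is expressed here as the increase in within-cluster sum of squares caused by a merge.

```python
from math import dist
from statistics import fmean


def centroid(cluster):
    """Mean of each coordinate over the points in the cluster."""
    return tuple(fmean(coord) for coord in zip(*cluster))


def sum_of_squares(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)


def single_distance(a, b):
    return min(dist(p, q) for p in a for q in b)   # closest pair


def complete_distance(a, b):
    return max(dist(p, q) for p in a for q in b)   # farthest pair


def average_distance(a, b):
    return fmean(dist(p, q) for p in a for q in b)  # mean over all pairs


def centroid_distance(a, b):
    return dist(centroid(a), centroid(b))


def ward_increase(a, b):
    """Growth in within-cluster sum of squares if a and b are merged."""
    return sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)


# Hypothetical clusters: two vertical pairs, three units apart.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

results = {
    "single": single_distance(cluster_a, cluster_b),
    "complete": complete_distance(cluster_a, cluster_b),
    "average": average_distance(cluster_a, cluster_b),
    "centroid": centroid_distance(cluster_a, cluster_b),
    "ward": ward_increase(cluster_a, cluster_b),
}
```

At each stage of an agglomerative run, the pair of clusters with the smallest value under the chosen rule is the pair that gets merged.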
K means clustering
K means clustering is a non-hierarchical clustering method. In this method the desired number of clusters is specified beforehand, and the algorithm searches for the best partition of the data into that many clusters. The steps for carrying out K means clustering are described in this chapter.
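The usual K means procedure alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, stopping when the centroids no longer change. The following is a minimal sketch of that loop, assuming Euclidean distance and caller-supplied starting centroids; the function name `k_means`, the sample points and the handling of empty clusters are choices made for this example rather than details taken from the course.

```python
from math import dist
from statistics import fmean


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


def k_means(points, initial_centroids, max_iterations=100):
    """Return (labels, centroids) after iterating until convergence."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(fmean(coord) for coord in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical data with two visible groups, and K = 2 starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = k_means(points, initial_centroids=[(1.0, 1.0), (9.0, 9.0)])
```

The result depends on the starting centroids, so in practice the procedure is often run from several starting points and the solution with the lowest within-cluster sum of squares is kept.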
K means and Example of K means, difference between hierarchical and non hierarchical clustering
The important points of K means clustering are covered in this chapter, including the partitional clustering approach, centroids and the K means algorithm. The details of K means clustering are explained using the following points
- Initial Centroids
- Similarity measures
- Convergence
- Complexity of K means
- Types of K means clustering – Sub optimal clustering and Optimal Clustering
- Solutions to Initial Centroids problem
- Evaluating K means cluster
- Difference between Hierarchical Clustering and K means Clustering
- Strengths of K means clustering
- Limitations of K means clustering
Example of K means, number of clusters, Statistical tests, Dendrogram, Scree plot
Because of how it is computed, K means clustering can be thought of as Analysis of Variance (ANOVA) in reverse: rather than testing whether given groups differ, it looks for the groups that make the differences between groups large relative to the variation within them. A physical fitness example is given to explain the K means clustering method, which is also explained with other examples using plots and graphs.
Dendrogram: When carrying out a hierarchical cluster analysis, the result can be represented as a tree diagram known as a dendrogram. The diagram shows which clusters were joined at each stage of the analysis and the distance between them at the time of joining. This helps in selecting a suitable number of clusters. An example of a dendrogram is given under this heading.
Scree plot: A scree plot displays the eigenvalues associated with each component in descending order against the component number. The pattern and properties of the scree plot in cluster analysis are discussed in this section.
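A common rule of thumb when reading a dendrogram is to cut the tree where the joining distance jumps the most from one stage to the next. The sketch below applies that heuristic to a list of joining distances; the function name `suggest_num_clusters` and the distances are hypothetical, and the rule is only one of several ways to pick the number of clusters.

```python
def suggest_num_clusters(merge_distances):
    """Suggest a cluster count from the joining distances of a dendrogram.

    merge_distances holds the distance at each join, in the order the joins
    happened, so n observations give n - 1 values.
    """
    if len(merge_distances) < 2:
        raise ValueError("need at least two joins to compare")
    num_observations = len(merge_distances) + 1
    # Jump in joining distance between each pair of consecutive stages.
    gaps = [later - earlier
            for earlier, later in zip(merge_distances, merge_distances[1:])]
    # Index of the last join made before the largest jump.
    last_stage = gaps.index(max(gaps))
    # Every join reduces the number of clusters by one.
    return num_observations - (last_stage + 1)


# Hypothetical joining distances for six observations.
heights = [1.0, 1.2, 1.5, 6.0, 7.5]
suggested = suggest_num_clusters(heights)
```

Here the largest jump is between the third and fourth joins, so stopping after three joins leaves three clusters.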
Two step cluster analysis, Evaluation
Two step cluster analysis is used to reveal natural clusters within a data set. It first runs a pre-clustering step and then applies a hierarchical method to the pre-clusters. This section covers the following topics
- Algorithm of two step cluster analysis
- The two steps of the two step cluster analysis
- Case study – classifying motor vehicles using two step cluster analysis
Example for Listwise and Pairwise deletion of missing values, SPSS windows of output
Listwise and pairwise deletion are techniques for handling missing data. They are appropriate when data are missing completely at random. Listwise deletion drops an entire case if it has one or more missing values. Pairwise deletion tries to minimize the data loss caused by listwise deletion by using, for each analysis, every case that has values for the variables involved. Each approach has its own advantages and disadvantages. This section includes the following topics
- What is Listwise deletion
- Example of Listwise deletion
- What is Pairwise deletion
- Example of Pairwise deletion
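The difference between the two kinds of deletion can be seen by counting how many cases survive each one. The sketch below uses a small made-up data set in which `None` marks a missing value; the variable names and the helpers `listwise` and `pairwise` are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical cases; None marks a missing value.
rows = [
    {"height": 170.0, "weight": 65.0, "age": 30.0},
    {"height": 160.0, "weight": None, "age": 25.0},
    {"height": None, "weight": 80.0, "age": 40.0},
    {"height": 180.0, "weight": 90.0, "age": None},
    {"height": 175.0, "weight": 72.0, "age": 35.0},
]


def listwise(rows):
    """Keep only the cases with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, x, y):
    """Keep the (x, y) values of every case where both are present."""
    return [(r[x], r[y]) for r in rows
            if r[x] is not None and r[y] is not None]


complete_rows = listwise(rows)

# Number of usable cases for each pair of variables under pairwise deletion.
pair_counts = {
    (x, y): len(pairwise(rows, x, y))
    for x, y in combinations(rows[0].keys(), 2)
}
```

In this example listwise deletion leaves two complete cases, while pairwise deletion keeps three cases for every pair of variables, at the cost of each pair being based on a different subset of cases.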
SPSS windows of output
In SPSS, cluster analysis can be found under Analyze → Classify. SPSS offers three methods of cluster analysis: Hierarchical, K means and Two step cluster. This section includes examples of performing cluster analysis in SPSS.
K means cluster theory, SPSS windows for k means
This section explains what the K means clustering method is, along with its history, algorithm, initialization methods, applications and description.
SPSS is a statistical software package that can be used to perform cluster analysis. The steps to conduct cluster analysis in SPSS are simple, and it lets you choose the variables on which the cluster analysis is performed. You can perform K means in SPSS by going to Analyze → Classify → K means cluster. The steps for performing K means cluster analysis in SPSS are given in this chapter. Necessary screenshots are also provided for easy reference.
FAQs: General Questions
- What technical support will be provided?
Our customer support centre is available 24/7. Through it you can ask your queries and contact your instructors. You can also email your queries to the email address provided on the site for technical support.
- How can I get access to my course?
You will be sent an email with your user name and password, along with a link to your course.
- How much time commitment is required for each course?
Each course requires at least 8 hours every week. You can choose your own schedule and complete the course at your convenience. Flexibility to learn on your own time is an advantage of taking an online course with educba.
This is an excellent introductory course on Cluster analysis. The course covers mainly two types of cluster analysis – Hierarchical and K means. The quality of the material in this course are of high standards. The course flow from one topic into another is best. The examples under each section makes the learning and understanding process easy. Thanks to educba for offering this course.
This is my first online course and it provided me a good experience. The syllabus of this course makes it more interesting. It is not stuffed with content. The content is good and self explanatory. It gave me a greater overview of the clustering methods and techniques which I was not aware of before taking this course. This course is recommended to someone who is new to the concept of cluster analysis as well as to one who knows how to apply cluster analysis to data. Overall a great course to begin with cluster analysis.
This is a good course on cluster analysis. It covers all the important topics and gives good examples to understand the methods and algorithms. It also gives some real life applications of clustering as examples and thus it makes the content more interesting and engaging. I loved this course and would definitely recommend.
Where do our learners come from?
Professionals from around the world have benefited from eduCBA’s Cluster Analysis courses. Some of the top places that our learners come from include New York, Dubai, San Francisco, Bay Area, New Jersey, Houston, Seattle, Toronto, London, Berlin, UAE, Chicago, UK, Hong Kong, Singapore, Australia, New Zealand, India, Bangalore, New Delhi, Mumbai, Pune, Kolkata, Hyderabad and Gurgaon among many.