EDUCBA

Introducing Best Comparison of Cluster v/s Factor analysis

Cluster Analysis vs Factor Analysis

What is Cluster Analysis

Cluster analysis groups objects based on the characteristics that make them similar. It is also called segmentation analysis or taxonomy analysis. Cluster analysis does not differentiate between dependent and independent variables. It is used in a wide variety of fields such as psychology, biology, statistics, data mining, pattern recognition and the social sciences.

Objective of Cluster Analysis

The main objective of cluster analysis is to address the heterogeneity in a set of data. The other objectives of cluster analysis are:

  • Taxonomy description – Identifying groups within the data
  • Data simplification – Analyzing groups of similar observations instead of every individual observation
  • Hypothesis generation or testing – Developing hypotheses based on the nature of the data, or testing previously stated hypotheses
  • Relationship identification – Using the simplified structure from cluster analysis to describe relationships in the data

There are two main purposes of cluster analysis – Understanding and Utility.

For Understanding, cluster analysis groups objects that share common characteristics.

For Utility, cluster analysis summarizes individual data objects by the characteristics of the clusters to which they belong.

Cluster analysis is often used alongside factor analysis and discriminant analysis.

Ask yourself a few questions before starting a cluster analysis:

  • Which variables are relevant?
  • Is the sample size large enough?
  • Can outliers be detected, and should they be removed?
  • How should object similarity be measured?
  • Should the data be standardized?
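
The questions about standardization and the similarity measure interact, and a small sketch can make that concrete. The data below are hypothetical, and the sketch assumes z-score standardization and Euclidean distance, which are common choices but not the only ones.

```python
import math
import statistics

# Hypothetical data: (age in years, income in dollars) for three people.
people = [(25.0, 30000.0), (45.0, 32000.0), (27.0, 60000.0)]

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Raw distances are dominated by income, the variable with the larger scale.
raw_01 = euclidean(people[0], people[1])
raw_02 = euclidean(people[0], people[2])

# After standardizing, age and income contribute on a comparable scale.
scaled = zscore_columns(people)
std_01 = euclidean(scaled[0], scaled[1])
std_02 = euclidean(scaled[0], scaled[2])

print(raw_01, raw_02)
print(std_01, std_02)
```

On the raw values, person 0 looks far closer to person 1 than to person 2 simply because income is measured in larger units. After standardization the 20-year age gap counts about as much as the 30,000 income gap, so the two distances come out nearly equal.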

Types of Clusters

There are three major types of clustering:

  • Hierarchical clustering – Includes the agglomerative and divisive methods
  • Partitional clustering – Includes K-Means, Fuzzy K-Means and ISODATA
  • Density-based clustering – Includes Denclust, CLUPOT, Mean Shift, SVC and Parzen-Watershed

Assumptions in Cluster Analysis

There are two main assumptions in cluster analysis:

  • The sample is assumed to be representative of the population
  • The variables are assumed to be uncorrelated. If variables are correlated, either remove the correlated variables or use a distance measure that compensates for the correlation.
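
One distance measure that compensates for correlation is the Mahalanobis distance. The article does not name a specific measure, so treat this choice, the two-variable restriction and the hypothetical height/weight data as assumptions made for illustration.

```python
import statistics

# Hypothetical, strongly correlated measurements: (height_cm, weight_kg).
data = [(150.0, 50.0), (160.0, 58.0), (170.0, 69.0), (180.0, 80.0), (190.0, 88.0)]

def covariance_2d(rows):
    """Sample covariance entries (sxx, sxy, syy) for two variables."""
    xs, ys = zip(*rows)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance between two 2-D points."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2  # must be non-zero for the inverse to exist
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^T * inverse(cov) * d, written out for a 2x2 covariance matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

cov = covariance_2d(data)
center = (170.0, 69.0)
along_trend = (180.0, 79.0)    # taller and heavier: follows the correlation
against_trend = (180.0, 59.0)  # taller but lighter: goes against it

d_along = mahalanobis_sq(center, along_trend, cov)
d_against = mahalanobis_sq(center, against_trend, cov)
print(d_along, d_against)
```

Both test points are the same Euclidean distance from the center, yet the point that contradicts the height/weight trend is much farther away in Mahalanobis terms. That is the sense in which the measure accounts for correlated variables.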

Steps in Cluster Analysis

    • Step 1 : Define the Problem
    • Step 2 : Decide the appropriate similarity measure
    • Step 3 : Decide on how to group the objects
    • Step 4 : Decide the number of clusters
    • Step 5 : Interpret, describe and validate the cluster

Cluster Analysis in SPSS

In SPSS, cluster analysis is found under Analyze/Classify. SPSS offers three methods for cluster analysis – K-Means Cluster, Hierarchical Cluster and Two Step Cluster.

The K-Means cluster method classifies a given set of data into a fixed number of clusters. The method is easy to understand and gives its best output when the groups in the data are well separated from each other.
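
The idea behind K-Means can be sketched outside SPSS in a few lines. This is a minimal version of the standard assign-then-update loop, not SPSS's implementation; seeding the centroids with the first k points is a simplifying assumption, and the data are hypothetical.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain K-Means loop; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical, well-separated data: two groups of three points each.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)
```

Because the two groups are far apart, the assignment settles after the first pass. With overlapping groups the result would depend much more on the starting centroids.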

Two Step cluster analysis is designed to handle large data sets. It creates clusters from both categorical and continuous variables.

Hierarchical clustering is the most commonly used method of cluster analysis. It combines cases into homogeneous clusters by merging them through a series of sequential steps.

Hierarchical cluster analysis involves three steps:

  • Calculate the distances
  • Link the clusters
  • Choose a solution by selecting the right number of clusters
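
The three steps above can be sketched as a small agglomerative routine. The sketch assumes Euclidean distance and single linkage and uses hypothetical points; it is meant to show the mechanics, not to reproduce SPSS output.

```python
import math

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each case starts alone
    merges = []  # (members_a, members_b, distance) for every merge

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Step 1: distances between all pairs of current clusters.
        pairs = [
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        # Step 2: link the closest pair.
        dist, x, y = min(pairs)
        merges.append((clusters[x], clusters[y], dist))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]

# Step 3: inspect the merge distances to choose the number of clusters.
_, history = single_linkage(points, n_clusters=1)
merge_distances = [round(d, 2) for _, _, d in history]
print(merge_distances)  # [1.0, 1.0, 6.4, 7.07]

clusters, _ = single_linkage(points, n_clusters=3)
print(sorted(clusters))  # [[0, 1], [2, 3], [4]]
```

The jump in merge distance from 1.0 to 6.4 is the kind of gap a dendrogram makes visible. Stopping before that jump leaves three clusters.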

The steps for performing hierarchical cluster analysis in SPSS are given below.

  • The first step is to select the variables to be clustered in the main dialog box.
  • Clicking the Statistics option in that dialog box opens a dialog box where you specify the output you want.
  • In the Plots dialog box, add the dendrogram. The dendrogram is the graphical representation of the hierarchical cluster analysis. It shows how the clusters are combined at every step until they form a single cluster.
  • The Method dialog box is crucial. Here you specify the distance measure and the clustering method. SPSS groups the measures into three types: interval, counts and binary data.
  • For interval data, the Squared Euclidean Distance is the sum of the squared differences, without taking the square root.
  • For counts, you can choose between the Chi-Square and Phi-Square measures.
  • The Binary section offers many options. Squared Euclidean distance is generally a reasonable choice.
  • The next step is to choose the cluster method. It is often recommended to start with Single Linkage (Nearest Neighbour), as it helps to identify outliers. After the outliers are identified, you can use Ward’s Method.
  • The last step is standardization.
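
The squared Euclidean distance mentioned in the list above can be written out directly. The points are hypothetical and the function name is chosen for this sketch only.

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences; no square root is taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

a, b, c = (1.0, 2.0), (4.0, 6.0), (2.0, 2.0)

d_ab = squared_euclidean(a, b)  # 3**2 + 4**2 = 25.0
d_ac = squared_euclidean(a, c)  # 1**2 + 0**2 = 1.0
print(d_ab, d_ac)
```

Skipping the square root keeps the ordering of distances unchanged but gives larger differences more weight: the ordinary Euclidean distances here are 5 and 1, while the squared versions are 25 and 1.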

Criticisms of Cluster Analysis

The most common criticisms are listed below

  • It is descriptive, atheoretical and non-inferential.
  • It will produce clusters regardless of whether any structure actually exists in the data.
  • It cannot be generalized widely, as the solution depends entirely on the variables used as the basis for the similarity measure.

What is Factor Analysis?

Factor analysis is an exploratory analysis that helps group similar variables into dimensions. It can be used to simplify the data by reducing the number of dimensions of the observations. Factor analysis offers several different rotation methods.

Factor analysis is used mostly for data reduction purposes.

There are two types of factor analysis – Exploratory and Confirmatory

  • The exploratory method is used when you do not have a predefined idea about the structures or dimensions in a set of variables.
  • The confirmatory method is used when you want to test a specific hypothesis about the structures or dimensions in a set of variables.

Objectives of Factor Analysis

There are two main objectives of factor analysis, which are mentioned below:

  • Identification of the underlying factors – This includes clustering variables into homogeneous sets, creating new variables and gaining knowledge about the categories
  • Screening of variables – This is helpful in regression, as it identifies groupings that allow you to select one variable to represent many.

Assumptions of Factor analysis

There are four main assumptions of factor analysis, which are mentioned below:

  • Models are usually based on linear relationships
  • The data collected are assumed to be interval scaled
  • Multicollinearity in the data is desirable, as the objective is to find the interrelated sets of variables
  • The data should be suitable for factor analysis. A variable should not be correlated only with itself and with no other variable; factor analysis cannot be done on such data.
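
The last assumption can be checked informally by looking for variables that barely correlate with anything else. The sketch below uses hypothetical survey answers, and the 0.3 cut-off is an assumed rule of thumb for this illustration, not a value taken from the article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def isolated_variables(columns, threshold=0.3):
    """Names of variables whose correlation with every other variable is weak."""
    names = list(columns)
    isolated = []
    for name in names:
        others = [abs(pearson(columns[name], columns[o])) for o in names if o != name]
        if max(others) < threshold:
            isolated.append(name)
    return isolated

# Hypothetical answers of five respondents to three survey questions.
survey = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [1, 2, 3, 5, 4],
    "q3": [3, 1, 4, 1, 3],
}

r_q1_q2 = pearson(survey["q1"], survey["q2"])
flagged = isolated_variables(survey)
print(r_q1_q2)
print(flagged)  # ['q3']
```

Here q1 and q2 move together, so they are candidates for a shared factor, while q3 correlates only weakly with both and would contribute little to a factor solution.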

Types of Factoring

  • Principal component factoring – The most commonly used method, in which factor weights are computed to extract the maximum possible variance; extraction continues until no meaningful variance is left.
  • Canonical factor analysis – Finds factors that have the highest canonical correlation with the observed variables
  • Common factor analysis – Seeks the smallest number of factors that can account for the common variance of a set of variables
  • Image factoring – Based on the correlation matrix, where each variable is predicted from the others using multiple regression
  • Alpha factoring – Maximizes the reliability of the factors
  • Factor regression model – A combination of the factor model and the regression model, in which the factors are partially known

Criteria of Factor analysis

  1. Eigenvalue criteria

  • Represents the amount of variance in the original variables that is associated with a factor
  • The sum of the squared factor loadings of each variable on a factor represents the eigenvalue
  • Factors with eigenvalues greater than 1.0 are kept

  2. Scree plot criteria

  • A plot of the eigenvalues against the number of factors, in order of extraction
  • The shape of the plot determines the number of factors

  3. Percentage of variance criteria

  • The number of factors extracted is chosen so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level

  4. Significance test criteria

  • The statistical significance of the separate eigenvalues is determined, and only those factors that are statistically significant are retained
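
The eigenvalue and percentage of variance criteria can be worked through with a small table of loadings. The loadings below are hypothetical, and the sketch assumes standardized variables, so that the total variance equals the number of variables.

```python
# Hypothetical loadings of four variables (rows) on three factors (columns).
loadings = [
    [0.85, 0.15, 0.05],
    [0.80, 0.10, 0.20],
    [0.10, 0.80, 0.05],
    [0.20, 0.75, 0.10],
]

n_variables = len(loadings)
n_factors = len(loadings[0])

# Eigenvalue of a factor = sum of squared loadings in its column.
eigenvalues = [sum(row[f] ** 2 for row in loadings) for f in range(n_factors)]

# Eigenvalue criterion: keep factors whose eigenvalue exceeds 1.0.
kept = [f for f, ev in enumerate(eigenvalues) if ev > 1.0]

# Percentage of variance: each factor's share of the total variance,
# accumulated in order of extraction.
percent = [100 * ev / n_variables for ev in eigenvalues]
cumulative = [sum(percent[: f + 1]) for f in range(n_factors)]

print(eigenvalues)
print(kept)        # [0, 1]
print(cumulative)
```

The first two factors have eigenvalues of about 1.41 and 1.24 and together account for roughly 66% of the variance. The third, at about 0.06, falls well below the 1.0 cut-off and adds little.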

Factor analysis is used in various fields like Psychology, Sociology, Political Science, Education and Mental health.

Factor Analysis in SPSS

In SPSS, the factor analysis option can be found under Analyze → Dimension Reduction → Factor.

  • Start by adding the variables to the list of variables section
  • Click the Descriptives tab and add a few statistics with which the assumptions of factor analysis can be verified.
  • Click the Extraction option, which lets you choose the extraction method and the cut-off value for extraction
  • Principal Components (PCA) is the default extraction method, which forms uncorrelated linear combinations of the variables. PCA can be used even when the correlation matrix is singular. As in Canonical Correlation Analysis, the factors are extracted in order: the first factor has maximum variance and the following factors explain progressively smaller portions of the variance.
  • The second most common method is Principal Axis Factoring. It identifies the latent constructs behind the observations.
  • The next step is to select a rotation method. The most frequently used method is Varimax, which simplifies the interpretation of the factors.
  • The second method is Quartimax. This method rotates the factors to minimize the number of factors needed to explain each variable. It simplifies the interpretation of the observed variables.
  • The next method is Equamax, which is a combination of the above two methods.
  • Clicking “Options” in the dialog box lets you manage the missing values
  • Before saving the results to the data set, first run the factor analysis, check the assumptions and confirm that the results are meaningful and useful.
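
To see what "the first factor has maximum variance" means for principal components, the first component of a small correlation matrix can be found by power iteration. This is an illustrative sketch under stated assumptions, not the procedure SPSS runs: the correlation matrix is hypothetical and power iteration is just one simple way to obtain the largest eigenvalue.

```python
import math

def first_component(corr, iterations=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(corr)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    # The Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j] for i in range(n) for j in range(n))
    return eigenvalue, vec

# Hypothetical correlation matrix for three observed variables.
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

eigenvalue, weights = first_component(corr)
share = eigenvalue / len(corr)  # proportion of total variance explained

print(eigenvalue, share)
print(weights)
```

The first component here has an eigenvalue slightly above 2, so it alone accounts for about two thirds of the variance in the three variables. The remaining components share what is left.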

Cluster Analysis vs Factor Analysis

Both cluster analysis and factor analysis are unsupervised learning methods used for segmentation of data. Many researchers who are new to this field feel that cluster analysis and factor analysis are similar. They may seem similar, but they differ in many ways. The differences between cluster analysis and factor analysis are listed below.

  • Objective

The objectives of cluster analysis and factor analysis are different. The objective of cluster analysis is to divide the observations into homogeneous and distinct groups. Factor analysis, on the other hand, explains the homogeneity of the variables resulting from the similarity of their values.

  • Complexity

Complexity is another aspect in which cluster and factor analysis differ. The data size affects the two analyses differently. If the data set is too big, cluster analysis can become computationally intractable.
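
One way to see why data size matters is to count the pairwise distances that a full distance matrix needs. This is an illustration of a single cost, assuming a method such as hierarchical clustering that compares every pair of cases; it is not a complete complexity analysis.

```python
def pairwise_distance_count(n_cases):
    """Number of distinct case pairs in a full distance matrix."""
    return n_cases * (n_cases - 1) // 2

for n in (100, 10_000, 1_000_000):
    print(n, pairwise_distance_count(n))
```

The count grows roughly with the square of the number of cases: 100 cases need 4,950 distances, while a million cases need about 500 billion. Methods that avoid the full matrix, such as K-Means, scale more gently.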

  • Solution

The solution to a problem is more or less similar in both factor and cluster analysis. However, factor analysis often provides the researcher with a better solution. Cluster analysis may not yield the best result, as many clustering algorithms are computationally inefficient.

  • Applications

Factor analysis and cluster analysis are applied differently to real data. Factor analysis is suitable for simplifying complex models. It reduces a large set of variables to a much smaller set of factors. The researcher can develop a set of hypotheses and run factor analysis to confirm or reject them.

Cluster analysis is suitable for classifying objects based on certain criteria. The researcher can measure certain aspects of a group of objects and divide the objects into specific categories using cluster analysis.

There are also a lot of other differences, which are mentioned below

  • Cluster analysis attempts to group cases, whereas factor analysis attempts to group features.
  • Cluster analysis is used to find smaller groups of cases that are representative of the data as a whole. Factor analysis is used to find a smaller group of features that are representative of the data set’s original features.
  • The most important part of cluster analysis is finding the number of clusters. Clustering methods are broadly divided into two – the agglomerative method and the partitioning method. The agglomerative method starts with each case in its own cluster and merges clusters until a stopping criterion is reached. The partitioning (divisive) method starts with all cases in one cluster and splits it step by step.
  • Factor analysis is used to find an underlying structure in a set of data.
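
The cases-versus-features distinction can be shown on a single small data matrix. The scores are hypothetical; the sketch assumes Euclidean distance between rows as the raw material for clustering and Pearson correlation between columns as the raw material for factor analysis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data matrix: rows are cases (students), columns are features.
features = ["math", "physics", "art"]
scores = [
    [90.0, 85.0, 40.0],  # case 0
    [88.0, 90.0, 35.0],  # case 1
    [45.0, 40.0, 95.0],  # case 2
]

# Cluster analysis compares ROWS: one distance per pair of cases.
case_distances = {
    (i, j): math.dist(scores[i], scores[j])
    for i in range(len(scores))
    for j in range(i + 1, len(scores))
}

# Factor analysis compares COLUMNS: one correlation per pair of features.
columns = list(zip(*scores))
feature_correlations = {
    (features[a], features[b]): pearson(columns[a], columns[b])
    for a in range(len(features))
    for b in range(a + 1, len(features))
}

print(case_distances)
print(feature_correlations)
```

The row distances suggest that cases 0 and 1 belong together, which is a clustering statement. The column correlations suggest that math and physics measure something in common while art runs the other way, which is a statement about features.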

Conclusion

We hope this article has helped you understand the basics of cluster analysis and factor analysis and the differences between the two.

© 2019 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
