Principal Component Analysis

By Priya Pedamkar


Introduction to Principal Component Analysis

In data science, we often work with large datasets containing many features. If model training becomes too slow, or your system is not powerful enough to handle such heavy computation, you may start looking for alternatives. This is where PCA comes into the picture. With the help of Principal Component Analysis, you reduce the dimension of a dataset that contains many features or independent variables that are highly correlated with each other, while retaining as much of the variation in the dataset as possible. The new features created by combining the correlated features are termed principal components. These principal components are the eigenvectors obtained by decomposing the covariance matrix; hence they are orthogonal.

Why do we need PCA?

PCA is primarily used for dimensionality reduction in domains such as facial recognition, computer vision, and image compression, and for finding patterns in fields such as finance, psychology, and data mining. PCA extracts the important information from the dataset by combining the redundant features; this information is expressed in the form of new variables termed principal components. Since visualizing a high-dimensional dataset directly is difficult, we can also use PCA to reduce the dataset to 2 or 3 principal components and then plot it to get a better insight into it.

How Does PCA Work?

The following steps are involved in PCA:

  • Data normalization
  • Computing the covariance matrix
  • Computing the eigenvalues and eigenvectors from the covariance matrix
  • Choosing the first k eigenvectors, where k is the required dimension
  • Transforming the data points into k dimensions

1. Data Normalization

It is important to scale the data before running PCA on the dataset, because if we use features on different scales, we end up with misleading principal components. To do so, you need to perform mean normalization, and optionally you can also perform feature scaling.

Suppose we have a dataset of m samples, x(1), x(2), …, x(m), each of dimension n.

Then compute the mean of each feature using the following equation:

uj = (1/m) * Σ_{i=1}^{m} x(i)j

Where:

  • uj: Mean of the jth feature
  • m: Size of the dataset
  • x(i)j: jth feature of data sample i

Once you have the mean of each feature, update each x(i)j with x(i)j – uj.


Note that if the features are on different scales, then you must use the following equation to normalize the data:

x(i)j := (x(i)j – uj) / sj

Where:

  • x(i)j: jth feature of data sample i
  • sj: The difference between the max and min element of the jth feature
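The normalization step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code: the dataset values are made up, and sj is taken as the max–min range, as in the equation above.

```python
import numpy as np

# Toy dataset: m = 4 samples, n = 2 features on very different scales
# (hypothetical numbers, chosen only to illustrate normalization).
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 400.0]])

u = X.mean(axis=0)                 # u_j: mean of the j-th feature
s = X.max(axis=0) - X.min(axis=0)  # s_j: max - min (range) of the j-th feature

X_norm = (X - u) / s               # x_j^(i) := (x_j^(i) - u_j) / s_j

print(X_norm.mean(axis=0))         # each feature now has zero mean
```

After this step every feature has zero mean and the same max–min range, so no single feature dominates the covariance matrix simply because of its units.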

2. Computing the Covariance Matrix

Assuming that the reader is familiar with variance and covariance, let us see how the covariance matrix is represented.

Assume we have a dataset of 4 dimensions (a, b, c, d), the variance of each dimension is denoted Va, Vb, Vc, Vd, and the covariance is denoted Ca,b for dimensions a and b, Cb,c for dimensions b and c, and so on. Then the covariance matrix is given as:

      | Va    Ca,b  Ca,c  Ca,d |
Σ  =  | Cb,a  Vb    Cb,c  Cb,d |
      | Cc,a  Cc,b  Vc    Cc,d |
      | Cd,a  Cd,b  Cd,c  Vd   |

The diagonal elements of the covariance matrix are the variances of the dataset, while the off-diagonal elements are the covariances.

If we have a dataset X of dimension m×n, where m is the number of data points and n is the number of dimensions, then the covariance matrix sigma is given by:

Σ = (1/m) * Xᵀ X
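The covariance computation can be sketched directly from this formula. The data below is synthetic (random numbers for illustration only); the checks at the end confirm the diagonal/off-diagonal structure described above.

```python
import numpy as np

# Synthetic data: m = 5 samples, n = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X = X - X.mean(axis=0)          # mean-normalize each feature first

m = X.shape[0]
Sigma = (X.T @ X) / m           # n x n covariance matrix: Sigma = (1/m) X^T X

# Diagonal entries are the per-feature variances,
# off-diagonal entries are the pairwise covariances.
print(np.allclose(np.diag(Sigma), X.var(axis=0)))  # True
print(np.allclose(Sigma, Sigma.T))                 # True: symmetric
```

Note that NumPy's own np.cov divides by m − 1 by default (the unbiased estimator); the formula here divides by m, matching the equation above. Either convention works for PCA, since scaling Σ by a constant does not change its eigenvectors.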

3. Computing the Eigenvalues and Eigenvectors from the Covariance Matrix

Now we need to decompose the covariance matrix to get the eigenvectors and eigenvalues. This is done using singular value decomposition (SVD). I won’t be going into the details of SVD, as it is out of scope for this article, though it is important to note that popular programming languages like MATLAB and Python have functions to compute it. In Octave, you can get the eigenvalues and eigenvectors using the svd() function:

[U, S, V] = svd(Sigma)

Where:

  • Σ: Covariance matrix (sigma)
  • U: Matrix containing the eigenvectors
  • S: Diagonal matrix containing the eigenvalues.
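The equivalent of the Octave call above can be sketched in NumPy (a translation I am supplying, not the article's own code). Because the covariance matrix is symmetric and positive semi-definite, its SVD coincides with its eigendecomposition: the columns of U are the eigenvectors and the entries of S are the eigenvalues, sorted in decreasing order.

```python
import numpy as np

# Build a covariance matrix from synthetic centered data (m = 50, n = 4).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / X.shape[0]

U, S, Vt = np.linalg.svd(Sigma)   # SVD of the symmetric PSD matrix Sigma

print(np.all(np.diff(S) <= 0))           # True: eigenvalues in decreasing order
print(np.allclose(U.T @ U, np.eye(4)))   # True: eigenvector columns orthonormal
print(np.allclose(Sigma @ U[:, 0], S[0] * U[:, 0]))  # True: Sigma u = lambda u
```

The decreasing order of S is what makes the next step work: taking the first k columns of U automatically selects the directions of largest variance.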

4. Choose the first k eigenvectors where k is the required dimension

Once we have the matrix containing the eigenvectors, we can select its first k columns, where k is the required dimension:

Ureduce = U(:,1:k)

Ureduce is the matrix of eigenvectors needed to perform the data compression.

5. Transform the Data Points into k Dimensions

To transform the dataset X of dimension m×n from n dimensions to k dimensions, take the transpose of the Ureduce matrix and multiply it with each data point:

z = UreduceT * x;

where:

  • z: The new feature vector of dimension k

As a result, you will get a dataset of dimension m×k instead of m×n.
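The five steps above can be combined into one short end-to-end sketch. The dataset and the choice of k are hypothetical, used only to show the shapes flowing through the pipeline.

```python
import numpy as np

# End-to-end sketch: reduce a synthetic m x n dataset to k dimensions.
m, n, k = 100, 5, 2
rng = np.random.default_rng(2)
X = rng.normal(size=(m, n))

X = X - X.mean(axis=0)              # 1. mean normalization
Sigma = (X.T @ X) / m               # 2. covariance matrix (n x n)
U, S, _ = np.linalg.svd(Sigma)      # 3. eigenvectors / eigenvalues
U_reduce = U[:, :k]                 # 4. keep the first k eigenvectors (n x k)
Z = X @ U_reduce                    # 5. z = Ureduce^T x, applied row-wise

print(Z.shape)                      # (100, 2): the m x n dataset became m x k
```

Writing the projection as X @ U_reduce applies z = Ureduceᵀ x to every row of X at once, which is the vectorized form of the per-sample equation above.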

Properties of Principal Component Analysis

Following are the properties that PCA possesses:

  1. It transforms a high-dimensional dataset into a low-dimensional one before using it to train the model.
  2. The principal components of PCA are linear combinations of the original features, and the eigenvectors found from the covariance matrix satisfy the principle of least squares.
  3. It helps in determining the features needed to explain the covariation.

Advantages of Principal Component Analysis

PCA’s main advantage is its low sensitivity to noise, a result of the dimensionality reduction. The other advantages of PCA are:

  1. It speeds up the learning algorithm.
  2. It increases efficiency due to the lower number of dimensions.
  3. It reduces the space required to store the data.
  4. It enables visualization of high-dimensional data.
  5. It removes redundant variables.

Conclusion

PCA is an unsupervised machine learning technique. Such techniques are most suitable for data, such as images, that have no class labels. This article has introduced PCA, explained how it works, and discussed its advantages.

Recommended Articles

This is a guide to Principal Component Analysis. Here we discuss why we need PCA and how it works, with the steps involved, along with the advantages of principal components. You can also go through our other related articles to learn more:

  1. Matlab Commands
  2. Multidimensional Array in C
  3. Statistical Analysis Softwares
  4. Statistical Analysis Regression

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
