EDUCBA

Exploratory Data Analysis

By Priya Pedamkar

What is Exploratory Data Analysis?

Exploratory Data Analysis, abbreviated as EDA in the analytics industry, is a basic data analysis technique. It covers several concepts and best practices that are applied in the initial phase of an analytics project. EDA relies on graphical visualization techniques to identify patterns in the data and to compare variables. It is a preferred technique for the feature engineering and feature selection steps of data science projects. Some of the widely used EDA techniques are univariate analysis, bivariate analysis, multivariate analysis, bar charts, box plots, pie charts, line graphs, frequency tables, histograms, and scatter plots. EDA is very useful in the data preparation phase, where it complements the machine learning models built afterwards.

How is Exploratory Data Analysis Performed?

Let us see how the exploratory data analysis is performed:

1. Univariate Analysis

‘Uni’ means ‘one’. As the name suggests, univariate analysis is data analysis in which only a single variable is involved. The variable can be either a categorical variable or a numerical variable. Let us discuss the graphical methods most commonly used for univariate exploratory data analysis.

2. Frequency Table and Count Plot

Frequency tables and count plots are used to identify the frequency of a value, that is, how many times it occurs. For example, suppose we toss an unbiased coin 5 times and get (H, T, H, H, T). The frequency, or count, of heads here is 3. Let us see how a count plot looks for a movie review data set. A count plot is also referred to as a bar plot because of its rectangular bars. We generate the bar plot in Python using the Seaborn library.

Count plot of Movie Review Dataset

Source Link: https://stackoverflow.com/questions/48043365/how-to-improve-this-seaborn-countplot
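
The coin-toss example above can be reproduced with a small sketch. The helper names `frequency_table` and `draw_count_plot` are hypothetical and chosen only for illustration. The frequency table uses the standard library, and the plotting helper assumes that seaborn and Matplotlib are installed.

```python
from collections import Counter


def frequency_table(values):
    """Return a mapping from each distinct value to how often it occurs."""
    return Counter(values)


def draw_count_plot(values):
    """Draw one bar per distinct value; the bar height is the count."""
    # Imported lazily so the frequency table works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(x=list(values))
    ax.set_xlabel("outcome")
    ax.set_ylabel("count")
    plt.show()


tosses = ["H", "T", "H", "H", "T"]  # the five coin tosses from the text
table = frequency_table(tosses)
print(table["H"], table["T"])  # heads occur 3 times, tails 2 times
```

Calling `draw_count_plot(tosses)` should show two bars, one for heads and one for tails.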

3. Histograms

A histogram is a binned, unsmoothed estimate of a variable's distribution; a kernel density estimate is its smoothed counterpart. Histograms help us learn about the underlying distribution of the data. For example, the preprocessing methodology for a normal (bell-shaped) distribution will be significantly different from that for a skewed distribution such as the Pareto distribution. Let’s see how the distribution of flight arrivals looks in the form of a histogram.

Histogram

Source: Wikipedia
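
To make the binning idea concrete, here is a minimal sketch that counts values in equal-width bins. The arrival delays are made-up illustrative numbers, not the data behind the figure, and `histogram_counts` and `draw_histogram` are hypothetical helper names. The plotting helper assumes Matplotlib is installed.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they all land in the first bin.
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


def draw_histogram(values, n_bins):
    """Plot the same binning with Matplotlib."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    plt.hist(values, bins=n_bins)
    plt.xlabel("arrival delay (minutes)")
    plt.ylabel("count")
    plt.show()


# Hypothetical arrival delays in minutes (negative means an early arrival).
delays = [-5, -2, 0, 1, 3, 4, 6, 8, 15, 40]
counts = histogram_counts(delays, n_bins=3)
print(counts)  # most flights fall in the first bin, with a long right tail
```

Most of the counts sit in the first bin and a few values stretch far to the right, which is the kind of skew that a histogram makes visible at a glance.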

4. Pie Chart

A pie chart is a circle divided into slices based on the relative count or frequency of each category in a sample or population. Suppose we want to compare the relative performance or sales of multiple products; a pie chart is a useful graphical way to visualize it.

Pie chart
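
The slice sizes of a pie chart are just relative shares of the total. The sketch below computes them for hypothetical product sales. The product names, the sales figures, and the helper names `relative_shares` and `draw_pie_chart` are assumptions made for illustration, and the plotting helper assumes Matplotlib is installed.

```python
def relative_shares(sales):
    """Return each category's fraction of the total."""
    total = sum(sales.values())
    return {name: amount / total for name, amount in sales.items()}


def draw_pie_chart(sales):
    """Draw one slice per category, labeled with its percentage."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    plt.pie(list(sales.values()), labels=list(sales.keys()), autopct="%1.1f%%")
    plt.show()


# Hypothetical unit sales for three products.
sales = {"Product A": 50, "Product B": 30, "Product C": 20}
shares = relative_shares(sales)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```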

5. Box Plot with Whisker and Violin Plot

A box plot with whiskers graphically displays the 25th, 50th, and 75th percentile values of a variable. It gives us a clear picture of where the middle 50% of the values lie and where the lowest and highest 25% begin. A violin plot is an enhanced version of the box plot that also shows the distribution of the variable. Let us show how the box plot and violin plot look.

Box plot and violin plot

Source: Wikipedia
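
The numbers behind a box plot can be computed directly. This is a minimal sketch on made-up values. It assumes the common convention of placing the whisker limits 1.5 times the interquartile range beyond the quartiles, which the text above does not specify, and the helper names are hypothetical. The plotting helper assumes seaborn and Matplotlib are installed.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles and whisker fences that a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Assumed convention: points beyond 1.5 * IQR are drawn as outliers.
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


def draw_box_and_violin(values):
    """Draw a box plot and a violin plot of the same values side by side."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed
    import seaborn as sns

    fig, (left, right) = plt.subplots(1, 2)
    sns.boxplot(x=list(values), ax=left)
    sns.violinplot(x=list(values), ax=right)
    plt.show()


# Hypothetical measurements with one extreme value.
data = [8, 9, 9, 10, 10, 11, 12, 13, 32]
summary = box_plot_summary(data)
outliers = [v for v in data if v < summary["lower_fence"] or v > summary["upper_fence"]]
print(summary, outliers)
```

Under this convention, the value 32 lies above the upper fence and would be drawn as a separate point beyond the whisker.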

6. Multivariate Analysis

Multivariate analysis is the comparative analysis of multiple variables. When exactly two variables are compared, it is called bivariate analysis. Scatter plots, contour plots, and multivariate probability density plots are the graphical methods most commonly used to analyze multi-dimensional data. Let us show what a scatter plot looks like.

Scatter plot
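
A scatter plot is often read together with a numeric summary of how two variables move together. The sketch below pairs a scatter plot with the Pearson correlation coefficient. The correlation is not discussed in the text and is added here only as an illustrative companion. The data and the helper names are hypothetical, and the plotting helper assumes Matplotlib is installed.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def draw_scatter_plot(xs, ys):
    """Plot one point per (x, y) pair."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    plt.scatter(xs, ys)
    plt.xlabel("years of experience")
    plt.ylabel("salary (LPA)")
    plt.show()


# Hypothetical bivariate data: experience in years and salary in LPA.
experience = [1, 2, 3, 4, 5]
salary = [8, 9, 9, 10, 12]
r = pearson_correlation(experience, salary)
print(round(r, 2))  # close to 1, indicating a strong positive linear relationship
```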

Advantages and Disadvantages of Exploratory Data Analysis

The advantages and disadvantages of Exploratory Data Analysis are given below:

Advantages of EDA

  • It gives us valuable insights into the data.
  • It helps us with feature selection and dimensionality reduction (e.g., using PCA).
  • Visualization is an effective way of detecting outliers.

Disadvantages of EDA

  • If not performed properly, EDA can misguide the analysis of a problem.
  • EDA is not effective when we deal with high-dimensional data.

Applications of Exploratory Data Analysis

Let’s look at an application of Exploratory Data Analysis through a use case of univariate analysis, where we measure the central tendency of the data:

  • Measures of central tendency give us an overview of a single variable. The central tendency is measured by the mean, median, and mode. The mean is the simple average, the median is the 50th percentile, and the mode is the most frequently occurring value. Suppose we want to learn about the salary of a data scientist. Also, suppose we have carefully collected data on data scientists with similar expertise and experience.
  • Now, if we want the average, it is simply the total salary of all the data scientists in the sample divided by the number of data scientists in the sample or population. But the average salary can be misleading, because in the presence of some extreme values the result will be skewed. Suppose that in most cases the salary is between 8 and 10 LPA (lakhs per annum), and in one or two cases it is 32 LPA. Adding all of these, the average is pulled towards the extreme values. The median is more suitable for such situations because it is more robust to outliers.
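
The effect of an extreme value on the mean can be checked with a short sketch using Python's standard `statistics` module. The salary figures are hypothetical values chosen to match the ranges described above.

```python
import statistics

# Hypothetical salaries in LPA: most lie between 8 and 10, one is 32.
salaries_lpa = [8, 9, 9, 10, 32]

mean_salary = statistics.mean(salaries_lpa)      # pulled upwards by the 32
median_salary = statistics.median(salaries_lpa)  # the middle value when sorted
mode_salary = statistics.mode(salaries_lpa)      # the most frequent value

print(mean_salary, median_salary, mode_salary)
```

Here the mean is 13.6 LPA, which is higher than four of the five salaries, while the median and mode are both 9 LPA and describe the typical salary far better.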

Conclusion

In this article, we have discussed the various methodologies involved in exploratory data analysis, along with its applications, advantages, and disadvantages. The plots discussed here can be generated in Python using the seaborn and Matplotlib libraries. EDA is the art part of data science, which helps us get valuable insights and visualize the data.

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
