Introduction to Data Analysis
Data analysis is the process of extracting useful information from a dataset by inspecting, cleansing, transforming, and modeling it. The methodologies involved can be grouped into Descriptive Analysis (summarizing the data numerically), Exploratory Analysis (examining the data visually), Predictive Analysis (using historical events to predict outcomes), and Inferential Analysis (drawing conclusions about a population from information obtained from a sample).
Types of Data Analysis
Based on the methodology used, data analysis can be divided into the following four types:
- Descriptive Analysis
- Exploratory Data Analysis
- Predictive Analysis
- Inferential Analysis
1. Descriptive Analysis
Descriptive analysis is the numerical way to get insights about the data. In descriptive analysis, we compute summary values for the numerical variables. Suppose you are analyzing the sales data of a car manufacturer. With descriptive analysis, you would ask questions such as: what are the mean and the mode of the selling price of a car type, or how much revenue was earned by selling a particular type of car? This type of analysis gives us the central tendency and the dispersion of the numerical variables in the data. In most practical data science use cases, descriptive analysis helps you get high-level information about the data and become familiar with the dataset. Important terms in descriptive analysis are:
- Mean (average of all numbers in a list of numbers)
- Mode (most frequent number in a list of numbers)
- Median (middle value of a sorted list of numbers)
- Standard deviation (typical amount by which values deviate from the mean; the square root of the variance)
- Variance (average squared deviation from the mean; the square of the standard deviation)
- Interquartile range (difference between the 75th and 25th percentiles of a list of numbers)
In Python, the pandas library provides a method called ‘describe’ that returns descriptive statistics for a data frame. We can also use other libraries such as statsmodels, or write our own code as the use case requires.
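As a minimal sketch of the measures listed above, the following uses pandas on a small, made-up car sales table. The column names `car_type` and `selling_price` and all values are assumptions chosen for illustration, not data from a real manufacturer.

```python
import pandas as pd

# Hypothetical car sales data; column names and values are made up for illustration.
sales = pd.DataFrame({
    "car_type": ["sedan", "suv", "sedan", "hatchback", "suv", "sedan"],
    "selling_price": [20000, 35000, 22000, 15000, 35000, 21000],
})

price = sales["selling_price"]

# describe() reports count, mean, std, min, quartiles, and max in one call.
summary = price.describe()
print(summary)

# The individual measures, computed one by one.
mean_price = price.mean()
median_price = price.median()
mode_price = price.mode().iloc[0]   # mode() returns a Series; take the first mode
std_price = price.std()             # sample standard deviation (ddof=1)
var_price = price.var()             # sample variance, the square of std_price
iqr_price = price.quantile(0.75) - price.quantile(0.25)

# Mean selling price for each car type.
mean_by_type = sales.groupby("car_type")["selling_price"].mean()
print(mean_by_type)
```

Note that pandas computes the sample standard deviation and variance by default (dividing by n - 1); pass `ddof=0` to `std` and `var` if the population versions are wanted.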
2. Exploratory Data Analysis
In contrast to descriptive analysis, where we analyze the data numerically, exploratory data analysis is the visual way to analyze the data. Once we have a basic understanding of the data at hand through descriptive analysis, we move on to exploratory data analysis. Exploratory data analysis can be divided into two parts:
- Univariate analysis (exploring the characteristics of a single variable)
- Multivariate analysis (comparing multiple variables; when we examine the relationship between exactly two variables, for example their correlation, it is called bivariate analysis)
In the visual way of data analysis, we use various kinds of plots and graphs to analyze the data. To analyze a single variable (univariate analysis), we can use a bar plot, histogram, box-and-whisker plot, violin plot, etc. For multivariate analysis, we use scatter plots, contour plots, multi-dimensional plots, etc.
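The plot types mentioned above can be sketched with Matplotlib as follows. The data is randomly generated, and the variable names `engine_size` and `price` and the output file name `eda_plots.png` are assumptions made only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: engine size (litres) and a price that loosely depends on it.
rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 4.0, size=200)
price = 8000 * engine_size + rng.normal(0, 3000, size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Univariate: histogram shows the shape of the distribution of one variable.
axes[0].hist(price, bins=20)
axes[0].set_title("Histogram of price")

# Univariate: box plot shows the median, quartiles, and potential outliers.
axes[1].boxplot(price)
axes[1].set_title("Box plot of price")

# Bivariate: scatter plot shows how two variables move together.
axes[2].scatter(engine_size, price, s=10)
axes[2].set_xlabel("Engine size")
axes[2].set_ylabel("Price")
axes[2].set_title("Price vs. engine size")

fig.tight_layout()
fig.savefig("eda_plots.png")
```

In an interactive session or notebook, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used instead of saving to a file.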
But why do we need Exploratory Data Analysis?
- Exploratory data analysis gives a visual way to describe the data, which helps to identify the characteristics of the data more clearly.
- It helps us identify which features are more important. This is particularly useful when we deal with high-dimensional data (e.g. methods like PCA and t-SNE help with dimensionality reduction).
- It is an effective way to explain the results to executives and non-technical stakeholders.
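To make the dimensionality reduction point concrete, here is a rough sketch of PCA written directly with NumPy's singular value decomposition rather than a dedicated library. The dataset is synthetic, and the choice to only center the features (without scaling them) is an assumption made to keep the example short.

```python
import numpy as np

# Hypothetical data: 100 rows and 5 features; the first two features are
# strongly correlated because both are built from the same hidden signal.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(100, 1)),
    2 * base + 0.1 * rng.normal(size=(100, 1)),
    0.5 * rng.normal(size=(100, 3)),
])

# PCA step 1: center each feature at zero.
X_centered = X - X.mean(axis=0)

# PCA step 2: the right singular vectors (rows of Vt) are the principal axes.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)
print(explained_variance_ratio.round(3))

# PCA step 3: project onto the first two components to get a 2-D view
# that can be drawn as an ordinary scatter plot.
X_2d = X_centered @ Vt[:2].T
```

A large first entry in `explained_variance_ratio` suggests that much of the variation in the data lies along a single direction, which is the kind of insight that helps when deciding how many dimensions to keep.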
In Python, there are many libraries for exploratory data analysis. Matplotlib, Seaborn, Plotly, and Bokeh are among the most popular.
3. Predictive Analysis
What if we knew in advance the mistakes we were going to make? We would try to avoid them, right? Predictive analysis is a systematic way to predict future outcomes by analyzing historical events, and it lies at the heart of data science. Predictive analysis helps us answer questions such as: ‘Can we predict whether a buyer will purchase a specific product?’, ‘Can we estimate the total cost an insurer has to pay for claims?’, or ‘Can we estimate the amount of rainfall in the upcoming monsoon?’
Predictive analysis gives us an approximate or most likely answer to such questions, which can in turn drive large-scale business and socio-economic changes. Machine learning models are trained on historical data to predict the outcome of similar, unseen future events.
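As a small illustration of learning from historical data, the sketch below fits a straight line with NumPy's `polyfit` and checks it on held-out, more recent observations. A linear fit is only one simple stand-in for the machine learning models mentioned above, and the advertising and sales figures are invented for the example.

```python
import numpy as np

# Hypothetical history: advertising spend (in thousands) and units sold.
ad_spend = np.array([10, 15, 20, 25, 30, 35, 40], dtype=float)
units_sold = np.array([120, 150, 185, 210, 245, 270, 300], dtype=float)

# Treat the first five observations as history and hold out the last two
# as "unseen future events" to check how well the model generalizes.
train_x, test_x = ad_spend[:5], ad_spend[5:]
train_y, test_y = units_sold[:5], units_sold[5:]

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(train_x, train_y, deg=1)

# Predict the held-out observations and measure the average error.
predicted = slope * test_x + intercept
mean_abs_error = np.mean(np.abs(predicted - test_y))

print("slope:", slope, "intercept:", intercept)
print("predicted:", predicted, "actual:", test_y)
print("mean absolute error:", mean_abs_error)
```

Holding out the most recent observations, instead of evaluating on the data used for fitting, gives a more honest picture of how the model might behave on future events.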
4. Inferential Analysis
Inferential analysis draws conclusions about an entire population from a sample of it, and it is used across many sectors. Examples include deriving the consumer price index or per capita income. It is not feasible to reach every consumer one by one and calculate these figures directly. Instead, we take a statistically sound sample from the population and use statistical analysis to derive the index.
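The sampling idea can be sketched with Python's standard library: estimate a population mean from a random sample and attach an approximate 95% confidence interval. The simulated income population, the sample size of 500, and the use of the normal approximation (z = 1.96) are all assumptions made for this illustration.

```python
import math
import random
import statistics

random.seed(42)

# Hypothetical population: 100,000 simulated incomes. In practice the full
# population is not available, which is why we sample in the first place.
population = [random.lognormvariate(10, 0.5) for _ in range(100_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 500)

# Point estimate of the population mean.
sample_mean = statistics.mean(sample)

# Standard error of the mean, using the sample standard deviation.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval (normal approximation).
z = 1.96
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print("sample mean:", round(sample_mean, 2))
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))

# Only possible here because the population is simulated.
print("population mean:", round(statistics.mean(population), 2))
```

The interval expresses the uncertainty that comes from looking at 500 values instead of all 100,000; a larger sample would shrink the standard error and narrow the interval.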
In this article, we have discussed the various methodologies of data analysis. Do we need to use all of these methods, or can we use just one of them? That depends on the use case and the domain of the application. In most cases, though, we start with descriptive and exploratory data analysis and then develop predictive models to predict future outcomes.