EDUCBA

What is Data Analytics

By Priya Pedamkar

Introduction to Data Analytics

Data analytics is the science of analyzing raw data to draw conclusions from it. It refers to the techniques used to analyze data in order to improve the productivity and profit of a business. Data is extracted from different sources and cleaned so that patterns in it can be analyzed. Many data analytics techniques and processes have been automated into mechanical processes and algorithms that turn raw data into a form suitable for human consumption.

Types of Data Analytics

Data analytics is commonly categorized into three types, based on the purpose of the analysis:

  • Descriptive Analytics
  • Predictive Analytics
  • Prescriptive Analytics

The features of the above-listed types of Analytics are described below:

1. Descriptive Analytics

Descriptive Analytics focuses on summarizing past data to derive inferences. The most commonly used measures for characterizing the distribution of historical data quantitatively include:

  • Measures of Central Tendency – Mean, Median, Mode.
  • Measures of variability or spread – Range, Inter-Quartile Range, Quartiles, Percentiles.
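
As a minimal sketch, the measures listed above can be computed with Python's standard `statistics` module. The `daily_sales` values are made-up illustration data, and the choice of the "inclusive" quantile method is an assumption; other quantile conventions give slightly different quartiles.

```python
import statistics

# Hypothetical historical data: units sold per day (illustration only).
daily_sales = [12, 15, 15, 18, 20, 22, 22, 22, 30, 45]

# Measures of central tendency
mean = statistics.mean(daily_sales)      # arithmetic average
median = statistics.median(daily_sales)  # middle value of the sorted data
mode = statistics.mode(daily_sales)      # most frequent value

# Measures of variability or spread
value_range = max(daily_sales) - min(daily_sales)
q1, q2, q3 = statistics.quantiles(daily_sales, n=4, method="inclusive")
iqr = q3 - q1                            # inter-quartile range
deciles = statistics.quantiles(daily_sales, n=10, method="inclusive")
p90 = deciles[8]                         # 90th percentile (9th decile cut point)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "IQR:", iqr, "90th percentile:", p90)
```

The mean is pulled upward by the single large value (45), while the median is not, which is why both are usually reported together.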

In recent times, the difficulties and limitations involved in collecting, storing and comprehending massive volumes of data have been addressed with statistical inference. Generalized inferences about the statistics of a population dataset are drawn by applying sampling methods together with the central limit theorem.

For example, a leading news broadcaster gathers the votes cast by randomly chosen voters at the exit of a polling station on election day to derive statistical inferences about the preferences of the entire population.

Repeated sampling of the population dataset yields a collection of samples, each with a sufficiently large sample size. Stratified random sampling is generally preferred for generating unbiased samples that represent every subgroup of the population. The statistic of interest is calculated on each sample to obtain a distribution of sample statistic values, called a sampling distribution. The characteristics of the sampling distribution are related to those of the population dataset through the central limit theorem.
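
The repeated-sampling procedure can be simulated in a few lines. This is only a sketch under stated assumptions: the population is synthetic (exponentially distributed, so clearly skewed), simple random sampling is used instead of a stratified design to keep the code short, and the function name `sampling_distribution` is hypothetical.

```python
import random
import statistics

def sampling_distribution(population, sample_size, num_samples, seed=0):
    """Return the means of repeated simple random samples from the population."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Synthetic, skewed population (an assumption for illustration).
population_rng = random.Random(42)
population = [population_rng.expovariate(1.0) for _ in range(50_000)]

sample_size = 50
sample_means = sampling_distribution(population, sample_size, num_samples=2_000)

pop_mean = statistics.mean(population)
pop_std = statistics.pstdev(population)

# The central limit theorem suggests the sample means cluster around the
# population mean with a spread of roughly sigma / sqrt(sample_size).
print("population mean:        ", round(pop_mean, 3))
print("mean of sample means:   ", round(statistics.mean(sample_means), 3))
print("std dev of sample means:", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):        ", round(pop_std / sample_size ** 0.5, 3))
```

Even though the population itself is far from normal, the last two printed values should be close to each other, which is the link between the sampling distribution and the population that the paragraph above describes.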

2. Predictive Analytics

Predictive Analytics exploits patterns in historical data to estimate future outcomes, identify trends, uncover potential risks and opportunities, or forecast process behavior. Because predictions are inherently uncertain, these approaches employ probabilistic models to measure the likelihood of each possible outcome.

For example, the chatbot in the customer service portal of a financial firm proactively learns a customer's intent or need from his or her past activity on the firm's website. With the predicted context, the chatbot converses interactively with the customer to deliver appropriate services quickly and achieve better customer satisfaction.

In addition to extrapolation scenarios, which predict what happens in the future based on available past data, a few applications estimate missing data entries with the help of the available data samples. This approximation of missing values within the range of the given data samples is technically referred to as interpolation.

For example, a powerful image editor can reconstruct parts of a texture that are hidden by superimposed text by interpolating a feature function over the missing block. The feature function can be interpreted as a mathematical description of the patterns in the texture of the distorted image.
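
The simplest form of interpolation is linear interpolation between two neighbouring samples. The sketch below is a one-dimensional illustration, not the texture reconstruction used by an image editor; the function name `linear_interpolate` and the sensor readings are hypothetical.

```python
def linear_interpolate(known_points, x):
    """Estimate y at x from (x, y) samples by linear interpolation.

    Raises ValueError when x lies outside the sampled range, because
    estimating there would be extrapolation, not interpolation.
    """
    points = sorted(known_points)
    if not points[0][0] <= x <= points[-1][0]:
        raise ValueError("x is outside the range of the known samples")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            if x1 == x0:          # repeated x value: nothing to interpolate
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]          # only reached when there is a single sample

# Hypothetical hourly readings with the value for hour 3 missing.
readings = [(1, 10.0), (2, 12.0), (4, 16.0), (5, 15.0)]
print(linear_interpolate(readings, 3))  # -> 14.0
```

Asking for a value at hour 6 raises an error rather than returning a guess, which keeps the distinction between interpolation and extrapolation explicit.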

The significant factors that influence the choice of predictive models/strategies are:

  • Prediction Accuracy: This conveys the degree of closeness between the predicted value and the actual value. A smaller difference between predicted and actual values, both on average and in variance, implies a more accurate predictive model.
  • Speed of Predictions: This is given high priority in real-time tracking applications.
  • Model Learning Rate: This is how quickly the model can be trained. It depends on the model's complexity and the computations involved in calculating the model parameters.
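
Prediction accuracy is usually quantified with an error metric. As a sketch, two common choices are the mean absolute error (MAE) and the root mean squared error (RMSE); the article does not name a specific metric, so these are assumptions, and the `actual` and `predicted` values are made up.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical actual values and model predictions.
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 230]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print("MAE:", mae)              # -> 12.5
print("RMSE:", round(rmse, 3))  # -> 13.229
```

Lower values of either metric indicate predictions that are closer to the actual values. RMSE is larger than MAE here because the single 20-unit miss is weighted more heavily.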

3. Prescriptive Analytics

Prescriptive Analytics uses the knowledge discovered during both descriptive and predictive analysis to recommend a context-aware course of action. Advanced statistical techniques and computationally intensive optimization methods are applied to understand the distribution of the estimated predictions.

In precise terms, the impact and benefit of each outcome estimated during predictive analytics are evaluated to make heuristic and time-sensitive decisions for a given set of conditions.

For example, a stock market consultancy firm performs a SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis on the predicted prices of the stocks in its investors' portfolios and recommends the best buy or sell options to its clients.
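
One simple way to turn predicted outcomes into a recommendation is an expected-value rule: weigh the benefit of each outcome by its predicted probability and pick the action with the highest total. The sketch below is not a SWOT analysis and is not how any particular firm works; the probabilities, payoffs and function names are all hypothetical.

```python
def expected_payoff(outcome_probabilities, payoffs):
    """Probability-weighted benefit of one action across all outcomes."""
    return sum(outcome_probabilities[o] * payoffs[o] for o in outcome_probabilities)

def recommend_action(outcome_probabilities, payoffs_by_action):
    """Return the action whose expected payoff is highest."""
    return max(
        payoffs_by_action,
        key=lambda action: expected_payoff(outcome_probabilities, payoffs_by_action[action]),
    )

# Hypothetical output of a predictive model for one stock.
outcome_probabilities = {"rise": 0.6, "flat": 0.3, "fall": 0.1}

# Hypothetical benefit (in currency units) of each action under each outcome.
payoffs_by_action = {
    "buy":  {"rise": 50,  "flat": 0, "fall": -80},
    "hold": {"rise": 20,  "flat": 0, "fall": -30},
    "sell": {"rise": -20, "flat": 5, "fall": 40},
}

best = recommend_action(outcome_probabilities, payoffs_by_action)
print("recommended action:", best)  # -> buy
```

Real prescriptive systems typically add constraints (budget, risk limits, time windows) and solve an optimization problem instead of comparing a handful of actions.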

Process Flow in Data Analytics

The process of data analytics has several stages of data processing, as explained below:

1. Data Extraction

Data ingested from multiple sources of various types, including web pages, databases and legacy applications, results in input datasets of different formats. The data formats fed into the data analytics flow can be broadly classified as follows:

  • Structured data has a clear definition of data types along with associated field lengths or field delimiters. This type of data can be queried easily, like the content stored in a relational database (RDBMS).
  • Semi-structured data lacks a precise layout definition, but its data elements can be identified, separated and grouped based on a standard schema or other metadata rules. An XML file uses tags to hold data, whereas a JavaScript Object Notation (JSON) file holds data in name-value pairs. NoSQL (Not only SQL) databases such as MongoDB and Couchbase are also used to store semi-structured data.
  • Unstructured data includes social media conversations, images, audio clips and so on. Traditional data parsing methods fail to understand this data. Unstructured data is often stored in data lakes.

Parsing of structured and semi-structured data is built into various ETL tools such as Ab Initio, Informatica and DataStage, and open-source alternatives such as Talend.
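
Outside of a dedicated ETL tool, the same idea can be sketched with Python's standard library: parse each source format and map it onto one common record layout. The sample CSV, JSON and XML snippets, the field names and the helper functions below are all hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical inputs from three different sources.
csv_text = "id,name,amount\n1,Asha,120.50\n2,Ravi,80.00\n"
json_text = '[{"id": 3, "name": "Meera", "amount": 42.25}]'
xml_text = "<orders><order id='4'><name>John</name><amount>15.00</amount></order></orders>"

def parse_csv(text):
    """Structured, delimited data: every row has the same named fields."""
    return [
        {"id": int(row["id"]), "name": row["name"], "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]

def parse_json(text):
    """Semi-structured data held as name-value pairs."""
    return [
        {"id": int(item["id"]), "name": item["name"], "amount": float(item["amount"])}
        for item in json.loads(text)
    ]

def parse_xml(text):
    """Semi-structured data held in tags and attributes."""
    root = ET.fromstring(text)
    return [
        {
            "id": int(order.get("id")),
            "name": order.findtext("name"),
            "amount": float(order.findtext("amount")),
        }
        for order in root.findall("order")
    ]

# All three sources end up in one uniform list of records.
records = parse_csv(csv_text) + parse_json(json_text) + parse_xml(xml_text)
for record in records:
    print(record)
```

Unstructured data such as images or audio has no comparable parser and needs dedicated processing before it can be mapped onto records like these.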

2. Data Cleaning and Transformation

Parsed data is cleaned to ensure data consistency and the availability of relevant data for the later stages of the process flow. The major cleansing operations in data analytics are:

  • Detecting and eliminating outliers in the data.
  • Removing duplicates from the dataset.
  • Handling missing entries in data records, based on an understanding of the functionality or use case.
  • Validating that field values in data records are permissible. For example, “31-February” cannot be a valid value in any date field.
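
The four cleansing operations above can be sketched with the standard library. The records are made up, and the specific policies are assumptions chosen for illustration: missing amounts are filled with the median, and outliers are values outside 1.5 times the inter-quartile range. The right policy always depends on the use case.

```python
import statistics
from datetime import datetime

# Hypothetical parsed records with typical quality problems.
raw_records = [
    {"id": 1, "date": "2020-01-15", "amount": 100.0},
    {"id": 1, "date": "2020-01-15", "amount": 100.0},   # exact duplicate
    {"id": 2, "date": "2020-02-31", "amount": 110.0},   # impossible date
    {"id": 3, "date": "2020-03-10", "amount": None},    # missing amount
    {"id": 4, "date": "2020-03-11", "amount": 95.0},
    {"id": 5, "date": "2020-03-12", "amount": 105.0},
    {"id": 6, "date": "2020-03-13", "amount": 5000.0},  # outlier
    {"id": 7, "date": "2020-03-14", "amount": 98.0},
    {"id": 8, "date": "2020-03-15", "amount": 102.0},
    {"id": 9, "date": "2020-03-16", "amount": 101.0},
]

def remove_duplicates(records):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def is_valid_date(text, fmt="%Y-%m-%d"):
    """True if the text is a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def fill_missing_amounts(records):
    """Replace missing amounts with the median of the known amounts."""
    median = statistics.median(r["amount"] for r in records if r["amount"] is not None)
    return [{**r, "amount": median if r["amount"] is None else r["amount"]} for r in records]

def remove_outliers(records, k=1.5):
    """Drop records whose amount lies outside k * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles([r["amount"] for r in records], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if low <= r["amount"] <= high]

cleaned = remove_duplicates(raw_records)
cleaned = [r for r in cleaned if is_valid_date(r["date"])]
cleaned = fill_missing_amounts(cleaned)
cleaned = remove_outliers(cleaned)
print([r["id"] for r in cleaned])  # -> [1, 3, 4, 5, 7, 8, 9]
```

The median is used for filling because, unlike the mean, it is barely affected by the 5000.0 outlier that is still present at that step.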

Cleansed data is then transformed into a format suitable for analysis. Data transformations include:

  • Filtering out unwanted data records.
  • Joining the data fetched from different sources.
  • Aggregating or grouping data.
  • Data typecasting.
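
A minimal sketch of the four transformations, applied in sequence to made-up order and customer data (all field names and values are hypothetical):

```python
from collections import defaultdict

# Hypothetical cleansed inputs from two sources; values arrive as strings.
orders = [
    {"order_id": "1", "customer_id": "C1", "amount": "120.50", "status": "paid"},
    {"order_id": "2", "customer_id": "C2", "amount": "80.00", "status": "cancelled"},
    {"order_id": "3", "customer_id": "C1", "amount": "29.50", "status": "paid"},
    {"order_id": "4", "customer_id": "C3", "amount": "60.00", "status": "paid"},
]
customer_regions = {"C1": "North", "C2": "South", "C3": "South"}

# 1. Filter: keep only the records relevant to the analysis.
paid_orders = [o for o in orders if o["status"] == "paid"]

# 2. Typecast: convert string fields to numeric types.
typed_orders = [
    {**o, "order_id": int(o["order_id"]), "amount": float(o["amount"])}
    for o in paid_orders
]

# 3. Join: enrich each order with data from the second source.
joined_orders = [
    {**o, "region": customer_regions[o["customer_id"]]} for o in typed_orders
]

# 4. Aggregate: total paid amount per region.
total_by_region = defaultdict(float)
for order in joined_orders:
    total_by_region[order["region"]] += order["amount"]

print(dict(total_by_region))  # -> {'North': 150.0, 'South': 60.0}
```

The cancelled order is removed in the first step, so it never contributes to the regional totals.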

3. KPI/Insight derivation

Data mining and deep learning methods are used to evaluate Key Performance Indicators (KPIs) or derive valuable insights from the cleaned and transformed data. Depending on the objective of the analytics, data analysis is performed using various pattern recognition techniques such as k-means clustering, SVM classification and Bayesian classifiers, and machine learning models such as Markov models and Gaussian Mixture Models (GMM).

In the training phase, probabilistic models learn optimal model parameters. In the validation phase, the model is tested using k-fold cross-validation to detect over-fitting and under-fitting.
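
The mechanics of k-fold cross-validation can be sketched without any external library. In the example below, a least-squares straight line stands in for the model (an assumption made to keep the code short; it is not one of the probabilistic models named above), the data is synthetic, and all function names are hypothetical.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Training step: least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, xs, ys):
    """Validation step: average squared prediction error of the fitted line."""
    slope, intercept = model
    return statistics.mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))

def cross_validate(xs, ys, k=5):
    """Train on k-1 folds, validate on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held_out_set = set(held_out)
        train = [i for i in range(len(xs)) if i not in held_out_set]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            mean_squared_error(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores

# Synthetic data: a noisy straight line (illustration only).
noise = random.Random(1)
xs = [float(x) for x in range(40)]
ys = [2.0 * x + 1.0 + noise.gauss(0, 1.0) for x in xs]

scores = cross_validate(xs, ys, k=5)
print("validation MSE per fold:", [round(s, 3) for s in scores])
print("mean validation MSE:", round(statistics.mean(scores), 3))
```

A validation error much higher than the training error points to over-fitting, while high error on both points to under-fitting. Every record is used for validation exactly once, so the estimate does not depend on one lucky or unlucky split.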

The most commonly used programming languages for data analysis are R and Python. Both have a rich set of open-source libraries for complex data analysis, for example SciPy, NumPy and Pandas in Python.

4. Data Visualization

Data visualization is the clear and effective presentation of the uncovered patterns and derived conclusions using graphs, plots, dashboards and graphics.

  • Data reporting tools such as QlikView and Tableau display KPIs and other derived metrics at various levels of granularity.
  • Reporting tools enable end users to create customized reports with pivot and drill-down options using user-friendly drag-and-drop interfaces.
  • Interactive data visualization libraries such as D3.js (Data-Driven Documents) and the HTML5-based AnyChart are used to make the analyzed data easier to explore.

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
