EDUCBA


Linear Regression Analysis


Introduction to Linear Regression Analysis

Linear regression analysis is among the most widely used statistical analysis techniques. It studies additive, linear relationships between a dependent variable and one or more independent variables. An analysis with a single independent variable is called simple linear regression, while one with multiple independent variables is called multiple linear regression. In essence, linear regression analysis tries to describe the relationship between the independent and dependent variables, and it has several advantages: it is simple yet powerful, and it supports better business decisions.

The 3 Types of Regression Analysis

There are more than 15 types of regression analysis, but these three have the most use cases in the real world. The types of regression analysis we are going to discuss are:


  1. Simple Linear Regression Analysis
  2. Multiple Linear Regression Analysis
  3. Logistic Regression

In this article, we focus on simple linear regression analysis. This analysis helps us identify the relationship between the independent factor and the dependent factor. In simpler words, the regression model helps us find how changes in the independent factor affect the dependent factor. This model helps us in multiple ways:

  • It is a simple and powerful statistical model
  • It helps us make predictions and forecasts
  • It helps us make better business decisions
  • It helps us analyze results and correct errors

The Equation of Linear Regression and Its Parts

Y = β1 + β2X + ϵ
  • β1 is the intercept and β2 is the slope; together they are known as the regression coefficients. ϵ is the error term, the part of Y that the regression model is unable to explain.
  • Y is the dependent variable (other terms used interchangeably for the dependent variable are response variable, regressand, measured variable, observed variable, responding variable, explained variable, outcome variable, experimental variable, and/or output variable).
  • X is the independent variable (also called a regressor, controlled variable, manipulated variable, explanatory variable, exposure variable, and/or input variable).
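To see how the intercept β1 and the slope β2 are obtained, here is a minimal R sketch (R being the language used throughout this article) that applies the ordinary least squares formulas for one independent variable, β2 = cov(X, Y) / var(X) and β1 = mean(Y) − β2 × mean(X), to the built-in cars dataset introduced below. The variable names are illustrative; lm(), used later on, fits the same least squares line, so it should report the same coefficients.

```r
# Ordinary least squares estimates for Y = β1 + β2X + ϵ, computed by hand
x <- cars$speed   # independent variable (X)
y <- cars$dist    # dependent variable (Y)

# Slope: covariance of X and Y divided by the variance of X
beta2 <- cov(x, y) / var(x)

# Intercept: the fitted line passes through the point (mean(X), mean(Y))
beta1 <- mean(y) - beta2 * mean(x)

# Residuals: the part of Y the fitted line does not explain (estimates of ϵ)
residuals_hat <- y - (beta1 + beta2 * x)

cat("beta1 (intercept):", round(beta1, 3), "\n")
cat("beta2 (slope):    ", round(beta2, 3), "\n")
```

The residuals computed this way sum to zero (up to floating-point error), which is a property of any least squares fit that includes an intercept.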

[Figure: linear regression graph]

Problem: To understand linear regression analysis, we use the cars dataset, which comes built into R (in the datasets package). This dataset has 50 observations (rows) and 2 variables (columns), named speed and dist: the speed of a car (mph) and its stopping distance (ft). Here we want to see how the distance variable changes as the speed variable changes. To see the structure of the data, we can run str(cars). It shows the type of each column along with its first few values, which gives us a clearer picture of the dataset and helps us make better decisions about the analysis.
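A minimal sketch of this step in R; the extra calls to dim() and names() are an illustrative addition that print the row/column counts and the column names directly.

```r
# cars is part of R's built-in 'datasets' package, so no install is needed
data(cars)

# Compact view of the structure: class, dimensions, column types, first values
str(cars)

# Dimensions and column names on their own
dim(cars)     # number of rows and columns
names(cars)   # column names
```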


Similarly, to check the summary statistics of the dataset, we can use summary(cars). For each variable, it reports the minimum, first quartile, median, mean, third quartile, and maximum in one go, which the researcher can use while working on the problem.
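The corresponding R call is sketched below; calling summary() on a single column, as in the second part, is an illustrative extra that returns the same six statistics for just that variable.

```r
# Summary statistics for every column: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(cars)

# The same six statistics for a single column, stored in a variable
speed_stats <- summary(cars$speed)
print(speed_stats)
```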

[Figure: summary statistics output]

Here we can see the summary statistics for every variable in our dataset.


The Graphical Representation of Datasets

The types of graphical representation we cover here, and why we use them:

  • Scatter Plot: Shows the direction of the relationship between the variables and whether there is strong visual evidence for a linear model.
  • Box Plot: Helps us find outliers.
  • Density Plot: Helps us understand the distribution of the independent variable, which in our case is speed.

Advantages of Graphical Representation

The main advantages are:

  • Easy to understand
  • Helps us make quick decisions
  • Supports comparative analysis
  • Requires less effort and time

1. Scatter Plot: It helps us visualize any relationship between the independent variable and the dependent variable.
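A minimal base-R sketch of the scatter plot follows. The title, axis labels, the red least squares line, and writing to a temporary PDF file (so the script also runs without a display) are illustrative choices, not necessarily the code behind the original figure.

```r
# Write the plot to a temporary PDF so the script also runs without a screen;
# in an interactive R session you can drop pdf()/dev.off() to plot on screen
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Scatter plot: independent variable (speed) on x, dependent variable (dist) on y
plot(dist ~ speed, data = cars,
     main = "Stopping distance vs. speed",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Overlay the least squares line to judge whether a straight line fits
abline(lm(dist ~ speed, data = cars), col = "red")

dev.off()
```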


[Figure: scatter plot of dist against speed]

The graph shows a linearly increasing relationship between the dependent variable (dist) and the independent variable (speed).

2. Box Plot: A box plot helps us identify outliers in the dataset. The advantages of using a box plot are:

  • It gives a graphical display of a variable's location and spread.
  • It helps us understand the data’s skewness and symmetry.
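The sketch below draws box plots for both variables and uses boxplot.stats() to list the points a box plot marks as outliers, i.e. points more than 1.5 times the box length beyond the box. As before, drawing to a temporary PDF file is only there so the script runs without a display.

```r
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Side-by-side box plots of the two variables
par(mfrow = c(1, 2))
boxplot(cars$speed, main = "Speed")
boxplot(cars$dist, main = "Distance")
dev.off()

# Values drawn as individual outlier points (beyond 1.5 box lengths from the box)
speed_outliers <- boxplot.stats(cars$speed)$out
dist_outliers  <- boxplot.stats(cars$dist)$out
print(speed_outliers)
print(dist_outliers)
```

With the cars data, this flags the longest stopping distance (120 ft) as an outlier, while speed has none.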


[Figure: box plot]

3. Density Plot (to check the normality of the distribution)
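A density plot of speed can be sketched with density(), which computes a kernel density estimate; the shading with polygon() and the temporary PDF output are illustrative choices.

```r
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Kernel density estimate of the independent variable (speed)
speed_density <- density(cars$speed)
plot(speed_density, main = "Density of speed", xlab = "Speed (mph)")

# Shade the area under the curve to make the shape easier to read
polygon(speed_density, col = "lightblue")

dev.off()
```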

[Figure: density plot]

Correlation Analysis

Correlation analysis measures the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 to +1, and correlations are commonly described with six categories:

  1. Positive Correlation (between 0 and +1)
  2. Negative Correlation (between -1 and 0)
  3. No Correlation (a value of 0)
  4. Perfect Correlation (exactly +1 or -1)
  5. Strong Correlation (a value closer to ±1)
  6. Weak Correlation (a value closer to 0)

A scatter plot helps us see which type of correlation the variables have, and in R the cor() function computes the correlation coefficient itself.
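A minimal sketch: cor() computes the Pearson correlation coefficient by default, and cor.test() (an addition here, not mentioned in the original text) also reports a p-value and a confidence interval for it.

```r
# Pearson correlation coefficient between the independent and dependent variable
r <- cor(cars$speed, cars$dist)
print(r)

# Significance test and confidence interval for the correlation
cor.test(cars$speed, cars$dist)
```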


[Figure: correlation output]

Here we have a strong positive correlation between speed and dist (about 0.81), which means they have a direct relationship: higher speeds go with longer stopping distances.

Linear Regression Model

This is the core of the analysis; so far we have only been testing whether the dataset is suitable for this kind of analysis. The function we use is lm(), which takes two main arguments: formula and data. Before deciding which variable is dependent and which is independent, we have to be very sure about that choice, because the whole formula depends on it.

The call looks like this:

linear_regression <- lm(dependent_variable ~ independent_variable, data = data_frame)
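Applied to the cars dataset, with dist as the dependent variable and speed as the independent variable, the call becomes the following sketch.

```r
# Fit dist (dependent) as a linear function of speed (independent)
linear_regression <- lm(dist ~ speed, data = cars)

# Printing the model shows the call and the estimated coefficients
print(linear_regression)

# coef() returns the intercept (β1) and slope (β2) as a named vector
coef(linear_regression)
```

The estimated intercept and slope are about −17.579 and 3.932, the values used in the equation that follows.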


[Figure: linear regression model output]

Recall from earlier in the article that the equation of linear regression is:

Y = β1 + β2X + ϵ

Substituting the coefficients estimated by lm() into this equation gives:

dist = −17.579 + 3.932 × speed

Finding the equation of linear regression alone is not sufficient; we also have to check its statistical significance. For this, we call summary() on our linear regression model.
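A sketch of this step: summary() returns an object from which the coefficient table (including each p-value in the Pr(>|t|) column) and R-squared can be extracted. Pulling these values out by name is an illustrative extra; printing the summary alone shows the same information.

```r
linear_regression <- lm(dist ~ speed, data = cars)
model_summary <- summary(linear_regression)
print(model_summary)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|) for each term
coef_table <- coef(model_summary)
slope_p_value <- coef_table["speed", "Pr(>|t|)"]

# R-squared: share of the variance in dist explained by speed
r_squared <- model_summary$r.squared
```

For a simple linear regression like this one, R-squared equals the square of the correlation coefficient between speed and dist.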


[Figure: summary of the linear regression model]

There are multiple ways to check the statistical significance of a model; here we use the p-value method. We consider a model statistically significant when the p-value is less than a pre-determined significance level, conventionally 0.05. In the output of summary(linear_regression), the p-values for the speed coefficient and for the overall model are below 0.05, so we can conclude that our model is statistically significant. Once we are confident in our model, we can use it to make predictions.
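Predictions can be sketched with predict(). The new speed values below are hypothetical inputs chosen inside the observed range of speed (4 to 25 mph), and the 95% prediction interval is an optional extra that shows the uncertainty around each predicted distance.

```r
linear_regression <- lm(dist ~ speed, data = cars)

# Hypothetical new speeds for which we want predicted stopping distances
new_speeds <- data.frame(speed = c(10, 15, 20))

# Point predictions (fit) plus lower/upper bounds of a 95% prediction interval
predictions <- predict(linear_regression, newdata = new_speeds,
                       interval = "prediction", level = 0.95)
print(predictions)
```

Each fit value is simply −17.579 + 3.932 × speed evaluated at the new speed; the interval is wider than the fitted line suggests because it accounts for the scatter of individual cars around the line.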

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
