Simple Linear Regression

Introduction to Simple Linear Regression

From the dictionary: regression is a return to a former or less developed state.

In statistics: regression is a measure of the relation between the mean value of one variable and the corresponding values of other variables.

Linear regression is the type of regression in which the relationship between the input variable (independent variable) and the target variable (dependent variable) is assumed to be linear. Simple linear regression is a linear regression with only one independent variable used to predict the dependent variable. It is a machine learning algorithm belonging to the family of supervised learning, and regression in general is used for predicting continuous values.

Model of Simple Linear Regression

Let’s keep it simple. How did it all start?

It all started in the late nineteenth century with Francis Galton. He studied the relationship between the heights of fathers and their sons and observed a pattern: a son’s height tends either to be close to his father’s height or to drift toward the overall average height of the population. This phenomenon is what we call regression.

For example, Shaq O’Neal is a very famous NBA player and is 2.16 meters tall. His sons Shaqir and Shareef O’Neal are 1.96 meters and 2.06 meters tall respectively, while the average population height is 1.76 meters. The sons’ heights regress (drift) toward the mean height.

How Do We Do Regression?

Calculating a regression with only two data points:


To find the best regression line, all we want to do is draw a line that is as close to every data point as possible. With only two data points this is easy: just join them.

Now, if we have many data points, how do we draw the line that is as close as possible to each and every one of them?


In this case, our goal is to minimize the vertical distances between the line and all the data points. This gives us the best line for our linear regression model.

What Does Simple Linear Regression Do?

Below is a detailed explanation of what simple linear regression does:

  • It draws lots and lots of possible lines and then scores each one using one of the following criteria:
  • Sum of squared errors.
  • Sum of absolute errors.
  • Least squares method, etc.
  • For our analysis, we will use the least squares method.
  • For each candidate line, we take the difference between every data point and the line, square each difference, and add the squares together; whichever line gives the minimum sum is our best line (a small sketch follows this list).

For example: using this approach, we could take the heights of many men and their sons and tell a man how tall his son might be, even before the son is born.

[Figure: data points with a fitted simple linear regression line]

The figure above shows a simple linear regression. The line represents the regression line, given by: y = a + b * x

Here y is the dependent variable (DV). For example, consider how the salary of a person changes depending on the number of years of experience the employee has; the salary of the employee is the dependent variable.

The dependent variable is our target variable, the one we want to predict using linear regression.

x is our independent variable (IV): the independent variable causes the change in the dependent variable. In the above example, the number of years of experience is our independent variable, because it causes the change in the salary of the employee.


  • b is the coefficient of our independent variable x. This coefficient plays a crucial role: it tells us how a unit change in x (IV) affects y (DV). It is also referred to as the coefficient of proportionality. Mathematically, it is the slope (steepness) of the line (a short numeric sketch follows this list).
  • In our example, if the slope (b) is small, each additional year of experience yields only a small increment in salary; if the slope (b) is large, the same additional year yields a large increase in salary.
  • a is a constant value, also referred to as the intercept; it is where the line intersects the y-axis (DV axis). Put differently, when an employee has zero years of experience (x = 0), the salary (y) of that employee equals a.

How Does the Least Squares Method Work?

Below are the steps of the least squares method:

  • It draws an arbitrary line according to the data trends.
  • It takes the data points and draws vertical lines from them to the candidate line; the vertical distance is the quantity of interest.
  • These vertical lines cut the candidate line and give a corresponding point for each data point.
  • It then finds the vertical difference between each data point and its corresponding point on the line.
  • It calculates the error, which is the square of this difference.
  • It then calculates the sum of these squared errors.
  • It then draws another line and repeats the procedure above.
  • It draws a number of lines in this fashion, and the line which gives the least sum of squared errors is chosen as the best line (a short sketch follows this list).
  • This best line is our simple linear regression line.

Applications of Simple Linear Regression

Regression analysis is performed to predict a continuous variable. It has a wide variety of applications. Some examples are as follows:

  • Predictive analytics
  • Effectiveness of marketing
  • Pricing of a listing
  • Promotion prediction for a product

Here we are going to discuss one application of linear regression: predictive analytics. We will do the modelling in Python.

The steps we are going to follow to build our model are as follows:

  • We will import the libraries and the dataset.
  • We will pre-process the data.
  • We will divide the data into the training set and the test set.
  • We will create a model that tries to predict the target variable based on our training set.
  • We will predict the target variable for the test set.
  • We will analyze the results predicted by the model.

For our analysis, we are going to use a salary dataset with data for 30 employees.

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (Sample of data is shown in table)

dataset = pd.read_csv('Salary_Data.csv')

Years of Experience    Salary
1.5                    37731
1.1                    39343
2.2                    39891
2.0                    43525
1.3                    46205
3.2                    54445
4.0                    55749

# Pre-processing the dataset: we split it into the independent variable (X) and the dependent/target variable (y)

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

Here test_size = 1/3 means that 2/3 of the data is used for training the model and the remaining 1/3 is used for testing it.

# Let's fit our Simple Linear Regression model to the Training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

The linear regression model is now trained. It will be used to predict the dependent variable.

# Predicting the Test set results

y_pred = regressor.predict(X_test)

# Visualising the Test set results

plt.scatter(X_test, y_test, color = 'blue')
plt.plot(X_train, regressor.predict(X_train), color = 'red')
plt.title('Salary of Employee vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()


# Parameters of the model

print(regressor.intercept_)
print(regressor.coef_)
26816.19224403119
[9345.94244312]

So the intercept (a) is about 26816. This suggests that a fresher (zero years of experience) would get a salary of around 26816.

The coefficient (b) for our model came out as 9345.94. It suggests that, keeping all other parameters constant, a change of one unit in the independent variable (one year of experience) yields a change of about 9345.94 units in salary.

Regression Evaluation Metrics

There are basically three important evaluation metrics available for regression analysis:

  • Mean Absolute Error (MAE): the mean of the absolute errors, where each error is the difference between the predicted and the actual value.
  • Mean Squared Error (MSE): the mean of the squared errors.
  • Root Mean Squared Error (RMSE): the square root of the mean of the squared errors.

We can compare these methods:

  • MAE: shows the average error and is the easiest of the three to interpret.
  • MSE: more popular than MAE because it amplifies larger errors, which gives more insight into big mistakes.
  • RMSE: better than MSE because the error can be interpreted in the units of y.

All three are essentially loss functions; a manual computation equivalent to the sklearn calls below is sketched next.

# Evaluation of model

from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
MAE: 3426.4269374307123
MSE: 21026037.329511296
RMSE: 4585.4157204675885

Conclusion

Linear regression is a powerful machine learning algorithm used for predicting continuous variables such as salary, sales, and performance. It assumes a linear relationship between the independent and dependent variables. Simple linear regression has only one independent variable, based on which the model predicts the target variable. We have discussed the model and an application of linear regression, with a predictive-analysis example that predicts the salaries of employees.

Recommended Articles

This is a guide to Simple Linear Regression. Here we discuss the model and application of linear regression, using a predictive analysis example for predicting employees' salaries. You can also go through our other related articles to learn more:

  1. Matplotlib In Python | Top 14 Plots in Matplotlib
  2. Dictionary in Python | Methods and Examples
  3. Linear Regression vs Logistic Regression | Top Differences
  4. What is Linear Regression?
