Simple Linear Regression

By Priya Pedamkar

Introduction to Simple Linear Regression

The following article provides an outline for Simple Linear Regression.


From the dictionary: regression is a return to a former or less developed state.


In statistics: regression is a measure of the relation between the mean value of one variable and the corresponding values of other variables.

Regression in which the relationship between the input variable (independent variable) and the target variable (dependent variable) is assumed to be linear is called linear regression. Simple linear regression is a type of linear regression in which we have only one independent variable to predict the dependent variable. Simple linear regression is a machine learning algorithm that belongs to the family of supervised learning, and regression is used for predicting continuous values.

Model of Simple Linear Regression

Let’s make it simple. How did it all start?

It all started in the late 1800s with Francis Galton. He studied the relationship between the heights of fathers and their sons. He observed a pattern: either a son’s height would be about as tall as his father’s, or the son’s height would drift closer to the overall average height of the population. This phenomenon is nothing but regression.

Example:

Shaq O’Neal is a very famous NBA player and is 2.16 meters tall. His sons Shaqir and Shareef O’Neal are 1.96 meters and 2.06 meters tall, respectively. The average population height is 1.76 meters. The sons’ heights regress (drift) toward the mean height.

How Do We Do Regression?

Let’s start by calculating a regression with only two data points.


We want the regression to draw a line that is as close to every point as possible. In the case of two data points, drawing the line is easy: just join them.

Now, if we have many data points, how do we draw a line that is as close as possible to each data point?


In this case, our goal is to minimize the vertical distance between the line and all the data points. This is how we find the best line for our linear regression model.
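
To make “vertical distance” concrete, here is a minimal sketch (using hypothetical data points and an arbitrary candidate line, not the article’s dataset) that computes the quantity we want to minimize:

Code:

import numpy as np

# Hypothetical data points
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# An arbitrary candidate line y = a + b * x
a, b = 0.5, 1.9

# Vertical distance of each point from the line (the residuals)
residuals = y - (a + b * x)

# Sum of squared vertical distances: the quantity to minimize
print(np.sum(residuals ** 2))

A smaller sum means the candidate line sits closer to the data points overall.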

What Does Simple Linear Regression Do?

Below is a detailed explanation of Simple Linear Regression:

  • It draws lots and lots of possible lines and then evaluates each of them using an error measure such as:
  • Sum of squared errors.
  • Sum of absolute errors.
  • The least squares method, etc.
  • For our analysis, we will be using the least squares method.
  • For each line, we take the difference at every point and calculate the sum over all the points. Whichever line gives the minimum sum will be our best line.

Example: By doing this, we could take the heights of many men and their sons and do things like tell a man how tall his son might be, before the son is even born.

(Figure: a scatter of data points with the fitted regression line; source: Google Images.)

The above figure shows a simple linear regression. The line represents the regression line, given by: y = a + b * x.

Where y is the dependent variable (DV): for example, how the salary of a person changes depending on the number of years of experience the employee has. So here, the salary of the employee is the dependent variable. The dependent variable is our target variable, the one we want to predict using linear regression.

x is our independent variable (IV): the independent variable is the cause of the change in the dependent variable. In the above example, the number of years of experience is our independent variable, because the number of years of experience is causing the change in the salary of the employee.


  • b is the coefficient of our independent variable x. This coefficient plays a crucial role: it says how a unit change in x (IV) is going to affect y (DV). It is also referred to as the coefficient of proportionality. In mathematical terms, it is the slope of the line, i.e., how steep the line is.
  • In our example, if the slope (b) is small, each additional year of experience will yield only a small increment in salary; on the other hand, if the slope (b) is large, each additional year of experience will yield a large increase in salary.
  • a is a constant value, also referred to as the intercept: it is where the line intersects the y-axis (DV axis). Put another way, when an employee has zero years of experience (x = 0), the salary (y) for that employee will be the constant a. (See the sketch after this list for how a and b are computed.)
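
In fact, the best values of a and b do not have to be found by trial and error: the least squares solution has a simple closed form, where b is the covariance of x and y divided by the variance of x, and a places the line through the point of means. A minimal sketch, with hypothetical data:

Code:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical IV values
y = np.array([2.1, 3.9, 6.2, 7.8])  # hypothetical DV values

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariance of x and y divided by variance of x
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the line passes through the point of means
a = y_mean - b * x_mean

print(a, b)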

How Does Least Squares Work?

Below are the steps of the least squares procedure:

  • It draws an arbitrary line according to the data trends.
  • It takes the data points and draws vertical lines from them to the line, treating the vertical distance as the parameter of interest.
  • These vertical lines cut the regression line and give a corresponding point for each data point.
  • It then finds the vertical difference between each data point and its corresponding point on the regression line.
  • It calculates the error, which is the square of this difference.
  • It then calculates the sum of these errors.
  • It then draws another line and repeats the above procedure.
  • It draws a number of lines in this fashion, and the line that gives the least sum of errors is chosen as the best line.
  • This best line is our simple linear regression line (sketched in code after this list).
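
The procedure above can be sketched literally as a search over many candidate lines. This is an illustrative brute-force version with hypothetical data; libraries such as scikit-learn solve for the minimum analytically rather than drawing lines one by one:

Code:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8])

best_line, best_sse = None, float('inf')

# Draw many candidate lines (a grid of intercepts a and slopes b)
for a in np.linspace(-5.0, 5.0, 101):
    for b in np.linspace(-5.0, 5.0, 101):
        sse = np.sum((y - (a + b * x)) ** 2)  # sum of squared errors
        if sse < best_sse:
            best_sse, best_line = sse, (a, b)

print(best_line, best_sse)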

Applications of Simple Linear Regression

Regression analysis is performed to predict continuous variables. Regression analysis has a wide variety of applications.

Some examples are as follows:

  • Predictive analytics
  • Effectiveness of marketing
  • Pricing of any listing
  • Promotion prediction for a product

Here we are going to discuss one application of linear regression: predictive analytics. We will do the modelling using Python.

The steps we are going to follow to build our model are as follows:

  • We will import the libraries and datasets.
  • We will pre-process the data.
  • We will divide the data into the test set and the training set.
  • We will create a model which will try to predict the target variable based on our training set.
  • We will predict the target variable for the test set.
  • We will analyze the results predicted by the model.

For our analysis, we are going to use a salary dataset with the data of 30 employees.

# Importing the libraries

Code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (Sample of data is shown in table)

Code:

dataset = pd.read_csv('Salary_Data.csv')

Years of Experience | Salary
1.5                 | 37731
1.1                 | 39343
2.2                 | 39891
2                   | 43525
1.3                 | 46205
3.2                 | 54445
4                   | 55749
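
Assuming the CSV has the two columns shown above (the exact header names in Salary_Data.csv may differ), a quick way to sanity-check the loaded data:

Code:

# Preview the first rows and summary statistics of the dataset
print(dataset.head())
print(dataset.describe())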

# Pre-processing the dataset: here we will divide the dataset into the independent variable (x) and the dependent or target variable (y)

Code:

X = dataset.iloc[:, :-1].values  # all columns except the last: years of experience
y = dataset.iloc[:, 1].values    # the second column: salary

# Splitting the dataset into the Training set and Test set

Code:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

Here, a test size of 1/3 means that 2/3 of the total data is used for training the model and the remaining 1/3 is used for testing it.

# Let’s Fit our Simple Linear Regression model to the Training set

Code:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

The Linear Regression model is now trained. This model will be used for predicting the dependent variable.

# Predicting the Test set results

Code:

y_pred = regressor.predict(X_test)

# Visualising the Test set results

Code:

plt.scatter(X_test, y_test, color = 'blue')
plt.plot(X_train, regressor.predict(X_train), color = 'red')
plt.title('Salary of Employee vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

(Figure: test set observations in blue with the regression line fitted on the training set in red.)

# Parameters of the model

Code:

print(regressor.intercept_)
print(regressor.coef_)

Output:

26816.19224403119
[9345.94244312]

So the intercept (a) value is 26816. This suggests that any fresher (with zero years of experience) would be getting a salary of around 26816.

The coefficient (b) for our model came out as 9345.94. It suggests that, keeping all the other parameters constant, a change of one unit in the independent variable (years of experience) will yield a change of 9345 units in salary.
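
We can reproduce a model prediction by hand from these two parameters with y = a + b * x; for example, for a hypothetical employee with 5 years of experience:

Code:

a = regressor.intercept_  # 26816.19...
b = regressor.coef_[0]    # 9345.94...

years = 5                 # hypothetical employee with 5 years of experience
print(a + b * years)      # same result as regressor.predict([[5]])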

Regression Evaluation Metrics

There are basically three important evaluation metrics available for regression analysis:

  • Mean Absolute Error (MAE): It shows the mean of the absolute errors, where an error is the difference between the predicted and actual values.
  • Mean Squared Error (MSE): It shows the mean value of the squared errors.
  • Root Mean Squared Error (RMSE): It shows the square root of the mean of the squared errors.

We can compare the above methods:

  • MAE: It shows the average error and is the easiest of the three methods to interpret.
  • MSE: This one is more popular than MAE because it amplifies larger errors, which as a result reveals more about big mistakes.
  • RMSE: This one is better than MSE because we can interpret the error in the same units as y.

These three are nothing but loss functions.
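
These definitions are simple enough to write out directly. The sketch below (reusing y_test and y_pred from the earlier steps, and np from the imports) should match the sklearn.metrics results in the next block:

Code:

errors = y_test - y_pred       # per-sample prediction errors

mae = np.mean(np.abs(errors))  # Mean Absolute Error
mse = np.mean(errors ** 2)     # Mean Squared Error
rmse = np.sqrt(mse)            # Root Mean Squared Error

print(mae, mse, rmse)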

# Evaluation of model

Code:

from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Output:

MAE: 3426.4269374307123
MSE: 21026037.329511296
RMSE: 4585.4157204675885

Conclusion

Linear regression analysis is a powerful machine learning tool used to predict continuous variables like salary, sales, performance, etc. Linear regression assumes a linear relationship between the independent and dependent variables. Simple linear regression has only one independent variable, based on which the model predicts the target variable. We have discussed the model and an application of linear regression, with an example of predictive analysis to predict the salaries of employees.

Recommended Articles

This is a guide to Simple Linear Regression. Here we discuss the introduction, what simple linear regression does, how it works, its applications, and evaluation metrics. You can also go through our other related articles to learn more –

  1. Dataset for Linear Regression
  2. Matlab linear regression
  3. NumPy linear regression
  4. Linear Regression Analysis