EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials R Programming Tutorial Simple Linear Regression in R
Secondary Sidebar
R programming Tutorial
  • Regression in R
    • Simple Linear Regression in R
    • Linear Regression in R
    • Multiple Linear Regression in R
    • Logistic Regression in R
    • Poisson Regression in R
    • OLS Regression in R
    • P-Value in Regression
  • Basic
    • What is R Programming Language
    • Careers in R Programming
    • Install R
    • List of R Packages
    • Introduction of R Tools Technology
    • R Programming Language
    • DataSet in R
    • What is RStudio?
    • R-studio-Functions
    • R Packages
    • Time series?in R
    • R Data Types
    • R for data science
    • R Operators
    • R Data Frame
    • R Analytics Tool
    • R Tree Package
    • Vectors in R
  • Control statement
    • If Statement in R
    • If Else Statement in R
    • Else if in R
    • Switch Statement in R
  • Loops
    • Loops in R
    • For Loop in R
    • Nested For Loop in R
    • While Loop in R
    • Next in R
  • Chart/graphs
    • Graphs in R
    • Bar Charts in R
    • Pie Chart in R
    • Histogram in R
    • Line Graph in R
    • Plot Function in R
    • Scatterplot in R
    • R Boxplot labels
  • Anova in R
    • ANOVA in R
    • One Way ANOVA in R
    • Two Way ANOVA in R
  • Data Structure
    • R list
    • Arrays in R
    • Data Frames in R
    • Factors in R
    • R Vectors
  • Advanced
    • Statistical Analysis with R
    • R String Functions
    • Data Exploration in R
    • R CSV Files
    • KNN Algorithm in R
    • Sorting in R
    • lm Function in R
    • Hierarchical Clustering in R
    • R Normal Distribution
    • Binomial Distribution in R
    • Decision Tree in R
    • GLM in R
    • Arima Model in R
    • Linear Model in R
    • Predict Function in R
    • Survival Analysis in R
    • Standard Deviation in R
    • Statistical Analysis in R
    • Predictive Analysis?in R
    • T-test in R
    • Database in R
  • Programs
    • Functions in R
    • Boxplot in R
    • R Program Functions
    • Factorial in R
    • Random Number Generator in R
  • Interview question
    • R Interview Questions

Related Courses

R Programming Certification Course

Statistical Analysis Course Training

All in One Data Science Courses

Simple Linear Regression in R

By Priya PedamkarPriya Pedamkar

Simple Linear Regression in R

Overview of Simple Linear Regression in R

A statistical concept that involves in establishing the relationship between two variables in such a manner that one variable is used to determine the value of another variable is known as simple linear regression in R.
The relationship is established in the form of a mathematical equation obtained through various values of the two variables through complex calculations. Just establishing a relationship is not enough, but data must also meet certain assumptions. R programming offers an effective and robust mechanism to implement the concept.

Advantages of Simple Linear Regression in R

Advantages of using Linear Regression Model:

  • Robust statistical model.
  • Helps us to make forecast and prediction.
  • It helps us to make better business decisions.
  • It helps us to take a rational call on the logical front.
  • We can take corrective actions for the errors left out in this model.

Equation of Linear regression model

The Equation of Linear regression model is given below:

Y = β1 + β2X + ϵ
  • Independent Variable is X
  • Dependent Variable is Y
  • β1 is an intercept of the regression model
  • β2 is a slope of the regression model
  • ϵ is the error term

graph

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

We will work on the “cars” dataset which comes inbuilt with Rstudio.

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,650 ratings)

Let see how the structure of the cars dataset looks like.

For this, we will use the Str() code.

str(cars)

simple linear regression in r

Here we can see that our dataset contains two variables Speed and Distance.

Speed is an independent variable and Distance is a dependent variable.

Let’s take the statistical view of the cars dataset.

For this, we will use the Summary() code.

summary(cars)

simple linear regression in r

In speed variable we have maximum observation is 25 whereas in distance variable the maximum observation is 120. Similarly, minimum observation in speed is 4 and distance is 2.

The Plot Visualization

To understand more about data we will use some visualization:

  • A scatter plot: Helps to identify whether there is any type of correlation is present between the two variables.
  • Box plot: Helps us to display the distribution of data. Distribution of data based on minimum, first quartile, median, third quartile and maximum.
  • The density plot: Helps us to show the probability density function graphically.

1. Scatter plot

It will help to visualize any relationships between the X variable and the Y variable.

#Scatterplotscatter.smooth(x=cars$speed, y=cars$dist, main="Dist ~ Speed", xlab = "Speed", ylab = "Distance" )

simple linear regression in r

A Scatter Plot here signifies that there is a linearly increasing relationship between the dependent variable (Distance) and the independent variable (Speed).

2. Box Plot

Box plot helps us to identify the outliers in both X and Y variables if any.

Code for box plot looks like this:

#Scatterplot
scatter.smooth(x=cars$speed, y=cars$dist, main="Dist ~ Speed", xlab = "Speed", ylab = "Distance" )
#Divide graph area in 2 columns
par(mfrow=c(1, 2))
#Boxplot of Distance
boxplot(cars$dist, main="Distance", sub=paste("Outlier rows: ", boxplot.stats(cars$dist)$out))
#Boxplot for Speed
boxplot(cars$speed, main="Speed", sub=paste("Outlier rows: ",
boxplot.stats(cars$speed)$out))

simple linear regression in r

3. Density Plot

This plot helps to see the normality of the distribution

#Divide graph area in 2 columns
par(mfrow=c(1, 2))
#Density plot for Speed variable
plot(density(cars$speed), main="Density Plot: Speed", ylab="Frequency", sub=paste("Skewness:",
+     +round(e1071::skewness(cars$speed), 2)))
polygon(density(cars$speed), col="red")
#Density plot for Distance
plot(density(cars$dist), main="Density Plot: Distance", ylab="Frequency", sub=paste("Skewness:",
+     +round(e1071::skewness(cars$dist), 2)))
polygon(density(cars$dist), col="red")

simple linear regression in r

Types of Correlation Analysis

This analysis helps us to find the relationship between the variables.

Types of correlation analysis:

  1. Weak Correlation (a value closer to 0)
  2. Strong Correlation (a value closer to ± 0.99)
  3. Perfect Correlation
  4. No Correlation
  5. Negative Correlation (-0.99 to -0.01)
  6. Positive Correlation (0.01 to 0.99)

#Correlation between speed and distance
cor(cars$speed, cars$dist)

Correlation Between Speed

0.8068949 signifies that there is a strong positive correlation between the two variables (Speed and Distance).

Linear Regression model

Now we will start the most important part of the model.

The formula used for linear regression is lm(Dependent Variable ~ Independent Variable)

#linear regression model
linear_regression <- lm(dist ~ speed, data=cars)
print(linear_regression)

Model

We will fit these output in our regression analysis equation

Y = β1 + β2X + ϵ

dist = −17.579 + 3.932∗speed

To get a summary of the linear regression model, we will use code Summary()

linear_regression <- lm(dist ~ speed, data=cars)
summary(linear_regression)

 Linear Regression

Now we will understand what these outcomes mean.

  • R squared (R2) also known as the coefficient of determination. It will tell us what proportion of change in the dependent variable caused by the independent variable. It is always between 0 and 1. The higher the value of the R squared the better the model is.
  • Adjusted R Squared, it is a better statistic to consider if we want to see the credibility of our model. Adjusted R2 helps us to check the goodness of the model also and it will also penalize the model if we add a variable that does not improve our existing model.
  • As per our model summary, Adjusted R squared is 0.6438 or we can say that 64% of the variance in the data is being explained by the model.
  • Further, there are many statistics to check the credibility of our model like t-statistic, F-statistic, etc.

Recommended Articles

This is a guide to Simple Linear Regression in R. Here we discuss the introduction, advantages of Simple Linear Regression in R, Some of the Plot visualization, and types with respective examples. You may also look at the following articles to learn more –

  1. Polynomial Regression
  2. Linear Regression Modeling
  3. Scatterplots in R | How to Create?
  4. What is Linear Regression?
Popular Course in this category
R Programming Training (13 Courses, 20+ Projects)
  13 Online Courses |  20 Hands-on Projects |  120+ Hours |  Verifiable Certificate of Completion
4.5
Price

View Course

Related Courses

Statistical Analysis Training (15 Courses, 10+ Projects)4.9
All in One Data Science Bundle (360+ Courses, 50+ projects)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more