EDUCBA

R Squared Regression

By Priya Pedamkar


Introduction to R Squared Regression

R squared is a statistical measure defined as the proportion of the variance in the dependent variable that can be explained by the independent variables. In other words, the value of R squared in a regression model indicates how well the model fits the data. It is also called the coefficient of determination. For a least-squares linear regression model with an intercept, evaluated on the data it was fitted to, R squared lies between 0 and 1 (0% to 100%).

What is R Squared?

To understand R squared analysis, consider n points in two-dimensional space, $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. We plot them and fit a line $y = mx + b$ through them so that the total squared error between the points and the line is minimized.


The error of each point is measured along the y-axis, that is, as the squared vertical distance between the point and the line:

$$e_1 = (y_1 - (m x_1 + b))^2$$
$$e_2 = (y_2 - (m x_2 + b))^2$$
$$\vdots$$
$$e_n = (y_n - (m x_n + b))^2$$

Let the squared error of the line, $SE_{line}$, be the sum of these individual errors:
$$SE_{line} = (y_1 - (m x_1 + b))^2 + (y_2 - (m x_2 + b))^2 + \dots + (y_n - (m x_n + b))^2$$
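A minimal sketch of $SE_{line}$ in Python, assuming NumPy is available. The function name squared_error_of_line and the five data points are hypothetical, chosen only for illustration. np.polyfit with degree 1 returns the least-squares slope and intercept, so nudging the slope away from that value should only increase the squared error.

```python
import numpy as np


def squared_error_of_line(x, y, m, b):
    """Return SE_line: the sum of squared vertical distances to y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (m * x + b)  # error measured along the y-axis
    return float(np.sum(residuals ** 2))


# Hypothetical data points (x_i, y_i).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares line: np.polyfit(..., 1) returns [slope, intercept].
m, b = np.polyfit(x, y, 1)

se_best = squared_error_of_line(x, y, m, b)          # about 2.4 for this data
se_nudged = squared_error_of_line(x, y, m + 0.5, b)  # a different line does worse

print(se_best, se_nudged)
```

For this toy data the least-squares line is $y = 0.6x + 2.2$.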

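To measure how well the line fits the data points, we ask: "How much (or what percentage) of the variation in y is described by the variation in x?" To answer this, we need the total variation in y, which is the squared error of the y values about their mean $\bar{y}$:
$$SE_Y = (y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2$$
Dividing the total variation by the number of points gives the variance of y:
$$\mathrm{Var}(y) = \frac{SE_Y}{n}$$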
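A short sketch of the total variation $SE_Y$ and the variance, assuming NumPy and a hypothetical set of y values. It also checks that dividing by n matches NumPy's default np.var, which uses the same divide-by-n definition (ddof=0).

```python
import numpy as np

# Hypothetical y values.
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = len(y)

y_bar = y.mean()                        # mean of y
se_y = float(np.sum((y - y_bar) ** 2))  # SE_Y: total variation in y
variance_y = se_y / n                   # Var(y) = SE_Y / n

# np.var divides by n by default (ddof=0), so it should agree.
print(se_y, variance_y, np.var(y))
```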

The fraction of the total variation in y that is not described by the regression line is therefore:
$$\frac{SE_{line}}{SE_Y}$$

For example, if this fraction turns out to be 25%, meaning 25% of the variation is not described by the line, then the line describes the remaining 100% - 25% = 75%. In general, the fraction of the total variation described by the regression line is:
$$1 - \frac{SE_{line}}{SE_Y}$$

This is the proportion of the total variation in y that is described by the variation in x. It is called the coefficient of determination, or R squared:
$$R^2 = 1 - \frac{SE_{line}}{SE_Y}$$
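The formula can be sketched end to end in a few lines, assuming NumPy and hypothetical data. np.polyfit supplies the least-squares line, and the two squared errors are then combined exactly as in the formula.

```python
import numpy as np

# Hypothetical data points.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

m, b = np.polyfit(x, y, 1)                # least-squares line y = m*x + b

se_line = np.sum((y - (m * x + b)) ** 2)  # variation NOT described by the line
se_y = np.sum((y - y.mean()) ** 2)        # total variation in y

unexplained = se_line / se_y              # fraction not described by the line
r_squared = 1 - unexplained               # coefficient of determination

print(f"not described: {unexplained:.0%}, R^2 = {r_squared:.2f}")
```

Here $SE_{line} = 2.4$ and $SE_Y = 6$, so 40% of the variation is not described by the line and R squared is 0.60.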

Example to Implement R Squared Regression

Let us consider an example in Python. The scikit-learn library (imported as sklearn) provides the r2_score metric, and for the linear regression model we will use its LinearRegression class. We will use the matplotlib library to plot the regression graph and the NumPy library to reshape the input array.

1. First, install the scikit-learn package from a shell, and then import the necessary packages in Python.

pip install scikit-learn

from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot as plt
import numpy as np

2. Next, download the wage2.csv dataset, which is available at http://murraylax.org/datasets/wage2.csv, and load it into a pandas DataFrame named wage:

import pandas as pd
wage = pd.read_csv("wage2.csv")  # path to the downloaded file

The fields present in the dataset can be listed with wage.columns.

3. We use YearsEdu as the independent variable x and MonthlyEarnings as the dependent variable y, and draw a scatter plot of all the points to visualize their distribution.

y = wage["MonthlyEarnings"]  # dependent variable
x = wage["YearsEdu"]         # independent variable
plt.scatter(x, y)
plt.show()

Output: a scatter plot of MonthlyEarnings (y-axis) against YearsEdu (x-axis).

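4. Next, we train the linear regression model on the data. LinearRegression expects a two-dimensional feature array, so x is reshaped into a single column.

X = np.array(x).reshape(-1, 1)  # shape (n_samples, 1)
regModel = LinearRegression()
regModel.fit(X, y)              # least-squares fit of the line y = mx + b
y_pred = regModel.predict(X)    # predicted y values on the fitted line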
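The fitted model stores the slope m and intercept b of the line $y = mx + b$ in its coef_ and intercept_ attributes. The sketch below uses small hypothetical data instead of wage2.csv and checks that these values match NumPy's least-squares fit from np.polyfit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data; the wage2.csv values are not reproduced here.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

X = x.reshape(-1, 1)                      # one feature column, shape (5, 1)
reg_model = LinearRegression().fit(X, y)  # fit() returns the fitted estimator

m = reg_model.coef_[0]                    # slope of the fitted line
b = reg_model.intercept_                  # intercept of the fitted line

m_np, b_np = np.polyfit(x, y, 1)          # same least-squares line via NumPy
print(m, b, m_np, b_np)
```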

5. To view the best-fitting line, run the following script:

plt.scatter(x, y)
plt.plot(X, y_pred, color='red')  # fitted regression line
plt.show()

Output: the same scatter plot with the fitted regression line drawn in red.

6. To compute the R-squared value of the line, the following function can be used.

r2_score(y, y_pred)
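As a sanity check, r2_score(y_true, y_pred) should agree with $1 - SE_{line}/SE_Y$ computed by hand. The sketch below uses hypothetical values and predictions from the line $y = 0.6x + 2.2$. Note that r2_score is not restricted to least-squares predictions: if the predictions are worse than always predicting the mean of y, the score is negative.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical observed values and predictions from y = 0.6x + 2.2 at x = 1..5.
y = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = 0.6 * np.arange(1, 6) + 2.2

# R^2 from the formula 1 - SE_line / SE_Y.
se_line = np.sum((y - y_pred) ** 2)
se_y = np.sum((y - y.mean()) ** 2)
manual_r2 = 1 - se_line / se_y

library_r2 = r2_score(y, y_pred)  # should agree with manual_r2

# Predictions worse than always predicting the mean give a negative score.
bad_pred = np.array([5, 4, 3, 2, 1], dtype=float)
bad_r2 = r2_score(y, bad_pred)

print(manual_r2, library_r2, bad_r2)
```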

Interpretation

  • If the squared error of the line is small, the line is a good fit. A small $SE_{line}$ makes the fraction $SE_{line}/SE_Y$ small, so subtracting it from one gives a value close to one. Thus a small squared error yields an R squared (coefficient of determination) close to one, indicating a good fit.
  • Conversely, if the squared error of the line is large, meaning the data points lie far from the line, then $SE_{line}$ is large and so is the fraction $SE_{line}/SE_Y$. R squared is then small, indicating a poorly fitting regression line.
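Both cases can be illustrated with simulated data, assuming NumPy. The helper r_squared is hypothetical; it applies $1 - SE_{line}/SE_Y$ to the least-squares line. The two data sets share the same underlying line $y = 2x + 1$ and differ only in how much random noise scatters the points around it.

```python
import numpy as np


def r_squared(x, y):
    """R^2 = 1 - SE_line / SE_Y for the least-squares line through (x, y)."""
    m, b = np.polyfit(x, y, 1)
    se_line = np.sum((y - (m * x + b)) ** 2)
    se_y = np.sum((y - y.mean()) ** 2)
    return 1 - se_line / se_y


rng = np.random.default_rng(0)  # fixed seed for repeatability
x = np.linspace(0, 10, 50)
y_tight = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)   # points close to the line
y_loose = 2 * x + 1 + rng.normal(scale=10.0, size=x.size)  # points far from the line

print(r_squared(x, y_tight), r_squared(x, y_loose))
```

The tightly clustered points give an R squared near one, while the widely scattered points give a much smaller value.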

Limitation

R squared tells us how much of the movement in the dependent variable is associated with movements in the independent variables, but it does not by itself tell us whether the chosen model is appropriate. It also gives no information about whether the data or the predictions are biased. A poor model can have a high R squared, and a well-specified model can have a low one (for example, when the data are inherently noisy). Thus R squared alone does not establish the reliability of the model.
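A small illustration of a high R squared for a poor model, using hypothetical data and NumPy: the true relationship $y = x^2$ is curved, yet a straight line fitted to x = 1..10 still scores an R squared of about 0.95. The residuals reveal the problem: they are positive at both ends and negative in the middle, a systematic pattern (bias) that R squared alone does not show.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = x ** 2                      # the true relationship is curved, not linear

m, b = np.polyfit(x, y, 1)      # fit a straight line anyway (y = 11x - 22)
residuals = y - (m * x + b)
r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(round(r2, 3))             # about 0.95
print(residuals)                # positive at both ends, negative in the middle
```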

Conclusion

R squared alone is not enough to verify that a trained linear regression model is a good fit. What it does give is the proportion of the variance in the dependent variable that is described by the variance of the independent variable(s).

© 2023 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
