EDUCBA Logo

EDUCBA

MENUMENU
  • Explore
    • EDUCBA Pro
    • PRO Bundles
    • Featured Skills
    • New & Trending
    • Fresh Entries
    • Finance
    • Data Science
    • Programming and Dev
    • Excel
    • Marketing
    • HR
    • PDP
    • VFX and Design
    • Project Management
    • Exam Prep
    • All Courses
  • Blog
  • Enterprise
  • Free Courses
  • Log in
  • Sign Up
Home Data Science Data Science Tutorials R Programming Tutorial R Squared Regression
 

R Squared Regression

Priya Pedamkar
Article byPriya Pedamkar

Updated July 6, 2023

R squared regression

 

 

Introduction to R Squared Regression

R Squared is a statistical measure defined by the proportion of variance in the dependent variable that can be explained from independent variables.

In other words, in a regression model, the value of R squared tests the goodness of the regression model or how well the data fits in the model. The coefficient of determination is another term for it. The value of R squared can be between 0% to 100%.

Watch our Demo Courses and Videos

Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.

What is R Squared?

To understand R squared analysis, let us consider n number of points in two-dimensional space, say, (x1, y1), (x2, y2), … (xn, yn). We will plot them in two-dimensional space and consider a line y = mx + b passing through them to minimize the squared error.

r squared regression

You can calculate the total squared error between the points and the line using the formula below. We will consider errors along the y-axis.

e1 = (y1 – (m x1  + b))2
e2 = (y2 – (m x2  + b))2
en = (yn – (m xn  + b))2

Let the squared error of the line be SEline.
SEline = (y1 – (mx1  + b))2 + (y2 – (mx2  + b))2 + … (yn – (mxn  + b))2

Now to know how well the line fits in the data points, let us know “How much (or what percentage) of the variation in y is described by the variation in x?
Variance (y) = Total Variation in y / #n

variance

But the total variation in y that is not described by the regression line can be expressed by the following equation.
The regression line does not describe the SEline / SEY (%) of variation.

So let us say if the above percentage stands out to be 25%, that is, the line does not describe 25% of the variation, then we can say that the variance explained by the line can be equal to (1-25 = 75%). So the formula below describes the total variance described by the regression line.
(1 – SEline / SEY) %

This indicates the percentage of the total variation that the variation in x can explain. This is called the coefficient of determination or R squared.
Coefficient of Determination = R2 = (1 – SEline / SEY)

Example to Implement R Squared Regression

Let us consider an example using Python. The library named sklearn contains the metrics named r2_score. And for the Linear Regression model, we will use LinerRegression from sklearn. We will use the matplotlib library for plotting the regression graph. The Numpy library will be used to reshape the input array.

1. Let us install the sklearn package and import the necessary packages.

pip install scikit-learn
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression from matplotlib import pyplot as plt
import numpy as np

2. Then, we need to download the wage2.csv dataset, which is available here: http://murraylax.org/datasets/wage2.csv

The fields in the dataset are below, which can be found in wage.columns.

dataset

3. We will consider YearsEdu as x and MonthlyEarnings on y, which is the dependent variable. We will scatter plot all the points and visualize the distribution.

y = wage["MonthlyEarnings"] x = wage["YearsEdu"] plt.scatter(x, y)

Output:

R Squared Regression

4. Now, we will train the linear regression model on our data.

X = np.array(x).reshape(-1, 1)
regModel = LinearRegression()
regModel.fit(X, y)
y_pred = regModel.predict(X)

5. Now, to view the best fitting line to our model, we will execute the following script.

plt.scatter(x,y)
plt.plot(X, y_pred, color='red')

Output:

R Squared Regression

6. To compute the R-squared value of the line, the following function can be used.

r2_score(y, y_pred)

Interpretation

  • If the squared error is very small, then we can say that the line is a good fit. So if SEline is a small number, the whole fraction will be a very small value. And this will result in a larger number when subtracted from one. Thus if the squared error is small, then R squared, or the coefficient of determination will get larger, nearly equal to one. This shows that the line is a good fit.
  • Similarly, in the opposite case, if the squared error of the line is huge, that means a lot of error between data points and the line, then SEline this number will get large, resulting in a larger fraction value. So the R squared or the coefficient of determination will be smaller, showing a poorly fit regression line.

Limitation

We can get the relationship between the independent and dependent variables’ movements. It is not a parameter to identify whether the selected model is good or not. You are not provided with any information regarding the potential bias in the data and predictions. It is possible to obtain a high R squared value for an inadequate model and a low value for a well-fitted model. Thus R squared doesn’t help to identify the reliability of the model.

Conclusion

R squared value is not a metric to verify the good fit of the trained linear regression model. But it can give what amount of variance of the independent variable that the variance of the dependent variable can describe.

Recommended Articles

This is a guide to R Squared Regression. Here we discuss R squared analysis and its limitations, interpretation, and example. You can also go through our other related articles to learn more –

  1. Multiple Linear Regression
  2. List of R Packages
  3. Column Vector Matlab
  4. lm Function in R
  5. Guide to Multiple Linear Regression in R

Primary Sidebar

Footer

Follow us!
  • EDUCBA FacebookEDUCBA TwitterEDUCBA LinkedINEDUCBA Instagram
  • EDUCBA YoutubeEDUCBA CourseraEDUCBA Udemy
APPS
EDUCBA Android AppEDUCBA iOS App
Blog
  • Blog
  • Free Tutorials
  • About us
  • Contact us
  • Log in
Courses
  • Enterprise Solutions
  • Free Courses
  • Explore Programs
  • All Courses
  • All in One Bundles
  • Sign up
Email
  • [email protected]

ISO 10004:2018 & ISO 9001:2015 Certified

© 2025 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA Login

Forgot Password?

🚀 Limited Time Offer! - 🎁 ENROLL NOW