EDUCBA Logo

EDUCBA

MENUMENU
  • Explore
    • EDUCBA Pro
    • PRO Bundles
    • Featured Skills
    • New & Trending
    • Fresh Entries
    • Finance
    • Data Science
    • Programming and Dev
    • Excel
    • Marketing
    • HR
    • PDP
    • VFX and Design
    • Project Management
    • Exam Prep
    • All Courses
  • Blog
  • Enterprise
  • Free Courses
  • Log in
  • Sign Up
Home Data Science Data Science Tutorials R Programming Tutorial P-Value in Regression
 

P-Value in Regression

Priya Pedamkar
Article byPriya Pedamkar

Updated March 24, 2023

P-Value in Regression

 

 

Introduction to P-Value in Regression

P-Value is defined as the most important step to accept or reject a null hypothesis. Since it tests the null hypothesis that its coefficient turns out to be zero i.e. for a lower value of the p-value (<0.05) the null hypothesis can be rejected otherwise null hypothesis will hold. In other words, the predictor that holds a lower p-value is likely to be more meaningful addition to the model as a change in the predictor values are related to the changes of the response variable. It is one of the important steps to reject or accept the null hypothesis.

Watch our Demo Courses and Videos

Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.

Before we start with P-value, we must have to decode what hypothesis testing is. Hypothesis testing is a test that suggests that interpretation based out on samples is right for the entire population or not. While doing hypothesis testing we have to specify null and alternative hypotheses beforehand.

Null Hypothesis: Suggests that there is no statistical significance between the two variables in the study which we are doing.

Alternative Hypothesis: Suggests that there is a statistical significance between the two variables.

Let’s understand these concepts with some examples.

Imagine we live in a world where the mean consultation time of the doctor for consulting a patient is 15 minutes or less. This is our initial belief that within 15 minutes doctors are able to take a wise call for the patient problem, hence we can say the average time for consultation is 15 minutes or less.

The null hypothesis and alternative hypothesis are:

Null Hypothesis: The average consultation time by the doctor is 15 minutes or less than that. We assume that the null hypothesis is a statement that is a general belief. Like every common and at the same time feels very natural.

Alternative Hypothesis: The average consultation time by the doctor is more than 15 minutes. We assume the alternative hypothesis as the statement which discards the initial belief.

Normal Distribution

Now we will discuss the normal distribution (also known as Gaussian distribution). This distribution is used to see the data distribution. Sometimes the researcher mentioned it as the probability density function also.

Features of Normal Distribution

  • It is a part of the central limit theorem
  • It produces a bell-shaped curved
  • In this distribution Mean = Median = Mode
  • Half of the data values are less than 50% and half of the data values are more than 50%
  • It has two parameters, one is mean and the other one in Standard deviation

Syntax

Syntax in R for normal distribution chart looks like:

# Create a sequence of numbers between -20 and 20 incrementing by 0.1.
x <- seq(-20, 20, by = .1)
# Choose the mean as 4.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 4.5, sd = 0.5)
plot(x,y)

Where x is a vector of numbers. The mean value is the mean of the sample which we derive from a population.

Standard deviation is the amount of variation between the set of values and the mean

p-value diagram

Significant Level

A significant level tells us that x% is the probability of rejecting the null hypothesis when it is actually true. Here we meant to say that we will reject the null hypothesis which states that the average time of consultation is 15 minutes or less and for real the consultation is time is less than or equal to 15 minutes, and still we reject it. The significant level is also known as “alpha” and denoted as “α”.

P-Value in Regression

We didn’t discuss on what basis we can accept or reject the null hypothesis, let’s discuss that now.

To accept or reject the null hypothesis, we have to consider the P-value of the model. The model here can be regression analysis. Now, we will discuss how to calculate the P-value of a regression model and how to interpret it.

1. Dataset

We will use the “USArrest” dataset here, which is available in Rstudio.

Murder arrests (per 100,000) Assault arrests (per 100,000) Percent urban population Rape arrests (per 100,000)
Alabama 13.2 236 58 21.2
Alaska 10 263 48 44.5
Arizona 8.1 294 80 31
Arkansas 8.8 190 50 19.5
California 9 276 91 40.6
Colorado 7.9 204 78 38.7
Connecticut 3.3 110 77 11.1
Delaware 5.9 238 72 15.8
Florida 15.4 335 80 31.9
Georgia 17.4 211 60 25.8
Hawaii 5.3 46 83 20.2
Idaho 2.6 120 54 14.2
Illinois 10.4 249 83 24
Indiana 7.2 113 65 21
Iowa 2.2 56 57 11.3
Kansas 6 115 66 18
Kentucky 9.7 109 52 16.3
Louisiana 15.4 249 66 22.2
Maine 2.1 83 51 7.8
Maryland 11.3 300 67 27.8
Massachusetts 4.4 149 85 16.3
Michigan 12.1 255 74 35.1
Minnesota 2.7 72 66 14.9
Mississippi 16.1 259 44 17.1
Missouri 9 178 70 28.2
Montana 6 109 53 16.4
Nebraska 4.3 102 62 16.5
Nevada 12.2 252 81 46
New Hampshire 2.1 57 56 9.5
New Jersey 7.4 159 89 18.8
New Mexico 11.4 285 70 32.1
New York 11.1 254 86 26.1
North Carolina 13 337 45 16.1
North Dakota 0.8 45 44 7.3
Ohio 7.3 120 75 21.4
Oklahoma 6.6 151 68 20
Oregon 4.9 159 67 29.3
Pennsylvania 6.3 106 72 14.9
Rhode Island 3.4 174 87 8.3
South Carolina 14.4 279 48 22.5
South Dakota 3.8 86 45 12.8
Tennessee 13.2 188 59 26.9
Texas 12.7 201 80 25.5
Utah 3.2 120 80 22.9
Vermont 2.2 48 32 11.2
Virginia 8.5 156 63 20.7
Washington 4 145 73 26.2
West Virginia 5.7 81 39 9.3
Wisconsin 2.6 53 66 10.8
Wyoming 6.8 161 60 15.6

2. Problem

We have to find whether there is a significant relationship between speed and distance in the linear regression model and our significant level is .05

3. Solution

Now we will write the syntax for linear regression.

linear_regression<-lm(Assault ~ UrbanPop, data = USArrests)
print(linear_regression)

p- value in regression

As per the above outcome, our linear regression equation looks like this

Dist = 73.08 + (1.49)Speed

For the summary of the model, we will pass Summary() syntax

linear_regression<-lm(Assault ~ UrbanPop, data = USArrests)
summary(linear_regression)

Solution

P-value in our model is 0.06948 and it is more than the significant level which is 0.05. Hence, we can conclude that there is no relationship between the “Assault” and the “Urbanpop” variable and we can accept the null hypothesis.

Conclusion

P-value is introduced by Pearson in 1900. It is one of the preferred methods which researchers use to summarize the result of the problems they are dealing with. But taking decisions solely on P-value is not right, it is recommended to consider other contextual factors to derive scientific inferences. Not just P-value, everything from study design, logical assumptions, and quality of measurements are also important.

Recommended Articles

This is a guide to P-Value in Regression. Here we discuss the introduction to P-Value Regression along with the normal distribution, significant level and how to calculate and interpret the P-value of a regression model. You may also look at the following articles to learn more –

  1. Linear Regression in R
  2. Multivariate Regression
  3. Polynomial Regression
  4. Linear Regression Analysis

Primary Sidebar

Footer

Follow us!
  • EDUCBA FacebookEDUCBA TwitterEDUCBA LinkedINEDUCBA Instagram
  • EDUCBA YoutubeEDUCBA CourseraEDUCBA Udemy
APPS
EDUCBA Android AppEDUCBA iOS App
Blog
  • Blog
  • Free Tutorials
  • About us
  • Contact us
  • Log in
Courses
  • Enterprise Solutions
  • Free Courses
  • Explore Programs
  • All Courses
  • All in One Bundles
  • Sign up
Email
  • [email protected]

ISO 10004:2018 & ISO 9001:2015 Certified

© 2025 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA Login

Forgot Password?

🚀 Limited Time Offer! - 🎁 ENROLL NOW