P-Value in Regression

By Priya Pedamkar

Introduction to P-Value in Regression

In regression, the p-value is the quantity used to decide whether to reject a null hypothesis. For each predictor, it tests the null hypothesis that the predictor's coefficient is zero. When the p-value is low (< 0.05), the null hypothesis can be rejected; otherwise, we fail to reject it. In other words, a predictor with a low p-value is likely to be a meaningful addition to the model, because changes in the predictor's value are related to changes in the response variable.

Before we start with the p-value, we need to understand hypothesis testing. Hypothesis testing checks whether a conclusion drawn from a sample is likely to hold for the entire population. Before running the test, we have to specify the null and alternative hypotheses.

Null Hypothesis: States that there is no statistically significant relationship between the two variables under study.

Alternative Hypothesis: States that there is a statistically significant relationship between the two variables.

Let’s understand these concepts with an example.

Imagine we live in a world where the mean time a doctor spends consulting a patient is 15 minutes or less. This is our initial belief: within 15 minutes, doctors are able to make a sound decision about the patient's problem, so we can say the average consultation time is 15 minutes or less.

The null hypothesis and alternative hypothesis are:

Null Hypothesis: The average consultation time is 15 minutes or less. The null hypothesis is taken to be the statement of general belief, the position that seems common and natural.

Alternative Hypothesis: The average consultation time is more than 15 minutes. The alternative hypothesis is the statement that contradicts the initial belief.
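
The pair of hypotheses above could be tested in R with a one-sided, one-sample t-test. The following is only a minimal sketch: the consultation times are made-up numbers chosen for illustration, not data from any study, and the variable names are hypothetical.

```r
# Hypothetical consultation times in minutes (invented for illustration only).
consult_times <- c(14, 17, 16, 15, 18, 13, 19, 16, 17, 15, 20, 16)

# Null hypothesis: mean <= 15 minutes. Alternative hypothesis: mean > 15 minutes.
# alternative = "greater" makes this a one-sided test in the direction of the alternative.
consult_test <- t.test(consult_times, mu = 15, alternative = "greater")

# The p-value is stored in the returned object.
consult_p <- consult_test$p.value
print(consult_p)
```

With this particular made-up sample the p-value falls below 0.05, so the null hypothesis would be rejected; a different sample could easily lead to the opposite decision.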

Normal Distribution

Now we will discuss the normal distribution (also known as the Gaussian distribution). It is commonly used to describe how data are distributed, and it is defined by its probability density function.

Features of Normal Distribution

  • It is the limiting distribution in the central limit theorem
  • It produces a bell-shaped curve
  • In this distribution, Mean = Median = Mode
  • Half of the data values lie below the mean and half lie above it
  • It has two parameters: the mean and the standard deviation

Syntax

The R code to plot a normal distribution looks like this:

# Create a sequence of numbers between -20 and 20 incrementing by 0.1.
x <- seq(-20, 20, by = .1)
# Choose the mean as 4.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 4.5, sd = 0.5)
plot(x,y)

Here, x is a vector of numbers at which the density is evaluated. The mean argument is the mean of the distribution, which is usually estimated from a sample drawn from the population.

The standard deviation (sd) measures how much the values vary around the mean.
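
The features listed earlier can be checked numerically with R's built-in pnorm(), qnorm() and dnorm() functions. This is a small sketch that reuses the mean of 4.5 and standard deviation of 0.5 from the plot; those values are arbitrary, and the variable names are chosen here for illustration.

```r
# Illustrative parameters (arbitrary, same as in the plotting code).
mu <- 4.5
sigma <- 0.5

# Probability of a value falling below the mean: one half.
prob_below_mean <- pnorm(mu, mean = mu, sd = sigma)

# The median (50th percentile) coincides with the mean.
median_value <- qnorm(0.5, mean = mu, sd = sigma)

# The density is highest at the mean, so the mode coincides with it too.
grid <- seq(mu - 3, mu + 3, by = 0.01)
mode_value <- grid[which.max(dnorm(grid, mean = mu, sd = sigma))]

print(c(prob_below_mean = prob_below_mean,
        median = median_value,
        mode = mode_value))
```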

[Figure: p-value diagram]

Significance Level

The significance level is the probability of rejecting the null hypothesis when it is actually true. In our example, it is the probability that we reject the null hypothesis (that the average consultation time is 15 minutes or less) even though the average consultation time really is 15 minutes or less. The significance level is also known as “alpha” and is denoted by “α”.
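
One way to see what this means is a small simulation. The sketch below assumes consultation times are normally distributed with a true mean of exactly 15 minutes, so the null hypothesis holds; the sample size of 30, the standard deviation of 3 and the 2000 repetitions are all arbitrary choices made for illustration.

```r
set.seed(42)      # make the simulation reproducible
alpha <- 0.05     # significance level
n_sims <- 2000    # number of simulated studies (arbitrary)

# Each simulated sample has a true mean of 15, so the null hypothesis is true.
# Every rejection is therefore a false rejection.
p_values <- replicate(
  n_sims,
  t.test(rnorm(30, mean = 15, sd = 3), mu = 15, alternative = "greater")$p.value
)

# Share of simulated studies in which the true null hypothesis is rejected.
false_rejection_rate <- mean(p_values < alpha)
print(false_rejection_rate)
```

The share of false rejections should land close to alpha, which is what the significance level promises in the long run.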

P-Value in Regression

So far we have not discussed the basis on which we reject or fail to reject the null hypothesis; let’s do that now.

To make that decision, we look at the p-value of the model, which here is a regression model. We will now see how to calculate the p-value of a regression model and how to interpret it.

1. Dataset

We will use the “USArrests” dataset here, which ships with R in the built-in datasets package.

Murder, Assault and Rape are arrests per 100,000; UrbanPop is the percent urban population.

State           Murder  Assault  UrbanPop  Rape
Alabama         13.2    236      58        21.2
Alaska          10      263      48        44.5
Arizona         8.1     294      80        31
Arkansas        8.8     190      50        19.5
California      9       276      91        40.6
Colorado        7.9     204      78        38.7
Connecticut     3.3     110      77        11.1
Delaware        5.9     238      72        15.8
Florida         15.4    335      80        31.9
Georgia         17.4    211      60        25.8
Hawaii          5.3     46       83        20.2
Idaho           2.6     120      54        14.2
Illinois        10.4    249      83        24
Indiana         7.2     113      65        21
Iowa            2.2     56       57        11.3
Kansas          6       115      66        18
Kentucky        9.7     109      52        16.3
Louisiana       15.4    249      66        22.2
Maine           2.1     83       51        7.8
Maryland        11.3    300      67        27.8
Massachusetts   4.4     149      85        16.3
Michigan        12.1    255      74        35.1
Minnesota       2.7     72       66        14.9
Mississippi     16.1    259      44        17.1
Missouri        9       178      70        28.2
Montana         6       109      53        16.4
Nebraska        4.3     102      62        16.5
Nevada          12.2    252      81        46
New Hampshire   2.1     57       56        9.5
New Jersey      7.4     159      89        18.8
New Mexico      11.4    285      70        32.1
New York        11.1    254      86        26.1
North Carolina  13      337      45        16.1
North Dakota    0.8     45       44        7.3
Ohio            7.3     120      75        21.4
Oklahoma        6.6     151      68        20
Oregon          4.9     159      67        29.3
Pennsylvania    6.3     106      72        14.9
Rhode Island    3.4     174      87        8.3
South Carolina  14.4    279      48        22.5
South Dakota    3.8     86       45        12.8
Tennessee       13.2    188      59        26.9
Texas           12.7    201      80        25.5
Utah            3.2     120      80        22.9
Vermont         2.2     48       32        11.2
Virginia        8.5     156      63        20.7
Washington      4       145      73        26.2
West Virginia   5.7     81       39        9.3
Wisconsin       2.6     53       66        10.8
Wyoming         6.8     161      60        15.6

2. Problem

We have to find out whether there is a significant relationship between UrbanPop and Assault in the linear regression model, at a significance level of 0.05.

3. Solution

Now we will write the code for the linear regression.

linear_regression<-lm(Assault ~ UrbanPop, data = USArrests)
print(linear_regression)

[Figure: output of print(linear_regression)]

As per the above output, our linear regression equation looks like this:

Assault = 73.08 + (1.49)UrbanPop
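
The intercept and slope in the equation above can also be read directly from the fitted model with coef(). The sketch below then uses them to compute a fitted value for a hypothetical state with 60 percent urban population (a number picked only for illustration) and compares the result with predict().

```r
# Fit the same model as in the tutorial.
linear_regression <- lm(Assault ~ UrbanPop, data = USArrests)

# Pull the estimated intercept and slope out of the model.
estimates <- coef(linear_regression)
intercept <- estimates[["(Intercept)"]]
slope <- estimates[["UrbanPop"]]

# Fitted Assault value for a hypothetical UrbanPop of 60, computed by hand...
manual_fit <- intercept + slope * 60

# ...and with predict(), which should give the same number.
predicted_fit <- predict(linear_regression,
                         newdata = data.frame(UrbanPop = 60))

print(c(intercept = intercept, slope = slope, fitted_at_60 = manual_fit))
```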

For the summary of the model, we pass the model to the summary() function:

linear_regression<-lm(Assault ~ UrbanPop, data = USArrests)
summary(linear_regression)

[Figure: output of summary(linear_regression)]

The p-value in our model is 0.06948, which is more than the significance level of 0.05. Hence, we fail to reject the null hypothesis: the data do not provide statistically significant evidence of a linear relationship between the “Assault” and “UrbanPop” variables.
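
Rather than reading the p-value off the printed summary, it can be extracted programmatically from the coefficient table. This is a minimal sketch; the names coef_table, p_value, alpha and reject_null are chosen here for illustration.

```r
# Fit the same model as in the tutorial.
linear_regression <- lm(Assault ~ UrbanPop, data = USArrests)

# The coefficient table holds estimates, standard errors, t values and p-values.
coef_table <- coef(summary(linear_regression))

# p-value for the UrbanPop slope (column name as used by summary.lm).
p_value <- coef_table["UrbanPop", "Pr(>|t|)"]

# Compare against the chosen significance level.
alpha <- 0.05
reject_null <- p_value < alpha

print(p_value)
print(reject_null)
```

Because the model has a single predictor, the p-value of the slope is the same as the p-value of the overall F-test reported at the bottom of the summary.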

Conclusion

The p-value was introduced by Pearson in 1900. It is one of the preferred methods researchers use to summarize the results of the problems they are studying. However, decisions should not be based solely on the p-value; other contextual factors should be considered when drawing scientific inferences. Study design, the soundness of the assumptions, and the quality of the measurements are just as important.
