
Statistical Analysis in R

By Priya Pedamkar


Introduction to Statistical Analysis in R

Statistical analysis is the process of applying statistical techniques and models to data in order to derive meaningful patterns. Many concepts, methods, and tools are available for it. Commonly used techniques include identifying the distribution of a dataset, and statistical terminology and symbols are used when applying the analysis to business and research work. Identifying the mean, median and mode of a given data set is one of the primary steps in analyzing the data. Statistical analysis is a core component of data science projects, and programming languages such as R are widely used for it.

Statistical Analysis Using R

Statistical analysis is the initial step when analyzing a dataset. Statistics is the foundation on which data mining and other data-related operations are carried out. Statistical analysis in R can be carried out with built-in functions that ship with a standard R installation. Functions such as mean, median, range, sum, diff, min and max are a few of the built-in functions for statistical analysis in R. When working with large data sets it is critical to determine the central tendency of a data set, i.e. to represent the whole dataset with one value. In this article, we will look at the mean, median and mode and see how they are used to determine the central tendency of a dataset.
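As a quick illustration of the built-in functions named above, the following sketch applies a few of them to a small vector. The vector v and its values are made up for this example and are not part of the article's data.

```r
# A small illustrative vector (values chosen only for this sketch)
v <- c(3, 8, 1, 10)

v_range <- range(v)  # smallest and largest value
v_sum   <- sum(v)    # total of all values
v_diff  <- diff(v)   # differences between consecutive elements
v_min   <- min(v)    # smallest value
v_max   <- max(v)    # largest value

print(v_range)  # 1 10
print(v_sum)    # 22
print(v_diff)   # 5 -7 9
print(v_min)    # 1
print(v_max)    # 10
```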


1. Mean

The mean is calculated to determine the average of a numerical variable in a data set. It is defined as the sum of all values in the collection divided by the total count of values in that collection.

For instance, the sample mean of a dataset of size n can be written as:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

  • n = size of the data set
  • \bar{X} = sample mean
  • X_i = numbers in the sequence
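The definition above translates directly into R as the sum of the values divided by their count. The sketch below uses a made-up vector named values to check that the manual calculation agrees with the built-in mean().

```r
# Illustrative data (not from the article)
values <- c(2, 4, 6, 8)

# Sum of all values divided by the number of values
manual_mean <- sum(values) / length(values)

# The built-in function computes the same quantity
builtin_mean <- mean(values)

print(manual_mean)   # 5
print(builtin_mean)  # 5
```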

Now let’s look at the basic syntax for determining the mean in R.

Syntax:

mean(x, trim = 0, na.rm = FALSE, ...)

In the above syntax, the mean is computed with the mean() function in R. x is the input vector where the data is stored, and na.rm is a logical argument that controls whether NA (missing) values are removed before the computation. It defaults to FALSE, in which case any NA in x makes the result NA. Further arguments such as trim, which drops a fraction of the observations from both ends of the sorted vector, can be supplied when determining the mean.
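To make the effect of na.rm and trim concrete, here is a small sketch. The vector scores is invented for illustration and contains one missing value and one extreme value.

```r
# Illustrative data with a missing value and an outlier (not from the article)
scores <- c(2, 4, 6, 8, 100, NA)

# With the default na.rm = FALSE, the NA propagates to the result
mean_default <- mean(scores)

# Dropping the NA first gives the mean of the five observed values
mean_no_na <- mean(scores, na.rm = TRUE)

# trim = 0.2 additionally drops 20% of the observations from each end
# of the sorted vector (here one value per end), leaving 4, 6 and 8
mean_trimmed <- mean(scores, na.rm = TRUE, trim = 0.2)

print(mean_default)  # NA
print(mean_no_na)    # 24
print(mean_trimmed)  # 6
```

The trimmed mean is less sensitive to the outlier 100 than the plain mean, which is the usual reason for supplying trim.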


Example:

In the below example, we will create a vector named temp and then use the vector to determine the mean using the mean() function.

# Creating a vector
temp <- c(12,9,6,4.1,19, 3, 44,-23,8,-3)
# to determine the mean
result.mean <- mean(temp)
print(result.mean)

Output:

[1] 7.91

2. Median

The median is the value below which fifty percent of the observations fall. To determine the median manually, one would sort the data and separate the lowest fifty percent from the highest fifty percent. For data sets with an odd number of observations, the middle value is the median. For data sets with an even number of observations, the median falls halfway between the two middle values.
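The odd and even cases described above can be checked by hand against the built-in median(). The vectors odd_values and even_values are made up for this sketch.

```r
# Illustrative data (not from the article)
odd_values  <- c(7, 1, 3)      # three observations
even_values <- c(7, 1, 3, 5)   # four observations

# Odd count: the median is the middle element of the sorted vector
manual_odd <- sort(odd_values)[2]

# Even count: the median is halfway between the two middle elements
manual_even <- mean(sort(even_values)[2:3])

print(manual_odd)           # 3
print(median(odd_values))   # 3
print(manual_even)          # 4
print(median(even_values))  # 4
```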

Syntax:

median(x, na.rm = FALSE, ...)

In the above syntax, the median is computed with the median() function in R. x is the input vector where the data is stored, and na.rm is a logical argument that controls whether NA (missing) values are removed before the computation. It defaults to FALSE, in which case any NA in x makes the result NA. As with mean(), further arguments can be passed on to methods through the ... argument.

Example:

x <- c(5,2,3,4,5,2,4,5,2,3,1,1,2,3,5,6) # our data set
median(x)

Output:

[1] 3

3. Mode

The mode is a summary statistic that is rarely used in practice but is generally included in any discussion of the mean and median. When the selected variable has discrete values, the mode is the value that occurs most frequently.

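Syntax:

Mode(x, na.rm = FALSE, ...)

Base R does not provide a function for the statistical mode; the built-in mode() function returns the storage mode (type) of an object instead. The syntax above therefore describes a Mode() function that is user-defined or supplied by an add-on package, with na.rm used to remove NA values before the mode is determined.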

Example:

# function to estimate the mode of a numeric variable:
# returns the location of the peak of the kernel density estimate
est_mode <- function(x) {
  den <- density(x)
  den$x[which.max(den$y)]
}
# creating a test data set
x <- c(5, 5, 6, 4, 4, 2, 3, 1, 5, 3)
est_mode(x)

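Output:

The call returns the position of the peak of the kernel density estimate. This is a smoothed estimate of the mode and need not coincide with any observed value.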
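For discrete data such as the test vector above, the most frequent value can also be found exactly by tabulating the values. The following is a minimal sketch; exact_mode is a hypothetical helper name introduced here, not a base R function and not part of the original example.

```r
# Hypothetical helper: returns the most frequent value(s) of a vector.
# If several values tie for the highest count, all of them are returned.
exact_mode <- function(x, na.rm = FALSE) {
  if (anyNA(x)) {
    if (!na.rm) return(NA)   # mirror mean()/median(): NA propagates by default
    x <- x[!is.na(x)]
  }
  counts <- table(x)                            # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]   # value(s) with the highest count
  # table() stores the values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(top) else top
}

x <- c(5, 5, 6, 4, 4, 2, 3, 1, 5, 3)
x_mode <- exact_mode(x)
print(x_mode)  # 5, which occurs three times
```

Unlike the density-based estimate, this version always returns a value that actually occurs in the data, which is usually what is wanted for discrete variables.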

Statistical Analysis on Dataset

In this section, we will look at how statistical analysis can be carried out on a dataset using R. For the purpose of illustration we will use the built-in dataset airquality. This dataset consists of multiple variables and includes NA (missing) values. We shall consider one of the variables and determine its mean, median and mode using R's built-in tools.

#Determining Mean, Median, and Mode using air quality dataset.
#To return the dimension of air quality dataset
dim(airquality)


# returning the first 6 rows (the default for head())
head(airquality)

# to return the structure of the data
str(airquality)


# display dataframe Summary
summary(airquality)
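Since the missing values determine whether na.rm is needed later, it can help to count them per column before computing any statistics. This is a small sketch using only base R; the variable name na_counts is introduced here for illustration.

```r
# Count the NA values in every column of the built-in airquality dataset
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Columns that contain at least one missing value
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)  # "Ozone" "Solar.R"
```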


# Extracting the Solar.R variable, from which the mean, median and mode are determined
x <- airquality$Solar.R
x

# to determine the mean, NA values need to be removed from the variable
x <- airquality$Solar.R
mean(x, na.rm = TRUE)

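# to determine the median; without na.rm = TRUE the result is NA
# because Solar.R contains missing values
x <- airquality$Solar.R
median(x)

# removing the NA values gives the median of the observed values
x <- airquality$Solar.R
median(x, na.rm = TRUE)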

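# to find the mode: tabulate the values and sort by frequency;
# the most frequent values appear at the end of the output
x <- airquality$Solar.R
sort(table(x))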
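Reading the mode off the end of a long sorted table is error-prone, so the most frequent value or values can be extracted programmatically instead. The sketch below is one possible way to do this; the names solar, solar_counts and solar_modes are introduced here for illustration.

```r
solar <- airquality$Solar.R

# table() ignores NA values by default, so no explicit removal is needed
solar_counts <- table(solar)

# Keep every value whose count equals the highest count (ties are all returned)
solar_modes <- as.numeric(names(solar_counts)[solar_counts == max(solar_counts)])

print(solar_modes)        # the most frequent Solar.R value(s)
print(max(solar_counts))  # how often they occur
```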

Conclusion

In this article, we have seen how statistical analysis can be performed in R using measures of central tendency: the mean, median and mode. We have discussed each individually, along with its syntax and a simple example. We have further seen running examples of performing statistical analysis on the airquality dataset.
