EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials R Programming Tutorial Survival Analysis in R
Secondary Sidebar
R programming Tutorial
  • Advanced
    • Statistical Analysis with R
    • R String Functions
    • Data Exploration in R
    • R CSV Files
    • KNN Algorithm in R
    • Sorting in R
    • lm Function in R
    • Hierarchical Clustering in R
    • R Normal Distribution
    • Binomial Distribution in R
    • Decision Tree in R
    • GLM in R
    • Arima Model in R
    • Linear Model in R
    • Predict Function in R
    • Survival Analysis in R
    • Standard Deviation in R
    • Statistical Analysis in R
    • Predictive Analysis?in R
    • T-test in R
    • Database in R
  • Basic
    • What is R Programming Language
    • Careers in R Programming
    • Install R
    • List of R Packages
    • Introduction of R Tools Technology
    • R Programming Language
    • DataSet in R
    • What is RStudio?
    • R-studio-Functions
    • R Packages
    • Time series?in R
    • R Data Types
    • R for data science
    • R Operators
    • R Data Frame
    • R Analytics Tool
    • R Tree Package
    • Vectors in R
  • Control statement
    • If Statement in R
    • If Else Statement in R
    • Else if in R
    • Switch Statement in R
  • Loops
    • Loops in R
    • For Loop in R
    • Nested For Loop in R
    • While Loop in R
    • Next in R
  • Chart/graphs
    • Graphs in R
    • Bar Charts in R
    • Pie Chart in R
    • Histogram in R
    • Line Graph in R
    • Plot Function in R
    • Scatterplot in R
    • R Boxplot labels
  • Regression in R
    • Simple Linear Regression in R
    • Linear Regression in R
    • Multiple Linear Regression in R
    • Logistic Regression in R
    • Poisson Regression in R
    • OLS Regression in R
    • P-Value in Regression
  • Anova in R
    • ANOVA in R
    • One Way ANOVA in R
    • Two Way ANOVA in R
  • Data Structure
    • R list
    • Arrays in R
    • Data Frames in R
    • Factors in R
    • R Vectors
  • Programs
    • Functions in R
    • Boxplot in R
    • R Program Functions
    • Factorial in R
    • Random Number Generator in R
  • Interview question
    • R Interview Questions

Related Courses

R Programming Certification Course

Statistical Analysis Course Training

All in One Data Science Courses

Survival Analysis in R

By Priya PedamkarPriya Pedamkar

survival analysis in r

Introduction to Survival Analysis in R

Survival Analysis in R is used to estimate the lifespan of a particular population under study. It is also called ‘​ Time to Event Analysis’ as the goal is to predict the time when a specific event is going​ to occur. It is also known as the time to death analysis or failure time analysis. As an example, we can consider predicting a time of death of a person or predict the lifetime of a machine. For any company perspective, we can consider the birth event as the time when an employee or customer joins the company and the respective death event as the time when an employee or customer leaves that company or organization.

Data: Survival datasets are Time to event data that consists of distinct start and end time. In real-time datasets, all the samples do not start at time zero. A sample can enter at any point of time for study. So subjects are brought to the common starting point at time t equals zero (t=0). When the data for survival analysis is too large, we need to divide the data into groups for easy analysis.

Survival analysis is of major interest for clinical data. With the help of this, we can identify the time to events like death or recurrence of some diseases. It is useful for the comparison of two patients or groups of patients. The data can be censored. The term “censoring” means incomplete data. Sometimes a subject withdraws from the study and the event of interest has not been experienced during the whole duration of the study. In this situation, when the event is not experienced until the last study point, that is censored.

Necessary Packages

The necessary packages for survival analysis in R are “survival” and “survminer”. For these packages, the version of R must be greater than or at least 3.4.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

  • Survival: For computing survival analysis
  • Survminer​: For summarizing and visualizing the results of survival analysis.

The package names “survival” contains the function Surv(). This function creates a survival object. The function survfit() is used to create a plot for analysis.

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,527 ratings)

The basic syntax in R for creating survival analysis is as below:

Surv(time,event)
survfit(formula)

Where,

Time​ is the follow-up time until the event occurs. the event​ indicates the status of the occurrence of the expected event. the formula​ is the relationship between the predictor variables.

Types of Survival Analysis in R

There are two methods mainly for survival analysis:

  • Kaplan-Meier Method and Log Rank Test: This method can be implemented using the function survfit()​​ and plot()​​ is used to plot the survival object. The function ggsurvplot()​​ can also be used to plot the object of survfit.
  • Cox Proportional Hazards Models coxph(): This function is used to get the survival object and ggforest()​​ is used to plot the graph of survival object. This is a forest plot.

Implementation of Survival Analysis in R

First, we need to install these packages.

install.packages(“survival”)
install.packages(“survminer”)

To fetch the packages, we import them using the library() function.

library(“survival”)
library(“survminer”)

Let’s load the dataset and examine its structure. For survival analysis, we will use the ovarian dataset. To load the dataset we use data() function in R.

data(“ovarian”)

The ovarian dataset comprises of ovarian cancer patients and respective clinical information. It also includes the time patients were tracked until they either died or were lost to follow-up, whether patients were censored or not, patient age, treatment group assignment, presence of residual disease and performance status.

To inspect the dataset, let’s perform head(ovarian), which returns the initial six rows of the dataset.

head(ovarian)​​

Output:

Survival Analysis in R-1.1

Here, the columns are- futime​ – survival times fustat​ – whether survival time is censored or not age ​- age of patient rx​ – one of two therapy regimes resid.ds​ – regression of tumors ecog.ps​ – performance of patients according to standard ECOG criteria. Here as we can see, age is a continuous variable. So this should be converted to a binary variable. What should be the threshold for this? Let’s compute its mean, so we can choose the cutoff.

mean(ovarian$age)

Its value is equal to 56. Here taking 50 as a threshold. We will consider for age>50 as “old” and otherwise as “young”.

ovarian <- ovarian %>% mutate(ageGroup = ifelse(age >=50, "old","young"))
ovarian$ageGroup <- factor(ovarian$ageGroup)

Now we will use Surv() function and create survival objects with the help of survival time and censored data inputs.

survObj <- Surv(time = ovarian$futime, event = ovarian$fustat)
survObj

Output:

Survival Analysis in R-1.2

Here the “+” sign appended to some data indicates censored data.

Now to fit Kaplan-Meier curves to this survival object we use function survfit(). We can stratify the curve depending on the treatment regimen ‘rx’ that were assigned to patients. summary() of survfit object shows the survival time and proportion of all the patients.

survFit1 <- survfit(survObj ~ rx, data = ovarian)
summary(survFit1)

Output:

Survival Analysis in R-1.3

To view the survival curve, we can use plot() and pass survFit1 object to it. legend() function is used to add a legend to the plot.

plot(survFit1, main = "K-M plot for ovarian data", xlab="Survival time", ylab="Survival probability", col=c("red", "blue"))
legend('topright', legend=c("rx = 1","rx = 2"), col=c("red","blue"), lwd=1)

Output:

Output-1.4

Now let’s take another example from the same data to examine the predictive value of residual disease status.

survFit2 <- survfit(survObj ~ resid.ds, data = ovarian)
plot(survFit2, main = "K-M plot for ovarian data", xlab="Survival time", ylab="Survival probability", col=c("red", "blue"))
legend('topright', legend=c("resid.ds = 1","resid.ds = 2"), col=c("red", "blue"), lwd=1)

Output:

Output-1.5

Here as we can see, the curves diverge quite early. Here considering resid.ds=1 as less or no residual disease and one with resid.ds=2 as yes or higher disease, we can say that patients with the less residual disease are having a higher probability of survival. Now let’s do survival analysis using ​the Cox Proportional Hazards method. Using coxph()​​ gives a hazard ratio (HR). If HR>1 then there is a high probability of death and if it is less than 1 then there is a low probability of death.

survCox <- coxph(survObj ~ rx + resid.ds + age_group + ecog.ps, data = ovarian)
ggforest(survCox, data = ovarian)

First, we need to change the labels of columns rx, resid.ds, and ecog.ps, to consider them for hazard analysis.

ovarian$rx <- factor(ovarian$rx, levels = c("1", "2"), labels = c("A", "B"))
ovarian$resid.ds <- factor(ovarian$resid.ds, levels = c("1", "2"),
labels = c("no", "yes"))
ovarian$ecog.ps <- factor(ovarian$ecog.ps, levels = c("1", "2"), labels = c("good", "bad"))

Output:

Output-1.6

Here we can see that the patients with regime 1 or “A” are having a higher risk than those with regime “B”. Similarly, the one with younger age has a low probability of death and the one with higher age has higher death probability.

Recommended Articles

This is a guide to Survival Analysis in R. Here we discuss the basic concept with necessary packages and types of survival analysis in R along with its implementation. You may also look at the following articles to learn more –

  1. Multiple Linear Regression in R
  2. Example of Two Way ANOVA in R
  3. How Else if Statement Works in R?
  4. Examples of lm Function in R
Popular Course in this category
R Programming Training (13 Courses, 20+ Projects)
  13 Online Courses |  20 Hands-on Projects |  120+ Hours |  Verifiable Certificate of Completion
4.5
Price

View Course

Related Courses

Statistical Analysis Training (15 Courses, 10+ Projects)4.9
All in One Data Science Bundle (360+ Courses, 50+ projects)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more