EDUCBA

DataSet in R

Definition of DataSet in R

A dataset in R is a collection of data, usually held in a data frame, that is either bundled with an R package or read in from an external source, and is then available for use in analysis. In today’s world of big data, it remains a challenge to find data that is clean and reliable and whose metadata is easy to interpret. RStudio is an Integrated Development Environment (IDE) for R that developers use to build statistical models and graphics through programming.

Datasets belong to R and its packages rather than to RStudio itself, so they work the same way in whichever edition of RStudio one uses. Two editions are available, RStudio Desktop and RStudio Server. The description of the datasets below is edition agnostic and hence applies to either one.

How to Read DataSet into R?

A dataset can come from one of two sources, each with its own way of being read. The first is a dataset that ships inside a package, which the developer can access directly. The second is a dataset held in a raw format such as Excel, CSV or a database. Here we look at each in turn. For packaged datasets we cover a limited number of examples, although many more exist. Essentially, we look at datasets suited to classification problems and to regression problems.

From the pre-defined dataset in the package:

Many of the datasets that ship in R packages originate from the repository named “UCI Machine Learning”. These datasets are popular because of the following properties:

  • One can download the dataset fast.
  • The datasets are small and hence can fit into memory.
  • The datasets are mostly cleaned, so the data cleaning process can largely be skipped and one can quickly move on to running algorithms on them.

These packages are distributed through the Comprehensive R Archive Network (CRAN), from which developers can conveniently download third-party libraries and keep them installed in their local R library for use in projects.

Let us look at some of the datasets that are most popular among data science practitioners.

1. Datasets Library

This library comes loaded with the base version of R, and hence there is no need to install or load it. Various datasets come as part of this bundle. One way to see the datasets available in this library is by executing the following command.

Code:

library(help = "datasets")
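
The command above opens a help listing. Where the list is needed as a value that code can work with, one option is the base function data() called with a package argument. The following is a minimal sketch; the variable name available is only illustrative.

```r
# List the datasets shipped in the 'datasets' package as a character matrix.
available <- data(package = "datasets")$results

# Each row describes one dataset: 'Item' is its name, 'Title' a short description.
head(available[, c("Item", "Title")])
```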

2. Iris Dataset

This dataset contains measurements of Iris flowers together with the variety each flower belongs to. There are 3 varieties, described through 4 features, namely Sepal length, Sepal width, Petal length and Petal width. Loading the dataset can be performed by executing the following command.

Code:

data(iris)

This dataset is widely used for trying out algorithms for multi-class classification problems.
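
After loading, a quick look at the structure confirms the four measurement columns and the class column. This is a small sketch using only base R functions:

```r
# Load the dataset into the workspace (it ships with base R).
data(iris)

# Show column names, types and the first few values.
str(iris)

# Count how many flowers belong to each variety (the 'Species' column).
table(iris$Species)
```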

3. Longley’s Economic Dataset

This dataset contains the number of people employed in each year alongside various economic indicators. There are 6 such indicators, and the number of people employed is given in the column named “Employed”, so one can predict the number of people that might be employed in some given year on the basis of the economic indicators. Loading the dataset can be performed by executing the following command.

Code:

data(longley)

This dataset is widely used for trying out algorithms for regression problems.
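
As an illustration of the regression use case, the sketch below fits an ordinary linear model of “Employed” on all other columns. The name fit is only illustrative, and since the indicators in this dataset are strongly correlated with each other, the individual coefficients should be read with care.

```r
# Load the dataset (it ships with base R).
data(longley)

# Regress the number of people employed on all remaining columns.
fit <- lm(Employed ~ ., data = longley)

# Print coefficients, standard errors and overall fit statistics.
summary(fit)
```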

4. mlbench Library

This library comprises data for various real-world benchmark problems. One can install the library by executing the command.

Code:

install.packages("mlbench")

Loading the library can be done by executing the command.

Code:

library(mlbench)

Similar to the datasets library, one can execute the following code to get a list of all the datasets in the mlbench library.

Code:

library(help = "mlbench")

5. Boston Housing Dataset

This dataset, part of the mlbench library, contains house prices in the Boston area together with the 13 features available for predicting them. With mlbench loaded, the dataset can be loaded by executing the following command.

Code:

data(BostonHousing)

This dataset is widely used for trying out algorithms for regression problems.

6. Diabetes Dataset for Pima Indians (Female)

This dataset, also part of the mlbench library, records whether diabetes is present in female Pima Indian patients, together with 8 personal attributes like glucose, pressure, etc. With mlbench loaded, the dataset can be loaded by executing the following command.

Code:

data(PimaIndiansDiabetes)

This dataset is widely used for trying out algorithms for binary classification problems.
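
The two mlbench datasets can be loaded and inspected as in the sketch below. It assumes mlbench may or may not be installed, so it checks first instead of failing; the variable name has_mlbench is only illustrative.

```r
# Check whether mlbench is installed without attaching it.
has_mlbench <- requireNamespace("mlbench", quietly = TRUE)

if (has_mlbench) {
  # Load both datasets directly from the package.
  data("BostonHousing", package = "mlbench")
  data("PimaIndiansDiabetes", package = "mlbench")

  # Rows and columns of the housing data.
  print(dim(BostonHousing))

  # Class balance of the binary outcome column 'diabetes'.
  print(table(PimaIndiansDiabetes$diabetes))
} else {
  message("mlbench is not installed; run install.packages(\"mlbench\") first.")
}
```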

7. AppliedPredictiveModeling Library

This library comprises data used in the well-known book Applied Predictive Modeling. One can install the library by executing the command.

Code:

install.packages("AppliedPredictiveModeling")

Loading the library can be done by executing the command:

Code:

library(AppliedPredictiveModeling)

Similar to the datasets library, one can execute the following code to get a list of all the datasets in the AppliedPredictiveModeling library:

Code:

library(help = "AppliedPredictiveModeling")

From Raw Format Data File

Outside of packages, datasets are mostly present in some raw format such as CSV or Excel.

Below we look at how to load a dataset from each of these formats.

CSV File:

df_csv <- read.csv("<name and extension of file>")
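
To try this without a file at hand, the sketch below first writes the built-in iris data to a temporary CSV file and then reads it back. The temporary file and the name csv_path are assumptions made only for the illustration; in practice the path would point to an existing file.

```r
# Create a throwaway CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read the file into a data frame; text columns are converted to factors.
df_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

# Inspect the result.
str(df_csv)

# Remove the temporary file.
unlink(csv_path)
```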

Excel files (most popular way, using read.xlsx from the xlsx package):

df_excel <- read.xlsx("<name and extension of file>", sheetIndex = <index of the sheet that needs to be loaded>)
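
The sketch below assumes the xlsx package, which provides a read.xlsx function with a sheetIndex argument and needs a working Java installation. Because the package may not be installed, the code checks for it first. The temporary workbook and the names has_xlsx and xlsx_path exist only for the illustration.

```r
# Check whether the xlsx package is installed and can be loaded.
has_xlsx <- requireNamespace("xlsx", quietly = TRUE)

if (has_xlsx) {
  # Write a small workbook so there is something to read back.
  xlsx_path <- tempfile(fileext = ".xlsx")
  xlsx::write.xlsx(head(iris, 10), xlsx_path, sheetName = "iris", row.names = FALSE)

  # Read the first sheet of the workbook into a data frame.
  df_excel <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)
  print(dim(df_excel))

  # Remove the temporary file.
  unlink(xlsx_path)
} else {
  message("xlsx is not installed; run install.packages(\"xlsx\") first.")
}
```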

Conclusion

In this article we have looked at the most popular datasets available in R and at how to read data from raw files. One can easily explore the other datasets in the libraries mentioned by consulting the documentation of each library.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.