DataSet in R

Updated March 15, 2023

Definition of DataSet in R

In R, a dataset is a collection of data, usually held as a data frame. It is either bundled with a package or imported from an external source such as a CSV file, an Excel workbook or a database. Finding data that is clean and reliable, with metadata that is easy to interpret, has always been a challenge. The datasets bundled with R packages go a long way toward meeting that need. RStudio, the environment used in this article, is an integrated development environment (IDE) for R, the programming language for statistical computing and graphics.

RStudio comes in two editions: RStudio Desktop and RStudio Server. The datasets described here belong to R itself rather than to RStudio. Everything below therefore applies to either edition, and also to a plain R session.

How to Read DataSet into R?

A dataset can be of two types, and each is read in its own way. The first is a dataset bundled with a package, which the developer can access directly from R. The second is data held in a raw format, such as an Excel file, a CSV file or a database. We look at each in turn. For the packaged datasets we cover only a handful of examples, but many more are available. The examples are chosen to illustrate classification and regression problems.

From a predefined dataset in a package:

Many of the datasets bundled with R packages, such as iris and several of those in mlbench, originally come from the UCI Machine Learning Repository. These datasets are popular because of the following properties:

  • They can be downloaded quickly.
  • They are small and therefore fit in memory.
  • They are mostly clean. The data cleaning step can largely be skipped, and one can start running algorithms on them right away.

The packages that contain these datasets are distributed through the Comprehensive R Archive Network (CRAN). They can be installed from CRAN with install.packages() and then loaded into a session with library().

Let us look at some of the datasets best known to data science practitioners.

1. Datasets Library

The datasets package comes with base R and is attached automatically at startup, so there is no need to load it. It contains a large number of datasets. One way to see which datasets are available in this package is to run the following command.

Code:

library(help = "datasets")
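If you need the list as an R object rather than as a help page, base R's data() function can also return it. This is a minimal sketch; the variable name ds is illustrative only.

```r
# data(package = ...) returns a "packageIQR" object; its $results
# component is a character matrix with one row per dataset
ds <- data(package = "datasets")$results

# Show the name and short description of the first few datasets
head(ds[, c("Item", "Title")])

# Count how many datasets the package provides
nrow(ds)
```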

2. Iris Dataset

This dataset contains measurements of 150 iris flowers from three species (setosa, versicolor and virginica), with 50 flowers per species. Each flower is described by four features: sepal length, sepal width, petal length and petal width. Load the dataset with the following command.

Code:

data(iris)

This dataset is widely used to try out multi-class classification algorithms.
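Before running a classifier, it is worth checking the shape of the data and the class balance. The sketch below uses only base R functions. It is one reasonable first look at the data, not a prescribed workflow.

```r
data(iris)

# Structure: 150 rows, four numeric measurements plus the Species factor
str(iris)

# The three classes are balanced, with 50 flowers each
table(iris$Species)

# Mean of each measurement per species, a quick look at how the classes differ
aggregate(. ~ Species, data = iris, FUN = mean)
```

Because the classes are balanced, plain accuracy is a reasonable first metric for a classifier on this data.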

3. Longley’s Economic Dataset

This dataset contains 16 yearly observations (1947 to 1962) of seven US macroeconomic variables. The column named “Employed” holds the number of people employed. The other six variables (GNP.deflator, GNP, Unemployed, Armed.Forces, Population and Year) can be used to predict it, for example to estimate employment in a given year from that year's economic indicators. Load the dataset with the following command.

Code:

data(longley)

This dataset is widely used to try out regression algorithms.
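As a sketch of the regression use case, the code below fits an ordinary least-squares model with base R's lm(). It regresses Employed on all six other columns. The choice of a plain linear model is ours, for illustration.

```r
data(longley)

# 16 yearly observations of 7 variables
dim(longley)

# Regress Employed on the six other economic indicators
fit <- lm(Employed ~ ., data = longley)
summary(fit)

# Fitted employment for the observed years (in-sample predictions)
head(predict(fit, newdata = longley))
```

The six predictors are strongly correlated with one another, so the individual coefficients are unstable. Longley originally published this data to test the numerical accuracy of least-squares software.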

4. mlbench Library

This package contains a collection of artificial and real-world machine learning benchmark problems, including several datasets from the UCI repository. It is not part of base R, so install it from CRAN by executing the command.

Code:

install.packages("mlbench")

Loading the library can be done by executing the command.

Code:

library(mlbench)

Similar to the datasets package, one can execute the following code to get a list of all the datasets in mlbench.

Code:

library(help = "mlbench")

5. Boston Housing Dataset

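This dataset contains housing data for 506 census tracts in the Boston area. The target is the median value of owner-occupied homes (column medv, in thousands of dollars), and there are 13 features such as the per-capita crime rate and the average number of rooms. Because the dataset belongs to mlbench, load that package first as shown above. Then load the dataset with the following command.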

Code:

data(BostonHousing)

This dataset is widely used to try out regression algorithms.
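The following is a minimal regression sketch. It assumes mlbench is installed and uses a random 80/20 train/test split, a linear model and RMSE; all three are our own choices for illustration. The seed value 42 is arbitrary.

```r
library(mlbench)
data(BostonHousing)

# 506 census tracts: 13 predictors plus the target medv
dim(BostonHousing)

# Hold out roughly 20% of the rows to check the model on unseen data
set.seed(42)
test_idx <- sample(nrow(BostonHousing), size = round(0.2 * nrow(BostonHousing)))
train <- BostonHousing[-test_idx, ]
test  <- BostonHousing[test_idx, ]

# Linear regression of median home value on all other columns
fit <- lm(medv ~ ., data = train)

# Root mean squared error on the held-out rows (in thousands of dollars)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$medv - pred)^2))
rmse
```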

6. Diabetes Dataset for Pima Indians (Female)

This dataset records whether diabetes is present in 768 women of Pima Indian heritage, along with 8 personal attributes such as plasma glucose concentration and blood pressure. It is also part of mlbench, so load that package first. Then load the dataset with the following command.

Code:

data(PimaIndiansDiabetes)

This dataset is widely used to try out binary classification algorithms.
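To illustrate the binary classification use case, here is a sketch that fits a logistic regression with base R's glm(). It assumes mlbench is installed. The 0.5 probability cut-off and the in-sample accuracy measure are illustrative choices, not recommendations.

```r
library(mlbench)
data(PimaIndiansDiabetes)

# 768 rows: 8 predictors plus the factor target 'diabetes' (levels "neg", "pos")
str(PimaIndiansDiabetes)

# Logistic regression; glm() models the probability of the second level, "pos"
fit <- glm(diabetes ~ ., data = PimaIndiansDiabetes, family = binomial)

# Classify with a 0.5 probability cut-off and compute in-sample accuracy
prob <- predict(fit, type = "response")
pred <- ifelse(prob > 0.5, "pos", "neg")
accuracy <- mean(pred == PimaIndiansDiabetes$diabetes)
accuracy
```

In this version of the data, zeros in columns such as glucose, pressure and mass stand for missing values. mlbench also provides PimaIndiansDiabetes2, in which those zeros are recoded as NA.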

7. AppliedPredictiveModeling Library

This package contains the datasets used in the book Applied Predictive Modeling by Max Kuhn and Kjell Johnson. One can install it by executing the command.

Code:

install.packages("AppliedPredictiveModeling")

Loading the library can be done by executing the command:

Code:

library(AppliedPredictiveModeling)

Similar to the datasets package, one can execute the following code to get a list of all the datasets in AppliedPredictiveModeling:

Code:

library(help = "AppliedPredictiveModeling")

From Raw Format Data File

Outside of packages, datasets usually come as raw files, such as CSV or Excel files.

Below we see how to load a dataset from each of these formats.

CSV File:

df_csv <- read.csv("<name and extension of file>")
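The line above uses a placeholder file name. To make the example runnable anywhere, this sketch first writes a few rows of iris to a temporary file created with tempfile(). It then reads that file back with read.csv().

```r
# Write a small data frame to a temporary CSV file, then read it back
path <- tempfile(fileext = ".csv")
write.csv(head(iris), path, row.names = FALSE)

# read.csv assumes a header row and comma separators by default
df_csv <- read.csv(path)

# Check what was loaded: column names and types
str(df_csv)
```

Since R 4.0.0, text columns such as Species are read as character rather than factor unless stringsAsFactors = TRUE is passed. For files that use semicolons as separators, read.csv2() is the matching variant.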

Excel files (using the xlsx package):

library(xlsx)  # install.packages("xlsx") first; this package requires Java
df_excel <- read.xlsx("<name and extension of file>", sheetIndex = <index of the sheet that needs to be loaded>)
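An alternative that does not require Java is the readxl package. The original article does not use it, so treat this as a swapped-in option. The sketch assumes readxl is installed, and it reads an example workbook bundled with readxl so that no external file is needed.

```r
library(readxl)

# readxl bundles example workbooks; datasets.xlsx has one sheet per dataset
path <- readxl_example("datasets.xlsx")
excel_sheets(path)

# Read a sheet by name (a position such as sheet = 1 also works); returns a tibble
df_excel <- read_excel(path, sheet = "iris")
dim(df_excel)
```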

Conclusion

In this article we looked at some of the most popular datasets available in R, and at how to load data from CSV and Excel files. The other datasets in these packages are described in each package's documentation. You can open it with commands such as ?iris or library(help = "mlbench").
