Introduction to Relational Database
A Relational Database is an assortment of relational data that are arranged in the form of tables, columns and rows, along with their relationship properties. Each table will have a uniquely identified key value, called as Primary key, which can be connected to other related tables with the help of a Foreign key from those tables. The data from the Relational database can be accessed, modified or managed by querying with the SQL (structured query language) programming language.
Accessing SQL data with R
R objects are managed within a single memory. With data analysis in R, data needs to be available in the form of data frames. The data frame is a two-dimensional array of heterogeneous data.
R can also help import data if it exists in the form of .csv. We can use functions like read.csv () and to export, we use to write.xxx (). It has the ability to save the data frames to the disk-like saveRDS (), save (), etc.
Why use a Relational database?
Relational databases in R can handle large amounts of data, maintains data consistency. By default, R will read all the data into memory before performing any analysis. Database servers will determine an efficient method to obtain results through queries.
As we seen above, we use data frame as a table, observation as row (tuple), variable as column (attribute), various ( subset, order, sort) as select statements.
R can connect to relational databases like MYSQL, Oracle and sql server. We use the data frames to fetch records. Once, we fetch the records it will be a dataset in R.
We can analyze and manipulate data according to our needs using powerful packages and functions.
R has an inbuilt package named “RMySQL” to provide connectivity to the MySQL database.
After the package is installed, we can create a connection and connect to the database.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'table1',
host = 'localhost')
# listing the tables in DB
With dbListTables, it displays all the tables in that particular database.
querying and getting the data
result = dbSendQuery(mysqlconnection, "select * from names")
Store the result in an R data frame object. n = 3 is used to fetch the first 3 rows.
data.frame = fetch(result, n = 3)
|Id||First _name||Last _name|
The output displays the first 3 results from the table ‘names’.
Filter and querying results:
We can filter the results by any column name and display results.
Let’s suppose, we want to display all the rows with last name being ‘Taylor’
result = dbSendQuery(mysqlconnection, "select * from actor where Last_name = 'Taylor'")
Fetch all the records and store them as a data frame.
data.frame = fetch(result, n = -1)
Updating the Rows:
It is quite easy to modify data in RMySQL.Let’s look at the query below to understand the same.
dbSendQuery(mysqlconnection, "update names set last_name = Cooper where First_name = Jenna")
Dropping Tables in RMySQL:
We can also drop tables using the drop query.
dbSendQuery(mysqlconnection, 'drop table if exists names')
We can do such several operations with the RMySQL package and the only requirement is to install the package and connect to the particular database. It is then very similar to SQL queries.
Using the Dpylr Package
This dplyr package is particularly useful when there is already data in the database or in-memory data.
If the data fits in the memory, then if that data is stored in the database. It becomes slower in accessing and retrieving data compared to in-memory data.
If we have to do data analysis in R, most of the data is present in the database. We can import the data into the local database through ‘dpylr’ tools.
In order to use the dpylr package. We need to install the package first.
In the below example, Let’s see how to install dpylr package.
This package allows us to connect to backend databases according to our needs.
|RMySQL||My SQL and Maria|
|RPostgreSQL||Postgres and Redshift|
Connecting to the database:
After installing dpylr, we need to connect to the specific db first.
To connect to a DB,we use dbconnect().
conn <- DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")
Dplyr makes it easier to write queries using verbs. Most of them can write SQL queries using the dplyr package. Dplyr also automatically generates sql queries.
names_db %>% select(Id,First_name,Last_name)
|Id||First _name||Last _name|
DBI is used to separate connectivity between frontend and backend through DBMS. Backend communicates with specific DBMS like SQLite, MySQL, etc. Also, it provides drivers to be invoked using methods.
This is a guide to the Relational Database. Here we discuss the Accessing SQL data with R and Why use a Relational database. You may also look at the following article to learn more –