Introduction to Scatterplot in R
- R is an open-source programming language for statistical computing and data analysis, and its popularity has grown along with that of data science. It is used mainly by statisticians and data miners to extract useful information from data. R is an interpreted language with a command-line interface, and several graphical user interfaces are available to make a developer’s work easier. It offers a large variety of libraries for statistical and graphical techniques. R produces static graphics and lets the user build a graph in layers, which yields publication-quality graphs that present information clearly.
- R offers many libraries for graphics, and the most popular is “ggplot2”. ggplot2 is an implementation of the “Grammar of Graphics”, which makes complex graphs simple to create. It provides a programmatic interface for specifying variables, their position, the colors of the graph, the type of graph and other visual properties. It lets you build graphs step by step, adding layers for flexibility and publication quality.
- One such type of graph is the scatterplot. A scatterplot, also called a scatter chart, is a type of graph that shows the relationship between two variables, with each data point drawn as a dot. It can be drawn between a continuous independent variable and a variable that depends on it, or between two continuous independent variables. Correlation can be positive, negative or null (no relationship). If the points trend from lower left to upper right, the correlation is positive: an increase in the value of one variable goes with an increase in the value of the other. If they trend from upper left to lower right, the correlation is negative; in other words, an increase in the value of one variable goes with a decrease in the value of the other.
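The idea of positive and negative correlation can be checked numerically. The following is a minimal sketch, assuming the built-in iris and mtcars datasets; the variable pairs and the names r_pos and r_neg are chosen here only for illustration and do not come from the article.

```r
# Positive correlation: petal length and petal width in the built-in iris data
r_pos <- cor(iris$Petal.Length, iris$Petal.Width)

# Negative correlation: car weight and fuel economy in the built-in mtcars data
r_neg <- cor(mtcars$wt, mtcars$mpg)

print(r_pos)  # close to +1: points trend from lower left to upper right
print(r_neg)  # clearly below 0: points trend from upper left to lower right

# The matching scatterplots show the two trends visually
plot(iris$Petal.Length, iris$Petal.Width)
plot(mtcars$wt, mtcars$mpg)
```

A coefficient near zero would correspond to the null case, where the dots show no visible trend.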
Syntax: There are many graphics packages in R, so there are many functions for creating a scatterplot. The most basic and simple function is
plot(x, y)
x denotes the horizontal axis or the independent continuous variable.
y denotes the vertical axis or the dependent variable.
The plot function accepts many other parameters that make the graph easier to understand.
Some of them are defined below:
- main: adds a title to the graph
- xlab: adds a label to the x-axis
- ylab: adds a label to the y-axis
- xlim: specifies the range of the x-axis
- ylim: specifies the range of the y-axis
- pch: indicates the shape of the points in the scatterplot
- cex: indicates the size of points
- col: defines the color of points
A scatterplot can also be created with the ggplot2 package. For this, we first need to install and load ggplot2. Once the package is loaded in the current session, the command below can be used to create a scatterplot.
ggplot(dataset, aes(x, y, color, shape)) + geom_point() + labs(x, y, title)
- dataset is the data frame for which the scatterplot is to be created.
- aes() is the aesthetic mapping. It describes how variables are mapped to visual properties of the graph.
- x is the horizontal axis or the independent continuous variable.
- y is the vertical axis or the dependent variable.
- color colors the points according to a grouping variable.
- shape sets the point shape according to a grouping variable.
- The + sign adds the next layer or component to the plot; placed at the end of a line, it also tells R that the command continues.
- geom_point() is the function that draws the points of a scatterplot.
- labs(x, y, title) adds an x-axis label, a y-axis label and a title to the graph.
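The general form above can be made concrete. The following is a minimal sketch, assuming ggplot2 is already installed and using the built-in iris dataset; the choice of Species as the grouping variable and the label text are assumptions made for illustration.

```r
library(ggplot2)

# Map sepal length and width to the axes; color and shape follow the species
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      color = Species, shape = Species)) +
  geom_point() +                      # draw one point per row of iris
  labs(x = "Sepal Length",            # x-axis label
       y = "Sepal Width",             # y-axis label
       title = "Width vs Length")     # plot title

print(p)  # display the plot
```

Storing the plot in a variable such as p makes it possible to add further layers later with +.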
Create Scatterplot In R
To create a scatterplot in R, we first need to load a dataset. Here we use the iris dataset that ships with R. First, load the dataset into the current session using the command below.
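The loading command itself is not shown in the text. The plotting commands that follow all use the built-in iris dataset, so a plausible sketch, assuming that dataset, is:

```r
# Load the built-in iris dataset into the current session
data(iris)
```

Built-in datasets are usually available without this call, but running it restores the original copy if a variable named iris was modified earlier in the session.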
Once the dataset is loaded, view the data to get a basic understanding of its columns and the type of data they hold, using the command below.
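The viewing command is likewise not shown. A minimal sketch, assuming the iris dataset used by the plotting commands in this section, could use any of the standard inspection functions:

```r
head(iris)     # first six rows
str(iris)      # column names, types and a preview of the values
summary(iris)  # per-column summary statistics
```

str() is often the quickest way to see that iris has four numeric measurement columns and one factor column, Species.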
After getting a basic understanding of the data, let’s create a simple scatterplot using the plot function:
plot(iris$Sepal.Length, iris$Sepal.Width, xlim = c(4.0, 9.0), ylim = c(2.0, 5.0))
Adding labels to make the graph readable:
plot(iris$Sepal.Length, iris$Sepal.Width, xlim = c(4.0, 9.0), ylim = c(2.0, 5.0), xlab = "Sepal Length", ylab = "Sepal Width", main = "Width vs Length")
Adding some more parameters to make the graph more attractive:
plot(iris$Sepal.Length, iris$Sepal.Width, xlim = c(4.0, 9.0), ylim = c(2.0, 5.0), xlab = "Sepal Length", ylab = "Sepal Width", main = "Width vs Length", pch = 8, cex = 1.5, col = 6)
Apart from these 2-D plots, matrix plots and 3-D plots can also be created in R.
When we have more than two variables in a dataset and we want to find a correlation of each variable with all other variables, then the scatterplot matrix is used. The most basic and simple command for scatterplot matrix is:
pairs(~Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris, main = "Scatterplot Matrix")
The resulting graph shows the pairwise relationships between Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.
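A scatterplot matrix shows the relationships visually; the matching numbers can be obtained with cor(). The following is a minimal sketch, assuming the four iris measurement columns used in the pairs() call; the name cor_matrix is chosen here for illustration.

```r
# Pairwise correlation coefficients for the four measurement columns of iris
cor_matrix <- cor(iris[, c("Sepal.Length", "Sepal.Width",
                           "Petal.Length", "Petal.Width")])

# Round for readability; each cell corresponds to one panel of the matrix plot
print(round(cor_matrix, 2))
```

The matrix is symmetric with ones on the diagonal, just as the panels of the plot mirror each other across the diagonal.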
Sometimes a 3-dimensional graph gives a better understanding of the data. Several packages are available for this, one of which is “scatterplot3d”. Below are the commands to install “scatterplot3d” and load it in the current session.
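The install and load commands are not shown in the text. A minimal sketch, assuming an internet connection and a configured CRAN mirror, is:

```r
# Install the package only if it is not already available
if (!requireNamespace("scatterplot3d", quietly = TRUE)) {
  install.packages("scatterplot3d")
}

# Load the package into the current session
library(scatterplot3d)
```

Installation is needed once per R installation, while library() has to be called in every new session.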
After loading the library, the command below creates a 3-D scatterplot.
scatterplot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, main = "3D Scatterplot")
Apart from this, there are many other ways to create a 3-dimensional scatterplot. Users can add details such as colors and titles to make the graph clearer. Users can also create an interactive 3-D scatterplot with the “plot3d(x, y, z)” function provided by the “rgl” package. This function creates a 3-D scatterplot that can be rotated with the mouse, giving a full view of the relationship between the variables.
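An interactive version might look like the following. This is a minimal sketch, assuming the rgl package is installed and a graphics device is available; in rgl the function is spelled plot3d (lowercase d), and the choice of columns, colors and spin settings here is an assumption made for illustration.

```r
library(rgl)

# Interactive 3-D scatterplot of three iris measurements, colored by species
plot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length,
       xlab = "Sepal Length", ylab = "Sepal Width", zlab = "Petal Length",
       col = as.integer(iris$Species), size = 5)

# Optional: spin the scene around the vertical axis for five seconds
play3d(spin3d(axis = c(0, 0, 1), rpm = 10), duration = 5)
```

After the spin ends, the plot can still be rotated and zoomed with the mouse.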
R is one of the most popular languages among data scientists for implementing graphical techniques. It provides a wide range of packages and libraries for graphics and for understanding data better. “ggplot2”, “ggvis”, “rgl”, “plot3D”, “lattice”, “animation”, “gganimate” and “Cairo” are some of the packages available for R.
A scatter plot is one of the simplest ways to understand data. With this visualization, the user can see how variables are related to each other and how a change in the value of one variable goes with a change in the value of another. The direction of the slope indicates whether the relationship between the variables is positive or negative.
This is a guide to Scatterplot in R. Here we discussed an introduction, how to create a scatterplot, scatterplot matrices and 3-D scatterplots, along with appropriate examples.