Updated March 20, 2023
Introduction to Scatterplot in R
- R is an open-source programming language for statistical computing and data analysis, and its popularity has grown along with that of data science. Statisticians and data miners use it to extract useful information from data. R is an interpreted language with a command-line interface, and several graphical user interfaces are available to make a developer’s job easier. It offers a large collection of libraries for statistical and graphical techniques. Its graphics are static, and a plot can be built up in layers, which makes it possible to produce publication-quality graphs that represent information clearly.
- R has many libraries for graphics, of which “ggplot2” is the most popular. ggplot2 is an implementation of the “Grammar of Graphics”, which makes complex graphs simple to create. It provides a programmatic interface for specifying variables, their positions, colors, the type of graph and other visual properties. It lets you build a graph step by step, adding layers for flexibility and publication quality.
- One such type of graph is the scatterplot. A scatterplot, also called a scatter chart, is a type of graph that shows the correlation between two variables. It displays the data points as dots. It can be drawn for a continuous independent variable and a variable that depends on it, or for two continuous independent variables. Correlation can be positive, negative or null. If the points slope from the lower left to the upper right, the correlation is positive. If they slope from the upper left to the lower right, the correlation is negative; in other words, an increase in the value of one variable goes with a decrease in the value of the other. If the points show no overall slope, the correlation is null.
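The three kinds of correlation described above can be illustrated with simulated data. This is a minimal sketch: the seed, the coefficients and the noise level are arbitrary values chosen for illustration and do not come from any real dataset.

```r
# Simulated data; the seed, slopes and noise level are arbitrary assumptions
set.seed(42)
x <- seq(1, 50)
y_pos  <-  2 * x + rnorm(50, sd = 5)   # tends to rise as x rises
y_neg  <- -2 * x + rnorm(50, sd = 5)   # tends to fall as x rises
y_none <- rnorm(50, sd = 5)            # generated independently of x

# Pearson correlation coefficient for each case
r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_none <- cor(x, y_none)

# Draw the three cases side by side, then restore the plot layout
old_par <- par(mfrow = c(1, 3))
plot(x, y_pos,  main = "Positive")
plot(x, y_neg,  main = "Negative")
plot(x, y_none, main = "Null")
par(old_par)
```

The first panel should slope from the lower left to the upper right, the second from the upper left to the lower right, and the third should show no clear direction.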
Syntax: R has many packages for graphs, and therefore many functions for creating a scatterplot. The most basic and simple function is
plot(x, y)
where
x denotes the horizontal axis or the independent continuous variable.
y denotes the vertical axis or the dependent variable.
The plot function accepts many other parameters that make the graph easier to understand.
Some of them are defined below:
- main: adds a title to the graph
- xlab: adds a label to the x-axis
- ylab: adds a label to the y-axis
- xlim: specifies the range of the x-axis
- ylim: specifies the range of the y-axis
- pch: indicates the shape of points in the scatter plot
- cex: indicates the size of points
- col: defines the color of points
A scatterplot in R can also be created with the ggplot2 package. For this, we first need to install and load ggplot2. Once the package is loaded into the current session, the command below can be used to create a scatterplot.
ggplot(dataset, aes(x, y, color, shape)) + geom_point() + labs(x, y, title)
where
- dataset is the data frame for which a scatterplot needs to be created.
- aes() is the aesthetic mapping. It describes how variables are mapped onto the graph.
- x is the horizontal axis or the independent continuous variable.
- y is the vertical axis or the dependent variable.
- color colors the points according to a grouping variable.
- shape sets the point shape according to a grouping variable.
- the + sign adds the next layer or component to the plot, so the command continues after it.
- geom_point() is the function that draws the points of a scatter plot.
- labs(x, y, title): adds an x label, a y label and a title to the graph.
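As a concrete instance of the template above, the following sketch plots the built-in iris dataset. It assumes ggplot2 is already installed (for example via install.packages("ggplot2")); the choice of columns, the labels and the title are illustrative only.

```r
# Assumes the ggplot2 package is installed; load it into the session
library(ggplot2)

# Map sepal length and width to the axes, and species to color and shape
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      color = Species, shape = Species)) +
  geom_point() +
  labs(x = "Sepal Length", y = "Sepal Width", title = "Width vs Length")

# Printing the plot object draws it on the active graphics device
print(p)
```

Because color and shape are mapped to Species inside aes(), ggplot2 adds a legend for the grouping variable automatically.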
Create Scatterplot In R
To create a scatterplot in R, we first need to load a dataset. Here we use the iris dataset provided by R. First, load the dataset into the current session using the command below:
data(iris)
Once the dataset is loaded, view it to get a basic understanding of the columns and the type of data they hold, using the command below:
iris
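Printing the whole data frame shows all 150 rows. As a shorter alternative, the sketch below uses base R functions to look at the first rows, the column types and a per-column summary.

```r
data(iris)

head(iris)      # first six rows
str(iris)       # structure: four numeric columns and one factor (Species)
summary(iris)   # minimum, quartiles, mean and maximum for each column

# Number of rows and columns
dims <- dim(iris)
```

The ranges reported by summary() are a convenient guide when choosing xlim and ylim for a plot.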
After getting a basic understanding of the data, let's create a simple scatterplot using the plot function:
plot(iris$Sepal.Length, iris$Sepal.Width, xlim = c(4.0, 9.0), ylim = c(2.0, 5.0))
Add labels to make the graph readable:
plot(iris$Sepal.Length, iris$Sepal.Width, xlim = c(4.0, 9.0), ylim = c(2.0, 5.0), xlab = "Sepal Length", ylab = "Sepal Width", main = "Width vs Length")
Add a few more parameters to make the graph more attractive:
plot(iris$Sepal.Length, iris$Sepal.Width, xlim = c(4.0, 9.0), ylim = c(2.0, 5.0), xlab = "Sepal Length", ylab = "Sepal Width", main = "Width vs Length", pch = 8, cex = 1.5, col = 6)
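The col parameter also accepts one color per point, which makes it possible to color the points by group in base R. The sketch below colors by species; the three color names and the legend position are arbitrary choices for illustration.

```r
# One color per species level; the colors themselves are an arbitrary choice
species_cols <- c("red", "darkgreen", "blue")

# Look up a color for every row from the integer code of its species
point_cols <- species_cols[as.integer(iris$Species)]

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = point_cols, pch = 19,
     xlab = "Sepal Length", ylab = "Sepal Width",
     main = "Width vs Length by Species")

# Legend linking each color to its species
legend("topright", legend = levels(iris$Species),
       col = species_cols, pch = 19)
```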
Apart from these 2-D plots, matrix plots and 3-D plots can also be created in R.
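A matrix plot can be sketched with the base pairs function, which draws a scatterplot for every pair of columns. The example below assumes the four numeric iris columns are the ones of interest; the title and colors are illustrative. 3-D plots need an add-on package and are not shown here.

```r
# Keep only the four numeric measurement columns
num_cols <- iris[, 1:4]

# One scatterplot per pair of columns, colored by species
# (integer codes 1 to 3 shifted to palette colors 2 to 4)
pairs(num_cols,
      main = "Iris scatterplot matrix",
      pch = 19,
      col = as.integer(iris$Species) + 1)
```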
Conclusion
R is one of the most popular languages among data scientists for implementing graphical techniques. It has a wide range of packages and libraries for graphics that support a better understanding of data. “ggplot2”, “ggvis”, “rgl”, “plot3D”, “lattice”, “animation”, “gganimate” and “Cairo” are some of the packages available for R.
A scatter plot is one of the simplest ways to understand how two variables are related. With this visualization, users can see how a change in the value of one variable goes along with a change in the value of another. The slope of the chart indicates whether the relationship between the variables is positive or negative.
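The sign of the slope can also be checked numerically. As a minimal sketch, the code below fits a straight line with lm and overlays it with abline; the choice of the iris petal columns and the line color are assumptions made for illustration.

```r
# Fit a straight line: petal width as a function of petal length
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

plot(iris$Petal.Length, iris$Petal.Width,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "Petal Width vs Petal Length")

# Overlay the fitted line on the scatterplot
abline(fit, col = "red", lwd = 2)

# A positive slope and a positive correlation indicate a positive relationship
slope <- coef(fit)[["Petal.Length"]]
r <- cor(iris$Petal.Length, iris$Petal.Width)
```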