Updated March 27, 2023
Introduction to Database in R
R can easily connect to a number of relational databases such as MySQL, Oracle, SQL Server, etc. and retrieve records from them as a data frame. Once the data exists in the R environment, it becomes a normal R data set and can be controlled or analyzed using all efficient packages and functions. Data are Relational database systems which are kept in a structured format.
Further, we will be using MySql as our reference database for connecting to R.
Manipulation of Datasets in R
- Data can be imported using .csv files
- Data can be built inside workspace itself
- Data can be imported from any database
Data fetched from any source of import into the R environment is changed to normal data sets or data frames. These data frames can be analyzed and modeled to solve statistical problems. The database used here is MySQL database.
Package for Data Import Database in R
The package used for importing data from the database is the “RMySQL” package. This package can be downloaded from the CRAN itself using a command,
Code:
install.packages (“RMySQL”)
Output:
So, the packages get downloaded in your R library space. It can be easily loaded anytime to the workspace if the package is available already, otherwise, we need to download it for further use.
Load the Package Database in R
The installed package must be loaded in order to work further. To load the package, the following command is used,
Code:
library(RMySQL)
Output:
So, the required package is loaded into R workspace.
How to Read the DB Table?
The steps followed allows us to import the information of the DB table to R workspace. The tables from the DB are converted as data frames in R. Here are the steps to be followed.
Step 1: Connect the DB to R workspace
The database from the MySQL platform must be connected to R for the data import.
- Here “trainset_3” is the variable stores the DB connection to the global environment of the workspace. It can be named according to convenience
- The syntax explains as “dbconnect()” is the function used to connect the database and it contains parameters to be filled mandatorily. It connects the R environment to MySQL database using the parameters given.
- “MySQL()” mentions the type of database used for connection
- “user” asks for username of the database to get connected. Basically it is the credentials used to connect the DB environment. The same username is given here to connect R workspace.
- “password” asks for the password of the database to get connected. Basically it is the credentials used to connect the DB environment. The same password is given here to connect R workspace.
- “dbname” asks for the name of the database created in the MYSQL environment.
- “host” is the localhost of your computer machine.
- “port” is the port number used for MYSQL application. Basically that port listens to the application.
By executing this command, the database gets connected to R. And the variable is created in the global environment.
So, it holds the information like a class, which can be called anytime and the variables inside the class can be initiated. If the icon like the magnifying glass is clicked, it expands like
Step 2: DB data to R Data Frames
The problems handled in the R tool have train and test data. This data allows us to interpret and solve the problem. The R model learns patterns from train data and test data is used for evaluating the pattern learned from test data. Here the commands used to import the data from DB tables.
Code 1 – Train Data
- Here, “rs_trainset_prev” & “rs_trainset_curr” is the variable created for fetching the database tables. It can be named according to convenience.
- “dbSendQuery()” is the function containing parameter for data load into R. It fetches the data using database connection through SQL query.
- “trainset_3” the class created in the first step, containing all the information of database connection. Here this imprints where the data needs to be taken.
- “select * from prev_view” & “select * from curr_view” is the same select query used in MySQL to read the table information.
This is created as a class in the global environment.
If the icon like the magnifying glass is clicked near rs_trainset_prev, it expands like
Code 2 – Test Data
- Here, “train” & “test” is the variable name used to store the complete DB table into R data frame. Basically it is the training data for modeling.
- “fetch()” is the function used for fetching the data from DB using the given SQL query. It contains parameters to instruct the amount of data and where to be fetched.
- “rs_trainset_prev” is class inherited here to fetch the DB details
- “n= -1” represents the number of rows in the DB table to be fetched for R data frame. This “-1” indicates that the row needs to be fetched until the end. And it is a very important parameter in fetch().
After executing the code 2, two data frames are created, containing the rows and columns. Here Observations represent no.of.rows and variables represent no.of.columns from DB table.
When the test and train icon is clicked, it expands the data frame to a window.
Train DataFrame
Test Dataframe
How to write R Dataframes to DB?
The dataframe in R workspace can be exported as a table in DB:
- “library(RMySQL)” loads the package if it’s not available in the environment. If the packages are not loaded, the rest code results in an error.
- “trainset_3” loads the DB connection as mentioned above
- “dbWriteTable” is a function used for writing or exporting the dataframe to DB as a table. It has the following parameters and needs to give correctly.
- “trainset_3” is the DB connection class
- “resultant” is the name of the table exported to the database. It can be named according to convenience
- “export” is the dataframe available in R that to be exported. It can be named according to convenience
- “append” is the parameter with values TRUE/FALSE. TRUE – appends the rows of exported table & FALSE – does not append the table.
- “overwrite” is the parameter with values TRUE/FALSE. TRUE – overwrites the existing table & FALSE – does not overwrite the table.
These are the steps followed using the DB package in R.
Recommended Articles
This is a guide to Database in R. Here we discuss the manipulation of datasets, steps on how to read DB table and how to write R Dataframes to DB? with respective examples. You can also go through our other related articles to learn more –