Updated March 4, 2023
Introduction to Sqoop Export
Sqoop export is a command provided in sqoop used to export data to any relational database. Basically when there is a need to load data from the file into any table sqoop export command is used. Records can be ingested into the table from multiple data file present on HDFS directory delimiter by any delimiter unique in the file and must be known to the user. Also, the table which needs to get ingested must exist in the database and also schema of the table must be known to the user.
Syntax: Below is the generic sqoop export command syntax:
Sqoop export (generic-args) (export-control-args)
There are two types of arguments mentioned in the above command:
- Generic-args
- Export-control-args
Generic-Args: Generic arguments are types that are common and used for establishing the connection between sqoop and relational database like connect, username, etc. Below are a few generic arguments and their significance:
Connect: JDBC url will be provided using connect.
Username: Username to connect relational database will be provided using username.
Password: Password to connect relational database will be provided using a password.
Password-file: As providing a password using sqoop command can b risky for security reasons, it will be stored on HDFS in encrypted form and then the path to HDFS file will be passed to sqoop using password-file.
-p: command will be used to pass a file from the console.
-verbose: Command will be used to print job information in detail.
Export control arguments: These arguments as the name suggest used to control export like columns to export, export-dir where the file is present to export, input-null-string, input-not-null-string, etc. These are export control arguments. Either insert or update will be decided using export control argument update-key, update-mode. Few of commonly used export arguments are mentioned below with their description:
columns: A list of columns needs to be exported to the table.
export-dir: Directory on HDFS where data files are present.
m, –num-mapper: number of mappers used to export data to tables.
table: a table that needs to be exported.
update-mode: Update mode can be update-only or allow-insert. This parameter will be used to update values when update-key gets matched or records get inserted when update-key does not get matched.
update-key: columns that used to update a record. Key can be a single column or can be multiple columns separated by a comma.
input-null-string: String to be inserted or updated in the case when a column value is null.
input-null-non-string: value to be inserted or updated in the case when a column value is null.
How Does Export Command Work in Scoop?
Sqoop firstly reads the data from the source file and will divide the data into multiple blocks based on block size. This is internally managed by sqoop. Then sqoop establishes the connection between sqoop and relational database with respect to sqoop command. After establishing a connection Sqoop basically prepares to insert statement to load data into the table. As mentioned above there are two parameters update-mode and update key, if update mode is true the sqoop prepares an upsert statement based on the update-key otherwise it prepares to insert statement onset of tables. Sqoop first prepares a set of statements and the shoots on the database to update the records.
Examples of Sqoop Export
Here are the following examples mention below
Creating a database and table for demo purpose:
Create database if not exists sqoop_demo;
Use sqoop demo;
Create table if not exists students( roll_no int, name varchar(50));
Sample created table:
Sample data on HDFS:
Before sqoop export: The table is empty.
select * from students;
Sqoop Command to export data:
sqoop export
--connect "jdbc:mysql://quickstart.cloudera:3306/sqoop_demo" \
--username root \
--password cloudera \
--table students \
--export-dir /user/sqoop_demo \
--input-fields-terminated-by '|' \
--input-lines-terminated-by '\n'
--num-mappers 1
After Sqoop export: 3 records are exported successfully.
Screenshot of mysql:
select * from students;
Advantages of Export in Sqoop
- Sqoop export is a very proficient tool to export data from any format of file that can be either csv, tsv, orc or parquet, etc.
- Data can be loaded to any relational database using a JDBC connection. It loads the data to tables separated using a delimiter.
- Sqoop can also handle null characters while loading string or non-string. It exports data using by launching map-reduce in a distributed manner from the on-premise cluster.
- HDFS directory needs to be provided in export-dir parameter, the selected column can be exported using sqoop.
- Data can be merged into the table using sqoop if the update-mode is provided. Update-key can be used to update existing data using upsert.
- The number of mappers can be declared for exporting the data with respect to cluster configurations.
- The multi-row insert is possible using sqoop.
Conclusion
To ingest data into relational database tables from on-premise cluster sqoop is a commonly used tool. It can ingest data from almost all different file formats supported in Hadoop. Also, commands are simple to learn and execute. No need to write complex sql queries for data ingestion. Also, it inserts data by launching a map-reduce task for quick data ingestion.
Recommended Articles
This is a guide to Sqoop Export. Here we discuss the Examples of Sqoop Export along with the advantages and how does it work. You may also have a look at the following articles to learn more –