Updated March 14, 2023
Introduction to Redshift UNLOAD
Redshift UNLOAD is the process that works just opposite to that of the copy command in the Amazon redshift data warehouse platform. The unload command transfers all the data that is retrieved by the query in its output and stores the same in the Amazon S3 data lakes. Therefore, we can manipulate a large amount of data with great performance while report generation and analysis using the redshift data warehouse platform. The work of the copy command is to transfer all the data that is present in lakes or buckets of amazon S3 to the tables of the warehouse, while unload command does the exact opposite function of getting the data retrieved from the queries and store the same in buckets of lakes of Amazon S3.
Working of Redshift UNLOAD
- Most of the times, when we need to perform the analysis of the data in a way which cannot be done inside the Amazon redshift platform, such as in the case of machine learning or when we need our data to be used by multiple applications, then we will have to export the data from the tables of the redshift and move them to the Amazon S3 buckets.
- For this, the first step will be to try multiple queries of select and find the appropriate one that suits your requirement of exportation. Then, you can try different queries until you find the correct data retrieved in the query’s result. The next step will be to perform the unload command and transfer the data to S3 buckets.
Syntax of Redshift UNLOAD
Given below is the syntax of the redshift UNLOAD command:
UNLOAD ('query that retrieves proper data')
TO 'path of s3 object'
authorization
[ other parameters[ ... ] ]
Where other parameters can be:
{ [ FORMAT [ AS ] ] CSV | PARQUET
| ENCRYPTED [ AUTO ]
| ZSTD
| PARALLEL [ { ON | TRUE } | { OFF | FALSE } ]
| HEADER
| MAXFILESIZE [AS] maximum size [ MB | GB ]
| FIXEDWIDTH [ AS ] 'specification of fixed width'
| ADDQUOTES
| NULL [ AS ] 'string which has the value as NULL'
| BZIP2
| GZIP
| PARTITION BY ( name of column[, ... ] ) [ INCLUDE ]
| MANIFEST [ VERBOSE ]
| ESCAPE
| ALLOWOVERWRITE
| DELIMITER [ AS ] 'character to be delimited'
| REGION [AS] 'region of Amazon web service' }
Parameters used in UNLOAD command of redshift:
Let us see some of the most frequently used parameters from the above-mentioned syntax of unload command in Amazon Redshift:
- Query that retrieves proper data: This is the standard form of SELECT query, which will result in fetching those rows and columns having the data we want to transfer to the Amazon S3 cloud from the Redshift data warehouse.
- Authorization: In order to perform the unloading of data from the data warehouse of redshift to Amazon S3, the user who is executing the command should have the privilege to access and modify the data of S3. For this, we need to authorize the user firstly.
- PARTITION BY name of the column: While unloading process, if we mention the partition keys on the basis of which the partitions are to be made and the output files are to be stored in their respective folders are done automatically internally by the Redshift. While doing this partitions, Amazon redshift follows the same conventions as that of Apache Hive for partition creation and storage of data.
- Header: Whenever the output file containing the tabular data is generated, if we mention the header parameter, all the column names that act as a header for the tabular data are exported in output along with its data.
- MANIFEST [VERBOSE]: If we specify this parameter, output files containing the data and a detailed list of details of this output data files are created when the process of unload is being performed. This manifest file is written in JSON text format, which includes all the URLs of each output data file copied from a Redshift data warehouse and stored at Amazon S3.
- Character to be delimited: This delimiter helps in the specification of an ASCII character that is to be considered as a separator of fields when written in output files while unloading. The most commonly used delimiters are a comma (,), tab (t) or a pipeline symbol (|).
- Region [AS] “region of amazon web service”: This parameter helps in specifying the location of the S3 bucket in the Amazon AWS region where the destination of output files is located while unloading. Whenever there is a difference in the AWS regions of Amazon S3 and Redshift warehouse, we will have to specify the AWS region where the destination S3 bucket of Amazon exists.
Example of Redshift UNLOAD
Given below is the example of RedShift UNLOAD:
The use of the unload command, and its purpose can vary depending on the scenario where they have to be used. Now let us consider one example where we have a scenario that we have to unload a table present in redshift to a CSV file in the S3 bucket. For example, we have a table named “EDUCBA_Articles”. We have to unload this to a CSV file at a location in the S3 bucket with S3://EducbaBucket/myUnloadFolder/.
Hence, we will use the following command of unload for our use case.
Code:
unload ('select * from EDUCBA_Articles')
to 'S3://EducbaBucket/myUnloadFolder/'
‘name of my role’ ‘arn:aws:iam::154895623154582:role/MyRoleNameInRedshift’
CSV;
Our source table in the Redshift data warehouse looks like as shown below:
Output:
After executing the above command of unloading in redshift, we get the below output file in the specified location of the S3 bucket.
Conclusion
We can use the Redshift UNLOAD command to get the data from the data warehouse of redshift and move it to the S3 cloud to the path we have specified. While using this command, we can make the use of other parameters specified in syntax to mention additional details.
Recommended Articles
This is a guide to Redshift UNLOAD. Here we discuss the introduction, working of Redshift UNLOAD and example, respectively. You may also have a look at the following articles to learn more –