Updated March 27, 2023
Introduction to Dynamic Partitioning in Hive
Partitioning is an important concept in Hive: it divides a table into parts based on the values of one or more columns. With dynamic partitioning, a single insert statement can populate many partitions of a partitioned table, because Hive works out the partition for each row from the value of the partition column. We do not need to create each partition explicitly beforehand. Hive creates one sub-directory per distinct partition value, so a dynamic partition insert can produce a large number of sub-directories.
Syntax
To enable dynamic partitioning, we use the following Hive commands:
set hive.exec.dynamic.partition = true;
This enables dynamic partitioning for the current Hive session.
set hive.exec.dynamic.partition.mode = nonstrict;
This sets the mode to non-strict. In non-strict mode, all partition columns are allowed to be dynamic. In strict mode, at least one partition column must be given a static value.
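The difference between the two modes can be modelled in a few lines. The sketch below is plain Python, not Hive code: the function name, the dictionary representation of a partition spec, and the country/state columns are all assumptions made for illustration.

```python
def is_allowed(partition_spec, mode):
    """Model the strict / nonstrict rule for a PARTITION (...) clause.

    partition_spec maps each partition column to its static value,
    or to None when the column is dynamic (hypothetical representation).
    """
    if mode not in ("strict", "nonstrict"):
        raise ValueError(f"unknown mode: {mode}")
    has_static = any(value is not None for value in partition_spec.values())
    # strict mode needs at least one static partition column
    return has_static or mode == "nonstrict"


# PARTITION (course): every partition column is dynamic
all_dynamic = {"course": None}
# PARTITION (country='US', state): one static column, one dynamic column
mixed = {"country": "US", "state": None}

print(is_allowed(all_dynamic, "strict"))
print(is_allowed(all_dynamic, "nonstrict"))
print(is_allowed(mixed, "strict"))
```

An all-dynamic spec is rejected in strict mode and accepted in non-strict mode, which is why the example later in this article needs the nonstrict setting.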
Dynamic partitioning can also be called variable partitioning. Variable partitioning means the partitions are not defined before execution; instead, they are created at run time, depending on the distinct values found in the partition column of the data being inserted.
In a dynamic partition insert, every row is read and routed to its partition by a MapReduce job. Depending on the Hive version, dynamic partitioning may be disabled by default, or limited to strict mode, to prevent creating a large number of partitions by accident.
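The row-by-row routing described here can be pictured with a small Python model. It is only a sketch of the idea, not how Hive is implemented: the function name and sample rows are made up, and it assumes the usual `column=value` directory naming together with Hive's default directory name for NULL or empty partition values.

```python
from collections import defaultdict

# Directory name used when the partition value is NULL or empty
# (assumed to be left at Hive's default setting).
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def route_rows(rows, partition_col):
    """Group rows by the value of partition_col, one group per sub-directory."""
    partitions = defaultdict(list)
    for row in rows:
        row = dict(row)                  # copy so the caller's rows stay intact
        value = row.pop(partition_col)   # the value goes into the path, not the data file
        if value is None or value == "":
            value = DEFAULT_PARTITION
        partitions[f"{partition_col}={value}"].append(row)
    return dict(partitions)


# Hypothetical sample rows
rows = [
    {"id": 1, "name": "Asha", "course": "hadoop"},
    {"id": 2, "name": "Ben", "course": "spark"},
    {"id": 3, "name": "Chen", "course": "hadoop"},
]
result = route_rows(rows, "course")
print(sorted(result))  # ['course=hadoop', 'course=spark']
```

Each distinct value of the partition column ends up as its own group, which corresponds to one sub-directory under the table location.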
To use it, we need to set some properties, either in the Hive session or in the Hive configuration XML file (hive-site.xml).
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
<description>Enables dynamic partitioning in Hive.</description>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
<description>In nonstrict mode, all partition columns may be dynamic; no static partition is required.</description>
</property>
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>1000</value>
<description>Maximum total number of dynamic partitions that one statement may create.</description>
</property>
<property>
<name>hive.exec.max.dynamic.partitions.pernode</name>
<value>100</value>
<description>Maximum number of dynamic partitions that each mapper or reducer may create.</description>
</property>
With these values, we tell Hive to allow dynamic partitioning and we cap how many partitions a single statement, and a single mapper or reducer, may create. Compared to static partitioning, dynamic partitioning generally takes more time to load the data, and the data is usually loaded from a non-partitioned table. Partitioning can be used with both managed and external tables.
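To make the two limits concrete, here is a simplified Python model of the check. It is an illustration under assumptions, not Hive's actual bookkeeping: the function name and the list-of-sets input are hypothetical, and only the two limit values (1000 and 100) come from the configuration above.

```python
def check_dynamic_partition_limits(partitions_per_task, max_total=1000, max_per_node=100):
    """Return 'ok' or a short reason why the modelled insert would be rejected.

    partitions_per_task: list of sets; each set holds the partition values
    written by one mapper or reducer (a simplification for illustration).
    """
    for task_id, parts in enumerate(partitions_per_task):
        if len(parts) > max_per_node:
            return f"task {task_id} exceeds per-node limit of {max_per_node}"
    all_parts = set().union(*partitions_per_task)  # distinct partitions overall
    if len(all_parts) > max_total:
        return f"statement exceeds total limit of {max_total}"
    return "ok"


# Two tasks writing a handful of partitions: within both limits
small = [{"hadoop", "spark"}, {"spark", "hive"}]
small_status = check_dynamic_partition_limits(small)

# One task writing 150 distinct partitions: over the per-node limit of 100
wide = [{f"c{i}" for i in range(150)}]
wide_status = check_dynamic_partition_limits(wide)

print(small_status)
print(wide_status)
```

If a load legitimately needs more partitions than this, the two properties can be raised; the limits exist to catch inserts that would create far more partitions than intended.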
How Does Dynamic Partitioning Work?
Let us look at an example of how dynamic partitioning works:
- We first need a non-partitioned table to hold the data, for example a staging table.
- We will use a student table, stud_demo, for reference:
Query:
Create table stud_demo ( id int , name string , age int , institute string , course string)
row format delimited fields terminated by ',';
- Load the data into the table from an external source, for example a text file:
LOAD DATA local inpath 'path name' into table stud_demo;
- Now create the partitioned table into which we want to insert the data with dynamic partitioning.
Query:
Create table student_part ( id int , name string , age int , institute string)
Partitioned by (course string)
Row format delimited fields terminated by ',';
- Insert the data from the staging table. Hive creates the partitions from the values of the course column, which must come last in the select list:
Insert into student_part partition(course)
Select id,name,age,institute,course from stud_demo;
- With this query, the data is inserted into the table with dynamic partitioning on the course column.
- Once the insert has finished, we can check whether the partitions were created correctly with the following command:
SHOW PARTITIONS student_part;
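The insert statement in this example lists the partition column last in the select list, which is where Hive expects dynamic partition columns. When such statements are generated from a script, a small helper can keep that ordering consistent. The helper below is a hypothetical Python sketch, not part of Hive; it only assembles a string and does no validation or quoting of the names it is given.

```python
def build_dynamic_insert(target, source, data_cols, partition_cols):
    """Assemble a dynamic partition INSERT with the partition columns last.

    Hypothetical helper for illustration; names are used as given.
    """
    # data columns first, then partition columns in PARTITION (...) order
    select_list = ", ".join(list(data_cols) + list(partition_cols))
    partition_spec = ", ".join(partition_cols)
    return (
        f"INSERT INTO TABLE {target} PARTITION ({partition_spec}) "
        f"SELECT {select_list} FROM {source}"
    )


stmt = build_dynamic_insert(
    target="student_part",
    source="stud_demo",
    data_cols=["id", "name", "age", "institute"],
    partition_cols=["course"],
)
print(stmt)
```

For the tables in this article, the helper produces the same statement as the one shown in the example, written on a single line.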
Advantages of Dynamic Partition
- Good for loading large files into tables.
- Data is read row by row, and each row is routed to its partition.
- Partitions are created on demand from the data, so they do not have to be set up by hand.
- Generally used to load data from a non-partitioned table, such as a staging table.
- Useful when the number of distinct partition values is not known in advance.
- Data load is distributed horizontally.
- Queries that filter on the partition column generally run faster, since only the matching partitions are read.
- The column values on which the data is partitioned are known only at run time.
- Both external and managed tables can be used with dynamic partitioning.
Disadvantages of Dynamic Partition
- Loading data generally takes more time than with static partitioning.
- We cannot perform ALTER on a dynamic partition.
- A large number of partitions means many directories and small files, which adds overhead on the HDFS NameNode.
- Query processing can sometimes take more time to execute.
- It can sometimes be a costly operation.
Conclusion
In this article, we saw how dynamic partitioning is used in Hive and how to set it up. We also looked at its advantages and disadvantages. This should give a fair idea of how dynamic partitioning works in Hive and when to use it.