Introduction to Dynamic Partitioning in Hive
Partitioning is an important concept in Hive: it splits a table's data into separate directories based on the values of one or more partition columns. With dynamic partitioning, a single INSERT statement can populate many partitions of a partitioned table at once. We do not need to create each partition explicitly beforehand; Hive creates them as needed. A dynamic partition insert can therefore create many sub-directories, one per distinct partition value.
To enable dynamic partitioning, we use the following Hive commands:
set hive.exec.dynamic.partition = true;
This enables dynamic partitioning for the current Hive session.
set hive.exec.dynamic.partition.mode = nonstrict;
This sets the mode to non-strict, which allows all partition columns to be dynamic. In strict mode, at least one partition column must be given a static value.
Dynamic partitioning can also be called variable partitioning. Variable partitioning means that the partitions are not defined before execution; they are created at run time, according to the values found in the partition column of the data being inserted. This spares us from writing a separate insert statement for every partition.
With dynamic partitioning, every row is read and routed to its partition by a MapReduce job. Depending on the Hive version, dynamic partitioning is either disabled by default or restricted to strict mode, which guards against creating or overwriting partitions by accident.
To use it, we need to set a few properties, either in the Hive session or in the Hive configuration file (hive-site.xml).
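- hive.exec.dynamic.partition: enables dynamic partitioning in Hive when set to true.
- hive.exec.dynamic.partition.mode: in nonstrict mode, all partition columns may be dynamic, so no static partition value is required.
- hive.exec.max.dynamic.partitions: the maximum number of dynamic partitions that a single statement can create.
- hive.exec.max.dynamic.partitions.pernode: the maximum number of dynamic partitions that each mapper or reducer can create.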
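As a minimal sketch, the four settings described above could be applied together at the start of a Hive session. The property names are standard Hive configuration keys; the numeric limits are illustrative assumptions, not values recommended by this article, so check them against your own Hive version and workload.

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;

-- Allow every partition column to be dynamic (no static value required).
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Upper bound on dynamic partitions created by one statement (illustrative value).
SET hive.exec.max.dynamic.partitions=1000;

-- Upper bound on dynamic partitions created by each mapper or reducer (illustrative value).
SET hive.exec.max.dynamic.partitions.pernode=100;
```

If an insert would create more partitions than these limits allow, Hive fails the statement rather than creating them, so the limits act as a safety net against a badly chosen partition column.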
So basically, with these values we are telling Hive to partition the data dynamically, based on the values of the partition column, within the limits set above. Compared to static partitioning, dynamic partitioning generally takes more time to load the data, and the data is usually loaded from a non-partitioned table. Partitioning can be used with both managed and external tables.
How Does Dynamic Partitioning Work?
Let us look at an example of how dynamic partitioning works:
- We need to create a non-partitioned table to hold the data; this may be a staging table.
- We will use a student table, stud_demo, for our reference:
CREATE TABLE stud_demo (id INT, name STRING, age INT, institute STRING, course STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
- Load the data into the table from an external source, such as a text file:
LOAD DATA LOCAL INPATH 'path name' INTO TABLE stud_demo;
- Now create a partitioned table into which we want to insert the data with dynamic partitioning:
CREATE TABLE student_part (id INT, name STRING, age INT, institute STRING)
PARTITIONED BY (course STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
- Once this table is created, we can list its partitions with the following command. The list stays empty until data is inserted, so run it again after the insert to check that the partitioning was done correctly:
SHOW PARTITIONS student_part;
- Insert the data into the partitioned table. The partition column must come last in the SELECT list:
INSERT INTO TABLE student_part PARTITION (course)
SELECT id, name, age, institute, course FROM stud_demo;
- This query inserts the data and dynamically partitions the table on the course column.
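To check the result of the insert, something like the following could be run against the student_part table from the example. This is only a sketch: the course value 'hadoop' is a hypothetical one chosen for illustration, since the article does not show the contents of the source file.

```sql
-- List the partitions created by the dynamic insert
-- (one entry per distinct course value found in stud_demo).
SHOW PARTITIONS student_part;

-- Count the rows in each partition to confirm the data landed where expected.
SELECT course, COUNT(*) AS row_count
FROM student_part
GROUP BY course;

-- Filter on the partition column so Hive only needs to read the matching partition.
-- 'hadoop' is a hypothetical course value.
SELECT id, name, age, institute
FROM student_part
WHERE course = 'hadoop';
```

The last query shows why partitioning on course pays off: a filter on the partition column lets Hive skip the directories of all other courses.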
Advantages of Dynamic Partition
- Well suited to loading large data sets into tables.
- The data is read row by row.
- Partitions are derived from the values in the data itself, so they do not have to be defined by hand.
- Generally used to load data from the non-partitioned table.
- If the number of distinct values in the partition column is unknown, dynamic partitioning lets us partition the data without listing those values.
- Data load is distributed horizontally.
- Query processing time is generally reduced for queries that filter on the partition column, because only the matching partitions are read.
- The values of the partition column do not need to be known in advance; they are determined at run time.
- Both external and managed tables can be used for dynamic partition.
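To make the contrast with static partitioning concrete, the two styles of insert can be sketched side by side using the tables from the example. The course value 'hadoop' is hypothetical and used only for illustration.

```sql
-- Static partition insert: the partition value is written into the statement,
-- so the partition column is left out of the SELECT list.
-- 'hadoop' is a hypothetical course value.
INSERT INTO TABLE student_part PARTITION (course = 'hadoop')
SELECT id, name, age, institute
FROM stud_demo
WHERE course = 'hadoop';

-- Dynamic partition insert: no value is given, and Hive takes the partition
-- value for each row from the last column of the SELECT list.
INSERT INTO TABLE student_part PARTITION (course)
SELECT id, name, age, institute, course
FROM stud_demo;
```

With the static form, one statement is needed per course; with the dynamic form, a single statement covers every course present in stud_demo.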
Disadvantages of Dynamic Partition
- Loading data generally takes more time than with static partitioning.
- Partitions cannot be added dynamically with ALTER TABLE; ALTER TABLE ... ADD PARTITION requires explicit partition values.
- A large number of partitions, each holding small files, creates overhead for the NameNode.
- Queries that do not filter on the partition column can sometimes take more time to execute, because many partitions must be scanned.
- It can sometimes be a costly operation.
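One commonly used way to limit the number of small files produced by a dynamic partition insert is to distribute the rows by the partition column before writing. The article does not prescribe this; it is shown here only as a sketch, reusing the tables from the example.

```sql
-- Send all rows with the same course value to the same reducer, so each
-- partition is written by one reducer instead of by many tasks.
INSERT INTO TABLE student_part PARTITION (course)
SELECT id, name, age, institute, course
FROM stud_demo
DISTRIBUTE BY course;
```

The trade-off is that a heavily skewed partition column can overload a single reducer, so this fits best when the partition values are reasonably balanced.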
In this article, we saw how dynamic partitioning is used in Hive and how to set it up. We also looked at its advantages and disadvantages, and at the steps needed to load data with it. This should give a fair idea of how dynamic partitioning works in Hive and when it is worth using.