Updated March 21, 2023
Introduction on Hive Drop Table
The keyword “DROP” refers to deletion. For the deletion of data, we require data to be present in the hive.
In Hadoop, we have two functionalities:
- Data Storage
- Data Processing
For data storage, HDFS (Hadoop Distributed File System) comes into the picture. Now when we say we have data in hive table it means two things:
- Data is in HDFS
- We have a hive table created over that HDFS file, and we load that HDFS file’s data into the hive table.
Basically, for the hive drop table to have the data, the data file is a prerequisite. In this article, we will see how to drop tables in the hive, what happens when the table is dropped and all things related to the drop table in the hive.
Types of Drop Table in Hive
In the hive, there are two types of tables:
- Internal Table or Managed Table
- External Table or Unmanaged Table
Managed Table/Internal Table
- In Hive,” user/hive/warehouse” is the default directory. Internal tables are stored in this directory by default. We do not have to provide the location manually while creating the table.
- “Drop table” command deletes the data permanently.
- Hive manages all the security for managed tables.
I do have a table already present in the “user/hive/warehouse” directory called “codes”.
To check if the existing table is managed or unmanaged, we could use the below command:
Describe formatted table_name;
Let us see the data presented in the table “codes”.
First, using hive command-
Second, using Hue (Hadoop User Experience a Web UI)
Delete command: Drop table table_name;
Now, if I want to select the data from “codes” it will give me an error because the table is deleted.
Also, it will not be able to see this table in the default directory which is
Unmanaged table/ External table
- External tables, we are required to provide the path where we need to store that table using the keyword ‘location’ in create table command.
CREATE EXTERNAL TABLE stg_s2_json.products
( product no string, product name string, description string, active string, created date string, updated date string) row format delimited fields terminated by ‘,’
- Hive only deletes the metadata. Data is permanent.
- These tables could be used by anyone who has access to HDFS, so they need to manage security at the folder level.
For understanding the dropping of the external table, we will use the table “products”.
Let’s check if the table is internal or external. Again, “describe formatted table_name” command.
Observe “limit 10” in the select command. Table Products contains the below data:
To check it in Hue, it looks like this:
Let’s see what happens when we drop this table:
Drop table table_name;
Now, if trying to retrieve the table’s data, It throws an error.
First, using the “select” command on the terminal, it is going to throw me an error which means the metadata for the external table is deleted.
Observe Error here:
Second, checking on hue the state of data, the file “products.json” is still present in HDFS, which means the data is permanent.
I am going to make it easy and provide you with key points for both kinds of tables. You decide which type will suit your requirements.
|Also called “Managed Table.”||Also called “Unmanaged table.”|
|No need to provide location, Hive default directory manages this data.||Need to provide location|
|Deletes table’s metadata as well as (Data is temporary)||Hive will leave the data untouched(Data is permanent)|
|The hive itself controls the security of the table.||Need to manage security at the folder level|
This is a guide to Hive Drop Table. Here we discuss a brief overview with types of Drop Table in Hive along with Syntax respectively. You can also go through our other suggested articles to learn more –