Updated September 2, 2023

Introduction
How to Create?
How to Modify or Alter?
How to Drop?
Conclusion
FAQs

Introduction to Hive Database

Hive Databases provides the facility to store and manage huge records or datasets on top of a distributed Hadoop platform. It can store structured and semi-structured data. With JDBC Driver’s help, the end-user can connect with the Hive database and fetch the data/records with the support of SQL Query. In the Hive, the database framework is developed by Facebook to analyze structured data or semi-structured data. The database framework supports all the regular commands like create, drop, alter, and use database.

Popular Course in this category

HIVE - Specialization | 7 Course Series

We can also use the database layer with different Hadoop components like Hbase, Kafka, Spark, various streaming tools, etc.

Note: The Hive server or service contains the default database. The default database name is “default”. While creating the table, we forget to specify which hive database we need to create the database. Then, the tables will be created in the “default” database.

Syntax

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database name
[COMMENT 'database information | database comment' ]
[LOCATION ' HDFS Path' ]
[WITH DATABASE PROPERTIES (property name= property value, ...)];

How to Create a Database in Hive?

We can create a database similar to the SQL one. But the major thing is whether the tables are created in the internal table format or external table format. If all the tables are created in the internal table format, the database is also called an internal database. If all the tables are created in an external table format, the database is also called an external database. In the database, the data is stored in a tabular manner. The default storage location of the hive database varies from the hive version. From HDP 3.0, we are using version 3.0 and more. The default database location was changed. The location for the external hive database is “/warehouse/tablespace/external/hive/” and the location for the manage database is “/warehouse/tablespace/managed/hive”.

In the older version of the Hive, the Hive database’s default storage location is “/apps/hive/warehouse/”.

Query to Create Hive Database

To create a database, you can utilize the CREATE DATABASE instruction in HiveQL. Here’s an example:

Query

create database organization;

Output

When accessing the Hive DB, you must establish the connection between the Hive client and server. To establish the connection, we need to configure the Hive JDBC/ODBC connection string in the hive client and establish the hive server’s connection. We can connect with the database with the help of Hive View, Hive CLI, Beeline(interactive and non-interactive), or different third-party client software.

Note

JDBC and ODBC are two different ways to connect with Hive. Few applications like MSTR support ODCB connection only. We need to configure the MSTR “ini” file. Few applications, like Zeppelin and Squirrel client, support the JDBC connection.
With the help of Knox Gateway, we can connect with the Database.
Hive View is only available in Ambari 2.6 or less. From version 2.7 and above, Hive view is not available.

How to Modify or Alter Hive Database?

We can modify the database, including comments and properties. We can also alter the database as below.

Alter in Database Properties

When using Hive, you can change various configuration settings related to a database by altering its properties. This includes modifying its description, location, and ownership. By altering database properties, you can effectively manage and organize your databases while keeping metadata and access control intact.

Example

ALTER(DATABASE | SCHEMA) database name SET database properties (property name = property value, ...);

Alter the Owner

In a database or data management system, altering the owner refers to changing the user or entity with administrative control and permissions over a particular database or object. This process allows for transferring ownership rights, enabling a different user or entity to manage and access the database’s contents, often for administrative or security reasons.

Example

ALTER (DATABASE| SCHEMA) database name SET OWNER [USER|ROLE ] user or role;

Alter the HDFS Location

If you need to change the HDFS location of a Hive table, you can use the ALTER TABLE statement with the SET LOCATION clause. Include the new HDFS path where you want to store the table’s data. This will move the data to the new location and update the table’s location metadata. You can then reorganize or relocate the data as needed, and Hive will query the table from the updated location.

Example

ALTER (DATABASE|SCHEMA ) database name SET LOCATION HDFS Path;

Hive Query to Alter Database

In Hive, you cannot directly alter a database using an ALTER DATABASE statement as you would with tables. Instead, you typically work with tables within a database. To modify a database’s properties or permissions, you must use administrative actions outside of Hive, such as modifying the database directory or changing the Hadoop file system-level permissions. It primarily focuses on altering tables, partitions, and other objects within a database.

Query

ALTER DATABASE organization SET DBPROPERTIES ('Date' = '2020-03-31');

Output

How to Drop Database in Hive?

We can drop or delete the database form from the Hive system in the Hive. By default, there is a restriction on the drop database command. We cannot drop the existing database with the subject to the database is empty. If the database is empty, then only we can drop the database. If we have the tables in the database, then there is a restriction on the drop. There are two ways to drop the database having the tables in it.

First Way: We must drop all the tables in the current database. Then, we can drop the current hive database.
Second Way: If we want to drop the hive database without the current tables. Then, we can use the “CASCADE” keyword in the drop query. With the help of “CASCADE”, we can drop the hive database without touching the database’s internal tables.

Hive Drop Query with Empty Database

In Hive, when you issue a DROP DATABASE query for an empty database, the database is deleted without any issues. Hive will remove the empty database and all associated metadata, leaving no data or objects behind. This straightforward operation doesn’t result in data loss or complications since there’s no data to remove.

Query

drop database organization;

Output

The above shows that the hive database “organization” is empty. We can easily drop the database using a hive query.

Hive Drop Query with Table Contain

A “DROP TABLE” query permanently removes a table and its associated data from the database in Hive. Executing a “DROP TABLE” query with a table name deletes the table, metadata, and data files, freeing up storage space. This operation is irreversible; It’s essential to exercise caution when using this function and ensure that the necessary permissions are in place before execution.

Query

drop database company CASCADE;

Output

As per the above screenshot, we can see the “company” DB has two tables. (table1: products and table2: sales).
Suppose we use the drop command “drop database company;” to drop the company DB. Then the command will not work due to tables are present in it.

We can drop the database when we use the “CASCADE” keyword in the query.

Conclusion

We have seen the uncut “Hive Database” concept with the proper example, explanation, syntax, and SQL Query with different screenshots. This framework resides on top of Hadoop. It will summarize the huge datasets/records and do querying on them. Helpful for easy analysis on top of it. If the datasets increase, we can add the number of machines without impacting the performance.

FAQs

Q1. Is Hive a SQL language?

Ans: Hive uses a SQL-like query language called HiveQL. While not identical to traditional SQL, it provides a familiar syntax for querying and managing data stored in Hive tables. Users can write queries using HiveQL to interact with large-scale data in the Hadoop Distributed File System (HDFS) without requiring a predefined schema, making it accessible to those familiar with SQL.

Q2. Is Hive an ETL tool?

Ans: Hive is mainly used for data warehousing and querying large datasets in a Hadoop environment, using SQL-like operations to manipulate data. However, it does not offer the advanced data transformation and workflow management features in specialized ETL tools. Hive is best for querying and reporting on data rather than carrying out complete ETL processes.

Quiz Result
Total Questions	Correct Answers	Wrong Answers	Percentage

Table of Contents

Introduction to Hive Database

Syntax

How to Create a Database in Hive?

Query to Create Hive Database

Query

Output

Note

How to Modify or Alter Hive Database?

Alter in Database Properties

Example

Alter the Owner

Example

Alter the HDFS Location

Example

Hive Query to Alter Database

Query

Output

How to Drop Database in Hive?

Hive Drop Query with Empty Database

Query

Output

Hive Drop Query with Table Contain

Query

Output

Conclusion

FAQs

Q1. Is Hive a SQL language?

Q2. Is Hive an ETL tool?

Recommended Articles

Follow us!

APPS

Blog

Courses

Email