Introduction to Hive Database
Hive Databases provides the facility to store and manage the huge records or datasets on top of a distributed Hadoop platform. It is having the capability to store the structure and semi-structured data. With JDBC Driver’s help, the end-user can connect with the Hive database and fetch the data/records with the support of SQL Query. In the Hive, the database framework is developed by Facebook to analyze structured data or semi-structured data. The database framework supports all the regular commands like create, drop, alter, use database.
We can also use the database layer with different Hadoop components like Hbase, Kafka, spark, different streaming tools, etc.
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database name
[COMMENT ‘database information | database comment’ ] [LOCATION ‘ HDFS Path’ ] [WITH DATABASE PROPERTIES (property name= property value, ...)];
How to Create Database in Hive?
We can create the database as similar to the SQL one. But the major thing is whether the tables are created in the internal table format or external table format. If all the tables are created in the internal table format, the database is also called an internal database. If all the tables are created in an external table format, the database is also called an external database. In the database, the data is stored in a tabular manner. The default storage location of the hive database varies from the hive version. From HDP 3.0, we are using version 3.0 and more. The default database location was changed. The location for external hive database is “/warehouse/tablespace/external/hive/” and the location for manage database is “/warehouse/tablespace/managed/hive”.
In the older version of the hive, the hive database’s default storage location is “/apps/hive/warehouse/”.
Query to Create Database
create database organization;
When you need to access the hive DB, we need to establish the connection between the hive client and the hive server. To establish the connection, we need to configure the hive JDBC / ODBC connection string in the hive client and establish the hive server’s connection. With the help of Hive view, Hive CLI, beeline (interactive and non-interactive), or different third party client software, we can connect with the database.
- JDBC and ODBC are two different ways to connect with Hive. Few applications like MSTR support ODCB connection only. We need to configure the MSTR “ini” file. Few applications like Zeppelin, squirrel client, support the JDBC connection.
- With the help of Knox gateway, we can able to connect with Database.
- Hive View is only available in Ambari 2.6 or less. From version 2.7 and above Hive view is not available.
How to Modify or Alter Database?
We have the facility to alter or modify the existing database. We can also change the existing database comments, database properties, etc., of the database.
We can also alter the database as below.
1. Alter in Database Properties
ALTER ( DATABASE | SCHEMA ) database name SET database properties ( property name = property value, ...);
2. Alter the Owner
ALTER (DATABASE| SCHEMA) database name SET OWNER [USER|ROLE ] user or role;
3. Alter the HDFS Location
ALTER (DATABASE|SCHEMA ) database name SET LOCATION HDFS Path;
Hive Query to Alter Database
ALTER DATABASE organization SET DBPROPERTIES ('Date' = '2020-03-31');
How to Drop Database in Hive?
In the hive, we can drop or delete the database form from the hive system. By default, there is a restriction on the drop database command. We cannot drop the exiting database with subject to the database is empty. If the database is empty, then only we can drop the database. If we have the tables in the database, then there is a restriction on the drop. There are two ways to drop the database having the tables in it.
- First Way: We need to drop all the tables that are present in the current database. Then we can drop the current hive database.
- Second Way: If we want to drop the hive database without dropping the current database tables. Then we can use the “CASCADE” keyword in the drop query. With the help of “CASCADE”, we can drop the hive database without touching the database’s internal tables.
Hive Drop Query with Empty Database
drop database organization;
In the above screenshot 1, we can see the hive database “organization” is empty. We can easily drop the database using a hive query.
Hive Drop Query with Table Contain
drop database company CASCADE;
As per the above screenshot, we can see the “company” DB has two tables. (table1: products and table2: sales).
Suppose we will directly use the drop command “drop database company;” to drop the company DB. Then the command will not work due to tables are present in it. Please refer screenshot 3
When we use the “CASCADE” keyword in the query, we can drop the database. Please refer to screenshot 4.
We have seen the uncut concept of “Hive Database” with the proper example, explanation, syntax, SQL Query with different screenshots. This framework resides on top of Hadoop. It will summarize the huge datasets/records and do querying on it. Helpful for easy analysis on top of it. If the datasets increase, we can add the number of machines to it without impacting the performance.
This is a guide to Hive Database. Here we discuss the Introduction and how to Create a Hive Database and practical examples and different subquery expressions. You can also go through our related articles to learn more –