Introduction to Impala Database
A database is a logical collection of related tables, views, and functions. Several applications can share one database, but the common practice is to use a separate database for each application. Table data is stored on HDFS in a tree of directories, and any operation that can be performed on HDFS can also be performed on these table directories. A database should be selected before running any operation; if no database is selected, commands run against the default database.
How to Create a Database?
Creating a database in Impala is very similar to creating a database in MySQL or SQL Server. A database in Impala is created using the command below:
Create database [if not exists] <database_name> [comment "any comment related to database"] [location <hdfs_path>]
Parameters of Impala Database
The command has a few optional parameters, shown in square brackets. They are explained below, together with two related notes:
1. [if not exists]: As the name suggests, the database is created only if it does not already exist. If this parameter is omitted and the database is already present in Impala, the command throws an error saying that the database already exists, so scripts that may run more than once should include it.
2. [comment "any comment related to database"]: This parameter attaches a comment to the database at creation time, such as author information, a specific database property, or the reason the database exists for a given application.
3. [location of database]: The location clause takes an HDFS directory in which the database files will be saved. By default, the database directory is named database_name.db. Table directories are created inside the database directory, and the data is saved in those table directories.
4. Databases created in Impala can also be accessed from Hive. To access a database created in Hive from Impala, the invalidate metadata command needs to be executed so that Impala becomes aware of the new database.
5. A database can also be created to point to an Amazon S3 storage location. If the database points to an S3 location, the directories for the database, its tables, and their partitions are created on Amazon S3 only.
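The optional clauses and notes above can be combined as in the following minimal sketch. The database name impala_demo is the one used later in this article; the comment text, the HDFS path, the second database name, and the S3 bucket are hypothetical placeholders, and the S3 statement assumes the cluster is already configured with S3 access.

```sql
-- Create the database only if it does not already exist,
-- so that rerunning the statement does not raise an error.
CREATE DATABASE IF NOT EXISTS impala_demo
  COMMENT 'Demo database for this article'   -- hypothetical comment text
  LOCATION '/user/impala/impala_demo.db';    -- hypothetical HDFS path

-- Hypothetical database whose directories live on Amazon S3
-- (the bucket name is made up for illustration).
CREATE DATABASE IF NOT EXISTS impala_demo_s3
  LOCATION 's3a://example-bucket/impala_demo_s3.db';

-- After a database has been created in Hive, reload Impala's metadata
-- so that the new database becomes visible to Impala.
INVALIDATE METADATA;

-- List the databases to confirm the result.
SHOW DATABASES;
```

Running the first statement a second time should be a no-op because of if not exists; without that clause, the second run would fail with an "already exists" error.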
Selecting Impala Database from Multiple Databases
Impala can contain multiple databases. To make changes to a table, first select the database that contains it by running the command below:
Use <database_name>;
Before a database is selected, Impala points to the default database. After running use impala_demo; followed by show tables; the tables listed belong to impala_demo, which shows that Impala now points to the impala_demo database.
The use command changes the currently selected database, and every command that follows operates on that database. Any command, such as create table or insert into, that does not prefix the table name with a database name is executed against the currently selected database. By default, impala-shell selects default as the current database when it starts, but a different database can be specified at startup with the -d db_name option.
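The behaviour described above can be sketched as follows. This is an illustrative session, not output from a real cluster; the employees table is a hypothetical name introduced only for this example.

```sql
-- Check which database the session currently points to
-- (default, unless the shell was started with -d).
SELECT current_database();

-- Switch to impala_demo and list its tables.
USE impala_demo;
SHOW TABLES;

-- An unqualified table name resolves against the current database,
-- so this table is created inside impala_demo.
CREATE TABLE IF NOT EXISTS employees (id INT, name STRING);  -- hypothetical table

-- A qualified name works regardless of which database is current.
SELECT COUNT(*) FROM impala_demo.employees;
```

Starting the shell as impala-shell -d impala_demo would make the use statement unnecessary for the rest of the session.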
How to Drop the Impala Database?
The drop database command removes a database from the system. The operation deletes the database's physical directory from its HDFS location and also deletes the metadata for the database stored in the metastore. Below is the command used to drop a database:
Drop database [if exists] <database_name> [restrict|cascade]
Dropping Database without Cascade: restrict is applied by default, so the command below throws an exception if one or more tables still exist in the database. All tables must be dropped before the database can be dropped.
drop database impala_demo;
Dropping Database with Cascade:
drop database impala_demo cascade;
Optional Parameters of Impala Database
The drop command has some optional parameters, shown in square brackets and explained below:
1. [if exists]: With this parameter, the database is dropped only if it is present; if it does not exist, no error is thrown. Without this parameter, dropping a database that does not exist throws an error.
2. [Restrict]: Restrict is the default mode when a database is dropped. It requires all tables to be dropped from the database before the database itself can be dropped. This helps prevent accidentally deleting a database while its tables still hold data.
3. [Cascade]: Cascade is not the default mode. Without it, a drop command issued on a database that still contains tables throws an error and the database is not dropped. With the cascade option, the database can be dropped even if it contains tables: the tables are dropped first, and then the database.
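The two modes can be compared in a short sketch. It assumes a database named impala_demo that contains a single table called employees (a hypothetical name), and it assumes, as the Impala documentation describes, that the database being dropped must not be the session's current database.

```sql
-- Switch away from impala_demo first, since the current database
-- is assumed not to be droppable.
USE default;

-- Alternative 1 (restrict, the default): empty the database, then drop it.
DROP TABLE IF EXISTS impala_demo.employees;     -- hypothetical table
DROP DATABASE IF EXISTS impala_demo RESTRICT;

-- Alternative 2 (cascade): drop the tables and the database in one statement.
-- Uncomment and use this instead of Alternative 1.
-- DROP DATABASE IF EXISTS impala_demo CASCADE;
```

Restrict is the safer choice for scripts, because a forgotten table causes an error instead of silent data loss; cascade is convenient for tearing down test databases.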