Introduction To Hive Interview Questions and Answers
Apache Hive is an open-source, petabyte-scale ETL and data warehousing tool built on top of the Hadoop Distributed File System (HDFS) to store structured and unstructured data. It supports analyzing, querying, and mining very large data sets through an SQL-like language called HiveQL (HQL), and it simplifies query execution by planning the queries as Hadoop MapReduce jobs.
Hive is built on top of Hadoop to process and analyze Big Data, and it makes querying easy. Hive was initially created by Facebook. It was later enhanced and developed further as an open-source project by the Apache Software Foundation under the name Apache Hive. Many companies now use Apache Hive for their Big Data solutions.
If you are looking for a job related to Hive, you need to prepare for the 2018 Hive interview questions. Every interview is different and so is the scope of every job, but the top 2018 Hive interview questions and answers below will help you prepare and succeed in your interview.
Below is a list of the Hive interview questions most frequently asked in interviews. The questions are divided into two parts:
Part 1 – Hive Interview Questions (Basic)
This first part covers basic Interview Questions and Answers.
1. What are the different components of the Hive architecture?
The five core components of the Hive architecture are listed below:
- User Interface (UI): It acts as the communicator between users and the driver. When a user writes a query, the UI accepts it and submits it to the driver. Two types of interface are available: a command-line interface and a GUI.
- Driver: It manages the life cycle of a HiveQL query. It receives queries from the user interface and creates a session to process each query.
- Compiler: It receives the query from the driver and fetches the required metadata from the Metastore in order to build the execution plan.
- Metastore: It stores metadata about the data, such as table definitions for both internal (managed) and external tables. It sends this metadata to the compiler so that the query can be planned and executed.
- Execution Engine: It runs the plan produced by the compiler as MapReduce jobs to process the data, and it controls the order in which the stages of the plan are executed.
2. What are the different modes in which Hive can operate?
This is a common Hive interview question. Hive can operate in two modes, depending on the size of the data. These modes are:
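- MapReduce mode: Used when the data is large and spread across multiple data nodes, so queries run as distributed jobs on the cluster.
- Local mode: Used when the data is small enough to be processed on a single machine, which avoids the overhead of launching a cluster job.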
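As a rough illustration of how the choice between the two modes can depend on the size of the input, the Python sketch below mimics the kind of check Hive applies when automatic local mode is enabled (`hive.exec.mode.local.auto`). The function name `choose_mode` and the threshold defaults (128 MB of input, 4 input files, at most 1 reducer) are assumptions made for this example; check the configuration reference for your Hive version for the actual values.

```python
def choose_mode(total_input_bytes, input_file_count, reducer_count,
                max_bytes=128 * 1024 * 1024, max_files=4):
    """Return "local" for small jobs, otherwise "mapreduce".

    Hypothetical helper: the thresholds are illustrative defaults,
    not values read from a real Hive configuration.
    """
    small_input = total_input_bytes < max_bytes
    few_files = input_file_count < max_files
    few_reducers = reducer_count <= 1
    if small_input and few_files and few_reducers:
        return "local"
    return "mapreduce"


if __name__ == "__main__":
    # A 10 MB job over 2 files with a single reducer.
    print(choose_mode(10 * 1024 * 1024, 2, 1))
    # A 5 GB job over 200 files with 16 reducers.
    print(choose_mode(5 * 1024 ** 3, 200, 16))
```

The point of the sketch is only that the decision is driven by data volume: once any threshold is exceeded, the job is sent to the cluster.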
3. In which scenarios can Hive be used, and in which can it not?
Hive is a good fit when you are building data warehouse applications, when the data is static or does not change rapidly, when the data volume is huge, when the application does not need fast response times, and when you prefer queries to scripting. Hive is designed for OLAP workloads; it is not suitable for OLTP.
4. What are the file formats that Hive supports? List the types of applications that are supported by Hive.
By default, Hive uses the Text File format. It also supports binary file formats such as Sequence files, ORC files, Parquet files, and Avro Data files.
- Sequence file: A binary file format that is compressible and splittable.
- ORC file: The Optimized Row Columnar format is a column-oriented storage format.
- Parquet file: A column-oriented binary file format that is highly efficient for large-scale queries.
- Avro Data file: Like a sequence file, it is splittable and compressible, and it is row-oriented.
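To make the difference between row-oriented and column-oriented formats concrete, here is a small Python sketch. It does not read or write any real ORC, Parquet, or Avro files; the sample records and the helper names `to_row_layout` and `to_column_layout` are made up for illustration.

```python
records = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Oslo", "amount": 75.5},
    {"id": 3, "city": "Lima", "amount": 310.0},
]


def to_row_layout(rows):
    """Row-oriented: all fields of one record are stored together."""
    return [tuple(r.values()) for r in rows]


def to_column_layout(rows):
    """Column-oriented: all values of one column are stored together."""
    columns = {name: [] for name in rows[0]}
    for r in rows:
        for name, value in r.items():
            columns[name].append(value)
    return columns


row_layout = to_row_layout(records)
column_layout = to_column_layout(records)

# Summing one column walks over every record in the row layout ...
total_from_rows = sum(row[2] for row in row_layout)
# ... but reads a single list in the column layout.
total_from_columns = sum(column_layout["amount"])
```

Both totals are the same, but the columnar version never touches `id` or `city`. This is the intuition behind column-oriented formats being efficient for analytical queries that read only a few columns.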
The maximum size of string data type allowed in Hive is 2 GB.
Hive is a data warehouse framework that suits applications written in Java, C++, PHP, Python or Ruby.
5. What are the different types of tables that are available in Hive?
There are two types of tables in Hive:
- Managed Tables: Hive controls both the data and the schema, so dropping the table deletes the data as well as the metadata.
- External Tables: Hive controls only the schema, so dropping the table removes the metadata but leaves the data in place.
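The practical difference between the two table types shows up when a table is dropped. The following Python sketch models that behaviour with a toy in-memory class; `ToyWarehouse`, its methods, and the paths are hypothetical and do not correspond to any Hive API.

```python
class ToyWarehouse:
    """Hypothetical in-memory stand-in for a metastore plus file storage."""

    def __init__(self):
        self.metastore = {}  # table name -> {"external": bool, "location": str}
        self.storage = {}    # location -> list of rows (the "files")

    def create_table(self, name, location, external=False):
        # Only metadata is recorded here; the data lives in self.storage.
        self.metastore[name] = {"external": external, "location": location}

    def drop_table(self, name):
        entry = self.metastore.pop(name)
        # Managed table: the warehouse owns the data, so it is deleted too.
        # External table: only the metadata entry is removed.
        if not entry["external"]:
            self.storage.pop(entry["location"], None)


wh = ToyWarehouse()
wh.storage["/warehouse/sales"] = ["row1", "row2"]
wh.storage["/data/raw/logs"] = ["line1"]
wh.create_table("sales", "/warehouse/sales", external=False)
wh.create_table("logs", "/data/raw/logs", external=True)

wh.drop_table("sales")  # removes metadata and data
wh.drop_table("logs")   # removes metadata only
```

After both drops, the metastore is empty, the managed table's data is gone, and the external table's data is still present under `/data/raw/logs`.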
Part 2 – Hive Interview Questions (Advanced)
Let us now have a look at the advanced Interview Questions.
6. What is a Metastore in Hive? List and explain the different types of Hive Metastore configuration.
The Metastore is Hive's central repository for metadata. It allows the metadata to be stored in an external database. By default, Hive stores metadata in a Derby database, but other databases such as Oracle or MySQL can also be used. There are three types of Metastore configuration:
- Embedded metastore: This is the default mode. The Hive service, the metastore service, and the database all run in the same JVM, and the metastore is accessed locally through the Hive library. By default, command-line operations use embedded mode.
- Local metastore: The metadata is stored in an external database such as MySQL or Oracle. The Hive service and the metastore service run in the same JVM and connect to a database that runs in a separate process.
- Remote metastore: The metastore service runs in its own JVM, separate from the Hive service. You can run multiple metastore servers to increase availability.
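One way to tell the three configurations apart is to look at two configuration properties: `hive.metastore.uris` and `javax.jdo.option.ConnectionURL`. The Python sketch below applies a simplified rule of thumb to a plain dictionary of properties. The function `metastore_mode`, the decision rules, and the host names in the usage lines are assumptions for illustration, not logic taken from Hive itself.

```python
# Assumed default for this sketch: an embedded Derby connection URL.
EMBEDDED_DERBY_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(props):
    """Guess the metastore mode from Hive-style properties (simplified)."""
    if props.get("hive.metastore.uris", "").strip():
        # Clients connect to a separate metastore service.
        return "remote"
    url = props.get("javax.jdo.option.ConnectionURL", EMBEDDED_DERBY_URL)
    if url.startswith("jdbc:derby:") and not url.startswith("jdbc:derby://"):
        # Derby running inside the same JVM as Hive.
        return "embedded"
    # In-process metastore service talking to an external database.
    return "local"


print(metastore_mode({}))
print(metastore_mode({"javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host:3306/hive"}))
print(metastore_mode({"hive.metastore.uris": "thrift://ms-host:9083"}))
```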
7. What is a Hive Query Processor? What are the different components of the Hive Query Processor?
The Hive Query Processor converts SQL into MapReduce jobs, which are executed in the order of their dependencies. The components of the Hive Query Processor are listed below:
- Semantic Analyser
- UDFs and UDAFs
- Execution Engine
- Type Checking
- Logical Plan Generation
- Physical Plan Generation
8. What is the functionality of Object-Inspector in Hive?
Object-Inspector is a component of Hive that is used to identify the structure of individual columns and the internal structure of row objects. Complex objects that are stored in multiple formats can be accessed using Object-Inspector in Hive.
Object-Inspector will identify the structure of an object and ways to access the internal fields inside the object.
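The idea can be sketched in Python as follows. This is not Hive's Java ObjectInspector interface; the classes `DictInspector` and `TupleInspector` and the function `project` are hypothetical names used only to show how an inspector separates "how a row is stored" from "how a field is read".

```python
class DictInspector:
    """Reads fields from rows stored as dictionaries."""

    def __init__(self, field_names):
        self.field_names = list(field_names)

    def get_field(self, row, name):
        return row[name]


class TupleInspector:
    """Reads fields from rows stored as positional tuples."""

    def __init__(self, field_names):
        self.field_names = list(field_names)
        self._index = {n: i for i, n in enumerate(self.field_names)}

    def get_field(self, row, name):
        return row[self._index[name]]


def project(rows, inspector, name):
    """Read one column without knowing how the rows are laid out."""
    return [inspector.get_field(row, name) for row in rows]


fields = ["id", "name"]
dict_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
tuple_rows = [(1, "a"), (2, "b")]

names_from_dicts = project(dict_rows, DictInspector(fields), "name")
names_from_tuples = project(tuple_rows, TupleInspector(fields), "name")
```

The calling code in `project` is identical for both storage formats; only the inspector changes. That is the role the paragraphs above describe for Object-Inspector.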
9. What are the different ways to connect the applications to Hive Server?
There are three ways to connect applications to the Hive server:
- Thrift Client: Lets you run Hive commands from a range of programming languages, such as Java, C++, PHP, Python or Ruby.
- ODBC Driver: Supports applications that use the ODBC protocol.
- JDBC Driver: Supports applications that use the JDBC protocol.
10. What are the default read and write classes in Hive?
The read and write classes available in Hive are listed below:
- TextInputFormat: This class is used to read data in plain text format.
- HiveIgnoreKeyTextOutputFormat: This class is used to write data in plain text format.
- SequenceFileInputFormat: This class is used to read data in the Hadoop Sequence file format.
- SequenceFileOutputFormat: This class is used to write data in the Hadoop Sequence file format.
This has been a guide to Hive interview questions and answers, intended to help candidates tackle these questions with confidence.