Difference Between Hive and HBase
Before comparing Hive and HBase, it helps to understand Hadoop and MapReduce. Hadoop is an open source framework that processes large data sets using the MapReduce programming model. MapReduce analyzes a given data set in two phases: a map phase and a reduce phase.
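The two phases can be illustrated with a small word count. This is a minimal sketch in plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for illustration, and everything runs in one process rather than being distributed across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the map, shuffle, and reduce steps in sequence."""
    return reduce_phase(shuffle(map_phase(lines)))


# Hypothetical sample input, used only to exercise the sketch.
counts = word_count(["hive runs on hadoop", "hbase runs on hadoop"])
```

In a real Hadoop job the map and reduce functions run in parallel on many machines and the framework handles the grouping step; the sketch only shows how the work is split between the two phases.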
Apache Hive and HBase are Hadoop-based big data technologies. Both are used to query data and both run on top of Hadoop, but they differ in their functionality. Let's try to understand the differences between them.
Head to Head differences between Hive vs HBase (Infographics)
(Infographic: the top 8 differences between Hive and HBase)
Key differences between Hive vs HBase
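- Hive provides a SQL dialect whose queries are converted into MapReduce jobs, whereas HBase has no SQL dialect of its own; it is accessed through its shell and client APIs and can also be used from MapReduce jobs.
- HBase stores data as key/value pairs grouped into column families, whereas Hive doesn't store data itself; it queries data that is already stored in Hadoop.
- HBase guarantees atomic operations at the row level, whereas Hive has traditionally offered no such transactional guarantees.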
- Hive supports partitioning tables on column values such as a date, so that queries can filter on those partitions, whereas HBase partitions its tables automatically.
- Hive has traditionally not supported row-level update statements, whereas HBase supports them.
- HBase is faster than Hive at fetching individual records.
- Hive is used to process structured data, whereas HBase, being schema-free, can store any type of data.
- HBase is highly (horizontally) scalable when compared to Hive.
- Hive analyzes data on HDFS using SQL queries, which it converts into map and reduce jobs, whereas HBase, being a real-time store, performs its operations directly on the database, with data organized into tables and column families.
- For querying data, Hive provides a shell, known as the Hive shell, to issue commands, whereas HBase, being a database, has its own shell commands for working with the data.
- To enter the Hive shell, we run the command hive. The prompt then appears as hive>. To enter the HBase shell, we run hbase shell.
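The key/value and column-family model described in the list above can be pictured as a nested lookup: row key first, then a family:qualifier column name. The following is a rough in-memory sketch under that assumption. ToyTable and its put and get methods are hypothetical names chosen for illustration and are not the HBase client API, and the stored values are placeholders.

```python
class ToyTable:
    """In-memory stand-in for an HBase-style table (illustrative only)."""

    def __init__(self, *column_families):
        # Column families are fixed up front; qualifiers inside them are free-form.
        self.families = set(column_families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Write one cell; writing the same cell again replaces its value."""
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Random access by row key; returns a copy of that row's cells."""
        return dict(self.rows.get(row_key, {}))


# Placeholder data, used only to exercise the sketch.
table = ToyTable("personalinfo")
table.put("sid01", "personalinfo:name", "placeholder-name")
table.put("sid01", "personalinfo:mailid", "old@example.com")
table.put("sid01", "personalinfo:mailid", "new@example.com")  # row-level update
row = table.get("sid01")
```

The sketch leaves out everything that makes HBase a distributed store, such as cell versions, automatic partitioning, and persistence. It only shows why a lookup or update by row key touches a single row, while a Hive query scans the data it analyzes.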
Hive vs HBase Comparison Table
|Basis for comparison||Hive||HBase|
|Database type||It is not a database||It is a NoSQL database|
|Type of processing||It supports batch processing, i.e. OLAP||It supports real-time data access, i.e. OLTP|
|Database model||Hive uses a schema model||HBase is schema-free|
|Latency||Hive has high latency||HBase has low latency|
|Cost||It is more costly when compared to HBase||It is cost-effective|
|When to use||Hive can be used when we do not want to write complex MapReduce code||HBase can be used when we want random read and write access to a large amount of data|
|Use cases||It should be used to analyze data that has been stored over a period of time||It should be used for real-time processing of data|
|Examples||HubSpot is an example for Hive||Facebook is the best example for HBase|
Differences in coding between Hive vs HBase
Let us now discuss basic differences between Hive and HBase in coding.
|Basis for comparison||Hive||HBase|
|To create a database||CREATE DATABASE [IF NOT EXISTS] database-name;||Since HBase is itself a database, we do not need to create a specific database|
|To drop a database||DROP DATABASE [IF EXISTS] database-name [RESTRICT OR CASCADE];||NA|
|To create a table||CREATE [TEMPORARY OR EXTERNAL] TABLE [IF NOT EXISTS] table-name [(column-name data_type [COMMENT column-comment], ...)] [COMMENT table-comment] [ROW FORMAT row-format] [STORED AS file-format]||create '<table-name>','<column-family>'|
|To alter a table||ALTER TABLE name RENAME TO new-name; ALTER TABLE name ADD COLUMNS (col-spec[, col-spec ...]); ALTER TABLE name CHANGE column-name new-name new-type; ALTER TABLE name REPLACE COLUMNS (col-spec[, col-spec ...]), which is also how columns are dropped, by listing only the columns to keep||alter '<table-name>', NAME => '<column-family>', VERSIONS => <new version number>|
|Disabling a table||NA||disable '<table-name>' (disables the specified table); disable_all 'r.*' (disables all the tables that match the regular expression)|
|Enabling a table||NA||enable '<table-name>'|
|To drop a table||DROP TABLE IF EXISTS table-name||A table must be disabled before it can be dropped: disable '<table-name>', then drop '<table-name>'. Similarly, we can use disable_all and drop_all to delete the tables that match a specified regular expression|
|To list databases||SHOW DATABASES;||NA|
|To list tables in a database||SHOW TABLES;||list|
|To describe the schema of a table||DESCRIBE table-name;||describe '<table-name>'|
Integration of Hive and HBase
- Install and configure Hive.
- Install and configure HBase.
- To integrate Hive and HBase, we use storage handlers in Hive.
- A storage handler is a combination of a SerDe, an InputFormat, and an OutputFormat that lets Hive accept an external entity as a table.
- This feature lets a user issue SQL queries whether the table is present in Hadoop or in a NoSQL database such as HBase, MongoDB, Cassandra, or Amazon DynamoDB.
- Now we will look at an example of connecting Hive with HBase using HBaseStorageHandler:
- First, we need to create the HBase table using the following command.
create 'Student','personalinfo','dept info'
->personalinfo and dept info are created as two different column families in the Student table.
- Next, we need to insert some data into the Student table using put commands, starting with a row whose key is sid01.
->Similarly, we can create data for sid02, sid03…
- Now we need to create a Hive table pointing to the HBase table.
- For each column family in HBase, we will create one table in Hive. In this case, we will create 2 tables in Hive.
create external table student_hbase(sid String, name String, mailid String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personalinfo:name,personalinfo:mailid")
TBLPROPERTIES ("hbase.table.name" = "Student");
->Similarly, we need to create the dept info details table in Hive.
- Now we can write a SQL query in Hive, as shown below.
select * from student_hbase;
In this way, we can integrate Hive with HBase.
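The hbase.columns.mapping string is matched to the Hive columns by position: the first entry belongs to the first Hive column, and the special entry :key stands for the HBase row key. The sketch below mimics that pairing in plain Python to make the idea concrete. The helper names parse_columns_mapping and project_row are hypothetical, the cell values are placeholders, and the code is not how the storage handler is implemented.

```python
def parse_columns_mapping(mapping, hive_columns):
    """Pair each Hive column with its HBase source, position by position."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping needs exactly one entry per Hive column")
    return dict(zip(hive_columns, entries))


def project_row(row_key, cells, column_map):
    """Build the Hive-style record for one HBase row.

    cells maps "family:qualifier" to a value; ":key" stands for the row key.
    """
    return {
        hive_col: row_key if source == ":key" else cells.get(source)
        for hive_col, source in column_map.items()
    }


column_map = parse_columns_mapping(
    ":key,personalinfo:name,personalinfo:mailid",
    ["sid", "name", "mailid"],
)

# Placeholder cell values; the real contents of the Student table are not shown here.
record = project_row(
    "sid01",
    {
        "personalinfo:name": "placeholder-name",
        "personalinfo:mailid": "placeholder@example.com",
    },
    column_map,
)
```

Under these assumptions, record holds one value per Hive column (sid, name, mailid), which is the shape of a row that select * from student_hbase would return for that row key.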
Conclusion – Hive vs HBase
As discussed, Hive and HBase are different technologies that provide different functionality: Hive works through a SQL-like language, also called HQL, whereas HBase stores and serves data as key-value pairs. They work better when combined, because Hive can process a huge amount of data but has high latency and cannot maintain up-to-date data, while HBase doesn't support analytical queries but supports row-level updates on a large amount of data.