Difference Between Hadoop and Hive
Hadoop is a software framework created to manage huge volumes of data, or Big Data. It is used to store and process large data sets distributed across a cluster of commodity servers.
Hadoop stores the data in the Hadoop Distributed File System (HDFS) and processes or queries it using the MapReduce programming model.
Figure 1: Basic architecture of the Hadoop components.
Hadoop’s Major Components:
Hadoop Base/Common: Hadoop Common provides a single platform on which all the other Hadoop components are installed.
HDFS (Hadoop Distributed File System): HDFS is a major part of the Hadoop framework and takes care of all the data in the Hadoop cluster. It works on a master/slave architecture and stores the data using replication.
Master/Slave Architecture & Replication:
- Master Node/NameNode: The NameNode stores the metadata of each file and block stored in HDFS. An HDFS cluster has a single active NameNode; in a high-availability (HA) setup, a second NameNode runs as a standby.
- Slave Node/DataNode: DataNodes hold the actual file data in blocks. HDFS can have multiple DataNodes.
- Replication: HDFS stores data by dividing it into blocks. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later. With replication, each block is stored on 3 different DataNodes (the default replication factor, which can be increased as required), so there is little chance of losing data if a node fails.
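The block and replication arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hdfs_storage_plan` and its parameters are hypothetical, and the example uses the 64 MB block size and replication factor of 3 mentioned above. It also assumes that the last, partially filled block occupies only its actual size on disk.

```python
import math


def hdfs_storage_plan(file_size_mb: float, block_size_mb: int, replication: int) -> dict:
    """Estimate how one file is split into blocks and replicated (illustrative only)."""
    if file_size_mb <= 0 or block_size_mb <= 0 or replication < 1:
        raise ValueError("sizes must be positive and replication at least 1")
    # The file is cut into fixed-size blocks; the last block may be partially filled.
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": num_blocks,
        # Every block is copied to `replication` different DataNodes.
        "block_replicas": num_blocks * replication,
        # Assumption: a partial block uses only its real size, so raw usage
        # is the file size multiplied by the replication factor.
        "raw_storage_mb": file_size_mb * replication,
    }


# A 1000 MB file with 64 MB blocks and a replication factor of 3.
plan = hdfs_storage_plan(1000, block_size_mb=64, replication=3)
print(plan)
```

With these inputs the file is split into 16 blocks, stored as 48 block replicas, and occupies roughly 3000 MB of raw cluster storage.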
YARN (Yet Another Resource Negotiator): YARN manages the resources of the Hadoop cluster and plays an important role in scheduling users' applications.
MR (MapReduce): This is the basic programming model of Hadoop. It is used to process and query the data within the Hadoop framework.
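The MapReduce model itself can be illustrated without a cluster. The following is a minimal single-process Python sketch of the classic word-count example; it is not Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration. On a real cluster the map and reduce steps run in parallel on many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on hadoop", "hive on hadoop"])
print(counts)
```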
Hive is an application that runs on top of the Hadoop framework and provides an SQL-like interface for processing and querying the data. Hive was designed and developed by Facebook before it became part of the Apache Hadoop project.
Hive runs its queries using HQL (Hive Query Language). Hive organizes data in a structure similar to that of an RDBMS, and almost the same commands can be used in Hive.
Hive can keep data in external tables, so storing it in HDFS is not mandatory. It also supports file formats such as ORC, Avro, SequenceFile, and text files.
Figure 2: Hive's architecture and its major components.
Hive's Major Components:
Hive Clients: Besides SQL, Hive supports client applications written in languages such as Java, C, and Python through drivers such as ODBC, JDBC, and Thrift. A client application written in any of these languages can run its queries in Hive through these clients.
Hive Services: Commands and queries are executed under Hive Services, which has five sub-components.
- CLI: The default command-line interface provided by Hive for executing Hive queries and commands.
- Hive Web Interface: A simple graphical user interface. It is an alternative to the Hive command line and is used to run queries and commands in Hive.
- Hive Server: Built on Apache Thrift and therefore also known as the Thrift server. It accepts commands and queries from the different client interfaces, submits them to Hive, and retrieves the final result.
- Apache Hive Driver: Receives the input that a client submits through the CLI, the web UI, or the ODBC, JDBC, or Thrift interfaces, and passes the information to the Metastore, where all the file information is stored.
- Metastore: A repository that stores all Hive metadata, such as the structure of tables, partitions, and column types.
Hive Storage: The location where the actual work takes place. Every query run from Hive performs its action on the data inside Hive storage.
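To make the Metastore's role more concrete, here is a minimal Python sketch of the kind of information it holds for each table: column types, partition keys, and the storage location. The class names `TableMetadata` and `ToyMetastore`, the table `page_views`, and its path are hypothetical and exist only for this illustration. The real Metastore is a service backed by a relational database, not an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TableMetadata:
    """Hypothetical record of what a metastore knows about one table."""
    name: str
    columns: Dict[str, str]                       # column name -> column type
    partition_keys: List[str] = field(default_factory=list)
    location: str = ""                            # where the data files live
    file_format: str = "TEXTFILE"


class ToyMetastore:
    """In-memory stand-in for a metadata repository (illustration only)."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMetadata] = {}

    def register(self, table: TableMetadata) -> None:
        self._tables[table.name] = table

    def describe(self, name: str) -> TableMetadata:
        # A driver would consult this metadata before touching any data files.
        if name not in self._tables:
            raise LookupError(f"unknown table: {name}")
        return self._tables[name]


metastore = ToyMetastore()
metastore.register(
    TableMetadata(
        name="page_views",
        columns={"user_id": "BIGINT", "url": "STRING"},
        partition_keys=["view_date"],
        location="/warehouse/page_views",  # made-up path
        file_format="ORC",
    )
)
info = metastore.describe("page_views")
print(info.columns, info.partition_keys, info.location)
```

The point of the sketch is the separation of concerns: the metadata describes the tables, while the data files themselves stay in storage.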
Head to Head Comparison Between Hadoop vs Hive (Infographics)
Below are the top 8 differences between Hadoop and Hive.
Key Differences between Hadoop vs Hive:
The following points describe the key differences between Hadoop and Hive:
1) Hadoop is a framework for processing and querying Big Data, while Hive is an SQL-based tool built on top of Hadoop to process that data.
2) Hive processes and queries data using HQL (Hive Query Language), an SQL-like language, while Hadoop understands only MapReduce.
3) MapReduce is an integral part of Hadoop; a Hive query is first converted into MapReduce jobs and then processed by Hadoop to query the data.
4) Hive works with SQL-like queries, while Hadoop works with Java-based MapReduce only.
5) In Hive, the commands familiar from traditional relational databases can also be used to query Big Data, while in Hadoop one has to write complex MapReduce programs in Java, which differs from conventional Java programming.
6) Hive can process and query structured data only, while Hadoop is meant for all types of data, whether structured, unstructured, or semi-structured.
8) A Java-based MapReduce program on Hadoop alone can need hundreds of lines, while Hadoop with Hive can query the same data in 8 to 10 lines of HQL.
9) In Hive, it is very difficult to use the output of one query as the input of another, while the same can be done easily in Hadoop with MapReduce.
10) Hive's Metastore does not have to reside within the Hadoop cluster, while Hadoop stores all its metadata inside HDFS (Hadoop Distributed File System).
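The contrast between a declarative SQL-like query and hand-written map and reduce steps can be sketched in Python. This is an illustration under clear assumptions: Python's built-in `sqlite3` module stands in for Hive (it is not Hive and does not run on Hadoop), the `sales` table and its rows are made up, and the function names are hypothetical. The `SELECT ... GROUP BY` statement is of the same shape that HQL accepts.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up sample data: (category, amount).
rows: List[Tuple[str, float]] = [
    ("books", 12.0), ("games", 30.0), ("books", 8.0),
    ("music", 5.0), ("games", 10.0),
]


def totals_with_sql(data: List[Tuple[str, float]]) -> Dict[str, float]:
    """Declarative style: state WHAT is wanted in one SQL-like statement."""
    conn = sqlite3.connect(":memory:")  # sqlite3 is only a stand-in for Hive
    try:
        conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", data)
        cursor = conn.execute(
            "SELECT category, SUM(amount) FROM sales GROUP BY category"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def totals_with_map_reduce(data: List[Tuple[str, float]]) -> Dict[str, float]:
    """Procedural style: spell out HOW, as map, shuffle, and reduce steps."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for category, amount in data:          # map: emit (key, value) pairs
        grouped[category].append(amount)   # shuffle: group values by key
    return {key: sum(values) for key, values in grouped.items()}  # reduce


sql_totals = totals_with_sql(rows)
mr_totals = totals_with_map_reduce(rows)
print(sql_totals == mr_totals, sql_totals)
```

Both functions produce the same totals per category. In Hive the translation from the query to the underlying jobs is done for the user, which is where the saving in lines of code comes from.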
Hadoop vs Hive Comparison Table
| Characteristic | Hive | Hadoop |
|---|---|---|
| Design and Development | Facebook | |
| Data Storage Location | Data can be stored in an external table, HBase, or HDFS. | HDFS only. |
| Language Support | HQL (Hive Query Language) | Multiple programming languages such as Java, Python, Scala, and many more. |
| Data Types | Structured data only. | Structured, unstructured, and semi-structured data. |
| Data Processing Framework | HQL (Hive Query Language) | Java-written MapReduce programs only. |
| | SQL-like language. | SQL and NoSQL. |
| Database | Derby (default); also supports MySQL, Oracle… | HBase, Cassandra, etc. |
| Programming Framework | SQL-based programming framework. | Java-based programming framework. |
Conclusion – Hadoop vs Hive
Hadoop and Hive are both used to process Big Data. Hadoop is a framework that provides a platform on which other applications query and process Big Data, while Hive is an SQL-based application that processes the data using HQL (Hive Query Language).
Hadoop can process Big Data without Hive, whereas Hive is not easy to use without Hadoop.
In conclusion, Hadoop and Hive cannot be compared like for like, because they are completely different kinds of technology. Running the two together makes querying Big Data much easier and more comfortable for Big Data users.