Introduction to Hive Versions
Hive is a data warehousing tool built on top of Hadoop for processing large data sets. It was released in 2010 and is one of the most widely used tools for ETL and data processing. Since that first release, successive versions have brought many improvements and bug fixes. The latest stable release at the time of writing came in August 2019. The versions released over the years have made Hive more robust and more capable.
Top 7 Versions of Hive
The major Hive versions and the changes they introduced are described below.
1. Version 0.9.0
Hive version 0.9.0 fixed a bug in TimestampWritable.java that was causing data corruption. It is the earliest version covered here; the Apache release dates from April 2012. It also introduced more functionality for working with MapReduce versions 2.x and 3.x. The getSchema() function had not been returning the correct schema for EXPLAIN queries. HDFS had been hardcoded, which was a barrier to using other file systems. Many bugs of this kind were fixed in this version of Hive.
2. Version 0.10.0
This version of Hive fixed several known issues, as the release came soon after version 0.9. Bugs such as unsupported syntax, memory leaks, and null pointer exceptions thrown by nested UDFs were resolved in Hive 0.10.
3. Version 0.11.0
Released by Apache in May 2013, this version fixed the major bugs encountered after the 0.10 release. The server had been going into a SHUTDOWN state every time an incorrect query was executed, and fixing this was a major change in this version. Other major bugs handled here were map joins returning incorrect results and compilation problems under Java 7.
A bug that returned null values when mixing Avro and Snappy was also fixed in this release. In addition, a multi-insert that overwrote multiple tables had been storing the same results in all of the tables; this was fixed as well.
4. Version 0.12.0
This release added a new data type, DATE, which made it easy to store dates in Hive columns. The contents of the scratch directories were now cleaned up. It became possible to define functions in HQL, and exchange partition was introduced.
Spark SQL was introduced to work with Hive, and Hive can also be used with HBase. The format_number function started supporting the decimal data type. Earlier, Hive kept retrying to delete a lock even when ZooKeeper responded that the node did not exist; this release fixed the bug so that Hive no longer retries. The VARCHAR data type was introduced, and in-memory PTF transactions were supported.
5. Version 0.13.0
Hive received another update in April 2014 that added further features and made it considerably stronger. New features in this release included the CHAR data type, support for subqueries, permanent UDFs, ORC file footer changes, and precision handling. Creating or dropping a database now required a check of the user's permissions. Hive could now skip header and footer rows while reading a data file. A UDF was introduced to calculate the distance between geographical coordinates.
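To make the header and footer skipping concrete, here is a minimal Python sketch of the effect. In Hive itself this is configured on the table through properties such as skip.header.line.count and skip.footer.line.count; the function skip_header_footer and the sample file below are hypothetical and only mimic the result, they are not Hive code.

```python
def skip_header_footer(lines, header_count=0, footer_count=0):
    """Return the data rows left after dropping leading and trailing lines.

    Hypothetical helper: it imitates the effect of header/footer skipping
    on a list of text lines, not Hive's actual reader implementation.
    """
    if header_count < 0 or footer_count < 0:
        raise ValueError("counts must be non-negative")
    end = len(lines) - footer_count
    # If the counts cover the whole file, no data rows remain.
    return lines[header_count:end] if end > header_count else []


# Sample file contents (made up for illustration).
raw_file = [
    "id,name",        # header row
    "1,alice",
    "2,bob",
    "total rows: 2",  # footer row
]

data_rows = skip_header_footer(raw_file, header_count=1, footer_count=1)
print(data_rows)
```

Only the two data rows survive, so a query over such a table would not see the column names or the trailing summary line as records.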
Parquet data became readable, and the Hive version in use could be determined through the Hive API or the CLI. Multiple queries could be submitted at once, separated by a semicolon (;). From this release onward, the number of mappers and reducers could also be adjusted based on input file size. The Hive metastore could be started in secure mode with Kerberos enabled. Daylight saving time was handled correctly.
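The idea of sizing reducers from the input can be sketched as follows. This is a simplified illustration, assuming the estimate is the input size divided by a per-reducer byte budget, rounded up, kept to at least one, and capped at a maximum. Hive exposes settings of this kind as hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max; the function name estimate_reducers and the numbers used below are illustrative assumptions, not the defaults of any particular release.

```python
def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers):
    """Estimate a reducer count from input size (simplified sketch)."""
    if total_input_bytes < 0:
        raise ValueError("total_input_bytes must be non-negative")
    if bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")
    # Ceiling division: a partially filled reducer still counts as one.
    needed = -(-total_input_bytes // bytes_per_reducer)
    # Always run at least one reducer, never more than the cap.
    return min(max_reducers, max(1, needed))


MIB = 1024 * 1024
GIB = 1024 * MIB

# Illustrative values only: 256 MiB per reducer, at most 1009 reducers.
small_job = estimate_reducers(10 * GIB, 256 * MIB, 1009)
large_job = estimate_reducers(1024 * GIB, 256 * MIB, 1009)
print(small_job, large_job)
```

With these example values a 10 GiB input gets 40 reducers, while a 1 TiB input would need 4096 and is therefore capped at 1009.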
6. Version 1.0.0
This version of Hive brought performance improvements and other new features. JDBC clients could connect to the Hive server using the SASL QOP parameter. Hive added the property hive.optimize.insert.dest.volume=true, which creates the Hive scratch directory on the same volume as the target table.
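As a rough illustration of the SASL QOP parameter, the sketch below assembles a HiveServer2-style JDBC URL in Python. It assumes the usual jdbc:hive2:// layout with principal and sasl.qop passed as semicolon-separated session settings. The helper build_hive_jdbc_url, the host name, and the Kerberos principal are hypothetical placeholders, so the exact URL your cluster expects should be checked against its own documentation.

```python
# Quality-of-protection levels defined by SASL.
VALID_QOP = ("auth", "auth-int", "auth-conf")


def build_hive_jdbc_url(host, port, database, principal, qop="auth"):
    """Build a HiveServer2-style JDBC URL with a SASL QOP setting.

    Hypothetical helper for illustration; it only formats a string and
    does not open a connection.
    """
    if qop not in VALID_QOP:
        raise ValueError(f"qop must be one of {VALID_QOP}, got {qop!r}")
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";principal={principal};sasl.qop={qop}"
    )


# Placeholder host and principal, not real endpoints.
url = build_hive_jdbc_url(
    host="hs2.example.com",
    port=10000,
    database="default",
    principal="hive/hs2.example.com@EXAMPLE.COM",
    qop="auth-conf",
)
print(url)
```

Here auth means authentication only, auth-int adds integrity checking, and auth-conf adds encryption of the traffic. The client value generally has to be compatible with what the server is configured to accept.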
Authentication support was improved: on a secure cluster, the Hive server accepted both MapR SASL and PAM for inbound authentication, with PAM used as the default from then on. Permission issues when inserting into or overwriting a table were resolved in this release. Many cost-based optimization techniques were introduced, which made Hive query processing more efficient for big data applications. Join queries were now handled correctly and no longer produced incorrect data.
7. Version 1.2.1
This version of Hive was released in mid-2015, and Hive ODBC drivers were introduced. The Hive shell no longer gave an error after starting, which was a major change in this version. The Hive metastore started creating tables correctly, and exceptions such as lock exceptions no longer occurred while running a query. This was a stable version of Hive in which major bugs were resolved.
In this article, we saw how Hive originated and how its successive versions made it more stable and powerful for big data processing. The changes introduced in each version responded to the challenges encountered in big data processing, and most of those challenges were substantially resolved along the way.