Introduction to Hive Versions
Hive, the data warehousing tool built on top of Hadoop, is used to process large data sets. It was released in 2010 and is one of the most widely used tools for ETL and data processing. Since that first release, successive versions have brought many improvements and bug fixes. The latest stable release in use came in August 2019. Each version released for Hive has made it more stable and added functionality.
Top 7 Versions of Hive
The main changes introduced in each of these versions are described below.
1. Version 0.9.0
Hive version 0.9.0 fixed a bug in TimestampWritable.java that was causing data corruption. It is the earliest version covered here and was released on 30 April 2012. For working with MapReduce versions 2.x and 3.x, considerably more functionality was introduced and the related bug was solved. The getSchema() function was not returning the correct schema for EXPLAIN queries. HDFS was hardcoded, which was a barrier to using other file systems. Many bugs of this kind were fixed in this version of Hive.
2. Version 0.10.0
This version of Hive fixed several issues that had been reported, and it was released soon after version 0.9. Among the bugs resolved were the lack of support for the DESCRIBE syntax, memory leaks, and null pointer exceptions thrown by nested UDFs. With these fixed, Hive 0.10 was released.
3. Version 0.11.0
Released in May 2013, this version of Hive fixed the major bugs encountered after version 0.10 was released. The server used to go into a SHUTDOWN state every time an incorrect query was executed, and fixing this was a major change in this version. Map joins were not giving correct results, and the Hive compiler showed issues under Java 7. These were the major bugs handled in this version.
When Avro and Snappy were used together, null values were returned. This bug was also solved in this release. In addition, an insert overwrite into multiple tables was storing the same results in all the tables, which was also handled in this release.
4. Version 0.12.0
This release of Hive added a new data type, DATE, which made it easy to insert dates into Hive columns. The contents of the scratch directories were now cleaned up. Functions could now be defined with HQL. Exchange partition was also introduced in this version.
Support was introduced for Spark SQL to work with Hive, and Hive could also be used with HBase. format_number started supporting the decimal data type. Earlier, Hive kept retrying to delete a lock even after ZooKeeper had responded that the node did not exist. This bug was resolved in this release, so Hive no longer retries in that case. The VARCHAR data type was introduced, and in-memory PTF transactions were supported.
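The behaviour of the new DATE and VARCHAR types can be pictured with a small Python sketch. It assumes that a VARCHAR(n) column silently truncates longer strings to n characters and that a DATE is written as YYYY-MM-DD, with an invalid value becoming NULL. The helper names to_varchar and to_hive_date are hypothetical and only illustrate the semantics; they are not part of Hive.

```python
from datetime import date, datetime


def to_varchar(value, max_length):
    """Mimic assigning a string to a VARCHAR(max_length) column.

    Assumption: values longer than max_length are silently truncated.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return value[:max_length]


def to_hive_date(text):
    """Parse a YYYY-MM-DD string into a date, or return None (NULL) if invalid."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # Assumption: an unparseable value is treated as NULL rather than an error.
        return None


short_name = to_varchar("hadoop", 3)
release_day = to_hive_date("2013-10-15")
bad_day = to_hive_date("not a date")
print(short_name, release_day, bad_day)
```

In an actual table these rules are applied by Hive itself when the column is declared as DATE or VARCHAR(n); the sketch only shows what to expect from the stored values.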
5. Version 0.13.0
Hive received another update in mid-2014 that added further features and made it much stronger. New features in this release included the CHAR data type, support for subqueries, permanent UDFs, the ORC file footer, and Hive precision. Creating or dropping a database now required a check of the user's permissions. Hive started skipping header and footer rows while reading a data file. A UDF was introduced to calculate the distance between geographical coordinates.
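The header and footer skipping mentioned above can be illustrated with a short Python sketch. In Hive this is configured per table through the skip.header.line.count and skip.footer.line.count table properties. The function below only shows the effect on the rows that get read; it is not Hive's implementation, and the name skip_header_footer and the sample rows are hypothetical.

```python
def skip_header_footer(lines, header_count=0, footer_count=0):
    """Return only the data rows, dropping leading header and trailing footer rows.

    header_count and footer_count play the role of the table properties
    skip.header.line.count and skip.footer.line.count (illustration only).
    """
    if header_count < 0 or footer_count < 0:
        raise ValueError("counts must be non-negative")
    end = len(lines) - footer_count
    # If the file is no longer than header plus footer, no data rows remain.
    return lines[header_count:end] if end > header_count else []


rows = ["id,name", "1,alice", "2,bob", "total=2"]
data_rows = skip_header_footer(rows, header_count=1, footer_count=1)
print(data_rows)
```

With one header row and one footer row skipped, only the two data rows in the middle are returned.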
Parquet text data was now readable, and the version of Hive in use could be determined through the Hive API or the CLI. Multiple queries could now be submitted at once, separated by a semicolon (;). The ability to adjust the number of mappers and reducers based on input file size was also available from this release. The Hive metastore could be started in secure mode with Kerberos enabled. Daylight saving time was handled correctly.
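The idea of sizing the number of reducers from the input size can be sketched in Python as follows. Hive exposes settings such as hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max for this purpose. The default values used here (256 MB per reducer and at most 1009 reducers) are assumptions that differ between releases, and estimate_reducers is a hypothetical name. This is a simplified sketch of the calculation, not Hive's planner code.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=256 * 1024 * 1024,  # assumed default
                      max_reducers=1009):                   # assumed default
    """Estimate a reducer count from the total input size.

    One reducer is requested per bytes_per_reducer of input (rounded up),
    clamped to the range 1..max_reducers.
    """
    if input_bytes < 0 or bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("sizes and limits must be positive")
    wanted = -(-input_bytes // bytes_per_reducer)  # integer ceiling division
    return max(1, min(max_reducers, wanted))


ten_gb = 10 * 1024 ** 3
print(estimate_reducers(ten_gb))        # 10 GB of input
print(estimate_reducers(1024))          # tiny input still gets one reducer
print(estimate_reducers(1024 ** 5))     # very large input is capped
```

Lowering the bytes-per-reducer value yields more, smaller reduce tasks, while the upper limit stops a very large input from requesting an unbounded number of reducers.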
6. Version 1.0.0
Hive came out with a new version that improved performance and added other features. A JDBC connection to the Hive server could now be made using the SASL QOP parameter. Hive also gained the property hive.optimize.insert.dest.volume=true, which creates the Hive scratch directory in the same volume as the target table.
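A connection string carrying the SASL QOP setting mentioned above might be assembled as in the Python sketch below. The jdbc:hive2:// scheme and the QOP values auth, auth-int, and auth-conf follow the HiveServer2 JDBC URL format. The key is written sasl.qop here, although some driver versions spell it saslQop, so the exact key should be checked against the driver in use. The host, port, database, and Kerberos principal are placeholder assumptions, and build_hive_jdbc_url is a hypothetical helper.

```python
VALID_QOP = ("auth", "auth-int", "auth-conf")


def build_hive_jdbc_url(host, port, database, principal,
                        qop="auth-conf", qop_key="sasl.qop"):
    """Build a HiveServer2 JDBC URL that includes a SASL QOP setting.

    qop_key is left configurable because the accepted spelling of the key
    depends on the driver version (assumption to verify for your setup).
    """
    if qop not in VALID_QOP:
        raise ValueError(f"qop must be one of {VALID_QOP}, got {qop!r}")
    return (f"jdbc:hive2://{host}:{port}/{database}"
            f";principal={principal};{qop_key}={qop}")


# Placeholder connection details, for illustration only.
url = build_hive_jdbc_url("hive.example.com", 10000, "default",
                          "hive/_HOST@EXAMPLE.COM")
print(url)
```

The value chosen on the client side has to be compatible with the QOP level configured on the server, otherwise the connection is refused.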
Authentication support was improved: when the cluster was secure, the Hive server accepted both MapR SASL and PAM for inbound authentication, and PAM was then used as the default. Permission issues while inserting into or overwriting a table were resolved in this release. Many cost-based optimization techniques were introduced, making Hive query processing more cost-effective for big data applications. Join queries were now handled correctly and no longer produced incorrect data.
7. Version 1.2.1
This version of Hive was released in mid-2015, when Hive ODBC drivers were introduced. The Hive shell no longer gave an error after starting, which was a major change in this version. The Hive metastore started creating tables correctly, and exceptions such as the lock exception no longer occurred while running a query. This was a stable version of Hive in which major bugs were resolved.
In this article, we saw how Hive originated and which versions were introduced, each making it more stable and powerful for big data processing. The changes introduced in these versions responded to the challenges encountered in big data processing, and resolving them made Hive substantially more capable for data processing.
This is a guide to Hive versions. Here we discussed the introduction and the various Hive versions that made Hive much more stable and powerful for big data processing.