Updated July 10, 2023

Introduction to Hive Versions

Hive is a data warehousing tool built on top of Hadoop for processing large data sets. It was first released in 2010 and has become one of the most widely used ETL and data processing tools. Since then, each new version has brought improvements and bug fixes. The 3.1.2 stable release came out in August 2019, followed by 3.1.3 in April 2022. Through these updates, Hive has become more robust and more capable.

Top 7 Versions of Hive

The sections below describe the main changes introduced in each of these versions.

1. Version 0.9.0

Hive 0.9.0, released in April 2012, fixed a bug in TimestampWritable.java that was causing data corruption. The release also added new functionality and resolved existing bugs related to MapReduce. Among the other fixes, getSchema() was not returning the correct schema for EXPLAIN queries, and HDFS was hard-coded in places, which prevented the use of other file systems. This release fixed numerous bugs of this kind.

2. Version 0.10.0

Hive 0.10.0, released in January 2013, came soon after 0.9.0 and fixed several known issues. Earlier versions had bugs that caused syntax errors, memory leaks, and NullPointerExceptions within nested UDFs; Hive 0.10.0 fixed these bugs.

3. Version 0.11.0

Hive 0.11.0 was released in May 2013 to fix major bugs. One significant fix addressed a bug that caused the server to go into a SHUTDOWN state after executing an incorrect query. The other major bugs handled in this version were map joins returning incorrect results and Hive compiler issues under Java 7.

Combining Avro with Snappy compression returned null values; this bug was also fixed in this release. The release also fixed a bug where a multi-table INSERT OVERWRITE stored the same results in all of the target tables.
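A multi-table insert is a single HiveQL statement that scans a source table once and writes a different result set into each target table, which is why every table receiving the same rows was a serious bug. The Python helper below is a minimal sketch that only assembles such a statement as a string; the function name build_multi_insert and the table and column names in the usage example are hypothetical, and the resulting statement would still need to be run through a Hive client.

```python
from typing import List, Tuple


def build_multi_insert(source: str, targets: List[Tuple[str, str]]) -> str:
    """Build a HiveQL multi-table INSERT OVERWRITE statement.

    `targets` is a list of (target_table, select_clause) pairs. Each target
    gets its own INSERT OVERWRITE branch, while the source appears once in
    the FROM clause. Hypothetical helper: identifiers are not validated.
    """
    if not targets:
        raise ValueError("at least one target table is required")
    lines = [f"FROM {source}"]
    for table, select_clause in targets:
        # Each branch writes its own result set into its own table.
        lines.append(f"INSERT OVERWRITE TABLE {table} {select_clause}")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_multi_insert(
        "web_logs",
        [
            ("errors_by_day", "SELECT dt, COUNT(*) WHERE status >= 500 GROUP BY dt"),
            ("visits_by_day", "SELECT dt, COUNT(*) GROUP BY dt"),
        ],
    ))
```

Keeping each branch's SELECT separate makes it easy to see that the two target tables are expected to receive different rows.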

4. Version 0.12.0

This release added the DATE data type, which made it easy to store dates in Hive columns. Hive also began cleaning up the contents of its scratch directories, and functions could now be defined in HQL. The development team also introduced the EXCHANGE PARTITION functionality in this version.
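EXCHANGE PARTITION moves a partition from one table into another table that has the same schema and does not already contain that partition. As documented in the Hive language manual, the form is ALTER TABLE target EXCHANGE PARTITION (spec) WITH TABLE source, which moves the partition out of source and into target. The sketch below only builds that statement as a string; the helper name and the table names in the test are hypothetical, and the partition value uses the DATE literal format 'YYYY-MM-DD' purely as an illustration.

```python
from typing import Dict


def build_exchange_partition(target: str, source: str, spec: Dict[str, str]) -> str:
    """Build an ALTER TABLE ... EXCHANGE PARTITION statement (hypothetical helper).

    Moves the partition described by `spec` out of `source` and into `target`.
    Both tables must share the same schema, and `target` must not already
    contain that partition.
    """
    if not spec:
        raise ValueError("partition spec must not be empty")
    # Render key='value' pairs in insertion order; escape embedded single quotes.
    parts = ", ".join(
        "{}='{}'".format(key, value.replace("'", "\\'"))
        for key, value in spec.items()
    )
    return f"ALTER TABLE {target} EXCHANGE PARTITION ({parts}) WITH TABLE {source};"
```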

Spark SQL was introduced to work with Hive, and Hive could also be used with HBase. The format_number function started supporting the DECIMAL data type. The development team also resolved a bug that caused Hive to keep retrying the deletion of a lock even when ZooKeeper responded that the node did not exist; Hive no longer attempts this operation unnecessarily. Finally, the release introduced the VARCHAR data type and support for in-memory PTF (partitioned table function) processing.

5. Version 0.13.0

Hive 0.13.0, released in April 2014, added features that made Hive considerably more capable, including the CHAR data type, subquery support, permanent UDFs, ORC file footer support, and improved DECIMAL precision. Creating or dropping a database now required checking the user's permissions. Hive also started skipping header and footer rows while reading data files, and a new UDF could calculate the distance between geographical coordinates.
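In Hive, header and footer skipping is configured per table through the table properties skip.header.line.count and skip.footer.line.count, and it applies to each text file the table reads. The Python function below is a minimal sketch of that effect on the lines of a single file, not Hive's actual reader; the function name is hypothetical.

```python
from typing import List


def skip_header_footer(lines: List[str], header: int = 0, footer: int = 0) -> List[str]:
    """Return the data rows of one text file, dropping header and footer rows.

    Mirrors the effect of skip.header.line.count / skip.footer.line.count on a
    single file. Hypothetical helper, not Hive's implementation.
    """
    if header < 0 or footer < 0:
        raise ValueError("line counts must be non-negative")
    end = len(lines) - footer
    # If the header and footer together cover the whole file, no data is left.
    if header >= end:
        return []
    return lines[header:end]
```

For a CSV export with one column-name row and one summary row, header=1 and footer=1 leave only the data rows.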

Parquet data also became readable, and users could determine which Hive version they were running through the Hive API or CLI. Multiple queries could be submitted at once, separated by semicolons (;). This release also added the ability to adjust the number of mappers and reducers based on input file size, and allowed secure mode with Kerberos to be enabled when starting the Hive metastore. Daylight saving time was also handled correctly.
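Submitting several queries at once means the client has to split its input on semicolons without breaking a statement at a semicolon inside a string literal. The sketch below shows one simplified way to do that; it is an illustrative assumption, not the Hive CLI's own parser, and it ignores comments and backslash-escaped quotes.

```python
from typing import List, Optional


def split_statements(script: str) -> List[str]:
    """Split a HiveQL script into statements on unquoted semicolons.

    Tracks single and double quotes so that a ';' inside a string literal
    does not end a statement. Comments and backslash escapes are not handled.
    """
    statements: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # quote character we are currently inside, if any
    for ch in script:
        if quote:
            if ch == quote:
                quote = None  # closing quote
            current.append(ch)
        elif ch in ("'", '"'):
            quote = ch  # opening quote
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:  # skip empty statements such as ';;'
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    # A trailing statement without a final ';' is still returned.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```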

6. Version 1.0.0

Hive 1.0.0, released in February 2015, brought performance improvements and other new features. JDBC clients could connect to the Hive server using the SASL QOP parameter. The property hive.optimize.insert.dest.volume=true created the Hive scratch directory on the same volume as the target table.
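As a rough illustration, a HiveServer2 JDBC URL carries session settings after the database name, each preceded by a semicolon. In the Apache Hive JDBC driver the QOP setting is passed as saslQop with one of auth, auth-int, or auth-conf; treat the exact parameter name as something to confirm against the driver version in use. The helper below only builds the URL string, and its name, the host, and the Kerberos principal are hypothetical placeholders.

```python
from typing import Optional

# SASL quality-of-protection levels: authentication only, authentication plus
# integrity checks, and authentication plus integrity plus confidentiality.
VALID_QOP = {"auth", "auth-int", "auth-conf"}


def hive_jdbc_url(host: str, port: int = 10000, database: str = "default",
                  principal: Optional[str] = None, qop: Optional[str] = None) -> str:
    """Build a HiveServer2 JDBC URL string (hypothetical helper).

    Session parameters such as principal and saslQop are appended after the
    database name, each preceded by ';'.
    """
    if qop is not None and qop not in VALID_QOP:
        raise ValueError(f"unsupported SASL QOP level: {qop}")
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        url += f";principal={principal}"
    if qop:
        url += f";saslQop={qop}"
    return url
```

Choosing auth-conf requests encryption of the traffic as well as authentication, at some performance cost.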

This release also improved authentication support. Once the cluster was secured, the Hive server accepted inbound authentication through MapR SASL and PAM, and PAM subsequently became the default authentication method. The release also resolved permission issues encountered when inserting into or overwriting a table. Many cost-based optimization techniques made Hive query processing more efficient for big data applications, and join queries were now handled correctly so that they no longer produced incorrect data.

7. Version 1.2.1

Hive 1.2.1 was released in mid-2015, alongside the launch of Hive ODBC drivers. A major change in this version was that the Hive shell no longer raised an error on startup. The Hive metastore also started creating tables correctly, and queries no longer ran into exceptions such as lock exceptions. Overall, the development team delivered a stable release that resolved major bugs.

Conclusion

In this article, we saw how Hive originated and how its successive versions made it more stable and powerful for big data processing. The changes in each version kept pace with the challenges that arose in big data processing, and resolving them substantially strengthened Hive as a data processing tool.