EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials Hive Tutorial Indexes in Hive
Secondary Sidebar
Hive Tutorial
  • Basics
    • Hive JDBC Driver
    • What is a Hive
    • Hive Architecture
    • Hive Installation
    • How To Install Hive
    • Hive Versions
    • Hive Commands
    • Hive Data Types
    • Hive Built-in Functions
    • Hive Function
    • Hive String Functions
    • Date Functions in Hive
    • Hive Table
    • Hive Drop Table
    • Hive Show Tables
    • Hive Group By
    • Hive Order By
    • Hive Cluster By
    • Joins in Hive
    • Hive Inner Join
    • Map Join in Hive
    • Hive nvl
    • Hive UDF
    • Dynamic Partitioning in Hive
    • HiveQL
    • HiveQL Queries
    • HiveQL Group By
    • Partitioning in Hive
    • Bucketing in Hive
    • Views in Hive
    • Indexes in Hive
    • External Table in Hive
    • Hive TimeStamp
    • Hive Database
    • Hive Interview Questions
    • Hive insert into

Related Courses

Hive Certification Course

Hadoop Course Training

All in One Data Science Course

Indexes in Hive

By Priya PedamkarPriya Pedamkar

Indexes in Hive

Introduction to Indexes in Hive

Indexes are a pointer or reference to a record in a table as in relational databases. Indexing is a relatively new feature in Hive. In Hive, the index table is different than the main table. Indexes facilitate in making query execution or search operation faster. However, storing indexes require disk space and creating an index involves cost. So, the use of indexes may not always be of any benefit. “EXPLAIN” query must be checked to evaluate the benefit through a query execution plan. Indexing in hive makes large dataset analysis relatively quicker by better query performance on operations.

Why do we need Indexes?

With the petabytes of data that needs to be analyzed, querying Hive tables with millions of records and hundreds of columns becomes time-consuming. Indexing a table helps in performing any operation faster. First, the index of the column is checked and then the operation is performed on that column only. Without an index, queries involving filtering with the “WHERE” clause would load an entire table and then process all the rows. Indexing in Hive is present only for ORC file format, as it has a built-in index with it.

There are two types of indexing in Hive:

  • Bitmap Indexing: This is used with columns having a few distinct values. It is known to store both the indexed column’s value and the list of rows as a bitmap. From Hive V0.8.0 onwards, the bitmap index handler is built-in in Hive.
  • Compact Indexing: This type of indexing is known to store the column value and storage blockid.

Configuration Properties

There is some configuration property that enables better execution of indexing.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

  • hive.optimize.index.filter: The default value for this property is false. Setting this property to TRUE will enable automatic use of indexes.
  • hive.index.compact.query.max.entries: The default value for this property is 10000000. This property is used to set the maximum number of index entries that use compact indexing during query execution.
  • hive.optimize.index.filter.compact.minsize: The default value for this property is 5368709120.

This property is used to set the minimum bytes of input, on which compact indexing can be automatically triggered.

Different Operations to Perform on Hive

Different Operations to Perform on HIVE indexes are:

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,354 ratings)

1. Create an Index

The general syntax for creating an index for a column of a table

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS index_type
[WITH DEFERRED REBUILD] [IDXPROPERTIES (property_name=property_value, ...)] [IN TABLE index_table_name] [ [ ROW FORMAT ...] STORED AS ...
| STORED BY ... ] [LOCATION hdfs_path] [TBLPROPERTIES (...)] [COMMENT "index comment"]

Here,

  • index_name will be the name of the table’s index name.
  • Base_table_name and the columns in bracket is the table for which index is to be created.
  • Index_type will specify the type of indexing to use. If we want to use the built-in compact index handler, below clause will replace index_type

org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler

If we want to create a bitmap index, then index_type will be “bitmap”. It specifies the java class for implementing indexing.

It is important to note that we can create any number and types of indexes for one table. But if the same column has multiple indexes created for it, then the index created first will be considered.

  • WITH DEFERRED REBUILD: This statement must be present while creating an index because while altering the index later, we will be using this clause.

Let’s consider an example of table mark_sheet with 3 columns: roll_id, class, days_attended.

We are creating an index on the column days_attended using compact indexing with below query:

create index index_attendance on table mark_sheet(days_attended)
As ‘org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler’
WITH DEFERRED REBUILD

Output in HUE editor is:

HUE editor

2. Show an Index

The command used to see the details of an index created on a base table is using below, which shows the details like index table name, base table and column name on which index is created, and also the type of index handler used ( whether compact or Bitmap)

Show formatted index on base_table_name

Output

Show an Index

3. Alter an Index

The syntax for altering an index is as seen below.

ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD

This command ALTER INDEX….REBUILD is used to rebuild an index, which was already built on a table. Partition details should also be provided if the base table has partitions. Indexes need to be rebuilt if the underlying table is overwritten or appended.

Output

Alter an Index

4. Drop an Index

The below command is used to drop the index and also the index table storing indexes.

DROP INDEX IF EXISTS index_name ON base_table_name

The index and the indexed table is deleted automatically if the table on which index was built is dropped. Similarly, if a partitioned table is indexed then on dropping the partitions, the indexes are also automatically deleted.

Output

Drop

Advantages of Indexes in Hive

  • Indexing in Hive provides a good replacement for partitioning, when the number of partitions or logical sub-segments will be too many and small in size, to be of any worth.
  • Indexes reduce the query execution time.
  • Indexing in Hive helps in case of traversing large data sets and also while building a data model.
  • Indexing can be done for external tables or views, except when data is present in S3.

Conclusion

In this tutorial, we got to learn about when and how to use indexes in the Hive. We also came across the advantage it provides over conventional query building in Hive. However, it is important to note that indexes should be preferred if a frequent operation is performed on any column. Indexes can also degrade the query performance if used recklessly, because of the disk space involved in creating an index table. In RDMS, the primary key acts as an index to a table, but Hive indexes have their inherent advantages.

Recommended Articles

This is a guide to Indexes in Hive. Here we discuss the basic concept, why do we need Indexes, different operations perform on Hive and advantages. You can also go through our other related articles to learn more–

  1. Bucketing in Hive
  2. Map Join in Hive
  3. Hive Group By
  4. Linux Route
  5. Complete Guide to Hive Inner Join
Popular Course in this category
Hive Training (2 Courses, 5+ Projects)
  2 Online Courses |  5 Hands-on Projects |  25+ Hours |  Verifiable Certificate of Completion
4.5
Price

View Course

Related Courses

Hadoop Training Program (20 Courses, 14+ Projects, 4 Quizzes)4.9
All in One Data Science Bundle (360+ Courses, 50+ projects)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more