EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials Hive Tutorial HiveQL Queries
Secondary Sidebar
Hive Tutorial
  • Basics
    • Hive JDBC Driver
    • What is a Hive
    • Hive Architecture
    • Hive Installation
    • How To Install Hive
    • Hive Versions
    • Hive Commands
    • Hive Data Types
    • Hive Built-in Functions
    • Hive Function
    • Hive String Functions
    • Date Functions in Hive
    • Hive Table
    • Hive Drop Table
    • Hive Show Tables
    • Hive Group By
    • Hive Order By
    • Hive Cluster By
    • Joins in Hive
    • Hive Inner Join
    • Map Join in Hive
    • Hive nvl
    • Hive UDF
    • Dynamic Partitioning in Hive
    • HiveQL
    • HiveQL Queries
    • HiveQL Group By
    • Partitioning in Hive
    • Bucketing in Hive
    • Views in Hive
    • Indexes in Hive
    • External Table in Hive
    • Hive TimeStamp
    • Hive Database
    • Hive Interview Questions
    • Hive insert into

Related Courses

Hive Certification Course

Hadoop Course Training

All in One Data Science Course

HiveQL Queries

By Priya PedamkarPriya Pedamkar

HiveQL Queries

Introduction to HiveQL Queries

HiveQL has a SQL like a dialect used for summarizing and querying large chunks of data through Hadoop environment. Hive is widely used to alter, create and drop tables, databases, views or user-defined functions by big data professionals. Some of the data definition language (DDL) used to load data and modify it in the database are Create, Alter, Show, describe, describe formatted, drop, truncate.

Data Types in Hive

  • As in relational databases, Hive supports most of the primitive data types and also three collection data types.
  • Primitive data types are Integer, Boolean, float, double, String, Timestamp and Binary.
  • Within Integer, Hive supports varying sizes like tinyint, smallint, int and bigint.
  • The collection data types are structs, maps and array.
  • A struct is analogous to a C struct.
  • “Dot” notation is used to access the fields with this data type.
  • A map is a collection of key-value tuples.
  • Array notation(e.g. [‘key’ ]) is used to access the fields with this data type.
  • An array is a collection of elements with the same data type which can be accessed using zero-based integer indexes.

Types of HiveQL Queries

Given below are the types of HiveQL queries that are widely used:

1. HiveQL query for information_schema database

Hive queries can be written to get information about Hive privileges, tables, views or columns. Information_schema data is a read-only and user-friendly way to know the state of the system similar to sys database data.

Example:

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

Code:

Select * from information_schema.columns where table_schema = ‘database_name’

This will retrieve all the columns in the database table specified.

2. Creation and loading of data into a table

The bulk load operation is used to insert data into managed tables as Hive does not support row-level insert, delete or update.

Code:

LOAD DATA LOCAL INPATH ‘$Home/students_address’ OVERWRITE INTO TABLE students
PARTITION (class = “12”, section = “science”);

With the above command, a directory is first created for the partition, and then all the files are copied in the directory. The keyword “local” is used to specify that the data is present in the local file system. “Partition” keyword can be omitted if the table does not have a partition key. Hive query will not check for the data being loaded to match the schema of the table.

The “INSERT” command is used to load data from a query into a table. “OVERWRITE” keyword is used to replace the data in a table. In Hive v0.8.0 or later, data will get appended into a table if overwrite keyword is omitted.

Code:

INSERT OVERWRITE TABLE students
PARTITION ( class = “12”, section = “science”)
Select * from students_data where class = “12” and section = “science”

All the partitions of the table students_data can be dynamically inserted by setting below properties:

Set hive.exec.dynamic.partition = True;

Set hive.exec.dynamic.partition.mode = unstrict

Set hive.exec.max.dynamic.partition.pernode = 1000;

CREATE TABLE clause will also create a table, and schema will be taken from the select clause.

3. Merge data in tables

Data can be merged from tables using classic SQL joins like inner, full outer, left, right join.

Code:

Select a.roll_number, class, section from students as a
inner join pass_table as b
on a.roll_number = b.roll_number

This will return class and section of all the roll numbers who have passed. Using a left join to this will return the “grade” for only pass students and “NULL” for the failed ones.

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,408 ratings)

Code:

Select a.roll_number, class, section, b.grade from students as a
Left join pass_table as b
on a.roll_number = b.roll_number

UNION ALL and UNION are also used to append data present in two tables. However, few things need to be taken care of on doing so like, Schema of both the tables should be same. UNION is used to append the table and return unique records while UNION ALL returns all the records, including duplicates.

4. Ordering a table

ORDER BY clause enables total ordering of the data set by passing all data through one reducer. This may take a long time for large data tables, so SORT BY clause can be used to achieve partial sorting, by sorting each reducer.

Code:

Select customer_id, spends from customer as a order by spends DESC limit 100

This will return the top 100 customers with highest spends.

5. Aggregation of data in a table

Aggregation is done using aggregate functions that returns a single value after doing computation on many rows. These are count(col), sum(col), avg(col), min(col), max(col), stddev_pop(col), percentile_approx(int_expr, P, NB), where NB is number of histogram bins for estimation), collect_set(col), this returns duplicate elements after removing collection column.

The set property which helps in improving the performance of aggregation is hive.map.aggr = true.

“GROUP BY” clause is used with an aggregate function.

Example:

Code:

Select year(date_yy), avg(spends) from customer_spends where merchant = “Retail” group by year(date_yy)

HAVING clause is used to restrict the output from GROUP BY, which is done using a subquery.

6. Conditional statements

CASE…WHEN…THEN clause is similar to if-else statements to perform a conditional operation on any column in a query.

For example:

Code:

Select customer,
Case when percentage <40 then “Fail”
When percentage >=40 and percentage <80 then “Average” Else “Excellent”
End as rank From students;

7. Filtering of data

WHERE clause is used to filter data in HiveQL. LIKE is used along with WHERE clause as a predicate operator to match a regular expression in a record.

8. Way to escape an illegal identifier

There is a way to use special characters or keywords or space in columns or partition names by enclosing it in backticks ( ` ).

Comments in Hive Scripts:

There is a way to add comment lines to the Hive script by starting it with the string ‘- -‘.

Example:

Below is the code to display students data.

Code:

Select * from student_table;

This only works in scripts, if pasted in CLI error messages will get displayed.

HiveQL supports a member of file formats like Textfile, Parquet, etc. The ORC (Optimized Row Columnar) format can support tables upto 300 PB as Hive supports ANSI SQL and ACID (Atomic, Consistent, Isolated and Durable) transactions.

Recommended Articles

This is a guide to HiveQL Queries. Here we discuss the introduction to HiveQL Queries along with data types and types of HiveQL queries, respectively. You may also have a look at the following articles to learn more –

  1. Views in Hive
  2. Bucketing in Hive
  3. Partitioning in Hive
  4. Hive Alternatives
Popular Course in this category
Hive Training (2 Courses, 5+ Projects)
  2 Online Courses |  5 Hands-on Projects |  25+ Hours |  Verifiable Certificate of Completion
4.5
Price

View Course

Related Courses

Hadoop Training Program (20 Courses, 14+ Projects, 4 Quizzes)4.9
All in One Data Science Bundle (360+ Courses, 50+ projects)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more