EDUCBA Logo

EDUCBA

MENUMENU
  • Explore
    • EDUCBA Pro
    • PRO Bundles
    • Featured Skills
    • New & Trending
    • Fresh Entries
    • Finance
    • Data Science
    • Programming and Dev
    • Excel
    • Marketing
    • HR
    • PDP
    • VFX and Design
    • Project Management
    • Exam Prep
    • All Courses
  • Blog
  • Enterprise
  • Free Courses
  • Log in
  • Sign Up
Home Data Science Data Science Tutorials Hive Tutorial Hive Group By
 

Hive Group By

Priya Pedamkar
Article byPriya Pedamkar

Updated March 22, 2023

Hive Group By

 

 

Introduction to Hive Group By

Group By as the name suggests it will group the record which satisfies certain criteria. In this article, we will look at the group by HIVE. In legacy RDBMS like MySQL, SQL, etc., group by is one of the oldest clauses used. Now it has found its place in a similar way in file-based data storage famously know as HIVE.

Watch our Demo Courses and Videos

Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.

We know that the Hive has surpassed many legacy RDBMS in handling huge data without a penny being spent on vendors to maintain the databases and servers. We need to configure HDFS to handle hive. Generally, we move to tables because the end-user can interpret from its structure and can query upon as files will be clumsy for them. But we had to do this by paying the vendors to provide servers and maintain our data in the format of tables. So Hive provides the cost-effective mechanism where it takes advantage of file-based systems (the way the hive saves its data) and tables (table structure for the end-users to query upon).

Group By

Group by uses the defined columns from the Hive table to group the data. Consider you have a table with the census data from each city of all the states where city name and state name are columns. Now in the query, if we group by states, then all the data from different cities of a particular state will be grouped together, and one can easily visualize the data better now before the way group by was applied.

Syntax of Hive Group By

The general syntax of the group by clause is as below:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition] [GROUP BY col_list] [HAVING having_condition] [ORDER BY col_list]] [LIMIT number];

or for simpler queries,

<Select Clause> <rferenced Columns> from <table_name> Group By <The columns on which we want to group the data>

Select department, count(*) from the university.college Group By department;

Here the department refers to one of the college table columns, which is present in the university database, and its value is various in departments like arts, mathematics, engineering, etc. Now let see some example to demonstrate group by.

I have created a sample table deck_of_cards to demonstrate the group by. Its create table statement is as follows:

Hive Group By

you can see from above that it has three string columns colour, suit, and pip. Let me write a query to group the data by its colour and get its count.

select color, count(*)  from deck_of_cards group by color;

Hive basically takes the above query to convert it to the map-reduce program by generating corresponding java code and jar file and then executes. This process may take a bit of time, but it can definitely handle the big data compared to traditional RDBMS. See the below screenshot with the detailed log for executing the above query.

Hive Group By eg1

You can see that BLACK is 26 and RED is 26.

Let us apply the grouping on two columns (colour and suit and getting group count) and see the result below.

Select color, suit, count(*) from  deck_of_cards group by color, suit

Hive eg2

Basically, there are four distinct groups above Club, Spade which have colour black and Diamond and heart, which are colour red.

Storing the Result from Group by Cause in Another Table

Hive also like any other RDBMS provides the feature of inserting the data with create table statements. Let us look at storing the result from a select expression using a group by into another table. Let me use the above query itself where I have used two columns in group by.

create table cards_group_by
as
select color,suit,count(*) from deck_of_cards
group by color,suit;

result group by eg1

Now let us query upon the created table to see and validate the data.

result eg2

Now let us restrict the result of the group by using having clause. As shown in the generic syntax, we can apply restriction on the group, by using having. Here I am using the ordser_items table, and its structure is as follows from the describe statement.

hive (retail_db_ali)> describe order_items;
OK
order_item_id    int
order_item_order_id   int
order_item_product_id   int
order_item_quantity   tinyint
order_item_subtotal   float
order_item_product_price   float
Time taken: 0.387 seconds, Fetched: 6 row(s)

select order_item_id, order_item_order_id from order_items group by order_item_id, order_item_order_id having order_item_order_id=5;

result eg3

You can see from the result the screenshot that we have records only with order_item_order_id value 5.

Group by Along with Case Statement

Now let us look at bit complex queries involving the CASE statements with the group by. We will apply this to the order_items table. We will see below that we can categorize the nonaggregating columns on which we cannot directly apply the group by clause.

Select
case
when order_item_subtotal <=200 then "less_profit"
when order_item_subtotal <=300 then "avg_prof"
when order_item_subtotal<=500 then "good_prof"
when order_item_subtotal<=550 then "max_profit"
else 'corsed_treshold'
end
as order_profits,
count(*) from order_items
group by
case
when order_item_subtotal <=200 then "less_profit"
when order_item_subtotal <=300 then "avg_prof"
when order_item_subtotal<=500 then "good_prof"
when order_item_subtotal<=550 then "max_profit"
else 'corsed_treshold'
end;

let us execute it in the hive for results

Along with eg1

Conclusion

We can see that we have grouped the order_item_subtotal into four different categories (if you note that order_item_subtotal is a non-aggregating column and direct group by cannot be applied on to it). We have grouped them together and got their counts for the values that are satisfying the range as defined in the select expression. Here is the simple rule if the column is nonaggregating. Our select expression is complex, then whatever they’re in the select expression that should also be present in the group by clause expression. So we have seen how a famous clause RDBMS clause group can also be applied to the Hive without any restrictions. It can be applied to simple select expressions. Aggregate and filtering expressions, join expressions and complex CASE expressions as well.

Recommended Articles

This is a guide to Hive Group By. Here we discuss the group by, syntax, examples of the hive group by with different conditions and implementation in detail. You may also look at the following articles to learn more –

  1. What is a Hive?
  2. Hive Architecture
  3. Hive Function
  4. Hive Order By
  5. Hive Installation

Primary Sidebar

Footer

Follow us!
  • EDUCBA FacebookEDUCBA TwitterEDUCBA LinkedINEDUCBA Instagram
  • EDUCBA YoutubeEDUCBA CourseraEDUCBA Udemy
APPS
EDUCBA Android AppEDUCBA iOS App
Blog
  • Blog
  • Free Tutorials
  • About us
  • Contact us
  • Log in
Courses
  • Enterprise Solutions
  • Free Courses
  • Explore Programs
  • All Courses
  • All in One Bundles
  • Sign up
Email
  • [email protected]

ISO 10004:2018 & ISO 9001:2015 Certified

© 2025 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA Login

Forgot Password?

🚀 Limited Time Offer! - 🎁 ENROLL NOW