Introduction to SQL HAVING Clause
The very basic question that comes into the mind is what is this HAVING clause? Well, the HAVING Clause is used to filter out the results from a SQL query with aggregate functions. To understand in plain English, it is commanding the SQL parser ‘Hey SQL, from our Customer data table, fetch me the names of Countries HAVING more than 1 million customers’.
Wait, that’s what the WHERE Clause does, doesn’t it? Yes, that is very similar to how the WHERE clause operates but with a slight difference. The WHERE clause doesn’t work with aggregate functions.
Now, just to recap a little about aggregate functions, these are functions that take multiple rows as input and give a more significantly processed output. A few examples are Count(), Sum(), Min(), Max(), Avg() etc.
Why HAVING and not WHERE?
We see that HAVING and WHERE clauses perform a very similar task to filter out the results. Then what was the need for the HAVING clause? Why couldn’t the WHERE clause be used with aggregate functions?
To answer this, we would need to understand how the SQL engine treats the two clauses. The FROM clause in every SQL command tells the engine from where to read the rows. The data is stored on the disk and is fetched into the memory for processing. As the rows are read one by one from the disk to the memory, they are checked for the WHERE clause. The rows that fail the WHERE clause aren’t loaded into the memory. Thus, the WHERE clause is evaluated for each row as they are processed by the SQL engine.
On the contrary, the HAVING clause comes into picture only after the rows have been loaded into the memory. Once loaded into the memory, the aggregate functions perform their task on the rows HAVING the desired condition.
Now, if we were to put a WHERE clause with the aggregate function such as avg(), this would confuse the SQL engine on whether to include the row for calculating the average or not. Essentially, we would be commanding the engine to not read the row since it did not pass the avg() criteria in the WHERE clause. But hey, to determine whether it passed or failed the avg() calculation criteria, the row needs to be read into the memory. A state of deadlock.
4.5 (2,139 ratings)
WHERE <condition> -- optional
GROUP BY <column(s)> -- groups the rows to apply aggregate function
HAVING <condition> -- aggregate function in the condition
ORDER BY <column(s)>; -- define the sorting order, optional
Note – GROUP BY clause is required with the HAVING clause. This is because Having clause needs a group of data to apply an aggregate function and filter out the results.
How HAVING clause works?
Let us understand the working of the HAVING clause in SQL.
The HAVING clause is always accompanied by the GROUP BY clause. The GROUP BY clause groups together the data that match a certain criterion. It has three phases – split, apply, and combine. The split-phase divides the rows into groups. The apply phase applies some aggregate functions on the groups of data. The combined phase produces a single result by combining the groups with the aggregate function result.
Now that the groups are formed, the HAVING clause comes into the picture. The HAVING clause then filters out the groups which do not satisfy the given condition.
SELECT Col_A, avg(Col_B) as Col_B
GROUP BY Col_A
Thus, in the example above, we see that the table is first to split into three groups based on the column Col_A. The aggregate function to calculate the average of Col_B values is then applied to the groups. This results in a single row for each group. The rows are then combined and filtered based on the condition in the HAVING clause.
Now let us look at a real-world example. Consider we have the following table of customers and the orders that they have placed with us.
|2||Denny Cockett||México D.F.||Mexico|
|3||Eleanor Calnan||México D.F.||Mexico|
|12||Nora Reyna||Buenos Aires||Argentina|
|13||Ursula Laforest||México D.F.||Mexico|
|15||Portia Yee||São Paulo||Brazil|
|21||Pura Ray||São Paulo||Brazil|
Now, we want to know customers of which countries have placed a combined total of 5 or more orders with us. It could be a single customer placing more than 5 orders or 5 customers placing 1 order each.
To accomplish this, we would need to
Step 1: Join the two tables
Step 2: Group the customers based on their countries
Step 3: Count the number of orders for each group
Step 4: Filter the results for 5 or more orders
Let’s formulate the command:
SELECT C.Country, COUNT(O.OrderId) as NumberOfOrders -- Step 1,3
FROM Customers C -- Step 1
INNER JOIN Orders O on C.CustomerID = O.CustomerID -- Step 1
GROUP BY C.Country -- Step 2
HAVING COUNT(O.OrderId) >= 5 -- Step 4
ORDER BY COUNT(O.OrderId) DESC
Here are the results:
Conclusion – SQL HAVING Clause
Thus, we have seen what the purpose of the HAVING clause is and how does it work. It is important to understand the basic working or else you may end up getting confused about why the HAVING clause is not producing the desired results. Keep playing with various tables and joins and combinations along with the HAVING clause.
This is a guide to the SQL HAVING Clause. Here we discuss the working of the HAVING clause in SQL and example with the following table of customers. You can also go through our other suggested articles –