Introduction to SQL HAVING Clause
‘Having’ clause in SQL is used for aggregation operations along with ‘Where’, ‘group by’ & ‘order by’ condition statements. It is applied on a table/ database where there is a need for filtering aggregate results and allows ‘group by’ and ‘order by’ conditions. When the ‘having’ clause is used in a query, it is expected that the resulting output data set can have more than one record from the table/ database.
Why HAVING and not WHERE?
We see that HAVING and WHERE clauses perform a very similar task to filter out the results. Then what was the need for the HAVING clause? Why couldn’t the WHERE clause be used with aggregate functions?
To answer this, we would need to understand how the SQL engine treats the two clauses. The FROM clause in every SQL command tells the engine from where to read the rows. The data is stored on the disk and is fetched into the memory for processing. As the rows are read one by one from the disk to the memory, they are checked for the WHERE clause. The rows that fail the WHERE clause aren’t loaded into the memory. Thus, the WHERE clause is evaluated for each row as the SQL engine processes them.
On the contrary, the HAVING clause comes into the picture only after the rows have been loaded into the memory. Once loaded into the memory, the aggregate functions perform their task on the rows HAVING the desired condition.
Now, if we were to put a WHERE clause with the aggregate function such as avg(), this would confuse the SQL engine on whether to include the row for calculating the average or not. Essentially, we would be commanding the engine to not read the row since it did not pass the avg() criteria in the WHERE clause. But hey, to determine whether it passed or failed the avg() calculation criteria, the row needs to be read into the memory. A state of deadlock.
WHERE <condition> -- optional
GROUP BY <column(s)> -- groups the rows to apply aggregate function
HAVING <condition> -- aggregate function in the condition
ORDER BY <column(s)>; -- define the sorting order, optional
Note – GROUP BY clause is required with the HAVING clause. This is because the Having clause needs a group of data to apply an aggregate function and filter out the results.
How HAVING clause works?
Let us understand the working of the HAVING clause in SQL.
The GROUP BY clause always accompanies the HAVING clause. The GROUP BY clause groups together the data that match a certain criterion. It has three phases – split, apply, and combine. The split-phase divides the rows into groups. The apply phase applies some aggregate functions on the groups of data. The combined phase produces a single result by combining the groups with the aggregate function result.
Now that the groups are formed, the HAVING clause comes into the picture. The HAVING clause then filters out the groups which do not satisfy the given condition.
SELECT Col_A, avg(Col_B) as Col_B
GROUP BY Col_A
Thus, in the example above, we see that the table is first to split into three groups based on the column Col_A. The aggregate function to calculate the average of Col_B values is then applied to the groups. This results in a single row for each group. The rows are then combined and filtered based on the condition in the HAVING clause.
Now let us look at a real-world example. Consider we have the following table of customers and the orders that they have placed with us.
|2||Denny Cockett||México D.F.||Mexico|
|3||Eleanor Calnan||México D.F.||Mexico|
|12||Nora Reyna||Buenos Aires||Argentina|
|13||Ursula Laforest||México D.F.||Mexico|
|15||Portia Yee||São Paulo||Brazil|
|21||Pura Ray||São Paulo||Brazil|
Now, we want to know customers of which countries have placed a combined total of 5 or more orders with us. It could be a single customer placing more than 5 orders or 5 customers placing 1 order each.
To accomplish this, we would need to
Step 1: Join the two tables
Step 2: Group the customers based on their countries
Step 3: Count the number of orders for each group
Step 4: Filter the results for 5 or more orders
Let’s formulate the command:
SELECT C.Country, COUNT(O.OrderId) as NumberOfOrders -- Step 1,3
FROM Customers C -- Step 1
INNER JOIN Orders O on C.CustomerID = O.CustomerID -- Step 1
GROUP BY C.Country -- Step 2
HAVING COUNT(O.OrderId) >= 5 -- Step 4
ORDER BY COUNT(O.OrderId) DESC
Here are the results:
Conclusion – SQL HAVING Clause
Thus, we have seen what the purpose of the HAVING clause is and how does it work. It is important to understand the basic working, or else you may end up getting confused about why the HAVING clause is not producing the desired results. Keep playing with various tables and joins and combinations along with the HAVING clause.
This is a guide to the SQL HAVING Clause. Here we discuss working the HAVING clause in SQL and the example with the following table of customers. You can also go through our other suggested articles –