New to this series? Start with our intro post.
So far in this series, every query we’ve built has returned details, i.e. one row of output for every row selected in the database.
But sometimes, we want totals only, not details. Or maybe we want the biggest (maximum) or smallest (minimum) value of a column from the rows selected. These are called aggregates.
The Ins and Outs of Aggregates
In this post, we’ll focus on the “SUM” aggregate, which is just like SUM in Excel. It delivers the total amount of a column over a number of rows.
In the following example, we’ll develop a query to get the total dollar value of orders summarized by product for a given period in time.
We start with a basic order detail query:
As you can see, SQL Server returns 57 rows.
Before we proceed, let me make a few points about this query:
- Math on columns
In the above example, I have a calculated amount:
od.UnitPrice * od.Quantity = ExtendedAmt
The “*” performs multiplication, just like in Excel. SQL Server supports many math functions as long as columns are numeric.
- Alias for calculated amount
After I calculated the amount by multiplying Unit Price * Quantity, I assigned an alias to the column. I called it ExtendedAmt.
When creating a calculated column, you should always give it an alias. This makes it easy to reference in reporting and analysis tools.
- Brackets around Order Details
SQL Server, unlike most other databases, allows developers to put spaces and other special characters in table and column names. This is a bad idea. But as reporting people, we can’t do much about it. We just have to make sure to enclose any names within brackets.
- Date range
You’ll see in the output that OrderDate is a date/time (giving hours, minutes and seconds) while I’ve used only dates in my example. This can cause problems, but I’ll deal with that in my final output.
Now, back to our example.
We wanted sales by product. So, let’s trim the query. We need to add a GROUP BY clause and the aggregation SUM:
The GROUP BY clause states how we want to summarize the rows—in this case by ProductId.
Note that we still include an ORDER BY clause. Just because we group something does not mean the output will display in the same order.
This query returned 35 rows.
You might have noticed that we didn’t provide an alias for SUM(od.Quantity) so the TotalQty column has no name. We fix that here:
So we have a basic total. But it seems we’ve made things too simple—we don’t have the product description any longer.
Let’s add it back in:
Oops. That didn’t work. Let’s look at the error message:
Column ‘dbo.Products.ProductName’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
That means just what it says. Once you use GROUP BY, every column in the select list must:
- It must be included in the GROUP BY clause, or
- It must be enclosed with an aggregate, such as SUM, MAX or MIN.
Let’s try it with both methods.
First, let’s add ProductName to the GROUP BY function:
That works just fine.
Second, let’s surround “prod.ProductName” with an aggregate.
(In this case I use MAX. I could just as easily used MIN. For a given product, every ProductName is the same.)
That covers the basics of aggregates. But before we finish up, there’s one more clause I want to cover. It’s called HAVING.
HAVING vs. WHERE
HAVING is similar to WHERE in that it filters rows. But it’s different in that it filters rows based upon the output of all the other statements.
Let’s look at an example. Say we only want to see products that have had more than $1000 in sales for the month. We could use HAVING for this:
The query only returned 9 rows.
But again, the filter was based on the Total Amount for the product, and it works on the aggregate.
Let’s try the same example using WHERE.
If I put the comparison in the WHERE clause, it just doesn’t work:
Let’s try it again but make another mistake—we remove the aggregate. The SQL works but we don’t get what we expect:
We only get seven rows back.
Why? Because SQL is only finding rows where the Extended Amount is greater than $1000 for the individual order line, not the total amount.
Therefore, we need to use HAVING, as in the first example.