Querying data from a single table doesn’t help much in the real world. We need to be able to query more than one. And to do that we have to use joins.
Joins are a big topic. Even without going into too much detail, I’m going to need three posts to give an overview.
Let’s start with a simple example. We want to add category name to the previous query we built.
Here’s the query against the Products table as it currently exists:
And here’s the Categories table:
If we look at the columns in SSMS, we see that CategoryID is the primary key:
And we can see that CategoryID is a foreign key:
As mentioned in Part 1 of this series, in most ERP systems, the foreign key is implicit. You can’t always rely on the handy visual cue of these little silver keys.
Let’s put the two tables together. (I’m going to intentionally make some mistakes because I still make these mistakes. And I’m sure you’ll make these mistakes too when you get started.)
First, we add the second table to the FROM clause and the CategoryName column to the SELECT statement:
As you can see, SQL Server returns an error message, saying that CategoryID is ambiguous.
This means that the identical column name exists in both tables. Consequently, SQL Server doesn’t know which one we’re referring to.
We run the query again, specifying which table we want:
We put the table name before the ambiguous column name, and the error message goes away. Even better, we now have CategoryName in our output.
But wait, we now have 96 rows in our output! Taking a closer look, we can see that ProductID 1 shows up eight times—one for every category in the table. We’ll address that in a minute.
But first, I’m not crazy about how the query looks. Repeating the full table name requires a lot of typing or cutting/pasting. And also, some columns have table names while others don’t. As we add other tables, we’ll probably have to go back and fix other columns to solve other ambiguities.
To solve this problem, we create a few aliases:
After each table we added a little abbreviation—“prod” for “dbo.Products” and “cat” for “dbo.Categories.” These are called aliases.
After we created the aliases, we then prefixed each of the column names, wherever they appeared in the query, with the appropriate alias.
This is an excellent practice for building queries. While many things change, one rule holds true: People ask for more data on every report you build. And therefore, your queries become more complex.
Creating comprehensible and consistent aliases makes life easy for you and anyone else who looks at your code. (And don’t get lazy and try to use aliases “a,” “b,” and “c.” Those won’t work for long.)
Now let’s get back to the issue of our query returning 96 rows. Why is that happening?
It’s happening because when we added the extra table, we didn’t tell SQL Server how to match records in our tables. This is what JOIN is for.
To fix this, we simply add a JOIN clause and tell SQL Server which columns should match:
Thanks to the added JOIN, we’re now back to 12 rows.
We continue the topic of joins in part 4 of this series, where we’ll cover the multi-column join.