SQL for Super Users Part 2: SELECT Statements

If you’re reading this series for the first time, you’ll want to start with our introductory post. It explains everything you need to know about this series.

In Part 1 of this series, we covered some basic terminology.

In today’s post, we move on to create a couple simple queries using SELECT statements.

But first, you need to know these four terms:

SELECT specifies the list of columns you want to include in your output.

FROM specifies the list of tables you need to access

WHERE limits the output based on the values you specify

ORDER determines how to sort the output.

A Simple Query

Let’s start with a very simple query:

SELECT * FROM dbo.Products

If we enter this query in SSRS, we get the following output:

This is the simplest possible query in SQL. But even so it contains several mistakes:

  1. We didn’t specify any columns for output

We asked for everything, and we should never do this in the real world. Database tables can and do change over time. When we use the wild card, the output will change when the table changes. And this can cause reports to blow up.

  1. We didn’t limit output with a WHERE clause

With no limiting WHERE clause, this query selects every record in the table. Which is fine when you only have 77 records (as in the Northwind Products table we’re using). But it’s not fine when you have several million records in your sales table.

  1. We didn’t sort the output with an ORDER clause

The output looks like it’s sorted on ProductID, so what’s the issue? Unless you state it, order is NOT Guaranteed. If you want a particular order, you must specify it.

So, let’s fix the mistakes. Let’s start by specifying four columns:

Now, let’s select records using a WHERE clause where CategoryID = 1.

In the bottom right hand corner, we can see that SQL Server only returned 12 rows. The first query returned 77 rows.

When exploring data for the first time, sometimes it’s hard to figure out how to limit output when you don’t know what the data contains. In that case, we have another option for limiting data: The TOP function.

Below, instead of using a WHERE clause, we specify the number of records we want to return with TOP.

Even with no WHERE clause, SQL server only returned five records.

For one final example of a simple select, let’s sort the records using ORDER by ProductName:

Now, ProductID 39 comes before ProductID 38 because “chartreuse” comes before “cote” in alphabetical order.

In our next post in this series, we’ll get acquainted with a single table join.

 

SQL for Super Users Part 1: Basic Terminology

As mentioned in my introduction to this series, we’ll be using this series to give super users an overview of some SQL basics. The objective isn’t to turn you into a SQL programmer, but to give you enough knowledge to fix queries and understand what your IT team is driving at.

Throughout this series, we’ll be using the Northwind database. For more info about the series and how to play along at home, do read the introductory post first.

Now, let’s start by covering some basic terminology.

Tables, Rows, Columns

As a super user, you probably already understand some of this basic terminology. But let’s do a quick review to make sure we’re all on the same page.

A table is a collection of data, where each column contains a specific item of data and each row contains a set of items related to one particular key.

For our purposes, a table is just like an Excel sheet.

As an example, let’s look at the first 10 rows of the Products table in Northwind:

We have 10 columns:

  1. ProductID
  2. ProductName
  3. SupplierID
  4. CategoryID
  5. QuantityPerUnit
  6. UnitPrice
  7. UnitsInStock
  8. UnitsOnOrder
  9. ReorderLevel, and
  10. Discontinued.

And we have 10 rows that correspond to ProductID, numbered one through 10.

Many people refer to columns as “fields” and rows as “records.” And that’s fine if you know what you mean. But keep in mind that DBA’s and SQL gurus may secretly make fun of you. (Michael J. Swart explains why.)

So, if you want to be cool, you should use the terms columns and rows. As I’m definitely not cool, I still use the terms fields and records sometimes.

Primary Key

In a relationally correct database, every table has a unique column (or set of columns). The values in that column (or set of columns) uniquely identify each record. The column (or columns) is the primary key.

Going back to our example, ProductID is the primary key. It only refers to one description. Always, always, always. It’s the unique value that identifies the record.

So, for example, ProductID 1 refers to only one description, in this case “Chai.”

(It’s possible to have other constraints on a table, but let’s put that aside for now. It’s rarely important when querying data (although it is important when inserting data).

SSMS does you a favor by displaying a nice yellow key to the left of the primary key.

In our example, it’s in the Products table: ProductID:

Let’s look at a few other examples to drive the point home. In the Customers table, it’s CustomerID:

And in the Orders table it’s OrderID:

A side note: In these examples, the primary key happens to occur at the beginning of the table. This ISN’T ALWAYS the case.

Another side note: All of these examples have a single column primary key. However, that ISN’T ALWAYS the case either.

For example, the Order Details table has two columns in the key—OrderID and ProductID. Which makes sense because you can order multiple products on a given order:

In most systems, including GP and NAV, the primary key of the Order Detail table is order and order line, because customers might want to order the same thing twice on a given order.

Foreign Keys

Primary keys are great. But they aren’t that helpful for reporting. They are helpful when used to bring tables together, a.k.a. “join.”

Let’s go back to the Orders table. You may have noticed (if you were paying attention) that in addition to the gold key, there were also two silver keys, one for CustomerID and one for EmployeeID:

The silver keys indicate that these columns are foreign keys, which means values in these columns MUST exist in another table. (Usually, they’re the primary key of that table.)

In this case, customer values in the Orders table must also exist in the Customers table.

Two Important Points About Foreign Keys

  1. When writing a report on orders, you connect (or join) to customers to get the customer name using these keys.
  2. In GP and NAV (as well as many other ERP databases), you won’t see those helpful little silver keys. But the concept still holds true. For an order, the customer must exist in the customer table.

(An explanation of why GP and NAV don’t give you these helpful little hints is beyond the scope of these posts.)

So now that we have our basic terms down, let’s move on to create some real code—in our next blog post.

 

SQL for Super Users: A 7-Part Series

I like my customers to be as independent as possible. I hate consultants who pretend everything they do is brain surgery and beyond anyone else’s capability.

So, when GPUG asked for a volunteer to give a webinar on SQL, I proposed one on SQL for super users.

It’s not that I expect my customers to become programmers. But I believe educated consumers are the best customers. And a little knowledge of SQL can help data jockeys fix simple queries or at least better understand what IT has done to their system.

The feedback from the seminar was positive, in spite of a few challenges. (If you’ve ever given a webinar before, you’ll now how odd it is to talk to a screen with little immediate response, instead of an in-person audience.)

The audience consisted of all Dynamics user groups (GP, NAV, AX and CRM), which added another level of complexity.

I chose to use the generic Northwind database with some additions for my examples (so everyone would be equally confused).

A Series of Posts for SQL Super Users

Since then, I’ve turned the content of the webinar into a series of blog posts. Keep in mind that while I’ve written these posts for SQL newbies, I’m assuming you have data experience. Maybe you’re an Excel pivot table jockey. Or you’ve used a different kind of reporting tool (such as Crystal, Access or SSRS).

Over the course of the series, I’ll cover the following topics:

  • Database terminology
  • How to write a basic select statement
  • How to add single and multiple columns with joins
  • How to use the left outer join
  • How to use aggregates, such as sum, min and max
  • The difference between “having” and “where” clauses
  • Common pitfalls of “date” vs. “date and time” columns.

By the time we’re done, you should have a firm grasp of the following terms:

  • Table
  • Column
  • Row
  • Primary key
  • Foreign key

And a solid understanding of the following clauses:

  • SELECT
  • FROM
  • JOIN
  • WHERE
  • GROUP BY
  • ORDER BY
  • HAVING

Technical Prerequisites

Of course, it’s one thing to read these posts and another to play along with your own database.

Here’s what you’ll need to run the examples I use:

  1. SQL Server. All my examples work with SQL Server. (At Red Three, our focus is Microsoft.) If can’t use your company’s servers, you can download the free version.
  2. SQL Server Management Studio. You can use many tools to build queries. But SSMS is the standard for SQL people in the Microsoft world. Before, you used to have to install the database to get the management studio, but now there’s a standalone install.
  3. The Northwind database. The AdventureWorks database is more complex than Northwind, and I wanted to keep things simple for these examples. You can download Northwind here.

A Few Additional Notes

For some of my examples, I had to create a few new tables. I’m in the process of creating some sample documents that will include these new tables. Let us know if you’d like a copy. We’ll notify you when they’re ready.

I’m also planning to create examples using the sample data in NAV and GP. If you’d like a copy, let us know. We’ll notify you when they’re ready.

All the source code for the examples is also available. Contact us if you’d like the files.

Here are links to the parts of the series published thus far:

SQL for Super Users Part 1: Basic Terminology

SQL for Super Users Part 2: SELECT Statements

SQL for Super Users Part 3: Adding Another Table Using JOINs

SQL for Super Users Part 4: The Multi-Column Join

SQL for Super Users Part 5: The Left Outer Join

SQL for Super Users Part 6: Aggregates and Having vs. Where

SQL for Super Users Part 7: Date vs. Date-Time

 

SQL Server on Linux—Why It Doesn’t Matter to My BI Clients

Rarely does a technology I work with it make it to the pages of the Wall Street Journal, but Microsoft’s recent announcement that SQL Server will be available for Linux is one of those times.

I’m sure I’ll get questions about this. So, I thought I’d prepare my answers here. (If you want analysis of this announcement from a technical guru, check out Brent Ozar’s post.)

SQL Server on Linux is Irrelevant to Some

For my mid-market BI clients, the announcement is irrelevant. Yes, Linux has tremendous market share in many areas of web development and start up software. But in mid-market ERP and BI, it’s almost non-existent.

So if a customer asks about running SQL Server on Linux, I’ll make the point that, for them and other mid-market companies, SQL Server on Linux isn’t a mainstream solution.

And I really like mainstream solutions, for two reasons:

1. Continuity

If a bus takes out my team and I, the customer can move on (preferably after a brief period of mourning).

2. Lower costs

Mainstream solutions are more cost effective, which I’ll explain.

System Costs vs. People Costs

In a mid-sized business, the cost of an operating system is a small percentage of the cost of the software. It can be as little as five percent.

And the software cost is usually a small percentage of the lifetime cost of the people who implement, extend and maintain that software. In a simple environment, the software cost may equal the people cost. In a more complex environment, the software cost may be only 10 percent of the people cost.

So, if you want to keep costs down, you need to focus on the people cost. And to do that, you need a mainstream solution. Because with a mainstream solution, the required knowledge and skill sets are more widely available.

In addition, when you come across an issue (and all software has issues), you might even be able to find what you need through an Internet search. (This is absolutely NOT to say that the Internet is a substitute for training and experience.)

But if you’re on a platform used by a small number of similarly sized companies, getting help isn’t going to be easy. And the couple of thousand dollars you saved on operating licenses will be quickly burnt in labor and system downtime.

The same argument applies to SQL Server for Linux. If it takes off, the first people who know anything about it will be able to charge a premium—if you can find them.

And good luck finding any hints or tricks on the Internet.

Again, I want to repeat that this is true in my mid-market ERP/BI universe. If you’re developing the next great web app, the economics can be totally different. Which is why Linux is a dominant force in certain areas of the market.

We’re surrounded by vast amounts of different technologies. The key to success is learning what’s not relevant.

 

Understanding the Left Outer Join

In May 2016, we expanded on the topic of joins (and left outer joins) in a series of posts on SQL for super users.

While we’ve left this original post in place (below), we also recommend that you check out the series for a more detailed exploration of the topic. Here are the links:

***

You’ve taken on the challenge of writing reports—whether in Crystal Reports, Access or any similar tool. Once you’ve managed to get a basic report out, one of the first “intermediate” tasks you’ll need to understand is the “Left Outer Join.”

As every report writer understands, any given report requires multiple tables. For example when you create a list of vendor invoices from the invoice table, you’ll want to get the vendor name from the vendor table. That’s a simple example—if your system is working, for every invoice you have you expect to have a vendor record located somewhere.

But what happens if you’re not certain whether a certain piece of data exists? You use a Left Outer Join. Here’s an example (based on a recent project for a customer which will provided the basis for a future post on null values in Crystal Reports):

Imagine you’re a toy distributor. You stock many popular games: Sorry!, Othello, Connect 4 and Monopoly. So you have a basic table with all your items and the current Stock on Hand (SOH). This “Item Master” data might look something like this:

Simple enough. But what if your buyer, in looking through this quick inventory list, wants to know the next date that you can expect more inventory in? When will you be able to start selling Othello again?

In a standard ERP system, that information is most likely stored in a purchase order line file, which may look something like this:

If you created a report combining the SOH from the “item master” and the expected date from the line file, this would be your result:

What happened to Sorry!? Sure, there’s no PO open for Sorry!, but you still wanted to see the SOH.

In every basic reporting system, the default join is called an “Inner Join”, meaning that you need to have data in every table you use. But that doesn’t always happen – sometimes there is no data in a given field.

If you switch the option to a “Left Outer Join” that will tell the system that you want EVERY record from your item master with whatever records may or may not exist in your purchase order line file. And when you switch the option to Left Outer Join you get the following output:

There are different ways to specify the join in the various systems. But if you are always going to have data in one table, and you may or may not have data in another table, you need to do a Left Outer Join.

 

Get tips and insights delivered to your inbox

Start a conversation with us

Call: 917-848-7284
Email: inquiries@redthree.com

Request a Consult