Click here to monitor SSC
  • Av rating:
  • Total votes: 418
  • Total comments: 5
Robert Sheldon

Subqueries in SQL Server

26 May 2011

Subqueries and derived tables can add great versatility to SQL statements, cut down complexity, but can occasionally be a curse when their effect on performance is poorly understood. Surely everyone understands the various types of subqueries and how they are used? If you felt a twinge of doubt, here is Rob Sheldon's easy guide to the subject.

Few elements within a Transact-SQL statement are as versatile as the subquery. A subquery—also referred to as an inner query or inner select—is a SELECT statement embedded within a data manipulation language (DML) statement or nested within another subquery. You can use subqueries in SELECT, INSERT, UPDATE, and DELETE statements wherever expressions are allowed. For instance, you can use a subquery as one of the column expressions in a SELECT list or as a table expression in the FROM clause.

A DML statement that includes a subquery is referred to as the outer query. The following guidelines provide details about how to implement subqueries in your outer queries or in other subqueries:

  • You must enclose a subquery in parenthesis.
  • A subquery must include a SELECT clause and a FROM clause.
  • A subquery can include optional WHERE, GROUP BY, and HAVING clauses.
  • A subquery cannot include COMPUTE or FOR BROWSE clauses.
  • You can include an ORDER BY clause only when a TOP clause is included.
  • You can nest subqueries up to 32 levels.

There are several ways you can categorize subqueries—by the number of results they returns, whether they’re correlated (linked to the outer query), or where they’re used within a DML statement. For the purposes of this article, I take the last approach and explain how subqueries can be implemented in the SELECT, FROM, and WHERE clauses of a SELECT statement. Although you can implement subqueries in other clauses and other statement types, the examples I provide should demonstrate the essential principles of how subqueries can be used in any circumstances. (The examples all return data from the AdventureWorks2008 database on a local instance of SQL Server 2008.)

NOTE: Microsoft documentation states that subqueries perform about the same as statements that are semantically equivalent, such as subqueries and joins. However, if existence must be checked (as will be described later in the article), a join often performs better if the subquery must be processed for each row returned by the outer query.

Adding Subqueries to the SELECT Clause

You can add a subquery to a SELECT clause as a column expression in the SELECT list. The subquery must return a scalar (single) value for each row returned by the outer query. For example, in the following SELECT statement, I use a subquery to define the TotalQuantity column:

SELECT

  SalesOrderNumber,

  SubTotal,

  OrderDate,

  (

    SELECT SUM(OrderQty)

    FROM Sales.SalesOrderDetail

    WHERE SalesOrderID = 43659

  ) AS TotalQuantity

FROM

  Sales.SalesOrderHeader

WHERE

  SalesOrderID = 43659;

Notice I’ve inserted the subquery as the fourth column expression in the SELECT list and named the column TotalQuantity. The subquery itself is enclosed in parentheses and made up of a single SELECT statement. The statement retrieves the total number of items sold for sales order 43659. Because there are multiple line items in this order, I used the SUM aggregate function to add the numbers together and return a single value. The following table shows the result set returned by the outer SELECT statement.

SalesOrderNumber

SubTotal

OrderDate

TotalQuantity

SO43659

24643.9362

2001-07-01 00:00:00.000

26

As the results show, the outer SELECT statement returns a single row from the SalesOrderHeader table for order 43659, and the TotalQuantity column itself returns a value of 26. If you were to run the subquery’s SELECT statement on its own (without running the outer query), you would also receive a value of 26. However, by running the SELECT statement as a subquery within the outer SELECT statement, the total number of items sold is now provided as part of the order information.

You can use a subquery anywhere in a SQL Statement where an expression is allowed. For the next example we’ll use it as part of a CASE statement. In the following example, I use a CASE expression and subquery to check whether line item sales totals in the SalesOrderDetail table equals the sales subtotal listed in the SalesOrderHeader table:

SELECT

  SalesOrderNumber,

  SubTotal,

  OrderDate,

  CASE WHEN

    (

      SELECT SUM(LineTotal)

      FROM Sales.SalesOrderDetail

      WHERE SalesOrderID = 43659

    ) =  SubTotal THEN 'balanced'

    ELSE 'not balanced'

  END AS LineTotals

FROM

  Sales.SalesOrderHeader

WHERE

  SalesOrderID = 43659;

I’ve included the CASE expression as part of the fourth column expression. The CASE expression uses the subquery to total the line item sales in the SalesOrderDetail table for order 43659. Notice that, as in the preceding example, the subquery is enclosed in parentheses and uses the SUM aggregate function to return a single value. I then use an equal (=) operator to compare the subquery’s result to the SubTotal column in the SalesOrderHeader table. If the amounts are equal, the CASE expression returns a value of balanced. It the values are not equal, CASE returns not balanced. The following table shows the results returned by the outer SELECT statement.

SalesOrderNumber

SubTotal

OrderDate

LineTotals

SO43659

24643.9362

2001-07-01 00:00:00.000

not balanced

As you can see, the line item sales total in the SalesOrderDetail table does not match the subtotal in the SalesOrderHeader table, at least not for sale 43659. However, suppose you want to verify all the sales listed in the two tables to see whether the totals balance. To do so, you must modify both the subquery and the outer query in order to create the condition necessary to support a correlated subquery. A correlated subquery, also known as a repeating subquery, is one that depends on the outer query for specific values. This is particularly important if your outer query returns multiple rows.

The best way to understand how correlated subqueries work is to look at an example. In the following SELECT statement, I include a CASE expression as one of the column expressions, as you saw in the preceding example:

SELECT

  SalesOrderNumber,

  SubTotal,

  OrderDate,

  CASE WHEN

    (

      SELECT SUM(LineTotal)

      FROM Sales.SalesOrderDetail d

      WHERE d.SalesOrderID = h.SalesOrderID

    ) =  h.SubTotal THEN 'balanced'

    ELSE 'not balanced'

  END AS LineTotals

FROM

  Sales.SalesOrderHeader h;

As before, the CASE expression includes a subquery that returns the total amount for line item sales. However, notice that the subquery’s WHERE clause is different from the previous example. Instead of specifying an order ID, the WHERE clause references the SalesOrderID column from the outer query. I do this by using table aliases to distinguish the two columns—h for SalesOrderHeader and d for SalesOrderDetail—and then specifying that the column values must be equal for the WHERE condition to evaluate to true. That means that, for each row in the SalesOrderHeader table returned by the outer query, the SalesOrderID value associated with that row is plugged into the subquery and compared with the SalesOrderID value of the SalesOrderDetail table. As a result, the subquery is executed for each row returned by the outer query.

The value returned by the subquery is then compared to the SubTotal column of the SalesOrderHeader table and a value for the LineTotals column is provided, a process repeated for each row. The following table provides a sample of the data returned by the outer query.

SalesOrderNumber

SubTotal

OrderDate

LineTotals

SO61168

1170.48

2003-12-31 00:00:00.000

balanced

SO61169

619.46

2003-12-31 00:00:00.000

balanced

SO61170

607.96

2003-12-31 00:00:00.000

balanced

SO61171

553.97

2003-12-31 00:00:00.000

balanced

SO61172

2398.05

2003-12-31 00:00:00.000

balanced

SO61173

34851.8445

2004-01-01 00:00:00.000

not balanced

SO61174

8261.4247

2004-01-01 00:00:00.000

not balanced

SO61175

30966.9005

2004-01-01 00:00:00.000

not balanced

SO61176

1570.725

2004-01-01 00:00:00.000

not balanced

SO61177

25599.8392

2004-01-01 00:00:00.000

not balanced

SO61178

3227.0112

2004-01-01 00:00:00.000

not balanced

SO61179

47199.0054

2004-01-01 00:00:00.000

not balanced

SO61180

4208.8078

2004-01-01 00:00:00.000

not balanced

SO61181

36564.9023

2004-01-01 00:00:00.000

not balanced

SO61182

63162.5722

2004-01-01 00:00:00.000

not balanced

SO61183

35.0935

2004-01-01 00:00:00.000

not balanced

SO61184

113451.8266

2004-01-01 00:00:00.000

not balanced

SO61185

554.0328

2004-01-01 00:00:00.000

not balanced

SO61186

39441.4489

2004-01-01 00:00:00.000

not balanced

SO61187

65.988

2004-01-01 00:00:00.000

balanced

SO61188

58992.9256

2004-01-01 00:00:00.000

not balanced

As you can see, some of the totals balance out, and others do not. Again, the important thing to keep in mind with correlated subqueries is that the subquery is executed for each row returned by the outer query. The correlated subquery then uses a value supplied by the outer query to return its results. For more details about correlated subqueries, see the topic “Correlated Subqueries” in SQL Server Books Online.

Adding Subqueries to the FROM Clause

The subquery examples in the previous section each return a single value, which they must do in order to be used in the SELECT clause. However, not all subquery results are limited in this way. A subquery can also be used in the FROM clause to return multiple rows and columns. The results returned by such a subquery are referred to as a derived table. A derived table is useful when you want to work with a subset of data from one or more tables without needing to create a view or temporary table. For instance, in the following example, I create a subquery that retrieves product subcategory information from the ProductSubcategory table, but only for those products that include the word “bike” in their name:

SELECT

  p.ProductID,

  p.Name AS ProductName,

  p.ProductSubcategoryID AS SubcategoryID,

  ps.Name AS SubcategoryName

FROM

  Production.Product p INNER JOIN

  (

    SELECT ProductSubcategoryID, Name

    FROM Production.ProductSubcategory

    WHERE Name LIKE '%bikes%'

  ) AS ps

  ON p.ProductSubcategoryID = ps.ProductSubcategoryID;

The first thing to notice is that the subquery returns a derived table that includes two columns and multiple rows. Because the subquery returns a table, I can join that table, which I’ve named ps, to the results from the Product table (p). As the join demonstrates, you treat a subquery used in the FROM clause just as you would treat any table. I could have just as easily created a view or temporary table—or even added a regular table to the database—that accesses the same data as that available through the subquery.

I defined the join based on the subcategory ID in the derived table and Product table. I was then able to include columns from both these tables in the SELECT list, as I would any type of join. The following table shows a subset of the results returned by the outer query.

ProductID

PeoductName

SubcategoryID

SubcategoryName

786

Mountain-300 Black, 40

1

Mountain Bikes

787

Mountain-300 Black, 44

1

Mountain Bikes

788

Mountain-300 Black, 48

1

Mountain Bikes

789

Road-250 Red, 44

2

Road Bikes

790

Road-250 Red, 48

2

Road Bikes

791

Road-250 Red, 52

2

Road Bikes

792

Road-250 Red, 58

2

Road Bikes

793

Road-250 Black, 44

2

Road Bikes

794

Road-250 Black, 48

2

Road Bikes

795

Road-250 Black, 52

2

Road Bikes

796

Road-250 Black, 58

2

Road Bikes

797

Road-550-W Yellow, 38

2

Road Bikes

798

Road-550-W Yellow, 40

2

Road Bikes

799

Road-550-W Yellow, 42

2

Road Bikes

800

Road-550-W Yellow, 44

2

Road Bikes

801

Road-550-W Yellow, 48

2

Road Bikes

953

Touring-2000 Blue, 60

3

Touring Bikes

954

Touring-1000 Yellow, 46

3

Touring Bikes

955

Touring-1000 Yellow, 50

3

Touring Bikes

As you can see, the results include the subcategory names, which are taken from the derived table returned by the subquery. Because I was able to join the Product table to the derived table, I was able to match the subcategory names to the product names in the outer query’s result set.

Adding Subqueries to the WHERE Clause

Another common way of implementing subqueries in a DML statement is to use them to help define conditions in the WHERE clause. For instance, you can use comparison operators to compare a column’s value to a value returned by the subquery. In the following example, I use the equal (=) operator to compare the BusinessEntityID value in the Person table to the value returned by a subquery:

SELECT

  BusinessEntityID,

  FirstName,

  LastName

FROM

  Person.Person

WHERE

  BusinessEntityID =

  (

    SELECT BusinessEntityID

    FROM HumanResources.Employee

    WHERE NationalIDNumber = '895209680'

  );

The subquery retrieves the BusinessEntityID value from the Employee table for the employee whose national ID is 895209680. The BusinessEntityID value from the subquery is then compared to the BusinessEntityID value in the Person table. If the two values are equal, the row is returned, as shown in the following results.

SELECT

  p.BusinessEntityID,

  p.FirstName,

  p.LastName,

  s.SalesQuota

FROM

  Person.Person p INNER JOIN

  Sales.SalesPerson s

  ON p.BusinessEntityID = s.BusinessEntityID

WHERE

  s.SalesQuota IS NOT NULL AND

  s.SalesQuota >

  (

    SELECT AVG(SalesQuota)

    FROM Sales.SalesPerson

  );

In the subquery, I use the AVG aggregate function to find the average sales quota figure. This way, the subquery returns only one value. I can then compare that value to the SalesQuota column. If the SalesQuota figure is greater than the average, the WHERE expression evaluates to true, and the row is returned by the outer query. Otherwise, the expression evaluates to false and the row is not returned. As the following table shows, only three rows have a SalesQuota value greater than the average.

BusinessEntityID

FirstName

LastName

SalesQuota

275

Michael

Blythe

300000.00

279

Tsvi

Reiter

300000.00

284

Tete

Mensa-Annan

300000.00

At times, you might want to compare your column to a list of values, rather than a single value, in which case you can use one of the following keywords to modify the comparison modifier:

  • ALL: The column value is compared to all values returned by the subquery.
  • ANY: The column value is compared to the one most applicable distinct value.
  • SOME: The ISO equivalent to ANY.

The best way to understand how these modifiers work is to see them in action. In the following example, I use the ANY modifier along with the greater than (>) operator to compare the SalesQuota column to the list of SalesQuota values returned by the subquery:

SELECT

  p.BusinessEntityID,

  p.FirstName,

  p.LastName,

  s.SalesQuota

FROM

  Person.Person p INNER JOIN

  Sales.SalesPerson s

  ON p.BusinessEntityID = s.BusinessEntityID

WHERE

  s.SalesQuota IS NOT NULL AND

  s.SalesQuota > ANY

  (

    SELECT SalesQuota

    FROM Sales.SalesPerson

  );

In this case, the subquery returns a list of values, rather than one value. I can return a list because I’m using the ANY modifier. As a result, the SalesQuota value for each row returned must be greater than any of the values returned by the subquery. In other words, as long as the SalesQuota value exceeds any one value returned by the subquery, that row is returned. As the following results indicate, only three rows in the SalesPerson table have SalesQuota values that exceed at least one of the values returned by the subquery.

BusinessEntityID

FirstName

LastName

SalesQuota

275

Michael

Blythe

300000.00

279

Tsvi

Reiter

300000.00

284

Tete

Mensa-Annan

300000.00

The next example is identical to the preceding one, except that I use the ALL modifier to qualify the comparison operator:

SELECT

  p.BusinessEntityID,

  p.FirstName,

  p.LastName,

  s.SalesQuota

FROM

  Person.Person p INNER JOIN

  Sales.SalesPerson s

  ON p.BusinessEntityID = s.BusinessEntityID

WHERE

  s.SalesQuota IS NOT NULL AND

  s.SalesQuota > ALL

  (

    SELECT SalesQuota

    FROM Sales.SalesPerson

  );

Because I’ve used the ALL modifier, each row returned must have a SalesQuota value that exceeds all the values returned by the subquery. In other words, the SalesQuota value must exceed the highest value returned by the subquery. As it turns out, no row has a SalesQuota value that exceeds all the values returned by the subquery, so the statement now returns no rows.

Another operator that lets you work with a subquery that returns a list is the IN operator. The column value is compared to the list, and the WHERE expression evaluates to true if any of the subquery values matches the column value. For example, the following SELECT statement includes a subquery that returns a list of IDs for sales representatives:

SELECT

  BusinessEntityID,

  FirstName,

  LastName

FROM

  Person.Person

WHERE

  BusinessEntityID IN

  (

    SELECT BusinessEntityID

    FROM HumanResources.Employee

    WHERE JobTitle = 'Sales Representative'

  );

The BusinessEntityID value from the outer query is compared to the list of ID values returned by the subquery. If the BusinessEntityID value matches one of the values in the subquery list, the row is included in the outer query’s results, as shown in the following results:

BusinessEntityID

FirstName

LastName

275

Michael

Blythe

276

Linda

Mitchell

277

Jillian

Carson

278

Garrett

Vargas

279

Tsvi

Reiter

280

Pamela

Ansman-Wolfe

281

Shu

Ito

282

José

Saraiva

283

David

Campbell

284

Tete

Mensa-Annan

286

Lynn

Tsoflias

288

Rachel

Valdez

289

Jae

Pak

290

Ranjit

Varkey Chudukatil

If you want to return only those rows whose BusinessEntityID value does not match any values in the list returned by the subquery, you can instead use the NOT IN operator, as in the following example:

SELECT

  BusinessEntityID,

  FirstName,

  LastName

FROM

  Person.Person

WHERE

  BusinessEntityID NOT IN

  (

    SELECT BusinessEntityID

    FROM HumanResources.Employee

    WHERE JobTitle = 'Sales Representative'

  );

This statement is exactly the same as the preceding example except for the use of the NOT IN operator, but the results are quite different. Rather than returning 14 rows, one for each sales representative, the statement now returns nearly 20,000 rows, one for each person who is not a sales representative.

One other method you can use when including a subquery in your WHERE clause is to check for existence. In this case, you use the EXIST keyword to verify whether the subquery returns a row that matches your search criteria. The subquery doesn’t produce any data but instead returns a value of true or false, depending on whether the row exists. For example, in the following SELECT statement, I use a correlated subquery to check the name of each product’s subcategory to determine whether that name is Mountain Bikes:

SELECT ProductID, Name AS ProductName FROM Production.Product p WHERE EXISTS ( SELECT * FROM Production.ProductSubcategory s WHERE p.ProductSubcategoryID = s.ProductSubcategoryID AND s.Name = 'Mountain Bikes' );

For each row returned by the outer query, the existence of a row returned by the correlated subquery is checked. If a row is returned by the subquery, the existence test evaluates to true, and the outer query’s row is included in the result set. The following table shows a partial list of the results returned by the outer query, after checking for existence.

ProductID

ProductName

771

Mountain-100 Silver, 38

772

Mountain-100 Silver, 42

773

Mountain-100 Silver, 44

774

Mountain-100 Silver, 48

775

Mountain-100 Black, 38

776

Mountain-100 Black, 42

777

Mountain-100 Black, 44

778

Mountain-100 Black, 48

779

Mountain-200 Silver, 38

780

Mountain-200 Silver, 42

781

Mountain-200 Silver, 46

782

Mountain-200 Black, 38

783

Mountain-200 Black, 42

784

Mountain-200 Black, 46

785

Mountain-300 Black, 38

786

Mountain-300 Black, 40

For each row included in the results, the existence test evaluated to true. In other words, the returned rows are part of the Mountain Bikes subcategory.

You can also return results for rows whose existence test returns false by using the NOT EXIST operator, as shown in the following example:

SELECT

  ProductID,

  Name AS ProductName

FROM

  Production.Product p

WHERE NOT EXISTS

  (

    SELECT *

    FROM Production.ProductSubcategory s

    WHERE p.ProductSubcategoryID = s.ProductSubcategoryID

      AND s.Name = 'Mountain Bikes'

  );

Now the statement returns only those rows that are not part of the Mountain Bikes subcategory. Any row whose existence test returns a true is not included in the results.

Conclusion

As the examples in the article demonstrate, subqueries are a flexible and versatile tool. You can define them wherever an expression is allowed in a SELECT, INSERT, UPDATE, or DELETE statement. You can then use the data returned by the subquery in your outer query to make those statements more powerful and ultimately more useful to your various applications. For more information about subqueries, see the topic “Subquery Fundamentals” as well as other topics that address how to use subqueries in SQL Server Books Online.

The Select clause, showing how scalar subqueries can be used within them (A full-size PDF version is available from the speech-bubble at the top of the article

Robert Sheldon

Author profile:

After being dropped 35 feet from a helicopter and spending the next year recovering, Robert Sheldon left the Colorado Rockies and emergency rescue work to pursue safer and less painful interests—thus his entry into the world of technology. He is now a technical consultant and the author of numerous books, articles, and training material related to Microsoft Windows, various relational database management systems, and business intelligence design and implementation. He has also written news stories, feature articles, restaurant reviews, legal summaries, and the novel 'Dancing the River Lightly'. You can find more information at http://www.rhsheldon.com.

Search for other articles by Robert Sheldon

Rate this article:   Avg rating: from a total of 418 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: performance
Posted by: Peter Lake (not signed in)
Posted on: Tuesday, May 31, 2011 at 4:02 AM
Message: After the brief message that the effect on performance needs to be understood there was no information on performance, which made the article not too useful.

Subject: performance
Posted by: Anonymous (not signed in)
Posted on: Tuesday, May 31, 2011 at 6:31 AM
Message: although performance part was not covered, from my experience it is good some where and bad some where...Use it with a caution but when it comes to cross database queries, it gives a good performence with derived table or in above subquery in from clause (specially if the data is too big).

Subject: Re: Performance
Posted by: Andrew Clarke (view profile)
Posted on: Tuesday, May 31, 2011 at 7:08 AM
Message: The performance issues will need to be a separate article, but Robert explains about the correlated subquery 'As a result, the subquery is executed for each row returned by the outer query.' The use of a derived table can speed up a query if it avoids a second pass through a large table. The use of a subquery in a WHERE clause can be faster than the equivalent join. To explain the detail and qualifications of all this in an introductory article about subqueries would be reckless, but it might make an interesting separate article. Apoologies if the 'blurb' misled readers into expecting an article on the performance of subqueries!

Subject: Looking for article on performance imacts of subqueries
Posted by: Anonymous (not signed in)
Posted on: Tuesday, May 31, 2011 at 11:36 AM
Message: Yeah, as noted by someone else, I was really reading this in the hope of learning more about the impact of subqueries on the performance. I was looking for guidelines on when and where to use subqueries. I already knew most of what was in this article.

Subject: FROM in a subquery
Posted by: OLAP Man (view profile)
Posted on: Tuesday, October 02, 2012 at 3:35 PM
Message: I was floored upon recently discovering this, but in fact (in TSQL at least), "FROM" is *not* required in a subquery. You can refer to columns from the outer query without mentioning a FROM.

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

SQL Server XML Questions You Were Too Shy To Ask
 Sometimes, XML seems a bewildering convention that offers solutions to problems that the average... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.