John Magnabosco

SQL Server Development and Data Security

Comparing WHERE and FROM Filtering

Published Monday, January 04, 2010 5:47 AM

I am kicking off the new year with a quick jaunt back in time through Mr. Peabody's WABAC Machine. The blog entry below was published on April 30, 2008, during my pre-Simple-Talk blogging days. The topic below is something that I have found to be a valuable consideration in optimizing queries. Enjoy!

As we work with other Professionals in our industry, we quickly learn that there are many ways to accomplish a given task. When presented with multiple options in a situation, there are times where one option stands out as the optimal choice based upon our general knowledge of the database engine.

There are also times where the optimal choice may not be so obvious and will vary depending on many considerations such as database architecture, level of use, indexing, hardware configuration or general best practices. Any given approach may perform superbly with one database and drag on another.

A sample case might be that we are given two versions of a seemingly simple query that filters the data. One option is the common use of a WHERE clause. The other is utilizing the filters within the JOIN clause. The examples below illustrate the syntax differences between these options.

WHERE Filter Option

SELECT
    [TABLE1].[FIELD1]
FROM
    [TABLE1]
    INNER JOIN [TABLE2]
        ON [TABLE1].[FIELD1] = [TABLE2].[FIELD1]
    INNER JOIN [TABLE3]
    ON [TABLE2].[FIELD2] = [TABLE3].[FIELD2]
WHERE
    [TABLE1].[FIELD1] = 700
    AND [TABLE2].[FIELD2] > 1000
    AND [TABLE3].[FIELD2] > 1000

JOIN Filter Option

SELECT
    [TABLE1].[FIELD1]
FROM
    [TABLE1]
    INNER JOIN [TABLE2]
        ON [TABLE1].[FIELD1] = [TABLE2].[FIELD1]
        AND [TABLE1].[FIELD1] = 700
        AND [TABLE2].[FIELD2] > 1000
    INNER JOIN [TABLE3]
        ON [TABLE2].[FIELD2] = [TABLE3].[FIELD2]
        AND [TABLE3].[FIELD2] > 1000


When these are executed, the Query Optimizer evaluates the query and determines the best execution plan for the statement. In this case, Query Optimizer determined that both options should utilize the same execution plan; but since these statements are syntactically unique they are cached separately.

Execution Plan

Nested Loops(Inner Join)
|--Nested Loops(Inner Join, OUTER REFERENCES:([TABLE2].[FIELD2]) WITH PREFETCH)| |--Clustered Index Seek(OBJECT:([TABLE2].[TABLE2_P]),SEEK:([TABLE2].[FIELD1]=700 AND [TABLE2].[FIELD2] > 1000) ORDERED FORWARD)| |--Clustered Index Seek(OBJECT:([TABLE3].[TABLE3_P]), SEEK:([TABLE3].[FIELD2]=[TABLE2].[FIELD2]), WHERE:([TABLE3].[FIELD2]>1000) ORDERED FORWARD)
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([TABLE1]))
|--Index Seek(OBJECT:([TABLE1].[IX_TABLE1_2]), SEEK:([TABLE1].[FIELD1]=700) ORDERED FORWARD)

In review of SQL Server Profiler, the duration of the T-SQL batch appear identical (both reflected the value of 20 in my sample). A review of the sysprocesses system table reveals some interesting information about these two options. The CPU, Physical IO (Reads) and Memory usage are significantly different.


JOIN Option
CPU: 200
PIO: 5
MEM: 15

WHERE Option
CPU: 19110
PIO: 707
MEM: 38

According to this information, the performance of the filtering that occurred within the JOIN clause performed much better than the WHERE clause option. To ensure that the initial differences in the performance was not unique to the first execution, I ran the same queries multiple times, obtaining a cache hit for each execution. The results were consistent.

It is important to note that these numbers and results are specific to my test environment. The results reflected in your specific environment may differ; but through the sample above the process of evaluating and selecting the most optimal solution is illustrated.

by Johnm

Comments

 

mcflyamorim said:

Hello John, good to know that, but in my tests the sysprocesses and profiler shows the same values...

What do you do to get the physical IO ? Have you stooped SQL Server for each execution?

Can you post the script used to populate the table1, table2 and table3?

Cheers
Fabiano Amorim
January 4, 2010 10:55 AM
 

timothyawiseman@gmail.com said:

This is very interesting.  I have previously done some quick and basic testing of where versus from join clauses and I have not been able to generate a difference in query performance at all.  I consistently use the join-on syntax since I find it easier to read I know I will have to maintain the code I write.

Can I ask which version of SQL Server you were using?  (I used SQL Server 2005 SP2).  
January 4, 2010 11:50 AM
 

laerte said:

Interesting.
I did the tests here and also had no difference. I'm no expert in QP, but what intrigued me most is that, I am not mistaken, when we use a condition to join and apply (outer joins I belive not), the QP is a further step called Predicate Pushdown in which he tries to pass the condition to where.This case, in theory, the condition in where or join would not have any difference
January 4, 2010 1:22 PM
 

Johnm said:

The real impact of this approach is realized when joining large tables. The idea here is that the volume of rows are reduced prior to their joining. I have implemented this approach (using SQL Server 2005 SP3) to optimize queries with great success.
January 4, 2010 8:18 PM
 

Saggi Neumann said:

Hey John,

This doesn't make much sense because as laerte mentioned, the query optimizer pushes WHERE predicates below the joins if it doesn't change the logic of the query (i.e. the physical query order and logical query order may differ, as long as you get the result according to the logical order).

As you've noticed, the QP is identical and you see index seeks occuring before the join. My guess is that either you ran some other statements on one SPID and that accumulated into the cpu/io/memory usage of that SPID, or that you didn't clean the buffers between executions and that's why the first query you executed did all the physical IO and the second already had the data pages in memory.

Try to reproduce on fresh connections, make sure to clean the buffers between executions and you'll see no difference.
January 5, 2010 2:36 AM
 

Michael Lutz said:

Outside of performance there's another reason to choose the FROM or WHERE clause that is often overloooked.  If the join is an OUTER JOIN the choice can make a difference in the logic.  Outer rows are added in **after** the OUTER JOIN criteria is applied, whereas the outer rows are added **before** the WHERE clause filter is applied.  So if you don't want the criteria to filter out the outer rows, place the filter logic in the corresponding OUTER JOIN clause.  If you do want the filter criteria to filter out outer rows, then place the logic in the WHERE clause.  Itzik BenGan covers this extensively in logic query processing in his books.
January 10, 2010 1:26 PM
 

Phil Brammer said:

Hmm...  Are you sure the plans are identical?  If these tables are large, are the statistics up to date?

Also, when making performance claims, as you have done, it is best to include the SQL Server version as well as repro scripts so that the rest of us can dig into it a bit more.

Did you issue a DBCC FREEPROCCACHE between each test?

January 11, 2010 8:45 AM
 

laerte said:

January 15, 2010 4:26 AM
You need to sign in to comment on this blog

About Johnm

John Magnabosco manages the Data Services Group at one of the fastest growing companies in the United States. He is also the current President and Co-Founder of the Indianapolis Professional Association for SQL Server (IndyPASS), the Speaker Coordinator and Co-Founder of IndyTechFest and the author of the book titled "Protecting SQL Server Data". John is a Microsoft MVP for SQL Server.



















<January 2010>
SuMoTuWeThFrSa
272829303112
3456789
10111213141516
17181920212223
24252627282930
31123456
Minesweeper in T-SQL
 Whatever happened to the idea that programming in TSQL can be fun? A Simple-Talk reader contributes an... Read more...

SQL Source Control: The Development Story, Part II
 When creating SQL Source Control, the team had to make decisions as to which source control systems the... Read more...

Raw Materials: Healthy Caution or Something Else?
 Derek slips a cog. Read more...

The DIS-Information Principle, Part II
 Database design simply involves populating a schema with tables that model sets of entities and... Read more...

OCS Disaster Recovery, Part 2
 There are several possible disasters which might happen to your Office Communications Server... Read more...