21 May 2012

SQL Server Prefetch and Query Performance

Prefetching can make a surprising difference to SQL Server query execution times where there is a high incidence of waiting for disk i/o operations, but the benefits come at a cost. Mostly, the Query Optimizer gets it right, but occasionally there are queries that would benefit from tuning.

Introduction

Prefetching is often ignored by developers and DBAs when they are analyzing performance problems. After you’ve read this article I hope you will understand how prefetching works and why, and under what circumstances, this is so important for speeding-up your queries. I hope you’ll also be interested in the techniques and tools I’ve used to troubleshoot the problem, as they are useful for tracking a lot of other performance problems.

In short, prefetching is used to execute I/O operations in parallel, in a way that is somewhat similar to a read-ahead mechanism. It is used in the nested loops execution plan when there are more than a threshold number of rows in the outer input table.   You can see whether prefetch is being used  by viewing the property WithOrderedPrefetch or WithUnorderedPrefetch in the nested loops operator. Prefetching can be ordered or not depending on your query. In this article I’ll show the behavior of unordered prefetching. If you’re interested, you can use the links in the end of this article to read more about ordered prefetching.

Setting the environment

To demonstrate prefetching, I’ll start by creating a 500MB database on a USB flash drive:

With the database created, let’s create two tables in it, one called TestTab1 and another called TestTab2. After I create the tables, I’ll update the column ID_Tab1 with some random values. It is a very simple database in which the TestTab2 has a column related to the TestTab1.

After I create and populate the tables I’ll also update the statistics with fullscan to guarantee that the statistics are updated with the best precision.

The following script took almost 6 mins to run in my notebook.

Let’s now check on the size of the tables we’ve created.

1495-wpzncw7h.jpg

As we can see, the table TestTab1 has 241 MB of data, and the table TestTab2 has 80 MB of data.

Testing

Now that we’ve created the test scenario, we can execute the join between the two tables.

The query above takes less than 1 second to finish and return 42 rows. Here are the results and the actual execution plan:

1495-img4A.jpg

Query Results

1495-img4B.jpg

Execution plan with 114 rows being returned from table TestTab1

1495-img4C.jpg

Prefetching is being executed by the Nested Loops operator

As we can see it is a very straightforward execution plan with two Nested Loops operators. Let’s start investigating it in more detail.

So  how does the loop join work? For each row from the “outer table” (in this case the table TestTab1), search for the match in the “inner table” (TestTab2). For more details about how joins work, look at the links in the final section of this article.

There are 114 rows being returned from the TestTab1. For each row returned from the table TestTab1, SQL Server will execute an Index Seek on the TestTab2. This seek is a random read because the rows are being retrieved in the order of TestTab1.Col4.

To optimize the unordered I/O operations on the table TestTab2, it triggers many asynchronous I/O operations in parallel, rather than executing one I/O operation per row read on table TestTab1. You can see prefetching is being used because the nested loops property WithUnorderedPrefetch is set to true.

To compare the performance of this query without Prefetch, I’ll use the traceflag 8744 to disable the prefetch,  and the command QUERYTRACEON to disable it only in the query scope.

The query above also run under 1 second, the only difference from the first query we ran is that I’m now using the trace flag to disable the prefetch. Even though prefetch was not used (you can confirm it looking at the nested loops operator properties, the property), there was no performance difference.

Without prefetching, SQL Server will trigger the I/O requests at something similar to “row-by-row”. In other words, for every row that is read on TestTab1,  it will request the I/O on the table TestTab2: When the I/O finishes,  it will request another I/O operation for the second row and so on…

It is important to understand here that SQL Server always read pages from memory (buffer cache), it means if a page is not in memory, it will request the page to the disk subsystem, put it in memory and then, read from memory.

When using prefetching, many threads will request the required I/O operations concurrently, so that  the I/O requests are likely to finish earlier, because it os running many requests at the same time. Like read-ahead, it means that when data needed to be joined, the chances of a page to be in cache (because the I/O operation already had finished and the page is in the buffer) is very closely to happen, and it means faster joins.

The key point here is to understand that because the pages are put in memory earlier in the query execution, when the join is processed it already has the pages in memory and don’t need to wait for the pages being read from disk. Again, it is the same principle from read-ahead.

Testing under disk pressure

To make things a little more interesting, what if we simulate the same behavior in a pressure disk subsystem? Let’s do it.

To simulate disk pressure I’ll use the SQLIO tool to read some data on my flash drive. SQLIO is a tool provided by Microsoft which can be used to determine the I/O capacity of a given configuration. It is very easy to setup and can give you valuable information about your disk system.

To understand more about how to use SQLIO read the link in the end of the article.

I’ll run the SQLIO with the following parameters:

1495-img4F.jpg

If you had never used SQLIO, here is the translation for the parameters used above:

  • kR means I’ll simulate Reads.
  • -t16 means I want 16 threads running in parallel.
  • -dH is the specification for the disk I want to use, in this case H:
  • -s600 is the time in seconds I want to run the tests.
  • -b64 is the block size I want to read the data, in this case I’m using 64kb

Now I have SQLIO using 16 threads reading data on my flash drive, so let’s run the query with prefetching again:

1495-img51.jpg

We can see from the profiler results that the disk the query is taking just 1.3 seconds to run, even with all that SQLIO stressing,

Now let’s see the query without prefetching:

1495-img52.jpg

Now the same query without prefetch is taking 7.5 seconds to run; it is 7 times slower. Because the I/O operations are not running in advance, it is taking more time to finish.

Another good way to analyze the prefetching performance is looking at the Current Disk Queue Length disk counter in the performance monitor. We can easily see that, when the query is using prefetching, the number of I/O requests in the queue is much higher than the same query without prefetching.

Let’s use the same query, but, this time, we force  more disk reads: In other words, let’s join more rows.

In the following screenshot, you can see the perfmon counter when the queries were running.

  1495-clip_image010.jpg

Looking at the counters we can see that when prefetching is used, SQL Server is triggering many read-ahead page operations, and the disk queue increases to perform the advanced I/O operations.

The first query finishes fast, and the query without prefetching is taking much more time to finish.

The real world problem

All of this can happen in your database when a plan without prefetching is re-used. As we know, SQL Server saves the plan of a query in the plan cache after compiling it. What if the plan saved in the cache is not using the prefetch, and then another execution reuses this plan?

If you know this can happens, in other words, sometimes I’ll join just a few rows, and another times I’ll join many rows, you should use a query hint (OPTIMIZE FOR, or RECOMPILE) to make sure you are getting the right plan.

You can see this in action on the following link: http://blog.sqlworkshops.com/index.php/prefetch/

It is important to mention that even though prefetching looks awesome for a system that is under IO pressure, it has a cost. When prefetching is used, SQL Server acquires locks on the index seek even when run at read uncommitted isolation level and, if necessary due to blocking operators in the plan, holds these locks until the end of the statement rather than releasing them immediately. You can read more about it in the Craig blog post. So be aware that, in highly concurrent applications,  this extra locking can degrade performance.

Conclusion

Read-ahead prefetching is a very nice feature and should be used when needed. In other words, if you have an environment with waits related to I/O operations, a current queue performance counter low and avg disk reads per sec high it can mean that you are not prefetching data properly, the readahead pages/sec performance counter can also be useful here. In real world disk subsystems, problems can happen for many other reasons, but if you are facing a query without prefetching in the plan, and waiting for disk, this article may help you to tune it. Don’t miss the opportunity to read all the links from the references to understand it better.

That’s all folks.

References:

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

This post has been viewed 26688 times – thanks for reading.

Tags: , ,

  • Rate
    [Total: 20    Average: 4.3/5]
  • Share

Fabiano is fascinated by the SQL Server Query Processor and the way it works to optimize queries, procedures and functions. He graduated as a Technical Processor from Colégio Bezerra de Menezes, SP- Brazil, and has worked for several years with SQL Server, focusing in creating Data Warehouses and optimizing T-SQL codes for many companies in Brazil and Argentina. Fabiano is a SQL Server MVP, MCP for SQL Server 2000, MCTS and MCITP Data Base Developer for SQL Server 2005 and 2008. He also is actively involved in SQL Server community though forums such as MSDN and TechNet Brazil, writes articles for Simple-Talk and SQL Server Magazine Brazil, and he also presents online Webcasts and In-Person events for Microsoft Brazil. His blog is on http://blogfabiano.com

View all articles by Fabiano Amorim

  • Carmen Pentek (www.sqlworkshops.com)

    SSD
    Did you check the effect of prefetch on an SSD?

  • mcflyamorim

    SSD
    Hi Carmen,

    No I didn’t because I don’t have the hardware here… Would you do it and tell us 🙂 ? Do you have it?…

    I think the results will be similar, I don’t expect too much difference…

    Tks for the comment.

  • Carmen

    SSD
    Not yet, Fabiano. Will keep you updated!

  • Chuck Hottle

    I Don’t See The Property
    Fabiano,

    I opened up an execution plan that I had saved from last week and I don’t a property regarding prefetch for any of the nested loops operations. I’m viewing the plan in SSMS 2008. I don’t see a way to look at this in Plan Explorer. Is this something that is there sometimes and not others? Thanks.

  • mcflyamorim

    .
    Hi Chuck, when you can’t see the property it’s because it is not using the prefetching… Try to write a query that will return many rows from the join and try again…

    Best regards
    Fabiano

  • Chuck Hottle

    I Don’t See The Property
    Fabiano,

    I opened up an execution plan that I had saved from last week and I don’t a property regarding prefetch for any of the nested loops operations. I’m viewing the plan in SSMS 2008. I don’t see a way to look at this in Plan Explorer. Is this something that is there sometimes and not others? Thanks.

  • JimAZ

    SQL Server Prefetch and Query Performance
    I liked the article. The only comment I would like to make is about application workload. Typically, heavy duty joins and scans are more characteristic of a reporting environment, whereas a transactional environment is more direct lookup and direct update. If this type of separation can be enforced in your shop, then Prefetch would not be helpful on the transactional environment. On the other hand, Prefech can be a life saver on a reporting environment. The likelihood of disk pressure should not be great. But as you said, “real world” is not in the lab. Thanks again.

  • Chuck Hottle

    I Don’t See The Property
    Fabiano,

    I opened up an execution plan that I had saved from last week and I don’t a property regarding prefetch for any of the nested loops operations. I’m viewing the plan in SSMS 2008. I don’t see a way to look at this in Plan Explorer. Is this something that is there sometimes and not others? Thanks.

  • Chuck Hottle

    Duplicate Posts
    Thanks for the reply. I think my question from yesterday was duplicated because I hit refresh on this page twice. Sorry.

  • Gullimeel

    Clustering of the index key w.r.t. non clustered index key
    There are cases where the non clustered index key and cluistered index key has a correlation and in those cases the prefetching is used but the perfromance could be improved further if optimizer is smart enough to recognize that the data it needed is at very less number of pages..See below query to understand it further..

    drop table mysoh
    go
    select * into mysoh from Sales.SalesOrderHeader
    go
    alter table mysoh add constraint pk_mysoh primary key (SalesOrderId)
    go
    create nonclustered index idx_orderdate on mysoh(Orderdate)
    go
    update Statistics mysoh with fullscan
    go
    set statistics io,time on
    go
    dbcc dropcleanbuffers with no_infomsgs
    go
    begin tran
    select * from mysoh with (holdlock) where OrderDate between ‘20010701’ and ‘20010702’
    select * from sys.dm_tran_locks where request_session_id= @@SPID
    and resource_type = ‘PAGE’

    –You will see that whole data is read from two pages as the data was quite clustered w.r.t. orderdate..
    rollback tran
    go
    set statistics io,time off
    go
    /*
    1:40448
    1:38817
    1:38816
    */
    dbcc traceon(3604)
    go
    dbcc page(9,1,38817,3) –check the data on the page..
    –Now optimizer should be smart enough to recognize this and thus uses only two IO’s for the bookmark lookup instead of 154 IO’s it does(without prefetching)

    –To take advantage of this I could write my query like below..
    set statistics io,time on
    go
    declare @minid int,@maxid int
    select @minid = MIN(salesorderid),@maxid= MAX(salesorderid)
    from mysoh
    where OrderDate between ‘20010701’ and ‘20010702’

    select * from mysoh with (index = 1)
    where SalesOrderID between @minid and @maxid
    and OrderDate between ‘20010701’ and ‘20010702’
    go

    –just 7 IO’s…

    /*
    However, you should use this method when other performance tuning techniques are not giving you
    performance gain and the data is clustered properly.
    */

    For more on the advantages and disadvantages of this method.Please follow the below link..

    http://gullimeelsqlsybase.wordpress.com/2012/06/24/performance-enhancement-for-index-seek-key-lookup-in-sql-server-1/