Click here to monitor SSC
  • Av rating:
  • Total votes: 16
  • Total comments: 10
Fabiano Amorim

SQL Server Prefetch and Query Performance

21 May 2012

Prefetching can make a surprising difference to SQL Server query execution times where there is a high incidence of waiting for disk i/o operations, but the benefits come at a cost. Mostly, the Query Optimizer gets it right, but occasionally there are queries that would benefit from tuning.

Introduction

Prefetching is often ignored by developers and DBAs when they are analyzing performance problems. After you’ve read this article I hope you will understand how prefetching works and why, and under what circumstances, this is so important for speeding-up your queries. I hope you’ll also be interested in the techniques and tools I’ve used to troubleshoot the problem, as they are useful for tracking a lot of other performance problems.

In short, prefetching is used to execute I/O operations in parallel, in a way that is somewhat similar to a read-ahead mechanism. It is used in the nested loops execution plan when there are more than a threshold number of rows in the outer input table.   You can see whether prefetch is being used  by viewing the property WithOrderedPrefetch or WithUnorderedPrefetch in the nested loops operator. Prefetching can be ordered or not depending on your query. In this article I’ll show the behavior of unordered prefetching. If you’re interested, you can use the links in the end of this article to read more about ordered prefetching.

Setting the environment

To demonstrate prefetching, I’ll start by creating a 500MB database on a USB flash drive:

USE Master

GO

CREATE DATABASE TestPrefetching ON PRIMARY

( NAME = N'TestPrefetching', FILENAME = N'H:\TestPrefetching.mdf' , SIZE = 512000KB , FILEGROWTH = 1024KB )

 LOG ON

( NAME = N'TestPrefetching_log', FILENAME = N'H:\TestPrefetching_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)

GO

ALTER DATABASE TestPrefetching SET RECOVERY SIMPLE

GO

With the database created, let’s create two tables in it, one called TestTab1 and another called TestTab2. After I create the tables, I’ll update the column ID_Tab1 with some random values. It is a very simple database in which the TestTab2 has a column related to the TestTab1.

After I create and populate the tables I’ll also update the statistics with fullscan to guarantee that the statistics are updated with the best precision.

The following script took almost 6 mins to run in my notebook.

USE TestPrefetching

GO

IF OBJECT_ID('TestTab1') IS NOT NULL

  DROP TABLE TestTab1

GO

CREATE TABLE TestTab1 (ID Int IDENTITY(1,1) PRIMARY KEY,

                       Col1 Char(5000),

                       Col2 Char(1250),

                       Col3 Char(1250),

                       Col4 Numeric(18,2))

GO

-- 6 mins to run

INSERT INTO TestTab1 (Col1, Col2, Col3, Col4)

SELECT TOP 1000 NEWID(), NEWID(), NEWID(), ABS(CHECKSUM(NEWID())) / 10000000.

  FROM sysobjects a

 CROSS JOIN sysobjects b

 CROSS JOIN sysobjects c

 CROSS JOIN sysobjects d

GO 30

CREATE INDEX ix_Col4 ON TestTab1(Col4)

GO

IF OBJECT_ID('TestTab2') IS NOT NULL

  DROP TABLE TestTab2

GO

CREATE TABLE TestTab2 (ID Int IDENTITY(1,1) PRIMARY KEY,

                       ID_Tab1 Int,

                       Col1 Char(5000),

                       Col2 Char(1250),

                       Col3 Char(1250))

GO

INSERT INTO TestTab2 (ID_Tab1, Col1, Col2, Col3)

SELECT TOP 1000 0, NEWID(), NEWID(), NEWID()

  FROM sysobjects a

 CROSS JOIN sysobjects b

 CROSS JOIN sysobjects c

 CROSS JOIN sysobjects d

GO 10

CREATE INDEX ix_ID_Tab1 ON TestTab2(ID_Tab1)

GO

 

DECLARE @MenorValor Int = 1, @MaiorValor Int = 5000, @i Int

SET @i = @MenorValor + ABS(CHECKSUM(NEWID())) % (@MaiorValor - @MenorValor)

;WITH CTE_1

AS

(

  SELECT ID, @MenorValor + ABS(CHECKSUM(NEWID())) % (@MaiorValor - @MenorValor) AS Col1

    FROM TestTab1

)

UPDATE TestTab2 SET ID_Tab1 = CTE_1.Col1

  FROM TestTab2

 INNER JOIN CTE_1

    ON CTE_1.ID = TestTab2.ID

GO

UPDATE STATISTICS TestTab1 WITH FULLSCAN

UPDATE STATISTICS TestTab2 WITH FULLSCAN

Let’s now check on the size of the tables we’ve created.

-- Checking table size

sp_spaceused TestTab1

GO

sp_spaceused TestTab2

GO

As we can see, the table TestTab1 has 241 MB of data, and the table TestTab2 has 80 MB of data.

Testing

Now that we’ve created the test scenario, we can execute the join between the two tables.

CHECKPOINT

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

GO

SELECT TestTab1.Col4, TestTab2.Col1

  FROM TestTab1 WITH(index=ix_Col4)

 INNER JOIN TestTab2

    ON TestTab1.ID = TestTab2.ID_Tab1

 WHERE TestTab1.Col4 < 0.8

OPTION (RECOMPILE)

The query above takes less than 1 second to finish and return 42 rows. Here are the results and the actual execution plan:

Query Results

Execution plan with 114 rows being returned from table TestTab1

Prefetching is being executed by the Nested Loops operator

As we can see it is a very straightforward execution plan with two Nested Loops operators. Let’s start investigating it in more detail.

So  how does the loop join work? For each row from the “outer table” (in this case the table TestTab1), search for the match in the “inner table” (TestTab2). For more details about how joins work, look at the links in the final section of this article.

There are 114 rows being returned from the TestTab1. For each row returned from the table TestTab1, SQL Server will execute an Index Seek on the TestTab2. This seek is a random read because the rows are being retrieved in the order of TestTab1.Col4.

To optimize the unordered I/O operations on the table TestTab2, it triggers many asynchronous I/O operations in parallel, rather than executing one I/O operation per row read on table TestTab1. You can see prefetching is being used because the nested loops property WithUnorderedPrefetch is set to true.

To compare the performance of this query without Prefetch, I’ll use the traceflag 8744 to disable the prefetch,  and the command QUERYTRACEON to disable it only in the query scope.

CHECKPOINT

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

GO

 

SELECT TestTab1.Col4, TestTab2.Col1

  FROM TestTab1

 INNER JOIN TestTab2

    ON TestTab1.ID = TestTab2.ID_Tab1

 WHERE TestTab1.Col4 < 0.8

OPTION (RECOMPILE, QUERYTRACEON 8744)

The query above also run under 1 second, the only difference from the first query we ran is that I’m now using the trace flag to disable the prefetch. Even though prefetch was not used (you can confirm it looking at the nested loops operator properties, the property), there was no performance difference.

Without prefetching, SQL Server will trigger the I/O requests at something similar to “row-by-row”. In other words, for every row that is read on TestTab1,  it will request the I/O on the table TestTab2: When the I/O finishes,  it will request another I/O operation for the second row and so on…

It is important to understand here that SQL Server always read pages from memory (buffer cache), it means if a page is not in memory, it will request the page to the disk subsystem, put it in memory and then, read from memory.

When using prefetching, many threads will request the required I/O operations concurrently, so that  the I/O requests are likely to finish earlier, because it os running many requests at the same time. Like read-ahead, it means that when data needed to be joined, the chances of a page to be in cache (because the I/O operation already had finished and the page is in the buffer) is very closely to happen, and it means faster joins.

The key point here is to understand that because the pages are put in memory earlier in the query execution, when the join is processed it already has the pages in memory and don’t need to wait for the pages being read from disk. Again, it is the same principle from read-ahead.

Testing under disk pressure

To make things a little more interesting, what if we simulate the same behavior in a pressure disk subsystem? Let’s do it.

To simulate disk pressure I’ll use the SQLIO tool to read some data on my flash drive. SQLIO is a tool provided by Microsoft which can be used to determine the I/O capacity of a given configuration. It is very easy to setup and can give you valuable information about your disk system.

To understand more about how to use SQLIO read the link in the end of the article.

I’ll run the SQLIO with the following parameters:

sqlio.exe -kR -t16 -dH -s600 -b64

If you had never used SQLIO, here is the translation for the parameters used above:

  • -kR means I’ll simulate Reads.
  • -t16 means I want 16 threads running in parallel.
  • -dH is the specification for the disk I want to use, in this case H:
  • -s600 is the time in seconds I want to run the tests.
  • -b64 is the block size I want to read the data, in this case I’m using 64kb

Now I have SQLIO using 16 threads reading data on my flash drive, so let’s run the query with prefetching again:

CHECKPOINT

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

GO

SELECT TestTab1.Col4, TestTab2.Col1

  FROM TestTab1

 INNER JOIN TestTab2

    ON TestTab1.ID = TestTab2.ID_Tab1

 WHERE TestTab1.Col4 < 0.8

OPTION (RECOMPILE)

We can see from the profiler results that the disk the query is taking just 1.3 seconds to run, even with all that SQLIO stressing,

Now let’s see the query without prefetching:

CHECKPOINT

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

GO

SELECT TestTab1.Col4, TestTab2.Col1

  FROM TestTab1

 INNER JOIN TestTab2

    ON TestTab1.ID = TestTab2.ID_Tab1

 WHERE TestTab1.Col4 < 0.8

OPTION (RECOMPILE, QUERYTRACEON 8744)

Now the same query without prefetch is taking 7.5 seconds to run; it is 7 times slower. Because the I/O operations are not running in advance, it is taking more time to finish.

Another good way to analyze the prefetching performance is looking at the Current Disk Queue Length disk counter in the performance monitor. We can easily see that, when the query is using prefetching, the number of I/O requests in the queue is much higher than the same query without prefetching.

Let’s use the same query, but, this time, we force  more disk reads: In other words, let’s join more rows.

CHECKPOINT

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

SELECT TestTab1.Col4, TestTab2.Col1

  FROM TestTab1 WITH(index=ix_Col4)

 INNER JOIN TestTab2

    ON TestTab1.ID = TestTab2.ID_Tab1

 WHERE TestTab1.Col4 < 50

OPTION (RECOMPILE, LOOP JOIN)

GO

CHECKPOINT

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

SELECT TestTab1.Col4, TestTab2.Col1

  FROM TestTab1 WITH(index=ix_Col4)

 INNER JOIN TestTab2

    ON TestTab1.ID = TestTab2.ID_Tab1

 WHERE TestTab1.Col4 < 50

OPTION (RECOMPILE, LOOP JOIN, QUERYTRACEON 8744)

GO

 

In the following screenshot, you can see the perfmon counter when the queries were running.

 

Looking at the counters we can see that when prefetching is used, SQL Server is triggering many read-ahead page operations, and the disk queue increases to perform the advanced I/O operations.

The first query finishes fast, and the query without prefetching is taking much more time to finish.

The real world problem

All of this can happen in your database when a plan without prefetching is re-used. As we know, SQL Server saves the plan of a query in the plan cache after compiling it. What if the plan saved in the cache is not using the prefetch, and then another execution reuses this plan?

If you know this can happens, in other words, sometimes I’ll join just a few rows, and another times I’ll join many rows, you should use a query hint (OPTIMIZE FOR, or RECOMPILE) to make sure you are getting the right plan.

You can see this in action on the following link: http://blog.sqlworkshops.com/index.php/prefetch/

It is important to mention that even though prefetching looks awesome for a system that is under IO pressure, it has a cost. When prefetching is used, SQL Server acquires locks on the index seek even when run at read uncommitted isolation level and, if necessary due to blocking operators in the plan, holds these locks until the end of the statement rather than releasing them immediately. You can read more about it in the Craig blog post. So be aware that, in highly concurrent applications,  this extra locking can degrade performance.

Conclusion

Read-ahead prefetching is a very nice feature and should be used when needed. In other words, if you have an environment with waits related to I/O operations, a current queue performance counter low and avg disk reads per sec high it can mean that you are not prefetching data properly, the readahead pages/sec performance counter can also be useful here. In real world disk subsystems, problems can happen for many other reasons, but if you are facing a query without prefetching in the plan, and waiting for disk, this article may help you to tune it. Don’t miss the opportunity to read all the links from the references to understand it better.

That’s all folks.

References:

Fabiano Amorim

Author profile:

Fabiano is fascinated by the SQL Server Query Processor and the way it works to optimize queries, procedures and functions. He graduated as a Technical Processor from Colégio Bezerra de Menezes, SP- Brazil, and has worked for several years with SQL Server, focusing in creating Data Warehouses and optimizing T-SQL codes for many companies in Brazil and Argentina. Fabiano is a SQL Server MVP, MCP for SQL Server 2000, MCTS and MCITP Data Base Developer for SQL Server 2005 and 2008. He also is actively involved in SQL Server community though forums such as MSDN and TechNet Brazil, writes articles for Simple-Talk and SQL Server Magazine Brazil, and he also presents online Webcasts and In-Person events for Microsoft Brazil. His blog is on http://blogfabiano.com

Search for other articles by Fabiano Amorim

Rate this article:   Avg rating: from a total of 16 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: SSD
Posted by: Carmen Pentek (www.sqlworkshops.com) (not signed in)
Posted on: Monday, May 21, 2012 at 1:23 PM
Message: Did you check the effect of prefetch on an SSD?


Subject: SSD
Posted by: mcflyamorim (view profile)
Posted on: Monday, May 21, 2012 at 3:05 PM
Message: Hi Carmen,

No I didn't because I don't have the hardware here... Would you do it and tell us :-) ? Do you have it?...

I think the results will be similar, I don't expect too much difference...

Tks for the comment.

Subject: SSD
Posted by: Carmen (view profile)
Posted on: Tuesday, May 22, 2012 at 9:39 AM
Message: Not yet, Fabiano. Will keep you updated!

Subject: I Don't See The Property
Posted by: Chuck Hottle (not signed in)
Posted on: Wednesday, May 30, 2012 at 10:35 AM
Message: Fabiano,

I opened up an execution plan that I had saved from last week and I don't a property regarding prefetch for any of the nested loops operations. I'm viewing the plan in SSMS 2008. I don't see a way to look at this in Plan Explorer. Is this something that is there sometimes and not others? Thanks.

Subject: .
Posted by: mcflyamorim (view profile)
Posted on: Wednesday, May 30, 2012 at 1:02 PM
Message: Hi Chuck, when you can't see the property it's because it is not using the prefetching... Try to write a query that will return many rows from the join and try again...

Best regards
Fabiano

Subject: I Don't See The Property
Posted by: Chuck Hottle (not signed in)
Posted on: Wednesday, May 30, 2012 at 2:44 PM
Message: Fabiano,

I opened up an execution plan that I had saved from last week and I don't a property regarding prefetch for any of the nested loops operations. I'm viewing the plan in SSMS 2008. I don't see a way to look at this in Plan Explorer. Is this something that is there sometimes and not others? Thanks.

Subject: SQL Server Prefetch and Query Performance
Posted by: JimAZ (not signed in)
Posted on: Wednesday, May 30, 2012 at 6:58 PM
Message: I liked the article. The only comment I would like to make is about application workload. Typically, heavy duty joins and scans are more characteristic of a reporting environment, whereas a transactional environment is more direct lookup and direct update. If this type of separation can be enforced in your shop, then Prefetch would not be helpful on the transactional environment. On the other hand, Prefech can be a life saver on a reporting environment. The likelihood of disk pressure should not be great. But as you said, "real world" is not in the lab. Thanks again.

Subject: I Don't See The Property
Posted by: Chuck Hottle (not signed in)
Posted on: Thursday, May 31, 2012 at 6:03 AM
Message: Fabiano,

I opened up an execution plan that I had saved from last week and I don't a property regarding prefetch for any of the nested loops operations. I'm viewing the plan in SSMS 2008. I don't see a way to look at this in Plan Explorer. Is this something that is there sometimes and not others? Thanks.

Subject: Duplicate Posts
Posted by: Chuck Hottle (not signed in)
Posted on: Thursday, May 31, 2012 at 6:05 AM
Message: Thanks for the reply. I think my question from yesterday was duplicated because I hit refresh on this page twice. Sorry.

Subject: Clustering of the index key w.r.t. non clustered index key
Posted by: Gullimeel (view profile)
Posted on: Wednesday, June 27, 2012 at 8:25 AM
Message: There are cases where the non clustered index key and cluistered index key has a correlation and in those cases the prefetching is used but the perfromance could be improved further if optimizer is smart enough to recognize that the data it needed is at very less number of pages..See below query to understand it further..

drop table mysoh
go
select * into mysoh from Sales.SalesOrderHeader
go
alter table mysoh add constraint pk_mysoh primary key (SalesOrderId)
go
create nonclustered index idx_orderdate on mysoh(Orderdate)
go
update Statistics mysoh with fullscan
go
set statistics io,time on
go
dbcc dropcleanbuffers with no_infomsgs
go
begin tran
select * from mysoh with (holdlock) where OrderDate between '20010701' and '20010702'
select * from sys.dm_tran_locks where request_session_id= @@SPID
and resource_type = 'PAGE'

--You will see that whole data is read from two pages as the data was quite clustered w.r.t. orderdate..
rollback tran
go
set statistics io,time off
go
/*
1:40448
1:38817
1:38816
*/
dbcc traceon(3604)
go
dbcc page(9,1,38817,3) --check the data on the page..
--Now optimizer should be smart enough to recognize this and thus uses only two IO's for the bookmark lookup instead of 154 IO's it does(without prefetching)

--To take advantage of this I could write my query like below..
set statistics io,time on
go
declare @minid int,@maxid int
select @minid = MIN(salesorderid),@maxid= MAX(salesorderid)
from mysoh
where OrderDate between '20010701' and '20010702'

select * from mysoh with (index = 1)
where SalesOrderID between @minid and @maxid
and OrderDate between '20010701' and '20010702'
go

--just 7 IO's...

/*
However, you should use this method when other performance tuning techniques are not giving you
performance gain and the data is clustered properly.
*/

For more on the advantages and disadvantages of this method.Please follow the below link..

http://gullimeelsqlsybase.wordpress.com/2012/06/24/performance-enhancement-for-index-seek-key-lookup-in-sql-server-1/

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

SQL Server XML Questions You Were Too Shy To Ask
 Sometimes, XML seems a bewildering convention that offers solutions to problems that the average... Read more...

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.