Click here to monitor SSC

Phil Factor's Phrenetic Phoughts

Simple-Talk columnist
The wilder shores of Transact SQL    Phil on Twitter   Phil on SQL Server Central  Phil on BOS

Stolen Pages, Ad-hoc queries and the sins of dynamic SQL in the application.

Published Monday, August 03, 2009 11:40 AM


/*
One of the great advantages of doing development work on SQL Server machines with very modest hardware specifications is that mistakes are visible. The mistake of doing ad-hoc dynamic queries from an application will  soon reveal itself. You might get a complaint from the developers that a database is running slowly. You'll notice a very high CPU usage, up to 100% but without any blocking during the periods of slow performance.
In extreme cases, You may even receive errors such as

Error: 701, Severity: 17, State: 1
There is insufficient system memory to run this query.
  ..or..
Msg 8645, Level 17, State 1, Procedure , Line 1
A time out occurred while waiting for memory resources to execute the query. Re-run the query.

You may be seeing the effects of the running of too many ad-hoc queries. The high CPU can be due to the query optimiser compiling a large number of Ad-Hoc queries. The memory stress is due to the extra memory required to store these. In other words, some developer is spraying a large number of Ad-hoc queries at the database instead of using either stored procedures or prepared statements. the chances are, they're doing something 'suboptimal' or, as we used to call it, Daft.

A compiled execution plan takes around 70KB depending on its size, and a stored procedure plan takes two to three times as much, depending on its complexity.  The difference is that there will be only one plan for each stored procedure. With ad-hoc queries, you run the risk of having a separate plan for each query. We'll illustrate this in this article, and show you what to look out for.

Query plans have to be stored for re-use in the procedure cache, and SQL Server takes buffers from the LRU Buffer data store to do this. The word 'stolen' is a bit misleading as this is a perfectly legitimate exercise. Stolen pages are buffer cache pages that are 'stolen' to use for other server memory requests. Stolen pages are used for several miscellaneous server purposes such as procedure cache, sorting or for hashing operations (query workspace memory). It is also used as a generic memory store for allocations that are smaller than 8 KB, to store internal data structures such as locks, transaction context, and connection information. This is a simple way of allocating memory to routine tasks but if a server receives a huge rate of ad-hoc queries, it can lead to trouble. Unless SQL Server determines that it can automatically parameterize a query, or it determines that it is the same query, it is forced to generate a new execution plan. You are seeing the consequences of this with the starvation of memory for data buffers.

The first thing to check is the number of query plans being compiled. The SQL Server Performance Monitor will show many SQL Compilations/sec are being done. Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low.

DBCC MemoryStatus will indicate a rise in the number of stolen pages, and there are a host of DMVs that can point to the cause of the problems.

Once you are sure as to what is causing the problem, then there are several things you can do. Ideally, the best cure is to use stored procedures. If this solution isn't available for some reason, then develop parameterized queries in the applications so that SQL Server is enabled to reuse an existing plan. It is possible to force SQL Server to use a compiled plan for ad-hoc queries, but as it is so easy to form correct parameterised queries, this is hardly worth considering.

Let's show you an example. First, we'll prepare a 'person' table and populate it with, say, half a million rows.
*/
IF NOT EXISTS (SELECT 1 FROM information_schema.tables
  
WHERE table_name LIKE 'Person')
CREATE TABLE [dbo].[Person](
  
[Person_id] [int] IDENTITY(1,1) NOT NULL,
  
[ContactTitle] [varchar](50) NOT NULL CONSTRAINT [DF_Person_ContactTitle]  DEFAULT (''),
  
[Title] [varchar](20) NOT NULL CONSTRAINT [DF_Person_Title]  DEFAULT (''),
  
[FirrstName] [nvarchar](50) NOT NULL CONSTRAINT [DF_Person_FirstName]  DEFAULT (''),
  
[NickName] [nvarchar](50) NOT NULL CONSTRAINT [DF_Person_NickName]  DEFAULT (''),
  
[LastName] [nvarchar](50) NOT NULL CONSTRAINT [DF_Person_creator]  DEFAULT (USER_NAME()),
  
[DateOfBirth] [datetime] NULL,
  
[insertiondate] [datetime] NULL CONSTRAINT [DF_Person_insertiondate]  DEFAULT (GETDATE()),
  
[terminationdate] [datetime] NULL,
CONSTRAINT [PK_dbo_Person] PRIMARY KEY CLUSTERED
(
  
[Person_id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]


/*Now we are going to simulate a programmer failing to come to grips with how to get information from a table and trying to do it by iterating through the rows using Ad-Hoc queries
*/
SET NOCOUNT ON --as we don't want it
--Lets start with a clean sheet.
CHECKPOINT --Writes all data pages that have been entered into the buffer cache and modified but not yet written to disk, for the current database to disk.
GO
DBCC DROPCLEANBUFFERS --remove all buffers from the buffer pool.
DBCC FREEPROCCACHE --Removes all elements from the plan cache
GO

--We'll create a temporary table to put our results into

--drop the temporary table if it exists and create it
IF EXISTS (SELECT 1 FROM tempdb.information_schema.tables
  
WHERE table_name LIKE '#names%') DROP TABLE #names

CREATE TABLE #names ([name] VARCHAR(80))

--now let's simulate a simple loop through a table looking for a particular name in a "Person" table, assuming that a programmer is making the simple mistake of using procedural code

DECLARE @ii INT --iteration counter
DECLARE @id VARCHAR(10)
SELECT @ii=1
WHILE @ii<2000 --and just look through 2000 rows.
  
BEGIN
   SELECT
@ii=@ii+1, @id=CONVERT(VARCHAR(5),@ii)

  
EXECUTE ('insert into #names (name) Select Lastname from person where person_ID='+@id)
  
END
--Hmm. Lets see what is in the query-plan cache. Just peep at a sample of plans

SELECT TOP 5  TEXT
FROM
sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE cacheobjtype ='Compiled Plan'

/* look at that in the cache! Each execution has created a cached plan
insert into #names (name) Select Lastname from person where person_ID=2000
insert into #names (name) Select Lastname from person where person_ID=1999
insert into #names (name) Select Lastname from person where person_ID=1998
insert into #names (name) Select Lastname from person where person_ID=1997
insert into #names (name) Select Lastname from person where person_ID=1996

How much memory did that lot take up, I wonder?*/

SELECT SUM(size_in_bytes)/1024/1024 AS "Megs"
      
(SUM(size_in_bytes)/1024)/COUNT(*) AS [Average size(k)],
      
AVG(refcounts) AS "average ref."
      
AVG(Usecounts) AS "average use"
       ObjType  
  
FROM sys.dm_exec_cached_plans cp
    
WHERE cacheobjtype ='Compiled Plan'
    
GROUP BY Objtype

/*
Megs   Average size(k) average ref. average use ObjType
------ --------------- ------------ ----------- --------
78     40              2            1           Adhoc

So that simple loop in procedural code within an application took 78 megs of memory!


Now we've seen the damage, let's see if using a prameterised query is any better. We'll make up a little test harness.
*/
GO
CHECKPOINT
GO
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
SET NOCOUNT ON
-- create a temporary table
IF EXISTS (SELECT 1 FROM tempdb.information_schema.tables
  
WHERE table_name LIKE '#names%') DROP TABLE #names

CREATE TABLE #names ([name] VARCHAR(80))

--create a temporary timer table
DECLARE  @log TABLE
  
(
    
Log_ID INT IDENTITY(1, 1),
    
"Event"VARCHAR(40),
    
"Stolen Pages"INT,
  
"iterations"INT,
    
InsertionDate DATETIME DEFAULT GETDATE()
   )
DECLARE @ii INT, @IterationsToDo INT
DECLARE
@id VARCHAR(8)
DECLARE @StolenPageCounter INT

--DBCC MemoryStatus
--find out the stolen page counter value
SELECT @StolenPageCounter=cntr_value FROM SYS.SYSPERFINFO
  
WHERE COUNTER_NAME='STOLEN PAGES'
  
AND OBJECT_NAME LIKE '%BUFFER MANAGER%'
SET @IterationsToDo=5000

INSERT INTO @log (event, "Stolen Pages"Iterations) SELECT 'Ad-hoc queries', @StolenPageCounter,@IterationsToDo
SELECT @ii=1
WHILE @ii<=@IterationsToDo
  
BEGIN
   SELECT
@ii=@ii+1, @id=CONVERT(VARCHAR(5),@ii)

  
EXECUTE ('insert into #names (name) Select Title + '' ''+Firstname+ '' ''+Lastname from person as philfactor where person_ID='+@id)
  
END

SELECT
@StolenPageCounter=cntr_value FROM SYS.SYSPERFINFO
  
WHERE COUNTER_NAME='STOLEN PAGES'
  
AND OBJECT_NAME LIKE '%BUFFER MANAGER%'

INSERT INTO @log (event, "Stolen Pages" Iterations) SELECT 'Parameterised Queries', @StolenPageCounter,@IterationsToDo

SELECT @ii=1
WHILE @ii<=@IterationsToDo
  
BEGIN
   SELECT
@ii=@ii+1
  
/* the first time that the sp_executesql statement is executed,
   SQL Server generates a parameterized plan for the SELECT statement
   from person with id as the parameter. For all subsequent executions,
   SQL Server reuses the plan with the new parameter value */
  
EXEC sp_executesql N'
insert into #names (name) Select Title + '' ''+Firstname+ '' ''+Lastname
from person as factorphil
where person_ID=@id'
,
              
N'@id int',
              
@id=@ii
  
END
SELECT
@StolenPageCounter=cntr_value FROM SYS.SYSPERFINFO
  
WHERE COUNTER_NAME='STOLEN PAGES'
  
AND OBJECT_NAME LIKE '%BUFFER MANAGER%'
INSERT INTO @log (event,"Stolen Pages" SELECT 'Completed parameterised query', @StolenPageCounter

--first, we'll see what cached plans we have.

SELECT  COUNT(*) AS [No.],
      
CONVERT(NUMERIC(9, 2),
      
SUM(size_in_bytes * 1.00) / 1024 / 1024) AS Mb,
       (
SUM(size_in_bytes) / 1024)/COUNT(*) AS [Average size (K)],
      
AVG(refcounts) AS "average ref."
        
AVG(Usecounts) AS "average use"
       ObjType
FROM     sys.dm_exec_cached_plans cp
GROUP BY Objtype
ORDER BY [Mb] DESC
/*
No.     Mb      Average size (K) average ref. average use ObjType
------ ------- ---------------- ------------ ----------- --------
5003   195.82  40               2            1           Adhoc
8      0.26    33               1            4           View
1      0.04    40               2            5000        Prepared
1      0.01    8                1            5000        Proc

*/

SELECT   event,
      
[Pages Stolen] = (SELECT "Stolen Pages"
                              
FROM @log f
                              
WHERE f . Log_ID = g . Log_ID + 1)-"Stolen Pages"
       [Time (Ms)]
= DATEDIFF(ms, [InsertionDate],
                           (
SELECT [InsertionDate]
                              
FROM @log f
                              
WHERE f . Log_ID = g . Log_ID + 1
                          
)
                       )
FROM     @log g
WHERE    Log_ID < ( SELECT MAX (Log_ID) FROM @log )
/*
so we see that it is much faster to use parameterized queries and the number of stolen pages is far less
event                      Pages Stolen Time (Ms)
-------------------------- ------------ -----------
Ad-hoc queries             21193        3906
Parameterised Queries      4492         970


So now, lets pick out the two queries from the Procedure Cache, recognising them by the two strings I embedded into them
*/

SELECT  'parameterised query' AS [query type],
      
SUM(size_in_bytes)/1024 AS [memory consumed (K)],
      
COUNT(*) AS [number of plans]
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE TEXT LIKE '%factorphil%'
UNION ALL
SELECT  'Ad-Hoc query' AS [query type],
      
SUM(size_in_bytes)/1024 AS [memory consumed (K)],
      
COUNT(*) AS [number of plans]
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE TEXT LIKE '%philfactor%'
/*

query type          memory consumed (K) number of plans
------------------- ------------------- ---------------
parameterised query 3072                2
Ad-Hoc query        205728              5001

*/

GO

SELECT LEFT([type], 20) AS [type], SUM(single_pages_kb) AS [Total_kb]
FROM sys.dm_os_memory_clerks
WHERE TYPE IN ('CACHESTORE_SQLCP','CACHESTORE_PHDR','CACHESTORE_OBJCP')
GROUP BY TYPE
ORDER BY
SUM(single_pages_kb) DESC

/*
we can see the huge figure again from the DMV sys.dm_os_memory_clerks
type                 Total_kb
-------------------- --------------------
CACHESTORE_SQLCP     202928
CACHESTORE_PHDR      312
CACHESTORE_OBJCP     8


but probably the best way of viewing the damage is from this DMV sys.dm_os_memory_cache_counters
*/
SELECT  
  
LEFT([name], 20) AS [name],
  
LEFT([type], 20) AS [type],
  
SUM([single_pages_kb] + [multi_pages_kb]) AS cache_kb,
  
SUM([entries_count]) AS No_Entries
FROM sys.dm_os_memory_cache_counters
WHERE TYPE IN ('CACHESTORE_SQLCP','CACHESTORE_PHDR','CACHESTORE_OBJCP')
GROUP BY [type], [name]
ORDER BY cache_kb DESC
/*
name                 type                 cache_kb             No_Entries
------------- -------------------- -------- -------
SQL Plans     CACHESTORE_SQLCP     202768   5005
Bound Trees   CACHESTORE_PHDR      312      10
Object Plans  CACHESTORE_OBJCP     8        0

Note that:
CACHESTORE_SQLCP are our ad-hoc cached SQL statements or batches that aren't in stored procedures, functions or triggers.  Thes consist of dynamic ad-hoc SQL  sent to the server by an application.
CACHESTORE_PHDR are algebrizer trees for views, constraints and defaults.  An algebrizer tree is the parsed SQL text that resolves the table and column names.
CACHESTORE_OBJCP  are compiled plans for stored procedures, functions and triggers.


So there you have it. This is a problem that can be very insidious, but once you know what to watch out for, it is very easy to put right. You just make sure that the developers do not try using too many ad hoc queries.
*/

Comments

 

Rodney said:

Always great stuff Phil.
August 6, 2009 1:11 PM
 

ShawnNWF said:

This is an absolute must read for you DBAs and developers.  Keep it up Phil.
August 6, 2009 1:15 PM
 

callcopse said:

Top quality - cheers. Whilst I did understand the principle this article shows exactly why ad hoccery can mess up your database. This has also given me a few great ideas for test statements to run on an underperforming database I am maintaining, which I sadly did not write.
August 10, 2009 3:37 AM
You need to sign in to comment on this blog
<August 2009>
SuMoTuWeThFrSa
2627282930311
2345678
9101112131415
16171819202122
23242526272829
303112345
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL... Read more...

Converting String Data to XML and XML to String Data
 We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In... Read more...

Geek of the Week: Don Syme
 With the arrival of F# 3.0 Microsoft announced a wide range of improvements such as type providers that... Read more...

How to Document and Configure SQL Server Instance Settings
 Occasionally, when you install identical databases on two different SQL Server instances, they will... Read more...

What's the Point of Using VARCHAR(n) Anymore?
 The arrival of the (MAX) data types in SQL Server 2005 were one of the most popular feature for the... Read more...