Finding the causes of poor performance in SQL Server, Part 2

In the first part of this series, Gail showed how to find the procedures that contributed most to performance problems. In this final part, she shows how to use query statistics to pinpoint the queries that are causing the biggest problems, and then how to use the query execution plans to find out which operations are the root cause. With this information, one can design and modify the indexes that optimize the plans and, therefore, the performance of the queries.

In Part 1 of this 2-part series, I presented a fictional problem, in which I was the DBA responsible for maintaining an internet forum. The forum was performing poorly due to a lack of indexing. I showed how to use server-side Profiler tracing to capture the queries and stored procedures that represented a typical server workload. By aggregating, at the stored procedure level, the execution statistics from the Profiler trace, I was able to identify the three worst-performing stored procedures.

Here, in Part 2, I’m going to take apart these procedures, and identify the queries within them that are having the biggest impact on performance, and why. Having defined the main problems, I’m going to investigate how to make the queries run faster, by implementing appropriate indexes. Finally, I’ll review and report on the performance improvements observed when the changes are deployed to production.

If you’d like to work through the example, you’ll need to download the WebForums database, which you can find a link to at the bottom of the article.

NOTE:   You may not, for a number of reasons, see the same performance values, or execution plans, as I have reported in the article. Firstly, my tests were performed on SQL 2008, and you may see different plans on other versions. Secondly, data volume may well affect the plans; my original copy of the database has 180 000 records in the Threads table and 1.2 million in the Posts table, whereas the copy provided with this article has 2500 and 75000, respectively.

The Execution Plan and Query Stats

In Part 1, the aggregated execution statistics identified the following three stored procedures, as the main contributors to the performance problems:

  1. ViewThread. This procedure is called any time a user views a forum thread on the website. It returns the title of the thread and all of the posts, along with the details of who made each post.
  2. ViewForum. This procedure is called any time a user views the list of threads within a forum. It returns a subset of the threads within the forum, based on the PageNo parameter.
  3. AddPost. This procedure is called whenever a new post is made to an existing thread. The procedure adds the row to the Posts table, updates the user’s post count and updates the Threads table with the last modified date.

The statistics from Part 1 suggest that the performance impact of the first two procedures, ViewThread and ViewForum, is far greater than AddPost, both in terms of query duration and IO load.

Therefore, I’ll focus my tuning efforts on ViewThread and ViewForum. I’ll investigate each one in turn, examining the query statistics and execution plans for the queries contained within. In order to do this, I’m going to load up Management Studio, connect to a development system and run the procedure in question, with the following three options switched on:

  1. Statistics IO – this will show me all the tables that the procedure is reading, how many times each table was scanned and how many reads were done.
  2. Statistics Time – this will show me the exact duration and CPU time of each statement within the procedure.
  3. Execution plan – this will show me the steps that SQL Server took to execute the query, and the estimated cost of each operation that comprises the plan.

NOTE: It is worth noting that, regardless of whether you view the estimated or actual execution plans, the costs are estimates. See Grant Fritchey’s blog post for further details. For more general information on Execution Plans, see Grant Fritchey’s e-book, my blog series on Execution plans, and the book “Inside SQL Server 2005: Query Tuning and Optimisation” by Kalen Delaney, et al.

I’ll start the investigation with the worst offender, ViewThread, which had the longest duration and the highest IO.

The ViewThread Procedure

The ViewThread stored procedure is called whenever a user views a thread on the front-end web site. The procedure takes the ID of the thread that they’ve selected and their UserID. It updates the viewing stats for this thread, updates the user’s status and then retrieves a list of the posts within the thread, along with the users who are currently viewing the thread. The full code of the procedure is shown in Listing 1:

Listing 1: The ViewThread stored procedure
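The full listing isn’t reproduced in text here; in outline, and with illustrative column names only (the exact schema isn’t important for what follows), the four statements in the procedure look something like this:

CREATE PROCEDURE dbo.ViewThread (@ThreadID INT, @UserID INT)
AS
BEGIN
    -- Query 1: update the thread's view count (column name is illustrative)
    UPDATE dbo.Threads
    SET    TotalViews = TotalViews + 1
    WHERE  ThreadID = @ThreadID;

    -- Query 2: update the user's status (column names are illustrative)
    UPDATE dbo.Users
    SET    CurrentStatus = 'Viewing Thread', ActiveEntity = @ThreadID
    WHERE  UserID = @UserID;

    -- Query 3: retrieve the thread title and all of its posts (originally a SELECT *)
    SELECT *
    FROM   dbo.Threads
           INNER JOIN dbo.Posts ON Posts.ThreadID = Threads.ThreadID
           INNER JOIN dbo.Users ON Users.UserID = Posts.PostedBy
    WHERE  Threads.ThreadID = @ThreadID;

    -- Query 4: retrieve the users currently viewing this thread (illustrative)
    SELECT UserName
    FROM   dbo.Users
    WHERE  CurrentStatus = 'Viewing Thread'
           AND ActiveEntity = @ThreadID;
END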

I need to find out which of these queries are causing the biggest problems. In order to do this, I need to execute the procedure in a development environment, using a representative set of sample parameter values, in this case for ThreadID and UserID. These parameter values can be obtained from the Profiler trace. You’ll occasionally find that a stored procedure exhibits different performance characteristics, depending on the parameter set that is supplied. In such cases, you’d need to test the stored procedure with each set of parameters that gives rise to distinct performance characteristics. In this case, however, the procedure appears to perform much the same regardless of the parameters supplied. Specifically, there is little difference in the number of reads for different parameter values, so I can pick any entry from the Profiler trace.

Figure 1 shows how I executed the stored procedure in Management Studio. First, I activate query statistics, either by typing the options, as shown, or from the Query Options dialog which is accessed from the Query menu in Management Studio. I then run the procedure within a transaction, issuing a rollback at the end. This is to ensure that I don’t change any data that could cause later tests to have different performance characteristics, or even to fail altogether.

Figure 1: Executing ViewThread
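In text form, the batch shown in Figure 1 amounts to something like this (the parameter values here are simply representative ones picked from the trace):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

-- representative parameter values taken from the Profiler trace
EXEC dbo.ViewThread @ThreadID = 1234, @UserID = 567;

ROLLBACK TRANSACTION;  -- undo any data changes so that later test runs behave the same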

Having generated the detailed execution characteristics for the procedure, I can map each set of characteristics to the individual queries within the procedure.

NOTE:
With only 4 queries, breaking down which of the IO and time statistics belongs to each query is relatively easy. With larger or more complex procedures, especially ones with control flow statements, it may not be as simple. One trick is to modify the stored procedure (in the dev environment) and add print statements before and after each query within the procedure so that it’s very clear which IO and time stats are associated with which query.
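For example, an instrumented version of one of the queries might be bracketed like this:

PRINT 'Query 3 (Retrieve thread details) - start';

-- ... the original query goes here, unchanged ...

PRINT 'Query 3 (Retrieve thread details) - end';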

Once I’ve done the breakdowns, this is the situation that emerges:

Query 1 (Update Thread’s view count)

Execution Plan

Figure 2: Execution plan for “Update Thread’s view count” query

Query 2 (Update User’s status)

Execution Plan

Figure 3: Execution plan for “Update User’s status” query

Query 3 (Retrieve thread details)

Execution Plan

Figure 4: Execution plan for “Retrieve thread details” query

Query 4 (Retrieve Users viewing this thread)

Execution Plan

Figure 5: Execution plan for “Retrieve Users viewing this thread” query

It’s readily apparent that the main problem with this stored procedure is the third query (Retrieve thread details). It performs over 180,000 reads of the Posts table, and has the longest duration and highest CPU time. Furthermore, the estimated costs shown in the execution plan indicate that it contributes 100% of the cost of executing the entire procedure.

The immediate issue I have is that this is a “SELECT *” query, so it is going to be very difficult for me to create good indexes. I pay a visit to the developers to find out exactly which columns they’re using from this procedure. It transpires that, while they are returning all the columns, they only actually need five of them. I modify the procedure to return only those columns and, when I run it again, the execution characteristics are unchanged, but the smaller number of columns means that I’m more likely to be able to create useful indexes. I can’t see any other changes that I could make to the code to make it more efficient, so I move on to take a closer look at the execution plan for this query.

My first goal is to identify and examine the highest-cost step in the plan. In this case, I don’t have to look far. If I read the execution plan from right to left, the first operator that I see is a clustered index scan of the Posts table. In other words, every row of the clustered index (in this case the primary key) is being read in order to identify the rows that the query needs to return. This operation represents 100% of the cost associated with executing the query, and also explains the very high number of reads of the Posts table that we saw in the IO statistics.

So, I now know what the problem is. Later in the article, I’m going to analyse further to see if an index would be appropriate for this query and, if so, which index. However, first, I want to take a look at statistics and execution plans for the next procedure, ViewForum.

The ViewForum Procedure

The ViewForum procedure is called whenever a user views the list of threads within a forum. The procedure takes as parameters the forum ID of the forum to display, the ID of the user who made the request and the page number. The page number is used to paginate the list of threads and return just 20 at a time, as there may be a large number of threads in a forum. The first query in the procedure sets the user’s status and the second returns the details of the threads for the requested page of the selected forum. The code for the procedure is shown in Listing 2:

Listing 2: The ViewForum Procedure

Once again, I execute the procedure in a development environment, using a representative set of sample parameter values for ForumID, PageNo and UserID, as shown in Figure 6:

Figure 6: Executing “ViewForum”

Query 1 (UPDATE User’s status)

Execution Plan

Figure 7: Execution plan for “UPDATE User’s status” query

Query 2 (Retrieve list of threads)

Execution plan:


Figure 8: Execution plan for “Retrieve list of threads” query

I can see immediately that the second query in the procedure is the cause of the problem, both because the optimizer estimates that it contributes 100% of the cost of executing the procedure, and because of its IO impact and duration. The query appears to be well written and, as such, there are no changes that I need to make to the code before considering how indexes might help optimize the execution plan.

The execution plan for this query is a little more complex than the previous one, so I’m going to go over it in detail. Refer to Figure 8 for the annotations:

  1. A clustered index scan on the Users table. This reads all of the rows in the Users table and outputs the columns UserID and UserName. There is no predicate on this scan, so all the rows in the table are returned.
  2. A clustered index scan on the Threads table. This reads all of the rows in the Threads table and outputs the columns ThreadID, Title, CreatedOn, LastModified, CreatedBy, LastPoster, TotalReplies, TotalViews and Locked. There is a predicate on this scan, on the ForumID column, meaning that the rows read by the scan are tested to see if they match it, and only the ones that do are returned. The estimated cost of this operator is 70% of that of the entire query, hence this clustered index scan is going to be the focus of my efforts to optimise this query.
  3. Another clustered index scan on the Users table, again returning all the rows in the table and returning the same columns as the scan detailed in step 1 did.
  4. A hash match join joins the resultsets returned from the scans of the Users (step 1) and Threads (step 2) tables. Hash match joins are quite expensive, because of the memory they use to create the hash tables, but they are the most efficient join type for large resultsets; if I just added a join hint here to force a nested loop, I’d very likely make the query perform far, far worse. If possible, I’d like to see this replaced by a less expensive join but, to do that, I need to reduce the number of rows that are joined.
  5. Another hash match join. This time it’s joining the result of the second scan on the Users table (step 3) with the result of the hash join from step 4. For the same reasons that I listed above, I’d like to see this replaced by a less costly join, if possible
  6. A clustered index seek on the Forums table. The seek predicate is based on the ForumID column and returns just one row. I’m not going to worry about this. A clustered index seek is quite efficient and, since it only returns 1 row I’m not going to be able to make it any more efficient than it already is.
  7. A sort on the LastModified column. The sort is there so that SQL can evaluate the ROW_NUMBER function that’s listed within the SELECT clause (ROW_NUMBER() OVER (ORDER BY LastModified)). The estimated cost of this operator is 19% of the entire query. That makes it the second-highest cost operator in the entire query, hence a secondary objective for me will be to see if there’s any way I can remove the need for this sort.
  8. A nested loop join. This joins the result of the clustered index seek on the Forums table (step 6) with the output of the sort (step 7). I’m not concerned about this. The estimated operator cost is 0% and a nested loop is an efficient join when one or both of the inputs have a small number of rows, which is the case here.
  9. The Segment operator is one of the operators that calculates the row_number function.
  10. The Sequence Project is also involved in the calculation of the row_number function.
  11. The TOP operator partially implements the filter on ThreadNumber (the name given to the calculated column based on the ROW_NUMBER function) that’s specified in the outer query (WHERE ThreadNumber BETWEEN @PageNo*20 AND (@PageNo+1)*20-1). The Sort operation, from step 7, left the resultset ordered by the LastModified column, which is the same column that’s used to derive the row number used in the BETWEEN predicate. This means that the BETWEEN predicate can be partially implemented using the TOP operator. Specifically, it can eliminate rows beyond the upper range of the BETWEEN clause.
  12. The Filter implements the rest of the filter on the derived column ThreadNumber.

So, in summary, my primary goal for this query is to eliminate the clustered index scan on Threads (step 2), and my secondary objectives are to remove the sort (step 7) and to convert the hash joins (steps 4 and 5) to less expensive joins.

Identifying Indexes

At the end of Part 1 of this article series, I had a list of stored procedures that I needed to “fix”. At this stage, I have now drilled down to a much more detailed level, and I now know the actual queries that are causing the issues, along with the specific operations being performed during the execution of these queries that are generating the heaviest workload on the server.

I’m now ready to identify the indexes that could help optimize the query execution plans, and so improve the performance of the queries. To identify potential indexes, I need to re-examine the query execution plans, focussing specifically on the operations within the execution plans that I identified as problematic, namely:

  • The clustered index scan on the Posts table – by implementing an index, we will hopefully see this replaced with a more efficient seek operation.
  • The clustered index scan on the Threads table – again, I would like to see this replaced with a more efficient seek operation.

Indexing the Posts Table

I’m going to start with the “Retrieve thread details” query, from the first procedure, ViewThread. This was originally a “SELECT *” query, but I modified it based on the columns the developers said they really needed, as shown in Listing 3:

Listing 3: The modified “Retrieve thread details” query
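In outline, and with illustrative column names (PostContent is one of the five, as discussed below), the modified query looks something like this:

SELECT Threads.Title,
       Posts.PostContent,   -- the VARCHAR(MAX) column discussed later
       Posts.PostDate,      -- illustrative column name
       Users.UserID,
       Users.UserName
FROM   dbo.Threads
       INNER JOIN dbo.Posts ON Posts.ThreadID = Threads.ThreadID
       INNER JOIN dbo.Users ON Users.UserID = Posts.PostedBy   -- illustrative join column
WHERE  Threads.ThreadID = @ThreadID;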

Figure 9 shows the execution plan for this query, with the ToolTip window displayed for the problematic clustered index scan on the Posts table:

Figure 9: The ToolTips window, displaying properties of the clustered index scan

The predicate that is applied to the rows that the scan returns is on the ThreadID column. Since that’s the only filter that’s listed and the number of rows returned (3) is much lower than the total number of rows in the table (1.2 million), it seems like a very good candidate for a first try at an index.
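A first attempt at such an index, with an arbitrary name, is simply a single-column non-clustered index:

CREATE NONCLUSTERED INDEX idx_Posts_ThreadID
ON dbo.Posts (ThreadID);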

With the index created, I’m going to run the query again to see the effect that it has had, both on the execution plan and on the IO statistics. Figure 10 shows the new execution plan:


Figure 10: The new execution plan for the “Retrieve thread details” query

The new query stats show the improvement: from 184,012 logical reads on the Posts table, and a CPU time of 2563 ms, down to 15 logical reads and 0 ms. Not bad.

Looking at the execution plan, I can see that, in place of the original clustered index scan, I now have an index seek operation on my newly-created index. The seek operation is coupled with a bookmark lookup (shown in the plan as a Key Lookup) to retrieve the data for columns that need to be returned (see the Output List in Figure 9) but are not covered by the index. I could probably improve performance still further by widening the index to make it a covering index. In this case, I’m not going to do that right now, for three reasons:

  1. 27 ms duration and 15 logical reads is well within the realms of acceptable performance for this query.
  2. I know that, in most cases, this query will return less than 20 rows, making the index very selective, and so the cost of the bookmark lookups negligible
  3. One of the columns that the query uses, specifically PostContent, is a VARCHAR(MAX) column. While a VARCHAR(MAX) column can be used as one of the INCLUDE columns of an index, doing so will cause the entire content of the VARCHAR(MAX) column to be stored twice; once in the table and once in the index. Since this is the largest table in the database, that’s going to increase the size of my database significantly.

I’m going to make a note that, should performance problems arise again with this query, the widening of the index can be considered. For now though, I’m going to call this query done and move onto the next one.

Indexing the Threads table

Listing 4 shows the problematic query from the ViewForum stored procedure, which causes the expensive clustered index scan on the Threads table:

Listing 4: The “Retrieve list of threads” query
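In outline, based on the joins and the ThreadNumber predicate discussed below, the query has roughly this shape (column names and aliases are illustrative, and the exact listing may differ):

SELECT ThreadList.*
FROM (
        SELECT Forums.ForumName,
               Threads.ThreadID, Threads.Title, Threads.CreatedOn, Threads.LastModified,
               Threads.Locked, Threads.TotalReplies, Threads.TotalViews,
               Creator.UserName    AS ThreadCreator,
               LastPoster.UserName AS ThreadLastPoster,
               ROW_NUMBER() OVER (ORDER BY Threads.LastModified DESC) AS ThreadNumber
        FROM   dbo.Forums
               INNER JOIN dbo.Threads             ON Threads.ForumID = Forums.ForumID
               INNER JOIN dbo.Users AS Creator    ON Creator.UserID = Threads.CreatedBy
               INNER JOIN dbo.Users AS LastPoster ON LastPoster.UserID = Threads.LastPoster
        WHERE  Threads.ForumID = @ForumID
     ) AS ThreadList
WHERE ThreadNumber BETWEEN @PageNo*20 AND (@PageNo+1)*20 - 1;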

Again, I’m going to start by taking a look at the properties of this clustered index, shown in Figure 11, in order to identify candidate columns for an index:

Figure 11: Properties of the clustered index scan operation on the Threads table

As a first step, I’m going to try the same trick as before, and create a non-clustered index on the ForumID column, as shown in Listing 5, since that is the only column listed in the predicate that is applied to the rows that the scan returns:

Listing 5: A simple, non-clustered index on the Threads table
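In other words, something along these lines (the index name is arbitrary):

CREATE NONCLUSTERED INDEX idx_Threads_ForumID
ON dbo.Threads (ForumID);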

When I re-run the query, however, I see that the execution plan is unchanged, and the index has not been used, as shown in Figure 12:

Figure 12: The index on ForumID does not affect the execution plan.

Once again, our index is not a covering index, as it does not contain all the columns used in the query. However, whereas before the query was highly selective and the index got used regardless, in this case the predicate matches 19113 rows out of a total of almost 190,000, and so the index is not selective enough to be used. I could simply make the index covering by adding, as INCLUDE columns, all of the columns listed in the Output List of the clustered index scan. That would at least ensure that the index is used, but it may not be the best solution. Given that this query uses all but one of the columns in the Threads table, if I include them all, I’m going to be making the index very large, almost the same size as the table itself. It is an option, if I can’t find any other way to make the query fast. However, I’m going to try a few other tricks first, to see if adding any of the columns to the index key will reduce the number of rows processed by the index seek.

The execution plan shows that there’s a join to the Users table on the CreatedBy column, and a join to Users again on the LastPoster column, as shown in figure 13:

Figure 13: The two joins to the Users table.

Maybe if I add the CreatedBy and LastPoster columns to the index key, it will persuade the optimizer to use it.

Listing 6: Adding the CreatedBy and LastPoster columns to the non-clustered index on the Threads table
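That is, widening the index key along these lines (again, the name is arbitrary):

CREATE NONCLUSTERED INDEX idx_Threads_ForumID
ON dbo.Threads (ForumID, CreatedBy, LastPoster)
WITH (DROP_EXISTING = ON);   -- replace the earlier single-column attempt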

Unfortunately, when I rerun the query, I see that, once again, the execution plan is unchanged. I’ve still got a clustered index scan on the Threads table. Ok, so that didn’t work; one more attempt before I add all the columns as INCLUDE columns.

There’s another predicate on this query, namely the BETWEEN predicate on the ThreadNumber column (see Listing 4). I can’t add that column to the index, because it’s a computed column and, in fact, it’s a row number ordered by the LastModified column. However, I might be able to get SQL to eliminate some rows earlier than it is doing currently if the LastModified column is part of the index key.

I have a second reason for considering the LastModified column for the index. There’s a sort in the middle of the execution plan that sorts by LastModified (as part of the evaluation of the Row_number), as shown in Figure 14:

Figure 14: Properties of the Sort operation.

If I add LastModified to the non-clustered index on ForumID and SQL uses that index then, after the filter on ForumID is done, the rows will already be sorted by LastModified and an explicit sort won’t be required.

Let’s see if theory meets reality in this case. Listing 7 recreates the index using ForumID and LastModified as key columns:

Listing 7: Recreating the index using ForumID and LastModified as key columns
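Roughly, and assuming the previous attempt is dropped first, that means:

DROP INDEX idx_Threads_ForumID ON dbo.Threads;   -- the earlier attempt (name is arbitrary)

CREATE NONCLUSTERED INDEX idx_Threads_ForumIDLastModified
ON dbo.Threads (ForumID, LastModified DESC);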

When I rerun the query, I see that the execution plan has changed radically. The clustered index scan is gone, replaced by a non-clustered index seek and a rather expensive-looking key lookup, the sort is gone as it’s no longer necessary and, in addition, the hash joins have been replaced by nested loop joins, as shown in Figure 15:


Figure 15: The execution plan now shows an index seek and a key lookup.

The change of the join type is due to the reduced number of rows being processed. Hash joins are efficient for joining very large numbers of rows, which is why they appeared in the previous execution plan. With an efficient index in place, the number of rows that the joins have to process is much reduced, and so the nested loop join can be used instead. The nested loop join is only efficient for small numbers of rows, and hence the query optimiser will generally not select it when the number of rows being joined is very high.

The new query stats are as follows:

That’s better. I’ve reduced the logical reads on the Threads table from 3263 to 564, although I have also increased the logical reads on the Users table from 12 to 720. However, I have also reduced the CPU and elapsed times to virtually zero.

Since the logical reads are still rather high, and this is the most-frequently run query in the system, in this case I am going to add the rest of the columns used by the query as INCLUDE columns to this index, as shown in Listing 8:

Listing 8: Adding the INCLUDE  columns to the index
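Based on the output list noted earlier for the scan of the Threads table, the covering version would look something like this (the INCLUDE list is an assumption on that basis; ThreadID, as the clustering key, is already carried in the non-clustered index):

CREATE NONCLUSTERED INDEX idx_Threads_ForumIDLastModified
ON dbo.Threads (ForumID, LastModified DESC)
INCLUDE (Title, CreatedOn, CreatedBy, LastPoster, TotalReplies, TotalViews, Locked)
WITH (DROP_EXISTING = ON);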

Figure 16 shows the final execution plan for this query:


Figure 16: The final execution plan.

And here is the final set of query stats:

That’s much better. I’ve reduced the IOs on the Threads table to 8. The IOs on the Users table are still at 720, but I don’t think that will be a major problem. There’s not much I can do about it, because the two operators that read from the Users table are both clustered index seeks. The high number of reads is because, in both cases, Users is the inner table of a nested loop join and hence the seeks on that table must be executed once for each row returned from the Threads table. I’m going to leave this as it is for now; if there’s a problem in future, I’ll probably have to rewrite the query, as there’s no more I can do on the indexing side to reduce the reads further.

I’ve optimised everything I set out to optimise and the queries are looking pretty good when run by themselves in dev, so now I need to take them through testing and get them deployed to the production environment.

Post-implementation Performance Review

Once I’ve implemented those index and procedure changes in production, I need to capture a new Profiler trace, to see what kind of effect, if any, they’ve had on the overall server performance.

In order to do this, I’m going to rerun the trace that I initially used to capture the server’s workload (see Part 1), import it into a database table and re-analyse the data. To ensure that the comparison is meaningful, I’m going to run the trace at the same time of day as before, and for the same duration.
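As in Part 1, the analysis is a straightforward aggregation over the trace table; a sketch, assuming the trace was saved to a table called TraceResults with the standard trace columns, looks like this:

SELECT ObjectName,
       SUM(Duration)       AS TimeImpact,
       SUM(Reads + Writes) AS IOImpact,
       SUM(CPU)            AS CPUImpact,
       COUNT(*)            AS ExecutionCount,
       AVG(Duration)       AS AvgDuration,
       AVG(Reads + Writes) AS AvgIO,
       AVG(CPU)            AS AvgCPU
FROM dbo.TraceResults
GROUP BY ObjectName
ORDER BY SUM(Duration) DESC;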

The new aggregated execution statistics are shown in Figure 17:


Figure 17: Aggregating execution statistics by stored procedure

Not only have the time impact and IO impact dropped significantly compared to the initial measurements, but the execution count has more than tripled, indicating that the application has far greater throughput after the optimisation than it did before. As a bonus, the procedure AddPost, which I decided not to look at, is running significantly faster as well. This is probably due to a decrease in blocking. Since the queries that I optimised read from the Posts table and are now much more efficient, the locks they take will be held for a shorter time. That means that the AddPost procedure, which writes to that table, will have to wait less time to obtain a lock and hence will run faster.

The following table summarizes the effect of the indexing neatly:

           Procedure    Time Impact   IO Impact   CPU Impact   Execution Count   Avg. Duration (ms)   Avg. IOs   Avg. CPU
Before     ViewThread       1815846   442711386      1064111              2377                  763     186247        447
           ViewForum        1536715     5193415       223289              2433                  631       2134         91
           AddPost           229295      206833          876               244                  939        847          3
After      ViewThread        116150     7897615        15376              9765                   11        808          1
           ViewForum          76486      716541         6650              9718                    7         73          0
           AddPost            21708      383221         1527               448                   48        855          3

 Now that looks like a nice spreadsheet to hand to management come review time!

Summary

Over the course of this 2-part article, I’ve demonstrated how to tackle SQL Server performance problems using the following steps:

  1. Find the queries that constitute a typical workload, using SQL Profiler
  2. Aggregate the execution characteristics provided by the trace at the stored procedure level, in order to find the stored procedures that are having the most impact on the system.
  3. Use query statistics to pinpoint the queries that are causing the biggest problems, and the query execution plans to find out which operations are the root cause.
  4. Identify the indexes that can optimize the plans and therefore the overall query execution.

It’s probably apparent at this point that finding good indexes for a query is partially art and partially science, and a lot of trial and error. Identifying good indexes requires knowledge of how indexes work and how SQL uses them, knowledge of the application and the data, and a fair amount of patience. For further information on how SQL uses indexes, see this short blog series, anything index-related on Kimberly Tripp’s blog and the recently released book “SQL Server 2008 Query Performance Tuning Distilled” which devotes an entire chapter to index selection.

The final point to note is that, in these examples, the queries were all quite well written and, as such, did not need extensive modifications. More often, however, performance tuning is a mixture of fixing the code and tuning the indexes. Fixing the code could entail anything from converting cursors into set-based code, to tweaking the WHERE clause so that it’s SARGable, to removing redundant queries. But that’s a topic for another article!

For more articles like this, sign up to the fortnightly Simple-Talk newsletter.

Downloads

  • sagreene

    Wonderful
    Thanks for the well written peek inside your optimazing mind!

  • Anonymous

    Sam
    ExCellent gail m really a big fan of ur suggestions…. thankz gail

  • Anonymous

    just a side question
    I dont get it… Where are the ads on this site?

  • Andrew Clarke

    re: just a side question
    There aren’t any ads. We think that ads are distracting. The funding for Simple-Talk comes from Red Gate software.

  • Crispin Proctor

    You really should qualify your object names…
    It is a surprising overhead when names are not owner qualified. SQL 2000 was really bad, some improvements in 2005 onwards.

    See: http://sqlblog.com/blogs/linchi_shea/archive/2007/07/05/performance-impact-of-procedure-calls-without-owner-qualification-sql-server-2000.aspx for more details.

    Crispin

  • Jack the DBA

    Awesome practical tips
    Love the practical tips Gail. Keep the articles coming.

  • Jeff Moden

    Incredible
    These 2 articles teach in just a couple of hours what others take 2 or more days to teach. Absolutely outstanding. Thanks for taking the time, Gail.

  • Jerry Hung

    well written
    Well written, Gail, I like this approach of explaining the steps for each Execution Plan

    We all pretty much do the same, except I usually jump to the largest % icon without looking at the whole plan

    My question is
    Did you run the stored procedures in SSMS 2008, and did SSMS recommend ANY “missing index”?
    If so, any index same as your indexes?

  • Harinath

    Useful Information
    Hi Gail Shaw,
    Really most usefull information i read, keep it up.

    Like the paractical tips of Gail.Do post.

  • GilaMonster

    Missing indexes
    Jerry, I ran all of these in 2008’s management studio. I just edited out the missing index line when I put the screen shots together because I didn’t want it to confuse the issue.

    There were missing indexes on both of the initial queries. The index recommended in the first case had the same predicate as the one I suggested, but was covering, complete with the VARCHAR(MAX) column. For the second query SQL recommended a covering index with just the forumID in the key.

    The missing index info’s there in SQL 2005 as well. It’s buried in the XML of the showplan.

    The danger in jumping straight to the operator with the largest % is that the percentages are estimates and can be completely and totally wrong. I didn’t really touch on that here, I think I need to write something on spotting bad estimates….

  • Bruce W Cassidy

    Well done
    Hi Gail,

    An excellent followup to the original answer. I particularly like the fact that although you are using a synthetic problem as the basis of an article, it still seems quite realistic. Particularly the conversations with the developers!

    I also like the fact that the examples you have given of poorly performing code are also things I have seen in the real world. I feel that you did an excellent job of selecting realistic examples.

    There were a couple of areas where I think the text could have used better editing and proof reading (missing or incorrect words), but I think even the best of professional authors face that issue! So that’s not really a critism.

    I like the pacing and the way you methodically work through what needs to be done.

    All in all, an excellent introduction to an incredibly complex subject. I gave you five stars. 🙂

  • Anonymous

    Nice
    It’s really good. Very well explained. Especially for a beginner like me. Looking forward for another article about sql server. 😀

  • Dustin Jones

    Great Article
    Great article! Well written, easy to follow examples, great explanations. Thank you!

  • chandra

    Appreciate ur work!
    Hey…good on u mate!
    Quite helpful

  • Anonymous

    Great Article
    Appreciate your great work.

  • alexlpi

    Great but you can do better
    Here is how you CAN do about ViewForum’s “The IOs on the Users table” – push joining to Users toward the end of the query plan. This way you execute nested loops only for the 20 output records:
    alter PROCEDURE [dbo].[ViewForum] (@ForumID INT, @PageNo SMALLINT, @UserID INT)
    AS
    begin
    UPDATE Users set CurrentStatus = 'Viewing Forum', ActiveEntity = @ForumID
    WHERE UserID = @UserID

    ;with Forum as
    (
    select
    ForumName
    , Forums.Descr
    , Threads.ThreadID
    , Threads.Title
    , Threads.CreatedOn
    , Threads.LastModified
    , Threads.Locked
    , Threads.TotalReplies
    , Threads.TotalViews
    , Threads.CreatedBy
    , Threads.LastPoster
    , ROW_NUMBER() OVER (ORDER BY Threads.LastModified DESC) AS ThreadNumber
    from Forums
    join Threads on dbo.Forums.ForumID = Threads.ForumID
    WHERE Threads.ForumID = @ForumID
    )
    ,ForumPage as
    (
    select * from Forum
    where ThreadNumber BETWEEN @PageNo*20 AND (@PageNo+1)*20 -1
    )
    SELECT ForumPage.*
    , Creator.UserName AS ThreadCreator
    , LastPoster.UserName AS ThreadLastPoster
    from ForumPage
    join Users Creator on ForumPage.CreatedBy = Creator.UserID
    join Users LastPoster ON ForumPage.LastPoster = LastPoster.UserID

    end -- ViewForum

  • GilaMonster

    Re: Great but you can do better
    Alex, if you note, I said in the article:
    “if there’s a problem in future I’ll probably have to rewrite the query as there’s no more I can do on the indexing side to reduce the reads further”

    Your rewrite is probably what I would do if there’s a future problem, but short of rewriting the query, there’s nothing to be done, and the main focus of this article was fixing indexes, not rewriting queries.

  • alexlpi

    Re: Great but you can do better
    You are right.
    My bad.

  • alexlpi

    Re: Great but you can do better
    But again, IF you rewrite the query the way I am suggesting – you can remove CreatedBy and LastPoster from the index (and that is better on the indexing side, right? :-)).

  • alexlpi

    Re: Great but you can do better
    Again, on the indexing side:
    I can’t believe you really like a nonclustered index covering 9 columns out of total 11 in a table.
    But IF you rewrite the main (see below) query then you can remove the whole INCLUDE clause and leave only
    CREATE NONCLUSTERED INDEX idx_Threads_ForumIDLastModified
    ON Threads (ForumID, LastModified DESC)
    As a result you would have much smaller index and still get better/less reads.

    declare @ForumID INT, @PageNo SMALLINT
    set @ForumID = 1
    set @PageNo = 1
    ;with Forum as
    (
    select
    ForumName
    , Threads.LastModified
    , Threads.ThreadID
    , Forums.Descr
    , ROW_NUMBER() OVER (ORDER BY Threads.LastModified DESC) AS ThreadNumber
    from Forums
    join Threads on dbo.Forums.ForumID = Threads.ForumID
    WHERE Threads.ForumID = @ForumID
    )
    ,ForumPage as
    (
    select * from Forum
    where ThreadNumber BETWEEN @PageNo*20 AND (@PageNo+1)*20 -1
    )
    SELECT ForumPage.*
    , Creator.UserName AS ThreadCreator
    , LastPoster.UserName AS ThreadLastPoster
    , Threads.Title
    , Threads.CreatedOn
    , Threads.Locked
    , Threads.TotalReplies
    , Threads.TotalViews
    , Threads.CreatedBy
    , Threads.LastPoster
    from ForumPage
    join Threads on Threads.ThreadID = ForumPage.ThreadID
    join Users Creator on Threads.CreatedBy = Creator.UserID
    join Users LastPoster ON Threads.LastPoster = LastPoster.UserID

  • GilaMonster

    Covering
    I don’t like a covering index on 9 of 11 columns. I said as much in the article.

    However without rewriting the query, which as I mentioned was beyond the scope of this article, it was the only option for getting the reads to an acceptable level.

    If you remove CreatedBy and LastPoster from the index (and you can remove the rest of the included columns if you do that as the index is no longer covering), then the original query’s IO stats are:

    Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0.
    Table ‘Threads’. Scan count 1, logical reads 3557, physical reads 0.
    Table ‘Users’. Scan count 2, logical reads 12, physical reads 0.
    Table ‘Forums’. Scan count 0, logical reads 2, physical reads 0.

    and your rewritten query’s IO stats are:
    Table ‘Users’. Scan count 0, logical reads 80, physical reads 0.
    Table ‘Threads’. Scan count 1, logical reads 3617, physical reads 0.
    Table ‘Forums’. Scan count 0, logical reads 2, physical reads 0

    An increase of 60 on the Threads table and 68 on the users table.

    The exec time of your query is very slightly better, but not significantly so (around 50ms)

  • GilaMonster

    Re: Covering
    That’s with just the CreatedBy and LastPoster columns removed from the include.

    With the rest of the include columns dropped as well, the exec plan goes back to the one that I showed in the article (maybe the wider but noncovering index was evaluated as less efficient than a clustered index scan), and the exec stats go to this.

    Original:
    Table ‘Users’. Scan count 0, logical reads 236
    Table ‘Threads’. Scan count 1, logical reads 193

    Revised:
    Table ‘Users’. Scan count 0, logical reads 80
    Table ‘Threads’. Scan count 1, logical reads 63

    The execution times for the two are identical.

  • GilaMonster

    Re: Covering
    Oh, and I should mention that I’ve deleted data since I wrote this, that’s why the reads are lower than the ones in the article.

  • alexlpi

    Re: Covering
    Gail,
    Again – a great-great-great article! Many thanks!

    Agreed – this discussion is beyond the scope of the article.

    But if we do go into statistics (I used SQL 2005 and the database you’ve provided with the article, @ForumID=1) I can see three things:

    1) The index without the INCLUDE is much smaller (the covering index actually covers not 9 but 10 columns)

    2) Number of reads with the covering index, and the original query is higher AND increases with @PageNo:

    @PageNo = 1
    Table ‘Users’. Scan count 0, logical reads 160
    Table ‘Threads’. Scan count 1, logical reads 3
    Table ‘Forums’. Scan count 0, logical reads 2

    @PageNo = 5
    Table ‘Users’. Scan count 0, logical reads 480
    Table ‘Threads’. Scan count 1, logical reads 4
    Table ‘Forums’. Scan count 0, logical reads 2

    @PageNo = 10
    Table ‘Users’. Scan count 0, logical reads 880
    Table ‘Threads’. Scan count 1, logical reads 6
    Table ‘Forums’. Scan count 0, logical reads 2

    3) Number of reads for the modified query with the much smaller index without INCLUDE is lower AND does not increase with @PageNo:

    @PageNo = 1
    Table ‘Users’. Scan count 0, logical reads 80
    Table ‘Threads’. Scan count 1, logical reads 42
    Table ‘Forums’. Scan count 0, logical reads 2

    @PageNo = 5
    Table ‘Users’. Scan count 0, logical reads 80
    Table ‘Threads’. Scan count 1, logical reads 42
    Table ‘Forums’. Scan count 0, logical reads 2

    @PageNo = 10
    Table ‘Users’. Scan count 0, logical reads 80
    Table ‘Threads’. Scan count 1, logical reads 42
    Table ‘Forums’. Scan count 0, logical reads 2

  • GilaMonster

    Re: Covering
    1) Yes it is. Whether that’s significant or not, whether it’s a problem or not is not clear. There’s no hard and fast rule as to when an index is too wide. It has to be evaluated in context of the app and in the context of the queries using the index.

    2) Yes it does, because SQL’s using the Modified date to push a kinda TOP predicate down into the index seek and the higher the PageNo, the larger the number of rows allowed by that TOP.

    3) Not in my tests. I have the number of reads on Threads slowly increasing as the page no increases. That may be a virtue of different data. The version you downloaded is much smaller, even after I cut mine down a bit.
    What I have for reads on Threads (using ForumId 8) is
    PageNo = 1 Reads = 63
    PageNo = 5 Reads = 64
    PageNo = 10 Reads = 67
    It’s increasing slowly because the index is much ‘denser’, but it is still increasing.
    The reads on Users won’t change, because it’s joined in later after all the filters on the threads table are done.

  • alexlpi

    Re: Covering
    I rest my case 🙂

  • Tom Brown

    Best I’ve seen on the subject
    I’ve struggled through books on interpreting plans, and this article beats them all.
    I really like the clear way you step through the plans, explaining how they relate to the query.

    I did get a bit lost towards the end of the “Indexing the Post Table” section – I’m not sure where the Bookmark lookup came in, or how you could tell that from the plan.

    Thanks for a Great article.

  • GilaMonster

    Bookmark lookup
If you look at Figure 15, on the right-hand side of the exec plan there’s an index seek and a key lookup. That key lookup is the bookmark lookup (by any other name)

    Still not clear?

  • sqlCrazy

    Great Dissection of Execution Plan!
    Thank you Gail for a great article.
    I have really liked the way you have dissected each operator from the execution plan, analyzed and suggested solution by trials with good reasoning.The article cleared up lots of ambiguities that I had working with big execution plans like these.

  • chungboon

    How to obtain the store procedure
    Hi,

    How to retrieve the stored procedure for your procedureName, so that i can proceed to view its statistic and execution plan.

    Can you explain in more details for me. Thanks

    Regards,
    Chung boon

  • Patrick Index

    Superb
    Thank you for for this superbly written, pragmatic article. I have never met a project manager yet that wants to change the code. So you article reflects the real world as we know it.