09 March 2009

Finding the Causes of Poor Performance in SQL Server, Part 1

To tackle performance problems with applications, you first find the queries that constitute a typical workload, using SQL Profiler. Then, from the trace, you find the queries or stored procedures that are having the most impact. After that, it is down to examining the execution plans and query statistics to identify queries that need tuning and indexes that need creating. You then see what effect your changes have had, and maybe repeat the process. Gail explains all, in a two-part article.

Part 1: Rounding up the likely suspects.

Troubleshooting a SQL Server performance problem is often a frustrating exercise as, without some experience in tuning, it’s hard to know where to start and what to do. In this article, part 1 of a 2-part series, I’m going to walk through a hypothetical scenario that takes a badly-performing database and demonstrates how to find the problematic procedures, and the problematic queries within them, using SQL Profiler.

In part 2, I’ll examine the execution plans and query statistics for the poorly-performing queries identified by Profiler, and propose some possible fixes for these problems.

The Problem

I’m the DBA responsible for maintaining an internet forum. Initially the performance of the site was great but, as the volume of data has grown and the load on the server increased, the queries have grown slower and deadlocks have become a frequent occurrence.

After discussion with the developers, I’ve identified indexing as a likely issue, because it sounds like little time was given to index design or performance tests during development. In fact, a cursory glance at the forum database reveals that no indexes were created other than the primary keys!

The Plan

It’s very likely that creating a few useful indexes will speed things up nicely. The problem is that creating useful indexes is impossible without knowing exactly what queries are running against the database, what tables and columns they’re using and which ones are contributing the most to the performance problem.

In light of this, I’ve devised the following plan to identify the indexes, and any other changes, that are necessary to restore forum response times to the required level:

  1. Capture a profiler trace for a short period while the server is active. This will identify the queries that constitute a typical workload.
  2. Import the trace results into a database table and analyse them to find the queries or stored procedures that are having the most impact on the server.
  3. Run those stored procedures in Management Studio against a test database, and examine the execution plans and query statistics.
  4. Use the execution plan and statistics to identify queries that need tuning and indexes that need creating.
  5. Implement the changes and review the effects.

Obtaining Trace Information

The first thing that I need to do is to identify the procedures that have the highest duration, the highest CPU, or that are performing the most IOs. It’s a waste of time to try to optimise procedures that are either running well or are never run.

I’m not going to use the Profiler GUI to obtain the trace. Using the Profiler GUI is fine for a development server, but it can cause performance degradation on busy servers, even to the point of crashing the server. Hence I’ll be using a server-side trace.

NOTE: For details of why the Profiler GUI can cause problems, see this post on Grant Fritchey’s blog.

While I won’t be using Profiler to run the actual trace, I will use it to create the trace definition, since generating this by hand can be error prone (not to mention tedious). Once I have the trace definition, I’ll capture the trace file using server-side trace, which uses the sp_trace set of stored procedures. This is the same set of procedures that Profiler uses “under the covers” when you capture traces through the GUI.

Creating the Trace Script

A trace definition describes the trace that you would like to capture. In other words, it defines:

  • The SQL Server events that you would like to capture
  • The data columns that you would like to capture for each event
  • Any filters that you would like to create, in order to omit from the trace any occurrences of the event in which you are uninterested. For example, it is common to capture only those events that are generated by a particular user, application or database.

In order to generate the trace definition, I must first create a trace within Profiler that defines all of the events, columns and filters that I want. For identifying poorly performing stored procedures, the most important events are:

  • RPC:Completed (in the Stored Procedures event category). This event is fired whenever a remote procedure call completes. An example of a remote procedure call would be a stored procedure executed from a .NET application where the SqlCommand object’s CommandType is StoredProcedure.
  • TSQL:BatchCompleted (in the T-SQL event category). This event is fired whenever an ad-hoc SQL batch completes executing. An example of an ad-hoc SQL batch would be a query run from Management Studio, or a query executed from a .NET application where the SqlCommand object’s CommandType is Text.

The most important data columns are TextData, CPU, Reads, Writes and Duration. Other columns, such as the LoginName, ApplicationName and HostName, may be useful for identifying where the query comes from, but they are not essential for identifying which queries are performing poorly.

Start up Profiler, create a new trace (call it “WebForums” or similar), and then select the events and data columns as shown below in Figure 1. At this time, I’m not going to use any filtering. I want to see all of the queries executed against the server because I don’t know what is important and what is not at this stage. I’m also not setting up any form of grouping as grouping only appears within the Profiler GUI and has no effect on the script generated.


Figure 1: Creating the Trace Definition

In order to get the script definition, I need to start the trace. For this reason, I’m doing this with Profiler pointed at a dev server, not at the production box. I’ll stop the trace immediately, as the results are unimportant; the trace simply has to have run before it can be scripted.

Once I’ve started and stopped the trace, I can generate the trace definition. The script option is available under the File menu, by selecting Export | Script Trace Definition, as shown in Figure 2:


Figure 2: Exporting the Trace Definition

Once the trace definition is saved to file, I can open it in Management Studio and make a couple of modifications to it, before I run the trace against my production server. The items that I need to change in the generated script are the output file, the file size and the end time.

The trace file should be saved on a fast drive that does not have data or log files on it. I’ll often use a backup drive if there’s nothing else available. The heavier the load on the server, the more important it is that the destination drive for the trace does not contain any database files. I also set the trace file name to contain a time stamp, so that the trace name doesn’t have to be modified if I need to run it again.

Note that the output file does not need the .trc extension, as it will be automatically added.

I set the end time of the trace to be offset from the current time so that the trace doesn’t have to be run at a specific time. Half an hour, or an hour, is usually a good time period to run a trace. Much longer and you will find that there is just too much data to analyze.

Finally, I modify the MaxFileSize to something reasonable, so that just one, or perhaps two, trace files will be created for the interval I’m tracing. If the size is set too small, meaning that many “rollover” files are created, it becomes tedious to work with them later.

The complete trace script, once I’ve finished modifying it, looks like this:
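The script below is a representative sketch of the final result, rather than the exact definition that Profiler exports; the D:\Traces output path, the 50 MB file size and the 30-minute stop time are illustrative values to swap for ones appropriate to your own server.

-- Server-side trace definition (sketch; adjust path, size and stop time)
declare @rc int
declare @TraceID int
declare @MaxFileSize bigint
declare @EndTime datetime
declare @OutputFileName nvarchar(245)
declare @on bit

set @MaxFileSize = 50                       -- MB; illustrative value
set @EndTime = DATEADD(mi, 30, GETDATE())   -- stop the trace 30 minutes from now

-- Time-stamped file name on a fast drive that holds no data or log files (illustrative path)
set @OutputFileName = N'D:\Traces\WebForums'
    + REPLACE(REPLACE(REPLACE(CONVERT(nvarchar(19), GETDATE(), 120), '-', ''), ' ', ''), ':', '')

exec @rc = sp_trace_create @TraceID output, 0, @OutputFileName, @MaxFileSize, @EndTime
if (@rc != 0) goto error

-- Events: 10 = RPC:Completed, 12 = SQL:BatchCompleted
-- Columns: 1 = TextData, 13 = Duration, 14 = StartTime, 16 = Reads, 17 = Writes, 18 = CPU
set @on = 1
exec sp_trace_setevent @TraceID, 10, 1, @on
exec sp_trace_setevent @TraceID, 10, 13, @on
exec sp_trace_setevent @TraceID, 10, 14, @on
exec sp_trace_setevent @TraceID, 10, 16, @on
exec sp_trace_setevent @TraceID, 10, 17, @on
exec sp_trace_setevent @TraceID, 10, 18, @on
exec sp_trace_setevent @TraceID, 12, 1, @on
exec sp_trace_setevent @TraceID, 12, 13, @on
exec sp_trace_setevent @TraceID, 12, 14, @on
exec sp_trace_setevent @TraceID, 12, 16, @on
exec sp_trace_setevent @TraceID, 12, 17, @on
exec sp_trace_setevent @TraceID, 12, 18, @on

-- No filters: I want every query against the server

-- Start the trace (1 = start)
exec sp_trace_setstatus @TraceID, 1

select TraceID = @TraceID
goto finish

error:
select ErrorCode = @rc

finish:
go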

All I need to do now is run that script when the server is busy and wait half an hour. After that, I have my trace file.

Interpreting the Trace

The next step is to analyse the trace file to determine which are the procedures and queries that are placing the heaviest load on the server. I’m going to do that by loading the trace file into a database on a dev server.

One option would be to load the trace file into Profiler and then use Profiler’s “save to database table” option. However, I prefer to create a dedicated table, called TraceResults, and then use a built-in SQL function called fn_trace_gettable, to read the trace and import it into the table:
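The table definition and import query below are a sketch: the column data types are my choice, and the file path is a placeholder for wherever the trace file was copied to.

-- Table to hold the imported trace data (data types are assumptions)
CREATE TABLE TraceResults (
    TextData  nvarchar(max),
    Duration  bigint,
    Reads     bigint,
    Writes    bigint,
    CPU       int,
    StartTime datetime
)
GO

-- Import the trace file; Duration is converted from microseconds to milliseconds
INSERT INTO TraceResults
    (TextData, Duration, Reads, Writes, CPU, StartTime)
SELECT TextData, Duration/1000, Reads, Writes, CPU, StartTime
FROM sys.fn_trace_gettable('D:\Traces\WebForums.trc', default)   -- placeholder path
GO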

Notice that I’m dividing the duration by 1000 because, in SQL 2005 and 2008, the trace captures the duration in microseconds, and I have no interest in durations at that level of granularity.

The reason for using a script to import the trace is twofold. Firstly it’s easier and quicker to run a single query than to start Profiler, load up the trace file and then save it to a database table. Secondly, if the loading is done with a script, then both the creation of the trace and the loading of it can be automated using SQL Agent jobs. So, for example, you may have a job on the main server to start the trace and then one on the dev server, which starts an hour later, to copy the trace file over and import it.

Once I’ve got the trace file into a table, I can query it. It’s tempting to simply order by the duration, reads or CPU and just focus on the top few records. However, that is not sufficient. Adopting that strategy here, I find that the “most expensive” query is a search query whose reads and duration are far in excess of any other query, as shown in Figure 3:
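A query along these lines (the TOP count and the ordering column are arbitrary choices) gives that view:

SELECT TOP (10)
    TextData, Duration, Reads, Writes, CPU, StartTime
FROM TraceResults
ORDER BY Duration DESC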


Figure 3: Finding the “most expensive” queries

However, further investigation shows that the search query only ran once in the entire sample period, as shown in Figure 4:


Figure 4: The query with the longest duration ran only once over the analysis period.

A query that runs only once a day and takes minutes to complete is generally less of a problem than one that runs ten times a second and takes half a second each time. The queries that I need to optimise most urgently are the ones that consume the highest cumulative amount of time, or CPU, or number of reads, over the period monitored.

If ad-hoc SQL is involved, calculating the cumulative time, reads or CPU is far from trivial, as slight differences in query structure, and different values hard-coded within the query, make it very difficult to identify queries of the same pattern. Fortunately, this application uses just stored procedures, so getting the cumulative execution characteristics is relatively easy.

NOTE:  Parsing and identifying matching ad-hoc SQL Statements is extremely complex and there’s no good way to do it in T-SQL. For an app that uses a great deal of ad-hoc SQL, I would recommend a tool like ReTrace to do the import and analysis, as it has the capability to parameterise queries that it reads from a trace file.

The following query will extract the name of the procedure from the text of the statement, using the EXEC keyword to identify where the procedure name starts and a space to find where it ends. This will not work perfectly in all cases, for example when a procedure has a space in the name. As with ad-hoc SQL, there’s no easy method that will work in all cases to identify the procedure name, but this does cover the majority of cases.
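A sketch of that query follows; the ProcedureName column name and length, and the trick of appending a trailing space so that calls without parameters still parse, are my own choices.

-- Column to hold the extracted procedure name (name and length are assumptions)
ALTER TABLE TraceResults ADD ProcedureName nvarchar(128)
GO

-- The name starts right after the EXEC keyword and ends at the next space
UPDATE t
SET ProcedureName =
        SUBSTRING(t.TextData, p.StartPos,
                  CHARINDEX(N' ', t.TextData + N' ', p.StartPos) - p.StartPos)
FROM TraceResults t
CROSS APPLY (SELECT CHARINDEX(N'exec ', t.TextData) + 5 AS StartPos) p
WHERE t.TextData LIKE N'%exec %'
GO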

Now that I have the procedure name for each of the queries in my trace table, it’s trivial to aggregate the execution characteristics of these stored procedures over the sample period, as shown in Figure 5:
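A query of roughly this shape produces the aggregated view (the exact column list and the ordering by total duration are my choices; total reads or CPU work equally well):

SELECT ProcedureName,
    COUNT(*)      AS ExecutionCount,
    SUM(Duration) AS TotalDuration,    -- milliseconds, summed over all executions
    SUM(Reads)    AS TotalReads,
    SUM(Writes)   AS TotalWrites,
    SUM(CPU)      AS TotalCPU
FROM TraceResults
WHERE ProcedureName IS NOT NULL
GROUP BY ProcedureName
ORDER BY SUM(Duration) DESC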


Figure 5: Aggregating execution statistics by stored procedure

As you can see, the procedure that had both the highest total duration and the highest total IO is “ViewThread”, with a total duration, summed over all executions, of 1,815,846 milliseconds, a total of 442,711,386 IOs, and an execution count of 2,377.

It’s these procedures that I need to tackle first, in my drive to improve the overall system performance. In my experience of tuning assorted systems, optimising the three most intensive procedures (by time, CPU, reads or a combination) will often result in dramatically improved performance of the entire system. If overall performance does not improve sufficiently, then I’ll look at the next three procedures, and so on.

I do not recommend ever trying to optimise everything in one go. Firstly, trying to optimise all stored procedures, in anything other than a tiny application, will take ages. Secondly, changes made because of one procedure may completely change the execution characteristics of a different procedure and invalidate any work already done on it.

Summary

Now that I know the worst offenders, I need to take those procedures apart and look at why they are performing so badly. To do that, I’m going to use query statistics and the execution plan from SQL Server Management Studio. Look out for Part 2!

In the downloads below, you’ll find the 2005 database .BAK file (“WebForums database”) and a .exe file called “BadForums”, which is simply a form that submits queries to the database. People using this will not see the same performance values as Gail lists in the article. This is simply due to the data volume: her original copy has 180,000 records in the Threads table and 1.2 million in the Posts table, whereas this copy has 2,500 and 75,000.

In part 2, Gail examines the execution plans and query statistics for the poorly-performing queries identified by Profiler, and proposes some possible fixes for these problems.


Gail Shaw


  • Granted

    Excellent
    Fantastic. Thanks for the nod.

  • AreEyeEkks

    Brilliant
    Excellent article. Highly thorough. I’m looking forward to part 2!

  • pinaldave

    Good Work
    Thanks Very interesting. I enjoyed it very much.

    I have recently started to learn Profiler and I have created almost similar example.

    Again good work.

  • Mark Challen

    Great article
    Not used a server-side trace before, thanks for additional tweaks that make it practical.

    What are the pros/cons of this method vs. using the DMV’s? I have explored the DMV’s and they offer a very similar perspective on analyzing system performance and identifying long-running queries.

  • GilaMonster

    DMVs vs SQLTrace
    Mark: I assume you’re mainly using the sys.dm_exec_query_stats DMV?

    A query’s cumulative execution statistics only appear in that DMV while the query’s plan is in cache. If, for any reason, there’s a recompile (old statistics, DDL change, temp table creation) then the query stats, along with the plan, are flushed from cache. Previous values are lost.

    A not-so-obvious side effect of that is that a query that’s run with the RECOMPILE option (or part of a procedure marked WITH RECOMPILE) will not appear in that DMV, as its plan is never cached.

    The plan cache also does not contain DDL statements as they only have one way to execute and hence don’t get an execution plan. If the root cause of the bad performance is a badly-timed index rebuild or consistency check, it won’t appear in the DMV.

    I’m not saying the DMV is useless. There are many, many good uses for the query stats DMV, but it is in no way a replacement for SQL Trace, nor was it intended to be a replacement.

  • Anonymous

    Excellent Article
    Excellent article. Short, sweet and to the point!

  • Senthilkumar

    Use of Profiler in Production
    This is a great article. We have serious database performance issues in our production servers, whereas we were not able to simulate the same in our dev boxes, because simulating such load is not easy. Anyway, I will try to follow the steps given above and see whether I can achieve anything out of it.

    Thanks
    Senthil

  • Jack Corbett

    Excellent. Now in my briefcase
    Nice article Gail. You’ve set the bar high as usual. Looking forward to article 2.

  • Ivan Tsatsarov

    Thank you for the article.
    This is a very useful article. I’ve not used server-side traces before; I’ve tried Profiler but the impact on performance is huge.
    Have you compared trace results with the results of the missing index DMVs (sys.dm_db_missing_index_*)?

  • GilaMonster

    Trace and missing indexes
    Jack: Thanks. I’m still trying to get part 2 finished.

    Ivan: I’m not sure what you’re asking. The missing index DMVs and the trace look at completely different info. How would I compare them?

  • Willem Tilstra

    Retrace instructions
    Gail, I loved your article. Traces have been challenging for me and the process you detail in this article is very helpful. However, my shop has a lot of ad-hoc SQL and I followed up your link on Retrace. Unfortunately, there is not much documentation on that application. Do you have any information or links that expand on Retrace? The instructions in Ken Henderson’s blog assume a lot of things and it’d be nice to have the assumptions spelled out.

  • GilaMonster

    Retrace
    I don’t, unfortunately. I used it a long time back, and I don’t remember all the details

    I did run across an alternative in some ‘light’ reading today.

    Have a look at sp_get_query_template. It returns the parameterized form of a query, and hence should be a lot more appropriate for use here than the ReTrace app is.

    The proc’s intended for use with plan guides, but that doesn’t mean it’s the only use.

    I just wish I’d run across it a couple weeks back. *embarrassed*

  • Michael Lewis

    Great Article
    Great article, similar to the steps listed in Chapter 3 of Inside Microsoft SQL Server 2005: T-SQL Querying.

    The chapter in the book extends your example by building query signatures. Stuart Ozer (SQL CAT team) has a UDF that builds the query signatures, and the book also goes through the steps of implementing this function in CLR… which runs much faster than the function.

    Perhaps Willem should get the book from Amazon, or perhaps search the SQL CAT blog for the code.

  • David K

    Run this everyday
    Hi Gail

    Great article. I do a similar thing, everyday on all my production servers. I created a profiler trace, but not using the export method, more manually. I then run this trace as a SQL Agent Job on all my production servers, starting at 8AM. I’ve set the MaxFileSize to 100MB and on the busy servers that fills up in 10 minutes! I then have a SQL agent job on my DBA server that runs a 7 step job.

    The 1st couple of steps are powershell scripts that delete all previous trace files in a specific folder and then move all the trace files from my prod SQL servers to my DBA server.

    Then I use powershell again to loop through all the new trace files and using fn_trace_gettable wrapped in an sp I import all the trace files.

    After importing and a couple of other steps I populate a Top10 table of the top 10 worst performing queries per database. So in one report I have my top 10 baddies for all databases for all my servers.

    If anyone wants this code, mail me at dkorzennik@hotmail.com.

    Happy Tracing.

  • Balaji

    Useful Information
    Hi Gail,

    It’s a very useful article. This is giving good information to get the trace information from production servers. Thanks for the information.

    Balaji.

  • RobertDavis

    Good article
    Good article Gail!! I didn’t recognize you by name (I do recognize you as GilaMonster though), and I was happy to see that it wasn’t another article suggesting using SQL Profiler on a production machine.

    I think this should be a must read for anyone new to troubleshooting and optimization.

  • Bruce W Cassidy

    An excellent introduction to the topic
    I like that you’ve used a real world example that people reading the article can identify with, and yet one easy enough to serve as a good test case.

    The article reads very well. 😀 I’ll look forward to part two.

  • Tim Walker

    Tune up advice for beginners and upwards
    Well explained step by step guide to basic profiling and more advanced tips on analysis, without which it’s hard to achieve anything. Thanks!

  • Raghunath

    Excellent Article
    A must-read, good article to learn how to improve DB performance for a huge application.
    I’ll look forward to part two.

  • MJ

    Nice
    Great advice.

    You can extract the stored procedure name easier with this line (I couldn’t get yours to work):

    update trace_results
    set [procedure_name] = substring(text_data, 6, charindex(' ', text_data, 6) - 6)

    where text_data like '%exec%'

  • Anonymous

    What version?
    Would be nice to know what version this works on – I got most of the way through only to realise it doesn’t appear to work on SQL 2000

  • Anonymous

    Very Good
    Good idea, this is useful for analysis.

  • GilaMonster

    Proc name and Version
    MJ: Not necessarily. Consider an RPC call like this (taken from this example)

    declare @p5 int set @p5=199109 exec AddThread @ForumID=8,@UserID=13,@Title=N'Test',@PostText=N'Test2',@ThreadID=@p5 output select @p5

    Anonymous(1): What part doesn’t work on 2000? It was tested in SQL 2008, but there’s nothing I can see that’s 2005/2008 specific at this point.

  • Michael

    greatful help
    Hi Gail! For me as a newcomer your article is a very good help. I did all the steps and now I have my trace file. But I can’t work with it, because this file is still in use! There is no process still running on this file. Do you have any idea?
    Thanks
    Michael

  • Michael

    solution
    I found the reason… the trace file is stored on a file server. I can access this file from the SQL Server, but not from another server.

  • Kevin Bacon

    SQL Server 2000
    Great article, looking forward to part 2!

    I got it working on SQL Server 2000 with the following changes:

    SELECT
    cast(TextData as varchar(4000)),
    Duration,
    Reads,
    Writes,
    CPU,
    StartTime
    FROM
    ::fn_trace_gettable('\it12TracesTest20090326135109.trc',1)

  • GilaMonster

    On the location of the trace files
    Just one important point that I didn’t emphasise enough.

    On a busy production server, the trace file must go onto a fast *local* drive. Not a network share.

    Networks are still slower than a fast drive and, if the trace has to go across the network, it could end up having an adverse effect on SQL as processes have to wait for the trace to write before they can complete.

  • pezzar

    options parameter
    Hi Gail

    its about time someone covered this it gets a lot of questions on SSC.

    Important to note is the options value in sp_trace_create; if it’s not set to 2 (the default is always zero for some odd reason, even when you tell Profiler to roll over to a new file 🙁 ), then it will not roll over to a new file when maxsize is reached.

    Regards
    Pezzar

  • rubes

    ProcedureName
    Gail, great article! It gave me a few new ideas.

    One addition… If you’re using SQL 2005 and most of your calls are procedure based, I would also recommend adding the ObjectName column to your trace. Then you don’t have to parse the TextData column… or it at least confirms that your parsing is correct.

  • biill-k

    Great article, but one suggestion
    Great article on query tuning, but . . . before adding the indexes to specific tables, I would recommend capturing a workload specific to those tables. You can then run the workload through the Database Tuning Advisor. My point is that while you may come up with an optimal index for a particular query, it may be less than optimal for the total workload that occurs against that table.

  • biill-k

    Prior comment should have been for second part of article
    Prior comment should have been for second part of article

  • GilaMonster

    DTA
    I prefer not to use DTA. I find its suggestions to be adequate at best, often well overboard and, quite frankly, I can do a better job by hand.

    In my opinion, DTA is useful for people who don’t know how to tune indexes. If you do, do the tuning yourself. It may take a bit longer but the results may very well far exceed what DTA will pick up.

  • Tom Brown

    Excellent Article
    This is very well prepared, clear and easy to follow. Lots of useful information. Thanks Gail.

  • ratheeshknair

    Great Article
    Hi Gail,

    This article is simply superb and really helpful.

    Thanks A lot

  • siddarthpal

    great artcile
    Hi Gail,

    Very Nice article and I will use the same to know how performance can be optimized.

    Also delighted to know that you are from SA. I have worked at Durban for 1.5 years, I love your country.

    Cheers
    Siddarth

  • ghanshyamdayal

    cool it is
    one confusion:
    Microsoft on its site says the following (reference:
    http://msdn.microsoft.com/en-us/library/ms175848.aspx):

    In SQL Server 2005 and later, the SQL Server Profiler graphical user interface displays the Duration column in milliseconds by default, but when a trace is saved to either a file or a database table, the Duration column value is written in microseconds.
    Microseconds (one millionth, or 10^-6, of a second)

    I can see you are saving the trace output to a file. In this case, is it correct to divide the duration by 1000 to get the duration in seconds, if you are taking it in seconds?

    And yes, I use ObjectName from the trace file to check the frequency of a procedure and the maximum time taken by it.

    Please advise
    Ghanshyam Dayal

  • vladi

    Internal optimization vs external optimization?
    Great article!
    Internal optimizations are important.
    But, in my experience, the DBA is part of the team, which includes the development, the IT and the DBA.
    Also, in many cases the optimization options are limited (3rd party application, legacy, don’t touch what works even if it works badly).

    This is why there are performance optimization solutions like SafePeak, which sits outside of the database and performs dynamic caching.
    The guys there did a really cool job and since it has a free trial option it is easy to test it.

    What do you think?

    Vladi

  • Psalm121

    Excellent
    Very clear and detailed explanation. Good work!

  • kevin0017@yahoo.com

    thank you
    thank you for thing this info.

  • Sagheer

    Excellent Article with one problem
    This is great. I think the author just forgot to mention that if you don’t set the 'max_rollover_files' and 'filecount' options, the code above will always create one file of a size set by @maxfilesize.

    So to fix that, make the following change:
    Replace this line:
    exec @rc = sp_trace_create @TraceID output, 0, @OutputFileName, @MaxFileSize, @EndTime
    with this line:
    exec @rc = sp_trace_create @TraceID output, 2, @OutputFileName, @MaxFileSize, @EndTime,5

    It will set the 'max_rollover_files' option on and the max file count to 5.

    Thanks

  • peterdru

    Database Capacity Planning
    Poor performance causes many effects on our SQL Server, so we need to continuously check on performance by monitoring it on a regular basis.
    This problem is easily sorted out by a monitoring utility.
    I saw some recommendations of this tool on different websites by experts in monitoring SQL Server performance.

    http://www.sqlserverperformancemonitoring.com/

  • veryjrdba

    sp_cursoropen
    Hello Gail,

    Many thanks for this excellent article. I have one question, though.

    When I followed the steps you recommended, these were the top 4 procedures called:
    sp_cursoropen
    sp_prepexec
    sp_cursorfetch
    sp_execute

    All of them except sp_execute seem to be purely system prepared. I realize that probably indicates an issue in itself, namely a lot of ad hoc procedures rather than pre-designed ones. Also seem to be linked server queries including columns such as "Tbl1084" and "Col1076".

    Having said that, though, can you recommend a way to break up those results in order to get more meaningful reports about which exact statements were run? Or do I just have to order by TextData and go through each set individually?

    Thanks for any help.

    veryjrdba

  • Prateek25

    Excellent Article
    Excellent Article. Very informative for a new person. Learn lot of things.

    Thanks for sharing the knowledge.

    Prateek

  • Farooq

    Nice article
    Excellent and informative.

    Thanks.

  • smtzac

    error message
    INSERT INTO TraceResults
    (TextData, Duration, Reads, Writes, CPU, StartTime)
    SELECT TextData, Duration/1000, Reads, Writes, CPU, StartTime

    FROM fn_trace_gettable('C:testWebForums20160602145620.trc',1)
    Msg 8152, Level 16, State 10, Line 2
    String or binary data would be truncated.

    the issue is with the 'TextData' column. I even changed it to VARCHAR(8000), but it’s not working. But when I delete the 'TextData' column, it works!!! Please suggest