The Seven Sins against TSQL Performance

There are seven common antipatterns in T-SQL coding that make code perform badly, and three good habits that will generally ensure your code runs fast. If you learn nothing else from this list of great advice from Grant, just keep in mind that you should 'write for the optimizer'.

It’s not enough that your code is readable: it must perform well too.

There are three very basic rules for writing T-SQL that performs. They are cumulative. Doing all of these will have a positive impact. Skipping or changing any of these is likely to have a negative impact on your code’s performance.

  • Write to your data structures: If you are storing a datetime data type, use that, not a varchar or something else. Also take advantage of the foreign key constraints and other structures that you have in place when you’re writing your T-SQL code.
  • Write for your indexes: If there are indexes on your tables, and there should be, write your code so that it can take advantage of them. Make sure that your clustered index (you only get one per table) is working well to drive your queries to your data efficiently.
  • Write for the optimizer: The query optimizer is an amazing piece of software, but you can overwhelm it by writing code that isn’t structured to support it, such as nesting views within views within views. Take the time to understand how the optimizer works and write your code in such a way that you help it, not hurt it.

Then, there are some very specific and common mistakes that people make in their T-SQL code. Just don’t do these and you’ll be much better off.

Using the Wrong Data Types

This is surprisingly simple in concept, but seems to be incredibly difficult in practice. Here you go… use the data type that is in your database. Use it in your parameters and in your variables. I know that SQL Server can implicitly convert from one to another. But when you get implicit conversions, or you have to put in explicit conversions, you’re performing a function on your columns. When you perform a function on your columns anywhere in the filtering, that is, in a WHERE clause or in JOIN criteria, you’re looking at generating scans. You may have a perfectly good index, but because you’re doing a CAST on the column in order to compare it to a character type that you passed in instead of a date, that index won’t get used.

Don’t believe me? Check this query:
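Something along these lines will do it (a representative reconstruction against AdventureWorks, where HumanResources.Employee.NationalIDNumber is an NVARCHAR column being handed a numeric value):

SELECT e.BusinessEntityID,
       e.NationalIDNumber
FROM HumanResources.Employee AS e
WHERE e.NationalIDNumber = 112457891;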

Nice & simple. Should be covered by an index on the table. But here’s the execution plan:

[Execution plan: the whole table is scanned, and the SELECT operator carries a warning]

This query ran fast enough, and the table is small, so there were only 4 reads to scan the entire thing. Notice that little exclamation point on the SELECT operator, though. What do the properties tell us?

[SELECT operator properties, showing the PlanAffectingConvert warning]

That’s right. It’s a warning (new in SQL Server 2012) that you have a “PlanAffectingConvert.” In short, because I didn’t use the correct data type, I didn’t get a plan that looked like this:

[Execution plan: an index seek, with no conversion warning]
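To get that plan, the value just has to be passed as a string, matching the column’s own type (continuing the reconstruction above):

SELECT e.BusinessEntityID,
       e.NationalIDNumber
FROM HumanResources.Employee AS e
WHERE e.NationalIDNumber = '112457891';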

And there were only 2 reads this time instead of 4. And yes, I realize I only cut the cost from a little to a very little, but what if that table had millions of rows instead of only a few? Ah, now I’m suddenly a hero.

Use the correct data types.

Using Functions in Comparisons within the ON or WHERE Clause

Speaking of functions, many, if not most, of the functions that you run against your columns in WHERE and ON clauses will prevent the proper use of indexes. You will see slower performance because SQL Server has to scan the data in order to evaluate your function. For example, this:
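A representative reconstruction against AdventureWorks’s Person.Address table (assuming, for the read counts quoted below, an index keyed on City):

SELECT a.AddressLine1,
       a.City
FROM Person.Address AS a
WHERE LEFT(a.City, 1) = 'B';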

 

That function, the LEFT operation on the column, results in this execution plan:

[Execution plan: a scan driven by the LEFT function]

This took 316 reads to find the correct data over about 9ms (I have very fast disks), all because it must process each and every row to determine where the information matches the function. It can’t simply seek within the existing index. But not all functions are created equal. You can do something like this:
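The same filter expressed with LIKE and a trailing wildcard, continuing the reconstruction above:

SELECT a.AddressLine1,
       a.City
FROM Person.Address AS a
WHERE a.City LIKE 'B%';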

Which results in a very different execution plan:

[Execution plan: an index seek]

This resulted in 3 reads and 0ms. Let’s call it 1ms just to be fair. That’s a huge performance increase. All because I used a function that could work with search arguments. The old term for this, not used much any more, is sargeable. Either don’t use functions in comparisons, or use ones that are sargeable.

Employing Multi-Statement User Defined Functions (UDFs)

These things are a trap. It appears, at first blush, like this marvelous mechanism that’s going to allow you to use T-SQL like a real programming language, not just a scripting language. You can build these functions and then have them call each other and your code will be very reusable, unlike those nasty old stored procedures. It’ll be a glory… Until you try to run your code with more than a few rows.

The problem here is that the multi-statement user defined function (UDF) is designed around the table variable. Table variables are very cool things for what they’re meant to do. They have one distinct difference from temporary tables: table variables have no statistics. This can be a time-saving, even a life-saving difference… or… it can kill you. If you don’t have statistics, the optimizer assumes that any query run against the table variable, or UDF, will return a single row. One (1) row. That’s great if you’re only returning a few rows. But once you start returning hundreds, or thousands, and you’ve decided to join one UDF to another… performance drops very, very quickly and very, very far.

The examples here are somewhat long. Here are a series of UDFs:
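Rather than reproduce every one of them, here is a condensed sketch of the pattern, assuming AdventureWorks. The bodies are illustrative; the shape, multi-statement functions built on table variables and calling other functions, is the point:

CREATE FUNCTION dbo.SalesInfo ()
RETURNS @SalesInfo TABLE (
    SalesOrderID INT,
    OrderDate DATETIME,
    SalesPersonID INT
)
AS
BEGIN
    -- load the table variable from the base table
    INSERT INTO @SalesInfo (SalesOrderID, OrderDate, SalesPersonID)
    SELECT soh.SalesOrderID, soh.OrderDate, soh.SalesPersonID
    FROM Sales.SalesOrderHeader AS soh;

    RETURN;
END;
GO

CREATE FUNCTION dbo.SalesDetails ()
RETURNS @SalesDetails TABLE (
    SalesOrderID INT,
    SalesOrderDetailID INT,
    OrderQty SMALLINT
)
AS
BEGIN
    INSERT INTO @SalesDetails (SalesOrderID, SalesOrderDetailID, OrderQty)
    SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.OrderQty
    FROM Sales.SalesOrderDetail AS sod;

    RETURN;
END;
GO

CREATE FUNCTION dbo.CombinedSalesInfo ()
RETURNS @CombinedSales TABLE (
    SalesPersonID INT,
    SalesOrderID INT,
    OrderDate DATETIME,
    OrderQty SMALLINT
)
AS
BEGIN
    -- one UDF calling two others, each hiding a full table scan
    INSERT INTO @CombinedSales (SalesPersonID, SalesOrderID, OrderDate, OrderQty)
    SELECT si.SalesPersonID, si.SalesOrderID, si.OrderDate, sd.OrderQty
    FROM dbo.SalesInfo() AS si
    JOIN dbo.SalesDetails() AS sd
        ON si.SalesOrderID = sd.SalesOrderID;

    RETURN;
END;
GO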

This is a great structure. It makes putting together queries very simple. Here’s an example:
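Something like this, continuing the sketch above:

SELECT csi.OrderDate,
       csi.OrderQty
FROM dbo.CombinedSalesInfo() AS csi
WHERE csi.SalesPersonID = 277;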

That is one very simple query. Here’s the execution plan, also very simple:

[Execution plan: the function shows as a zero-cost operator beside a scan of its table variable]

It only took 2170ms to run, returning 148 rows with 1456 reads. Note that the function is listed as having zero cost, and it’s just the scan of the table, a table variable, that is paid for in the query. Or is it? Here’s a little query that will let me get a look at what’s going on behind that zero-cost function operator. This pulls the plan for the function from the cache:
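A sketch of the approach through the plan cache DMVs (the LIKE filter on the function’s definition text is illustrative):

SELECT deqp.query_plan
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE dest.text LIKE '%CombinedSalesInfo%';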

And here is what is actually going on:

[The function’s actual plan: more function operators and table-variable scans, plus a Hash Match join]

Oops, looks like even more of those little functions and scans that are almost, but not quite, free. Plus a Hash Match operation, which writes out to tempdb and has actual real costs for the SQL engine. Let’s look at one more of those UDF execution plans:

[Another function’s plan: a Clustered Index Scan across a large number of rows]

Whoops! Now we’re getting a Clustered Index Scan and quite a large number of rows being scanned. This is not pretty. And there is more than one of these processes occurring. Suddenly, the UDF looks a lot less attractive. What if we were to, oh, I don’t know, just query the tables directly like this:
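For example, a direct rewrite of the sketched query above:

SELECT soh.OrderDate,
       sod.OrderQty
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID
WHERE soh.SalesPersonID = 277;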

Now, running this query returns exactly the same rows, but it does it in 310ms instead of 2170ms. We also end up with only about 911 reads instead of 1456. Clearly, it’s not very hard to get into trouble with multi-statement table-valued user-defined functions.

The “Run Faster” Switch: Allowing ‘Dirty Reads’

Back in the old days of computing, for some reason, our 286 units came with a button on the front called the Turbo button. If you accidentally hit it, suddenly your machine was running extremely slowly, so you made sure that silly thing was always switched on so that you received maximum throughput. Well, a lot of people look upon the READ UNCOMMITTED isolation level and the NOLOCK query hint as the turbo button for SQL Server. Make sure they’re on every query and your system will run faster. This is because they make it so that reads take out no shared locks. Fewer locks mean faster processing, easy. But…
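Applying the hint is certainly easy. An illustrative example (not taken from the original code):

SELECT soh.OrderDate,
       soh.Status
FROM Sales.SalesOrderHeader AS soh WITH (NOLOCK)
WHERE soh.SalesPersonID = 277;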

When you run queries this way, you will end up with dirty reads. Everyone assumes that just means you might read ‘cat’ while the value is still officially ‘dog’ because an update is in flight, which is true. But you’ll also get extra rows, missing rows, and duplicated rows as the pages are rearranged underneath your query with no locks to prevent it. I don’t know about you, but most of the businesses I’ve worked for expect most of the queries on most of their systems to return consistent data: the same set of inputs against the same set of values gets the same results. Not with NOLOCK. To see this demonstrated, I strongly recommend you read this blog post from Jason Strate. He illustrates the issue with perfect clarity.

Applying Query Hints indiscriminately

People are just a little too quick to pull the trigger on query hints. The most common situation I’ve seen is when a hint is used to fix one, very distinct problem on one query. But, when the database professionals involved see the improvement in speed, they immediately apply that hint… everywhere.

For example, a lot of people are under the impression that the LOOP JOIN operator is the best for joins. This is because it’s most frequently seen in small, fast queries. So, people will try to force it. It’s not hard:
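A hypothetical example of forcing it:

SELECT soh.OrderDate,
       sod.OrderQty
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID
WHERE soh.SalesPersonID = 277
OPTION (LOOP JOIN);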

This query will run in about 101ms. It has 4115 reads. Not bad performance, but if we drop the hint, the same query runs in about 90ms and it only has 2370 reads. As the system comes under more and more load, the unhinted version is going to show itself as the much more efficient query.

Here’s another example. This one is slightly contrived, but it gets the point across. People often put an index on their table, expecting it to solve the problem. Here we have a query:
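A reconstruction of the kind of query involved, assuming AdventureWorks and a newly created index on Purchasing.PurchaseOrderHeader.OrderDate (that index is an assumption; it isn’t there by default):

SELECT poh.PurchaseOrderID
FROM Purchasing.PurchaseOrderHeader AS poh
WHERE DATEADD(dd, 30, poh.OrderDate) >= '2008-08-01';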

The problem is that you’re running a function against the column, not that any index that has been created is not adequate. The performance stinks because it’s doing a full scan of the clustered index. But when people see that the index they just created is not getting used, they do this:
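They force the index with a table hint (the index name continues the hypothetical example above):

SELECT poh.PurchaseOrderID
FROM Purchasing.PurchaseOrderHeader AS poh WITH (INDEX (IX_PurchaseOrderHeader_OrderDate))
WHERE DATEADD(dd, 30, poh.OrderDate) >= '2008-08-01';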

Now they’re getting an index scan instead of a clustered index scan, so the index is being “used” now, right? But performance changed, going from 11 reads to 44 reads (both times were near 0ms; I have fast disks). “Used” is the word, but not in the way that was meant. The solution, of course, is to restructure the query:
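Move the function off the column and onto the constant, and the hint becomes unnecessary:

SELECT poh.PurchaseOrderID
FROM Purchasing.PurchaseOrderHeader AS poh
WHERE poh.OrderDate >= DATEADD(dd, -30, '2008-08-01');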

Now the number of reads drops to 2 because the index is being used in a seek, as it was meant to be.

Query hints need to be considered a last resort after all other choices have been eliminated.

Allowing ‘Row By Agonizing Row’ processing

The phrase Row By Agonizing Row is shortened to RBAR (pronounced “ree-bar”). It describes using cursors or WHILE loops instead of set-based operations, and it leads to extremely slow performance. The use of cursors comes from two sources: developers who, quite reasonably in their own world, are used to row-by-row processing in their code, and Oracle people, who think that cursors are a good thing (guys, the cursors you’re looking for are hidden inside SQL Server’s execution plans as the join operators). Either way, cursors are quick performance death.

The example is pretty straightforward. We’re going to update the color of products where they match a certain criterion. Is this contrived? No. Actually, it’s based on real code that I had to tune:
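A sketch of the pattern against AdventureWorks’s Production.Product; the Weight test comes from the discussion of this example, the rest is illustrative:

DECLARE @ProductID INT,
        @Weight DECIMAL(8, 2);

DECLARE ProductCursor CURSOR FOR
    SELECT p.ProductID,
           p.Weight
    FROM Production.Product AS p;

OPEN ProductCursor;

FETCH NEXT FROM ProductCursor INTO @ProductID, @Weight;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- one pass, and at least two reads, per row in the table
    IF @Weight < 3
        UPDATE Production.Product
        SET Color = 'Blue'
        WHERE ProductID = @ProductID;

    FETCH NEXT FROM ProductCursor INTO @ProductID, @Weight;
END;

CLOSE ProductCursor;
DEALLOCATE ProductCursor;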

Not only is there no real possibility of code reuse, but we’re doing 2 reads on every single pass through the loop, across all the rows in the system, for a total of reads in the hundreds. Execution time on my machine with no other load in place is well over a second. It’s insane. The set-based operation is pretty simple:
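Assuming the same criteria as the sketch above:

UPDATE Production.Product
SET Color = 'Blue'
WHERE Weight < 3;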

Here you get 15 reads and 1ms execution. Don’t laugh. People really do stuff like this, and worse. Cursors are just something to be avoided outside of areas where you have no choice, like maintenance routines that have to walk tables or databases.

Indulging in Nested Views

Views which call views that join to views which are calling other views… A view is nothing but a query. But, because they appear to act like tables, people can come to think of them as tables. They’re not. What happens when you combine a view with a view and then nest them inside each other, etc., is that you’ve just created an incredibly complex execution plan. The optimizer will attempt to simplify things. It will try to come up with plans that don’t use every table referenced. But, it will only attempt to clean up your plans so many times. The more complex they get, the less likely that you’ll get a cleaned up plan. Then, performance becomes extremely problematic and inconsistent.

Here are a series of fairly simple queries defined as views:
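A condensed, illustrative sketch; the view names come from the discussion below the article, but the bodies are assumptions:

CREATE VIEW dbo.SalesInfoView
AS
SELECT soh.SalesOrderID,
       soh.OrderDate,
       soh.SalesPersonID,
       a.City
FROM Sales.SalesOrderHeader AS soh
JOIN Person.Address AS a
    ON soh.ShipToAddressID = a.AddressID;
GO

CREATE VIEW dbo.SalesDetailsView
AS
SELECT sod.SalesOrderID,
       sod.OrderQty
FROM Sales.SalesOrderDetail AS sod;
GO

CREATE VIEW dbo.CombinedSalesInfoView
AS
-- a view built on views, the start of the nesting problem
SELECT siv.SalesPersonID,
       siv.OrderDate,
       siv.City,
       sdv.OrderQty
FROM dbo.SalesInfoView AS siv
JOIN dbo.SalesDetailsView AS sdv
    ON siv.SalesOrderID = sdv.SalesOrderID;
GO

Then run a simple query against the outermost view:

SELECT soh.OrderDate
FROM dbo.CombinedSalesInfoView AS soh
WHERE soh.SalesPersonID = 277;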

 

You get a query that runs in about 155ms and does 965 reads against two tables. Here’s the execution plan:

[Execution plan for the query against the nested views, carrying extra operators]

Looks pretty benign really. You’re returning 7000 rows, so this is not completely out to lunch. But what if we just ran this query instead:
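That is, the same column pulled straight from the base table:

SELECT soh.OrderDate
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesPersonID = 277;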

Now we’re looking at 3ms and 685 reads. That’s radically different. Here’s the execution plan:

[Execution plan for the direct query against Sales.SalesOrderHeader]

What you’re seeing is the fact that, try though it might, the optimizer couldn’t eliminate every unneeded table from the view query as part of its simplification process. So that plan carries two extra operations, the Index Scan and the Hash Match join that puts it all together. You’d be so much better off avoiding that work by writing your queries directly rather than using views. And remember, this example is very simple. Most examples I see of this are much more complex, resulting in much worse performance.

If you have enjoyed this article, we suggest you take a look at Grant’s humorous video on the importance of verifying your backups.

  • SqlNightOwl

    For the App Devs
    Posting this on our company wiki. Our devs complain that the database is slow — from what I see in the plan cache, it’s their code and not the database. I see this all over the place in the form of single-use plans. ORMs aren’t evil, they’re just used that way.

  • Grant Fritchey

    ORMs
    ORM tools are absolutely fantastic. I think it’s great what they can do. But it does seem to be a very common case for people to use them in horrifically inappropriate ways. Clearly more education is needed.

  • Alex Fekken

    Cursors
    Thanks Grant, useful tips that are often ignored.

    But I have always wondered about why “Oracle people[, who] think that cursors are a good thing”. Isn’t it so that there is nothing inherently bad about cursors (except perhaps from an educational perspective when they are easily avoided) other than that the SQL Server implementation for them is so incredibly inefficient?

  • callcopse

    You call that a nested view?
    I personally love ORMs. If used to eliminate grind programming, they are an effective tool for speeding up development. If used in clever ways, they tend to degrade performance.

    The Nested Views bit made me laugh mirthlessly though. I’ve got some embedded 7 deep – that use random UNIONs with magic numbers and distinct clauses. I curse life and mankind every time I need to look at them (often after 3 days without even a tentative effort at editing). Performance is – as you would expect. Unfortunately with no clues as to the overall desired effect in terms of comments etc I have not yet dared rewrite as mildly sensible code.

  • Grant Fritchey

    Oracle Cursors
    The reason there is so much confusion is because Oracle has a couple of different kinds of cursors. They do have the same one we have in SQL Server, but I don’t think they like that any more than we like ours. They also have cursors that we have & use every day, we just don’t call them cursors. Look up the definitions for Loop Join, Hash Join and Merge Join. Those are cursors too. They’re the ones that Oracle people are constantly looking for within SQL Server.

  • Grant Fritchey

    Nested Views
    Yeah, that’s not going to be fun. Nothing for it but to do the hard work & replace them.

  • waqas

    clustered index,composite primary key, Common table expression
    Very useful tips, Grant. Thanks.

    What would be your opinion, in terms of performance, when a unique clustered index is used on more than two columns to create a composite primary key?

    What would you say about common table expression based queries?

  • Anonymous

    missing nested view “bad query”
    Looks like you’re missing the “bad query” for nested views. You define the views and have the execution plan, but don’t state the query.

    Great article nonetheless. Thanx!

    Gary

  • Grant Fritchey

    Missing query
    Sorry about that. It’s just the same query as the one below it but against the view:
    SELECT soh.OrderDate

    FROM dbo.CombinedSalesInfoView AS soh

    WHERE soh.SalesPersonID = 277 ;

  • G

    “fairly simple queries defined as views”
    Re: nested views

    Apart from missing the query, you are missing the data model. Table definitions, constraints, data types – as you seem to suggest in your first example – are all relevant, especially when re-writing queries.

    For example, the fact that the Address table doesn’t appear in your first execution plan suggests to me that SalesOrderHeader.ShipToAddressID is mandatory and there is a FK constraint on that table-column to Address.AddressID. Take any of these conditions away and these 2 queries are no longer equivalent:

    SELECT soh.OrderDate
    FROM Sales.SalesOrderHeader AS soh
    WHERE soh.SalesPersonID = 277 ;

    SELECT soh.OrderDate
    FROM dbo.SalesInfoView AS soh
    WHERE soh.SalesPersonID = 277 ;

    As for the query you posted above:

    SELECT soh.OrderDate
    FROM dbo.CombinedSalesInfoView AS soh
    WHERE soh.SalesPersonID = 277 ;

    I’m not sure why you consider it equivalent to

    SELECT soh.OrderDate
    FROM Sales.SalesOrderHeader AS soh
    WHERE soh.SalesPersonID = 277 ;

    Not unless you know, and don’t tell us, that for every SalesOrderHeader record there always is one and only one SalesOrderDetail record. I’m not sure most of your readers will find that 1-to-1 relationship "obvious".

    Until one shows proof that 2 queries are equivalent, any talk of "better way" is a moot point.

    PS: Not sure what you were trying to achieve with the "Oracle people, who think that cursors are a good thing" comment. It just doesn’t make you look very knowledgeable.

  • Ian Stirk

    using correct datatypes…
    Hi,

    The following article tackles the problem at its source: http://www.sqlservercentral.com/articles/Admin/65138/

    The utility described in the article will allow you to quickly identify which columns have mismatched datatypes across tables in a database, correcting these will improve performance, integrity and maintainability.

    Additionally, you can use the following utility to search queries/plans that have implicit conversions:
    http://www.sqlservercentral.com/articles/Performance+Tuning/66729/

    Thanks
    Ian



  • Grant Fritchey

    to waqas
    Common table expressions (CTE) are an excellent tool for your queries. They’re not magic (except with recursion) but instead are simply a different way of defining a derived table. Use them where appropriate, they’re fine. I have seen them overused. Like anything, use the hammer to drive nails = good, smashing my toes = bad.

    Compound indexes are another thing that are just fine but can be abused. Want to index on, oh, I don’t know, zip code and eye color? Great if it helps the queries, but I probably wouldn’t make eye color the first column, since that’s what drives the index’s statistics and shows how useful it might be to the query optimizer. See what I mean?

  • Grant Fritchey

    G
    The data model is AdventureWorks2012. I don’t know how much real-world business functionality I’d derive from that.

  • Grant Fritchey

    Address Table & the View Example
    Hey G,

    The reason that you don’t see the Address table in the first example is because the optimizer goes through a process that it calls simplification. That process can, and did in this case, eliminate tables that are not needed by a query. But, as was shown here, you can’t rely on the simplification process as a perfect engine. It eliminated many, but not all, of the unnecessary tables. To satisfy the simple query run against the view, all that was needed was a single table, but the optimizer didn’t pick that out.

  • Bryant

    Seven Sins
    Thanks for the distillation. I’ve learned most of these and tried to teach many. It is amazing how much performance you can gain in eliminating function calls and explicit conversions from WHERE clauses.

    I might suggest you also take up OR in WHERE clauses as number 8. I have seriously improved many performance issues centered on an OR-induced table scan by UNIONing the individual conditions together. It doesn’t work every time but it works more often than not.

    As for the nested views, I have inherited some code that has that issue, specifically because of the hierarchical relationship of the underlying tables and the use of views to control security, filter soft-deleted rows and establish row counts to be passed upward. I looked into an indexed view but the underlying views are not deterministic.

    The upshot here is to plan ahead and not fall into the trap of committing yourself to using views for everything because they are also used for security purposes on data exposed to users for manipulation. You will end up with unwieldy code that is poorly performing or a long project to extract the essence into better code.

  • chuckh1958

    Oracle People and RBAR
    I’m a SQL/Oracle DBA. That code would be bad in Oracle too! It would run much better by putting "WHERE Weight < 3" in the cursor definition, but a single properly written UPDATE will perform much better in any RDBMS I’ve ever worked on (SQL Server, MySQL, and Oracle).

    Regarding RBAR – you can make this example orders of magnitude worse by putting the row processing on the other end of a network connection. I’ve seen apps where millions of rows are processed this way, introducing millions of network round trips with network latency accumulating on every single one. It’s a performance nightmare, and usually the same work can be done with a single round trip by using properly written DML.

  • Grant Fritchey

    RE: Or
    Hi Bryant,

    Excellent point. It’s pretty cool how well you can improve performance by converting an OR into a UNION. I wish I’d added it in now. Thanks for the feedback.

  • Grant Fritchey

    RE: Network REBAR
    Oh yeah. I’ve seen that too. Scary stuff. Thanks for the feedback.

  • b83thrapp

    CONTEXT_INFO
    To simplify my maintenance when code changes are required I have reverted to using views in concert with work tables (not temporary) and CONTEXT_INFO. Standard edition of SQL Server (2008 from 2005).

    I populate the work tables with parameter values supplied to the stored procedure, then join the work tables with CONTEXT_INFO and finally the desired tables. The views are then able to work with a minimal number of rows without slowing down too much.

    This has simplified my coding significantly enough that I’m not concerned with the decrease in performance. Some of these views are nested 4 or more deep.

    I have not been able to find much about CONTEXT_INFO and performance, so I’m not sure exactly how much of a hit I have introduced. So far it has worked well from my perspective, but I’m always looking out for an alternative.

  • Anilkumar P

    Great article
    Till now, whatever work I did, I never thought about performance, but from now on I will.

  • donjones3412

    Missing: where a value is derived from a function and then the table is hit
    Missing: where a value is derived from a function and then the table is hit in the WHERE clause. I have a database with a TimeStamp column (int), and I need to take a real datetime data type and convert it in a function to a TimeStamp to search on.

    This works fine; the index is used for a binary lookup:
    select <col> from <t> where <int> = 11111

    This does not work; the index is scanned:
    select <col> from <table> where <int> = function('<value>')