Grant Fritchey’s SQL Server Howlers

We decided to ask various well-known SQL Server people to write about their favourite SQL Server Howlers. These are the common misunderstandings about how SQL Server works that end in tears, and plaintive forum questions. Grant Fritchey opens the series with some of his favourite howlers.

SQL Server is a pretty amazing piece of software, almost any way you look at it. Further, it comes with an amazing set of documentation in the Books Online (BOL). I’m going to say the following despite the fact that I publish books and articles for a living: you really don’t have to have any other documentation to run SQL Server reasonably well, in most instances and in most environments. Yeah, there are edge cases and extremes where the BOL can be inadequate, but by and large, they really do a decent job of explaining things. And then there are the howlers. These are the functions that… well, really they’re just not quite documented correctly or thoroughly. How do I know this? Because questions keep coming up in various forums and out on Twitter, the same questions on the same topics.

I ran DBCC CHECKDB and now my database is missing a lot of data. What do I do now?

This will be common amongst most of the howlers, but the truth is, you never get the complete story in the first question. This person did a consistency check of their database, and nothing else, and the database is now missing data? Possible, but highly unlikely. No, with investigation, read, additional questions, it’s determined that they ran CHECKDB twice. Once, which discovered errors, and a second time, to repair the errors found. Wait, did you say “repair”? “Yeah, we ran CHECKDB with REPAIR_ALLOW_DATA_LOSS because the business said they didn’t mind if they lost a row or two from the database.”

OK, I know that command actually says ALLOW_DATA_LOSS, and I know that the BOL says “These repairs can cause some data loss.” But you know what? It’s that word “some” that’s killing people out there. They equate it with “a few” or “very little” or “go ahead, what the heck could go wrong, you’re only going to lose SOME data.” Oh, right.

When you tell DBCC to perform the repairs and let it allow for data loss, it’s going to link back together pages any way it can. At a minimum, you’re likely to lose a page, which is 8 KB worth of data. Do the math on your own tables to figure out exactly how many rows that represents, but it’s certainly more than “a few.” And that’s the best case scenario. Worst case is that it has trouble linking together the pages on the table at all and you lose the whole thing.

What do you do instead? First, you identify the object that has the corrupted page. If it’s a non-clustered index, you can just drop it and recreate it (or run one of the REPAIR options other than allow data loss; they do the same thing). If it’s some other allocation error, there are other approaches, outlined nicely in this article by Gail Shaw. If it’s a heap table or a clustered index, then, yes, you’ve lost data. Time to go to your backup. What’s that? You don’t have good backups, or even worse, the backups have never been tested in any way and the corruption exists there as well… That’s a different problem entirely.
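As a sketch of the safer sequence (the database, table, and index names here are hypothetical; substitute the objects your CHECKDB errors actually point at):

```sql
-- Run a plain consistency check first and note which object and index
-- the error messages point at (they include object and index IDs).
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS;

-- If the corruption is confined to a nonclustered index (index_id > 1),
-- rebuilding it recreates the index from the intact base data: no data loss.
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD;

-- Re-check to confirm the errors are gone before doing anything else.
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS;
```

Only if the errors are in the base data itself (a heap or clustered index) does restoring from backup come into play.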

For even more advice on how to fix corruption issues, I usually go to the man who wrote DBCC, Paul Randal. This is a good starting point.

There is a warning in the BOL: “Use the REPAIR options only as a last resort.” But nothing else. If you don’t know what you’re doing and part of your database is unavailable due to a consistency error, you’ve gone to the one place you know to get help and it offers you these nice little statements that say REPAIR. Well, I’m at my wit’s end and I want to fix this, so this is my last resort.

The BOL needs to tell the truth. “These repairs can go through your database like the Mongol horde through a small village.” Or “These repairs can cause some data loss, where ‘some’ can equal the Greek national debt.” Then, maybe, people would be a little more leery of immediately running these statements.

I’ve implemented everything the Database Tuning Advisor told me to and the server is still slow. How can I speed this up?

The Database Tuning Advisor (DTA) is an attempt from Microsoft to put in place a mechanism to help people who just don’t have the time to learn all the ins & outs of performance tuning. Accidental DBAs all over the planet need a way to quickly tune their databases, but they don’t have all the knowledge they need. Here’s where the DTA enters the picture. There are just a few problems with the DTA.

First, the degree to which the DTA can help is completely limited by the quality of the input. People will frequently take a single query and run it through the DTA assuming that the values and parameters in that query are representative of their system as a whole, or just because that’s the one that seems to be running slow. Or, they’ll take a small set of trace data generated through a server-side trace (or through the Profiler GUI, more on that later), again, assuming that this is representative of their system. The problem is, you need to feed the DTA a lot of very good data in order for its recommendations to be worth much.

Second, people will connect the DTA to a development database instead of to one that closely mirrors their production environment. So there will be a lot less data, with radically different statistics, which causes the DTA to make choices based on bad information. If you want the DTA to figure out why a query or queries are having problems, you need to closely match the production environment in order to supply it with the best information you have.

Finally, people assume that the DTA is right. It isn’t always. As a matter of fact, it’s frequently wrong. You cannot simply put faith in the fact that the DTA is going to come up with a set of recommendations that are 100% accurate. You must take everything, everything, it recommends and test it. Validate those recommendations against your own systems in order to be sure they’re accurate.

The problem here is, if you open a support call to Microsoft, they will frequently require you to do two things: run the query, or queries, through the DTA, and implement everything it says, before they will begin troubleshooting your problem. That’s just flat out wrong of Microsoft Support.

Alternatives? First and foremost, test everything it suggests to you. Second, learn about query tuning and fix the problem queries yourself, or hire someone who does that.
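A simple way to test a suggestion (the query and index below are made up for illustration) is to capture I/O and timing figures before and after, on a test system:

```sql
-- Capture a baseline for the problem query before adding anything.
SET STATISTICS IO, TIME ON;

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;

-- Create the DTA-suggested index on a test copy of the database...
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate);

-- ...then rerun the query and compare the logical reads and elapsed time.
-- Don't forget to measure your INSERT/UPDATE/DELETE workload too, since
-- every new index adds write overhead.
```

If the logical reads don’t drop meaningfully, or the write workload suffers, the recommendation didn’t earn its keep.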

What does the BOL say? Anything about testing? Not a word. This is as close as I can find to a caution, “Your specific environment and implementation of SQL Server determine the exact issues you must consider when using Database Engine Tuning Advisor.” And that’s it. Nothing like, “Adding multiple manually generated sets of statistics onto one table, maybe, needs some testing to validate that it’ll actually do anything to help your performance” or “Just because we suggested that you add this index with four, rather large INCLUDE columns to help your SELECT query, doesn’t mean we knew about your INSERT query that’s going to be negatively impacted.”

Ever since we implemented the NOLOCK hint on all our queries, things are faster, but some of the users are complaining about the quality of the data. Why is that?

In certain areas, on certain discussion forums, on certain blogs, the NOLOCK hint has become equivalent to a RUN_FASTER hint. It just isn’t true. The issue with NOLOCK is that it allows dirty reads. The example that comes to everyone’s mind is, “I’ve started a transaction to change the value on this column from ‘dog’ to ‘cat’ and because my transaction hasn’t committed, people can see the value ‘dog’ even though I’m changing it to ‘cat’ and I don’t care.” And you know what, if that’s all there was to it, you’d be right. You wouldn’t need to care. But, as I’m sure you’ve already guessed if you’ve read this far, that’s not all there is to it.

The reason NOLOCK appears to make queries run faster is because it won’t take out shared locks, which means when modifications to a page are occurring, your query just keeps going, reading what it can, avoiding contention, reducing memory consumption just a tiny bit, and running faster. All good, right? Wrong. Remember, you’re reading from a page, not from a row. Ever heard of page splits? It’s when data that fits on a single page is moved onto two pages because otherwise it would have over-filled the page, some of the rows are rearranged, etc. Now, think about the fact that we’re not putting a shared lock on that page. Now think about the fact that you’re trying to read rows from a page that suddenly splits, you don’t have locks on it, and the data disappears. Now your query returns fewer rows than it ought to. Or, you’re reading from multiple pages as page splits occur and suddenly you’ve read the same rows more than once (again, you’re not locking them, so if they move out ahead of you, you’ll never know) and now you have duplicate data. I don’t know about your business, but most businesses are going to get upset when, every so often, completely randomly, they lose rows or duplicate rows. What’s that? You don’t care? Cool, then no complaining when it’s your paycheck that misses payroll because it was lost to your NOLOCK hint. Oh, and one more: if a row gets deleted just as you’re reading it, you get an error. So a benign query that does nothing wrong, other than having NOLOCK, just dies for no good reason.

The BOL gets this one exactly right:

Allowing dirty reads can cause higher concurrency, but at the cost of reading data modifications that then are rolled back by other transactions. This may generate errors for your transaction, present users with data that was never committed, or cause users to see records twice (or not at all).

But for some reason, people just don’t listen or don’t read. Maybe a flashing neon sign about records duplicating or being missing is required.

What do you do instead? First, there are edge cases where NOLOCK works. But, those exceptional situations should be just that, exceptional. I’ve seen and heard about places where NOLOCK is put on every query, even people trying desperately to put it on INSERT/UPDATE/DELETE queries (guys, you can’t modify data without locking, sorry). Don’t use it everywhere. You’re just begging for problems. Most of your data doesn’t update? Great, move it to a different filegroup and make it read only. Then you won’t get any locking either. Can’t do that but still dealing with contention? Then you ought to look into READ_COMMITTED_SNAPSHOT for your isolation level and take advantage of what versioned rows can do for you.
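For example, switching a database to row versioning (the database name is a placeholder; test this first, and note the switch needs exclusive access to complete) looks like:

```sql
-- READ_COMMITTED_SNAPSHOT makes ordinary READ COMMITTED queries read the
-- last committed version of a row instead of blocking on writers' locks,
-- so readers don't block writers and writers don't block readers.
-- Note: row versions are kept in tempdb, so watch tempdb sizing.
ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;  -- rolls back open transactions so the switch can complete
```

Unlike NOLOCK, versioned reads are consistent: no skipped rows, no duplicates, no spurious errors.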

I know that some of you just won’t believe that your magic “run faster” switch is dangerous, so here’s some further reading on the topic.

What are the limits on the number of indexes you can put on a table? I’ve been implementing all the indexes suggested in the Missing Index DMOs and I’m concerned I might hit that limit.

Really? Just out of curiosity, pick any one of the missing indexes indicated in the DMOs and tell me which query it’s for? Take your time. I’ll wait… That’s right. There is absolutely no way to know which query you just “tuned” by adding an index.

Let’s talk about statistics for a moment. It’s the statistics on your tables and indexes that drive the query optimizer into making certain choices. Statistics are just a sampling of the data in your database. They can be created by doing a full scan of the data which is then distributed into 200 little buckets that describe your multi-million row table, or they can be a more or less random sampling of that same table split into 200 little buckets. In other words, the statistics can be estimates. Further, statistics are only updated when a sufficient quantity of data changes.

Why do you care about statistics? You just want to create some indexes, right? Here’s the deal: because the statistics can be out of date or can be inaccurate estimates, the resulting index suggestions in the missing index tables can be flat out wrong.

Further, as I said at the beginning, show me which queries you’re building indexes for. It matters. The missing index information could be for a query that gets run once a month, takes about 90 seconds to complete and doesn’t bother anyone, but you adding the suggested index could negatively impact other queries that are called all day long.

So what am I saying? I’m saying that simply slapping an index on the table doesn’t make sense. You need to know what you are doing to the system and why you are doing it. Rather than simply take a chance, you should investigate and test the indexes suggested. Testing is fairly simple: add a single index at a time and, if possible, in an offline system, separate from your production environment, test the impact of that index. Investigation is slightly more work.

First, you need to check the suggested index to see if it duplicates the functionality of existing indexes or if an existing index can be modified. I’ve seen the missing indexes suggest an index on ColA with an INCLUDE on ColB. This is immediately followed by a suggestion for an index on ColA with an INCLUDE on ColB and ColC. The fact is, we don’t need both those indexes. Assuming we need them at all (see testing above), the second suggestion will satisfy the first. Second, you should look into using XQuery against the missing index information in execution plans to connect a missing index to a query. I have a blog post here that creates the foundation for this type of querying. You can also read some of the work by Jonathan Kehayias.
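As a rough sketch of that approach (adapt the filtering to your own workload), you can search the plan cache for plans that carry a MissingIndexes element and return the query text alongside the suggestion, tying each recommendation back to an actual query:

```sql
-- Find cached plans that contain missing index suggestions, so each
-- suggestion can be traced back to the query that generated it.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       st.text        AS query_text,
       qp.query_plan  AS plan_with_suggestion
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//MissingIndexes') = 1;
```

Be aware that querying the plan cache with XQuery is itself expensive, so run it off-hours or against a filtered subset of plans.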

We keep rebuilding our indexes and updating our statistics, but the queries are still running slow. Can someone help?

This one is fun, you can read and reread that sentence and still not figure out what’s going on for a while, or, it’ll jump right out at you.

When you rebuild an index, effectively it is dropped and recreated (that’s not entirely true, otherwise, you’d just drop and recreate the index, not rebuild it, but go with me). In doing so, all the data in the table directly related to that index is examined and stored as part of the index. Along with it, a brand new set of statistics is created, based on this full examination of the data, a full scan.

At this point, what you have is a new index, completely defragmented, and a thoroughly up-to-date set of statistics… then, you run sp_updatestats, which will go through the whole table, determining if any data has changed and the statistics need to be updated. “Any data” reads: one row. So if even a single row has been modified since the last time the statistics were updated (which you just finished doing), you’re going to get new statistics built. Ah, but how are they built? They’re built by sampling the data. That’s what sp_updatestats does by default.

In short, you built a brand new, fully scanned, set of statistics for your indexes, which should help problematic execution plans, and then you’re replacing some or all of them with a sampled set of statistics which, because they’re less accurate, could hurt performance.

The problem is, the Maintenance Plan Wizard doesn’t give you any guidance on this at all. Each of these steps is well described, but there is no indication that the order of the operations matters. The wizard absolutely has to have some kind of warning, like the ones about mixing drugs, in order to prevent people making this mistake.

You need to completely separate these two operations. If you’re rebuilding indexes, don’t update those statistics because they’re up to date. If you’re using Maintenance Plans, instead of using the wizard you might need to drill down and build the plans yourself in order to make the manipulation of statistics dependent on the previous step. If you’re interested in an even better way to do this, I’d suggest reading up on Michelle Ufford’s database maintenance scripts.
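In T-SQL terms (the table and index names are hypothetical), the separation looks like this:

```sql
-- Rebuilding an index refreshes its statistics with the equivalent of a
-- full scan, so those statistics need no further attention.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;

-- Don't follow the rebuilds with a blanket sp_updatestats. If smaller
-- tables or column statistics weren't touched by any rebuild, update
-- just those, explicitly:
UPDATE STATISTICS dbo.RefCodes WITH FULLSCAN;
```

The point is that a post-rebuild sp_updatestats can replace the full-scan statistics the rebuild just produced with less accurate sampled ones.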

Our server was running slow, so I started watching the queries with Profiler and now I’m starting to get errors, why would that be?

So you’ve pointed the Profiler GUI at your production environment, which was already under stress, and now things are worse. This is why language is so important.

All over the place, on blogs, in books, in the BOL, the suggestion is, if you need to monitor query performance you should use Profiler to gather that information. And that’s right, as far as it goes. But there is a problem. The Profiler GUI puts a different kind of lock on the system to read the information out than you’d get if you set up a server-side trace (a scripted trace, doing the same thing as the GUI, but with T-SQL). Microsoft has it well documented here, but not documented at all in the BOL. In fact, here’s the opening description for the Profiler documentation:

Microsoft SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the Database Engine or Analysis Services. You can capture and save data about each event to a file or table to analyze later. For example, you can monitor a production environment to see which stored procedures are affecting performance by executing too slowly.

For example you can connect this to production? No, for example, you can’t. For example, you really should use the GUI for browsing results and for combining Performance Monitor data with the trace output data, but both from files. You shouldn’t be pointing this at production at all.
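A minimal server-side trace looks something like this (the file path is hypothetical, and the events and columns you capture are up to you; SQL:BatchCompleted with TextData and Duration are shown as examples):

```sql
DECLARE @TraceID int,
        @maxsize bigint = 100,  -- maximum trace file size in MB
        @on bit = 1;

-- Create the trace, writing to a file on the server instead of
-- streaming rows back to a GUI.
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\QueryPerf', @maxsize, NULL;

-- Event 12 = SQL:BatchCompleted; column 1 = TextData, column 13 = Duration.
EXEC sp_trace_setevent @TraceID, 12, 1, @on;
EXEC sp_trace_setevent @TraceID, 12, 13, @on;

-- Start the trace; open the .trc file in the Profiler GUI later, offline.
EXEC sp_trace_setstatus @TraceID, 1;
```

Remember to stop (status 0) and close (status 2) the trace when you’re done, or it keeps running.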


So that’s my collection of howlers. I again want to emphasize that I like the Books Online and use them constantly, all day long. They are an excellent resource of material for what to do in SQL Server. There are just some gaps there that need to be filled in to avoid these pernicious errors.

  • Deliverance


    Fantastic article, but nolock has its place
    As always it was a fantastic article.

But I think your warnings about nolock might be tempered a bit. It is true that nolock risks dirty reads, and that phrase encompasses more severe problems than many people realize. I also know that nolock is over-recommended, and often recommended without warnings about the problems it can cause.

But it does have its place. Over the years it has saved me hours when a table was going through a massive update or other operation that was going to keep it locked up for extended periods of time when I needed to pull out test data, or when I for various reasons needed to test queries against the live server (never a good situation to be in, but it comes up at times, especially in small companies that might not have as robust a test environment as they would like).

    I have even used it to pull real reports for management when the table was locked and they were in a hurry, but only when it was aggregates where a rough order of magnitude was enough and even losing or adding dozens of rows would not have made a difference for their purposes.

    I agree it must be used with extreme caution and should only be used with a full understanding of the risks. But it has still been extremely helpful to me in a number of cases, especially when working with companies where the test and reporting infrastructure was not as robust as we would have liked.

  • Grant Fritchey

    Excellent comments. Thank you.

I agree. It has its place. It can be useful and functional. But, I think the dangers of using it are very high and grossly understated. That’s why so many people use it inappropriately.

    I’m pretty much anti-hint anyway. I think query hints are frequently a sign of a failure to tune in other ways. That said, when you need them, you need them. This is the same.

    Do I over-state my case? Probably. But since it’s been previously under-stated…

  • bernie black

    Good Stuff.
I wasn’t even aware of page splitting and NOLOCK. And now you’ve put a bit of religious guilt about using Profiler on production. Great article!

  • Elizabeth

    awesome explanations
    Thanks for writing this. In some of your sections, I knew the “right” answer, but I didn’t have a very good grasp of the “why” behind the right answer. You did a great job of explaining concepts quickly. Thanks!

  • GrumpyOldDBA

    Real Time Diagnostics
Whilst I sort of agree with the comments about Profiler, “after the event” diagnostics are not half as effective as real time. To be honest, if your system cannot support some level of real-time diagnostics then most likely the server is under-resourced.
However, what you don’t mention is that usually on a busy system Profiler (GUI) can’t actually keep up; it certainly can’t on systems I support, so you’re forced to use filters, which should reduce the overhead. It’s all well and good talking about how you should do things in a classroom-guide way, but the reality is often different if you have a badly performing critical system and a bunch of directors/managers ranting and raving around you. (I can’t talk specifics for obvious reasons!) I admit that I generally use Profiler only for specific tuning out of the production environment, but it does have its place, that’s all I’m saying.

  • GrumpyOldDBA

I hadn’t heard the CHECKDB one before; I’ll store that one away for future use! Thanks, good article. Sorry, I should have said so in my previous comment. Apologies for the spelling mistakes.

  • Frank Hell



  • Grant Fritchey

Ooh, typos. I’ll ignore everything you say now… kidding.

    You’re right on both points. Filtering the criteria in Trace is going to reduce the load and having your cube filled with management means you start doing stuff you might not normally do.

    But, setting up scripting on trace or, better still, using extended events, ahead of time should allow you to simply avoid the issues that might arise.

    Again, I do agree, but it’s always best to understand the dangers of the choices you make, and Profiler’s dangers aren’t terribly well documented, so I like to caution people.

  • Grant Fritchey

    Frank Hell
    Yes, READ UNCOMMITTED is just as error prone as NOLOCK. They’re the same thing really. READ UNCOMMITTED is just a way to set that behavior for the connection rather than for each individual table.

  • ktrock

    Great Article
Must admit I’ve been updating statistics in some cases on the same tables that just got reindexed in a maintenance routine. OK, it stops here. I’m going to risk having out-of-date statistics for small reference tables that may not get reindexed rather than have only sampled stats on large data tables that have just had their indices rebuilt.


  • Ray Herring

    Update Stats
I think it was Kimberly, but it may have been Paul Randal, who presented a session about index maintenance. The recommendations included updating statistics PRIOR to rebuilding/reorganizing an index. As I recall, one of the reasons was that SQL uses statistics to determine how to parallelize the index operation, and if stats are out of date the plan can be very inefficient. I may have the detail wrong but that’s what I remember. I have found it a useful habit.
    As for No Lock (argghh) I am dealing with several different vendors of “top line” business software (HR, Payroll, AP, AR, etc) who insist it is not just advisable but necessary. One of them has No Lock on 99+% of their production queries and 100% of their reporting queries.
    Makes you wonder why we have a support call every 2 – 3 weeks to ‘fix’ the data. 🙂

  • ianstirk

    You can link the missing indexes to the underlying queries…
    Hi Grant,

    It is possible to link the missing indexes with queries…

You can search the cached plans for the text &lt;MissingIndexes&gt;, together with the names of the missing index columns (see this article for how to search the cached plans). The cached plans are reflections of the underlying SQL source code.


    Author of: SQL Server DMVs in Action


  • Grant Fritchey

    Linking queries to missing indexes
    Only way to do this that I know of is to use the cached plans.

  • Grant Fritchey

    Update Stats
    Yeah, I think paying the price of updating statistics twice, once when you update them and the second time if the index is rebuilt, is less costly than dumping the nice clean stats provided by the index rebuild.

    Check Michelle’s script out. It handles these problems nicely.

  • steveh99999

    sp_updatestats threshold for change


    I’m interested in your comment ‘sp_updatestats… Which will go through the whole table, determining if any data has changed and the statistics need to be updated. “Any data” reads, one row.
Is it documented anywhere officially by MS that the threshold is one row?
I should say that I believe you, but it does surprise me that the threshold is so low…

  • Grant.Fritchey

    re: sp_updatestats
    I’m not sure where it is in the documentation, but if you look at the code for sp_updatestats, it looks for a rowmodctr <> 0. So, you only need a single row modified to trigger the event.