15 May 2013

The DBA Detective: The Case of the Missing Index

When problems arise in SQL Server, we're faced with a server full of suspects, including disk I/O, memory, CPU, incorrect or missing indexes, badly written T-SQL code, out of date statistics, and full disk drives. All of these have motive and opportunity to murder the performance of our databases, and it's the DBA's job to collar the culprit quickly, without relying on luck or heroics.

The Interminable Hourglass

‘Joe Dee Beeay?’

I looked up. The cubicle smelled of old coffee, dust from old technical manuals, the tang of heated electronics and fear.

‘Yeah?’ Something in the boss’s voice told me that he hadn’t come to tell me I was doing good work.


‘The application’s had a wet job. It’s dead meat.’

The application, THE application, you know the one. The beautiful one, the one everyone wants. She had the habit, occasionally, of running as if she’d had her hair parted by lead piping. Some in the business say she’s brought her troubles on herself, but it’s just the way she was built. Sure, she’s a little older now, and is showing the signs. She’s been beat down and roughed up more times than I care to remember, and it shows, but she was still The Application, and she needed me.

The Fat Man’s voice was calm, but he had sweat on his brow.

‘What’s the layout?’ I asked.

‘Dunno. Just sort it out, Joe, please.’

I wasn’t worried until I heard the word ‘please’. Maybe, too, it was that humorless smile he had before he stumped out of the room that made me wince. I knew what he was thinking: ‘There are no bad DBAs. There are only some DBAs that aren’t as good as others.’

Now what?

I tapped the spacebar, lost in gloomy thought. I looked out the window. Suddenly the world seemed dark with more than the night.

Every possible kind of problem had happened to this application over the years. She’s been put back together so many times it’s surprising that she’s still so desired and attractive instead of looking like Frankenstein’s monster on the outside as well as the inside. Regardless, Joe Muggins had to identify the problem.

False Dawn


‘Better go see Dawn in Accounts,’ I thought; she knows when things aren’t right and likes to give bad news. Dawn was a dame with an ice-cold stare. From thirty feet away she looked like a lot of class. From ten feet away she looked like something made up to be seen from thirty feet away. She pouted when I told her I was on a mission to fix The Application, sipped from her plastic coffee-cup, and then sang like a canary. Well, like a canary who wanted to upset her audience. She told me that the issue was happening right now, but had also happened last night and, evidently, earlier in the week.

‘Thanks’, I muttered. ‘You’re welcome’, she replied, meaning I was as welcome as an over-packed bin-liner.

I trudged moodily back to my pigpen in IT. No one had thought it was important enough to inform the DBA team. If only we’d put some monitoring software in place to identify issues when they were happening, instead of relying on other people, with other concerns and problems, to get the information to the team.

I reached for my faithful SSMS. “Hmm. Signs of life in the old crone?” Waiting for the connection to complete felt like waiting for the next blow to land in a beating. Success! The server was online and accepting connections; that was a positive sign. Without that reassuring open query-window, I’d have had to resort to the low-down, dirty trick of using the Dedicated Administrator Connection to elbow my way onto the server. Also, if the server wasn’t accepting connections or, worse yet, was offline, I’d have had to trust that I could revive the cold corpse by restoring from backup. There were backups running, but, hell, when was the last time we’d verified them?

I shook my head, trying to shake loose that nagging doubt about those unverified backups. Something to check once this crisis is past.
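Not that the check itself is hard; a minimal sketch, assuming a full backup sitting at a path and file name I’ve invented here:

    RESTORE VERIFYONLY
    FROM DISK = N'D:\Backups\TheApplication_FULL.bak'
    WITH CHECKSUM;  -- CHECKSUM only works if the backup was taken WITH CHECKSUM

It proves the backup file is readable and complete; it doesn’t prove the data inside is any good, which is why a periodic test restore still belongs on the list.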

Once I’d connected to the server, a wild world of lash-ups and software harvester-tape opened up to me. I shuddered. A DBA’s work is sometimes unsavory. Where to first? Travel down to the grit and dirt of tempdb? Now there’s a dive where some of the worst of the low-life lurk. Look for the rot within the glitz-and-glamour of the CPU? Walk the mean streets of the I/O processes? Or just round up the usual suspects by taking a look at the currently running queries?

Let’s start with the usual suspects, I decided.

The Usual Suspects

Sure, I could look at wait statistics to understand specifically what was causing the server to slow down. It’s a great metric for understanding the system as a whole. I could maybe run a query against sys.dm_os_wait_stats and order by the number of currently waiting tasks, the cumulative wait time or the max wait time to see what, in general, was causing the server to run slow. Yeah, that might flush the critters out.
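If I went that route, something along these lines would do it; a rough sketch, where the short list of harmless system waits to filter out is mine and nowhere near complete:

    SELECT  wait_type,
            waiting_tasks_count,
            wait_time_ms,
            max_wait_time_ms,
            signal_wait_time_ms
    FROM    sys.dm_os_wait_stats
    WHERE   wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SLEEP_TASK',
                              N'SQLTRACE_BUFFER_FLUSH', N'BROKER_TASK_STOP')
    ORDER BY wait_time_ms DESC;  -- or by waiting_tasks_count, or max_wait_time_ms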

I stared moodily at that empty query window. Nah. C’mon Joe, since we’re only getting calls from a single team on a specific application, the other approach, seeing what’s running on the server, might stir the pot more.

To see what’s currently running on the server, I could run a query against the Dynamic Management Object (DMO) sys.dm_exec_requests. I gave the server a burst of SQL through my trusty SSMS:
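Nothing fancy; something like this straight, unfiltered pull from the DMO is the shape of it:

    SELECT  *
    FROM    sys.dm_exec_requests AS der;  -- every active request on the instance, warts and all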

The server babbled out its secrets as though I’d promised it a nice waterboarding holiday. I saw immediately a full listing of all sessions running on the system. I had useful information at my fingertips, along with details that might, or might not, prove useful later. To narrow down to some of the more immediately useful data, I modified the query like this:
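Or near enough like this; the exact column list is a reconstruction, with the system sessions filtered out and the longest-running requests floated to the top:

    SELECT  der.session_id,
            der.status,
            der.start_time,
            der.command,
            der.wait_type,
            der.wait_time,
            der.cpu_time,
            der.logical_reads,
            der.total_elapsed_time
    FROM    sys.dm_exec_requests AS der
    WHERE   der.session_id > 50  -- leave the system sessions out of the line-up
    ORDER BY der.total_elapsed_time DESC;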

One session stood out in the results, about as inconspicuous as a tarantula on a slice of angel food cake. It was running long and using resources like a fat man pursuing a lead bird.

What I needed now was the text of the query this session was running, and its execution plan. I started to write another query, thinking there must be an easier way to pull all this information together in a hurry.
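The query I was scribbling ran along these lines; the CROSS APPLYs pull back the statement text and the plan in one pass, and the suspect’s session number is invented for the example:

    SELECT  der.session_id,
            dest.text AS query_text,
            deqp.query_plan
    FROM    sys.dm_exec_requests AS der
            CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS dest
            CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) AS deqp
    WHERE   der.session_id = 66;  -- the session that stood out; 66 is a stand-in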

I sat back in my chair with a creak and a whistle. I whistled, and the chair creaked, but it might have been the other way around.

No query should be running like this, I mused, and especially not this one. I tuned it recently, I remembered, and added a new index in order to make it more efficient.

Looking at the execution plan, I noticed a scan where an Index Seek used to be. Could something have happened to the index or might the statistics simply be out of date? Goddam it, was there a rogue table? I soon had my SSMS looking straight at that table.

Well, if I hadn’t homed straight in on a pretty sinister problem. That index I’d created earlier had vanished, done a runner, as if I’d never done all that tuning work. I don’t like seeing things like that. It makes the hair on the nape of my neck prickle. What else in the database had been given the big sleep? Things like that don’t vanish by themselves.

I swore. I needed to get that index back, and find out what else was missing. Compare with what’s in source control? I’m a production guy; I don’t put stuff in source control. Sure, it gets there eventually, but the devs do it on wet Fridays. There was all manner of healing stuff we’d done on that database that hadn’t been fed back into source control. Did I save my build script? I didn’t feel like answering that question, but hurled my damp plastic coffee-cup over the top of the pigpen. It cheered me a bit.

I smiled grimly. Wouldn’t it be a fine time to find that the backup that ran just after my database-tuning session had failed or was corrupt? I’d be able to find out, if only I had a place to restore the database. I’d always meant to get around to using compressed backups.

I didn’t want to eat up more disk space on the production server with the restore; I’d have to do it somewhere else and, luckily, the QA team wasn’t around to object. This was a production issue, and even if they were screaming from their pigpens, I’d do it. When they’re slapped, they’ll take it and like it. It is not a fragrant world.

If only there was a way to directly access the backup files to either pull the code out or simply connect them to the server without having to use so much disk space.

I went to the first backup, last night’s. After setting up the restore process, I had a moment to reflect on the cruelty of the human condition while the restore operation ran. A rush of errors on the screen soon told me I was about to experience it first-hand. Last night’s backup really was corrupt.

I sat down. It was a good start, but it didn’t go far enough. I ought to have turned out the light and hidden under the desk.

The Screen Painted Red

How many backups do they keep locally for this server? Ah, it goes back three days. No need, at least, to try to contact their offsite storage company to retrieve a backup. Crossing my fingers, I tried restoring the older database. It worked. I browsed down to the index and exported it to a query window.
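Restoring it somewhere other than production was just a matter of pointing the data and log files at a drive with room to spare; a sketch of the sort of thing, with every name and path here invented:

    RESTORE DATABASE TheApplication_Restored
    FROM DISK = N'D:\Backups\TheApplication_FULL_20130512.bak'
    WITH MOVE N'TheApplication_Data' TO N'E:\QAData\TheApplication_Restored.mdf',
         MOVE N'TheApplication_Log'  TO N'E:\QAData\TheApplication_Restored.ldf',
         STATS = 10;  -- progress messages every 10 per cent, so I know it's still breathing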

Quick as a rabbit, I changed the connection to the production server (change requests are for those squares who shall inherit the earth eventually) and fired off the script to recreate the necessary index. I waited, and waited. Suddenly, I felt like a fugitive from the laughing house; the screen was painted red like the lips of a cheap woman as error messages popped up. There was no space on the hard drive for the index.

I knew that whatever malicious spirits inhabit our datacenter had conspired to ensure that we were running low on storage but, heck, I didn’t know we were that low. Then it all clicked. Wasn’t there a message from earlier this week? One of the junior DBAs, Timmy, had dealt with an issue where a drive had run out of space overnight. He’d fixed it somehow, but I hadn’t been watching that closely; he had problems of his own.

The Smoking Gun

Digging back through the email, there it was: the smoking gun. The drive had run out of room, and the junior DBA had decided that the new index wasn’t needed and had dropped it like a murder weapon over the side of a bridge. There is no trap for a DBA so deadly as the trap set by an idiot colleague.


I contacted the SAN admins and somehow managed to persuade them to get a new LUN allocated to the server. There I added a filegroup for the database and was able to recreate that index with room to spare. Now to have a conversation with that junior DBA that ought to have stuck at least four inches out of his back.
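For the record, the mechanics of that fix run roughly like this; the filegroup, file, table, column and index names are all stand-ins:

    ALTER DATABASE TheApplication
    ADD FILEGROUP IndexFG;

    ALTER DATABASE TheApplication
    ADD FILE ( NAME = N'TheApplication_Index01',
               FILENAME = N'G:\SQLData\TheApplication_Index01.ndf',
               SIZE = 10GB,
               FILEGROWTH = 1GB )
    TO FILEGROUP IndexFG;

    -- recreate the dropped index on the new filegroup, where there's room to spare
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    ON IndexFG;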

The End?

The fat man seemed pleased as punch. A punch in the mouth. He decided against it. Instead he gave me that smile again.


“OK Joe. So, plenty of heroics to retrieve this situation, but what caused it?”

Hmm. That’s a hard one. “The application’s no good.”

“My life’s no good, but I’m stuck with it.” I didn’t like the look he gave me. “Instead of being told by the business people that there was a problem, we should have found out ourselves, before them, and in more detail. There might have been several applications complaining about a slow server, meaning bottlenecks in the hardware or misconfiguration of the server. You got lucky. Some up-front planning, with the right software tools, might have entailed fewer heroics. I need fewer heroes and less luck.”

I shrugged. I’m like any good detective; I gotta go where the clues lead me. A true DBA Detective relies on his trusty SSMS and shoots from the hip. Or, maybe the fat guy had a point.

When problems arise in SQL Server, we’re faced with a server full of suspects, including disk I/O, memory, CPU, incorrect or missing indexes, badly written T-SQL code, out of date statistics, and full disk drives. All of these have motive and opportunity to murder the performance of our databases, and it’s our job to collar the culprit, quickly.

It requires a methodical, level-headed approach, a refusal to jump to conclusions until all of the facts and clues are exposed, and the right set of tools to get the job done and increase our luck.

First published as lesson 1 of the DBA Team series on the Red-Gate.com site. See lesson 4, ‘Disturbing Developments’, for the continuing saga of Joe Dee Beeay.





Grant Fritchey is a SQL Server MVP with over 20 years’ experience in IT including time spent in support and development. Grant has worked with SQL Server since version 6.0 back in 1995. He has developed in VB, VB.Net, C# and Java. Grant has authored books for Apress and Simple-Talk, and joined Red Gate as a Product Evangelist in January 2011. Find Grant on Twitter @GFritchey or on his blog.


  • mbourgon

    Which product for mounting backups?
Grant, just curious, but what product were you talking about when you mentioned "directly access the backup files"? I’d have said Red-Gate’s SQL Virtual Restore, except it’s been retired. I know Idera’s Virtual DB will do it, as will the 7.0.4 version of Litespeed. Is there one you like over the others? Thanks.

  • Phil Factor

    Re: Which product…
    My reading of the story is that Joe was trying to get to the source in order to get that index back, and find out what else was missing. He then admits that the previous version wasn’t in source control. Had he been ‘tooled up’, he could have directly accessed the backup files quickly with SQL Compare and got the entire script out without doing any kind of a restore.

  • Grant Fritchey

    Access to backup files
    Well, when I wrote this, we still had Virtual Restore. I was talking about that, yes, but also what Phil says. You can compare directly to the backup file from SQL Compare.

    • mbourgon

      Re: Access to backup files
Son of a… Never thought about comparing backups to prod. Have to go see if that’s Schema Compare and Data Compare. Cool, thanks, gents!

  • Phil Factor

    Re: Comparing backup to prod…
Well, I’ve never been allowed to compare anything to prod, but you can compare two backups directly, or compare a backup to a scripts folder (source control). It is fine running in command-line mode to do this, so you can catch all sorts of changes.
See ‘Source Control for Bandits and Anarchists’ by Grant Fritchey, the man himself. https://www.simple-talk.com/sql/database-administration/auditing-ddl-changes-in-sql-server-databases/

  • mbourgon

    Ha! Indeed. (Plus, Event Notifications for the win)
Phil – thanks for that link! Very nice, Grant. We’re doing that right now, though we’re finding it takes several hours to do all the servers we care about. However, we hadn’t tried using backups, and maybe the way you’re invoking it runs faster than the way we are.

    But, indeed, that’s a great way to solve that problem. We’re using Event Notifications (http://thebakingdba.blogspot.com/2012/10/master-of-all-i-survey-using-event.html) on 50 servers to get real-time notification of changes. If my session gets accepted for PASS, I’ll use that as an excuse to finish up my check-into-TFS-in-real-time functionality. The eventual plan is weekly full checkins (in case something gets missed), and near-live checkins with info about who did it, emails, etc.

  • Grant Fritchey

    If you do present that at PASS, scream at me to go and see it. I’m always trying to learn new and better ways to automate deployment and development.

  • RWims

    What troubleshooting is
    What a great way to present what could be the dry topic of performance troubleshooting! Raymond Chandler is smiling upon you from the Great Beyond.

  • James

    Great article.
    Great article! Very well written and extremely entertaining.

    Keep up the good work.

  • Rich

    Great article Grant, fun to read.

    Not sure Brent Ozar will appreciate being called an ox, though! (The link at the bottom of your article to Six Scary Surprises…)
(Ed: Now fixed, despite having a certain poetic resonance.)