Click here to monitor SSC
  • Av rating:
  • Total votes: 18
  • Total comments: 10
Grant Fritchey

The DBA Detective: The Case of the Missing Index

15 May 2013

When problems arise in SQL Server, we're faced with a server full of suspects, including disk I/O, memory, CPU, incorrect or missing indexes, badly written T-SQL code, out of date statistics, and full disk drives. All of these have motive and opportunity to murder the performance of our databases, and it's the DBA's job to collar the culprit, quickly without relying on luck or heroics.

The Interminable Hourglass

'Joe Dee Beeay?'

I looked up. The cubicle smelled of old coffee, dust from old technical manuals, the tang of heated electronics and fear.

'Yeah?' Something in the boss's voice told me that he hadn't come to tell me I was doing good work.

'The application's had a wet job. It's dead meat'

The application, THE application, you know the one. The beautiful one, the one everyone wants. She had the habit occasionally of running as if she had had her hair parted by lead piping. Some in the business say she's brought troubles on herself, but it's just the way she was built. Sure, she's a little older now, and is showing the signs. She's been beat down and roughed up more times than he cares to remember and it shows, but she was still The Application, and she needed me.

The Fat Man's voice was calm, but he had sweat on his brow.

'What's the layout?' I asked.

'Dunno. Just sort it out, Joe, please.'

'I wasn't worried until I heard the word 'please'. Maybe, too, it was that humorless smile he had before he stumped out of the room that made me wince. I know what he was thinking: 'There are no bad DBAs. There are only some DBAs that aren't as good as others.'

Now what?

I tapped the spacebar, lost in gloomy thought. I looked out the window. Suddenly the world seemed dark with more than the night.

Every possible kind of problem had happened to this application over the years. She's been put back together so many times it's surprising that she's still so desired and attractive instead of looking like Frankenstein's monster on the outside as well as the inside. Regardless, Joe Muggins had to identify the problem.

False Dawn

 Better go see Dawn in Accounts' she knows when things aren't right and likes to give bad news. Dawn was a dame with an ice-cold stare. From thirty feet away she looked like a lot of class. From ten feet away she looked like something made up to be seen from 30 feet away. She pouted when I told her I was on a mission to fix The Application, sipped from her plastic coffee-cup, and then sang like a canary. Well, like a canary who wanted to upset her audience. She told me that the issue was happening currently, but had also happened last night and, evidently earlier in the week.

'Thanks', I muttered. 'You're welcome', she replied, meaning I was as welcome as an over-packed bin-liner.

I trudged moodily back to my pigpen in IT. No one had thought it was important enough to inform the DBA team. If only we'd put some monitoring software in place to identify issues when they were happening, instead of relying on other people, with other concerns and problems, to get the information to the team.

I reached for my faithful SSMS. "Hmm. Signs of life in the old crone?" Waiting for the connection to complete seemed as if I was  waiting for that next blow to land in a beating. Success! the server is online and accepting connections; That's a positive sign. Without that reassuring open query-window, I'd have had to resort to the lo-down, dirty trick of using the Dedicated Administrator Connection to elbow my way onto the server. Also, if the server wasn't accepting connections or, worse yet, was offline, I'd have had to trust that I could revive the cold corpse by restoring from backup. There were backups running, but, hell, when was the last time we'd verified them?

I shook my head, trying to shake loose that nagging doubt about those unverified backups. Something to check once this crisis is past.

Once I'd connected to the server, a wild world of lash-ups and software harvester-tape opened up me. I shuddered. A DBA's work is sometimes unsavory. Where first? Travel down to the grit and dirt of the tempdb? Now there's a dive where some of the worst of the low-life lurk. Look for the rot within the glizt-and-glammer of the CPU? Walk the mean streets of the I/O processes? Or just round up the usual suspects by taking a look at the currently running queries?

Let's start with the usual suspects, I decided.

The Usual Suspects

Sure, I could look at wait statistics to understand specifically what is causing the server to slow down. It's a great metric for understanding the system as a whole. I could maybe run a query against sys.dm_os_wait_stats and order by the number of currently waiting tasks, the cumulative wait time or the max wait time to see what, in general is causing the server to run slow. Yeah, that might flush the critters out.

I stared moodily at that empty query window. Nah. C'mon Joe, since we're only getting calls from a single team on a specific application, the other approach, seeing what's running on the server, might stir the pot more.

To see what's currently running on the server, I could run a query against the Dynamic Management Object (DMO) sys.dm_exec_requests. I gave the server a burst of SQL through my trusty SSMS:

SELECT  *

FROM    sys.dm_exec_requests;

The server babbled out its secrets as though I'd promised it a nice waterboarding holiday. I saw immediately a full listing of all sessions running on the system. I had useful information at my fingertips, and then details that may, or may not, prove useful later. To narrow down to some of the more immediately useful data, I modified the query like this:

SELECT
 
  der.session_id , --internal identifier for the running session

    der.status ,     --determines if the query is active or waiting

    der.start_time , --gives you an idea when the query started

    der.command ,      --the type of command involved

    der.database_id ,  --which database you're connected to

    der.user_id ,      --which login is running the command

    der.blocking_session_id , --session id of blocking session

    der.wait_type ,    -- what is the waiting session it waiting on

    der.wait_time ,    --how long has it been waiting

    der.last_wait_type , --what caused it to last wait

    der.cpu_time ,      --how much of the CPU has been used

    der.total_elapsed_time , --how long has the command been running

    der.reads ,        --has the command hit the disk for information

    der.writes ,       --how much information was written to the disk

    der.logical_reads  --how many reads came out of memory

FROM    sys.dm_exec_requests AS der;

One session stood out in the results, about as inconspicuous as a tarantula on a slice of angel food cake. It was running long and using resources like a fat man pursuing a lead bird.

What I needed now was the text of this query this session was running, and its execution plan. I start to write another query, thinking there must be an easier way to pull all this information together in a hurry.

SELECT  SUBSTRING(dest.text, ( der.statement_start_offset / 2 ) + 1,

                  ( der.statement_end_offset - der.statement_start_offset )

                  / 2 + 1) ,

        deqp.query_plan

FROM    sys.dm_exec_requests AS der

        CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) AS deqp

        CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS dest

WHERE   der.session_id = 442;

I sat back in my chair with a creak and a whistle. I whistled, and the chair creaked, but it might have been the other way around.

No query should be running like this, I mused, and especially not this one. I tuned it recently, I remembered, and added a new index in order to make it more efficient.

Looking at the execution plan, I noticed a scan where an Index Seek used to be. Could something have happened to the index or might the statistics simply be out of date? Goddam it, was there a rogue table? I soon had my SSMS looking straight at that table.

Well, if I hadn't homed straight in on a pretty sinister problem. That index I'd created earlier had vanished, done a runner, as if I'd never done all that tuning work. I don't like seeing things like that. It makes the hair on the nape of my neck prickle. What else in the database had been given the big sleep? Things like that don't vanish by themselves.

I swore. I needed to get that index back, and find out what else was missing. Compare with what's in source control? I'm a production guy; I don't put stuff in source control. Sure, it gets there eventually, but the devs do it on wet Fridays. There was all sort of healing stuff we'd done on that database that hadn't been fed back into source control. Did I save my build script? I didn't feel like answering that question, but hurled my damp plastic coffee-cup over the top of the pigpen. It cheered me a bit.

I smiled grimly. Wouldn't it be a fine time to find that the backup that run just after my database-tuning session failed or was corrupt? I'd be able to find out if only I had a place to restore the database. I'd always meant to get to using compressed backups.

I didn't want to eat up more disk space on the production server, with the restore; I'd have to do it somewhere else and, luckily, the QA team wasn't around to object. This is a production issue, and even if they were screaming from their pigpens, I'd do it. When they're slapped, they'll take it and like it. It is not a fragrant world.

If only there was a way to directly access the backup files to either pull the code out or simply connect them to the server without having to use so much disk space.

I went to the first backup, last night. After setting up the restore process, I had a moment to reflect on the cruelty of the human condition while the restore operation ran. A rush of errors on the screen soon told me I was about to experience this first-hand. Last night's backup really was corrupt.

I sat down. It was a good start, but it didn't go far enough. I ought to have turned out the light and hidden under the desk

The Screen Painted Red

How many backups do they keep locally for this server? Ah, it goes back three days. No need, at least, to try to contact their offsite storage company to retrieve a backup. Crossing my fingers, I tried restoring the older database. It worked. I browsed down to the index and exported it to a query window.

Quick as a rabbit, I changed the connection to the production server (change requests are for those squares who shall inherit the earth eventually) and fired off the script to recreate the necessary index. I waited, and waited. Suddenly, I felt like a fugitive from the laughing house; the screen is painted red like the lips of a cheap woman, as error messages pop up. There's no space on the hard drive for the index.

I knew that whatever malicious spirits inhabit our datacenter had conspired to ensure that were running low on storage but, heck, I didn't know we were that low. Then it all clicks. Wasn't there a message from earlier this week? One of the junior DBAs, Timmy, had dealt with an issue where they had run out of space on a drive overnight. He fixed it somehow, but I hadn't been watching that closely, he had problems of his own.

The Smoking Gun

Digging back through the email, there it is, the smoking gun, the drive had run out of room and the junior DBA had decided that the new index wasn't needed, had dropped it like a murder weapon over the side of a bridge. There is no trap for a DBA so deadly as the trap set by an idiot colleague.

I contacted the SAN admins and somehow managed to persuade them to get a new LUN allocated to the server. There I added a filegroup for the database and was able to recreate that index with room to spare. Now to have a conversation with that junior DBA that ought to have stuck at least four inches out of his back.

The End?

The fat man seemed pleased as punch. A punch in the mouth. He decided against it. Instead he gave me that smile again.

 

"OK Joe. So, plenty of heroics to retrieve this situation, but what caused it?"

Hmm. That's a hard one. "The application's no good"

"My life's no good, but I'm stuck with it." I didn't like the look he gave me. "Instead of being told by the business people that there was a problem, we should have found out ourselves before them, and found out in more detail. There might have been several applications complaining about a slow server, meaning bottlenecks in the hardware or misconfiguration of the server. You got lucky. Maybe some up-front planning, with the right software tools, might have entailed less heroics. I need fewer heroes and less luck"

I shrugged. I'm like any good detective; I gotta go where the clues lead me. A true DBA Detective relies on his trusty SSMS and shoots from the hip. Or, maybe the fat guy had a point.

When problems arise in SQL Server, we're faced with a server full of suspects, including disk I/O, memory, CPU, incorrect or missing indexes, badly written T-SQL code, out of date statistics, and full disk drives. All of these have motive and opportunity to murder the performance of our databases, and it's our job to collar the culprit, quickly.

It requires a methodical, levelheaded approach, a refusal to jump to conclusions until all of the facts and clues are exposed, and it requires the right set of tools to get the job done, and  increase my luck.

  First published as lesson 1 of the DBA Team series on the Red-Gate.com site. See lesson 4, 'Disturbing Developments', for the continuing saga of Joe Dee Beeay

Grant Fritchey

Author profile:

Grant Fritchey, SQL Server MVP, works for Red Gate Software as Product Evangelist. In his time as a DBA and developer, he has worked at three failed dot–coms, a major consulting company, a global bank and an international insurance & engineering company. Grant volunteers for the Professional Association of SQL Server Users (PASS). He is the author of the books SQL Server Execution Plans (Simple-Talk) and SQL Server 2008 Query Performance Tuning Distilled (Apress). He is one of the founding officers of the Southern New England SQL Server Users Group (SNESSUG) and it’s current president. He earned the nickname “The Scary DBA.” He even has an official name plate, and displays it proudly.

Search for other articles by Grant Fritchey

Rate this article:   Avg rating: from a total of 18 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Which product for mounting backups?
Posted by: mbourgon (view profile)
Posted on: Wednesday, May 15, 2013 at 9:36 PM
Message: Grant, just curious, but what product were you talking about when you mentioned "directly access the backup files"? I'd have said Red-Gate's SQL Virtual Restore, except it's been retired. I know Idera's Virtual DB will do it, as will the 7.0.4 version of Litespeed. Is there you you like over others? Thanks.

Subject: Re: Which product...
Posted by: Phil Factor (view profile)
Posted on: Thursday, May 16, 2013 at 2:35 AM
Message: My reading of the story is that Joe was trying to get to the source in order to get that index back, and find out what else was missing. He then admits that the previous version wasn't in source control. Had he been 'tooled up', he could have directly accessed the backup files quickly with SQL Compare and got the entire script out without doing any kind of a restore.

Subject: Access to backup files
Posted by: Grant Fritchey (view profile)
Posted on: Thursday, May 16, 2013 at 5:17 AM
Message: Well, when I wrote this, we still had Virtual Restore. I was talking about that, yes, but also what Phil says. You can compare directly to the backup file from SQL Compare.

Subject: Re: Access to backup files
Posted by: mbourgon (view profile)
Posted on: Thursday, May 16, 2013 at 7:02 PM
Message: Son of a... Never realized thought about comparing backups to prod. Have to go see if that's Schema Compare and Data Compare. Cool, thanks, gents!

Subject: Re: Comparing backup to prod...
Posted by: Phil Factor (view profile)
Posted on: Saturday, May 18, 2013 at 5:15 AM
Message: Well, I've never been allowed to compare anything to prod, but you can compare two backups directly, or compare backup to scripts folder (source control). It is fine running in command-line mode to do this and so you can catch all sorts of changes
See Source Control for Bandits and Anarchists' by Grant Fritchey, the man himself. https://www.simple-talk.com/sql/database-administration/auditing-ddl-changes-in-sql-server-databases/

Subject: Ha! Indeed. (Plus, Event Notifications for the win)
Posted by: mbourgon (view profile)
Posted on: Saturday, May 18, 2013 at 3:10 PM
Message: Phil - thanks for that link! Very nice, Grant. We're doing that right now, though we're finding it takes several hours to do all the servers we care about. However, hadn't tried using backups, and maybe the way you're invoking runs it faster than the way we are.

But, indeed, that's a great way to solve that problem. We're using Event Notifications (http://thebakingdba.blogspot.com/2012/10/master-of-all-i-survey-using-event.html) on 50 servers to get real-time notification of changes. If my session gets accepted for PASS, I'll use that as an excuse to finish up my check-into-TFS-in-real-time functionality. The eventual plan is weekly full checkins (in case something gets missed), and near-live checkins with info about who did it, emails, etc.

Subject: Automation
Posted by: Grant Fritchey (view profile)
Posted on: Sunday, May 19, 2013 at 5:09 AM
Message: If you do present that at PASS, scream at me to go and see it. I'm always trying to learn new and better ways to automate deployment and development.

Subject: What troubleshooting is
Posted by: RWims (not signed in)
Posted on: Monday, May 20, 2013 at 7:11 AM
Message: What a great way to present what could be the dry topic of performance troubleshooting! Raymond Chandler is smiling upon you from the Great Beyond.

Subject: Great article.
Posted by: James (not signed in)
Posted on: Tuesday, May 28, 2013 at 4:45 AM
Message: Great article! Very well written and extremely entertaining.

Keep up the good work.

Subject: Typo
Posted by: Rich (not signed in)
Posted on: Tuesday, May 28, 2013 at 6:34 AM
Message: Great article Grant, fun to read.

Not sure Brent Ozar will appreciate being called an ox, though! (The link at the bottom of your article to Six Scary Surprises...)
(Ed: Now fixed. despite having a certain poetic resonance. )

 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Highway to Database Recovery
 Discover the best backup and recovery articles on Simple-Talk, all in one place. Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.