Click here to monitor SSC

Rodney

DBA Decision #1 - New Hardware, Index Tuning or Application

Published Thursday, October 16, 2008 8:37 PM

So I am sitting here doctoring (not nursing) a beer listening to Jimmy Eat World and Nerf Herder and thinking about a huge decision I had to make this week.  The more I thought about it, the more I wanted to pose this scenario, this quote unquote real-world scenario to get some feedback on how other DBAs or managers might handle the same situation.  I know from doing this for so long and the people I have met, that this happens quite a bit to DBAs. 

The scenario:

You have a slow running front end application. It has been getting progressively slower over the past few months, however, it has always had issues with certain processes. You, as the DBA,  know you are dealing with millions of records in some tables. These tables have a tossed salad approach to indexes, one clustered index and up to 30 or more non-clustered indexes. You discover that a former DBA has added non-clustered indexes based on the sound advise of the index tuning wizard or database tuning advisor (dta). When things run slow you analyze and see thousands of sustained shared locks on one of the dta indexes; no blocking but users complain of slowness. You also know that developers are working on a solution to address a potential bottleneck with code you can not see.

Business is...well...pissed. A decision is made to move to a new, bigger server. This always fixes problems with code, right?  The other variable is that since we are moving to a new server, why not go ahead and go from a 32-bit SQL 2000 Enterprise install to a 64-bit Standard SQL 2005 install? It would be WAY easier to manage and should gain a 20 to 30% performance increase OOTB.

Okay, so User Acceptance Testing (UAT) has been performed on SQL 2005 but for functionality only.

You also learn late in the game that the server you want to move to is a single RAID 5 array of local disks, not SAN attached.

Wait...you also just learn the vendor does not officially support SQL Server 2005. They have customers there but do not know how it will work for us.

Two days prior to moving to this new server with double the processors with the same version of SQL (2000) the vendor responds and says they now will support 2005 and beckon us to go there.

Now you have to decide:

1.) Do you stall the upset users and work to address the slow running code by working with the developers and tuning/deleting the dta indexes in hopes that this will buy enough time to test SQL Server 2005 for a future upgrade (within two weeks)

2.) Do you go ahead with the SQL Server 2000 migration to the new server over the next weekend in hopes that it will address the users slowness and loss of productivity, though the problem may still exist on the new faster server.

3.) Do you move forward with SQL Server 2005 64 bit Standard, knowing, wow, knowing there is no easy way back if there are problems.

4.) Do you spend the next two days testing with the developers, delete the ferel index from dta, re-create all of the large clustered indexes late on a non-maintenance weeknight knowing that SQL 2000 does not have an online rebuild functionatliy and will require whatever is not completed to be rolled back and could impact production?

5.)  Add you own decision here.

I will relay what I did (ed - will have to do) with this scenario in a reply.

by Rodney

Comments

 

Jason Haley said:

October 17, 2008 9:17 AM
 

Andrew Clarke said:

Amused by the idea of a Tossed Salad indexing policy. We used to use the similar term  'Chop suey' (Chinese for  'mixed pieces')
October 17, 2008 12:17 PM
 

Phil Factor said:

There is always a difference between what you want to do and what is possible in the complex circumstances of real-life. I suspect that what you really want to do is to create a test copy of the current state of the database complete with data, apply a simulation of the sort of usage/loading  that causes the locks, strip out all unnecessary indexes (sometimes I do all in an index bloodbath), get beer and calming music, and impose a new rational indexing policy, bit by bit, until the database hums sweetly. Whenever one can do this, it is amazing how fast one can make everything run. The problem is that the test harness takes longer to create than the new indexing strategy, and usually requires developers, but I'd say it was worth it every time.
October 18, 2008 12:43 PM
 

MVV said:

Do your users really need 30 million records ? can't you move historical data to other tables and then make views to attend reporting needs while keeping your tables trim and in good order?

I agree with phill , too much indexes makes me itchy on a table that gets insertions , updates and/or deletes. No sweat it they are only for reporting.

And i wouln't rule out Sql Server 2008. Maybe you would be better serverd with a server that has nearly all the kinks of 2005 ironed out.
October 19, 2008 7:02 AM
 

Arles said:

There's a lot to be said about "Root Cause Analysis", with that said, I'd bite the bullet and implement option 4.  An overwhelming number of application issues related to sluggish server performance are due to poor coding practices and the creation of indexes as an afterthought of design (assuming that there was design to begin with).  

Migrating to users to a bigger and faster server is akin to taking ibuprofen for a tooth that’s needing a root canal.  Assuming that the server migration is a tourniquet that will stop the bleeding, server migration is not totally devoid of issues within itself.  Bottom line - “pay me now or pay me later.”
October 20, 2008 3:04 PM
 

randyvol said:

There are a few questions I'd like to ask on top of the information provided, such as:

1) Is the application home grown or purchased?  If home grown, perhaps the developers could assist the DBA in solving the performance issues.  If purchased, I'd be inclined to find the problem spots in the application and have some testing done on the testbed with profiler running to see if I could spot performance issues.
In my experience commercial transaction processing application developers get "cursor happy" in their stored procedures; worse, they love to call procedures from within procedures, from within procedures.  I have found both to be potential bottlenecks.

2) Have you observed the users during these 'performance bottleneck' times?  My experience teaches me that users have their own job to do, are not a computer-saavy bunch, and quite often innocently toss more fuel on a quickly growing bonfire... example: quite used to responses in the subsecond to 1 second range, when users encounter a slow down, they are quite happy to "help" the system "hear" their request by continually click-pounding the save or execute button at a frantic pace.

If this is a factor in the problem, then I have found that spending some time with the user community explaining the impacts such actions will help somewhat.

3) I am loathe to throw more hardware at a problem when there is evidence that indexing problems are possibly a cause.  Where I work now we have a decently sized IBM 3850 with a fibre-attached SAN.  People were trying to run a query of workorders and were literally having an hour glass wait of 6 to 8 hours!!  Needless to say this got my attention as well as the software vendor's attention.  We found that their product, which was an add-on to our ERP software was trying to join a table with a table in ERP.  Even though these vendors are 'partners', apparently no engineer on either side had thought through the integration aspects of joining these two tables as there was no index or apparent relationship defined between the two, by any other means.  The developer this vendor had dispatched was more of a application developer and would not have caught the problem (full table scans).
I proposed an index to create on that vendor's table, suggested he contact them and get approval to create it.  Meanwhile I contacted the ERP vendor and proposed an index for their table and got their permission to create it.  We did this on a test system (an identical server config).  We built the indexes and modified the query slightly and had the end user group run the dreaded query - results were populated in the windows form in 2 seconds!

That said, judicious use of indexes can be an answer, whereas indexing everything is going to become a problem.  I'd be inclined to do an exhaustive analysis of the indexes and why they are there.  Probably outcome would be the removal of some, possibly several.

4) I don't have a huge staff - if all I have is 2 weeks before a "mandated" upgrade, I'll usually punt and stipulate that I'll just perform the mandated upgrade.
Where I work I have an understanding with the owners of the company which amounts to this: "You hired me to run the servers because I know what I'm doing.  You may however, at any time direct me to take some action that I don't agree with because it is your company and you pay me.  I will tell you why I don't agree with it, and if you decide you want it done anyway, then we will both understand that the outcome is a result of your decision, not mine.  If it is good, you will be happy and proud of your IT decision-making ability.  If bad, you will be reminded of the fact that you hired me to run the servers because I know what I'm doing."  Around here hardware updates are not usually mandated, they are requested by me.  I know that many DBAs don't have that luxury - they work in a larger scale operation and deal with scale and scope issues I have the ability to avoid.

That said, I await your reply - I'm always interested in what other people's experiences reveal  !!

October 21, 2008 7:06 AM
 

imassi said:

Since the new server has a RAID 5 array of local disks, there is a bit more risk there.  I'd also wager that it is implied that these disks are slower?  If this huge database has a bottleneck, then it's likely a disk I/O bottleneck and the new server might slow things down even more.  So option #2 is out.

Upgrading to a version that is unsupported by the vendor (SQL Server 2005), is also risky even though it would probably work fine and would likely speed things up a bit.  I doubt it would fix the problem though.  If something went wrong, the vendor would wash their hands of any support agreement and any issues would be on your plate and your plate alone both in the near term and long term.  That means option #3 is out.

I'd be running Profiler during peak time, saving the results to a database on another machine.  If I can, I'd get some users to show me what is running slowly.  Then I'd go over the trace results, if I had assistance from users, I'd look at the queries that they were showing me.  If I didn't I'd look at the longest running queries.  Taking a look at the execution plans, identify what's going wrong and fix the indices (removing some and the others probably need a good defrag to boot).  It'd be tedious, and may require assistance from the developers but it'd probably solve the problem.  You could likely stall the upgrade demands if you can show some progress.  Depending on change procedures at this place, you could show some progress within a day or so.  Sounds a bit like option #1 with a twist of #4.

I wonder how you tackled it?
October 21, 2008 9:05 AM
 

OneTallCampbell said:

I would think going to the newer, bigger server would be a bad idea.  It may absolve the business by seeing some brand new shiny hardware, but it wouldn't make your job any better.  At best, it'd just mask the problem short term until your dbase hits critical mass again.  At worst, it'll be a waste of a time because the new server will have the same performance issues, but you've just added in an additional X factor for 64 bit.  You can polish a turd all you want, but in the end all you have is a shiny turd.

In a perfect world, I'd play whack-a-mole on the feral indexes and rebuild/reindex the dbase to run on the current hardware.  It may not give the same satisfaction as a new server, but it'd be the more correct run.  It'd also prep you for an easier upgrade for 2005.  When you go for the upgrade, you'd be able to truly test the performance gain as you're original benchmarks would be from a finely tuned dbase instead of one with a shotgun approach (index everything and something will hit.  Ha-ha!)

There also seems to be a lot of emphasis on just the indexes - from a developer standpoint, is it possible to tune their processes to be less of a burden on SQL?  (IE Cheater tables, cleaner SPs, etc)



October 21, 2008 8:52 PM
 

Rodney said:

I really like everyone's comments. While the ball is still in play, I can tell you I have been loathing the idea of moving to the new server just to plug a hole that we know can only be truly remedied with code enhancement and a rational approach to index creation.  A full bore index rebuild which took 18 hours garnered some good results but they were short lived. I have determined that moving to the new box with SQL 2000, while not the best decision is a necessary stop-gap because it will buy time to focus on finding and resolving the core issue with the vendor code, custom modification in-house, mercurial and ill-performing indexes and WAY too many cursor calls.  This move will happen tomorrow evening with full user acceptance pre and post. Several issues were discovered that tilted the scale toward a rapid-fire move to a new server maintaining SQL Server 2000 version is that the same reindexing jobs that took 18 hours on the old server took 3 on the new, so there is some as yet undiscovered issue.
Performance monitor shows good across the board for SQL - Buffer Cache Hit Ratio at +99%, Average Disk Queue Lengths sustained between .001 and .009.  There are some Writelog wait types we noticed but these are not sustained, however they could point to over zealous commits.
When work is backing up and the pressure builds to put a solution in place, the DBA must act, right or wrong, turd or tourniquet.
We will see on Thursday.
Rodney
October 21, 2008 11:46 PM
You need to sign in to comment on this blog
<October 2008>
SuMoTuWeThFrSa
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
Migrating from OCS 2007 R2 to Lync: Part 4
 Having migrated the rest of our users and legacy resources across, and start getting ready to... Read more...

Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL... Read more...

Seth Godin: Big in the IT Business
 Seth Godin has transformed our understanding of marketing in IT. He invented the concept of 'permission... Read more...

Using SQL Test Database Unit Testing with TeamCity Continuous Integration
 With database applications, the process of test and integration can be frustratingly slow because so... Read more...

Converting String Data to XML and XML to String Data
 We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In... Read more...