Click here to monitor SSC

Tony Davis

Simple-Talk Editor
News, views and good brews

Index Fragmentation Anxiety: a Doctor Speaks

Published Thursday, October 01, 2009 5:47 PM

Do you suffer from Index Anxiety? Do you lie awake, worrying about whether you are neglecting to defrag your indexes? We’ve all heard the chestnuts: rebuild all indexes that are over 30% fragmented; reorganize those that are between 10-30% fragmented; don't bother at all for small indexes. If you have the database equivalent of an eating disorder, worrying whether your indexes are too fat and fragmented, then I’d like to provide reassurance: they are often designed to be that way.

If Index rebuilding is the database equivalent of a body-image neurosis, then the SQL Server Maintenance Plan Wizard often delivers slimmer's snake-oil: if in doubt, rebuild! Even in cases where at least some effort is made to track index fragmentation, using DBCC SHOWCONTIG or the sys.dm_db_index_physical_stat DMV, I wonder whether the index-rebuild task is done more for reassurance than in confident expectation of a sustained performance boost.

The way it was explained to me is that, like people, indexes gravitate towards a certain natural weight. Some are naturally "skinny" and others naturally "fat". Let’s suppose that you have an index on a "name" column for a table storing members of your website. Entries will be regularly added to and deleted from the index column, space will be created and the index will grow fatter. However, this space will be reused quite effectively. As index entries are removed, the space will be reused as similar entries are added, and entries will naturally migrate from place to place in the index. If you regularly remove that space with a rebuild (or reorganization), all that will happen is that the engine will have to recreate it – working hard in the process – and the index will naturally grow fat again. It's the ultimate yo-yo diet, and it is all to little avail.

I'm not arguing that there aren't valid reasons to rebuild indexes – just betting that over ninety percent of index rebuilds are, at best, a waste of time, if not positively harmful. Too often, it seems to me, there is a missing task, which is to understand exactly how a table and its indexes are used by an application, to know the I/O that the most expensive queries perform, and to determine whether this work really will be reduced, for any significant period of time, by a rebuild. If you are constantly rebuilding indexes to maintain performance, then you need to look at your underlying schema design, and at the way your queries work, and see if there is a way to tune them to reduce range scans, and so on.

Crash Maintenance Plan slimming diets, followed by binge index growth, won't help to give your indexes the perfect figure. It is much better to give your indexes a healthy lifestyle.

As always, we'd love to hear what you think. The best contribution, made as a comment to the blog, will win a $50 Amazon voucher.

Cheers,

Tony.

Comments

 

timothyawiseman@gmail.com said:

I think you have an excellent core point.  Indexes are often rebuilt for no good reason and at best this is a waste of time.  At worst, just as you said, it can make it less effecient for write operations as it has to rearrange things to fit more entries into the middle.

But proper use of the Fill Factor option when creating or rebuilding the index will mitigate much that.  It will permit the server to have room in the middle to add new entries immediately.  This does come at the price of letting the index take more space than it initially needs, but this would happen with time from deletes anyway if the table is frequently modified.  Of course, the best setting for the fill factor depends heavily on how the table is used.  

If the table is going to have frequent inserts or updates that will add new key values that are nonsequential, then the fill factor should be relatively low, permitting a large amount of space for new entries.  If new data on that index will normally be added to the end of a sequence, then it can be relatively high as there will be little need to insert data in the middle of the index.  And of course if dealing with a table that is nearly static then the fill factor probably should be 100% (the default).

MSDN summarizes fill factor options at http://msdn.microsoft.com/en-us/library/ms177459.aspx
October 2, 2009 8:00 PM
 

Rodney said:

I have four kids. Four hungry kids. Imagine going on an extended road trip with 4 hungry kids and two adults. There is a cooler full of sandwiches. Each kid and adult likes different sandwiches and two of the bigger kids and me, of course, will eat two sandwiches. The problem is that none of the kids like the same sandwiches. So, I make roast beef with cheddar for me, two of them; my oldest likes bologna (2), my second oldest likes ham (2) then tuna (1) and PB&J (1). Let's say we all get hungry at the same time and reach into the cooler; if the sandwiches have not been ordered in the cooler (clustered index) or color coded (non-clustered) then there will be a lot of hands reaching about trying to scan for the right sandwich. There will be conflict and I promise someone will be upset, because they had to wait longer, or someone put a booger on their sandwich, which of course did not really happen. But if we all get the sandwich we want on the first go round, even after some quibbling, the second round (for those that eat two) will be less chaotic because someone deleted half of the first layout of sandwiches. If we did not re-order the sandwiches after half the kids and 1 adult was sated, it might still result in a thrashing of appendages. This is a case where reordering is a good thing because fighting over sandwiches (data) makes for an unhappy trip.
October 2, 2009 9:20 PM
 

Rowland said:

1) Measure
2) Analyze
3) Repair
4) Rinse
5) Return to Step 1



October 5, 2009 7:54 AM
 

Tony Davis said:

Yes, the editorial should have at least mentioned the Fill Factor, but I left it out on the basis that many people will just accept the default (100% fill) and then fight a constant and losing rebuild battle as space gets created and the index grows. But yes, if you test your indexes thoroughly and determine the correct fill factor based on usage patterns, then that will definitely help.

I guess my main point was about testing. What performance metrics benefited from the rebuild? Was IO reduced significantly? How big an impact did it have on the performance of your most expensive queries? Was the impact sustained? If the answer to any of these is "no" then rebuilding is not your answer, so don't do it.

Tony.
October 6, 2009 3:34 AM
 

mcair said:

We added a utility table that stores a custom fill-factor by index and refer to it in the rebuild script. Allows us to customize fill factors as volumes on the database increase/decrease.
October 6, 2009 7:46 AM
 

Rowland said:

In my opinion some of this hand-wringing over fragmentation is misguided. I could be wrong, but I don't recall a single Microsoft resource that says x amount of fragmentation should be avoided. Measure and you will know what to do and why.
October 6, 2009 7:54 AM
 

Taersious said:

I am looking for a lazy developer way to track index performance without paying for a software solution. We have several customers with databases that grow the same schema at different rates, based on usage at their site. Most databases are between 10 and 30 gig, others are getting up over 45 gig, and production tables are fragmented. DBCC ShowContig shows many tables with over 1000 Pages Scanned, over 90% Avg Page Density, and Scan Fragmentation values of over 50%. A Maintenance Plan is in order, but what is an optimal target? What FillFactor works best?

Example
TABLE level scan performed.
       - Pages Scanned................................: 176707
       - Extents Scanned..............................: 22349
       - Extent Switches..............................: 136224
       - Avg. Pages per Extent........................: 7.9
       - Scan Density [Best Count:Actual Count].......: 16.22% [22089:136225]
       - Logical Scan Fragmentation ..................: 93.03%
       - Extent Scan Fragmentation ...................: 80.96%
       - Avg. Bytes Free per Page.....................: 1863.8
       - Avg. Page Density (full).....................: 76.97%
October 23, 2009 5:07 PM
You need to sign in to comment on this blog
<October 2009>
SuMoTuWeThFrSa
27282930123
45678910
11121314151617
18192021222324
25262728293031
1234567
How to Kill a Company in One Step or Save it in Three
 The majority of companies that suffer a major data loss subsequently go out of business. Wesley David... Read more...

Migrating from OCS 2007 R2 to Lync: Part 4
 Having migrated the rest of our users and legacy resources across and started getting ready to... Read more...

Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL... Read more...

Seth Godin: Big in the IT Business
 Seth Godin has transformed our understanding of marketing in IT. He invented the concept of 'permission... Read more...

Using SQL Test Database Unit Testing with TeamCity Continuous Integration
 With database applications, the process of test and integration can be frustratingly slow because so... Read more...