Michelle Ufford

Effective Clustered Indexes

06 January 2011

As a guideline, clustered indexes should be Narrow, Unique, Static and Ever-Increasing (NUSE). Michelle Ufford explains why.

Clustered indexes are the cornerstone of good database design. A poorly-chosen clustered index doesn't just lead to high execution times; it has a 'waterfall effect' on the entire system, causing wasted disk space, poor IO, heavy fragmentation, and more.

This article will present all the attributes that I believe make up an efficient clustered index key, which are:

  • Narrow – as narrow as possible, in terms of the number of bytes it stores
  • Unique – to avoid the need for SQL Server to add a "uniqueifier" to duplicate key values
  • Static – ideally, never updated
  • Ever-increasing – to avoid fragmentation and improve write performance

By explaining how SQL Server stores clustered indexes and how they work, I will demonstrate why these attributes are so essential in the design of a good, high-performance clustered index.

How clustered indexes work

In order to understand the design principles that underpin a good clustered index, we need to discuss how SQL Server stores clustered indexes. All table data is stored in 8 KB data pages. When a table contains a clustered index, the clustered index tells SQL Server how to order the table's data pages. It does this by organizing those data pages into a B-tree structure, as illustrated in Figure 1.

Figure 1: The b-tree structure of a clustered index

It can be helpful, when trying to remember which levels hold which information, to compare the B-tree to an actual tree. You can visualize the root node as the trunk of a tree, the intermediate levels as the branches of a tree, and the leaf level as the actual leaves on a tree.

The leaf level of the B-tree is always level 0, and the root level is always the highest level. Figure 1 shows only one intermediate level but the number of intermediate levels actually depends on the size of the table. A large index will often have more than one intermediate level, and a small index might not have an intermediate level at all.

Index pages in the root and intermediate levels contain the clustering key and a page pointer down into the next level of the B-tree. This pattern will repeat until the leaf node is reached. You'll often hear the terms "leaf node" and "data page" used interchangeably, as the leaf node of a clustered index contains the data pages belonging to the table. In other words, the leaf level of a clustered index is where the actual data is stored, in an ordered fashion based on the clustering key.

Let's look at the B-tree again. Figure 2 represents the clustered index structure for a fictional table with 1 million records and a clustering key on EmployeeID.

Figure 2: A b-tree index for a 1-million row table

The pages in Level 1 and Level 2, highlighted in green, are index pages. In Level 1, each page covers 500,000 records. As discussed, these pages do not store half a million rows apiece; rather, each entry in an intermediate page holds a clustering key value plus a pointer down into the associated page on the next level. For example, to retrieve the details for Employee 500, SQL Server would read three pages: the root page in Level 2, the intermediate page in Level 1, and the appropriate leaf level page in Level 0. The root page tells SQL Server which intermediate level page to read, and the intermediate page tells it which specific leaf level page to read.

Index seeks and index scans

When specific data is returned from a data page in this fashion, it is referred to as an index seek. The alternative is an index scan, whereby SQL Server scans all of the leaf-level pages in order to locate the required data. As you can imagine, index seeks are almost always much more efficient than index scans. For more information on this topic, please refer to the Further Reading section at the end of this article.
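
As a simple sketch of the difference, consider the fictional Employee table from Figure 2, clustered on EmployeeID (the dbo.Employee table and its LastName column here are assumed purely for illustration):

-- A predicate on the clustering key lets SQL Server navigate the B-tree
-- from root to leaf, touching only a handful of pages (an index seek):
SELECT  EmployeeID ,
        LastName
FROM    dbo.Employee
WHERE   EmployeeID = 500 ;

-- A predicate on a column with no supporting index forces SQL Server to
-- read every leaf-level page of the clustered index (an index scan):
SELECT  EmployeeID ,
        LastName
FROM    dbo.Employee
WHERE   LastName = N'Smith' ;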

In this manner, SQL Server uses a clustered index structure to retrieve the data requested by a query. For example, consider the following query against the Sales.SalesOrderHeader table in AdventureWorks, to return details of a specific order.

SELECT  CustomerID ,
        OrderDate ,
        SalesOrderNumber
FROM    Sales.SalesOrderHeader
WHERE   SalesOrderID = 44242 ;

This table has a clustered index on the SalesOrderID column and SQL Server is able to use it to navigate down through the clustered index B-tree to get the information that is requested. If we were to visualize this operation, it would look something like this:

Root Node

    SalesOrderID    PageID
    ------------    ------
    NULL            750
    59392           751

Intermediate level (Page 750)

    SalesOrderID    PageID
    ------------    ------
    44150           814
    44197           815
    44244           816
    44290           817
    44333           818

Leaf level (Page 815)

    SalesOrderID    OrderDate    SalesOrderNumber    AccountNumber     CustomerID
    ------------    ---------    ----------------    --------------    ----------
    44240           9/23/2005    SO44240             10-4030-013580    13580
    44241           9/23/2005    SO44241             10-4030-028155    28155
    44242           9/23/2005    SO44242             10-4030-028163    28163

In the root node, the first entry points to PageID 750, for any values with a SalesOrderID between NULL and 59391. The data we're looking for, with a SalesOrderID of 44242, falls within that range, so we navigate down to page 750, in the intermediate level. Page 750 contains more granular data than the root node and indicates that PageID 815 contains SalesOrderID values between 44197 and 44243. We navigate down to that page in the leaf level and, finally, upon loading PageID 815, we find all of our data for SalesOrderID 44242.
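
You can watch this happen, in a rough way, with SET STATISTICS IO, which reports how many 8 KB pages a query reads. For a clustered index seek like the one above, the number of logical reads should be close to the depth of the B-tree, since SQL Server reads one page per level (a sketch; the exact numbers will vary with your copy of AdventureWorks):

SET STATISTICS IO ON ;

SELECT  CustomerID ,
        OrderDate ,
        SalesOrderNumber
FROM    Sales.SalesOrderHeader
WHERE   SalesOrderID = 44242 ;

SET STATISTICS IO OFF ;

-- The Messages tab reports the 'logical reads' for SalesOrderHeader;
-- expect a small number, roughly one page per B-tree level traversed.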

Characteristics of an effective clustered index

Based on this understanding of how a clustered index works, let's now examine why and how this dictates the components of an effective clustered index key: narrow, unique, static, and ever-increasing.

Narrow

The width of an index refers to the number of bytes in the index key. The first important characteristic of the clustered index key is that it is as narrow as is practical. To illustrate why this is important, consider the following narrow_example table:

CREATE TABLE dbo.narrow_example
    (
      web_id      INT IDENTITY(1,1) , -- unique
      web_key     UNIQUEIDENTIFIER ,  -- unique
      log_date    DATETIME ,          -- not unique
      customer_id INT                 -- not unique
    ) ;

The table has been populated with 10 million rows and contains two columns that are candidates for use as the clustering key:

  • web_id – a fixed-length int data type, consuming 4 bytes of space
  • web_key – a fixed-length uniqueidentifier data type, consuming 16 bytes.

TIP:
Use the DATALENGTH function to find how many bytes are being used to store the data in a column.
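
For example, a quick check against the narrow_example table might look something like this (a sketch; the column aliases are just for readability):

SELECT TOP 1
        DATALENGTH(web_id)  AS web_id_bytes ,  -- int: 4 bytes
        DATALENGTH(web_key) AS web_key_bytes   -- uniqueidentifier: 16 bytes
FROM    dbo.narrow_example ;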

So, which column will make a better clustered index key? Let's take a look at the B-tree structure of each, shown in Figure 3.

Figure 3: The b-tree levels for clustered indexes based on int and uniqueidentifier keys

The most obvious difference is that the uniqueidentifier key has an additional non-leaf level, giving 4 levels to its tree, as opposed to only 3 levels for the int key. The simple reason for this is that the uniqueidentifier consumes 300% more space than the int data type (16 bytes versus 4), so when we create a clustered key on the uniqueidentifier column, fewer rows can be packed into each index page, and the clustered index requires an additional non-leaf level to store all of the keys.

Conversely, using a narrow int column for the key allows SQL Server to store more data per page, meaning that it has to traverse fewer levels to retrieve a data page, which minimizes the IO required to read the data. The potential benefit of this is large, especially for range scan queries, where more than one row is required to fulfill the query criteria. In general, the more data you can fit onto a page, the better your table can perform. This is why appropriate choice of data types is such an essential component of good database design.
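
One way to confirm this on a real table is to look at index_depth in sys.dm_db_index_physical_stats, which reports the number of B-tree levels, leaf level included (a sketch against the narrow_example table, for whichever clustered index you have built on it):

SELECT  i.name AS index_name ,
        ps.index_depth ,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.narrow_example'),
                                       NULL, NULL, N'LIMITED') AS ps
        JOIN sys.indexes AS i
             ON  i.object_id = ps.object_id
             AND i.index_id  = ps.index_id ;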

However, our choice of clustering key can affect the performance of not only the clustered index, but also any non-clustered indexes that rely on the clustered index. As shown in Figure 4, a non-clustered index contains the clustered index key in every level of its b-tree structure, as a pointer back into the clustered index. This happens regardless of whether or not the clustering key was explicitly included in the nonclustered index structure, either as part of the index key or as an included column. In other words, whereas in the clustered index the leaf level contains the actual data rows, in a nonclustered index, the leaf level contains the clustered key, which SQL Server uses to find the rest of the data.

Figure 4: Non-clustered indexes also store the clustering key in order to look up data in the clustered index

So, let's see how our choice of clustering key impacts the potential performance of our non-clustered indexes. We'll keep the example pretty simple and create a non-clustered index on customer_id, which is an int data type.

CREATE NONCLUSTERED INDEX IX_example_customerID
ON dbo.narrow_example (customer_id) ;

Figure 5 shows the resulting B-tree structures of our nonclustered index, depending on whether we used the uniqueidentifier or the int column for our clustered index key.

Figure 5: The b-tree structures of the non-clustered index, based on int and uniqueidentifier clustering keys

While we have the same number of levels in each version of the index, notice that the non-clustered index based on the int clustering key stores 86% more data in each leaf-level data page than its uniqueidentifier counterpart. Once again, the more rows you can fit on a page, the better the overall system performance: range-scan queries on the narrow int version will consume less IO and execute faster than equivalent queries on the wider, uniqueidentifier version.

In this example, I've kept the table and index structures simple in order to better illustrate the basic points. In a production environment, you'll often encounter tables that are much, much wider. It's possible that such tables will require a composite clustered index, where the clustering key comprises more than one column. That's okay; the point isn't to advise you to base all of your clustered keys on integer IDENTITY columns, but to demonstrate that a wide index key can have a significant, detrimental impact on a database's performance, compared to a narrow index key. Remember, narrowness refers more to the number of bytes consumed than the number of columns. For example, a composite clustered key on three int columns would still be narrower than a uniqueidentifier key (4 + 4 + 4 = 12 bytes for the former vs. 16 bytes for the latter).
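
As a purely hypothetical sketch of that last point, a composite clustered key built from three int columns still weighs in at only 12 bytes (the table and column names below are invented for illustration):

CREATE TABLE dbo.order_line_example
    (
      order_id    INT NOT NULL ,   -- 4 bytes
      line_number INT NOT NULL ,   -- 4 bytes
      revision    INT NOT NULL ,   -- 4 bytes
      quantity    INT NULL
    ) ;

-- A 12-byte composite clustered key, still narrower than a 16-byte uniqueidentifier
CREATE UNIQUE CLUSTERED INDEX CIX_order_line_example
ON dbo.order_line_example (order_id, line_number, revision) ;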

Unique

Index uniqueness is another highly desirable attribute of a clustering key, and goes hand-in-hand with index narrowness. SQL Server does not require a clustered index to be unique, yet it must still have some means of uniquely identifying every row. That's why, for non-unique clustered indexes, SQL Server adds to every duplicate instance of a clustering key value a 4-byte integer value called a uniqueifier. This uniqueifier is added everywhere the clustering key is stored: in every level of the B-tree, in both clustered and non-clustered indexes. As you can imagine, if there are many rows sharing the same clustering key value, this can become quite expensive.

What's more, the uniqueifier is stored as a variable-length column. This is important because, if a table does not already contain any other variable-length columns, each duplicate value actually consumes 8 bytes of overhead: 4 bytes for the uniqueifier value and 4 bytes to manage variable-length columns on the row. The following example demonstrates this. We create a table with a non-unique clustered index, insert a single row, and then retrieve the minimum and maximum record sizes (which currently refer to the same, single record) from the sys.dm_db_index_physical_stats DMV:

CREATE TABLE dbo.overhead ( myID INT NOT NULL ) ;

CREATE CLUSTERED INDEX CIX_overhead -- not unique!
ON dbo.overhead(myID) ;

INSERT  INTO dbo.overhead
        ( myID )
        SELECT  1 ;

SELECT  min_record_size_in_bytes ,
        max_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.overhead'),
                                       NULL, NULL, N'SAMPLED') ;

min_record_size_in_bytes   max_record_size_in_bytes
------------------------   ------------------------
11                         11

(1 row(s) affected)

Although the table contains only a single column, SQL Server imposes a minimum of 7 bytes of overhead per row. While this overhead may increase with the addition of NULL or variable-length columns, it will never be less than 7 bytes per row. The other 4 bytes are used to store the int column, myID.

Now let's insert a duplicate value into the table:

INSERT  INTO dbo.overhead
        ( myID )
        SELECT  1 ;

SELECT  min_record_size_in_bytes ,
        max_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.overhead'),
                                       NULL, NULL, N'SAMPLED') ;

min_record_size_in_bytes   max_record_size_in_bytes
------------------------   ------------------------
11                         19

(1 row(s) affected)

The duplicate value requires the addition of a uniqueifier, which consumes an extra 4 bytes. However, since a variable-length column, such as a varchar() column, does not already exist on the table, an additional 4 bytes are added by SQL Server to manage the variable-length properties of the uniqueifier. This brings the total uniqueifier overhead to 8 bytes per row.

TIP:
The sys.dm_db_index_physical_stats DMV runs in three modes: LIMITED, SAMPLED, or DETAILED. The min_record_size_in_bytes and max_record_size_in_bytes columns are only available in SAMPLED or DETAILED mode. Be careful when running this DMV in production or on large tables, as SAMPLED mode scans 1% of the pages and DETAILED mode scans all pages. Refer to Books Online for more information.

So, returning to our original narrow_example table, let's see what would happen if the clustering key was changed to customer_id, which is a non-unique int. Although the uniqueifier is not readily visible and cannot be queried, internally the leaf-level page might look something like this:

    web_id    web_key                                 log_date      customer_id    uniqueifier
    ------    ------------------------------------    ----------    -----------    -----------
    1         6870447C-A0EC-4B23-AE5F-9A92A00CE166    12/15/2010    1              NULL
    2         5AB480CF-40CD-43FD-8C3D-5C625875E143    12/15/2010    1              1
    3         95C312B9-83AF-4725-B53C-77615342D177    12/15/2010    1              2
    4         88AA4497-9A20-4AB7-9704-1FDFAE200564    12/15/2010    2              NULL
    5         E3EA3014-FC23-48B6-9205-EE6D06D37C5B    12/15/2010    2              1
    6         9F6A8933-F6EC-416F-AACA-1C3FF172151C    12/15/2010    3              NULL
    7         B16406A8-649B-4E7A-A234-C7B7D8FCE2D3    12/15/2010    4              NULL
    8         443B627B-21CE-4466-AD15-1879C8749225    12/15/2010    4              1
    9         2F3757DE-3799-4246-BA88-944C5DA3683E    12/15/2010    4              2
    10        25D9F2AA-6610-48CD-9AC4-4F1E29FDED1C    12/15/2010    4              3

The uniqueifier is NULL for the first instance of each customer_id, and is then populated, in ascending order, for each subsequent row with the same customer_id value. The overhead for rows with a NULL uniqueifier value is, unsurprisingly, zero bytes. This is why min_record_size_in_bytes remained unchanged in the overhead table; the first insert had a uniqueifier value of NULL. This is also why it is impossible to estimate how much additional storage overhead will result from the addition of a uniqueifier, without first having a thorough understanding of the data being stored. For example, a non-unique clustered index on a datetime column may have very little overhead if data is inserted, say, once per minute. However, if that same table is receiving thousands of inserts per minute, then it is likely that many rows will share the same datetime value, and so the uniqueifier will have a much higher overhead.

If your requirements seem to dictate the use of a non-unique clustered key, my advice would be to look for a couple of relatively narrow columns that, together, can form a unique key. You'll still see the increase in the row size for your clustering key in the index pages of both your clustered and nonclustered indexes, but you'll at least save the cost of the uniqueifier in the data pages of the leaf level of your clustered index. Also, instead of adding an arbitrary uniqueifier value to the index key, which is meaningless in the context of your data, you would be adding meaningful and potentially useful information to all of your nonclustered indexes.
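
For instance, going back to the narrow_example table, rather than clustering on the non-unique customer_id alone, you might append the already-unique web_id to it (a sketch of the idea, not a recommendation for every schema):

-- Clustering on (customer_id, web_id) keeps customer_id as the leading column
-- but makes the key unique, so no uniqueifier is ever added.
CREATE UNIQUE CLUSTERED INDEX CIX_narrow_example
ON dbo.narrow_example (customer_id, web_id) ;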

Static

A good clustered index is also built upon static, or unchanging, columns. That is, you want to choose a clustering key that will never be updated. SQL Server must ensure that data exists in a logical order based upon the clustering key. Therefore, when the clustering key value is updated, the data may need to be moved elsewhere in the clustered index so that the clustering order is maintained. Consider a table with a clustered index on LastName, and two non-clustered indexes, where the last name of an employee must be updated.

Figure 6: The effect of updating a clustered key column

Not only is the clustered index updated and the actual data row moved – most likely to a new data page – but each non-clustered index is also updated. In this particular example, at least three pages will be updated. I say "at least" because there are many more variables involved, such as whether or not the data needs to be moved to a new page. Also, as discussed earlier, the upper levels of the B-tree contain the clustering key as pointers down into the leaf level. If one of those index pages happens to contain the clustering key value that is being updated, that page will also need to be updated. For now, though, let's assume only three pages are affected by the UPDATE statement, and compare this to behavior we see for the same UPDATE, but with a clustering key on ID instead of LastName.

Figure 7: An UPDATE that does not affect the clustered index key

In Figure 7, only the data page in the clustered index is changed because the clustering key is not affected by the UPDATE statement. Since only one page is updated instead of three, clustering on ID requires less IO than clustering on LastName. Also, updating fewer pages means the UPDATE can complete in less time.

Of course, this is another simplification of the process. There are other considerations that can affect how many pages are updated, such as whether an update to a variable-length column causes the row to exceed the amount of available space. In such a case, the data would still need to be moved, although only the data page of the clustered index is affected; nonclustered indexes would remain untouched.

Nevertheless, updating the clustering key is clearly more expensive than updating a non-key column. Furthermore, the cost of updating a clustering key increases as the number of non-clustered indexes increases. Therefore, it is a best practice to avoid clustering on columns that are frequently updated, especially in systems where UPDATE performance is critical.
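
To make the contrast concrete, here is a hypothetical sketch against an Employee table that is clustered on LastName (the table and columns are assumed for illustration only):

-- Updating a non-key column: the row is modified in place, and only
-- nonclustered indexes that contain PhoneNumber need to change.
UPDATE  dbo.Employee
SET     PhoneNumber = N'555-0100'
WHERE   EmployeeID = 500 ;

-- Updating the clustering key itself: the row may have to move to a different
-- page to preserve the clustering order, and every nonclustered index must
-- have its pointer to that row updated as well.
UPDATE  dbo.Employee
SET     LastName = N'Smith'
WHERE   EmployeeID = 500 ;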

Ever-Increasing

The last important attribute of a clustered index key is that it is ever-increasing. An integer identity column is an excellent example: in addition to being narrow, unique, and static, it is also ever-increasing. The identity property continuously increments by the value defined at creation, which is typically one. This allows SQL Server, as new rows are inserted, to keep writing to the same page until that page is full, and then to repeat the process with a newly allocated page.
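
A minimal sketch of such a key (the event_log_example table is hypothetical):

CREATE TABLE dbo.event_log_example
    (
      event_id     INT IDENTITY(1,1) NOT NULL , -- narrow, unique, static, ever-increasing
      event_date   DATETIME NOT NULL ,
      event_detail VARCHAR(200) NULL ,
      CONSTRAINT PK_event_log_example
        PRIMARY KEY CLUSTERED ( event_id )
    ) ;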

There are two primary benefits to an ever-increasing column:

  1. Speed of insert – SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page
  2. Reduction in clustered index fragmentation – this fragmentation results from data modifications and can take the form of gaps in data pages, which waste space, and of a logical ordering of the data that no longer matches the physical ordering.

However, before we can discuss the effect of the choice of clustering key on insert performance and index fragmentation, we need to briefly review the types of fragmentation that can occur.

Internal and external index fragmentation

There are two types of index fragmentation, which can occur in both clustered and non-clustered indexes: extent (a.k.a. external) and page (a.k.a. internal) fragmentation. First, however, Figure 8 illustrates an un-fragmented index.

Figure 8: Data pages in an un-fragmented clustered index

In this simplified example, a page is full if it contains 3 rows, and in Figure 8 you can see that every page is full and the physical ordering of the pages is sequential. In extent fragmentation, also known as external fragmentation, the pages get out of physical order, as a result of data modifications. The pages highlighted in orange in Figure 9 are the pages that are externally fragmented. This type of fragmentation can result in random IO, which does not perform as well as sequential IO.

Figure 9: External fragmentation in a clustered index

Figure 10: Internal fragmentation in a clustered index

Figure 10 illustrates page fragmentation, also known as internal fragmentation, which refers to gaps within the data pages. These gaps reduce the amount of data that can be stored on each page and so increase the overall amount of space needed to store the data. Again, the pages in orange indicate internally fragmented pages.

For example, comparing Figures 8 and 10, we can see that the un-fragmented index holds 15 data rows in 5 pages. By contrast, the index with internal fragmentation only holds 9 data rows in the same number of pages. This is not necessarily a big issue for singleton queries, where just a single record is needed to fulfill the request. However, when pages are not full and additional pages are required to store the data, range-scan queries will feel the effects, as more IO will be required to retrieve those additional pages.

Indexes suffering from fragmentation will often have both extent and page fragmentation.
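
Both kinds can be measured with sys.dm_db_index_physical_stats (a sketch against the narrow_example table; SAMPLED or DETAILED mode is needed to populate the page-fullness column):

-- avg_fragmentation_in_percent reflects pages that are out of logical order
-- (extent/external fragmentation); avg_page_space_used_in_percent shows how
-- full each page is (page/internal fragmentation).
SELECT  index_id ,
        index_level ,
        avg_fragmentation_in_percent ,
        avg_page_space_used_in_percent ,
        page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.narrow_example'),
                                       NULL, NULL, N'SAMPLED') ;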

How non-sequential keys can increase fragmentation

Clustering on an ever-increasing column, such as an identity integer, will result in an un-fragmented index, as illustrated in Figure 8. This allows sequential IO and maximizes the amount of data stored per page, making the most efficient use of system resources, and it also delivers very fast write performance.

Use of a non-sequential key column can, however, result in a much higher overhead during insertion. First, SQL Server has to find the correct page to write to and pull it into memory. If the page is full, SQL Server will need to perform a page split to create more space. During a page split, a new page is allocated, and half the records are moved from the old page to the newly-allocated page. Each page has a pointer to the previous and next page in the index, so those pages will also need to be updated. Figure 11 illustrates the results of a page split.

Figure 11: Internal and external fragmentation as a result of a page split

Initially, we have two un-fragmented pages, each holding 3 rows of data. However, a request to insert "coconut" into the table results in a page split, because Page 504, where the data naturally belongs, is full. SQL Server allocates a new page, Page 836, to store the new row. In the process, it also moves half the data from Page 504 to the new page in order to make room for new data in the future. Lastly, it updates the previous and next page pointers in pages 504 and 505 so that they link to the new page. We're left with Page 836 out of physical order, and both pages 504 and 836 contain free space. As you can see, not only would writes be slower in this scenario, but both internal and external fragmentation of the table would be much higher.
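
If you are stuck with a non-sequential key, one common way to soften the impact of page splits is to leave some free space on each page when the index is built or rebuilt, via a fill factor (a sketch, assuming a clustered index named CIX_narrow_example exists on the table; the trade-off is that pages start out partially empty):

-- Leave roughly 20% free space on each leaf page so that new rows can be
-- slotted in without an immediate page split.
ALTER INDEX CIX_narrow_example ON dbo.narrow_example
REBUILD WITH ( FILLFACTOR = 80 ) ;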

I once saw a table with 4 billion rows clustered on a non-sequential uniqueidentifier, also known as a GUID. The table had a fragmentation level of 99.999%. Defragging the table and changing the clustering key to an identity integer resulted in a space savings of over 200 GB. Extreme, yes, but it illustrates just how much impact the choice of clustering key can have on a table.

I am not suggesting that you only create clustered indexes on identity integer columns. Fragmentation, although generally undesirable, primarily impacts range-scan queries; singleton queries will not notice much impact, and even range-scan queries can benefit from routine defragmentation efforts. However, the ever-increasing attribute of a clustered key is something to consider, and is especially important in OLTP systems where INSERT speed is critical.
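
Routine defragmentation usually comes down to the two ALTER INDEX options below (a sketch, again assuming the hypothetical CIX_narrow_example index):

-- REORGANIZE is a lighter-weight, always-online operation that compacts and
-- reorders the leaf level; it is usually enough for moderate fragmentation.
ALTER INDEX CIX_narrow_example ON dbo.narrow_example REORGANIZE ;

-- REBUILD recreates the index from scratch, removing both internal and
-- external fragmentation (ONLINE = ON requires Enterprise Edition).
ALTER INDEX CIX_narrow_example ON dbo.narrow_example
REBUILD WITH ( ONLINE = ON ) ;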

Summary

In this article, I've discussed the most desirable attributes of a clustered index key: narrow, unique, static, and ever-increasing. I've explained what each attribute is and why each is important. I've also presented the basics of the B-tree structure for clustered and non-clustered indexes. The topic of "indexing strategy" is a vast one, and we've only scratched the surface. Beyond what I presented in this article, there are also many application-specific considerations when choosing a clustering key, such as how data will be accessed and the ability to use the clustered index in range-scan queries. As such, I'd like to stress that the attributes discussed in this article are not concrete rules but rather time-proven guidelines. If you're not sure whether you've chosen the best clustering key, the best thing to do is to test and compare the performance of different strategies.

Further Reading

  • Brad's Sure Guide to Indexes, by Brad McGehee – a "ground level" overview of indexes and how they work
  • Defragmenting Indexes in SQL Server 2005 and 2008, by Rob Sheldon – investigating, and fixing, index fragmentation using sys.dm_db_index_physical_stats
  • SQL Server Indexes: The Basics, by Kathi Kellenberger
  • Performance Considerations of Data Types, by Michelle Ufford

Author profile:

Michelle Ufford is a Senior SQL Server Developer at GoDaddy.com, the world's largest domain name registrar and web hosting company. She specializes in designing and tuning high-volume, multi-terabyte databases. She's perhaps best known for her index defrag script and for tuning a VLDB to support 27k transactions per second. Michelle is also a frequent speaker, published author, and award-winning blogger (http://sqlfool.com) on SQL Server topics. She can be found on Twitter at @sqlfool.




Subject: Excellent Article
Posted by: Mike From Minneapolis (not signed in)
Posted on: Friday, January 07, 2011 at 9:54 AM
Message: Very good article, and very well written.
I appreciate the level of detail, especially with the descriptions of the inner workings of non-clustered indexes.

Keep up the great work!

Subject: Nice article
Posted by: Celko (view profile)
Posted on: Friday, January 07, 2011 at 10:15 AM
Message: The only thing I would add is something on how covering indexes work and affect performance.

Subject: Question.
Posted by: pzhu1968 (view profile)
Posted on: Friday, January 07, 2011 at 3:14 PM
Message: Michelle:

You mentioned "The pages in Level 1 and Level 2, highlighted in green, are index pages. In Level 1, each page contains information for 500,000 records."

Are you sure? index page is 8k page. I did the math like this(roughly w/o considering overhead):
8096bytes/4byte(empid size) is about
2000 records. Not sure how you get 500000 records.

Subject: Excellent Article
Posted by: Fabricio Lima (not signed in)
Posted on: Friday, January 07, 2011 at 7:43 PM
Message: Thanks for this excellent information.

Subject: RE: Question.
Posted by: Michelle Ufford (not signed in)
Posted on: Saturday, January 08, 2011 at 4:11 PM
Message: Hi pzhu1968,

Sorry for the confusion. That example was meant more as a conceptual illustration of what happens in the B-tree. Let me try to clarify. If you have a table with only an int column, the most rows you can store per leaf level page is going to be around 700. That's 8096 (bytes per page) divided by ( 4 plus 7 ) -- 4 bytes for the integer plus 7 bytes of overhead per row. That number would, in reality, be lower because there would be other columns in the table other than an integer. But, to keep it simple, let's say we only have an int in the table, and we store 700 records in the leaf level. Now, here's the part that I'm not 100% sure about. I believe the intermediate pages require 9 bytes of overhead per row instead of 7, in which case we can actually only store around 600 rows per intermediate page. Each of those rows in the intermediate page is actually pointing to a *leaf level page* that stores the aforementioned 700 rows. So, using some rough math here, a single, full intermediate page can actually point to around 420,000 rows. I used 500,000 in my example because I thought it'd be a little easier to understand.

This is all theoretical, of course. When I create a table with one million rows, I actually end up with 5 intermediate pages that are each around 50% full. Your results may vary, especially depending on index fillfactor settings, but you can check this out for yourself if you run the following code:

Set NoCount On;

Create Table dbo.test
(
myID int Not Null
Constraint PK_test
Primary Key(myID)
);

Declare @myID int = 0;

While @myID < 1000000
Begin
Insert Into dbo.test
Select @myID;

Set @myID += 1;
End;

Select index_level
, page_count
, record_count
, (record_count / (page_count * 1.0)) As 'rowsPerPage'
, avg_page_space_used_in_percent
, (page_count * (avg_page_space_used_in_percent / 100)) As 'pagesNeeded'
From sys.dm_db_index_physical_stats
(DB_ID(), OBJECT_ID(N'dbo.test'), NULL, NULL, N'Detailed');

Drop Table dbo.test;


I hope that helps answer your question. Sorry for the confusion, and thanks for the question! :)

Michelle

Subject: Excellent post.
Posted by: Amol (view profile)
Posted on: Monday, January 10, 2011 at 3:51 PM
Message: Excellent post!

Subject: Very timely article
Posted by: Chris Barker (not signed in)
Posted on: Monday, January 10, 2011 at 10:33 PM
Message: Michelle
I am currently teaching the beta course for SQL Server 2008 R2 for Microsoft Learning and the very topic we hit today is indexing (with Greg Low (the author) sitting in the back of the class)! Your article (which was highlighted in the Simple-Talk newsletter which arrived today) was very timely and much appreciated by my class--thank you!

Subject: Clarification
Posted by: pzhu1968 (view profile)
Posted on: Tuesday, January 11, 2011 at 8:56 AM
Message: Michelle:

I think you were wrong since for intermediate index pages say level 1, internally it should like

empid 1 : page 1
empid 2: page 1
empid 3: page 1
.
empid 300: page 2
..
i/o
empid id 1: page 1
empid id 300: page 2
...

each clustered key must have one entry in level 1 index page.

Subject: Awesome Post
Posted by: zalak (view profile)
Posted on: Monday, January 17, 2011 at 1:25 AM
Message: Thanks ...It's A Very Good Article...So Far I Have Read Many Articles But I Can Say First Time I Understand Difference Of Uniqueidentifier And Int And Plus Clustered Index And Non-Clustered Index In This Much Detail...

Subject: Asymmetrical B-tree
Posted by: PeterMarriott (view profile)
Posted on: Monday, January 17, 2011 at 3:09 AM
Message: Thank you,

I nearly always use an ever-increasing key, but when you are getting a lot of new rows the b-tree becomes asymmetrical and needs to be rebuilt, which on very big tables may not be an easy option.

How do you get round this?

Subject: An absorbing read
Posted by: sTTuey (not signed in)
Posted on: Monday, January 17, 2011 at 3:39 AM
Message: Thanks for an informative article.

FYI, Peach comes before Pear, at least using the alphabet I am familiar with...

Subject: index
Posted by: raj (view profile)
Posted on: Monday, January 17, 2011 at 4:04 AM
Message: hey i want to knw, how sql server determines that to store which keys on the root page.. as got to know that it stores all the keys on the root page. so if it stores all the keys on root page then what is the need for intermedeate pages.. and is there any limit on the keys that is stored on any intermedeate page


Subject: quick question
Posted by: frank abagnale (view profile)
Posted on: Monday, January 17, 2011 at 7:45 AM
Message: u mentioned that extra space of min 7 bytes is there in every record . in my case when i ran this
CREATE TABLE dbo.overhead ( myID INT NOT NULL ) ;

CREATE CLUSTERED INDEX CIX_overhead -- not unique!
ON dbo.overhead(myID)

and the min record size was 25 bytes . so an extra 21 bytes is taken . when i add dup myid it(max record size) increased to 33bytes(25 + 4+ 4) as u explained . can u tell me how this extra 21 bytes is used i mean what is stored in this extra 21 bytes (in my case) or 7bytes( in ur case).
btw great article

Subject: Small correction
Posted by: hallidayd (view profile)
Posted on: Tuesday, January 18, 2011 at 1:31 AM
Message: "a non-clustered index contains the clustered index key in every level of its b-tree structure"
This is true in your example, but not true of every NCI. Unique NCIs do not have the CI values in the intermediate index pages since it is not required for uniqueness.
Good article though - it is difficult to cover this topic and actually add something fresh and new. I think some of the diagrams help illustrate some of the issues really well.

Subject: Loved it.
Posted by: byo (view profile)
Posted on: Tuesday, January 18, 2011 at 12:45 PM
Message: Hi, Michelle.

Thank you for such a detailed explanation and the inner aspects of indexes.

Just a minor correction: sys.dm_db_index_physical_stats is a DMF not a DMV.

Awesome article. Keep up the good work. :)

Regards,

Andre Guerreiro Neto (MCP-MCTS)
http://www.novosis.com.br

Subject: 1 Clustered Index misconception
Posted by: Bryant (view profile)
Posted on: Friday, January 28, 2011 at 5:13 AM
Message: I've spent a lot of time tuning SQL and learning what I can do (and get away with) regarding indexing. Your description is better than many I've seen: thorough but clear.

One caution I would make. In your table display on page 2 you show the SalesOrderID increasing in order on a page. According to Microsoft SQL Server 2008 Internals, when a clustered index is applied, pages are sorted but entries on a page are not necessarily. They don't need to be, because the entire page is read, so any access within the page is at memory speed. If you use a monotonically increasing primary key, odds are pretty high that the page content WILL be sorted. But the deletion of a row in an incomplete page will likely result in out-of-order rows on the page. It doesn't affect the correctness of your article but is one of those misconceptions that often occur.

Subject: best explanation i've read
Posted by: sharif (view profile)
Posted on: Saturday, March 15, 2014 at 9:06 AM
Message: thank you.

 
