19 February 2009

Why should I rebuild a table after dropping or adding a column?

The amount of space used by table rows can be important, and there are good articles on the web that show ways to estimate the amount of space that a table row uses. However, the actual space used by a table row could be more than what we would have calculated. The problems come when we modify the schema of a table. Sometimes it is necessary to add a column to a table or to drop a column from a table. This operation is apparently straightforward: when you drop a column, it should free up the space, and a new column should use more space. However, when a column is dropped, SQL Server will not go and modify all the database pages for your table, free the space, move rows around, etc. By dropping a column, you may end up just updating the metadata, and your rows will still consume the same amount of space as before. This could have unpleasant consequence, such as in the following example:

CREATE TABLE dbo.wastedSpace1
      id INT,
      columnWeWillDrop NCHAR(3500),
      lastColumn INT DEFAULT 255

We create a table that has three fixed length columns. One of these columns (columnWeWillDrop) will consume 7000 bytes (NCHAR uses UCS-2 encoding, which uses two bytes per character)
We insert a single row into this table like:

INSERT INTO wastedSpace1 VALUES (1,REPLICATE(N’a’, 3500), 255)

In this example I put the string of 3500 letter ‘a’ into the columnWeWillDrop column. This will help us to identify it easier on the database pages. Indeed, let’s look at the relevant database page:

SELECT  first_page
FROM    sys.partitions sp
        JOIN sys.system_internals_allocation_units siau
ON sp.partition_id = siau.container_id
WHERE   sp.OBJECT_ID = OBJECT_ID(‘dbo.wastedSpace1’)
        AND sp.index_id = 0

In my case the result is 0x590100000100. Note that I’m using SQL Server 2008, for earlier version you will need to use different system tables/views. This is the page id of the first page for our table, the page number is 0x159 and the file number 1 (0x590100000100). Since we have not added a clustered index to the table, it will be a heap, and data pages for heaps are doublelinked, so from the first page we will be able to explore the whole table. The content of the first page is (0x159 equals to 345 decimal):

DBCC PAGE (2,1,345, 3)

The first parameter is the database id (in my case, since I worked in tempdb it is 2), the second parameter is the file id, the third is the page id, and the last parameter influences the details returned by the DBCC PAGE command. We can see that it contains the character ‘a’, the contents of our column ‘columnWeWillDrop’

Now let’s drop this column:

ALTER TABLE wastedSpace1 DROP COLUMN columnWeWillDrop

If we look at the database page, we will be surprised to see the large number of ‘a’s. While a select * from our table will not return the dropped column, it still uses physical space!

So what will happen if we add a few columns to our table? SQL Server can accommodate fixed length columns with up to 8000 bytes of total length. We currently should use 8 bytes (two times four for the two integers) plust a bit of overhead.

ALTER TABLE wastedSpace1 ADD  newColumn1 NCHAR(300)

This works as expected. In theory we should have plenty of space left, as we are using about 608 bytes (well below the 8000 byte limit) plus some overhead. So let’s try to add another fixed length column:

ALTER TABLE wastedSpace1 ADD  newColumn2 NCHAR(300)

We get an error message:

Msg 1701, Level 16, State 1, Line 2

Creating or altering table ‘wastedSpace1’ failed because the minimum row size would be 8215, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.

Unfortunately the space used by our dropped column will remain to be wasted. We need to rebuild the table. In case of a heap, we could select into another table, or create a clustered index. In case of a table that has a clustered index, we could use the ALTER INDEX .. REBUILD

These operations will be expensive, because likely all the database rows will be modified. However, this is the price for the speed of the ALTER TABLE DROP COLUMN operation, which is just a meta data operation. It is nice to see however, that SQL Server 2005 and 2008 has improved since 2000 in respect to these schema operations, and many repeated column add/drop operations do not always result in wasted space.

With variable length columns the amount of wasted space is not that significant, but it still exists. It is worth while exploring the system_internals_partition_columns system view and see where fixed length columns start (these are the ones with a positive number in the leaf_offset column), and check if they form a contiguous row data area.

SELECT  sc.name, sipc.leaf_offset, sipc.max_inrow_length
FROM    sys.partitions sp
        JOIN sys.system_internals_partition_columns sipc
             ON sp.partition_id = sipc.partition_id
        JOIN sys.columns sc
             ON sc.column_id = sipc.partition_column_id AND sc.OBJECT_ID = sp.OBJECT_ID
WHERE   sp.OBJECT_ID = OBJECT_ID(‘dbo.wastedSpace1’)

If you are interested to hear more about the hows and whys of the above problem, come to my session at SQL Bits 4 in Manchester on 28 March 2009.

Also, if you happen to be in Neuss, Germany 22 April 2009, do come to my session at the European PASS Conference where I’ll be talking about the various ways to pass parameters in SQL Server.

For more articles like this, sign up to the fortnightly Simple-Talk newsletter.


  • Rate
    [Total: 0    Average: 0/5]

András Belokosztolszki is a software architect at Red Gate Software Ltd. He is a frequent speaker at many UK user groups and events (VBUG, NxtGen, Developer’s Group, SQLBits). He is primarily interested in database internals and database change management. At Red Gate he has designed and led the development of many database tools that compare database schemata and enable source control for databases (SQL Compare versions 4 to 7), refactor databases (SQL Refactor) and show the history of databases by analyzing the transaction log (SQL Log Rescue). András has a PhD from Cambridge and an MSc and BSc from ELTE, Hungary. He is also a MCSD and MCPD Enterprise. See my blogs on simple-talk.

View all articles by András Belokosztolszki