Richard Mitchell

Project Manager - Red Gate Software
Try the Exchange Server Archiver Beta - you know it makes sense!

Beware ye of ancient columns

Published Thursday, July 12, 2007 9:55 AM

I'm sure you all make changes to your schema over time adding columns to tables and occassionally dropping them. But were you aware that even though you drop a column from a table the space in the row continues to be taken up by the ghost of the column that you deleted. So if you make lots of schema changes to a database over time the row becomes much larger than it needs to be.

Of course this has the interesting effect that in the syscolumns table the colid for your rows aren't sequential so they can't be used as an index.

Just a little thing to be aware of and one of those things that doesn't seem to be mentioned in books on-line ( at least in the 5 minutes I spent looking for it this morning ).

CREATE TABLE ItsTooBig (col1 CHAR(100), col2 CHAR(100))
SELECT namecolidxoffset
   
FROM syscolumns
   
WHERE OBJECT_ID('ItsTooBig'id

DECLARE @i INT
SET 
@i 1
WHILE @i 100
BEGIN
   INSERT INTO 
ItsTooBig VALUES (1,@i)
   
DECLARE @command VARCHAR(50)
   
SET @command =
     
'alter table ItsTooBig drop column col'
     
CAST(@i AS VARCHAR(3))
   
EXEC(@command)
   
  
SET @i @i 1
   
  
SET @command =
     
'alter table ItsTooBig add col'
        
CAST((@i+1AS VARCHAR(3))
        + 
' char(100)'
   
EXEC(@command)

   
IF @i 10 0
      
SELECT @inamecolidxoffset
        
FROM syscolumns
        
WHERE OBJECT_ID('ItsTooBig'id
END
DROP TABLE 
ItsTooBig

Take care out there people - those columns are out to get you.

Comments

 

Phil Factor said:

Hmm. interesting. Can you please demonstrate the effect in TSQL? I've seen some wacky code in the past that regularly creates and drops columns in tables. (on a production database in a Government Department). This would cause havoc.
July 12, 2007 4:40 AM
 

Richard Mitchell said:

I've included the SQL in my article above....

As you can see from this the offsets keep on increasing until you can't insert any space any more and the add column fails. About the only time this won't happen is if you drop and add the last column of the table. Have a play with the SQL and see the effects for yourself.
July 12, 2007 6:56 AM
 

Adam Machanic said:

July 12, 2007 11:42 AM
 

Phil Factor said:

It looks like the error happens at the point where the total FIXED width of the colums that you have added, whether subsequently deleted or not, exceeds 8000 or so bytes.
July 12, 2007 1:09 PM
 

Richard Mitchell said:

I'm not sure it's a bug. I'm sure that Microsoft sat down and designed this compromise situation. As you can imagine if SQL Server tried to re-use the space it would potentially have to go through every row of the table and set the data that was in those bytes to be the new default or reset them to null. This could be a very long operation and you would end up eventually with fragmentation on the page itself. I suppose this is something for those people - like Phil mentioned - who have code that creates and removes columns automatically as this could be potentially catastrophic in the longer term.

You're right of course Phil in that the bug doesn't manifest on insert it manifests when the schema change attempts to happen - my poor English showing through again :)
July 13, 2007 2:01 AM
 

RobertChipperfield said:

As Richard says, I think they've gone for a trade-off. If you do this rarely, then the win you get by not having to rebuild every single row in the table is huge, and the cost of having some wasted space isn't tragic (most of the time).

Admittedly if this is something you do a lot, you should watch out - and it can be fixed:

use tempdb
create table abc (a int, x char(3000))
create clustered index CI_abc on abc (a)
go
alter table abc add y char(3000)
go
alter table abc drop column x
go
alter index CI_abc on abc rebuild -- This rebuilds the clustered index, freeing up the space
alter table abc add x char(3000) -- This now succeeds
go
drop table abc
go

Now, I'm not sure how to do this on a heap table, but I'd guess it's possible.
July 13, 2007 5:39 AM
 

Borba the Geek said:

Kalen Delaney mentions the issue on p245-248 of her "Storage Engine" volume. Possibly one minor reason that people don't come across this all that often is that it would only happen in a pure T-SQL ALTER scenario, and never when using the SSMS graphical tools, which would drop and recreate the table. Or am I wrong in thinking that SSMS recreates tables when dropping columns as well as when adding them?
July 13, 2007 5:11 PM
 

Richard Mitchell said:

Just thought I'd have a check using profiler. Nope SSMS doesn't rebuild the table and the same effect of the gradually increasing offsets is there for all to see. In fact the SQL that SSMS executes is...

ALTER TABLE dbo.ItsTooBig ADD
Col12 char(100) NULL

go
ALTER TABLE dbo.ItsTooBig
DROP COLUMN col10

go


So not really different from my example at all.
July 16, 2007 2:22 AM
 

Jerome said:

Hang on, Richard. How likely is it that anyone would choose to repeatedly drop and add a column that wasn't the last one he'd done it to? You can repeatedly add and drop  columns forever without any ill effect. It only causes problems if you repeadly choose to drop a different column to the last one you created, and so on....
July 16, 2007 12:43 PM
 

RobertChipperfield said:

Hi Jerome,

If it's a fixed width column, even if it is the last column you added, you still hit the problem. I say that assuming at least one row has been inserted with the now-dropped column in place - they might have optimised out the case where no inserts were made with a particular schema.

The reasoning behind this is that rows aren't tied to schema versions, so without scanning all rows in a table, you can't tell whether it's safe to reclaim the space.

With variable width columns the issue is less bad, because of course the column can be variable width (as the name kinda implies). However, even then it still consumes space in the null bitmap, so *theoretically* (and this really is a bit of an extreme case), you could fill up your rows with null bitmap! However this happens at a much slower rate, because each column only requires one bit...

Rob
July 16, 2007 1:52 PM
 

Eric Russell said:

I first stumbled across this side effect of dropping columns a few years ago while developing a large data warehouse. On a development server where columns may be occasionally added or dropped and maintenance plans are not regularly scheduled, it can add up quick. An even more serious problem is when data modification scripts are pushed to a production server without anticipating the issue.
When you consider how database files and pages are structured, the only way to recover this space is to rebuild the table. If it exists, rebuilding the clustered index will fix it, so it may be taken care of if DBREINDEX is part of the maintenace plan, which would probably run longer than usual immediately after deploying an add column script.

I wrote the following script to identify if the issue actually exists in a database.

select
so.name as TableName,
-- Number of logical columns:
count(*) as CountCols,
-- Highest column ID:
max(sc.colid) as MaxColID
from syscolumns as sc (nolock)
join sysobjects as so (nolock)
on so.id = sc.id and
-- Join only user tables
so.type = 'U'
group by
so.name
-- If the highest column id <> number of columns,
-- then this would possibly indicate the presence
-- of a dropped column and unreclaimed space.
having
max(sc.colid) <> count(*)
July 17, 2007 10:15 AM
 

RgR'us said:

An even more interesting fact.  I once built an error log import job using DTS.  I can't remember the exact reasons, but I needed to Import the data, THEN add an identity column for row identification.  I needed the Identity column to do the transforming and loading into the perm tables.  However I kept all the rows that failed in the base import table for future refference.  And for the next import to work, I had to DROP the identity column (must likely a setting I never found I could use to make the import work with an identity column).

I never had any problems with SPACE in that table.  But after a few weeks or months of running that DTS every few hours I ran into another problem.  The colid column had reached its 1024 limit.  So even if I still had room to make the new column, the colid would forbid me to add a new one.  The only workaround for this one was to DROP / Rebuild the table every so often.
July 22, 2007 8:59 AM
 

Richard Mitchell said:

That could well have been the old...
SET IDENTITY_INSERT {table} ON

You can now insert values directly into your identity column and then when you've finished...

SET IDENTITY_INSERT {table} OFF

Interestingly I've just read that you can only have one table per session with IDENTITY_INSERT turned ON. Never knew that.
July 23, 2007 1:10 AM
You need to sign in to comment on this blog


















<July 2007>
SuMoTuWeThFrSa
24252627282930
1234567
891011121314
15161718192021
22232425262728
2930311234
Niklaus Wirth: Geek of the Week
 It is difficult to begin to estimate the huge extent of the contribution that Niklaus Wirth has made to... Read more...

Building an Exchange Server 2007 environment
 Of course, changing a 32,000 mailbox system, based in 40 Exchange Servers, to a centralised 25,000... Read more...

Manage Stress Before it Kills You
 The key to a long career in IT is in learning how to cope adaptively with stress. Matt Simmons, like... Read more...

Expecting the Worst
 Optimists are often disappointed Read more...

To Boldly Ask IT for Development Work
 Phil has always been mystified by the way that, in Science-Fiction films, the crew of space-ships are... Read more...