Av rating:
Total votes: 32
Total comments: 7


Francis Hanlon
On the Trail of the Expanding Databases
11 October 2008

It is sometimes difficult for other IT people to understand the constraints that DBAs have to work under. So often the ideal or obvious solution is frustratingly out of reach because of 'political' reasons that seem baffling to the spectator. Francis Hanlon describes a typical incident: It is essential reading for anyone wishing an insight of the problems that DBAs face

On the Trail of the Expanding Databases

I look after a server that contains three identical production databases, used by three different health care clinics. Growth has always been fairly rapid but recently, I noticed that these databases were growing at a rate that far exceeded the norm. They were part of a purchased package, and I knew the vendor was in the process of implementing an interface from these databases to another system. Was this abnormal growth a side-effect of something they'd done while implementing this new interface, or completely unrelated?

It was time to investigate.

Monitoring Database Growth

Now, you might ask, how did I know that the database growth was accelerating at a greater rate than it had in the past? There are several available methods of tracking database growth but the one I use most often is based on a script I found at SQLServerCentral:

 http://www.sqlservercentral.com/scripts/Maintenance+and+Management/30217/

I won't go more into specifics except to say that it important not only to know how big is your database today but how big was it last month, and the month before, so that proper space planning is possible.

Of course, in this case, the database was experiencing abnormal growth so my careful planning was moot, and I was running out of free space.

Monitoring Table Space

Armed with the knowledge that my database was growing rapidly, I needed to gather more specific information. What was causing the rapid growth and did it seem likely to continue or was it just a one-time data load event?

I needed to check the size of all the tables in the database and find out if any new tables had been created, or if there were any old tables being more heavily used as a result of the vendor's new interface. Was the database experiencing overall growth amongst all the tables, or just in some history tables, or were there specific interface tables that were rather large?

I spoke to the vendor and was told that they were making no schema changes. What they were doing was implementing a stored procedure that wrote to several existing tables that, in the absence of this new interface, would remain unpopulated.

On SQLTeam.com, I discovered a script that shows a breakdown of space used by table, including the amount of free space. This script works well in both SQL 2000 and SQL 2005, although I often change the script to print the output by the unused column, as opposed to the table_name column.

NOTE
If you are using SQL Server 2005 SP2,then an alternative to the above script is to make use of the included reports. Simply right-click on the database, choose reports, then standard reports and Disk Usage by Table. While this is an excellent report, it does not allow you to sort the output, which is an advantage of the script.

Using this script, I discovered that one of the tables affected by the vendor's changes was the major culprit in the growth of the database. Although it contained only 194,000 rows, in terms of physical space it was hogging 9.2 GB, of which 8.5 GB was marked as unused. However, I also found other problem tables that were not affected by the vendor's changes. The next worst offender, with 23,000 rows, took up 710 KB of storage and was marked as having 620 KB of unused space. Using this same script, I found out that two of my other databases that were completely unaffected by the vendor's changes were also exhibiting, to a lesser degree, abnormal growth of unused space. It would seem the vendor's actions had helped me to notice the growth of unused space but were not the only cause of it. Some urgent action was required.

Reclaiming Unused Space

I continued to research. Several searches on terms such as 'reclaim space in SQL' returned results that pointed toward the DBCC CLEANTABLE command, which is said to 'reclaim space from dropped variable-length columns in tables'.

Well, given that I hadn't dropped any columns, I guessed that this was unlikely to work. However, I couldn’t see what harm it could do either so I gave it a try on a restored version of our production database that I put on a test box. Having run the command I found that (surprise, surprise) virtually no space was reclaimed.

My next thought was to run a DBCC DBREINDEX on this table to fix the unused space. Again, this had little effect. The index fragmentation improved but it never was very bad to begin with since indexes get defragged everyday in this database. In any case, my objective was to reclaim space and the DBCC DBREINDEX reclaimed very little space so it was not what I was after.

Clustered versus Non-Clustered Indexes

During these investigations, I noticed that each of the offending tables did not have a clustered index. What if I made the primary key a clustered index then changed it back to a non-clustered index so the vendor couldn't say I changed their schema?

On the 23K row table, I scripted out the primary key index, which was a unique non-clustered index, dropped it and re-added it as a clustered index, then dropped it again and returned it to its previous state:


ALTER TABLE [dbo].[HL7Message] DROP CONSTRAINT [PK_HL7Message]
ALTER TABLE [dbo].[HL7Message] ADD  CONSTRAINT [PK_HL7Message] PRIMARY KEY CLUSTERED
(   [HL7MessageID] ASC
)WITH FILLFACTOR = 90
ALTER TABLE [dbo].[HL7Message] DROP CONSTRAINT [PK_HL7Message]
ALTER TABLE [dbo].[HL7Message] ADD  CONSTRAINT [PK_HL7Message] PRIMARY KEY NONCLUSTERED
(   [HL7MessageID] ASC
)WITH FILLFACTOR = 90

The results were excellent – all unused space was reclaimed – and the schema was back the way it was before I did anything.

One table done; now for the big one! However, for the 194K-row table, the trick did not work as well as it did on the first table. The space used by the table shrank from 9 GB to 5.4 GB, of which 4.9G was still unused. I needed to reclaim more space.

Text versus Varchar versus Text in Row

I was intrigued to know why dropping then recreating the indexes on one of the problem tables reclaimed all of the unused space and on the other one only reclaimed half of it

One difference seemed to be that the 194K-row table had a column defined as type TEXT, whereas the other table was using varchars. Was there a problem with text columns and space?  I started searching and found a couple of forum postings that made reference to text and wasted space, and these led me to a Microsoft knowledge based article, called FIX: TEXT/IMAGE Data Storage Space Not Reclaimed Correctly. This article describes a space issue with tables having a text column and indicates the solution is to "Use a SELECT INTO statement to transfer the whole table to a new table in the same database". Now, I did notice that the article described this as a problem with SQL Server 7 and I was running SQL Server 2000. However, when I looked at the database properties I discovered the compatibility level was set to SQL Server 70.

So, I created a script, shown below, to define the 194K row table, and indexes, exactly the same as the original. Before running this script, I renamed the original table (and indexes) to tableOLD. Next, I ran the script to recreate my new table and copied all the rows from the old, renamed table, into the new one.


ALTER DATABASE MED_PANAM SET RECOVERY SIMPLE
GO
--if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[HL7Message]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
--drop table [dbo].[HL7Message]
sp_rename HL7Message,HL7MessageOLD
GO
ALTER TABLE dbo.HL7MessageOLD DROP CONSTRAINT PK_HL7Message
GO

ALTER TABLE [dbo].[HL7MessageOLD] ADD
   CONSTRAINT
[PK_HL7MessageOLD] PRIMARY KEY  NONCLUSTERED
  
(
      
[HL7MessageID]
  
) WITH  FILLFACTOR = 90  ON [PRIMARY]
GO

CREATE TABLE [dbo].[HL7Message] (
  
[HL7MessageID] [int] NOT NULL ,
  
[Entry_datetime] [datetime] NOT NULL ,
  
[Status] [varchar] (2)  NOT NULL ,
  
[To_Module] [varchar] (100)  NULL ,
  
[Direction] [char] (1) NOT NULL ,
  
[Filename] [varchar] (500)  NULL ,
  
[Message_ControlID] [varchar] (15) NULL
)
ON [PRIMARY]
GO

ALTER TABLE [dbo].[HL7Message] ADD
   CONSTRAINT
[PK_HL7Message] PRIMARY KEY  CLUSTERED
  
(
      
[HL7MessageID]
  
) WITH  FILLFACTOR = 90  ON [PRIMARY]
GO

DECLARE @HL7Messagelow INT
DECLARE
@HL7Messagehigh INT
SET  
@HL7Messagelow = 0
SET   @HL7Messagehigh = 25000
DECLARE @max INT
SELECT
@max = MAX(HL7MessageID) FROM HL7MessageOLD

WHILE   @HL7Messagelow <= @max
  
BEGIN
      INSERT INTO
HL7Message
      
SELECT * FROM HL7MessageOLD
      
WHERE HL7MessageID > @HL7Messagelow AND HL7MessageID <= @HL7Messagehigh
      
      
SET   @HL7Messagelow = @HL7Messagehigh
      
SET   @HL7Messagehigh = @HL7Messagehigh + 25000
      
  
END


ALTER DATABASE
MED_PANAM SET RECOVERY FULL
/*
select count(*) from HL7MessageOLD
select count(*) from HL7Message

drop table HL7MessageOLD
*/

Now, when I look at how much space the table was using, I found that there was no unused space at all.

Maybe if this column was defined as varchar instead of text, then I wouldn't have these expanding-table issues in future. I wrote an email to the vendor describing the problems we were experiencing and the solutions I'd explored. I suggested that the primary key indexes for these tables be redefined as clustered indexes, and that the text fields be changed to varchar. The vendor wrote back saying that changing the indexes permanently to clustered indexes would be OK, but that we could not change the text column to varchar as this would mean application coding changes. The Vendor also mentioned the compatibility level must remain at 70.

An alternative to swapping the text column to varchar, would be to instead swap it to use SQL Server 2000's Text in Row feature, a change that the vendor deemed acceptable. However, was this making things too complex? I'd seen articles both praising and condemning Text in Row. More worrying though, was the fact that Text in Row is to be removed from a future version of SQL Server. Microsoft's advice was:

"Avoid using this option in new development work, and plan to modify applications that currently use text in row. We recommend that you store large data by using the varchar(max), nvarchar(max), or varbinary(max) data types. To control in-row and out-of-row behavior of these data types, use the large value types out of row option."

This is all well and good if the database were running in SQL 2005, but my vendor still only certified SQL Server 2000, with the compatibility is set to 70, so using varchar(max) won't work in my case.

Still, I was not convinced I really wanted to go in the Text in Row direction. Was there any other possible solution?

The Solution, Part 1: More Memory, Fewer Tables!

I continued my research and stumbled across another knowledge base article that described circumstances whereby a SQL Server with low memory and lots of tables might significantly increase the unused space for some tables.

This sounded promising because the three databases experiencing this issue were on a server with a total of 50 databases and over 14,000 tables, and having only 2 GB of memory. Could the solution be as simple as moving the problem databases to a new server with fewer databases (fewer tables) and more memory? I had another available server with 8 GB of memory and only 1 user database, with about 1200 tables.

I decided to try moving one of these databases before looking further into Text in Row.  A month later, none of the tables in this database exhibited a high level of unused space. So, here I had the simplest and most effective solution to all the problems: move the problem databases to a server with fewer tables and more available memory.

However, as most DBAs will recognize, the simplest solution isn't always the one you can implement in all cases.

The Solution, Part 2: Implementing Text in Row

I still had two databases on the old server, both containing tables that continued to experience high levels of unused space. For political reasons, I was not allowed to move these databases to the less crowded server. For these databases, I was forced to explore the Text in Row option.

So, once again, I reclaimed the unused space using the previously described script, but this time, when I recreated the problem tables I used the Text In Row parameter:

sp_TABLEOPTION 'APPOINTMENTS_TABLE', 'TEXT IN ROW',200

Several more weeks passed and all was well in the databases; they had ceased to expand abnormally. So, though it was far from ideal due to deprecation of the Text in Row feature, I had a second workable solution to my problems.

Summary

In some ways, my experiences in solving this problem were typical of that of many DBAs. Often, the ideal solution is out of reach for "political" reasons or for other reasons outside the control of the DBA, such as use of vendor applications with specific restrictions, and so on.

My investigations had led me to the ideal solution: moving each of the problem databases to a server with fewer tables and more memory. However, this solution was only available to me in the case of one of the databases.

For the other two databases, I was left with a solution that "worked" but that may be rendered obsolete in future versions of SQL Server. In a bold move, I opted for the status quo, regardless. I continue to make judicious use of Text In Row and await the opportunity to either move to a new server or to upgrade to SQL 2005, and make use of varchar(max).

In any case, continued and close monitoring of the growth of these databases, both at a database level and at an individual table level, is definitely warranted.



This article has been viewed 2531 times.
Francis Hanlon

Author profile: Francis Hanlon

Francis works as the Senior SQL Server DBA for Manitoba eHealth; a Canadian organization providing IT solutions for health care facilities in the province of Manitoba. He has been work involved in IT since using punch cards was a common practice in programming. He has been involved with SQL Server databases since the 20th century.

Search for other articles by Francis Hanlon

Rate this article:   Avg rating: from a total of 32 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:


Subject: Fantastic Writing
Posted by: Anonymous (not signed in)
Posted on: Wednesday, October 15, 2008 at 6:55 AM
Message: Excellent article. The writing style and content made it an interesting and easy to follow read.

Subject: Yes, Very nice.
Posted by: Anonymous (not signed in)
Posted on: Wednesday, October 15, 2008 at 7:56 AM
Message: Thanks.

Subject: Outstanding!
Posted by: Anonymous (not signed in)
Posted on: Wednesday, October 15, 2008 at 9:52 AM
Message: Not only did you provide great information, but you have shown multiple resolutions and reasons for selecting each. We often must work around physical and "political" barriers to achieve, if not the best result, the working solution.

Subject: Good style and Content
Posted by: suomynonA (not signed in)
Posted on: Thursday, October 16, 2008 at 4:14 PM
Message: Did he say Manitoba IT was still using punch cards? (holy Hollerith)... and that it was still the 20th century in the provinces?
Tasty article.
Thanks.

Subject: Style and Content
Posted by: Francis Hanlon (not signed in)
Posted on: Saturday, October 18, 2008 at 12:28 PM
Message: Thanks for the comments suomynonA (aka Anonymous Backwards I presume). I appreciate it. Actually I said I have been involved in IT since punch cards were common. But that was a long time ago we don't use them now. I have been using SQL since the 20th century and now in the 21st I continue to be involved. Even in the provinces the new century arrived, even if that was 6-7 hours after it arrived in London.

Subject: Doubt
Posted by: Bharath Sharma (not signed in)
Posted on: Friday, October 24, 2008 at 8:05 AM
Message: "Well, given that I hadn't dropped any columns, I guessed that this was unlikely to work. However, I couldn’t see what harm it could do either so I gave it a try on a restored version of our production database that I put on a test box. Having run the command I found that (surprise, surprise) virtually no space was reclaimed."-Francis


Hi,

Under the section reclaim unused space you have said that you tried DBCC CLEANTABLE after you restored the database test BOX and then you noticed little improvement in unused space right? It was because when you had taken the backup from production BOX the backup statement itself would have removed the unused space that is why you did not notice improvement after using DBCC clean table command.

Please check Estimating the Size of a Full Database Backup section under link http://msdn.microsoft.com/en-us/library/ms191239(SQL.90).aspx

I appreciate the details given on text data type and alternate solution given by you.

Thank you
Bharath Sharma

Subject: Well Done
Posted by: MAzim (not signed in)
Posted on: Saturday, October 25, 2008 at 2:53 AM
Message: Good article. Simple to understand. The script for generating table usage stats is excelent.

Enter your comment here:

  Name: 
  Subject: 
  Message: 
 
 









Phil Factor
The Data Center that Exploded
 A while back, in a Simple-Talk editorial meeting, someone bet Phil that he couldn't come up with a Halloween story.... Read more...



 View the blog
SQL Toolbelt 2008: Predominantly an Engineering Task
 The conversion of the Red-Gate tools to be compatible with SQL Server 2008 might not seem, on first... Read more...

Audit Crosschecks
 In this short article, the second of a 2-part series, William suggests a solution, using SQL Data... Read more...

SQL Response: The dim sum interview
 Richard Morris met David and Nigel of the SQL Response team, in a dim sum Restaurant in Cambridge. They... Read more...

XML Jumpstart Workbench
 In which Robyn and Phil decide that the best way of starting to learn XML is to jump in and take a ride... Read more...

Discovering Security Uses for SQL Compare
 Much of the security of SQL Server is implemented as part of the database schema. This provides some... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

SQL Server Full Text Search Language Features
 SQL Full-text Search (SQL FTS) is an optional component of SQL Server 7 and later, which allows fast... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Executing SSIS Packages
 Nigel Rivett demonstrates how to execute all SSIS packages in a given folder using either an SSIS... Read more...

Over 150,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk