|
|
Focus on SQL Server
Check out my Simple-Talk articles.
Published
Friday, February 27, 2009 10:57 AM
For a book project I am working on, I have started compiling a list of reasons why heaps aren't usually a good choice when designing a database. Now, I didn't say that heaps are always bad, but in most cases, I think they should be avoided (read the list to see why). As I compiled the list, in some cases I have had the opportunity to verify that they are true, but in other cases, I have not had the time to verify them. I would like your input on the list. Is there anything on the list that is not true, or only partially true (under some circumstances, but not others), and what other reason are there to avoid using heaps that have I left out? I look forward to your feedback. - If non-clustered indexes are not added to a heap, then all queries against a heap will require table scans to retrieve the requested data. If the heap is large, then these queries will be very resource intensive and hurt SQL Server's overall performance.
- Since the data in a heap is unordered, performing a table scan on a heap can cause a lot of extra I/O activity because inefficient random reads, not efficient sequential reads, are more the norm.
- While a non-clustered index can be added to a heap to speed up some queries, when the non-clustered index is non-covering, the use of a RID bookmark lookup is required. A RID lookup means that once the record(s) to be returned by the query are identified in the non-clustered index, additional reads (the RID bookmark lookup) must be made of the related rows in the heap, so that all of the data requested by the query is returned. This is not very I/O efficient, especially if many rows are returned. At some point, it may be faster for SQL Server to do a table scan than it is to use a non-clustered index when returning many rows. On the other hand, if the non-clustered index is covering, then the non-clustered can be used to immediately return the data to the user without having to lookup anything in the heap.
Heaps cannot be replicated using SQL Server replication. - If you want to create an XML index on an XML data column, a clustered index must exist on the table.
- If you want to create a spatial index on a spatial data column (GEOMETRY or GEOGRAPHY), a clustered index must exist on that table.
- If a heap has a non-clustered index on it (as the primary key), and data is inserted into the table, two writes have to occur. One write for inserting the row, and one write for updating the non-clustered index. On the other hand, if a table has a clustered index as the primary key, inserts take only one write, not two writes. This is because a clustered index, and its data, are one in the same. Because of this, it is faster to insert rows into a table with a clustered index as the primary key than it is to insert the same data into a heap that has a non-clustered index as its primary key. This is true whether or not the primary key is monotonically increasing or not.
- When data is updated in a heap, and the updated row is larger than the old row and can't fit into the old space, a forwarding record is inserted into the original location that points to the new location of the page. If this happens a lot, then there is a lot of space wasted in a database maintaining the forwarding records. This also contributes to additional I/O activity as both the pointer, and the row, have to be read.
- Even if data updated in a heap is not larger than the old row (the updated data is smaller or the same size than the original data), updating a heap with a non-clustered primary key is slower than updating the same table that has a clustered index as the primary key. This is because updating a table with a clustered index is less write intensive than updating a heap with a non-clustered index as its primary key.
- If a row is deleted from a heap with a non-clustered index as its primary key, it is slower than deleting the same row from the same table with a clustered index as its primary key. This is because it takes more I/O to perform this task on a heap than on a clustered index.
- When data is deleted from a heap, the data on the page is not compressed (reclaimed). And should all of the rows of a heap page are deleted, often the entire page cannot be reclaimed. This not only wastes space, it contributes to fragmentation of the data pages within a database.
- If you take two identical tables, one that is a heap with a non-clustered index as its primary key, and a table that has a clustered index as its primary key, the heap with the non-clustered index will be substantially larger, wasting valuable space and increasing disk I/O.
- The ALTER INDEX rebuild and reorganize options cannot be used to defragment and reclaim space in a heap (but they can used to defragment non-clustered indexes on a heap). If you want to defragment a heap in SQL Server 2005, you have three options: 1) create a clustered index on the heap, then drop the clustered index; 2) Use SELECT INTO to copy the old table to a new table; or 3) use BCP or SSIS to move the data from the old table to a new table.
You need to sign in to comment on this blog
About bradmcgehee
Brad M. McGehee is a MCITP, MCSE+I, MCSD, and MCT (former) with a Bachelor’s degree in Economics and a Masters in Business Administration. Currently the Director of DBA Education for Red Gate Software, Brad is an accomplished Microsoft SQL Server MVP with over 16 years SQL Server experience, over 7 years training experience, and has been involved in the industry since 1982.
Brad is a frequent speaker at SQL PASS, European PASS, SQL Server Connections, SQLTeach, devLINK, SQLBits, SQL Saturdays, TechFests, Code Camps, SQL Server user groups, and other industry seminars, where he shares his 16 years of cumulative knowledge and experience.
In 2009, Brad made 33 public presentations to a total of 1,853 attendees, in six different countries. In 2010, Brad made 31 public presentations to a total of 3,156 attendees in two different countries.
Brad was the founder of the popular community site SQL-Server-Performance.Com, and operated it from 2000 through 2006, where he wrote over one million words on SQL Server topics.
A well-respected and trusted name in SQL Server literature, Brad is the author or co-author of more than 15 technical books and over 275 published articles. His most recent books include How to Become an Exceptional DBA (2nd Edition), Brad's Sure Guide to SQL Server 2008: The Top Ten New Features for DBAs, Mastering SQL Server Profiler, and Brad’s Sure Guide to SQL Server Maintenance Plans. These books are available free in PDF format at: http://www.sqlservercentral.com/Books/. He blogs at www.bradmcgehee.com.
|
|