SQL Server Performance Testing

15 May 2006

Clustered indexes and primary keys

By Grant Fritchey

The company I work for is implementing a custom system, built in Visual Studio and running against SQL Server 2000, to replace some old mainframe applications. It's an all-new database design that will meet all of the old application's data requirements and a whole slew of new ones. I was responsible for the initial design, which I completed and handed over to a development team. I had been working quite closely with them on the application for some time, but I got pulled in other directions and the app and database proceeded without me. One day, the development team raised a concern that the database was far too slow to support the anticipated transaction load. I began my investigation.

The problem

The application in question has a few data requirements that differ from those of the average application. Basically, at its heart, it is a document management system. However, every change to large sections of the data has to be stored historically, so that we have multiple versions of pieces of a given document. Not all of the data stored is recreated for each version. We needed to know that change 'A' was included in version '1' but that change 'B' arrived in version '3', and, for a completely different piece of data in another table, that change 'Q' was included in version '2'. The data looks something like this:

Table 1

Version   Value
1         'A'
2
3         'B'

Table 2

Version   Value
1         'Some other value'
2         'Q'
3

When queries are run, the latest data in Table 1 is 'B', from version 3, and the latest data in Table 2 is 'Q', from version 2.
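
To make that retrieval requirement concrete, here is a minimal sketch of the "latest value at or below a given version" lookup, written against the illustrative Table 1 above (the table and column names exist only for this example):

DECLARE @Version INT
SET @Version = 3   -- the version being read

-- Return the newest value stored at or before the requested version.
SELECT t1.Value
FROM dbo.Table1 t1
WHERE t1.Version = (SELECT MAX(t2.Version)
                    FROM dbo.Table1 t2
                    WHERE t2.Version <= @Version)

This is exactly the pattern that, repeated across many child tables, produces the larger queries shown later in the article.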

The original design: clustered indexes and non-clustered PKs

The original database design started with a table, Document. Each Document had one or more Versions. The attributes that describe a Document were attached as children to the Version. Since we knew that data access would always be by Document and Version, I made the decision to denormalize the data a bit and keep the DocumentId and VersionId on each table. This is a sample of what the database would have looked like. Names have been changed and the structure simplified for the purposes of the article.

As you can see, my original design, which I refer to as the "switch-board" approach, placed a clustered index on the DocumentId and VersionId columns of each table in order to group the data for the anticipated access path. This means that, in addition to the non-clustered primary key on each table, there is a clustered index.
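
As a rough illustration of that arrangement, here is a sketch of one of the child tables under the switch-board approach. The table is simplified and the LimitValue column is invented for the example; the point is only that the primary key is declared non-clustered while a separate clustered index covers DocumentId and VersionId:

-- Switch-board sketch: surrogate, non-clustered primary key...
CREATE TABLE dbo.DocumentLimit
(
    DocumentLimitId INT IDENTITY(1, 1) NOT NULL,
    DocumentId      INT NOT NULL,
    VersionId       INT NOT NULL,
    LimitValue      VARCHAR(50) NULL,    -- illustrative attribute only
    CONSTRAINT PK_DocumentLimit PRIMARY KEY NONCLUSTERED (DocumentLimitId)
)

-- ...plus a separate clustered index that groups the data by the anticipated access path.
CREATE CLUSTERED INDEX IX_DocumentLimit_Document_Version
    ON dbo.DocumentLimit (DocumentId, VersionId)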

Or, at least this was how it was supposed to be.

The actual design

What I found was that, after several months on their own, and with input from some other DBAs, the developers had dropped some of the indexes, changed some of the primary keys to be clustered and the indexes to be non-clustered, removed some of the denormalization, and so on. In other words, a total mess.

My initial reaction, after cursing and spitting, was to simply clean up the database and put the developers back on track. It wasn't that simple. During this same period we had a consultant from Microsoft helping out in the database area. He had been talking to the developers about their problems and had proposed, directly to the development lead, a radical change to the design of the database. Now I was presented with more than one solution to the problem.

The proposed new design: compound, clustered PKs

Everyone agreed that the existing database design wouldn't support the required user load, the expected transaction volume, or the amount of data needed. Despite my inclination to revert to my original design, I couldn't simply dismiss the consultant's idea, which was as follows: drop all the indexes, make the primary keys on all tables the clustered index for that table and finally redesign those keys so that each parent table acted as an identifier for the child table, thus creating compound primary keys. His proposed new design can be summarized as follows:

NOTE: The real design, being quite a bit more complex, had some tables with a clustered compound primary key that contained as many as eight fields. Colors and symbols are from ERStudio. Blue represents foreign keys, Red are identifying foreign keys.
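
For comparison with the sketch above, here is roughly what the same child table might look like under the consultant's approach. Again, this is only a sketch: LimitTypeId comes from the queries later in the article, LimitValue is invented, and I'm assuming dbo.Version itself keys on DocumentId and VersionId in this design:

-- Compound PK sketch: the parent keys cascade into the child and, together with
-- LimitTypeId, form a primary key that is also the clustered index.
CREATE TABLE dbo.DocumentLimit
(
    DocumentId  INT NOT NULL,
    VersionId   INT NOT NULL,
    LimitTypeId INT NOT NULL,
    LimitValue  VARCHAR(50) NULL,    -- illustrative attribute only
    CONSTRAINT PK_DocumentLimit
        PRIMARY KEY CLUSTERED (DocumentId, VersionId, LimitTypeId),
    CONSTRAINT FK_DocumentLimit_Version
        FOREIGN KEY (DocumentId, VersionId)
        REFERENCES dbo.Version (DocumentId, VersionId)    -- identifying relationship
)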

This design bothered me. It violated one of the fundamentals that I'd learned and read about for years; namely keeping the primary key small and narrow. It also looked like it would be difficult to maintain. Finally, after arguing back and forth about the merits and drawbacks of each of the designs, we decided that testing them was the only way to be sure.

Implementing and testing the designs

In all, I had to implement and test three different designs:

  1. The current, "messed up" design (used as our baseline case).
  2. My original "switch-board" design, as described in Figure 1 (produced by cleaning up the current "messed up" design).
  3. The proposed new "compound PK" design, as described in Figure 2.

Implementing the designs

Having created each basic design, I needed to load each with the various defined data loads under which we wished to test. I also had to modify the stored procedures that ran the queries against these designs, as appropriate.

Loading the data

I needed to test under multiple data loads, projecting two, four, and eight years' worth of data, and therefore to create multiple databases for each of the three designs. Creating these data loads was simple in concept but quite labor-intensive. I had a known client list from which to pull a data distribution that would accurately represent the real business data and provide us with some of the largest anticipated data sets. Further, loading from known data made it possible to define our test transactions based on that same known data. In other words, I knew which organizations we would be calling and could make that list known to the testing tool. To facilitate the data load process, I took advantage of a tool from the Quest suite called Data Factory. I first defined a two year load, verified that it was working correctly, backed up the database, and then simply doubled the initial row counts across the tables in order to get to the four and then eight year data loads. Each went into a separate database. In the end, I had nine different databases representing the three designs to be tested.

Verifying the stored procedures

Next, I had to modify and verify the stored procedures to work with the "cleaned up" original design and the new Compound PK design (obviously, the procedures for the existing "messed up" design I simply left alone). The stored procedures consisted of a set of simple, required INSERT, UPDATE and SELECT commands plus some more complex SELECT statements needed to retrieve the latest values across the historical data, as described earlier. The cleaned up version required me to check a number of stored procedures and rewrite quite a few of them. The new "compound PK" architecture required me to completely rewrite the join criteria on all the procedures. I made the initial pass through this and then had the MS consultant clean up my work so that his design was represented in the best possible light. I wanted it to fail, but I wanted it to fail fairly.

Even in the original design, the queries to retrieve the latest values across the historical, versioned data were rather cumbersome to write, so we had spent a lot of time getting them just right. The original queries, with only the switch-board style of primary-to-foreign key relationships, looked something like this:

SELECT …
FROM dbo.Document d
   INNER JOIN dbo.Version v
      ON d.DocumentId = v.DocumentId
   LEFT OUTER JOIN dbo.DocumentCountry dc
      ON d.DocumentId = dc.DocumentId
      AND dc.VersionId = (SELECT MAX(dc2.VersionId)
                          FROM dbo.DocumentCountry dc2
                          WHERE dc2.VersionId <= v.VersionId
                          AND dc2.DocumentId = d.DocumentId
                         )
   INNER JOIN dbo.DocumentLimit dl
      ON d.DocumentId = dl.DocumentId
      AND dl.VersionId = (SELECT MAX(dl2.VersionId)
                          FROM dbo.DocumentLimit dl2
                          WHERE dl2.VersionId <= v.VersionId
                          AND dl2.DocumentId = d.DocumentId
                         )
   INNER JOIN dbo.LimitQualifier lq
      ON lq.DocumentLimitId = dl.DocumentLimitId
      AND d.DocumentId = lq.DocumentId
      AND lq.VersionId = (SELECT MAX(lq2.VersionId)
                          FROM dbo.LimitQualifier lq2
                          WHERE lq2.VersionId <= v.VersionId
                          AND lq2.DocumentId = d.DocumentId
                         )

 These queries would get the latest value from each of the separate tables using the MAX function.

NOTE: We experimented with using TOP(1) instead of MAX, and in some situations it performed better. However, use of the aggregate function gave a more consistent performance across the system, so we standardized on it.
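
For reference, the TOP 1 variant simply swaps the aggregate for an ordered TOP in each correlated subquery. Here is a sketch of one join from the query above rewritten that way (same aliases; not necessarily the exact form we tested):

   LEFT OUTER JOIN dbo.DocumentCountry dc
      ON d.DocumentId = dc.DocumentId
      AND dc.VersionId = (SELECT TOP 1 dc2.VersionId
                          FROM dbo.DocumentCountry dc2
                          WHERE dc2.VersionId <= v.VersionId
                          AND dc2.DocumentId = d.DocumentId
                          ORDER BY dc2.VersionId DESC
                         )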

Clearly, as the number of table joins grows, these queries can become quite unwieldy. However, the new compound PK architecture produced different and even more unwieldy queries:

SELECT …
FROM dbo.Document d
   INNER JOIN dbo.Version v
      ON d.DocumentId = v.DocumentId
   LEFT OUTER JOIN dbo.DocumentCountry dc
      ON v.DocumentId = dc.DocumentId
      AND dc.VersionId = (SELECT MAX(dc2.VersionId)
                          FROM dbo.DocumentCountry dc2
                          WHERE dc2.VersionId <= v.VersionId
                          AND dc2.DocumentId = v.DocumentId
                         )
   INNER JOIN dbo.DocumentLimit dl
      ON v.DocumentId = dl.DocumentId
      AND dl.VersionId = (SELECT MAX(dl2.VersionId)
                          FROM dbo.DocumentLimit dl2
                          WHERE dl2.VersionId <= v.VersionId
                          AND dl2.DocumentId = v.DocumentId
                         )
   INNER JOIN dbo.LimitQualifier lq
      ON dl.DocumentId = lq.DocumentId
      AND lq.VersionId = (SELECT MAX(lq2.VersionId)
                          FROM dbo.LimitQualifier lq2
                          WHERE lq2.VersionId <= v.VersionId
                          AND lq2.DocumentId = v.DocumentId
                         )
      AND dl.LimitTypeId = lq.LimitTypeId

In order to keep the article manageable, and because I can't show the real design, these queries look almost as simple as those for the original design. However, trust me: as the structure gets deeper and more complex, some of the joins are between six and eight columns wide, rather than the maximum of four displayed here. This causes not only the main part of the query to grow, but the subqueries to be extended as well.

Testing the designs

In order to test which design would facilitate the fastest queries and function in the most scalable way, I had to come up with a method of performing database-only testing. This was the only way to eliminate any concerns that we were seeing artifacts from the user interface or business façade, etc. As noted, I needed to test each design under three different data loads and I also needed to ramp up the simultaneous user load in order to test the scalability and measure the breakdown point of each design, for each data load. The tool that answered all these needs was Quest's Benchmark Factory.

I had to create a set of transactions to run the required queries against each of the database designs. Due to a limitation in Benchmark Factory (it can't handle output parameters in SQL Server), I was forced to create a wrapper procedure that encapsulated the required series of reads, writes, updates and, finally, a delete, representing how the application would be making the database calls. This wasn't optimal for the database, since the queries ran within a single transaction rather than as a series of separate transactions. However, as long as I did the same thing to each of the database designs, I'd be comparing apples to apples.
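
To give a feel for the shape of that wrapper, here is a minimal sketch. The inner procedure names and parameters are purely illustrative, not the real objects; the point is that a single call wraps the whole read/write/update/delete sequence in one transaction, which Benchmark Factory then fired repeatedly:

CREATE PROCEDURE dbo.DocumentTransactionWrapper
    @DocumentId INT,
    @VersionId INT
AS
BEGIN
    SET NOCOUNT ON

    BEGIN TRANSACTION

    -- Reads: pull the latest versioned values for the document.
    EXEC dbo.GetLatestDocumentValues @DocumentId, @VersionId

    -- Writes and updates: add a new version and amend some child data.
    EXEC dbo.InsertDocumentVersion @DocumentId, @VersionId
    EXEC dbo.UpdateDocumentLimit @DocumentId, @VersionId

    -- Final delete: remove the test rows so the run can be repeated.
    EXEC dbo.DeleteDocumentVersion @DocumentId, @VersionId

    COMMIT TRANSACTION
END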

Test results

I set up the tests to run for 30 minutes at each of a range of concurrent user loads: 2, 4, 8, 16 and 20 (20 being the maximum for our license). This meant 2.5 hours at a pop for each test. What with resetting the databases between tests and gathering up all the test data, I could only complete two tests a day. Ultimately, it took most of a week to hack through all this stuff, on top of the three weeks it took to prepare it all. However, in the end, I had some answers.

Existing Design (baseline case)

The first tests were run against the existing design, messed up indexes and everything else that was wrong with it. It was hardly a shock to see that it didn't perform very well. The results for a two year data load showed an incredibly slow performance, even when one takes into account that the "transaction" displayed represents the wrapper stored procedure and not the execution of each of the individual procedures within the wrapper.

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          1.21    0        1.659          0.671          2.513
4          1.01    0        3.952          1.3            5.304
8          1.01    0        7.882          2.852          11.239
16         1.39    0        11.541         4.882          24.826
20         0.93    0        21.529         7.94           28.353

Things went from bad to worse. The four year data load started deadlocking immediately. I tried resetting the database, and even SQL Server, to clear up the problem, but nothing worked. I tried running the eight year data load and got a partial set of results before the deadlocks started again:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          1.07    11       1.864          1.087          6.385
4          0.77    399      5.17           0.569          21.3

NOTE: When the information collected was reviewed, we found that we were getting lots of table scans, index scans, and lock escalation, all of which were contributing to the deadlocks.
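
For anyone wanting to surface this sort of detail on SQL Server 2000, the deadlock and scan information is available through standard commands; here is a sketch (not the exact commands we used, and the procedure name is the illustrative one from earlier):

-- Write deadlock participants and resources to the SQL Server error log.
DBCC TRACEON (1204, -1)
DBCC TRACEON (3605, -1)

-- Logical reads per table make table and index scans easy to spot for a single call.
SET STATISTICS IO ON
EXEC dbo.GetLatestDocumentValues @DocumentId = 1, @VersionId = 3
SET STATISTICS IO OFF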

Cleaned up "switch-board" design

I believed very strongly that the cleaned-up indexes and primary keys would work well enough. The tests bore this out, as you can see for the two year data load case:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          2.4     0        0.833          0.115          2.985
4          4.22    0        0.947          0.088          3.239
8          9.99    0        0.8            0.04           3.099
16         23.79   0        0.672          0.06           8.821
20         23.26   0        0.859          0.136          9.006

When compared to the baseline, I got a very consistent level of performance, with the transactions completing in an average time of between 0.6 and 0.9 seconds. The design produced minimum response times down to 0.04 s (compared to around 0.7 s for the existing design). The maximum response time never exceeded 9 s, whereas for the existing design it exceeded 28 s for a single transaction.

I approached the test for the four year data set with some degree of trepidation, since the existing design had failed so spectacularly. I shouldn't have worried:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          2.38    0        0.84           0.345          2.694
4          4.73    0        0.844          0.049          2.697
8          9.8     0        0.816          0.042          3.329
16         22.96   0        0.698          0.076          11.532
20         22.2    0        0.902          0.108          14.196

This showed that, despite doubling the amount of data in the system, not only did I avoid deadlocks, but I got results that were entirely consistent with the two year data test, including that odd sweet spot at 16 users. The maximum times were a bit higher, but not alarmingly so. I went on to the eight year test feeling more optimistic:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          2.44    0        0.818          0.149          2.609
4          4.84    0        0.825          0.063          2.838
8          9.98    0        0.801          0.042          3.541
16         23.02   0        0.694          0.07           10.235
20         22.23   0        0.899          0.092          10.133

The results were the same. Two, four and eight years' worth of data didn't cause a significant change in performance. I had captured PerfMon data as well, and it showed how the load scaled as more users were added. This is from the eight-year test:

As you can see, as the number of users increased, CPU & disk I/O went up accordingly. The interesting moment occurs when going from eight to sixteen simultaneous users. The CPU goes quite high, but the disk queuing goes through the roof (more on this later).

If I hadn't had the consultant's design in hand, I would have declared the cleaned up design a success and walked away quite satisfied. But, I had to run the same tests against this new design.

The compound PK design

My initial estimate was that this new design might be about 10-20% faster, which quite honestly wouldn't justify the additional overhead of rewriting all the stored procedures with extra code, and so on. The two year data tests came back:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          19.15   0        0.104          0.038          2.091
4          30.5    0        0.13           0.038          2.354
8          30.39   0        0.262          0.05           7.015
16         26.9    0        0.594          0.102          8.012
20         27.01   0        0.74           0.09           11.412

I was a little surprised. OK, more than a little surprised. At the lower end, 2-8 users, this was running between four and eight times faster. However, for the higher user loads the performance was only 10-20% better – in line with my original expectations.

Something was going on here that merited a lot more investigation. I didn't believe we'd hit deadlocks above 8 users, but given the failure of the baseline case at the four year data load, I wasn't sure:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          20.38   0        0.098          0.036          1.933
4          29.32   0        0.136          0.038          2.105
8          29.22   0        0.276          0.05           8.447
16         28.57   0        0.562          0.08           11.983
20         27.74   0        0.721          0.09           11.079

The results were again consistent with those obtained for the two year data load, with the same performance trend as for the cleaned up design, but this time with a slight drop-off in TPS performance at sixteen users. Nothing for it but to get the eight year data load results:

Userload   TPS     Errors   Avg Time (s)   Min Time (s)   Max Time (s)
2          15.99   0        0.124          0.038          2.004
4          27.55   0        0.145          0.038          2.789
8          29.29   0        0.276          0.044          6.723
16         29.83   0        0.536          0.078          8.834
20         29.14   0        0.687          0.103          8.695

These were also very consistent with the other results. Performance was decidedly better at low user loads, but only somewhat better at higher loads. The Perfmon data gave a better indication of why things were behaving as they were:

As you can see, disk queuing was quite variable but, on average, much lower than for the cleaned-up design. It was the CPU that was getting the most use in this design. The strange decline in performance at 16 users corresponded directly to the CPU maxing out at about that load. Clearly, as CPUs were added or upgraded, this design would be faster than the original.

The cause of this radical increase in performance was found in the query plans.

The query plans

At the conclusion of the testing, a few things were very clear. The existing design was not only slow, but its missing or improper clustered indexes were leading to table scans and unnecessary lock escalation, resulting in deadlocks. We could get adequate performance by simply cleaning up the indexes and stored procedures as they existed. But when you can point to a fourfold increase in performance, you have to assume that it's worth the time and effort to achieve it. The only question remaining was: why did the new design outperform the old one?

The answer was provided by the respective query plans. In my switch-board design, the clustered indexes were being used quite well. However, because the primary key structure was independent of the clustered index, we were seeing a bookmark lookup from the clustered index to the primary key. This radically increases the disk I/O (for a more thorough examination see, for example, Randy Dyess's article on SQL Server Central).

However, the compound primary key structure eliminated this processing. My original design required, at a minimum, two indexes: the clustered index and the primary key. The new design reduced this to one, which in turn noticeably reduced the actual size of the database. This is where the potential added cost of inserts against the wider key structure actually turns into a performance improvement, since only a single index is being maintained. In some cases, we even saw the elimination of a clustered index scan, because the more selective index created by adding multiple columns was able to find the data more efficiently.
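
Seeing this for yourself is straightforward. Here is a sketch of how the plans can be compared on SQL Server 2000 (the procedure name is the illustrative one used earlier, not the real object): run the same call against each design with STATISTICS PROFILE enabled and look for the Bookmark Lookup operators, which showed up under the switch-board design and disappeared under the compound PK design.

SET STATISTICS PROFILE ON
GO

EXEC dbo.GetLatestDocumentValues @DocumentId = 1, @VersionId = 3
GO

SET STATISTICS PROFILE OFF
GO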

Conclusions

We decided to go with the compound key approach. It does add to the number of lines of code, and every time we do a review with a different DBA, the size of the primary keys will start them twitching. That said, it's scaling very well and looks to be a success. A concern was raised that the larger and more complex queries would compile slowly, but testing again assured us that, while compilation was a bit slower, it was easily offset by the performance gains. We're now evaluating all new projects as they come on board to determine whether they need quick and easy development (the switch-board approach) or reliable speed and scalability (the compound key approach). Oh, and we've implemented more frequent reviews of databases that we've turned over to development, to ensure they are not deviating from the original design as far as this one did.

###
Grant Fritchey is a database administrator for a major insurance company with 18 years' experience in IT. He has been working with SQL Server since version 6.0 back in 1995 and has also developed in VB, VB.Net, C# and Java. He is currently working on methods for incorporating Agile development techniques into database design and development at his company.

Subject: my co-worker
Posted by: Finalist (view profile)
Posted on: Thursday, July 06, 2006 at 9:27 AM
Message: I work with Grant and YES he does look that scary......

Subject: great
Posted by: Anonymous (not signed in)
Posted on: Wednesday, January 17, 2007 at 3:52 PM
Message: Hi Grant
thats a very good article and you explained very well.
am just wondering what happens if a particular Child table has more than one foreign key do we still make a compound primary pk including all those FKs ?

Subject: RE: Great
Posted by: Granted (view profile)
Posted on: Tuesday, February 20, 2007 at 2:19 PM
Message: Thanks. I sure tried. It was such a fun process that I had a blast writing the article.

Anyway, depending on the exact situation, yeah, I'd make all the FK's be the clustered PK.

Subject: statistics
Posted by: Anonymous (not signed in)
Posted on: Wednesday, April 04, 2007 at 9:43 AM
Message: What commands you using to get the statistics

Subject: re: statistics
Posted by: Granted (view profile)
Posted on: Thursday, April 05, 2007 at 12:53 PM
Message: The transaction counts came out of the Quest Benchmark Factory tool. The rest were either from Performance Monitor or Profiler.

Subject: Index Fragmentation
Posted by: Anonymous (not signed in)
Posted on: Sunday, July 01, 2007 at 12:04 PM
Message: re the MS design
How is it working 6 months in ?
My concern would be the levels of index fragmentation of the compound clustered primary keys when inserting new data

Subject: 6 mo in
Posted by: Granted (view profile)
Posted on: Wednesday, July 11, 2007 at 1:55 PM
Message: It's working very well. We do index maintenance, rebuilds & reindex, on a nightly basis only hitting those indexes that are actually fragmented. It's only hitting a sub-set of the indexes, running in about 90 seconds each night. So far, knock on wood, it's all good. Thanks for asking.
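
For anyone wanting to set up that kind of selective maintenance on SQL Server 2000, a rough sketch follows (the table, index and threshold logic are illustrative, not the actual scripts):

-- Report fragmentation for every index on a table; a nightly job can read the scan
-- density or logical fragmentation columns and defragment only the indexes that need it.
DBCC SHOWCONTIG ('dbo.DocumentLimit') WITH TABLERESULTS, ALL_INDEXES

-- Defragment just the one index that crossed the threshold (0 = current database).
DBCC INDEXDEFRAG (0, 'dbo.DocumentLimit', 'PK_DocumentLimit')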

Subject: Bravo
Posted by: Mark D (not signed in)
Posted on: Thursday, December 13, 2007 at 5:12 PM
Message: This is the most informative, interesting, and easily followable articles on SQL Server tuning that I've come across ever. I'm about to start my first true DBA role after over 8 years in database and app development, and greatly appreciate your insights not only on this article but your others on Simple Talk as well. Keep it up! :)

Subject: re: Bravo
Posted by: Granted (view profile)
Posted on: Friday, December 28, 2007 at 4:04 PM
Message: Wow. Thanks.

BTW, 1.5 years in and it's scaling just fine. Not only no issues, but we've found some other areas where the data is of the same type and we're using this approach. We also still test our designs to ensure they will behave the way we expect.

Subject: still scaling?
Posted by: jt (not signed in)
Posted on: Friday, June 20, 2008 at 12:06 PM
Message: A very helpful article, thx!

still scaling well with the compound PK design?
fragmentation not an issue?

Did any of the tables have varchar columns as the part of the compound PK?

Subject: re: still scaling?
Posted by: Granted (view profile)
Posted on: Wednesday, July 02, 2008 at 7:46 AM
Message: Yes it is still scaling very well. It has worked out quite nicely and we are using the method in other designs now.

Occasionally we do see some statistics issues after a particularly large set of inserts, but it's not in any way a show-stopper.

No, none of the keys were using varchar columns. We've been very careful to use artificial keys in most of the root tables so that the compound keys consist almost completely of integers. We have had to put a datetime or smalldatetime into the key in some instances, but these continue to perform very well.

I've done some testing with varchars (I don't have all the numbers stored or I'd post them) and they just don't perform as well as integers, especially if the varchar fields are really long or have a high percentage of null values. The nulls really kill performance. I'll have to do new testing with SQL Server 2008 and the filtered indexes.

Subject: Great post
Posted by: AbeMiessler (view profile)
Posted on: Wednesday, January 27, 2010 at 10:40 PM
Message: Only thing I might recommend is clearing your cache often in your "Verifying the Stored Procedures" step. It's pretty simple, take a look at <a href="http://www.abemiester.com/post/Setting-up-your-testing-environment-when-performance-tuning-SQL-Scripts.aspx">this post</a>.

Keep up the good posts. Your articles on execution plans have been very helpful to me with a project I've been working on lately.

 
