Click here to monitor SSC
Av rating:
Total votes: 22
Total comments: 17


Joe Celko
Unique Experiences!
18 November 2008

You'd have thought that a unique constraint was an easy concept.: Not a bit of it; it can cause a lot of subtle problems in database designs. Joe Celko goes over the ground of unique keys, primary Keys, foreign keys and constraints. It may seem simple, but  there is always something that can catch out even the seasoned database programmer, as evidenced by Lionel's SQL Puzzles!

This is my second article on DDL tricks. The first article ('Constraint Yourself!') dealt with DDL constraints for one or more columns within the same table. But the "unit of work" in SQL is not the table! It is the entire schema. In file systems, each magnetic tape or disk pack was physically and logically separate from the other files in the enterprise. It was the restrictions of those "silos of data" that provided the motivation for developing databases. Enterprises were like a man with a thousand mechanical watches and clocks who then suddenly needs to have them all synchronized and accurate to a micro-second. Oops! Impossible.


Enterprises were like
a man with a thousand
mechanical watches
and clocks who then
suddenly needs to have
them all synchronized
and accurate to a
micro-second.
Oops! Impossible.
                     ”

Data modeling is related to, but not actually part, of RDBMS. In an ideal world (especially if you are a data modeler or a enterprise level DBA who wants full employment) your entire enterprise would have a single, honking big data model, most of which could be programmed into a database. It leads to the idea that data elements have one, and only one, name and one definition over the entire enterprise or (better) the entire industry (i.e. VIN applies to all automobiles in the world).

At a basic level, we divide tables into those that either model entities or model relationships among those entities. A table cannot do both at once, please. SQL does not have explicit syntax for these concepts, so you have to do it yourself. The way you do this is with a relational key. Yes, you can declare a "table" in SQL without a key at all, in violation of the relational model. This means that you have a file system that happens to be written in SQL. This misses the whole point of RDBMS, and costs you all the advantages we gained from replacing file systems.

What is a Key?

Going back to foundations, every table has to have one or more keys. A key is defined as a subset of the non-NULL attributes (columns) in the table which are unique for every row in the table. A key can be one or more columns. Please notice this is a logical concept. A key has nothing to do with how the data is physically stored. A key is not the physical location of the data like a record number on a tape or a track and sector number on a disk drive. They have nothing to do with the order of insertions into physical storage, so the SQL Server IDENTITY and any other auto-numbering device cannot, by definition, be a relational key.

Tables have no concept of ordering like a magnetic tape file or other sequential storage media. Rows are not inserted, updated or deleted from tables. Entire sets of rows are inserted, updated or deleted from tables; if the set involved in the operation has zero, one or more rows in it, this is just fine.

However, go to newsgroups and see how many posters refer to the "next" or "previous" row in a table. This is a major conceptual problem left over from sequential files. Early programming languages (that means FORTRAN and COBOL) defined their data in terms of contiguous physical storage in their standards (Algol was the first abstract language standard). The split of logical versus physical is very hard for most people. In good SQL products, the physical location can be changed by the system on the fly, while the logical key is always the same. Bytes did not move around on their own in file systems!

In RDBMS, a key is how you find an instance of an entity (the table of people, like John and Mary) or a relationship (John & Mary's marriage). When Dr. Codd first invented the Relational Model, the sequential file model was the only mindset we had. You do have a "next" or "previous" record in a sequential file. Locating data depends on a sorted so you can find records without having to do a random search. This was especially true for punch card decks which could be read in one and only one direction.


If nothing else, when
you dropped a tape
on the floor, you did
not scream and beat
your head into a wall.
                     ”

While we make fun of magnetic tapes today, they were the next advance in sequential file systems. If nothing else, when you dropped a tape on the floor, you did not scream and beat your head into a wall. Magnetic tapes depend on sorted ordering -- you do not merge tapes randomly. Trust me, if all the tapes are not sorted the same way, then you are dead in the water. Please do not ask how I learned this simple truth; it is embarrassing (forget one field in the sort key and it almost works; almost is a deadly word, but I am a dinosaur and I digress).

That mindset continued when Dr. Codd invented the Relational Model, so he required that a table have a PRIMARY KEY. This mimicked the sort order key of a magnetic tape. That became the PRIMARY KEY concept in SQL engines, which worked well with sequential access and indexed file systems which early SQL products were built upon. Later, Dr. Codd realized that a key is a key as a key. Oops! Now it is too late and PRIMARY KEY is part of SQL. The file systems upon which early SQL were built could use that info for optimizing data access, so we left it in the language.

ANSI X3H2 Database Standards Committee added the UNIQUE constraint and gave us some help. This approach is called Declarative Referential Integrity (DRI), in contrast to enforcing data integrity via procedural code such as triggers, stored procedures and application code. A UNIQUE constraint can be can be one or more columns. You can have as many UNIQUE constraints on one table as you wish, but only one PRIMARY KEY. A PRIMARY KEY acts like a UNIQUE NOT NULL constraint and it is the default for foreign key references. But this does not imply that UNIQUE has a NOT NULL constraint with it. We debated this point in the committee for awhile.

There are two kinds of equivalence relations in SQL. The first one is simply a binary "equal" or = as defined for each data type. The only surprises are the rules for VARCHAR(n) data types. Standard SQL is case-sensitive and has various collations which can get elaborate, but the rule for equality is that the shorter string is padded with blanks then compared position by position with the longer string -- that means ('Smith' = 'Smith ') is TRUE. Not all programming languages work this way.

The second kind of equivalence relation is grouping, as in GROUP BY. This clause reduces subsets to single values based on equality; it additionally puts all the NULLs into one group, which makes sense with the way GROUP BY is used -- do you really want to have each NULL in its own group? A NULL should have one and only one meaning in the context of the attribute to which it belongs. The UNIQUE constraint is defined with grouping and not equality, so you can have one and only NULL in its columns. Once you are in the SQL mindset, this makes perfect sense.

The bad news is that SQL Server can go around this model with indexes instead of constraints and allow multiple NULLs. This also leads to thinking that uniqueness is somehow related to indexes. In fact, traditional tree structured indexes are far from the best approach with large amounts of data. Perfect hashing algorithms are a hot research topic in RDBMS, and require only one probe to determine uniqueness.

REFERENCES clauses

The real power of PRIMARY KEY and UNIQUE constraints is using the REFERENCES clause to enforce relationships among tables in the schema. But here is where you need to change your mindset again. A reference in RDBMS is not a pointer, a link or anything physical; it is an abstract concept. The idea is that a unique value appears once and only once in the schema, but it can be referenced in many places. Unfortunately, SQL Server repeats the physical data in the referencing (foreign key) tables, so it is hard to think of it correctly.

In fact, most new SQL programmers do not know that a UNIQUE constraint can be referenced just like a PRIMARY KEY. The only difference is the possibility of a NULL in the referenced table; this is one reason we allowed only one NULL in a UNIQUE constraint. A referencing row should reference one and only one row in the referenced table (say that three times fast!).

Another feature that beginners overlook is the multi-column foreign keys. In particular, anyone who does not know what a key is will use IDENTITY to mimic a pointer as if he were still in a 1970's network database. Pointers are scalars, so they never develop an abstract view of data. Pointers are of one type while keys can be all kinds of things! Too much thinking in RDBMS! A classic example of this is a table like:

CREATE TABLE Cities
    
(id IDENTITY NOT NULL PRIMARY KEY-- fake pointer, vague name
                
city_name VARCHAR(25) NOT NULL,
                
city_longitude DECIMAL (85) NOT NULL,
                
city_latitude DECIMAL (85) NOT NULL,
                
etc);
 

The city name might be unique, but (city_longitude, city_latitude) certainly is. But that would mean having to think of the co-ordinate pairs as single atomic unit of unique data. So much easier to type two letters than to have to cut and paste long names, I guess. The proper design would be

CREATE TABLE Cities
    
(city_name VARCHAR(25) NOT NULL UNIQUE,
                
city_longitude DECIMAL (85) NOT NULL,
                
city_latitude DECIMAL (85) NOT NULL,
                
PRIMARY KEY (city_longitudecity_latitude),
                
etc);  

Now other tables can reference my list of cities. My GPS system which I use to locate bars might have this structure:

CREATE TABLE Bars
    
(bar_name VARCHAR(25) NOT NULL PRIMARY KEY,
                
city_longitude DECIMAL (85) NOT NULL,
                
city_latitude DECIMAL (85) NOT NULL,
                
FOREIGN KEY (city_longitudecity_latitude)
                
REFERENCES Cities (city_longitudecity_latitude),
                
etc);  

DRI Actions

So far, violating a reference, such as trying to add a bar whose co-ordinates are not in my list has returned an exception and done a ROLLBACK. But RI can also take actions. When a change is made to the referenced table, the SQL engine can automatically make changes to the referencing tables for you. The two most useful options are ON DELETE CASCADE and ON UPDATE CASCADE.

The ON DELETE CASCADE option will remove all the rows that reference the deleted rows in the referenced table. For example, if I delete an order, I would want all the order details to disappear, too. If I do not do this clean up, the order items will orphans, floating around and messing up my reports, eating up storage space and in general making a mess.

The ON UPDATE CASCADE option will update all the rows that reference the updated rows in the referenced table. For example, if I change an item number in the Inventory, I would want all the order details to also reflect that change. If I do not make the change, then reports are wrong, and I have a non-existent item in my orders.

CREATE TABLE Orders
    
(order_nbr INTEGER NOT NULL PRIMARY KEY,
     
etc);
CREATE TABLE OrderDetails
    
(order_nbr INTEGER NOT NULL
       
REFERENCES Orders(order_nbr)
       
ON DELETE CASCADE,
    
item_nbr INTEGER NOT NULL
      
REFERENCES Inventory (item_nbr)
      
ON DELETE CASCADE,
      
ON UPDATE CASCADE,
     
PRIMARY KEY (order_nbritem_nbr),
     
etc);

A possible problem is that I can have a cycle of cascaded actions if I am not careful. Imagine that Table A changes Tables B and C via a DRI action. Furthermore, the changes in Table B cascade to Table C via a DRI action. How does a row in Table C look after all of this? The first attempts at putting this feature into SQL were not good; the answer was non-deterministic.

At another extreme, a table can reference itself, either on the same column or on different ones. This can result in cycles of elaborate cascades and be difficult to understand. But it is useful for certain integrity checks. Let me use a history table for price changes. The fact is that a price had duration. This is the nature of time and other continuums. So a basic history table looks like this:

CREATE TABLE PriceHistory
    
(item_nbr CHAR(13) NOT NULL
                
REFERENCES Inventory (item_nbr),
                
price_prev_date DATE NOT NULL,
                
price_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
                
price_end_date DATE-- null means current price
                
CHECK (price_start_date price_end_date),
                
CHECK (price_start_date DATEADD(DAY1price_prev_date)), -- 
prevents gaps
                
PRIMARY KEY (item_nbrprice_start_date),
                
item_price DECIMAL (124) NOT NULL
                
CHECK (item_price 0.0000),
                
etc.);

The self-reference will prevent gaps in the history. You just have to handle it in a stored procedure that updates the current data and inserts a new row when a price changes.

Overlapping Uniqueness Constraints

Uniqueness constraints can overlap. A common technique is to model class hierarchies with overlapping uniqueness constraints. As an example, let's take the class of Vehicles, find an industry standard identifier (VIN), and add two mutually exclusive sub-classes, Sport utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
    
(vin CHAR(17) NOT NULL PRIMARY KEY,
                
vehicle_type CHAR(3) NOT NULL
                
CHECK (vehicle_type IN ('SUV''SED')),
                
UNIQUE (vinvehicle_type),
                
..);

Notice the overlapping candidate keys. I then use a compound key (vin, vehicle_type) and a constraint in each sub-class table to assure that the vehicle_type is locked and agrees with the Vehicles table. Add some DRI actions and you are done:

CREATE TABLE SUVs
    
(vin CHAR(17) NOT NULL PRIMARY KEY,
     
vehicle_type CHAR(3DEFAULT 'SUV' NOT NULL
           
CHECK (vehicle_type 'SUV'),
     
UNIQUE (vinvehicle_type),
     
FOREIGN KEY (vinvehicle_type)
      
REFERENCES Vehicles (vinvehicle_type)
      
ON UPDATE CASCADE
      ON DELETE CASCADE
,
     
..);
CREATE TABLE Sedans
    
(vin CHAR(17) NOT NULL PRIMARY KEY,
      
vehicle_type CHAR(3DEFAULT 'SED' NOT NULL
                
CHECK (vehicle_type 'SED'),
      
UNIQUE (vinvehicle_type),
      
FOREIGN KEY (vinvehicle_type)
       
REFERENCES Vehicles (vinvehicle_type)
       
ON UPDATE CASCADE
       ON DELETE CASCADE
,
      
..);

Yes, there is redundancy between the two keys, but many SQL products will construct separate access structures that give the optimizer options. I can continue to build a hierarchy as deep as I wish following this pattern. If a vehicle can of more than one type, you can use "CHECK (vehicle_type IN (<list>))" instead. A better example for overlapping unique constraints would involve a typical class schedule for a school.

The rules we want to enforce are:

1) A teacher is in only one room each period.
2) A teacher teaches only one class each period.
3) A room has only one class each period.
4) A room has only one teacher in it each period.

The ideas are simple and clearly deal with uniqueness. One answer is to have two overlapping UNIQUE() constraints that share the period.

CREATE TABLE Schedule
    
(teacher_name VARCHAR(15) NOT NULL,
                
class_name CHAR(15) NOT NULL,
                
room_nbr INTEGER NOT NULL,
                
period_nbr INTEGER NOT NULL,
                
UNIQUE (teacher_nameperiod_nbr), -- rules #1 and #2
                
UNIQUE (room_nbrperiod_nbr)); -- rules #3 and #4
 

In closing, I invite you to look at a puzzle posted on this website by Lionel Clarke. You are given two tables which have two columns each. The first table has unique constraint on each column. The second table has one unique constraint on the pair of columns. The goal is to insert rows from one table to the other in one statement.

CREATE TABLE Source
    
(a INTEGER NOT NULL,
     
b INTEGER NOT NULL,
     
PRIMARY KEY (ab));
INSERT INTO Source
        
VALUES (11),  (12), (23),
               (
772), (24), (55),
               (
551), (53), (90), (112);
CREATE TABLE Destination
    
(a INTEGER NOT NULL UNIQUE,
&     
b INTEGER NOT NULL UNIQUE,
     
FOREIGN KEY (ab)
      
REFERENCES Source (ab));

There are six possible subsets of five rows each in the Source, so this is trickier than it looks. The answers there will give you some ideas for using the power of uniqueness for data integrity.



This article has been viewed 7098 times.
Joe Celko

Author profile: Joe Celko

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

Search for other articles by Joe Celko

Rate this article:   Avg rating: from a total of 22 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Paris, Texas, or Paris in the Springtime?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, November 19, 2008 at 9:26 AM
Message: I'm finding it hard to take the article seriously with such terrible examples of table design.

Lat / Long is never a good definition for a city, since the defined center of a city isn't something that most people can agree on and could even be changed.

City name is rarely unique. For example, many, if not most UK place names have a namesake in the US, Ausrtralia and New Zealand. Likewise Spanish city names in Latin America.

Subject: better coding style?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, November 19, 2008 at 12:46 PM
Message: While the article is quite interesting, the coding style could be better. The best practice is to always qualify object names with schema names, and always provide names for all constraints, as follows:

CREATE TABLE Inventory.SUVs
(vin CHAR(17) NOT NULL CONSTRAINT SUVs_PK PRIMARY KEY,

Subject: Multiple Nulls
Posted by: Christian Buettner (not signed in)
Posted on: Wednesday, November 26, 2008 at 5:50 AM
Message: Hi Joe,

could you please elaborate a little on your following statement:
> The UNIQUE constraint is defined with
> grouping and not equality, so you can
> have one and only NULL in its columns
How did you draw this conclusion?
Why is the UNIQUE constraint defined with grouping?
The SQL Standard seems to allow multiple nulls
in a UNIQUE constraint, only SQL Server not.

Thanks!

Subject: Multiple Nulls
Posted by: Christian Buettner (not signed in)
Posted on: Wednesday, November 26, 2008 at 6:53 AM
Message: One additional inquiry regarding your following statement:
> The only difference is the possibility
> of a NULL in the referenced table; this
> is one reason we allowed only one NULL
> in a UNIQUE constraint. A referencing
> row should reference one and only one
> row in the referenced table
Even with multiple NULLs in the referenced table, your referencing row is still only referencing one row in the referenced table. I cannot see any difference between the multiple NULLs and single NULL scenario in this case.
If you enter a NULL value into your referencing table you are not referencing anything (you are not referencing the NULL row in the referenced table! You can try this by not allowing NULLs at all in the UNIQUE constrainted column, you can still have NULLs in the referencing table.). If you have a non-NULL value, you will be explicitly referencing exactly one row in the referenced table or you will get an error.

It would be great if you could elaborate on this a little as well.

Thanks!

Subject: Multiple Nulls
Posted by: Christian Buettner (not signed in)
Posted on: Wednesday, November 26, 2008 at 8:33 AM
Message: One additional inquiry regarding your following statement:
> The only difference is the possibility
> of a NULL in the referenced table; this
> is one reason we allowed only one NULL
> in a UNIQUE constraint. A referencing
> row should reference one and only one
> row in the referenced table
Even with multiple NULLs in the referenced table, your referencing row is still only referencing one row in the referenced table. I cannot see any difference between the multiple NULLs and single NULL scenario in this case.
If you enter a NULL value into your referencing table you are not referencing anything (you are not referencing the NULL row in the referenced table! You can try this by not allowing NULLs at all in the UNIQUE constrainted column, you can still have NULLs in the referencing table.). If you have a non-NULL value, you will be explicitly referencing exactly one row in the referenced table or you will get an error.

It would be great if you could elaborate on this a little as well.

Thanks!

Subject: Paris, Texas, or Paris in the Springtime?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, November 26, 2008 at 8:54 AM
Message: > Wednesday, November 19, 2008 at 9:26 AM
> Message: I'm finding it hard to take the
> article seriously with such terrible examples
of table design.

>Lat / Long is never a good definition for a >city, since the defined center of a city isn't >something that most people can agree on and >could even be changed.

Wow.
I consider myself a champion nit-picker, but I bow to the master.
YOU ARE MY MASTER! PICK MY NITS!


Subject: Re: Multiple Nulls
Posted by: Some Random DBA Guy (not signed in)
Posted on: Wednesday, November 26, 2008 at 10:14 AM
Message: What the author means by "The UNIQUE constraint is defined with grouping and not equality" is that SQL Server treats NULL as a distinct value, so if your UNIQUE constraint is on one column, then only one row in the table can have a NULL value. IF you try to insert more than one row with NULL in that column you'd get a "Violation of UNIQUE KEY constraint" error.

If the constraint was on 2 columns, for example if vehicle_type was nullable in his example, you could have ony one record for each distinct VIN that had a null vehicle_type.

Subject: Missing self reference in PriceHistory example
Posted by: Anonymous (not signed in)
Posted on: Wednesday, November 26, 2008 at 2:36 PM
Message: I'm guessing the price_prev_date attribute is supposed to have "REFERENCES PriceHistory (price_end_date)" to show self-referencing.

Feel free to delete comment after fixing example.

Subject: IDENTITY is a Key - a convenient Key
Posted by: J.S. (not signed in)
Posted on: Wednesday, November 26, 2008 at 3:01 PM
Message: SQL Server IDENTITY columns and other unique-auto-numbering mechanisms are, by definition, Keys. They are simply a convenient way to assign unique values to an attribute - which I'm fairly certain brings that attribute under the definition of a Key. They are not "row numbers" since rows may be deleted, and they are not strictly "order of insertion" since multiple rows may be inserted at once. Whether you manually assign a unique value outside the DBMS or you use a tool like IDENTITY makes no difference as to the "Key-ness". If I use IDENTITY to assign my order numbers for tracking my orders, how is the IDENTITY-generated order number not a Key?

Subject: RE: Missing self reference in PriceHistory example
Posted by: Anonymous (not signed in)
Posted on: Wednesday, November 26, 2008 at 3:58 PM
Message: I guess if a self-reference were put in (which would be based on both item_nbr and price_end_date, as opposed to just price_end_date, which I offered before), there would be the problem of the first row for an item. Either the self-reference or the check constraint would need to be disabled to allow the first price for an item to be entered.

Subject: Sequentality
Posted by: Orcus (view profile)
Posted on: Wednesday, November 26, 2008 at 5:42 PM
Message: To Quote in reference to data rows:
"In good SQL products, the physical location can be changed by the system on the fly, while the logical key is always the same. "

Are there any RDMSs where the overhead for page splits, sorting for retrieval and the fragmentation of indexes are too trivial to be considered ?

I am intrigued by: "In fact, traditional tree structured indexes are far from the best approach with large amounts of data."

Are there more appropriate index structures for OLTP systems ? Given a slowly building rarely revising data access pattern with the majority of the CRUD occurring in a small recent range of the rows.


Subject: Price History
Posted by: puzsol (view profile)
Posted on: Wednesday, November 26, 2008 at 7:03 PM
Message: The constraint:

CHECK (price_start_date = DATEADD(DAY, 1, price_prev_date)),

interests me... so the start date of the current price has to be 24 hours after the previous one ended... so there is a 24 hour gap where the item is free?

I also share other readers concerns about the Identity field being dismissed so casually as a primary key... ok so it may not be the best index for searching the table's contents, but it can be useful as a primary key - and to imply that it can't be used is just plain wrong.

Andrew


Subject: Re: Multiple Nulls
Posted by: Christian Buettner (not signed in)
Posted on: Thursday, November 27, 2008 at 2:32 AM
Message: Hi "Some Random DBA Guy",

I understand that SQL Server treats NULLs like this. What I don't understand is "why".
As mentioned before, the SQL Standard seems to allow multiple NULLs per UNIQUE constraint which would mean that these constraints should be defined with equality, and not with grouping.

I am trying to understand the rationale behind the decision that UNIQUE constraints should be defined with grouping (in SQL Server).

Thanks!

Subject: Price History and IDENTITY "keys"
Posted by: Alex W (view profile)
Posted on: Thursday, November 27, 2008 at 10:26 AM
Message: Puzsol: Note that the price history table defines price_start_date and price_prev_date as DATE datatype not DATETIME. Therefore Joe's constraint does not give a 24 hour gap.

Regarding the "IDENTITY is/isn't a key" argument (it's always guaranteed to draw a crowd isn't it) I believe that there is often a misunderstanding that using an IDENTITY column reduces redundancy in the data in your schema i.e. not putting city_lat and city_long in the bar's table, but using a "pointer" integer surrogate key.

I think the point Joe makes regarding this is perfectly valid. A database is about the data and loads of surrogate keys dotted about the place obfuscate the true relationship between the entities. We know there is a relationship, but what is it? An integral part of the database can be lost when a IDENTITY key is used.

I think to say an IDENTITY column can never be a key is going a step too far. But in the example Joe gave, the ID column certainly not a key of the city entity. It's purely internal attribute that has nothing to do with a city; it doesn't define it.

Some data has no implicit key, e.g. logged events: an identity column is most useful for events that happen in the same timetick (so using the logged datetime isn't an option), but in an specific order!

I fallback to the usual cop out : it depends... :-)

Anyway, just my observations. Back to work...

Subject: The real world is dynamic, cost of implementing changes needs to be consisdered
Posted by: AndyC Manpowersoftware (view profile)
Posted on: Friday, November 28, 2008 at 11:34 AM
Message: The advantage of articifial keys rather than keys that represent data items such as locations comes when the system is being enhanced or maintained. Often things that were unique become no longer unique or names that were thought to be non variant need to be changed? This often happens when systems are interfaced to others or when systems are merged. Often the data architect assumes that their entities will be carved into stone and never changed. But businesses are dynamic, the real world is dynamic and the cost of implementing changes should not be underestimated.

What is a city is blown up and rebuilt, the keys would need to be updated? What happens if someone invents a better co-ordinate system or you work with a country who wants to use grid references rather than lat/long? Looking long into the future what would happen if this DB needed to support multiple planets? What if the teacher gets married or changes or two people with the same name join the school? What if a bar with a really long name is found? All these examples would be harder to handle if the data and keys are combined. The VIN and order number are interesting because these are artificial keys that are visible to the end user and defined by a standard but they are still essentially artificially defined.

From a practical point of view the compound key can cause coding headaches so I'd avoid those too.

Subject: nice write up, a few objections
Posted by: Alex Kuznetsov (view profile)
Posted on: Friday, November 28, 2008 at 9:54 PM
Message: Just for the record - clearly using IDENTITY as the only key is a mistake. However, not having a key on IDENTITY in addition to a natural key may also be a very expensive mistake. For instance, some designers designed their databases assuming that SSN (aka social security number) is a natural key and there is no need for identities. When the reality changed, they had to quickly replace SSN with surrogate keys. That was disastrously expensive. Doing it right the first time, including both surrogate and natural keys, woould be much cheaper.

Subject: Anonymous commenting disabled
Posted by: Chris Massey (view profile)
Posted on: Wednesday, December 17, 2008 at 8:13 AM
Message: Anonymous commenting has had to be disabled due to spam. If you want to leave a comment you're either have to sign in or sign up. Sorry for any inconvenience. 

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
Using SQL Test Database Unit Testing with TeamCity Continuous Integration
 With database applications, the process of test and integration can be frustratingly slow because so... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk