Av rating:
Total votes: 21
Total comments: 6


Robyn Page and Phil Factor
Database Design Workbench - Keys
19 September 2008

/* Robyn Page and Phil Factor explore the innocent subject of Keys for their latest workbench. Everybody knows about keys. Oh yeah? Phil Factor ends up muting the immutable out of sheer devilry, and we learn how silly the British Secret Service were to tag James Bond with the code '007'...*/

/*

Key Issues

Some of the simplest concepts in relational databases can catch out even the more-experienced programmer. Surely, everyone understands database keys? Perhaps not. The usage of keys is a subject that has never quite bedded down to a happy consensus. It causes some heat, and raised voices, amongst database people. We'll try out a few things, and try to introduce a few concepts as we go. We've put a glossary at the end. You might need it.

Difficulties with James Bond

This workbench got started when we got arguing how the Natural Key 007 would have got allocated to James Bond, and then how they could distinguish James Bond from the previous 007s in order to do any accounting, such as calculating expenses or sending out a Bar (Mess) bill. We ended up concluding that it was completely useless, and probably masked a surrogate key.

From the stories, it would seem that there were only nine or ten agents who were licensed to kill, and they were all double-O (00x) agents. There are references in the Bond books to the fact that there was quite a turnover, with only ten agents active at a time. There could have been 999 agents at any one time using one of these codes. Presumably, with only one 0, the ninety nine were licensed to give a pretty nasty bruises, and the others were licensed to stick to satire, invective and irony. In the key 007, there is lurking a Smartkey, since two separate bits of information are shoehorned together

In fact to insert a new secret agent into a table, you'd probably want to allocate the lowest available number.

Imagine we had an agent table ...*/

--clear up, if you are re-running the workbench
IF OBJECT_ID(N'FK_PhoneNumbers_agent') IS NOT NULL
  
ALTER TABLE [dbo].[PhoneNumbers] DROP CONSTRAINT [FK_PhoneNumbers_agent]
IF OBJECT_ID(N'agent') IS NOT NULL DROP TABLE agent
--and create the agent table
CREATE TABLE agent(
  
[code]  
    
AS RIGHT('00000'+RIGHT(CONVERT(VARCHAR(15),[MyNumber]),3),3)
        
PERSISTED NOT NULL    ,--e.g. 007.
   --This is just a string representation of the MyNumber column.
  
MyNumber INT NOT NULL UNIQUE,
  
FirstName VARCHAR(50) NOT NULL,
  
Lastname VARCHAR(50) NOT NULL,
    
PRIMARY KEY (code)--here we define a calculated column as being a primary key
  
)
GO

/*
now you can get the agents into the table, just by executing the file we supply, ...*/

--load the agents here from the file attached.


/*
Allocating the next valid agent number would be interesting. it would depend whether they were
1/ Licensed to kill
2/ licensed to give a nasty bruise
3/ licensed to hurl invective.*/
/* we'll do a stored procedure for this */

IF OBJECT_ID(N'InsertNewAgent') IS NOT NULL DROP PROCEDURE InsertNewAgent
GO
CREATE PROCEDURE InsertNewAgent
   @FirstName
VARCHAR( 50),
  
@Lastname VARCHAR( 50),
  
@type INT
AS
DECLARE
@base INT,@Highest INT , @key INT
IF
COALESCE( @firstname+@lastname,'') =''
  
BEGIN
   RAISERROR
( 'an agent must have a name',16,1 )
  
RETURN 1
  
END
SELECT
@Base=CASE @type WHEN 1 THEN 1
                      
WHEN 2 THEN 10
                      
ELSE 100 END,
      
@highest=CASE @type WHEN 1 THEN 9
                      
WHEN 2 THEN 99
                      
ELSE 999 END
BEGIN TRANSACTION
--think of the mess if two agents were  created at once
SELECT @Key=  CASE WHEN EXISTS
   (
SELECT 1 FROM agent WITH (TABLOCKX)
      
WHERE MyNumber BETWEEN @Base AND @Highest )
  
THEN
  
(SELECT TOP 1 A.MyNumber+1--allocate the next unused number in the sequence
      
FROM  agent A (TABLOCKX)
        
LEFT OUTER JOIN  agent B
        
ON B.MyNumber=A.MyNumber+1
      
WHERE B.MyNumber IS NULL
       AND
a.MyNumber >= @base
      
ORDER BY a.MyNumber
    
)
  
ELSE @base END
IF
@key NOT BETWEEN @Base AND @Highest
  
BEGIN
   ROLLBACK TRANSACTION
   RAISERROR
(
    
'You can''t have any more agents of this type, (between %d and %d)'
      
,16,1,@base,@highest)
  
RETURN 1
  
END
INSERT INTO
agent(Firstname,Lastname,MyNumber)--at last we can insert
--the name into the table.

    
SELECT @FirstName, @Lastname,@key
COMMIT TRANSACTION
RETURN
0
GO
--and you would use it like this.
EXEC InsertNewAgent 'Jim','Carruthers',1--licensed to kill
EXEC InsertNewAgent 'Jane','Moneypenny',3--persuasion only
SELECT * FROM Agent WHERE lastname IN ('Bond','Carruthers','Moneypenny')


/*
code MyNumber    FirstName      Lastname
---- ----------- -------------- ----------------
002  2           Jim            Carruthers
007  7           James          Bond
063  63          Keri           Bond
105  105         Jane           Moneypenny */

--This insertion method  is very quick to run: We timed it at 15 milliseconds
--in a one million row table

/* This is OK as far as it goes. The trouble is that it doesn't go very far. The only way you could use this as a primary key is if you deleted all references to the previous agent with the same code. If you have referential constraints in place, then you are prevented from doing anything silly by mistake. Yes, you will find systems where it is done like this.

The trouble is that Databases generally need a history. No accountant would tolerate a system where all the records relating to an employee were deleted the moment the agent was exterminated by a villain. Nothing would balance. A payment to an employee remains a payment even after the employee has been eaten by a shark. Let's face it, you're going to need a surrogate key.

What do we do about the 007 nonsense? Ah, they want to keep it. They point out that they are unique because the code is only assigned to the current agent. It would cost too much for the business to change the system */

-- We then arrive at an uneasy truce. We use a compound key to ensure uniqueness
-- Just to make it even more scary, one part of the key is a derived field!  
-- but we'll also slip in a surrogate candidate key
IF OBJECT_ID(N'agent') IS NOT NULL DROP TABLE agent
CREATE TABLE agent(
  
[code]  
    
AS RIGHT('00000'+RIGHT(CONVERT(VARCHAR(15),[MyNumber]),3),3)
      
PERSISTED NOT NULL
   ,
--e.g. 007
  
MyNumber INT NOT NULL,
  
holder INT NOT NULL DEFAULT 0,-- 0 if the current holder,
                                           --otherwise 1..n
  
FirstName VARCHAR(50) NOT NULL,
  
Lastname VARCHAR(50) NOT NULL,
  
AGENT_id INT IDENTITY(1,1) UNIQUE,--the surrogate key
    
PRIMARY KEY (code,holder) /*we define a compound primary
                               key  made up of two columns */
  
)
GO
--insert sample data now---

/*Now Dion Prince gets sliced in two by a laser so we retire him by taking away his 0 in the holder column so someone else can now have his cherished 003 moniker. At this point, it is easy because the key hasn't been used!*/

UPDATE agent
    
SET holder=
      
(SELECT COUNT(*)
        
FROM agent a
        
WHERE a.MyNumber = agent.Mynumber)
-- he was the first holder
WHERE firstname='dion' AND lastname LIKE 'Prince' AND holder=0

/* This is unrealistic, because in a real database the primary key will have been used and so this will cause an error. We'll come back to this issue later. Normally we take great care to ensure that a candidate key is immutable.

Now the stored procedure can be altered slightly to add the fact that there can have been several holders of the 003 moniker but we want to reallocate it only if it isn't there, or it is currently vacant*/

--the actual logic goes like this (for a double-0 agent)
--what is the lowest currently available key in the 00 range?
SELECT  
  
CASE WHEN EXISTS
   (
SELECT 1 FROM agent
      
WHERE MyNumber BETWEEN 1 AND 9  )
  
THEN
  
(SELECT TOP 1 A.MyNumber+1--allocate the next unused number in the sequence
      
FROM  agent A (tablockx)
        
LEFT OUTER JOIN  agent B
        
ON B.MyNumber=A.MyNumber+1
        
AND a.holder=0--current agent
        
AND b.holder=0--current agent
      
WHERE a.holder=0 AND B.MyNumber IS NULL
      
ORDER BY a.MyNumber
    
)
  
ELSE 1 END

/*
Key Confusions

we've now jumped through a few hoops in order to allow the business their natural key. Why all the fuss?

You will get people arguing that keys must be natural (or, perversely, that they shouldn't be) or that they must be immutable (that you should never change its value). I've even heard it said that all joins must be on key values. Nope. Do what you think best. There are no 'best practices' or hard and fast rules, but you will have an easier life if you deviate from normal practice only when you have to.

We've sneaked in a surrogate key. Although the combination of moniker and holder number will be useful in tables which hold the agents' information, so it is easy to check things like addresses, contact numbers and so on, we've decided to use a surrogate key as well, for tables that pertain to entities that have no ‘public’ meaning.

Getting bulk data in place

What about a bulk insertion? No problem. As we are doing a once-only import on a new table, We'll drop the the Code field altogether for now, and then recreate and populate it after the bulk import. In fact, you can do bulk inserts later on an existing table but they’ll be slower. We'll have to assume that the first nine imported rows are 00 status, then the next ninety are 0 status and the following 900 are boring three-figure numbers. the next thousand will be replacements, and so on. */

IF OBJECT_ID(N'agent') IS NOT NULL DROP TABLE agent
CREATE TABLE agent(
  
FirstName VARCHAR(50) NOT NULL,
  
Lastname VARCHAR(50) NOT NULL,
  
MyNumber INT NOT NULL--a number between 1 and 999
  
)
GO

/* At this point onwards, we used SQL Data Generator. Apologies to anyone who hasn't already got it. To follow this bit, you'll need to download it for a fourteen day free trial! The reason is that, later on in the workbench, importing into tables that involve foreign key constraints is tricky to do by hand. Also the million agents were a big file, (to say nothing about two million phone numbers). */

--Add in the million rows here! (25 secs)

/* now we want to calculate the Holder value to make each of the composite keys unique. Notice that we need to create a clustered index so that the updates happen in the order we want (the order of the clustered index.)*/
 CREATE CLUSTERED INDEX idxMyNumber ON agent(MyNumber)
--4 seconds

ALTER TABLE agent ADD
  
[holder] INT NOT NULL DEFAULT 0
--6 seconds
GO
DECLARE @Holder INT, @MyNumber INT
SELECT
@MyNumber=0,@MyNumber=-1
UPDATE agent
  
SET @Holder= holder=CASE WHEN Mynumber<>@MyNumber
                      
THEN 0  ELSE @holder +1 END,
      
@Mynumber=Mynumber
--this took 4 seconds

SELECT COUNT(*) FROM agent GROUP BY MyNumber,holder HAVING COUNT(*)>1
--check for unique! (5 seconds)

--now we can construct the visible part of the moniker
ALTER TABLE  agent ADD
      
[code]  
    
AS RIGHT('00000'+RIGHT(CONVERT(VARCHAR(15),[MyNumber]),3),3)
      
PERSISTED NOT NULL
--6 seconds
-- and we can add a surrogate key as well.
-- The unique constraint provides an index
ALTER TABLE  agent ADD
      
[Agent_ID] INT IDENTITY(1,1) UNIQUE
--10 secs



DROP INDEX agent.idxMyNumber--we only needed this to do the 'holder'
-- part of the key
--2 secs
--we'll generally use the normal way of identifying people
CREATE CLUSTERED INDEX idxName ON agent(lastname)
--8 secs

--create a composite key
ALTER TABLE agent WITH NOCHECK
ADD CONSTRAINT
PK_AgentIdentifier PRIMARY KEY NONCLUSTERED (code,holder)
--7 seconds

--instantaneous!
--The first twenty agents whose lastname begins with 'Z'
SELECT  TOP 20 Firstname,lastname, code, holder
  
FROM agent WHERE lastname LIKE 'z%'
--The current holder of 007 followed by previous ones
SELECT  TOP 20 Firstname,lastname, code, holder
  
FROM agent WHERE code='007'
ORDER BY holder ASC
--current agents
SELECT TOP 20 Firstname,lastname, code
    
FROM agent
    
WHERE holder=0 ORDER BY MyNumber ASC

IF
OBJECT_ID(N'FK_PhoneNumbers_PhoneType') IS NOT NULL
  
ALTER TABLE [dbo].[PhoneNumbers] DROP CONSTRAINT [FK_PhoneNumbers_PhoneType]
IF OBJECT_ID(N'PhoneType') IS NOT NULL DROP TABLE PhoneType
CREATE TABLE dbo.PhoneType
  
(
  
PhoneType_ID INT NOT NULL IDENTITY (1, 1),
  
[Type] VARCHAR(10) NOT NULL
   )  
ON [PRIMARY]
GO
ALTER TABLE dbo.PhoneType ADD CONSTRAINT
  
PK_PhoneType PRIMARY KEY CLUSTERED
  
(
  
PhoneType_ID
  
)ON [PRIMARY]
GO
IF OBJECT_ID(N'PhoneNumbers') IS NOT NULL DROP TABLE PhoneNumbers
CREATE TABLE dbo.PhoneNumbers
  
(
  
PhoneNumber_ID INT NOT NULL IDENTITY (1, 1),
  
PhoneType INT NOT NULL,
  
Cli VARCHAR(30) NOT NULL,
  
Holder INT NOT NULL,
  
Code VARCHAR(3) NOT NULL
   )  
ON [PRIMARY]
GO
ALTER TABLE dbo.PhoneNumbers ADD CONSTRAINT
  
PK_PhoneNumbers PRIMARY KEY CLUSTERED
  
(
  
PhoneNumber_ID
  
)ON [PRIMARY]

GO
ALTER TABLE dbo.PhoneNumbers ADD CONSTRAINT
  
FK_PhoneNumbers_agent FOREIGN KEY
  
(
  
Code,
  
Holder
  
) REFERENCES dbo.agent
  
(
  
code,
  
holder
  
) ON UPDATE  NO ACTION -- CASCADE (if primary key changes! SQL 2000 or above) 
    
ON DELETE  NO ACTION
  
GO

--we now add the foreign key constraint
ALTER TABLE dbo.PhoneNumbers ADD CONSTRAINT
  
FK_PhoneNumbers_PhoneType FOREIGN KEY
  
(
  
PhoneType
  
) REFERENCES dbo.PhoneType
  
(
  
PhoneType_ID
  
) ON UPDATE  NO ACTION
    
ON DELETE  NO ACTION
  
---and here we'll add in two million telephone numbers
-- at this stage we have to use SQL Data Generator, really, as
--anything else is just too tedious.



CREATE NONCLUSTERED INDEX idxCli ON dbo.PhoneNumbers
(
  
[Cli] ASC
) ON [PRIMARY]

--get the first 20 agents whose phone-numbers start with '01'
SELECT     TOP 20 dbo.agent.FirstName, dbo.agent.Lastname, dbo.PhoneNumbers.Cli
FROM         dbo.agent INNER JOIN
                      
dbo.PhoneNumbers
ON dbo.agent.code = dbo.PhoneNumbers.Code
AND dbo.agent.holder = dbo.PhoneNumbers.Holder
WHERE cli LIKE '01%'

--get the agent(s) whose phone-number is '0142211065'
SELECT TOP 20 dbo.agent.FirstName, dbo.agent.Lastname,
      
dbo.PhoneNumbers.Cli,PhoneType.TYPE
FROM        
dbo.agent
INNER JOIN  dbo.PhoneNumbers
  
ON dbo.agent.code = dbo.PhoneNumbers.Code
      
AND dbo.agent.holder = dbo.PhoneNumbers.Holder
INNER JOIN dbo.PhoneType
  
ON PhoneType.PhoneType_ID=phoneNumbers.PhoneType
WHERE cli LIKE '0142211065'
/*9 ms

FirstName   Lastname  Cli                  Type
----------- --------- --------------------  -------
Elisa       Parrish   0142211065           Fax      

(1 row(s) affected)*/

Now we have the simplest possible database with foreign keys. We now hear the sad news that 005 has been cut in half by a laser.

We retire him by a stored procedure that involves changing a component of the primary key! We can do this bit automatically simply by means of a cascading update  on the FK_PhoneNumbers_agent constraint on PhoneNumber, but we'll do it explicitly as well for those poor souls on SQL Server 7 */

IF OBJECT_ID(N'RetireAgent') IS NOT NULL DROP PROCEDURE RetireAgent
GO
CREATE PROCEDURE RetireAgent --
@code
CHAR(3)
AS
DECLARE
@NewHolder INT
IF
NOT EXISTS (SELECT 1 FROM agent WHERE code=@code AND holder=0)
  
BEGIN
   RAISERROR
('The agent has already been retired',16,1)
  
RETURN(1)
  
END
SELECT
@Newholder=  COUNT(*) FROM agent a
              
WHERE a.MyNumber = @code
--in SQL Server 2000 up you would use a CASCADE condition on the
--ON UPDATE clause of the foreign key constraint for each table
BEGIN TRANSACTION--retire the agent along with his phone numbers
INSERT INTO agent (firstName, Lastname,MyNumber,Holder)
  
SELECT firstName, Lastname,MyNumber,@NewHolder
      
FROM agent WHERE code=@code AND holder=0
UPDATE Phonenumbers SET holder=@Newholder
  
WHERE code=@code AND holder=0
DELETE FROM agent WHERE code=@code AND holder=0
COMMIT TRANSACTION
GO

EXECUTE RetireAgent '007'--1.5 seconds.

IF OBJECT_ID(N'InsertNewAgent') IS NOT NULL DROP PROCEDURE InsertNewAgent
GO
CREATE PROCEDURE InsertNewAgent
   @FirstName
VARCHAR(50),
  
@Lastname VARCHAR(50),
  
@type INT
AS
DECLARE
@base INT,@Highest INT, @key INT
IF
COALESCE(@firstname+@lastname,'')=''
  
BEGIN
   RAISERROR
('an agent must have a name',16,1)
  
RETURN 1
  
END
SELECT
@Base=CASE @type WHEN 1 THEN 1
                      
WHEN 2 THEN 10
                      
ELSE 100 END,
      
@highest=CASE @type WHEN 1 THEN 9
                      
WHEN 2 THEN 99
                      
ELSE 999 END
BEGIN TRANSACTION
--think of the mess if two agents were  created at once
SELECT @Key=  CASE WHEN EXISTS
   (
SELECT 1 FROM agent WITH (TABLOCKX)
      
WHERE MyNumber BETWEEN @Base AND @Highest )
  
THEN
  
(SELECT TOP 1 A.MyNumber+1--allocate the next unused number in the sequence
      
FROM  agent A (TABLOCKX)
        
LEFT OUTER JOIN  agent B
        
ON B.MyNumber=A.MyNumber+1
        
AND a.holder=0--current agent
        
AND b.holder=0--current agent
      
WHERE a.holder=0 AND B.MyNumber IS NULL
      
ORDER BY a.MyNumber
    
)
  
ELSE @base END
IF
@key NOT BETWEEN @Base AND @Highest
  
BEGIN
   ROLLBACK TRANSACTION
   RAISERROR
(
    
'You can''t have any more agents of this type, (between %d and %d)'
      
,16,1,@base,@highest)
  
RETURN 1
  
END
INSERT INTO
agent(Firstname,Lastname,MyNumber)--at last we can insert
--the name into the table.

    
SELECT @FirstName, @Lastname,@key
COMMIT TRANSACTION
RETURN
0
GO
--and now we can add a new volunteer to die for his country!
EXEC InsertNewAgent 'Jim','Carruthers',1--licensed to kill

/*

Conclusion

So what have we learned? Although natural keys can be made to work, and can perform well, there is always a payoff. The database designer is right to use surrogate keys wherever he can but he should not be afraid of circumstances where that isn't possible. Where all the business operations are done through stored procedures, then the complexity can be hidden from the application programmers, and one can cheerfully commit 'crimes' such as changing things that ought, perhaps, to be immutable. We're inclined to agree with C J Date and Joe Celko that we should be less doctrinaire about keys. Above all, we reckon that this sort of approach should be tried out on a slow machine, with as much data as possible, with all metrics switched on!

Glossary


These terms are in common currency, but a few are more widely used in Oracle

Key
To be usable, in other words to be able to unambiguously identifify a database entity, a key must be unique and known (i.e. not NULL). It can consist of one or more attributes (columns). A key is no more than a 'candidate' until you use it and you can have as many candidate keys in a table as you like.
Fat key
A key that contains more than 16 bytes. It is said that these do not perform as well, but if this is true, it is usually too slight to notice.
Primary key
the default candidate key for a table. Primary keys are simply the default keys for a table. They are no longer required by relational theory for a table, just a candidate key, and are, according to CJ Date, included only for historical reasons. However, they should be used as there is no other way to specify to other processes which key should be used by default.
A Primary key is no more than an indexed key whose value, taken together across all component columns, is unique and known. SQL Server creates a clustered unique index when you specify a primary key, but this is not always the best solution, since a clustered index works best when it is not unique, not null and when the values are reasonably evenly distributed
Compound key
A key consisting of several columns, in which all values are known (not null) and, in combination, are unique.
Foreign key
a reference to a key in another table
Candidate key
a key with a unique and ‘not null’ constraint.
Unique key
a key containing unique values across all component columns.
Natural key
a key containing information used by people. Primary keys for business entities such as invoices, deliveries or products are generally best if they are composed of something the business already uses, such as a product code, shipment reference, customer reference, or invoice number. and is part of their culture.
This type of key is the so-called 'natural key'.
There is a problem with having a meaningful key. If the data that formed the natural key (such as a person's initials) was was wrongly entered, the key will be wrong but the business will already "know" the wrong version.
Surrogate key
a computer-generated candidate key. Surrogate keys such as the 'Identity field key' are the normal way of getting round the complexities of trying to handle Natural keys.
If the business uses incrementing numbers that can be huge, and they don't mind gaps, then you can use these identity fields. However, more often you'll find that businesses have arcane rules, passed down from distant tribal ancestors, and they resent having to change their system.
There are often good reasons for this. The cost of retraining, or changing the corporate culture can be astronomical. It is their choice. As Codd says, 'Users [...] are no longer compelled to invent a user-controlled key if they do not wish to'.
We can play it either way. Obviously, where a table represents something that has meaning only within the database schema, a surrogate unique key that is machine-generated, such as an identity field or GUID, is fine. However, where this isn't the case, you are running a risk by using one because, when things go wrong, it is much easier to sort things out if the keys are meaningful, and errors are so much easier to spot too before damage is done.
SmartKey (or contatenated key)
a key containing a single value usually composed of the values of several columns. If the data that made up part of the key was was wrongly entered, the key will be wrong but the business will already "know" the wrong version.
All smart keys, such as postcodes, eventually run into trouble, however preferable they are. A difficulty with compound keys is that you cannot select a 'set' of rows by using the IN clause e.g.*/

Delete from product where product_ID in
    (select max(product_ID)
      from product
      group by productname
      having count(*)>1)

/* While Surrogate keys make the coding of databases easier, they are disliked by book-keepers accountants, retailers or anyone else who has to handle them. They aren't human-friendly. Also, identity fields that are so often used to create surrogates can sometimes catch you out in surprising ways.
Composite key
A key that uses several non-null columns.
Alternate key.
A candidate key not assigned to be the primary key.
Foreign key
A foreign key restrains, at all times, one or more columns in a relvar to refer to a corresponding relvar via a candidate key in the referenced table. The enforcement is done via a foreign key constraint
Key Constraint
This is a constraint to the effect that one or more columns in a table are to be a candidate key. A primary Key constraint just defines the default candidate key.
Immutable key
This is a key whose value cannot be changed. This is the normal assumption, though it is seldom enforced.


This article has been viewed 7593 times.
Robyn Page and Phil Factor

Author profile: Robyn Page and Phil Factor

Robyn Page is a consultant with Enformatica and USP Networks. She is also a well known actress, being most famous for her role as Katie Williams, barmaid in the Television Series Family Affairs.

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 20 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

Search for other articles by Robyn Page and Phil Factor

Rate this article:   Avg rating: from a total of 21 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: Point well taken
Posted by: Anonymous (not signed in)
Posted on: Thursday, September 25, 2008 at 2:50 PM
Message: Your points about keys - natural and surrogate - are well taken.

However, the means by which you got there (how many double ohs, who could do what, etc.), as with open source, only proves that there are way too many creative people with way too much time on their hands ;-)

Subject: re: point well taken
Posted by: Phil Factor (view profile)
Posted on: Thursday, September 25, 2008 at 3:07 PM
Message: LOL. You're right, I'm sure.
This was a difficult article to write. It went through several drafts. Originally, Bond appeared only at the end as a flourish, but we realised we could illustrate a couple of points about keys with 007 and the idea sort of rather took over the article. Somehow it all jelled once we did that.
It all started when Robyn and I were arguing how one could use a derived column as part of a key with the PERSISTED keyword, and we were stuck for an example. We thought of Bond's code name, and wondered how on earth one could have processed agents expenses with a system like that.

Subject: Key Taxonomy/Ontology
Posted by: M. Evers.(DM Unseen) (not signed in)
Posted on: Wednesday, October 01, 2008 at 2:44 AM
Message: Key theory is flaky at best for most practitioners.
First: Even if Codd defined it differently at times, C.J. Date surely noted that from relational theory there are just candidate and super -keys, nothing else. Keys *do* however have a lot of (definable) properties like compoundness, scope, visibility, information entropy, information value, familiarity, stability, mutability and much much more (there are probably hundred or more attributes to be discovered for keys).
Given these properties you construct a Key Taxonomy or Ontology based on these properties. The classes you then get construct can reflect the names everyone uses like Natural Key, Surrogate Key, Intelligent key etc.
Only from the Ontology or Taxonomy it is possible to talk unambiguously about types of keys. Everything else is just trying to find common ground without a good foundation.

Subject: How about a simple association table?
Posted by: Anonymous (not signed in)
Posted on: Wednesday, October 01, 2008 at 9:41 AM
Message: Why not simply have something like:


    
-- agent types table
CREATE TABLE AgentTypes
    
(
                
Code CHAR(3)
                
Description VARCHAR(500)
    )
    
-- agents HR table
CREATE TABLE Agents
    
(
                
AgentID INT IDENTITY,
                
FirstName VARCHAR(100),
                
LastName VARCHAR(100),
                
DOB DATETIME,
                
DOH DATETIME
    
)
    
-- agent - agent types relation table
CREATE TABLE AgentTypeAgent
    
(
                
AgentTypeAgentID INT IDENTITY,
                
Code CHAR(3),
                
AgentID INT,
                
StartDate DATETIME
    
)

This way in AgentTypeAgent we can keep a historical record of all 007 agents, or any other type code for that matter, where the current would be the one with latest StartDate in AgentTypeAgent. It is rather a design issue than a key domain issue.




Subject: I know it was not the point, but...
Posted by: Dennis A. (not signed in)
Posted on: Thursday, October 02, 2008 at 12:38 PM
Message: Wonderful article. I always look forward to your though provoking workbenches.

I know this was a discussion on key construction/selection, but one of the items that stuck out for me was the need for a transaction in the InsertNewAgent procedure. Both the check for an available agent number and the insertion of that number into the table can be combined to avoid this need. Here is an example of what I mean that uses a number table function. Oh, and as a quick aside, it would be grand if you added the SQL Prettifier for these posts.

Thank you again
CREATE PROCEDURE InsertNewAgent
    
@FirstName VARCHAR(50),
    
@LastName VARCHAR(50),
    
@Type INT
            AS
DECLARE
@Base INT, @Highest INT
            
/* CHECK: require both first and last name */
IF COALESCE(@FirstName+@LastName, '') = ''
    
BEGIN
    RAISERROR
('an agent must have a name',16,1)
    
RETURN 1
    
END
            
/* CHECK: require a type between 1 and 3 be provided */
IF COALESCE(@Type, -1) NOT BETWEEN 1
            
AND 3
    
BEGIN
    RAISERROR
('Invalid agent type. The valid agent types are 1, 2, and 3.',16,1)
    
RETURN 1
    
END
            
/* Calculate the valid agent number range for the given type */
SELECT
    
@Base = POWER(10, @Type-1), -- @Type of 1=0, 2=10, 3=100
    
@Highest = POWER(10, @Type)-1 -- @Type of 1=9, 2=99, 3=999
            /* Try to insert the new agent.
Note that no insert will occure if there is not a position of the given type available
*/
INSERT INTO agent (Firstname, Lastname, MyNumber)
    
SELECT TOP 1 @FirstName, @LastName, r.n
    
FROM dbo.IntRange(@Base, @Highest) AS r -- table function that returns all integers in a given range
            
LEFT JOIN agent AS a ON a.MyNumber = r.n
    
WHERE a.MyNumber IS NULL
    
ORDER BY n ASC;
            
/* Catch the case where no row was inserted into the agent table */
IF @@RowCount = 0
    
BEGIN
    RAISERROR
('You can''t have any more agents of this type, (between %d and %d)', 16, 1, @Base, @Highest)
    
RETURN 1
    
END
RETURN
0
GO
            

Subject: C J Date Reference
Posted by: Lori S (view profile)
Posted on: Wednesday, February 25, 2009 at 9:11 PM
Message: "They are no longer required by relational theory for a table, just a candidate key, and are, according to CJ Date, included only for historical reasons. " What's you bibliographic reference for this statement.

Thank you.

 










Phil Factor
Exploring your database schema with SQL
 In the second part of Phil's series of articles on finding stuff (such as objects, scripts, entities, metadata) in... Read more...



 View the blog
Mission Critical: Database Design
 There is nothing like a checklist to make sure you've completed all the tasks in designing a database,... Read more...

Transparent Data Encryption
  Transparent Data Encryption is designed to protect data by encrypting the physical files of the... Read more...

SQL Toolbelt 2008: Predominantly an Engineering Task
 The conversion of the Red-Gate tools to be compatible with SQL Server 2008 might not seem, on first... Read more...

SQL Response: The dim sum interview
 Richard Morris met David and Nigel of the SQL Response team, in a dim sum Restaurant in Cambridge. They... Read more...

Data Correlation Optimization Internals
 Having adroitly introduced us, in his previous article, to the Date Correlation ability of the Query... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

SQL Server Full Text Search Language Features
 SQL Full-text Search (SQL FTS) is an optional component of SQL Server 7 and later, which allows fast... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 150,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk