Joe Celko

Constraint Yourself!

26 October 2008

In his first article for Simple-Talk, Joe Celko demystifies the use of Constraints, and points out that they are an intrinsic part of SQL and are a great way of ensuring that a business rule is done one way, one place, one time. Almost all database programmers will find something new and useful in this article.

In the Newsgroups I frequent, I find that I constantly have to remind people that rows are not records, columns are not fields and tables are not files.  After teaching SQL and writing books on the language for a few decades, I have found that if a person can switch his mindset from procedural code and sequential files to declarative code and sets, then SQL and data modelling become much, much easier.  It is an epiphany, much like when recursion suddenly makes sense to a procedural programmer.  If you are a native LISP programmer, replace "recursion" with "iteration" in that last sentence. 


Part of this epiphany is realizing that DDL (Data Declaration Language), DML (Data Manipulation Language) and DCL (Data Control Language) are all an intrinsic part of SQL and not disjoint languages that stand apart.  This article deals with the DDL, but better DDL makes for better DML and DCL.  When a business rule is in the DDL, it is done one way, one place, one time.  You do not have to hope that every application and every DML statement gets all the rules right.  You do not have to hope that a change in the rules will be corrected in hundreds or even thousands of places in the system. 

An Overview of Constraints: DDL versus DML

Since DDL is declarative, the SQL engine can convert row and column constraints into search conditions that can be used by the optimizer, by adding them to the execution plan under the covers. It is a little trickier than it sounds because there is a subtle difference between DDL and DML logic. Let's start with a sample table, Foobar, which has no  CHECK() constraints on it, and load some sample data into it:

CREATE TABLE Foobar
(foo_nbr CHAR(5) NOT NULL PRIMARY KEY,
 a INTEGER,
 b INTEGER NOT NULL,
 c INTEGER NOT NULL);

INSERT INTO Foobar
VALUES ('Fred', NULL, 12, 10),
       ('Teri', 14, 12, 11),
       ('Mike', 3, 12, 12);

Then run the following query:

SELECT foo_nbr, a, b, c
  FROM Foobar
 WHERE a < b;

This query will return the result set: ('Mike', 3, 12, 12) because the WHERE clause returns UNKNOWN for ('Fred', NULL, 12, 10) and FALSE for ('Teri', 14, 12, 11).  The UNKNOWN is treated as a FALSE in the DML.  That should not surprise even a beginning SQL programmer.  But now, let's add a CHECK () constraint and see how the rules change. 

DROP TABLE Foobar;

CREATE TABLE Foobar
(foo_nbr CHAR(5) NOT NULL PRIMARY KEY,
 a INTEGER,
 b INTEGER NOT NULL,
 c INTEGER NOT NULL,
 CHECK (a < b)); -- same search condition as the query

Now load the table with the new constraint on it:

INSERT INTO Foobar VALUES ('Fred', NULL, 12, 10); -- accept, UNKNOWN
INSERT INTO Foobar VALUES ('Teri', 14, 12, 11); -- reject!  FALSE
INSERT INTO Foobar VALUES ('Mike', 3, 12, 12); -- accept, TRUE

For the row ('Fred', NULL, 12, 10), the CHECK () constraint's search condition returns UNKNOWN, as per the usual rules of SQL's Three-Valued Logic (3VL).  However, the rules for DDL are not the rules for DML.  The CHECK () constraint treats UNKNOWN and TRUE the same.  Thus, the row ('Fred', NULL, 12, 10) can go into the table, but the query will not return it in the result set. 

The reason for this is to allow the tables to hold NULLs without having to write complicated search conditions, which would be a mess of "CHECK (<some constraint on my column> OR <my column> IS NULL)", all over the schema.  It gets even worse for multi-column constraints because you would have to consider all the possible combinations of NULLs and values in that table.  The idea is that if you wanted to prohibit NULLs in a column, then you would have used NOT NULL in the declaration for that column.  So you get the "benefit of the doubt" when you leave it off.
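If the DDL logic worked like the DML logic instead, every nullable column would need an explicit escape hatch.  Here is a sketch of what the Foobar table above would have to look like under that regime:

CREATE TABLE Foobar
(foo_nbr CHAR(5) NOT NULL PRIMARY KEY,
 a INTEGER,
 b INTEGER NOT NULL,
 c INTEGER NOT NULL,
 CHECK (a < b OR a IS NULL)); -- the explicit NULL escape hatch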

NOTE:
As an aside to SQL Server programmers, there is only a NOT NULL constraint in Standard SQL and NULL is strictly dialect. 

CHECK () Constraint Basics

New SQL programmers do not appreciate the fact that, in modern SQL products, the search conditions in the CHECK () clauses are passed along to the optimizer for queries, inserts, updates and deletes.  They consider them to be data integrity features only.  Yes, data integrity is important -- nay, vital -- to the RDBMS, but it is nice to get a performance bonus. 

The biggest mistake is not to use all the CHECK () constraints you can put on a table.  From the data integrity side, this means that all or most of the business rules are in one place.  An application program or direct query tool cannot override these rules. 

Another feature that is often missed by less-experienced SQL coders is that constraints can be given names.  These names are global to the schema and are not at the table level.  The syntax is simply:

"CONSTRAINT <constraint name> CHECK (<search condition>)"

The name will appear in error messages when the constraint is violated.  There are two reasons that the names are global.  First, local names would be confusing when you have a multi-table query or statement.  Secondly, in Standard SQL, there is a CREATE ASSERTION statement that lets you put a CHECK () on the schema as a whole.  If a table is empty, then all constraints on it are TRUE in SQL, so an ASSERTION lets you handle empty tables, constraints that deal with multiple tables, and so forth, in one place. 

A single column can have more than one CHECK () constraint on it.  A subtler mistake is to lump all of the rules into one honking long CHECK () that has a list of search conditions separated by ORs, under a vague name like "bad foobar code" that does not give enough information to be helpful.  If each of these checks were placed in its own constraint, then the constraint names would give the end user some extra help -- "foobar code too high", "foobar code too low", "foobar code <= 0", or whatever.  It also means that when a rule changes, only one constraint has to be altered. 
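As a sketch of that advice (the table, column and limits here are hypothetical), each rule gets its own named constraint:

CREATE TABLE Foo
(foobar_code INTEGER NOT NULL
   CONSTRAINT foobar_code_too_low
   CHECK (foobar_code > 0)           -- violating this names the low rule
   CONSTRAINT foobar_code_too_high
   CHECK (foobar_code < 1000));      -- violating this names the high rule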

If a CHECK () constraint involves one, and only one, column then it can be placed at the end of the column declaration, before the comma that separates the column declarations in the CREATE TABLE statement.  If a CHECK () constraint involves more than one column, then it has to be placed by itself, separated from the column declarations.  Some people have trouble finding the missing comma in multi-column constraints, since the error message in most SQL products just says something about a bad column reference and does not suggest the missing comma.  Technically, all CHECK () constraints can be written with the stand-alone syntax, but the single-column syntax keeps the constraint next to its column and makes "cut & paste" a bit easier.  The multi-column constraints can appear anywhere in the CREATE TABLE statement, but try to put them near the columns involved. 
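Here is a sketch of the two placements, with hypothetical columns:

CREATE TABLE Sales
(unit_price DECIMAL(10,2) NOT NULL
   CHECK (unit_price >= 0.00),       -- single column: rides on its column declaration
 discount_price DECIMAL(10,2) NOT NULL,
 CONSTRAINT discount_not_above_list
 CHECK (discount_price <= unit_price)); -- multi-column: stands alone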

CASE Expressions in CHECK () Constraints

The first advanced trick you can do with a CHECK () constraint is to use a CASE expression to build in complex logic.  Many programmers do not think of this trick because, in their original programming language, there was a CASE statement and not a CASE expression.  The CASE statement (also known as CASE in Pascal and ADA, switch in C, computed GOTO in COBOL and so forth) is a "flow of control" construct, like the IF-THEN-ELSE statement in all procedural programming languages.  Declarative programming languages have no control flow by definition. 

The CASE expression has two forms and it returns a value of one data type.  For constraints, the searched case expression is the most useful.  Here is the BNF for it:

<searched case> ::=
    CASE
      WHEN <search condition> THEN <result>
      ...
      [ELSE <result>]
    END

The first step is to look at all the THEN clauses and ELSE to find the highest data type in them; that is the data type of the expression.  It is easy to screw up by not watching that behaviour and putting something like "ELSE 'Not found' END" which will promote everything to a CHAR (n) data type. 

The WHEN...THEN...clauses are executed in left to right order.  The first WHEN clause that tests TRUE returns the value given in its THEN clause.  And, yes, you can nest CASE expressions inside each other.  If no explicit  ELSE clause is given for the CASE expression, then the database will insert a default ELSE NULL clause.  If you want to return a NULL in a THEN clause, then you must use a CAST (NULL AS <data type>) expression.  I recommend always giving the ELSE clause, so that you can change it later when you find something explicit to return. 
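Here is a quick sketch of those NULL rules, using a hypothetical foo_qty column:

CASE WHEN foo_qty > 100 THEN 1
     WHEN foo_qty > 10 THEN CAST (NULL AS INTEGER) -- an explicit NULL result needs a CAST
     ELSE 0 END -- leave this clause off and an implicit ELSE NULL is assumed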

In the CHECK () constraints, you assign the results some constant value, such as 'T' and 'F' for TRUE and FALSE.  This lets you do complex logic in the DDL, such as:

CONSTRAINT special_foobar_rule
CHECK (CASE WHEN foobar_code = 0
             AND floob_score > 10 THEN 'T'
            WHEN foobar_code = 1
             AND spy_nbr = '007' THEN 'T'
            WHEN foobar_code = 2
             AND arrest_cnt > 3 THEN 'T'
            ..
            ELSE 'F' END = 'T')

The SQL Standard defines other functions in terms of the CASE expression, which makes the language a bit more compact and easier to implement.  The COALESCE () expression returns the first expression in its parameter list that is not NULL, cast to the highest data type in the parameter list.  If the list is all NULLs, then the result of the expression is NULL.  For example, to assure that at least one of several options is given, you can use CHECK (COALESCE (option_1, option_2, option_3) IS NOT NULL). 
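The Standard defines the two-argument form as a simple rewrite into CASE, and longer parameter lists nest:

COALESCE (option_1, option_2)
-- is shorthand for:
CASE WHEN option_1 IS NOT NULL THEN option_1 ELSE option_2 END

COALESCE (option_1, option_2, option_3)
-- becomes:
CASE WHEN option_1 IS NOT NULL
     THEN option_1
     ELSE COALESCE (option_2, option_3) END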

Basic DRI Actions

Declarative Referential Integrity (DRI) uses the PRIMARY KEY or UNIQUE constraints on one table (called the referenced table) to assure that matching columns in a second table (called the referencing table) have the same values.  The referenced and referencing table can be the same table, but ignore that for the sake of this discussion.  For example, if we have a business rule that we do not sell things that are not in inventory, then the skeleton schema might look like this:

CREATE TABLE Inventory
(product_id CHAR(15) NOT NULL PRIMARY KEY,
 ..);

CREATE TABLE Orders
(order_nbr CHAR(15) NOT NULL PRIMARY KEY,
 product_id CHAR(15) NOT NULL
   REFERENCES Inventory (product_id)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 ..);

People often have trouble with this construct because they think of a table as a file.  Files are separate units of storage, while a table is a part of a schema; the schema is the unit of storage.  There are also DRI action subclauses, which propagate changes in the referenced table to the matching columns of the referencing tables.  The most common cases are shown on the Orders table of this example.  When an inventory item changes its product_id, then all of the references to it change to the new value -- ON UPDATE CASCADE.  When an inventory item is deleted, then all of the rows in the referencing tables with that product_id are also deleted -- ON DELETE CASCADE. 

This obviously saves a lot of programming on the application side, but it also tells the optimizer about the relationship among the tables, so it can pick a better execution plan. 
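As a usage sketch (the product codes are made up), renaming a product requires touching only the referenced table:

UPDATE Inventory
   SET product_id = 'WIDGET-2'
 WHERE product_id = 'WIDGET-1';
-- ON UPDATE CASCADE rewrites every matching Orders.product_id to 'WIDGET-2';
-- no application code has to hunt down the references.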

DRI for Transition Constraints

A transition constraint says that an entity can be updated only in certain ways.  These constraints are often modelled as a state transition diagram.  There is an initial state, flow lines that show what the next legal states are, and one or more termination states.  As a very simple example, consider your marital life.  We have to start with Born, in which you are single.  It is important to have one initial state, but you can have many termination states.  For example, after you are born, you can die or get married, but you have to be married to get a divorce.  The state diagram, in words: Born can go to Married or Dead; Married can go to Divorced or Dead; Divorced can go back to Married or to Dead. 

In this example, we have only one termination state, Dead.  Let’s start with a table skeleton and try to be careful about the possible states of our life:

CREATE TABLE MyLife
(..
 current_state VARCHAR(10) DEFAULT 'Born' NOT NULL
   CHECK (current_state IN ('Born', 'Married', 'Divorced', 'Dead')),
 ..);

We are being good programmers, using a DEFAULT and a CHECK() constraint, but this does not prevent us from turning Born directly to Divorced, converting Dead back to Married, and so on.  You can actually use CHECK () constraints to enforce our state diagram, but you have to store the current and previous states:

CREATE TABLE MyLife
(..
 previous_state VARCHAR(10) NOT NULL,
 current_state VARCHAR(10) DEFAULT 'Born' NOT NULL,
 CHECK (CASE WHEN (previous_state = 'Born'
              AND current_state IN ('Born', 'Married', 'Dead'))
             THEN 'T'
             WHEN (previous_state = 'Married'
              AND current_state IN ('Divorced', 'Dead'))
             THEN 'T'
             WHEN (previous_state = 'Divorced'
              AND current_state IN ('Married', 'Dead'))
             THEN 'T' ELSE 'F' END = 'T'),
 ..);
 

In effect, the state diagram is converted into a search condition.  This procedure has advantages; it will pass information to the optimizer, will port, and will usually run faster than procedural code. 
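A transition then becomes a single UPDATE.  Since all the assignments in an UPDATE statement see the pre-update values of the row, the old current_state can slide into previous_state in one statement.  A sketch, assuming a hypothetical key column my_life_id:

UPDATE MyLife
   SET previous_state = current_state, -- reads the value from before the update
       current_state = 'Married'
 WHERE my_life_id = 42;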

Another declarative way to enforce transition constraints is to put the state transitions into a separate table and then reference the legal transitions.  This requires that the target table carry both the previous and the current state in two columns.  Using this example, we would have something like this:

CREATE TABLE StateChanges
(previous_state VARCHAR(15) NOT NULL,
 current_state VARCHAR(15) NOT NULL,
 PRIMARY KEY (previous_state, current_state));

INSERT INTO StateChanges
VALUES ('Born', 'Born'), -- initial state
       ('Born', 'Married'),
       ('Born', 'Dead'),
       ('Married', 'Divorced'),
       ('Married', 'Dead'),
       ('Divorced', 'Married'),
       ('Divorced', 'Dead'),
       ('Dead', 'Dead'); -- terminal state

The target table looks like this:

CREATE TABLE MyLife
(..
 previous_state VARCHAR(15) DEFAULT 'Born' NOT NULL,
 current_state VARCHAR(15) DEFAULT 'Born' NOT NULL,
 FOREIGN KEY (previous_state, current_state)
   REFERENCES StateChanges (previous_state, current_state)
   ON UPDATE CASCADE,
 ..);

If you want to hide this from the users, then you can use an updatable view (the view gets its own name here, since it cannot share the name of its base table):

CREATE VIEW MaritalStatus (.., marital_status, ..)
AS SELECT .., current_state, ..
     FROM MyLife;

The immediate advantages to doing so are that this will pass information to the optimizer and will port, as with the CHECK () constraint version. However, since the rules are separated from the table declaration, you can maintain them easily. 

A not-so-obvious advantage is that the StateChanges table can contain other data and conditions, such as temporal change data.  Your 'Born' cannot change to 'Married' until you are of legal age.  You cannot go from 'Married' to 'Divorced' for (n) days, and so forth. 
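A sketch of such an extension (the extra column is hypothetical):

CREATE TABLE StateChanges
(previous_state VARCHAR(15) NOT NULL,
 current_state VARCHAR(15) NOT NULL,
 min_wait_days INTEGER DEFAULT 0 NOT NULL, -- e.g. a mandatory waiting period
 PRIMARY KEY (previous_state, current_state));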

Summary

"A problem well stated is a problem half solved." -- Charles F. Kettering

This is just a small sample of some declarative DDL techniques you can use in place of traditional procedural code.  I have not gotten to UNIQUE and PRIMARY KEY constraints or CREATE ASSERTION statements yet!  And, while I avoid them as much as possible, there are also many kinds of TRIGGERs that you can add to a table.  Wow!  Maybe there is another article waiting to be written. 

Author profile: Joe Celko

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his dry wit. He is also interested in Science Fiction.




Subject: Transition States
Posted by: timothyawiseman@gmail.com (view profile)
Posted on: Monday, October 27, 2008 at 12:30 PM
Message: That was an awesome article, and I will be sharing it with my team.

One piece that caught my attention was this paragraph "..., but this does not prevent us from turning Born directly to Dead, converting Divorced to Married, and so on..."

From the chart, Born can turn directly to dead and Divorced can go to Married. Perhaps you meant:"..., but this does not prevent us from turning Born directly to Divorced, converting Divorced to Born, and so on..."


Subject: Couldn't agree more ... but
Posted by: randyvol (view profile)
Posted on: Tuesday, October 28, 2008 at 12:00 PM
Message: Amen! Could not agree more.
One caveat though. 99.9% of the tables I work with are not my design. They are in products we have purchased from (and that are supported by) commercial product suppliers.

I have found that I can almost guarantee the vendor's support staff will whip out the dreaded, 'gee you changed it so we can't support it' response to even the most un-invasive of table changes.

I tremble with fear when I think of the breakages I could cause with just some simple constraints applied to these tables.

Would that there were some ANSI-standards committee that reviewed commercial products for some bare minimal adherence to best-practice standards for SQL and the data models in these packages.

Oh if only there were such a thing, how easy our lives would be as users and maintainers of such applications.

';-)

Subject: Nice headshot
Posted by: Adam Machanic (view profile)
Posted on: Tuesday, October 28, 2008 at 4:06 PM
Message: I like the new headshot, Joe!

Subject: More Detail
Posted by: BuggyFunBunny (not signed in)
Posted on: Wednesday, October 29, 2008 at 8:36 AM
Message: For a more thorough treatment of State Transitions, see the recent text by de Haan and Koppelaars, "Applied Mathematics for Database Professionals". Approved by Date and Darwen, no less.

Subject: Nice. But What About the Front End?
Posted by: JJEugene (view profile)
Posted on: Wednesday, October 29, 2008 at 10:56 AM
Message: The designs for controlling transition constraints are very appealing. It seems very neat from strictly a database standpoint.

After reading it, I was trying to imagine implementing such a design in a real application. I found myself wondering what kind of code would be needed in the front-end to make this work. (If it isn't a practical approach for a front-end, it is not a good idea for real-life.) Here's my initial thoughts:

Every time the user makes a change, the front-end app would have to manually put the original value before the user makes the change into the "previous value" column. (Or maybe this could/should be done by trigger to make sure it happens, but the front-end still needs to grab the before-update value for data validation purposes?) Then, to have a user-friendly app (which means never confusing the user with an ugly database error), the app would have to check to make sure that the new state combo of previous value and current value is valid before allowing the change.

If you are going the route of storing current and previous values in the main table in order to be able to use the database to control state changes, it seems to me that you would definitely not want to implement the first idea of using a CASE statement to enforce the rules. Because if you did, then you would be forced to code the rules in two places (if you want a user-friendly application) - once in the CASE statement and once separately in the front-end. On the other hand, if the valid states were stored in a separate table, then both the database schema and the front-end could check against the same place/table for the valid data combinations. Just a thought.

Also, I think there is an error in the article. Look at the code just *above* this sentence: "We are being good programmers, using a DEFAULT and a CHECK() ..." That code is still referencing the inventory and orders tables. Not the new examples for that part of the article.

Thanks for the article. I look forward to reading more.

Subject: A Reply to JJEugene
Posted by: P de heer (not signed in)
Posted on: Thursday, October 30, 2008 at 8:12 AM
Message: What you describe is all too common, but sometimes it can be dealt with in a way that is not user-unfriendly. After all, the constraints make sense, and if a user makes a deliberate error, it is not unfriendly to point that out afterwards.

What can be done is this:

Make sure the constraints are uniquely named and that your application can recognise them from the errors returned by the DBMS.

Then when you save the input and cause an error, you search for the error identification code and present a user-friendly message. This message of course can be stored in the DB too, so it is still in one place.

This is not a perfect solution either, though... consider the case where a user makes several mistakes, each causing a constraint to block the save action. The user first gets one message 'you cannot do this', then corrects and saves again, only to get another message. The user's view of what he can do becomes quite fragmented, even directionless!

In the end the user just experiences your application and not the layers that make up this application so won't appreciate the declarative elegance at all!

This is one of the reasons, as you pointed out, that a lot of validation logic is put in front-ends where you would rather not see so much of it. Another cause is that user input forms tend to be 'flexible', adaptive to the value you have just chosen. An example is a country selection, followed by location and postal/ordering details that differ per country. All the knowledge must be baked into the front end in this case, duplicating any constraint you might have in the relevant DBMS schema.

I think it's a broader design problem and that the schema modeler should just focus on the data integrity and performance side of things. If the consequence is that you need to duplicate logic in multiple layers of your applications, so be it.

In most cases, handling only the obvious/common things that occur during normal use should be in the interface IMHO, and the rest should be handled by the layers below it, the DBMS being the lowest layer that guarantees integrity. This allows for some specialisation (less overlap in logic) and still offers a good end-user experience.


Subject: Good Example, But Not Real-World (I hope!)
Posted by: Anonymous (not signed in)
Posted on: Thursday, October 30, 2008 at 9:05 AM
Message: The MyLife example is a good one, except that it should never be implemented in a real-world system with no exit from the Dead state.

I work for a small healthcare provider. We submitted a bill to Medicaid for one of our clients and the response was that the claim was denied because the client was dead. The system, as implemented by my predecessor, duly marked the client as dead in our database, thus canceling all future appointments. Imagine our surprise when the client, obviously alive and well, walked in for the next appointment! Needless to say, I had to MANUALLY modify the database, marking the patient as alive, and disable the logic which copies "client is dead" from claim denials to our database. I have been advised that it may take months for the Federal government to acknowledge that the client is still alive! In the mean time, all claims will be rejected and we will not get paid for the services we are required to provide.

Subject: A Reply to P de heer
Posted by: Jeff B (not signed in)
Posted on: Thursday, October 30, 2008 at 9:20 AM
Message: It's possible to use a UDF (user defined function) in the definition for the check constraint. The front end app can call the UDF to validate the data. The problem with this approach is the UDF will be called twice (once by the app and then when the check constraint is validated), but for simple constraints it will run efficiently. The UDF parameters will be the columns used in the check constraint. The front end app would pass in the form variables for the column parameters. The UDF is also useful for validating the data before adding the check constraint on a table. SQL Server UDFs are infamous for having performance problems so you need to be careful with using this approach. In my experience the flexibility is worth some performance degradation.

Subject: Client - database interface
Posted by: BuggyFunBunny (not signed in)
Posted on: Thursday, October 30, 2008 at 9:42 AM
Message: The general approach to constraints by coders is to write, baby, write. Some (java xml zealots, in particular) have adopted what's often called XBML (or similar acronym; extensible business ml), wherein business analysts define business rules (now being called workflows), which end up as javascript in the html. Said html is then generated, twice. Once to establish the skeleton at build time, and once at runtime through a servlet engine under a framework (Struts, JSF, etc.). Non java languages provide similar frameworks.

The alternative is do the build time generation from the catalog. It is just a matter of finding the text for foreign keys, triggers, et al and making that into javascript. Not something I would want to do, but others have. Andromeda is one example. So is Middlegen. So is Rails.

Using this code generation approach, the catalog becomes the constraint set of record, rather than piles of stinking code replicating themselves all over the place. It makes coders less needed, which makes them nervous, which makes them less than enthusiastic about such an approach.

Yes, the constraints exist both on the database and on the client. The set provided on the client is a *convenience* to the user. Nothing more. Back when *nix databases ran connected to VT-100s over RS-232, the client and database existed in the one machine, making the data path rather short. With such character-mode terminals, ease of use and performance exceeded anything on a browser (a block-mode interface with local editing; a 3270 replicant).

Subject: Thanks for the Reply
Posted by: JJEugene (view profile)
Posted on: Thursday, October 30, 2008 at 10:24 AM
Message: P de heer: Thanks for the reply. Dialog is appreciated. The purpose of the article was to explain constraints, and I think it did so in a good way. I don't want to detract from the point of the article. At the same time, I don't believe that such teachings should exist completely away from the bigger picture. So, our discussion is a good one.

In reply to: "If the consequence is that you need to duplicate logic in multiple layers of your applications, so be it." I would agree to a point. I would point out that the second example in the article, putting the state logic in a separate table, actually serves the goal of allowing one to have the logic in only one place. Where it makes sense to put the logic in one place, I believe that is the preferable solution by far. Thus, between the two choices presented here, I think one is good for learning and an intellectual discussion, but one is much better for real-world implementation. That was my main point.

Another comment that someone posted is that the logic could be contained in a user defined function that is called both by the constraint and the front-end. That would certainly be another way to go. Though if I were doing that, and if there were enough states, I would probably still want the function to check against a table. And if it were checking against a table, then you might as well implement the way this article suggests.

Just some more thoughts.

Subject: Creating constrained data systems instead of constrained data stores
Posted by: http://techsentinel.wordpress.com/ (not signed in)
Posted on: Thursday, October 30, 2008 at 10:33 AM
Message: I prefer a different approach for state-transition modeling. Using a table-driven model may lead to violations of 3NF when columns supporting the states are added to the table. If tables are added for each state instead, then the state-related columns will naturally be added to those tables. Also, separate tables for states allow state data to be used directly, without going through a non-primary-key index that would likely be put on the state field of the table in your example. Also, you have Married and Divorced states ending with Death, whereas I would model it with Birth ending in Death (assuming that birth is tracked separately from life) and Marriage ending in Divorce.

This brings up the final issue: should we constrain the tables to model the born->married, married->dead, and divorced->dead transitions? I believe we should not model these transitions with DRI, but instead should codify them inside stored procedures. Then these procedures must be set as the only means of altering the table data (i.e. do not grant insert, update, and delete on the tables).

With these changes, instead of having some properly constrained tables, you will have a properly constrained data system that models the state-transition process that you have described. The only thing left is to add views to the system which join the tables and generate a user-friendly state field, if desired.

It is sad that in 2008 it is still necessary to write about check constraints and referential integrity. Unfortunately, most IT people are not listening. Application developers seem content to allow user interface or business logic code to attempt to maintain data integrity, even when it cannot. Database administrators are focused elsewhere, on database management and maintenance, hoping not to get paged at 3AM again because the system ran out of storage. The missing element is someone who understands the entire system. Data integrity is not something that can be slapped onto the system after completion; likewise, application flexibility is not a natural outcome of the agile hack-and-slash style of development.

Example Schema Pseudo code:
CREATE TABLE MyLife
(
MyLifeKey ...,
MyLife Columns...
)

CREATE TABLE MyLifeBirth
(
MyLifeKey ...,
MyLifeBirth Columns...
FOREIGN KEY (MyLifeKey)
REFERENCES MyLife(MyLifeKey)
)

CREATE TABLE MyLifeDeath
(
MyLifeKey ...,
MyLifeDeath Columns...
FOREIGN KEY (MyLifeKey)
REFERENCES MyLifeBirth(MyLifeKey)
)

CREATE TABLE MyLifeMarriage
(
MyLifeKey ...,
MarriageDate DATE NOT NULL,
DivorceDate DATE NULL,
MyLifeMarriage Columns...
PRIMARY KEY(MyLifeKey, MarriageDate),
FOREIGN KEY (MyLifeKey)
REFERENCES MyLifeBirth(MyLifeKey)
)

PROCEDURE CreateLife()
PROCEDURE Birth()
PROCEDURE Marriage() -> validates that life is born and not dead
PROCEDURE Divorce() -> validates that person is not dead and that the particular marriage record exists, then updates this marriage record. If no record exists, then raise exception.
PROCEDURE Death()

Subject: When Dead Is Not Really Dead
Posted by: JJEugene (view profile)
Posted on: Thursday, October 30, 2008 at 11:24 AM
Message: Anonymous raised an excellent point. The article talks about how to use constraints to enforce "transitions". The text talks about the importance of termination points. I'm going to argue that perhaps there should never (take that word with a grain of salt) be termination points. Dead is not always dead:

One thing I have seen missing from applications again and again and again (they never seem to learn) is the need to deal with user and process error. Someone could be marked dead by accident. There has to be a way to un-do a mistake. By that logic, there should never be a true termination state, because there always needs to be a way to un-dead someone. Or to un-do some other mistake, like go ahead and let someone go from "married" back to "born".

But if you start allowing these kinds of changes, what is left to enforce? Do you have a free-for-all? I'd say that there could still be some rules that might be absolutes. For example, we might say that one could never go from born directly to divorced no matter what. And that the reverse is also true. If the person mistakenly ends up in "divorced" and should be in "born", it might be OK to force the user to first set the person back to "married" before going back to "born". So, that it is OK to enforce a rule that says a person can not go directly from divorced back to born.

The point here is that in real life, one needs to be a lot more permissive of data changes than the simple example shown in the article. 99.9999% of the time, you must have some way that will allow users to correct mistakes. That doesn't invalidate the points in the article. It is just a caution to think about this very important design issue. It gets missed in many different ways in many applications.

Subject: Thanks
Posted by: Jim K. (not signed in)
Posted on: Thursday, October 30, 2008 at 4:05 PM
Message: Nice article Joe. I've used CASE expressions in SQL queries many times but never considered using one in a CHECK constraint. Now that you've shown it to me it is very obvious and very useful.

Thanks.

Subject: Back in the real world
Posted by: Sean (view profile)
Posted on: Thursday, October 30, 2008 at 5:28 PM
Message: Nobody here has mentioned in their evangelizing of constraints that constraints can change over time, but historical data shouldn't necessarily follow.

When this happens the constraint is no longer a constraint, it's a point-in-time business rule. And that's what most people here don't seem to get. Data integrity and business rules are not the same thing. Sure, your birth/marriage/divorce/death example is fairly immutable (not really, but for the sake of argument), but the vagaries of whether a parent needs to have their education level recorded for jurisdictional reporting in a school system will change based on the government or education framework in place.

This whole "do it in the database!" approach doesn't seem well connected with reality.
It's not programmer vanity or paranoia that DBAs are going to put them out of a job.

Subject: DDL = Data Definition Language
Posted by: Anonymous (not signed in)
Posted on: Thursday, October 30, 2008 at 8:14 PM
Message: DDL = Data Definition Language, not Declaration language.

Subject: Back in the USSR
Posted by: Anonymous (not signed in)
Posted on: Friday, October 31, 2008 at 10:12 AM
Message: @Sean: No big deal. There are two cases: a constraint is removed, a constraint is added (changing the text of an existing constraint is still one or the other).

Case one: there are two cases: existing rows must still meet the removed constraint (not very intelligent, really), existing rows must meet the new constraint. In the former case, some data must exist to demark the old rows from new ones, and the new constraint encompasses both types of rows using this data, enforcing differentially. Case two: do nothing.

Case two: there are two cases: existing rows need not meet the new constraint, existing rows must meet the new constraint. In the former case, do nothing (constraints generally are applied to inserted rows, if the new constraint applies to Update, then differential application as above). In the latter case, write a stored procedure to check that existing rows meet the new constraint, and update such rows with defined default/replacement values. Such values must be known to those creating the new constraint.

As Date says: a row is a Business Rule. Coders who view databases as merely file systems fronted by a SQL parser tend not to see this. The point is: the only way to efficiently and accurately defend data from mangling is to have *one* set of constraints of record in one place. Dr. Codd figured out that the datastore is the *only* candidate for that role. He had had his fill of IMS DL/1 programmers. javBOL coders of today are similar.

Subject: Clarifying point - your design will naturally lead to insanity
Posted by: http://techsentinel.wordpress.com/ (not signed in)
Posted on: Saturday, November 01, 2008 at 9:40 AM
Message: Unfortunately, your design will lead to significant problems because born/dead and marriage/divorce are two entirely separate concepts. Because you are only modeling two concepts with your design, you are able to get away with this mistake. Once you add another concept, the insanity will become obvious. What happens if you add an employed/unemployed state set? Now would a person that is employed be single, married, or divorced?

Your desire to have data properly constrained is noble, but another critical problem with database design is the mistaken generalization of the data. Generalized status and state fields should be avoided unless there is no other reasonable design available.

Subject: Oslo?
Posted by: BuggyFunBunny (not signed in)
Posted on: Monday, November 03, 2008 at 11:57 AM
Message: By a circuitous route, I found Oslo. A search here yielded only a bad link to a November, 2006 snip; which didn't read like this thread.

Perhaps Phil or someone might expound (assuming someone knows more about it than Claudio Perrone, whose site had the link).

Subject: Anonymous commenting disabled
Posted by: Chris Massey (view profile)
Posted on: Thursday, November 13, 2008 at 11:46 AM
Message: I'm afraid anonymous commenting has had to be disabled in this article, due to spamming. If you want to leave a comment, you'll have to sign in or sign up. Sorry for the inconvenience.

 
