Tony Davis

Simple-Talk Editor
News, views and good brews

Working within Constraints

Published Thursday, July 09, 2009 12:03 PM

There are many good reasons for building your data integrity logic into constraints. When such business rules become part of your DDL, they can never be circumvented. Any change to a rule can be enforced by a single update to a constraint in the database. It is a simple and safe architecture for managing your data integrity. But is it always that simple in practice?

It is much better to put this logic in the database than to have similar logic spread scattergun throughout the many tiers and applications that access the database. With the latter approach, one then faces the task of ensuring that the rules are consistent in all places, and that no process can bypass the checks.

However, even with all your integrity logic safely corralled into the database DDL, things can still get complicated, and difficult to track, pretty quickly. Most DBAs will employ the full range of simple CHECK, DEFAULT, UNIQUE and KEY constraints that they have available. They may also have a few more complex rules to enforce that require UDF-based constraints. Of course, there are some checks that you can't realistically do in constraints, so you'll have some triggers thrown into the mix as well. In addition, you may have some "cascading" Referential Integrity requirements (such as "cyclic cascades") that are not supported in SQL Server, so you'll implement these rules using stored procedures.

At this point your logic is beginning to look a little less neat. Although it is all within the database, it can become very easy to forget or miss something when the business rules change and need to be updated. At least, it will take some time to locate all the places where changes need to be made.

If it is not possible to satisfy all integrity rules using constraints alone, then should one, instead, implement them all using triggers, so that all the rules really are in one place and can be maintained easily? This wouldn't be a free lunch. The cost-based optimizer uses the information it finds in constraints to optimize query execution. For example, if a query being optimised uses the LOWER function, and you already have a constraint in place on the column that restricts all entries to lowercase, then the optimizer can, if conditions are right, ignore the function altogether and arrive at a faster execution plan using just the constraint.

How much performance are you actually missing out on? After all, the optimizer won't always use constraints, if they are not trusted because they were disabled at one point, or if they use functions. Does it outweigh the relative advantage of having a single, consistent "layer" of triggers, for example, to implement your rules?

What would you do? What is your overall strategy towards implementing, and easing the management of, business rules in the database?

Cheers,

Tony.

Comments

 

BuggyFunBunny said:

As Chris Date has said:  a row in a table is a business rule.  

As you say, the decision comes down to:  store the constraints (however defined) in the database with the data, or scattered out in some "business layers" (sounds like chicken to me).  Dr. Codd never had such in mind.  In fact, he objected explicitly.

So, what to do, what to do.  Prior to the recent, and accelerating, availability of inexpensive (relative to DRAM types before) flash-based SSD storage sub-systems, high normal form schemas were dismissed as "too slow in the joins".  Joins are no longer slow.  Refactoring existing applications (which would be implemented with views defining the applications' old image of the unnormalized tables) may not be warranted, but then, it may be; that will require cost/benefit analysis.  

Some existing examples:  Andromeda, which utilizes an out-of-catalog meta-data facility; sprox, which is a Python based generator from the catalog; and a text, "Applied Mathematics for Database Professionals", which builds an explicit modeling methodology.  

The last issue is who should be in control of constraint modification.  I once worked in insurance software, and there was a movement to "push" business rule definition to the Business Analysts (an amorphous term, but basically means the subject matter experts, so they think).  In that milieu, schema migration is needed for those cases where the constraints are not table (normalized) data; check, default, triggers, and the like.  Rails has had that for a while, but it should not be a really big deal to implement.  

In the end, there is less long-term pain where constraints reside with the data.  There is also far less code to maintain; a fact not lost on coders, who continue to dig in their heels.  As they say in old Westerns, "take 'em out back of the barn and shoot 'em".
July 10, 2009 10:38 AM
 

Dnomyar said:

After being inspired by the article on Niklaus Wirth, may I say the whole business of constraints, triggers UDF's and stored procedures is an abomination in the eyes of the God of Ultimate Software.

Like C++, MS Sql (though better than many of its peer products) has been designed by the programmer, not by the educator. The whole concept of having 4 or five separate 'objects' that are supposed to work together in 'concert' through  the unreliable ministrations of stressed out programmers is totally crazy.

Where is the tightly coupled, intelligent, flow control interface  that not only provides full documentation but also easy maintenance on all LOGIC within a database ?

The  object dependancy tree is a pale ghost of the ultimate design and documentation tool that will rise from the smoking ruins of the great meltdown of this worlds dreadfull database systems. Yeah verily - database diagrams are but a faint whiff of the wonderfull aroma of a graphic based, truly intuitive real time design and schematic flow interface.

Why are the oppressed millions groaning under the agonising weight of the Inner/Outer/Total Join demon, the crazy Trigger Rules, 4 types of UDF's. For goodness sakes - if the databases are smart enought to work to work out an execution plan, why the &^%%$% do we have to decide what data type is required. Let it look it up from some sample data!

I say - forsooth - call in  the educators and get some intelligent design happening here! Tell Bill gates to drop Malaria research, and put some money into software that I would want my Grandchildren to work with.

Yeah - and while we are at it, lets get Obama to legislate for all employers to provide programmers with dual 22" screens!
July 12, 2009 11:18 PM
 

Greg Gaughan said:

You wouldn't just be missing out on performance. Here's why you should use declarative constraints instead of triggers:

Constraints form part of the metadata for your database model. This can be used not only by the server but by modelling and development tools.

Triggers are procedural code. Constraints are declarative. When given the chance, choose declarative over procedural: declarative is more expressive with no need for iteration, branching or cursoring and so is more maintainable with less chance of bugs.

Declarative constraints are portable. Trigger implementations vary greatly from product to product in syntax and semantics. Triggers have been standardised only recently whereas declarative constraints have been part of the SQL standard for much longer and have well-defined behaviour.

Triggers are side-effects and because they are hidden they are difficult to debug and tune. Once they're developed they're easy to forget about and, because they can do almost anything, they can cause some very strange behaviour - especially involving multi-user locking and unintended cascading.

When a constraint fails, it can return a clear indication of what went wrong, e.g. "Parent X cannot be deleted because its child still exists in table Y", and such errors are surfaced using the usual error handling routes. Error handling in triggers tends to be vague, patchy and often overlooked.

Trigger implementations themselves are more likely to have buggy or undefined behaviour than the constraint-checking machinery.

Declarative constraints can be deferred to allow some chicken-and-egg updates which wouldn't be possible with triggers.

The server has the opportunity to use constraints to apply semantic optimisation (if not now then in future releases). These can give big wins. For example, if the optimiser knows two tables are related by a foreign key constraint then in some cases it can omit some joins altogether.

Internally, the server can short-circuit the evaluation of some constraints and ensure that the best plans are used. The server can choose when to evaluate a constraint. Sometimes, checking early at the row-level can avoid rolling back lots of updates; other times checking once at the end of a statement can be much faster. Row-level checking can access already existing contexts and so avoid building the select/from/permission apparatus that would be needed by a trigger. The server knows when such shortcuts are possible depending on whether the update involves primary keys (e.g. to avoid halloween problems: http://en.wikipedia.org/wiki/Halloween_Problem) or is a self-referencing foreign-key etc.

Triggers are hard to get right, and many people get them wrong or implement them poorly using row-level cursoring. A straightforward foreign key constraint is simple to declare but would require two triggers. One would need to check the parent(s) exists on insertion of a child or children. The other would need to raise an error, delete or update the child(ren) on deletion of one or more parent. The two triggers must deal with complex timing, locking and isolation issues. They have to deal with aliasing ambiguous tables and columns and correctly handle all the subtleties of matching nulls. And if the parent and child table are one and the same then I think the database constraint handling stands a chance of getting it right, but the triggers don't.

So only use triggers if you really must.
July 14, 2009 12:40 PM
 

Ian Ringrose said:

Users do not like getting error messages that are hard to understand after they press OK on a form.  Therefore all constraints need to be checked in the UI.

However as a computer programmer, (not a dba) I do not trust any programmer to correctly check 100% of the constraints in 100% of the UI.  UI code tends to change often and there can be many paths in the UI for setting the same column.

I have also see customers try to bypass the UI to make balk changes to data etc, or import data from other systems.

So constraints must therefore go in the database were possible.  But how can I then avoid having to write then some constraint in many places?

Declarative database constraints can sometimes be automatic passed to the UI, e.g. the UI can be told the width of a text column, so it can enforce the same width.   If only cases this simple were easy with today’s UI toolkits / data access systems.

I am coming more to the view that a domain specific language should be used to generate the database constraints AND drive as match of the UI validation as possible.
July 15, 2009 11:32 AM
 

BuggyFunBunny said:

@Ian:

Sending bare sql error codes is not what those of us who champion UI generation from the catalog have in mind; at least I don't.  Andromeda and sprox generate UI edits from the catalog (either directly or indirectly).  There have been others, and from what I've read, Hibernate is going down that track.  UML is traditionally the DSL for defining constraints.  Some think that is either overkill or wrongkill, but there you are.  ERwin, Rose, and the like more comfortable to some.
July 16, 2009 8:24 AM
You need to sign in to comment on this blog


















<July 2009>
SuMoTuWeThFrSa
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
Finding Stuff in SQL Server Database DDL
 You'd have thought that nothing would be easier than using SQL Server Management Studio (SSMS) for... Read more...

Mission Critical: SQL Server 2008 Performance Tuning Task List
 In which Buck Woody imagines how the US military would have tackled DBA checklists for... Read more...

Simple Query tuning with STATISTICS IO and Execution plans
 A great deal can be gleaned from the use of the STATISTICS IO and the execution plan, when you are... Read more...

Switching rows and columns in SQL
 When they use SQL Server, one the commoner questions that Ms Access programmers ask is 'Where's the... Read more...

Writing Efficient SQL: Set-Based Speed Phreakery
 Phil Factor's SQL Speed Phreak challenge is an event where coders battle to produce the fastest code to... Read more...