Bad data always seems to appear when, and where, one least expects it. Sam explains the great value of a defensive approach based on constraints to any team that is developing an application in which the data has to be exactly right, and where bad data could cause consequential severe financial damage. It is perhaps better seen as creating a test-driven database
Automated Unit Testing for database development has recently been promoted via the Agile Programming Methodologies. Although Test-Driven database development is an excellent discipline to adopt, it should be considered alongside the well-established practice of using Database Constraints. By using constraints, we already have the means to create test-driven databases. Database Constraints are, essentially, just tests. They are tests of Data Integrity. They also conform to Agile testing practices in the sense that they are usually designed first, before you write a single line of code, in a similar way to “Test-First Development”.
Database Constraints offer enormous value to any development project. Opting not to use them is equivalent to deciding not to test the output of your code.
Why are Constraints Important?
By definition, a “Database Constraint” is something which restricts the possible values which the database will accept. Constraints are declarative, which means that you declare them as rules that MUST be adhered to by the entire system you are developing. They are also bound to the database structure and fire automatically when needed, which makes them difficult to violate. Any violation of these constraints results in a “hard failure”, which returns errors to the application and forces the application-developers to deal with the issue immediately.
Database Constraints are a valuable by-product of the practice of identifying and clarifying beforehand those data rules that you need to follow before you develop the rest of your application. These rules depend on the nature of the business being supported by the application. They could be as trivial as checking the validity of an email address or zip code or as complex as checking for compliance with IRS Tax Rules. This process encourages the team to come to a consensus about what the business rules mean, which is surely one of the most demanding parts of any IT project.
If you can enshrine these rules in constraints, the database will alert you if you happen to violate any of these business rules in the heat of the development effort, and make you fix the issue before it gets out to the customer.
Large software teams in complex projects are faced with the daunting task of ensuring that everyone who is involved in the project has a clear comprehensive idea of all the rules that the system must adhere to throughout the development cycle. With constraints in place, this task becomes largely unnecessary because database constraints provide a safety net which gives the development team the confidence that they could not unwittingly violate any business rules.
So What Are Constraints?
There are several different types of database constraints:
- Primary Key & Unique Constraints – enforce uniqueness of rows in a table
- Foreign Key Constraints – enforce Referential Integrity (“lookup” values)
- DataTypes – enforce proper data type values (date, number, string, etc.)
- Nullability Constraints – determine whether a value is mandatory or not.
- Check Constraints – more flexible constraints that can be bound to a column or table and can be used to enforce a wider variety of constraints that cannot be enforced in any of the above constraints. However, the SQL that can be used in these is limited to smaller expressions.
- Triggers – the most flexible form of constraint. These can only be bound to a table but are free to use the full power of SQL, almost like a Stored Procedure.
What about Stored Procedures? Although they certainly aren’t constraints, these are often used to perform data validation tasks that are impossible to implement by a constraint. They are not bound to the database design, which makes them easy to use “after the fact” when you realize that the database design is faulty and you have to retrofit data validations into it. For the purposes of this article I will not consider these as constraints because they do not fire automatically.
So let's take a simple example. You are building a database for a retail seller and you have an Orders table which stores the orders placed by your customers. In that table you have an Order Date column. When thinking about the database design, you decide that you will not allow orders that have an order-date in the future, so you write a constraint that doesn't allow future dates to be put into this column. In SQL Server, this rule will take the form of a “check constraint” and will look something like this:
With this one simple line of code in your database, you can now stop worrying about this rule. It will not be possible to enter future-dated orders. Any code trying to do so will 'error out' and you will be forced to fix it immediately. You can actually declare to your customers (or anyone else) that your application follows this rule and be completely confident in your statement!
Are Automated unit Tests sufficient?
Why not just use automated unit tests? Well, you should be using automated unit tests, but they will not replace Database Constraints. Database Constraints have a special contribution to the effort to reduce defects in software.
- Constraints fire automatically, all the time whenever the application is used, so you can't get around them. Automated tests fire on demand only, so if you don't run the test suites they can't find any bugs.
- Constraints are shipped out to customers and run on Production environments, as part of the product, whereas Automated Tests are usually only run on Development and Test environments.
- Constraints focus solely on the data, whereas Automated Tests will more likely focus on an entire unit of code (hence the name Automated “Unit” Tests). Therefore, unit tests have a much wider area to cover and will not be able to focus on data as much as constraints will. This is why it’s a good idea to use both constraints and automated unit tests. They have different areas of focus.
- Constraints have “positive” logic; they define what “good data” is and everything else that falls outside of this is automatically considered to be “bad” data. Because of this it is not necessary to find all the different scenarios which might cause bad data. All you need to define is the one (or few) scenarios that constitute “good” data. Automated tests have opposite type of logic; they try to break the application by defining all the scenarios that may be considered “bad”. In my experience there are many, many more such scenarios, and many tests are initially left out of these test suites initially because it is very difficult to come up with all the “bad” scenarios that your customers are going to think of.
- Constraints don't need any additional hardware or separate suites of code to function. Constraints typically have only a few lines of code as opposed to Automated Unit Test Suites which can be huge, and require you to install new technologies and be trained in their programming languages. Finally, you have to dedicate environments to run these test suites and wait until they are finished, which can be anywhere from minutes to hours. Database Constraints don't have any of these problems; they run within your existing code base so no additional hardware or technologies are necessary.
You can get around Database Constraints by dropping them, but in the production environment your customers would not drop DLLs or web pages, and likewise shouldn't drop constraints. If any type of test fails then either the test was wrong or there is a bug in the application. To drop a failing test is a way of “shooting the messenger” and ignoring the true cause of the error. A well-worded SLA will keep customers from dropping constraints if you declare up front that dropping database constraints will invalidate the support agreement.
Constraints don’t necessarily slow the database down. While it is true that constraints have to fire as part of the application, they are designed to do this and are very efficient at it. Constraints can actually speed up the database because the SQL Query Optimizer has access to these constraints and can use them when it decides how to optimize queries. Let’s take the example from above, where we were not going to allow any orders with future dates. If someone then runs a query looking for future dated orders, the database knows that it is not possible to have that data in the database and immediately returns an empty result set, without even trying to run the query.
The Impact of Database Constraints on the Development Process
‘Beating all the points to death’
As with any rigorous test regime, implementing Database Constraints has a profound effect on a software project’s lifecycle. Testing an application is impossible without a sound definition of how an application should function. To implement constraints, the software project has to start with a painstaking Database Design process, during which the customer sees no obvious progress. No screens or web pages are produced during this phase, and the more inexperienced team members can become impatient. Considerable pressure can be put on development teams during this phase to start ‘cutting code’.
All those design debates are necessary because the team will spot issues that customers will bring up later on in development if you don't fix them now. This is the perfect time to argue, since the design only exists as a drawing; if the design is inadequate, all you have to do is erase it from the board! No code needs to be refactored, and no customers are impacted. Thus it makes sense to fully argue and “beat all of the points to death” during this phase. In addition to this, it is extremely important to choose the right people for the design phase, supported by a project manager that understands software design. It is the quality of the people involved that will ensure you don’t end up running around in circles, which is a common pitfall of such design phases.
The constraints should be preserved
Application developers get frustrated with database constraints very quickly because the constraints constantly get in their way. This is understandable; developers want to get their job done as fast as possible, and since many of them are typically not involved in the design of the constraints they don’t understand why the constraints have to be there in the first place. Management needs to facilitate at this point, just as the construction foreman serves as the final arbiter between different parts of a construction project. A decision to drop the constraints of a database is difficult to reverse later on, and the longer you wait the harder it is to justify going back and redoing everything.
Once the application is in production, and data errors do happen, the team may need to make sure that the constraint wasn’t too restrictive, needing to be adjusted to make it more flexible. However, if the constraint is correct and the application has a bug that wasn't caught in testing then that bug needs to be fixed. Whatever is decided, no data needs fixing because the bad data never got into the database in the first place!
However, if a constraint is missing and bad data does get in, it is considered a failure on the part of the requirements team and then the application bug and the data have to be fixed. However, even in such a case this data fix is much more unlikely to be harmful because there are so many other constraints in the database which will prevent the data fix from damaging any surrounding data.
What if you don’t use constraints?
So what happens if you decide not to use Database Constraints? At first, Management is happy with your progress as they see screens or web pages appear quickly in your demos and they even see them working. All is well, and when the product is considered “done” it is tested and shipped out to the customer. Then the troubles start.
Customers will find unexpected and creative ways to use your software and start putting bad data in. Customers might also import data into your application by going straight to the database, thereby bypassing all of your front end and middle tier data validations, and even the ones in stored procedures, because they don't fire automatically. This is why stored procedures aren’t able to implement real constraints. So the company starts to hire support professionals to fix these issues as they come up but, over time, the data issues increase; you add more customers, the product gets more complex, new developers are hired that don't have all the business rules inside their head, and usually there is not enough documentation to give them. So developers violate the business rules, as do support people when they have to fix production data in a high-pressure environment. The database, of course, says nothing about all these violations, since it doesn’t have the constraints. And what do companies do at this point? In my experience they accept this situation as the normal cost of doing business! Everyone complains about how bad the database is, but usually no one does anything about it because at this point it is too late.
In addition to the above, bugs have to be dealt with many times instead of just once, and the “fixes” to the data will likely end up damaging the surrounding data, causing more bugs. I have seen cases where entire departments of support people are hired to deal with these data issues. They end up developing entire libraries of data fix scripts to be run in production, and they run those fixes on a daily or hourly manner. Usually they have someone that manages these script libraries as well, resulting in a type of “parallel development effort”, just to fix the data bugs that the Development department allowed in the product because they didn’t use Database Constraints!
Good software design of database applications is very difficult to achieve without using Database Constraints. Development projects are very complex and one must have separation of concerns in order to make the project manageable. One of the best ways to have this is to research the business rules first and put them in the database in a way that they cannot be subverted. Later on in the project there are many things to worry about; the development of the UI and middle tiers, any frameworks that have to be developed, and, yes, there are even more data validation rules that take place only at the middle tier and UI. You don’t need to add the issue of database validation rules to this long list of problems that the development team has to deal with during the development phase.
I have observed first-hand the cost of deciding not to use database constraints in database design; it results in frustration and confusion all the way from management to the customers. Costs explode as the product scales and the customer base grows. While automated Unit Tests are another weapon in the fight for application quality, they are truly effective only when combined with proper Database Design with constraints. Although the process of getting the design right and resilient against incorrect data during the early phases of design and development is painful and its benefits are difficult to measure, everyone will be happier with the result in the long run.