Av rating:

Total votes: 28
Total comments: 15
Robyn Page's SQL Server Data Validation Workbench
23 October 2006
/*
Checking your data on entry.
Contents
--------
Rules
Defaults
Triggers
Integrity Constraints
Questions
Introduction
------------
If you are completely confident about the data that is being inserted or
updated in a table, then you won't need constraints. Even with a complete
logical interface of stored procedures you'll still require them for
development work because the impossible always seems to happen.
In the real world, however, and particularly if the real world includes
programmers using 'Dynamic SQL', or data feeds into tables and so on,
then they are a vital defence for the integrity of your data.
There are several different ways of putting constraints and checks on data,
and the purpose of this workshop is just to point out the odd quirk that
might be of practical help, whilst running through the list.
Rules, defaults, indexes, and triggers are used for:
. Requirement integrity - a column must have data in every row, no nulls
. Check or Validity - Is the data valid? a valid Email address for example.
Triggers or rules are generally used.
. Uniqueness - no two table rows can have the same non-null values for the
selected columns. Unique Indexes are used for this.
. Referential integrity - data inserted into a table column must already
have matching data in another table column or another column in the same
table.
I have a certain difficulty with this workbench, as the SQL Server team
have given advanced notice of its intention to replace Rules and Defaults,
probably with ANSI Domains. However, until that time comes, they are the
most effective way to implement certain business rules in the database.
One could argue that they shouldn't be used because they are deprecated,
but the fact is that there is no realistic substitute in SQL 2005 for the
functionality, and safety mechanisms, they provide.
Would you refuse to wear a seatbelt, on hearing that a better design
was due in four years time?
Rules
-----
Rules are the best way of ensuring that business rules, especially those
that are liable to change, are implemented in one place only.
As mentioned above, Rules are considered deprecated in SQL Server 2005.
However there is nothing yet that replaces the functionality they provide
until Domains are properly implemented. One should be aware of possible
problems when upgrading. BOL gives the mysterious advice to use Check
constraints instead, as if the two devices had the same functionality.
Rules are used to check data that is being inserted or updated in
columns. If you need to perform a fairly complex check, and it has to
be done in more than one table, then a rule is very handy. One rule can
be referenced from any number of tables or user-defined data types in
the database. You can bind a rule to to a user-defined data type, so it
will then apply to any column where the datatype is used. This is even
more convenient and safe.
The process is...
1/ Create the rule using CREATE RULE.
2/ Bind the rule to a column or user-defined datatype using sp_bindrule.
3/ Test the bound rule by inserting or updating data.
You can create a rule easily in EM or SMSS but don't even think about
altering the rule if you have bound it to an object. The interface was
not implemented with any enthusiasm.
Imagine that you have a database in which you wish to handle postcodes.
Normally, you'd want to put the whole messy business of checking that
the format is valid into a stored procedure, as well as requiring that
the front-end applications do all the appropriate filtering. However,
this isn't always possible. Whatever happens, complex code like this
MUST be kept in one place and heavily documented.
Postcode validation criteria are as follows (slightly simplified)
The entire length must be between 6 and 8 characters
A space must be included
The local (inward) code, to the right of the space, always 3 chars
The first character of the local code must be numeric
The second and third characters of the local code must be alpha
The Sorting Office (outward) code the left of the gap, can be between
2 and 4 characters
The first character of the Sorting Office (outward) code must be alpha
*/
-- we put these criteria, which will define our RULE, in a test harness
-- to develop it and test it as much as possible, as they are not the
-- easiest of objects to change
--start of the test harness
DECLARE @Postcode VARCHAR(10)
SELECT @Postcode='CO10 7SP'
SELECT CASE WHEN
--start of the validation rules
LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars
AND
LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only!
--local (inward) code, to the right of the space, 3 characters
AND--The second and third characters of the local code must be alpha
SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]'
AND--first character of the Sorting Office (outward) code must be alpha
SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%'
AND--Sorting Office (outward) code the left of the gap, between 2 and 4
CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5
--one number followed by two letters.
--end of the validation rules
THEN 'good' ELSE 'bad' END
--end of the test harness
/* Now we create the RULE
You can use any name for the argument (we chose @Postcode), but the
first character must be "@." Make sure it is something meaningful
You can use built-in functions but anything that references other
database objects is forbidden.
*/
IF EXISTS (SELECT * FROM dbo.sysobjects
WHERE id = OBJECT_ID(N'[dbo].[PostcodeValidation]')
AND OBJECTPROPERTY(id, N'IsRule') = 1)
DROP RULE [dbo].[PostcodeValidation]
GO
CREATE RULE PostcodeValidation
AS
LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars
AND
LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only!
--local (inward) code, to the right of the space, 3 characters
AND--The second and third characters of the local code must be alpha
SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]'
AND--first character of the Sorting Office (outward) code must be alpha
SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%'
AND--Sorting Office (outward) code the left of the gap, between 2 and 4
CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5
--one number followed by two letters.
GO
--then we apply the documentation
EXEC sp_addextendedproperty 'MS_Description',
'length between 6 and 8 characters
Containing one space only
3 characters to the right of the space
of which the first myst be numeric and the other two alpha
Before the space can be between 2 and 4 characters
of which the first must be alpha',
'user', dbo, 'rule', PostcodeValidation
/*
with that done we can then bind the rule to a column of a table. You
can bind a rule to as many columns as you wish within the database:
it keeps everything neat without duplication of the implementation
of business rules */
CREATE TABLE TestOfRule (MyID INT IDENTITY(1,1), Postcode VARCHAR(10))
/*
You cannot bind a rule to a text, image, or timestamp datatype column.
You cannot bind rules to system tables. Still, who cares?*/
--First bind the rule to the table column
EXEC sp_bindrule PostcodeValidation, 'TestOfRule.Postcode'
--Now let's try out some inserts
INSERT INTO TestOfRule(postcode) SELECT 'CM20 3EQ'--OK
INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--OK
INSERT INTO TestOfRule(postcode) SELECT 'CM 3EQ'--OK
INSERT INTO TestOfRule(postcode) SELECT 'CM2 30EQ'--not so good
INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--not so good
INSERT INTO TestOfRule(postcode) SELECT 'SQUIFFY'--Not so good
/*---error---!
A column insert or update conflicts with a rule imposed by a previous
CREATE RULE statement. The statement was terminated. The conflict
occurred in database 'master', table 'TestOfRule', column 'Postcode'.
The statement has been terminated.
Now, could we use a function in a rule? This would make it so much
easier to develop and we can do a bit more processing.
Sadly no. A missed opportunity*/
CREATE FUNCTION [dbo].[ufiCleverStuff]
( @string VARCHAR(8000)) RETURNS INT AS BEGIN RETURN 1 END
ALTER TABLE dbo.TestOfRule ADD Address VARCHAR(255) NULL
CREATE RULE AddressValidation AS dbo.ufiCleverStuff(@address)=1
--This will cause an error!
/* However, we can bind the rule to a user-defined data type */
EXEC sp_addtype Postcode,'Varchar(10)',NONULL --NONULL=no nulls allowed
EXEC sp_bindrule PostcodeValidation, 'Postcode'
sp_help Postcode --just to chck that the rule is there (Rule_Name)
--And use it wherever we want
CREATE TABLE SecondTestOfRule (MyID INT IDENTITY(1,1), Postcode postcode)
--and try it out
SELECT * FROM SecondTestOfRule
INSERT INTO SecondTestOfRule(postcode) SELECT 'ME3 5EQ'
--OK
INSERT INTO SecondTestOfRule(postcode) SELECT 'AR3 567'
--didn't like the 567 bit
INSERT INTO SecondTestOfRule(postcode) SELECT 'ME2 30RP'
--not so good
INSERT INTO SecondTestOfRule(postcode) SELECT 'ME 3RP'
--OK
INSERT INTO SecondTestOfRule(postcode) SELECT 'DE52 3EQ'
--not so good
INSERT INTO SecondTestOfRule(postcode) SELECT 'SQUIFFY'
--Not so good
UPDATE SecondTestOfRule SET postcode='SE34 2DS' WHERE myid=5
UPDATE SecondTestOfRule SET postcode='squiffy' WHERE myid=5
--good, it chucked it out
/*
Rules bound to columns take precedence over rules bound to user
datatypes.
*/
--now what happens when we create a variable of the datatype
DECLARE @MyPostcode Postcode
SELECT @Mypostcode='&^bananas****'
/*
Not what you'd expect; it accepts a silly postcode.
There is, sadly, no checking of a rule on variable with a
user-Defined datatype.
Defaults
--------
Defaults and rules seem to have attracted the displeasure of the SQL
Standards committee.
Defaults are deprecated by SQL Server 2005 but are actually rather
useful because they can be 'bound' to a user-defined Data Type as
well as any column in the entire database.
There is no other way of doing this.
Take a common example. I believe that every table should have a
column that gives the insertion date. You can, of course laboriously
set up a default constraint on every table but why not do the
following...
*/
--just create a simple default of the current date and time
CREATE DEFAULT DateOfInsertion AS GETDATE()
--now document it
EXEC sp_addextendedproperty 'MS_Description',
'this is a default supplying the current date',
'user', dbo, 'default', DateOfInsertion
--add a new type called InsertionDate
EXEC sp_addtype InsertionDate,'DateTime',NONULL
--add the default to the type
sp_bindefault DateOfInsertion, "InsertionDate"
sp_help InsertionDate --check that the default is there (Default_name)
--now the whole process is a lot simpler
CREATE TABLE EventLog (MyID INT IDENTITY(1,1),
TheDescription VARCHAR(100),
insertionDate insertionDate)
INSERT INTO Eventlog(theDescription)
VALUES ('the martians are attacking')
INSERT INTO Eventlog(theDescription)
VALUES ('It is the treens')
INSERT INTO Eventlog(theDescription)
VALUES ('Call in Dan Dare')
INSERT INTO Eventlog(theDescription)
VALUES ('Dan Dare to the rescue')
SELECT * FROM EventLog--note that the dates have been filled in
/* but there is a feeling that there is something missing.. Why not
have a nice identity field User-Defined Data Type too? */
EXEC sp_addtype counter,INT,'IDENTITY' -- Sybase only. Sorry
--you'll get an error
/*
Triggers
--------
Triggers can be quite complex, as Pop Rivett explains elsewhere on
this site with his excellent 'Auditing' trigger. We'll only discuss
triggers as data constraints.
Imagine we want to check addresses that are being placed in an address
table. We want to ensure that we've done reasonable sanity checks to the
address. For a start, we believe that an address should contain at least
four words and alphanumeric characters with just a sprinkling of
punctuation
Here is one of Phil Factor's routines (Thanks Phil) */
CREATE FUNCTION [dbo].[ufiWordcountVarchar] --counts the number of
--words in a VARCHAR. We need something like this to check a valid
--address.
(
@string VARCHAR(8000),
@WordStartCharacters VARCHAR(255)='a-z',
@WordCharacters VARCHAR(255)='-a-z'''
)
RETURNS INT
AS
BEGIN
DECLARE @Position INT, --index of current search
@WhereWeAre INT,--index into string so far
@wordcount INT,--the number of words found so far
@next INT, --where the next search starts
@size INT --the total size of the text
SELECT @WhereWeAre=1,@size=LEN(@string),@Position=1,@wordcount=0
WHILE @Position>0
BEGIN
SELECT @Position=PATINDEX('%['+@wordStartCharacters+']%',
SUBSTRING(@string,@whereWeAre,8000))
IF @Position>0
BEGIN
SELECT @next=@WhereWeAre+@Position,@wordcount=@Wordcount+1
SELECT @Position=PATINDEX('%[^'+@wordCharacters+']%',
SUBSTRING(@string,@next,8000)+' ')
SELECT @WhereWeAre=@next+@Position
END
END
RETURN @wordcount
END
GO
/* so we create a test table and add a trigger to it, putting the two
checks in the trigger */
CREATE TABLE MyAddresses (MyID INT IDENTITY (1,1), Address VARCHAR(100))
GO
CREATE TRIGGER tCheckAddress_MyAddress
ON MyAddresses
FOR INSERT, UPDATE
AS
BEGIN
IF (SELECT MIN(dbo.ufiWordcountVarchar(Address,DEFAULT,DEFAULT))
FROM inserted)<4
OR
EXISTS (SELECT 1 FROM inserted WHERE address LIKE '%[^-a-z0-9''",.()]%')
BEGIN
ROLLBACK TRANSACTION
RAISERROR ('Suspect address inserted into MyAddresses',16,1)
END
END
--not forgetting the documentation
EXEC sp_addextendedproperty 'MS_Description',
'This checks to see if there are at least four words in the address
and checks for characters you wouldn''t expect to see in an address ',
'user', dbo, 'table', MyAddresses,'trigger', tCheckAddress_MyAddress
INSERT INTO MyAddresses (address)
VALUES ('32, Acacia Avenue, Goldhay, Berkshire')
INSERT INTO MyAddresses (address)
VALUES ('12 the Firs, Clive West Way, Slough, Berks')
INSERT INTO MyAddresses (address)
VALUES ('46, the promenade, felixstowe essex fizzbang<>*')
INSERT INTO MyAddresses (address)
VALUES ('Buckingham palace')
UPDATE Myaddresses SET address='Aargh!' WHERE MyID=2
SELECT * FROM MyAddresses
/*
Integrity Constraints
---------------------
Integrity Constraints, like rules, limit or 'constrain' you in what you
can put in a table or column. Unlike triggers, integrity constraints
cannot cascade changes through related tables in the database, enforce
complex restrictions by referencing other database objects, perform
"what if" analyses or roll back the current transaction as a
result of enforcing data integrity. (With triggers, you can either roll
back or continue the transaction, depending on how you handle
referential integrity.) Nevertheless they are the most commonly used
constraint and simple to define and alter.
There are three types of Table-level constraint:
1/ Unique and primary key constraints ( no two rows in a table have the
same values in the specified columns - with a primary key constraint it
can't be null either).
2/ Referential integrity constraints (enforces the rule that, for a
specific column, there must already be matching data in the column it
references).
3/ Check constraints limit the values of data inserted into columns.
We're most concerned with Check constraints in this article.
Like a rule, the check has to be an expression that would fit in a
WHERE or IF Clause, but can't involve a subquery. You can apply several
constraints on the one column.
Imagine we want to check that an Email address is valid.
*/
CREATE TABLE TestConstraint
( MyID INT IDENTITY(1,1),
MyEmailAddress VARCHAR(50) CONSTRAINT my_check_constraint
CHECK (MyEmailAddress NOT LIKE '%[^a-z-_.@]%'
AND LEN(REPLACE(MyEmailAddress,'@',''))=LEN(MyEmailAddress)-1)
)
EXEC sp_addextendedproperty 'MS_Description',
'Check to make sure that only valid characters are in the email address
and there is only one @ sign',
'user', dbo, 'table', testconstraint,'constraint', my_check_constraint
INSERT INTO TestConstraint (MyEmailAddress)
SELECT 'Robyn.Page@Simple-talk.com'
INSERT INTO TestConstraint (MyEmailAddress)
SELECT 'Silly@>@Simple-talk.com'
INSERT INTO TestConstraint (MyEmailAddress)
SELECT 'Bill.Gates@Microsoft.com'
UPDATE TestConstraint
SET MyEmailAddress='dodgy@@simple-talk.com' WHERE MyID=1
/* the second one and fourth will fail. Ideally, one should be able to
bind a user-defined error message to the constraint. Again, sorry -
Sybase only!
if we want the constraint to check other columns, then we need a table
-level constraint rather then a column constraint*/
CREATE TABLE SecondTestConstraint
( MyID INT IDENTITY(1,1),
MyName VARCHAR(50),
MyDomain VARCHAR(50),
CONSTRAINT my_Second_check_constraint
CHECK (MyName+MyDomain NOT LIKE '%[^a-z-_.@]%'
AND LEN(REPLACE(MyName+MyDomain,'@',''))
=LEN(MyName+MyDomain)-1)
)
INSERT INTO SecondTestConstraint (MyName,MyDomain)
SELECT 'Robyn.Page','@Simple-talk.com'
INSERT INTO SecondTestConstraint (MyName,MyDomain)
SELECT 'Mad <andBad>','@Simple-talk.com'
/*
so we see that the table-level check has done a check based on both
columns.
A unique index permits no two rows to have the same index value,
including NULL. A unique index checks that all data in the column
is unique when it is created and also when an insert or update is
attempted.
You cannot create a unique index on a column that contains null values
in more than one row; An error is, obviously, triggered
You can use the unique keyword on composite indexes.*/
CREATE UNIQUE INDEX idxUniqueEmail ON TestConstraint(MyEmailAddress)
INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com'
INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Kubla@Kahn.com'
INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Robyn@Kahn.com'
INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Anyone@Kahn.com'
INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Admin@Kahn.com'
INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com'
--yes, they all go in except for the last one which triggers an error
/*
The use of referential constraints is beyond the scope of this article,
as it is difficult to talk about them without going on to write the
entire chapter of a book, touching on the complexities of referential
data models, cascading deletes and so on.
This workshop started off when I was staring at the tree view of
Enterprise Manager, and suddenly realising I'd never used Rules
and wondering what they were for! I hope you've learned something new
too.
Just a few questions just to check .....
1/ when would you choose a Rule as the best way of checking data going
into your database
2/ What are the advantages of defaults over default constraints?
3/ why would you choose a table-level constraint over a column level
constraint?
4/ How do you ensure that the value entering a table is unique to the
column?
5/ When would you choose a trigger to check on data over either a rule
or default constraint?
6/ What is the point of putting a default on a user-defined datatype?
7/ What happens if you create a Unique index on a column that has two
null values in it? Why?
8/ how would you check in SQL whether a default or rule is bound to a
user-Defined datatype?
9/ What is the easiest way of ensuring that a column containing foreign
keys actually reference valid primary keys in another table.
10/What are Rules and Defaults likely to be replaced by in future?*/
This article has been viewed 13468 times.