SQL Views are essential for the database developer. However, it is common to see them misued, or neglected. Joe Celko tackles an introduction to the subject, but there is something about the topic that makes it likely that even the experienced developer will find out something new from reading it.
A VIEW is a virtual table, defined by a query, that does not exist until it is invoked by name in an SQL statement. This may sound simple enough, but some developers have difficulties with the concept. Because of this, they tend to leave out VIEWs because they do not appreciate their value. It is easy to write a query and not bother to put it into a VIEW, because there is no performance boost. 'If I am going to save code,' they reason, 'I want it in a stored procedure that can take a parameter list instead.'
In fact, a VIEW definition can be copied as in-line text, just like a CTE. But with a good optimizer, the SQL engine can decide that enough sessions are using the same VIEW and materialize it as a shared table. The CTE, in contrast, is strictly local to the statement in which it is declared.
VIEWs are often named incorrectly. A VIEW is a table, so it is named just like any other table. The name tells us what set of things it represents in the data model. The most common offender is the “Volkswagen” coder who prefixes or suffixes the VIEW name with “vw_” or “VIEW_” in violation of ISO-11179 rules. We do not mix data and meta data in a name. This is as silly as prefixing every noun in a novel with “n_” so the reader will know that the word is a noun in English grammar.
The ANSI Standard SQL syntax for the VIEW definition is
CREATE VIEW <table name> [(<VIEW column list>)]
AS <query expression>
[WITH [<levels clause>] CHECK OPTION]
<levels clause> ::= CASCADED | LOCAL
the WITH CHECK OPTION is a little known and less used feature that has been around since the SQL-86 Standards. The <levels clause> option in the WITH CHECK OPTION did not exist in Standards before SQL-92. It is not implemented in T-SQL dialect, so I will skip it. If you see it, just remember T-SQL defaults to CASCADED behavior.
A VIEW has no physical existence in the database until it is invoked. You cannot put constraints on a VIEW for that reason. The name of the VIEW must be unique within the entire database schema, like a base table name. The VIEW definition cannot reference itself, since it does not exist yet. Nor can the definition reference only other VIEWs; the nesting of VIEWs must eventually resolve to underlying base tables. This only makes sense; if no base tables were involved, what would you be VIEWing?
You can either build a column name list in the VIEW header or inherit the column names from the SELECT statement. Building this list is usually just one quick “cut & paste” and well worth it. This is why we do not ever use "SELECT *" in a VIEW definition in production code. When the columns of a base tables change, the definition of the "star" will also change. If you are lucky, you will get an error when the VIEW has too many or too few columns when it is invoked. If you are not so lucky, the VIEW will run and give you unexpected results. If you are unlucky, the VIEW will run and give you wrong answers that you use.
Every few months, someone will post to a SQL forum asking how to use a parameter in a VIEW. They would never ask how to use a parameter in a base table. The sight of a SELECT statement instead of a list of column declarations throws their mindset in the wrong direction.
Mullins Heuristics for VIEWS
Programmers have rules and standards for creating base tables. The data element names should follow the ISO-11179 rules. We have to have a key. We can have all kinds of constraints. We can have Declarative Referential Integrity actions among other base tables. But how do you design a VIEW?
Craig Mullins, a DB2 expert and author, gave the following rule to ensure that VIEWs are created in a responsible and useful manner. Simply stated, the VIEW creation strategy should be goal-oriented. VIEWs should be created only when they achieve a specific, reasonable goal. Each VIEW should have a specific application or business requirement that it fulfills before it is created. That requirement should be documented somewhere, preferably in a data dictionary.
Although this rule seems obvious, VIEWs are implemented at some shops without much thought as to how they will be used. This can cause the number of VIEWs that must be supported and maintained to increase until so many VIEWs exist that it is impossible to categorize their uses. Nobody wants to take a chance and drop a VIEW, so they just write a new one. Whenever a base table used by a VIEW definition is changed, then all those VIEWs have to be re-compiled and checked. Since VIEWs can be built on top of VIEWs, this can be tricky.
Unlike other virtual tables, a VIEW is defined in the schema information tables and its definition (not its content!) is persisted. This implies some privileges are needed to use, create, alter and drop VIEWs. The first question is do you need to have privileges on the base tables that build a VIEW? Yes, but not full privileges. The minimal privileges would be to use the base tables, so you can build the VIEW. But that does not mean that the user needs to be able to directly query or modify the base tables.
The ideal design should give each user a set of VIEWs that make it look as if the schema was designed for just his or her use, without regard to the rest of the enterprise.
This is most often done for security and privacy. The payroll clerk can see the salaries of other personnel and change them. But he cannot give himself a pay raise and try to get out of town before the police find out. He can see the minimum, maximum and average salary in each department, but not who is making which salary.
The Data Control Language (DCL) is the third sub-language in SQL after DDL and DML. This is where the DBA can GRANT, REVOKE or DENY all kinds of schema object privileges. We spend almost no time on it in training classes, and failure to do it right can destroy your enterprise. As a generalization, the DBA ought to start with a list of roles users can play in the enterprise and create a script for the privileges each role needs. A new user can then be assigned a role and you do not have to repeat the script over and over.
Do not grant VIEW creation privileges to everyone. The "nearly the same" VIEWs are a special problem. One user might have read the spec "Employees must be over 21 years of age to serve alcohol" to mean strictly over 21 as of today or can they pour a drink on their 21-st birthday? If VIEW creation had been left to just one data modeler, only one of these VIEWs would exist and it would have the correct business rule.
Tricky Queries and Computations
Not all programers are equal, so you can make sure that the VIEWs preserve the best work in your shop. The other advantage is that if someone finds a better query for the current state of the database, you keep the VIEW header, drop the SELECT statement in the body, replace it and then re-compile your code. The programmer needs no knowledge of how the VIEW works. This technique becomes more useful as the SQL becomes more complex.
In T-SQL, we used to write complicated code to get sequence numbers and pure dates without time. This code was often hidden in VIEWs. The numbering can now be done with ROW_NUMBER() and we have a DATE data type since SQL Server 2008. In many cases, procedures and functions that used loops and fancy string manipulations can be replaced with VIEWs.
Updatable and Read-Only VIEWs
Unlike base tables, VIEWs are either updatable or read-only, but not both. INSERT, UPDATE, and DELETE operations are allowed on updatable VIEWs and base tables, subject to other constraints. INSERT, UPDATE, and DELETE are not allowed on read-only VIEWs, but you can change their base tables, as you would expect.
An updatable VIEW is one that can have each of its rows associated with exactly one row in an underlying base table. When the VIEW is changed, the changes pass through the VIEW to that underlying base table unambiguously. Updatable VIEWs in Standard SQL are defined only for queries that meet these criteria
- They are built on only one table
- No GROUP BY clause
- No HAVING clause
- No aggregate functions
- No calculated columns
- No UNION, INTERSECT or EXCEPT
- No SELECT DISTINCT clause
- Any columns excluded from the VIEW must be NULL-able or have a DEFAULT clause in the base table, so that a whole row can be constructed for insertion.
By implication, the VIEW must also contain a key of the table. In short, we are absolutely sure that each row in the VIEW maps back to one and only one row in the base table. The major advantage of this limited definition is that it is based on syntax and not semantics. For example, these VIEWs are logically identical:
CREATE VIEW Foo1 (a, b, ..) -- updatable, has a key!
AS SELECT (a, b, ..)
WHERE x IN (1,2);
CREATE VIEW Foo2 (a, b, ..)-- not updateable!
AS SELECT (a, b, ..)
WHERE x = 1
SELECT (a, b, ..)
WHERE x = 2;
But Foo1 is updateable and Foo2 is not. While I know of no formal proof, I suspect that determining if a complex query resolves to an updatable query for allowed sets of data values possible in the table is an NP-complete problem.
The INSTEAD OF trigger was the ANSI Standards Committee letting the Data Modeler decide on how to resolve the VIEW updating problem. These triggers are added to a VIEW and are executed on base tables that make up the VIEW. The user never sees them fire and work their magic.
As an example, consider a VIEW that builds the total compensation for each employee by joining the personnel, employee stock holdings, bonuses and salary_amt tables in one VIEW. An INSTEAD OF trigger can update the total compensation using a hidden formula and complex business rules that the user never sees.
The use of INSTEAD OF triggers gives the user the effect of a single table, but there can still be surprises. Think about three tables; A, B and C. Table C is disjoint from the other two. Tables A and B overlap. So I can always insert into C and may or may not be able to insert into A and B if I hit overlapping rows.
Going back to my Y2K consulting days, I ran into a version of such a partition by calendar periods. Their Table C was set up on Fiscal quarters and got leap year wrong because one of the fiscal quarters ended on the last day of February.
A point that is often missed, even by experienced SQL programmers, is that a VIEW can be built on other VIEWs. The only restrictions are that circular references within the query expressions of the VIEWs are illegal and that a VIEW must ultimately be built on base tables. One problem with nested VIEWs is that different updatable VIEWs can reference the same base table at the same time. If these VIEWs then appear in another VIEW, it becomes hard to determine what has happened when the highest-level VIEW is changed. As an example, consider a table with two keys:
CREATE TABLE CanadianDictionary
(english_id INTEGER UNIQUE,
french_id INTEGER UNIQUE,
CHECK (COALESCE (english_id, french_id) IS NOT NULL);
The table declaration is a bit strange. It allows an English-only or French-only word to appear in the table. But the CHECK() constraint requires that a word must fall into one or both type codes.
INSERT INTO CanadianDictionary
VALUES (1, 2, 'muffins', 'croissants'),
(2, 1, 'fish bait', 'escargots');
CREATE VIEW EnglishWords
AS SELECT english_id, eng_word
WHERE eng_word IS NOT NULL;
CREATE VIEW FrenchWords
AS SELECT french_id, french_word
WHERE french_word IS NOT NULL);
We have now tried the escargots and decided that we wish to change our opinion of them:
SET eng_word = 'appetizer'
WHERE english_id = 2;
Our French user has just tried Haggis and decided to insert a new row for his experience:
SET french_word = ‘tripoux’
WHERE french_id = 3;
The row that is created is (NULL, 3, NULL, ‘tripoux’), since there is no way for the VIEW FrenchWords to get to the VIEW EnglishWords columns. Likewise, the English VIEW user can construct a row to insert his translation, (3, NULL, 'Haggis', NULL), but neither of them can consolidate the two rows into a meaningful piece of data.
To delete a row is also to destroy data; the French-speaker who drops 'croissants' from the table also drops 'muffins' from VIEW EnglishWords.
WITH CHECK OPTION Clause
If WITH CHECK OPTION is specified, the VIEWed table has to be updatable. This is actually a fast way to check how your particular SQL implementation handles updatable VIEWs. Try to create a version of the VIEW in question using the WITH CHECK OPTION and see if your product will allow you to create it. The WITH CHECK OPTION was part of the SQL-89 Standard, but nobody seems to know about it! Consider this skeleton:
CREATE VIEW V1
AS SELECT key_col, col1, col2
WHERE col1 = 'A';
and now UPDATE it with
UPDATE V1 SET col1 = 'B';
The UPDATE will take place without any trouble, but the rows that were previously seen now disappear when we use V1 again. They no longer meet the WHERE clause condition! Likewise, an INSERT INTO statement with VALUES (col1 = 'B') would insert just fine, but its rows would never be seen again in this VIEW. This might be the desired behavior. For example, you can set up a VIEW of rows in a jobs table with a status code of 'to be done', work on them, and change a status code to 'finished', and the rows will disappear from your VIEW. The important point is that the WHERE clause condition was checked only at the time when the VIEW was invoked.
The WITH CHECK OPTION makes the system check the WHERE clause condition upon INSERT and UPDATE. If the new or changed row fails the test, the change is rejected and the VIEW remains the same. The WITH CHECK OPTION clause does not work like a CHECK constraint.
CREATE TABLE Foobar (col_a INTEGER);
CREATE VIEW TestView (col_a)
SELECT col_a FROM Foobar WHERE col_a > 0
WITH CHECK OPTION;
INSERT INTO TestView VALUES (NULL); -- This fails!
CREATE TABLE Foobar_2 (col_a INTEGER CHECK (col_a > 0));
INSERT INTO Foobar_2(col_a)
VALUES (NULL); -- This succeeds!
The WITH CHECK OPTION must be TRUE while the CHECK constraint can be either TRUE or UNKNOWN. This is an example of the differences in DDL and DML in SQL. Once more, you need to watch out for NULLs.
T-SQL checks all the underlying levels that built the VIEW, as well as the WHERE clause condition in the VIEW itself. If anything causes a row to disappear from the VIEW, the UPDATE is rejected. Consider two VIEWs built on each other from the Personnel table:
CREATE VIEW Low_Paid_Personnel (emp_id, salary_amt)
AS SELECT emp_id, salary_amt
WHERE salary_amt <= 250.00;
CREATE VIEW Medium_Paid_Personnel (emp_id, salary_amt)
AS SELECT emp_id, salary_amt
WHERE salary_amt >= 100.00;
If neither VIEW has a WITH CHECK OPTION, the effect of updating Medium_Paid_Personnel by increasing every salary_amt by $1,000 will be passed without any check to Low_Paid_Personnel . Low_Paid_Personnel will pass the changes to the underlying Personnel table. The next time Medium_Paid_Personnel is used, Low_Paid_Personnel will be rebuilt in its own right and Medium_Paid_Personnel rebuilt from it, and all the employees will disappear from Medium_Paid_Personnel .
If only Medium_Paid_Personnel has a WITH CHECK OPTION on it, the UPDATE will fail. Medium_Paid_Personnel has no problem with such a large salary_amt, but it would cause a row in Low_Paid_Personnel to disappear, so Medium_Paid_Personnel will reject it. However, if only Medium_Paid_Personnel has a WITH LOCAL CHECK OPTION on it, the UPDATE will succeed. Medium_Paid_Personnel has no problem with such a large salary_amt, so it passes the change along to Low_Paid_Personnel . Low_Paid_Personnel , in turn, passes the change to the Personnel table and the UPDATE occurs. If both VIEWs have a WITH CHECK OPTION, the effect is a set of conditions, all of which have to be met. The Personnel table can accept UPDATEs or INSERTs only where the salary_amt is between $100 and $250.
WITH CHECK OPTION as Constraints
Lothar Flatz, an instructor for Oracle Software Switzerland made the observation that while Oracle cannot put subqueries into CHECK()() constraints and triggers would not be possible because of the mutating table problem, you can use a VIEW that has a WITH CHECK OPTION to enforce subquery constraints.
For example, consider a hotel registry that needs to have a rule that you cannot add a guest to a room that another is or will be occupying. Instead of writing the constraint directly, like this:
CREATE TABLE Hotel
(room_nbr INTEGER NOT NULL,
arrival_date DATE NOT NULL,
departure_date DATE NOT NULL,
guest_name CHAR(30) NOT NULL,
CHECK (H1.arrival_date <= H1.departure_date),
CHECK (NOT EXISTS
FROM Hotel AS H1, Hotel AS H2
WHERE H1.room_nbr = H2.room_nbr
AND H2.arrival_date < H1.arrival_date
AND H1.arrival_date < H2.departure_date)));
The valid_stay_dates constraint is fine, since it has no subquery, but will choke on the no_overlaps constraint. Leaving the no_overlaps constraint off the table, we can construct a VIEW on all the rows and columns of the Hotel base table and add a WHERE clause which will be enforced by the WITH CHECK OPTION.
CREATE VIEW Valid_Hotel (room_nbr, arrival_date, departure_date, guest_name)
AS SELECT H1.room_nbr, H1.arrival_date, H1.departure_date, H1.guest_name
FROM Hotel AS H1
WHERE NOT EXISTS
FROM Hotel AS H2
WHERE H1.room_nbr = H2.room_nbr
AND H2.arrival_date < H1.arrival_date
AND H1.arrival_date < H2.departure_date)
AND H1.arrival_date <= H1.departure_date
WITH CHECK OPTION;
INSERT INTO Valid_Hotel
VALUES (1, '2012-06-01', '2012-06-03', 'Ron Coe');
INSERT INTO Valid_Hotel
VALUES (1, '2012-06-03', '2012-06-05', 'John Doe');
will give a WITH CHECK OPTION clause violation on the second INSERT INTO statement, as we wanted.
VIEWs, like tables, can be dropped from the schema. The T-SQL syntax for the statement is:
DROP VIEW <table name list>;
The use of the <table name list> is dialect and it gives you a shorthand for repeating drop statements. The drop behavior depends on your vendor. The usual way of storing VIEWs was in a schema information table is to keep the VIEW name, the text of the VIEW, but dependencies. When you drop a VIEW, the engine usually removes the appropriate row from the schema information tables. You find out about dependencies when you try to use something that wants the dropped VIEWs. Dropping a base table could cause the same problem when the VIEW was accessed. But the primary key/foreign key dependencies among base tables will prevent dropping some base tables.
Table Expression VIEWs
An old usage for VIEWs was to do the work of CTEs when there were no CTEs. The programmers created VIEWs, and then used them. Of course they wasted space, caused disk reads and were only used in one statement. It might be worth looking at old code for VIEWs that are not shared.
Today the reverse is true. Programmers create the same CTE code over and over in different queries and give it local names for each appearance. Those local names are seldom the same and are often “place markers” like “X” or “CTE_1” and give no hint as to what the table expression means in the data model.
It can be hard to factor out common table expressions across multiple queries. One query uses an infixed JOIN operators and another uses a <span class="mono">FROM</span> list, the predicates are equivalent but written slightly different and so forth.
I recommend that you sit down and think of useful VIEWs, write them and then see if you can find places where they would make the code easier to read and maintain. As an example, our hotel application will probably need to find vacant rooms by calendar date, compute an occupancy ratio by calendar date and other basic facts.
Another bad use is the one VIEW per Base Table myth that was poplar with DB2 programmers years ago. The reasoning behind this myth was the applaudable desire to insulate application programs from database changes. All programs were to be written against VIEWs instead of base tables. When a change is made to the base table, the programs would not need to be modified because they access a VIEW, not the base table.
This does not work in the long run. All you do is accumulate weird orphaned VIEWs. Consider the simplest type of database change – adding a column to a table. If you do not add the column to the VIEW, no programs can access that column unless another VIEW is created that contains the new column. But if you create a new VIEW every time you add a new column it will not take long for your schema to be swamped with VIEWs. Even more troublesome is the question of which VIEW should be used by which program. Similar arguments can be made for any structural change to the tables.