Av rating:
Total votes: 27
Total comments: 10


Joe Celko
SQL and the Snare of Three-Valued Logic
07 May 2009


The whole subject of the Three-Valued (also known as ternary, trivalent or 3VL)  Logic of SQL tends to trip people up. This is hardly surprising in view of the fact that it  involves an esoteric Polish mathematician and because it  behaves differently in the DDL (Data Declaration Language) and the DML (Data Manipulation Language). In response to requests, Joe Celko comes to the rescue and makes it all seem simple.

At the end of my article on Relational Division, one of the comments, by a reader named Dean, asked if the sentence "There ain't no planes in this hangar that I can't fly!” is a double negative or a triple?   And he asked for an article on Boolean logic next.

Here it is!  But the problem is that SQL has a three valued logic (3VL) and not the two values of Mr. George Boole.  In fact, SQL logic is complicated. 

NULLs

Like many problems, this starts with NULLs.  Dr. Codd put the concept into the first version of the Relational Model.  It is not a value; it is a “place holder” for a value.  The history of the NULL is interesting.  When Dr. Codd and Chris Date were business partners, Date was opposed to NULLs and this was a great debate in those days. 

Later Dr. Codd did a second version of the Relational Model which has two kinds of NULLs.  One kind of NULL marks values which are missing because the value is unknown, and the other kind marks values that are missing because the attribute is missing.  For example, my hat size exists (I have a head) but might not be known so it is the first kind of NULL; my hair color does not exist since I am bald as a cue ball, so it is the second kind of NULL.  FirstSQL is the only product which implemented Dr. Codd’s second model. 

SQL followed Dr. Codd’s first model and has a single NULL. But much like FORTRAN follows algebra; SQL had to make adjustments from the mathematical model.  In the Relational Model, a NULL has no data type.  In SQL, the compiler has to know about storage requirements so you can write “CAST (NULL AS <data type>)” to pass that information. 

NULLs have certain basic properties:

  1. All SQL data types are NULL-able.  This is one reason why IDENTITY is a table property and not a data type.  The other reason is that a table can have only one IDENTITY column in it.  The count of PHYSICAL insertion attempts (NOT  successes) is not an attribute; it is audit meta-data and has no place in RDBMS. 

    This is why the first implementations of BIT which were assembly language bits were not a data type.  When BIT became a numeric data type, then things were Kosher. 
     

  2. NULLs propagate.  If you use a NULL in an expression, then result is a NULL.  In numeric expressions, we had questions about priorities, but propagation is vital.  In particular:
    “NULL / 0” = NULL or “division by zero” error?

    “0 / NULL” = NULL or 0, since this can only resolve to zero for any value.  No, I am not going to tell you; test it yourself in Query Analyzer or SSMS!  But can you figure it out from principles? 

TRUE, FALSE and UNKNOWN

A comparison between known values gives you a result of TRUE or FALSE.  This is Boolean logic.  If you get a chance look at some of the Syd Harris cartoons about George Boole at S. Harris Computer Cartoons

But that is classic two-valued logic (2VL); all the values are known.  When you do a comparison with a NULL, you cannot get a Boolean (i.e. TRUE or FALSE) result. 

This is where we invented the logical value UNKNOWN.  Well, re-discovered it.  There were already a lot of multi-valued logics in the mathematical literature.  Some of them are based on discrete values and some on continuous values (i.e. fuzzy logic).  SQL looks like the system developed by Polish logician Jan Łukasiewicz (the L-bar is a “W” sound in English; you can get the rest of it at Wikipedia: Jan Łukasiewicz).  Programmers know him as the guy who invented Polish Notation and inspired Reverse Polish Notation for HP calculators and stack architecture (aka zero address machines) computers like the old Burroughs machines.  SQL has three logical operations, which are found in all programming languages.  These are extended to three values:  

OR

TRUE

FALSE

UNKNOWN

TRUE

TRUE

TRUE

TRUE

FALSE

TRUE

FALSE

UNKNOWN

UNKNOWN

TRUE

UNKNOWN

UNKNOWN

AND

TRUE

FALSE

UNKNOWN

TRUE

TRUE

FALSE

UNKNOWN

FALSE

FALSE

FALSE

UNKNOWN

UNKNOWN

UNKNOWN

UNKNOWN

UNKNOWN

NOT

 

TRUE

FALSE

FALSE

TRUE

UNKNOWN

UNKNOWN

In the Łukasiewicz multi-valued logic systems, the AND, OR and NOT are almost the same as in SQL's three valued logic.  The general case is based on the following Polish notation formulas in which 1 is TRUE, 0 is FALSE and fractions are the other values.

  • Cpq = 1          for p <= q
  • Cpq = 1 - p + q  for (p > q)
  • Np = 1 - p

N is the negation operator, C is the implication operator.

From this pair, we define all other operators

  • Opq = CCpqq     Or
  • Apq = NONpNq    And
  • Epq = ACpqCqp   Equivalence
  • Mp = CNpp       maybe (not FALSE)

Then Tarski and Lukasiewicz had a lot of rules of inference to prove theorems.  David McGoveran pointed out that SQL is not really a logic system because it lacks rules for deductions and proofs.  The idea is that the UNKNOWN might be resolved to {TRUE, FALSE}, so can we sometimes determine the result regardless of how one value resolves.  If we cannot determine a result, then return UNKNOWN. 

Let me sum up.  SQL is not a formal logical system.  We have no inference rules and what we informally call “predicates” are actually “search conditions” in the ANSI/ISO Standards.  What we have is a collection of look-up tables to compute a value of {TRUE, FALSE, UNKNOWN}. 

This is a subtle difference for anyone who is not a math major, but an inference system is like those axioms, postulates and theorems you had in High School geometry.  You need an implication operation.  In regular 2VL, this is shown as a double tailed right arrow “ =>” and it is defined by the look-up table in 2VL, where the column is the first operand:

IMPLIES

TRUE

FALSE

TRUE

TRUE

FALSE

FALSE

TRUE

TRUE

Or informally, a TRUE premise cannot imply a FALSE conclusion.  A FALSE premise can imply anything.  This 2VL model can also be written as:

  • (A IMPLIES B) = NOT (A AND NOT B)

But if you apply De Morgan’s Law, it can also be written as

  • (A IMPLIES B) = (NOT(A) OR B)

Now expand these 2VL rules into 3VL rules.

IMPLIES-1

TRUE

FALSE

UNKNOWN

TRUE

TRUE

FALSE

UNKNOWN

FALSE

TRUE

TRUE

UNKNOWN

UNKNOWN

UNKNOWN

FALSE

UNKNOWN

IMPLIES-2

TRUE

FALSE

UNKNOWN

TRUE

TRUE

TRUE

TRUE

FALSE

TRUE

TRUE

TRUE

UNKNOWN

UNKNOWN

UNKNOWN

UNKNOWN

Oops!  They are not the same.  The premise “X IMPLIES {FALSE, UNKNOWN}” could argue that a FALSE premise X can imply anything, TRUE, FALSE or UNKNOWN.  Or that propagation of unknown values applies, so a FALSE premise can lead to a TRUE conclusion. Make a decision and design a logical system for yourself. 

We avoided it in the SQL Standards. We did not do implication and the other rules.  This is a problem for those of us doing an optimizer for SQL. 

BIT Flags versus Predicates

One of the major problems for those people learning SQL is that they have not un-learned a mindset based originally on punch cards and magnetic tapes instead of RDBMS.  Even if they never saw a punch card or a magnetic tape, they think of data processing as a sequence of procedural steps that depend on sequentially ordered data.  At each step the data is moved to another file, scratch tape or temporary table.

Did you ever think about the people who use a singular table name instead of a collective or plural name?  In a file system, you process one record at time.  Calling the current record “Employee” makes sense.  But in RDBMS, we work with the entire table (set) all at once.  Therefore we call that table “Personnel” instead. 

We used bit flags in the Dark Ages because we lacked storage and computing power.  We would discover a fact that was TRUE at one point in time and then set a flag to mark that state within the system.  A classic example is a deletion flag at the start of a tape file record.  When the system read it, it would skip forward on the tape to the next record.  Eventually, the deleted records would be removed when the data was copied over to a new tape. 

An embarrassing story of my own was writing a small restaurant scheduling program in BASIC on an early Personal Computer.  I used a flag to show if the staff member was at least 18 years old which was the legal age to consume and serve alcoholic beverages at the time.  The legal age then changed to 21 and my program cheerfully scheduled an entire shift of now under-aged servers on the night that the state Alcoholic Beverage Control agency decides to check on the new law.  My client was not happy. 

If I had SQL back then, I would have used the employee birthdates to set up the shifts.  A simple two keystrokes in a VIEW declaration would have done it. 

SQL is a “predicate language” by its nature in the same way that FORTRAN is an “algebra language” by its nature.  Because the data is shared, and in constant motion, you cannot ever trust that the flags will actually show the current state of the data.  You have to test it when you do something. Silly bit-flags work in file systems because nobody else can physically read the tape as it goes thru its sequential process. 

DDL versus DML

As if all of this was not weird enough, SQL’s 3VL behaves differently in the DDL (Data Declaration Language) and the DML (Data Manipulation Language). 

When you declare a column in a table in the DDL, you have the options of making in NOT NULL, giving it a DEFAULT value, using it in a CHECK() constraint and/or adding DRI actions.  Those options are one of the many reasons that a column in RDBMS is different from a field in a file system. 

When I use a CHECK() constraint in the DDL and some or all of the columns are NULL-able, the search condition can return UNKNOWN.  This is fine because DDL accepts {TRUE, UNKNOWN} as the same and will permit the data to persist. 

We did that because adding the extra search conditions to test for NULLs would be a screaming nightmare.  Try it. 

In DML, we are not so forgiving.  DML sees {FALSE, UNKNOWN} as the same and will reject the data, keeping only that which tests TRUE. 

This can be weird.  You put something into a table and you need to work extra hard to get it back out because of NULLs and 3VL.  That is why we SQL pros keep telling you to not make a column NULL-able unless it makes sense in the data model. 

Shorthand

SQL is a higher level language so it has shorthands in it.  But they can all be expanded out into combinations of {AND, OR, NOT}.  Beginners in SQL often do not know how to use them and so tend to stick with the familiar {AND, OR, NOT} constructs they know from earlier languages like FORTRAN or BASIC. This is a bad way to write code.  It hides the higher level logic and in better SQL engines, the shorthands are optimized.  

BETWEEN

This search condition is equivalent to two comparisons:

(x BETWEEN a AND b) = ((a < = x) AND (x < = b))

The early optimizers simply expanded it out this way.  More sophisticated optimizers look at it as a 3-ary logical operator and optimize for the “between-ness” that seems to be expected

IN()

The IN()n search condition has two forms.    

<expression> [NOT] IN (<expression list>)
<expression> [NOT] IN (<single column select stmt>)

This is a bit tricky.  With either version we get a single column of expressions and compare the left hand expression to each of those values with an OR.  In effect,

x IN (1, 2, 3)

...is shorthand for ...

((x = 1) OR (x = 2) OR (x = 3))

Likewise,

x NOT IN (1, 2, 3)

... is shorthand for ...

NOT ((x = 1) OR (x = 2) OR (x = 3))

Which can be re-written with DeMorgan’s law to:

((x <> 1) AND (x <> 2) AND (x <> 3))

Now try to follow the same rule with

x IN (1, 2, NULL)

...is shorthand for...

((x = 1) OR (x = 2) OR (x = NULL))
((x = 1) OR (x = 2) OR UNKNOWN)  -- propagate NULLs

Let’s assume that one of the ORs is TRUE. So the other is FALSE

(TRUE OR FALSE OR UNKNOWN) 
(TRUE OR UNKNOWN) 
(TRUE) 

The only way this can return FALSE would be for x to be something other than {1, 2}.  The only way for it to return UNKNOWN would be for x to be NULL. 

Repeat the exercise with

x NOT IN (1, 2, NULL)

...is shorthand for...

NOT ((x = 1) OR (x = 2) OR (x = NULL))
NOT ((x = 1) OR (x = 2) OR UNKNOWN) -- propagate NULLs
((x <> 1) AND (x <> 2) AND NOT UNKNOWN) -- DeMorgan
(UNKNOWN) -– definition of AND

So if this was used in DDL, rows would be allowed in the table, but will always be rejected by the DML.

SOME|ANY()

There is an underused generalization of the IN() of the form:

<expression> <theta op> [SOME | ANY] (<single column select stmt>)

This is shorthand for a chain of ORs:

(<expression> <theta op> <expression_1>
  OR <expression> <theta op> <expression_2>
  OR…
<expression> <theta op> <expression_n>
)

There is no difference between SOME and ANY, but sometimes the search condition reads a little better to a human. 

The IN() search condition is the case where the <theta op> is equality.  All of the 3VL problems you had with IN() are also here.

ALL

There is really, really underused (almost unknown) search condition that generalized a chain of ANDs:

<expression> <theta op> ALL (<single column select stmt>)

This is shorthand for a chain of ANDs:

(<expression> <theta op> <expression_1>
AND <expression> <theta op> <expression_2>
AND…
<expression> <theta op> <expression_n>)

The ALL search condition does not play well with the extrema functions (e.g. MAX, MIN). 

It is counter-intuitive at first that these two search condition are not the same in SQL:

x >= (SELECT MAX(y) FROM Foobar)
x >= ALL (SELECT y FROM Foobar)

But you have to remember the rules for the extrema functions -- they drop out all the NULLs before returning the greater, greatest, or least values.  The (<single column select stmt>) does not drop NULLs, so you can get them in the results.  This can leave a NULL to give us an UNKNOWN. 

EXISTS()

The EXISTS() search condition is on of the very few Boolean operator we have in SQL.  It returns {TRUE, FALSE} and never UNKNOWN. This is because it works at the table level and not at the column level.  Compare this to COUNT(*) which returns the cardinality of the whole table as opposed to COUNT(<expression>).  They look alike but are totally different. 

Summary

I tell people that they need to learn to think in sets to get good with SQL.  Give up your sequential, procedural record-at-a-time mindset and trade up to declarative set processing!  See the light!  Leave the darkness! 

I probably ought to spend more time on 3VL since it is also something they never saw before.  And it is not easy. 



This article has been viewed 5997 times.
Joe Celko

Author profile: Joe Celko

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

Search for other articles by Joe Celko

Rate this article:   Avg rating: from a total of 27 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: And again.. and again...
Posted by: Anonymous (not signed in)
Posted on: Friday, May 08, 2009 at 9:06 AM
Message: "All SQL data types are NULL-able. This is one reason why IDENTITY is a table property and not a data type. The other reason is that a table can have only one IDENTITY column in it."

Another rant on identity?

Subject: I think you have your AND table incorrect?
Posted by: caderoux (view profile)
Posted on: Friday, May 08, 2009 at 1:22 PM
Message: ;
WITH logic
AS (
SELECT 0 AS val
,'FALSE' AS txt
UNION
SELECT 1 AS val
,'TRUE' AS txt
UNION
SELECT NULL AS val
,'UNKNOWN' AS txt
)
SELECT l.txt + ' OR ' + r.txt AS expression_or
,CASE WHEN l.val = 1
OR r.val = 1 THEN 'TRUE'
WHEN NOT (
l.val = 1
OR r.val = 1
) THEN 'FALSE'
ELSE 'UNKNOWN'
END AS result_or
,l.txt + ' AND ' + r.txt AS expression_and
,CASE WHEN l.val = 1
AND r.val = 1 THEN 'TRUE'
WHEN NOT (
l.val = 1
AND r.val = 1
) THEN 'FALSE'
ELSE 'UNKNOWN'
END AS result_and
FROM logic AS l
CROSS JOIN logic AS r

Subject: Apologies for the reformatting of my previous message...
Posted by: caderoux (view profile)
Posted on: Friday, May 08, 2009 at 1:24 PM
Message: Apologies for the reformatting of my previous message...

Subject: This version does the PIVOT, so the output is easier to read
Posted by: caderoux (view profile)
Posted on: Friday, May 08, 2009 at 1:30 PM
Message:
WITH logic

         
AS (
   
SELECT AS val
            
,'FALSE' AS txt
      
UNION
   SELECT 
AS val
            
,'TRUE' AS txt
      
UNION
   SELECT 
NULL AS val
            
,'UNKNOWN' AS txt
   
),
         
results
         
AS (
   
SELECT CASE WHEN l.val 1
            
OR r.val THEN 'TRUE'
               
WHEN NOT (
                  
l.val 1
               
OR r.val 1
         
THEN 'FALSE'
      
ELSE 'UNKNOWN'
      
END AS result_or
            
,l.txt AS l_txt
            
,r.txt AS r_txt
      
FROM logic AS l
            
CROSS JOIN logic AS r
   
)
SELECT l_txt AS 'OR'
         
,FALSE
         
,TRUE
         
,UNKNOWN
   
FROM results PIVOT MIN(result_orFOR r_txt IN ([TRUE][FALSE],
               
[UNKNOWN]) ) AS pvt
   
ORDER BY l_txt
         
;
         
WITH logic
         
AS (
   
SELECT AS val
            
,'FALSE' AS txt
      
UNION
   SELECT 
AS val
            
,'TRUE' AS txt
      
UNION
   SELECT 
NULL AS val
            
,'UNKNOWN' AS txt
   
),
         
results
         
AS (
   
SELECT CASE WHEN l.val 1
            
AND r.val THEN 'TRUE'
               
WHEN NOT (
                  
l.val 1
               
AND r.val 1
         
THEN 'FALSE'
      
ELSE 'UNKNOWN'
      
END AS result_and
            
,l.txt AS l_txt
            
,r.txt AS r_txt
      
FROM logic AS l
            
CROSS JOIN logic AS r
   
)
SELECT l_txt AS 'AND'
         
,FALSE
         
,TRUE
         
,UNKNOWN
   
FROM results PIVOT MIN(result_andFOR r_txt IN ([TRUE][FALSE],
               
[UNKNOWN]) ) AS pvt
   
ORDER BY l_txt 

Subject: Well, you can see it properly formatted here: http://snipt.org/Xli
Posted by: caderoux (view profile)
Posted on: Friday, May 08, 2009 at 1:34 PM
Message: Well, you can see it properly formatted here: http://snipt.org/Xli

Subject: code formatting
Posted by: Tony Davis (view profile)
Posted on: Saturday, May 09, 2009 at 7:11 AM
Message: caderoux,

I apologize for the difficulty in pasting formatted code. The best option we can offer is to use our code prettifier at:

simple-talk.com/prettifier

You paste in using the Source Code tab, set your options using the left menu (you must use "Forums - FONTs and PRE for posting to the blogs and forums), click Prettify, then grab the HTML from the "Source HTML" tab and pasting it into the HTML tab of the blog editor.

I did this for your previous post above.

Cheers,

Tony.

Subject: Disappointing...
Posted by: Hugo Kornelis (not signed in)
Posted on: Tuesday, May 19, 2009 at 5:39 AM
Message: A very disappointing article.

The rant about IDENTITY is completely useless, since it has nothing to do with the subject of the article. It's also incorrect. As far as I know, nobody ever suggested identity to be a data type. It's a column property that can be applied to non nullable columns only.

The 3VL truth tables presented by Joe Celko are incorrect. The correct result for "false AND unknown" as well as "unknown AND false" is of course "false" - not "unknown" as it these tables state!

Joe then completely loses it when he comes to the IMPLIES operator. Besides the difference he highlighted, I count three more differences between the two tables. One even in a cell that does not involve any "unknwon" value at all. So have we jsut invalidated DeMorgan's law, or has Joe been very sloppy? We can all relax. I've invested the five minutes to work out all nine results for both variations of the IMPLIES operator and there were no differences at all. The difference highlighted by Joe might be caused by his incorrect assessment of "unknown and false" (I didn't check), but some of the other differences -at least the one for "true IMPLIES false"!- can only be caused by extreme sloppiness.

Then comes another of Joe's rants, that is only vagulely related to the subject at hand. He chooses to completely ignore the fact that "boolean" as a (non numeric!!) datatype is part of the SQL-2003 standard (and as far as I know has not been removed for newer versions). And the example he gives is just silly. A customer apparently has chosen to save some costs by not storing employees birthdates (yeah, right). And then, when a law changes that relates to age of employees, he again chooses to save money by not revealuating some data that has been derived off that non-stored birthdate. And then, he holds the database guy responsible for the consequences? Get out of it! - I call shenanigans on this one.

At this point, I was unable to stomach any firthher reading of this article. I glanced over it and didn't see any more errors, but I would not be surprised if better reading would uncover a few more.

For a good discussion about NULLs, search the web. I'm sure there are many available, and I'm also sure that most of them are far better than this one.

Best, Hugo

Subject: Nil Carborundum
Posted by: Dean (view profile)
Posted on: Tuesday, May 19, 2009 at 8:12 AM
Message: Very interesting. Thanks.

Nil carborundum.

Subject: Thanks, and 3VL elsewhere
Posted by: Keith Macdonald (not signed in)
Posted on: Tuesday, May 19, 2009 at 9:59 AM
Message: Joe

Thanks for a good reminder, it's the principal that is important, never mind the snipers :-)

I still remember the surprise when we first started coding with the MS-Windows 2.0 SDK (25 years ago?) and found the "Yes, No, Cancel" dialog box. "Oh! We've got to code for 3VL!" In a way it's good to know that the issue isn't 100% unique to SQL.

Regards
Keith

Subject: Rant?
Posted by: Bruce W Cassidy (view profile)
Posted on: Tuesday, May 19, 2009 at 4:16 PM
Message: Wow. A good discussion of 3VL and how gnarly it is, and people get hung up on identity being a table property?

Good article, Joe. And as you implied, I should remember to use ANY() and ALL() more often!

 










Phil Factor
SQL Server CRUD-Generation from System Views
 If you are not keen on repetitive typing, you can still rapidly produce production-quality documented code by... Read more...



 View the blog
Product Review: Schema Compare for Oracle
 One of the more important tasks in the process of rolling out incremental developments to a... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

SQL Response: The dim sum interview
 Richard Morris met David and Nigel of the SQL Response team, in a dim sum Restaurant in Cambridge. They... Read more...

Data Correlation Optimization Internals
 Having adroitly introduced us, in his previous article, to the Date Correlation ability of the Query... Read more...

Transparent Data Encryption
  Transparent Data Encryption is designed to protect data by encrypting the physical files of the... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

SQL Server Full Text Search Language Features
 SQL Full-text Search (SQL FTS) is an optional component of SQL Server 7 and later, which allows fast... Read more...

Over 150,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk