How to Get NULLs Horribly Wrong in SQL Server

NULLs in SQL are a mixed blessing. The Three-Valued Logic of SQL has its uses but can cause difficulties to anyone who assumes that NULL has a 'value'. It can make reports go horribly wrong in a number of alarming ways, as Robert Sheldon explains.

Failure #1: Not knowing what NULL means

When referring to NULL within our data sets, we often use phrases such as “the NULL value” or “a value of NULL.” I do it all the time. I see it done all the time. Such phrases are so common that we think little of their use. But inherent in the phrase is the notion of NULL as an actual value. It is not. NULL is a non-value, a nonexistent value. It is not zero. It is not an empty string. A value cannot equal NULL. No two NULL values are equal.

A NULL value is often defined as one that is unknown or not applicable, but even these definitions can be open to debate. For example, a record might not include a customer’s birthdate because the salesperson didn’t ask or because the customer would not provide it, but the customer still knows that date, so it is hardly unknown, nor is it any less applicable. There can be many reasons for missing data, and there has been much debate about a precise meaning of NULL. If you need to attach a specific meaning to NULL, then missing or absent data is probably your safest bet.

Perhaps a better way to think of NULL is as a setting or marker that indicates if a data value does not exist. That certainly seems to be how SQL Server treats NULL. The database engine uses a special bitmap to track which columns in a row are NULL and which are not. The bitmap contains a bit for each column, with the bit set to 1 if the column is NULL, that is, if the value is missing.

The confusion for some, particularly beginning T-SQL developers, is the way SQL Server Management Studio indicates that no value exists for a specific column: by displaying what appears to be a value that reads NULL. For example, the following SELECT statement returns data from the Product table in the AdventureWorks2014 sample database:

As the following results show, every row includes at least one column for which no value exists:

ProductName

ProductLine

Class

Style

SellEndDate

HL Road Frame – Black, 58

R

H

U

NULL

HL Road Frame – Red, 58

R

H

U

NULL

Sport-100 Helmet, Red

S

NULL

NULL

NULL

Sport-100 Helmet, Black

S

NULL

NULL

NULL

Mountain Bike Socks, M

M

NULL

U

2012-05-29 00:00:00.000

Mountain Bike Socks, L

M

NULL

U

2012-05-29 00:00:00.000

Sport-100 Helmet, Blue

S

NULL

NULL

NULL

AWC Logo Cap

S

NULL

U

NULL

Long-Sleeve Logo Jersey, S

S

NULL

U

NULL

Long-Sleeve Logo Jersey, M

S

NULL

U

NULL

Long-Sleeve Logo Jersey, L

S

NULL

U

NULL

If we look at the columns with the NULL values, we can see how difficult it is to determine why the values are missing. For example, is a SellEndDate value missing because the product is still live or because the application failed to properly record the end date? Are the Class and Style values missing because they don’t apply to those particular rows or because someone was careless in entering data?

An even bigger problem with viewing the results in this way is that it suggests that the database is actually storing multiple instances of this mysterious placeholder called NULL and that all those NULL values are the same. They are not the same. There is nothing there. Nothing cannot be the same as nothing. Nothing cannot be compared to nothing. You can think of these as the fundamental principles behind NULL. The principles of nothingness. And not understanding these principles can lead to all sorts of problems when working with SQL Server, resulting in many of the failures we cover here.

Failure #2: Treating NULL like a real value in comparisons

Developers who don’t understand how NULL works will sometimes use comparison operators to compare an expression to a NULL value, resulting in a statement that looks similar to the following:

The assumption here is that NULL is a valid value and therefore can be compared to other NULL values. That means, in this case, the statement should return each row whose Title value is NULL. A quick check into the source data will show four rows that match the statement’s assumed logic. However, because the database engine cannot compare nothing to nothing, it will not return these rows and will instead provide us with an empty result set.

If we assume from the empty result set that no rows in the source data match our criteria, we might be satisfied with the statement as is and send the code onto production. (Let’s hope no one would really do that.) We might also choose to continue along this logical trajectory and recast the statement to return the non-NULL data:

The statement is no longer trying to compare NULL with NULL, but rather compare real non-NULL values to NULL, which would seem an equally credible comparison, if not more so. It is not. We cannot compare real values to nothing any more than we can compare nothing to nothing. As a result, the statement still returns no data, even though we know for certain the data exists.

But before we go further, note that there are cases when comparing NULL values in this way can work, which we cover in the next failure. For now, know that the proper way to perform NULL comparisons is to use the IS NULL and IS NOT NULL operators. For example, the following SELECT statement uses the IS NULL operator to return rows whose Title value is NULL:

This time, the statement returns the data we expect, with the Title value missing, along with a couple of MiddleName values:

Title

FirstName

MiddleName

LastName

NULL

Ken

J

Sánchez

NULL

Terri

Lee

Duffy

NULL

Roberto

NULL

Tamburello

NULL

Rob

NULL

Walters

We can also return those rows that contain an actual Title value by instead using the IS NOT NULL operator:

Now the statement returns only the following two rows:

Title

FirstName

MiddleName

LastName

Ms.

Gail

A

Erickson

Mr.

Jossef

H

Goldberg

The IS NULL and IS NOT NULL operators provide the logic we need to handle the ambiguous nature of NULL values. The purpose of these operators is to determine whether or not an expression is NULL and from that return either TRUE or FALSE. Comparison operators, when faced with a NULL value, instead want to return UNKNOWN, which can result in unpredictable and costly mistakes if implemented in production.

Failure #3: Not knowing how ANSI_NULL works

In the last failure, I demonstrated why you should not use comparison operators against NULL values. Even if there are results to be had, a statement such as the following will likely not return them:

The word likely is key here. It turns out that the behavior I described is the default behavior, but not the only behavior. SQL Server supports a feature called ANSI_NULLS, which determines how the database engine handles NULL comparisons. The option is a carry-over from the ghost of SQL Server Past and has been deprecated since 2005. But the setting still exists, it still works, and it can still cause all sorts of trouble.

By default, SQL Server installs with the ANSI_NULLS option turned on, which gives us the behavior I’ve described so far. If we compare values to NULL, the database engine does not return those rows, even if NULL exists.

We can verify the current ANSI_NULLS setting by running a statement similar to the following:

If no one has messed with ANSI_NULLS, the database engine will return the first option, which will give us the results we’ve seen so far. But suppose we turn the option off:

Now our statement returns the expected four rows, despite the fact that we’re using a comparison operator against NULL:

Title

FirstName

MiddleName

LastName

NULL

Ken

J

Sánchez

NULL

Terri

Lee

Duffy

NULL

Roberto

NULL

Tamburello

NULL

Rob

NULL

Walters

With the option turned off, we can also use the not equal operator:

Now the results look like the following:

Title

FirstName

MiddleName

LastName

Ms.

Gail

A

Erickson

Mr.

Jossef

H

Goldberg

If we turn the option back on (it’s default setting), the database engine will again return no rows:

Because the ANSI_NULLS option has been deprecated, you should not use it and you should update any code that does, assuming you have the access and time. If you don’t, you could be in store for bigger problems.

Imagine an application that connects to two different SQL Server instances, one with the option turned on (the default) and one with it turned off. If the code is using comparison operators with NULL values, you could end up with inaccurate data or conflicting results. Even if you’re dealing with only one SQL Server instance, you still risk unexpected results, especially if statements are turning the ANSI_NULLS option on at the session level.

Failure #4: Treating ISNULL the same as IS NULL

The ISNULL function tends to cause a bit of confusion when first encountered, in part because of its name, which implies a meaning similar to the IS NULL operator. Another reason for confusion arises from the fact that products such as Microsoft Access and MySQL also include the ISNULL function, where it behaves much differently, more like the name suggests. But we’re talking SQL Server and need to understand how ISNULL is implemented here. A lack of understanding can lead to misuse and failure to take advantage of its real purpose in SQL Server.

Let’s looks at a few examples to demonstrate, starting with a simple query that returns products whose Color value is NULL:

ProductName

Color

LL Touring Handlebars

NULL

HL Touring Handlebars

NULL

If we’re not familiar with ISNULL in SQL Server or are transitioning in from Access, we might attempt to retrieve the same results by using ISNULL in our WHERE clause:

Fortunately, this construction will generate the following error, which we’ll likely catch immediately, or at least during the Q&A phase:

Those of us with a background in MySQL might instead take a different approach with the ISNULL function by comparing it to 1 or 0, the function’s two possible return values:

Once again, we receive an error message:

By now, we should have checked the documentation to verify what this function is supposed to do and have come to the conclusion that, based on the syntax and the fairly explicit error messages, we should be adding a second argument, which we do:

Unfortunately, all we’ve done is make our predicate completely illogical, resulting in yet another (but different) error message:

The point of that this is that, in SQL Server, the purpose of the ISNULL function is simply to replace an expression’s value with a real value, if the original value is NULL (nonexistent). If we wanted to make the preceding example right, we would need to define our Boolean condition correctly:

Now the SELECT statement returns the results we want, rather than an error:

ProductName

Color

LL Mountain Frame – Black, 40

Black

LL Touring Handlebars

NULL

HL Touring Handlebars

NULL

LL Crankset

Black

That said, we can achieve the same thing by using IS NULL, without incurring the extra processing:

More often than not, the place we’ll want to use the ISNULL function is within our select list to replace the NULL values with real values:

The Color column now contains a value for each row, with no missing values to contend with:

ProductName

Color

HL Road Pedal

Silver/Black

Touring Pedal

Silver/Black

ML Mountain Frame-W – Silver, 38

Silver

LL Mountain Frame – Black, 40

Black

LL Mountain Frame – Silver, 40

Silver

Front Derailleur

Silver

LL Touring Handlebars

neutral

HL Touring Handlebars

neutral

Front Brakes

Silver

LL Crankset

Black

Be aware that, when using ISNULL, the database engine converts the replacement value to the data type of the original value, making its possible for data to get truncated or for the database engine to generate a conversion error. Make sure you read up on ISNULL before adding it to your code.

Failure #5: Treating ISNULL the same as COALESCE

In the last failure, I tried to address any confusion there might be around the ISNULL function, but there’s another area of possible confusion, and that is in understanding the differences between ISNULL and COALESCE. Both functions let us replace NULL with a real value, but there are differences between the two that can be important to understand.

First off, the ISNULL function implemented in SQL Server is specific to that brand of T-SQL, whereas COALESCE is standard ANSI, making it more universal and, consequently, more portable, an important consideration if the possibility exists that you might one day want to point your apps to a different database system.

The COALESCE function also lets us include more than two expressions, as shown in the following SELECT statement:

The function evaluates the expressions in the order they’re specified and returns the first non-NULL value. As the following results show, each row returns one of the specified values in the FirstNotNull column:

ProductName

Class

Style

ProductLine

FirstNotNull

Rear Derailleur Cage

NULL

NULL

NULL

n/a

HL Road Frame – Red, 58

H

U

R

H

Mountain Bike Socks, M

NULL

U

M

U

Sport-100 Helmet, Blue

NULL

NULL

S

S

The COALESCE function can actually be thought of as syntactical shorthand for a CASE expression:

The SELECT statement achieves the same results as the preceding example, but its construction is more complex. The main advantage of using a CASE expression is that it makes the logic clearer, for those who might need a little extra help.

Compared to the COALESCE function, the ISNULL function is much more limited:

In this case, all we’re saying is that if Class is NULL, then use Style, giving us the following results:

ProductName

Class

Style

FirstNotNull

Rear Derailleur Cage

NULL

NULL

NULL

HL Road Frame – Red, 58

H

U

H

Mountain Bike Socks, M

NULL

U

U

Sport-100 Helmet, Blue

NULL

NULL

NULL

We could have specified n/a as our second argument, similar to what we did with COALESCE, but you get the point. COALESCE is more flexible and it conforms to ANSI standards. In addition, COALESCE handles data types differently. To understand this difference, let’s start with a different example:

The Class and Style columns are defined as NCHAR, and the ProductSubcategoryID column as INT. The SELECT statement returns the following results:

ProductName

Class

Style

ProductSubcategoryID

LL Shell

L

NULL

NULL

Now lets modify our code to include the COALESCE function:

The database engine now generates an error when it tries to convert the Class value to an INT:

It turns out that COALESCE uses the type with the highest precedence, based on the submitted values. Because the ProductSubcategoryID column takes an INT and an INT takes precedence over NCHAR (which is treated as NVARCHAR), the database engine tries to convert L to an INT, which of course is not possible.

In such cases, we might try changing the order of the columns we pass to the COALESCE function:

But we will again receive a conversion area. The database engine does not care about the order of the arguments, only which types take precedence over others:

The ISNULL function works differently:

In this case, the Class data type sets the standard, so the database engine has no problem returning the results:

ProductName

Class

ProductSubcategoryID

FirstNotNull

LL Shell

L

NULL

L

Now let’s reverse the order of the columns we pass to ISNULL:

The database engine will once again try to convert L to an INT and generate a conversion error. Be sure that, whenever you’re using these functions, you are well aware of the possibility for conversion errors.

Failure #6: Forgetting about the NULLIF function

While we’re on the topic of NULL-related functions, it’s worth noting another one that garners less attention, but can still be useful: the NULLIF function. We don’t seem to come across this one as often as the other functions, in part because there does not seem to be as many use cases for NULLIF as there are the others. Basically, the function returns a NULL value if two expressions are equal. If they are not equal, the function returns the first expression. Here’s an example:

All we’re doing here is passing the two columns into the function and returning the results, as shown in the following table:

ProductName

MakeFlag

FinishedGoodsFlag

NullIfEqual

Top Tube

1

0

1

Rear Derailleur Cage

0

0

NULL

HL Road Frame – Black, 58

1

1

NULL

Sport-100 Helmet, Red

0

1

0

We can achieve the same results by using a CASE expression instead of the NULLIF function:

As with COALESCE, the NULLIF function helps to simplify our statements, but the CASE expression makes the logic clearer to the rest of the world.

Perhaps one of the most useful places to use the NULLIF function is within a mathematical expression in which an error would be generated if a zero were encountered. Dividing by zero is a good example of this. Let’s start with the following SELECT statement:

The database engine runs the calculation with no problem and returns the following results:

ProductName

StandardCost

ListPrice

NetDiff

HL Road Frame – Black, 58

1059.31

1431.50

2.8461

Now let’s pull the data from a row in which the StandardCost and ListPrice values are zero:

This time, we get a divide-by-zero error:

This is where NULLIF comes it. By returning a NULL in place of a zero, we can at least prevent the database engine from generating an error:

Now we at least get results we can live with.

ProductName

StandardCost

ListPrice

NetDiff

Rear Derailleur Cage

0.00

0.00

NULL

We could have also used the WHERE clause to filter our rows that would have given us zero, but then we would not be able to view any of the data’s row. The NULLIF function at least gives us data to work with and makes it easy to do so.

Failure #7: Assuming that 2 + NULL = 2

As great as the NULLIF function is for handling certain situations, such as avoiding divide-by-zero errors, having NULL show up in our numeric calculations is often less than optimal. It’s not that SQL Server does anything surprising, but rather that you need to know what’s coming should you run into issues with your calculations.

This all goes back to the idea that NULL is a non-value. It is nothing. It merely indicates that any real data value is missing. Consequently, when we try to add a value to nothing or subtract from nothing or multiply by nothing or divide by nothing or do anything by nothing, we end up with nothing, or NULL. Take a look at this SELECT statement:

We’re simply multiplying the Weight value by 4.22 and getting the following results:

ProductName

Weight

FreightCost

Long-Sleeve Logo Jersey, L

NULL

NULL

Long-Sleeve Logo Jersey, XL

NULL

NULL

HL Road Frame – Red, 62

2.30

9.7060

HL Road Frame – Red, 44

2.12

8.9464

Not surprisingly, whenever we try to multiply by NULL, we get NULL. What choice does the database engine have? Now let’s throw a little addition in there to see what happens:

As before, when Weight contains a real value, we get a real value in return, but when NULL enters the equation, we always end up with NULL:

ProductName

Weight

FreightCost

Long-Sleeve Logo Jersey, L

NULL

NULL

Long-Sleeve Logo Jersey, XL

NULL

NULL

HL Road Frame – Red, 62

2.30

22.3660

HL Road Frame – Red, 44

2.12

21.6064

One approach we can take (when circumstances make it reasonable) is to define a default value that we can return in the event the source data is NULL:

In this case, we’re using ISNULL to return a default value of 4.95 should our calculation result in a NULL value, which gives us the following results:

ProductName

Weight

FreightCost

Long-Sleeve Logo Jersey, L

NULL

4.9500

Long-Sleeve Logo Jersey, XL

NULL

4.9500

HL Road Frame – Red, 62

2.30

9.7060

HL Road Frame – Red, 44

2.12

8.9464

SQL Server does what it should do when it encounters NULL. It returns NULL. We just need to be prepared for this eventuality. In some cases, having our expressions return a NULL value is fine for our circumstances, but at other times, we’ll want to adjust our code to accommodate those wayward missing values. Otherwise a whole lot of customers might end up getting free shipping because the weight was initially not a consideration.

Failure #8: Aggregating data without taking NULL into account

As with numeric calculations, we must be prepared to handle aggregations that could include NULL values. However, unlike mathematic operations, the database engine eliminates NULL values from the calculations before performing the actual aggregations, at least at the most basic level. Let’s start with a simple SELECT statement:

The statement returns the following results, which include two rows with a Weight value of NULL:

ProductName

Weight

Long-Sleeve Logo Jersey, L

NULL

Long-Sleeve Logo Jersey, XL

NULL

HL Road Frame – Red, 62

2.30

HL Road Frame – Red, 44

2.12

HL Road Frame – Red, 48

2.16

HL Road Frame – Red, 52

2.20

Suppose we now try to find the average weight for these rows:

The database engine will perform the aggregation with no problem by first eliminating the NULL rows from the mix and then returning a result of 2.195000. However, the results also include something else, a warning message about the NULL values:

We can verify that the NULL rows are being eliminated by instead using the COUNT aggregate function on the Weight column:

This time the database engine returns a value of 4, indicating that the two NULL rows have been eliminated. We will again receive the warning message.

All aggregate functions eliminate the NULL values prior to preforming the actual calculations except for the COUNT(*) function:

In this case, the database engine includes all rows, no matter where the NULL values lie, and will generate no warning messages.

The fact that SQL Server handles aggregations this way is not a bad thing. But it’s something you need to understand. If you’re not familiar with the data, your queries might be eliminating rows without understanding the full implications. You can consider using a function such as ISNULL to replace the NULL values, but you still risk skewing your results. As with many situations, handling NULL values can take a special touch. For example, detailed analytics will require, at the very least, knowledge of the number of NULL values in a particular data set.

As a side note, those who don’t appreciate having their queries return warning messages like those we’ve seen here, can use the SET ANSI_WARNINGS statement to turn warnings off when executing an aggregation:

Although this will give us the same results without the message, turning off this option can affect other operations. For example, modifying data on indexed views or on tables with indexed computed columns will fail if the option is turned off. So tread carefully if you consider this route.

Also be aware that when you start grouping and aggregating data, you might not get exactly the results you expect when that data includes NULL values. For example, the following SELECT statement groups by the ProductSubcategoryID column in our trusty Product table:

The following table shows the first 10 rows in our result set:

ProdSubID

AvgWeight

FullCount

ColCount

NULL

434.285714

209

7

1

25.058750

32

32

2

17.448139

43

43

3

27.680909

22

22

4

NULL

8

0

5

187.000000

3

3

6

317.000000

2

2

7

NULL

1

0

8

603.333333

3

3

9

151.500000

2

2

The full result set represents all 504 rows in the Product table, including rows whose ProductSubcategoryID and Weight columns are NULL. If we were to dig into the averages and the individual column counts for each group, we would discover that the NULL values are not included, but the result set as a whole includes all the data. Again, I point this out not to suggest that there is a problem with the way SQL Server groups and aggregates data, but only to demonstrate that we need to be aware of what’s happening with our queries when NULL values are involved. In this case, by including both types of counts in each group, we’re providing a handy way to provide insight into our source data.

Failure #9: Forgetting about NULL in your Boolean expressions

Boolean expressions that involve NULL values can be especially tricky because they introduce what is called three-valued logic. Typically, when an expression is evaluated, it returns either TRUE or FALSE. However, when NULL enters the mix, we get a third possibility: UNKNOWN. As we’ve already covered, SQL Server marks a value as NULL if it does not exist. As a result, a Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.

The main thing to keep in mind with Boolean expressions is that your result set will include only those rows in which the expression evaluates to TRUE. If the expression evaluates to FALSE or UNKNOWN, the row is omitted.

The specifics of three-valued logic can seem unwieldy when taking account the various combinations that can result from using the AND, OR, and NOT operators, but you can find a handy reference to help sort all that out be referring to the SQL Server help topic “Null Values.” In the meantime, lets look at a few examples that demonstrate what happens when NULL values enter the Boolean mix, starting with the following temporary table:

The following results show the data that’s now stored in the #products table:

ProductID

ProductName

Weight

Class

945

Front Derailleur

88.00

NULL

946

LL Touring Handlebars

NULL

L

947

HL Touring Handlebars

NULL

H

948

Front Brakes

317.00

NULL

949

LL Crankset

600.00

L

950

ML Crankset

635.00

M

951

HL Crankset

575.00

H

952

Chain

NULL

NULL

953

Touring-2000 Blue, 60

27.90

M

954

Touring-1000 Yellow, 46

25.13

H

Now let’s use the AND operator to create a simple Boolean expression in our WHERE clause:

As expected, the statement returns the results shown in the following table:

ProductID

ProductName

Weight

Class

951

HL Crankset

575.00

H

Notice that the query engine excludes the row for product ID 947. Even though the Class value is H, the Weight value is NULL, so the expression evaluates to UNKNOWN. And of course, row 952 is excluded because both the Weight and Class values are NULL.

Now let’s rerun the statement, but this time use the OR operator for our Boolean expression:

As expected, our results look quite different:

ProductID

ProductName

Weight

Class

947

HL Touring Handlebars

NULL

H

949

LL Crankset

600.00

L

950

ML Crankset

635.00

M

951

HL Crankset

575.00

H

954

Touring-1000 Yellow, 46

25.13

H

In this case, only one of the two Boolean conditions needs to evaluate to TRUE, resulting in many more returned rows. Even row 947 is included, even through the Weight value is NULL. Because the OR operator is used and the second condition evaluates to TRUE, that row is included.

This time around, let’s use the AND NOT operator in our statement:

Once again, we get very different results:

ProductID

ProductName

Weight

Class

949

LL Crankset

600.00

L

950

ML Crankset

635.00

M

Not surprising, the only rows to be included are those in which both Boolean conditions evaluate to TRUE. All the FALSE and UNKNOWN values are left behind.

As with aggregate functions, SQL Server’s treatment of Boolean expressions in itself is not a bad thing. You just need to be aware of what can happen with your data when NULL values are involved. Data can be eliminated that you might expect to be included, leading to unintended consequences in subsequent analytics and decision-making.

Failure #10: Not accounting for NULL in your string expressions

Up to this point, we’ve touched little upon string values and what happens when NULL enters the mix, but at least some coverage in this area is warranted, beyond what we saw when working with Boolean expressions.

Suppose we start with the following SELECT statement:

The statement returns the results shown here:

ProductID

ProductName

Color

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

837

HL Road Frame – Black, 62

Black

838

HL Road Frame – Black, 44

Black

839

HL Road Frame – Black, 48

Black

840

HL Road Frame – Black, 52

Black

841

Men’s Sports Shorts, S

Black

842

Touring-Panniers, Large

Grey

843

Cable Lock

NULL

844

Minipump

NULL

845

Mountain Pump

NULL

Now suppose when want to filter our data set even further by returning all rows with a Color value other than black:

Now our SELECT statement returns the following results:

ProductID

ProductName

Color

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

842

Touring-Panniers, Large

Grey

We might have expected the results to include those rows whose Color value is NULL. However, we’re once again faced with the issue of trying to evaluate an expression against what is essentially a missing value. A “not black” value cannot be equated with a “missing” value; consequently, those rows are omitted.

Concatenating string values follows similar logic:

Whenever NULL shows up, we end up with nothing (NULL) in our NewProductNumber column:

ProductName

ProductNumber

Color

NewProductNumber

ML Road Frame-W – Yellow, 44

FR-R72Y-44

Yellow

FR-R72Y-44_Yellow

ML Road Frame-W – Yellow, 48

FR-R72Y-48

Yellow

FR-R72Y-48_Yellow

HL Road Frame – Black, 62

FR-R92B-62

Black

FR-R92B-62_Black

HL Road Frame – Black, 44

FR-R92B-44

Black

FR-R92B-44_Black

HL Road Frame – Black, 48

FR-R92B-48

Black

FR-R92B-48_Black

HL Road Frame – Black, 52

FR-R92B-52

Black

FR-R92B-52_Black

Men’s Sports Shorts, S

SH-M897-S

Black

SH-M897-S_Black

Touring-Panniers, Large

PA-T100

Grey

PA-T100_Grey

Cable Lock

LO-C100

NULL

NULL

Minipump

PU-0452

NULL

NULL

Mountain Pump

PU-M044

NULL

NULL

Once again, the bottom line is that you must be prepared to handle NULL values if they could in any way impact your results. For example, you might try throwing an ISNULL into the mix:

Now the statement returns the following results:

ProductName

ProductNumber

Color

NewProductNumber

ML Road Frame-W – Yellow, 44

FR-R72Y-44

Yellow

FR-R72Y-44_Yellow

ML Road Frame-W – Yellow, 48

FR-R72Y-48

Yellow

FR-R72Y-48_Yellow

HL Road Frame – Black, 62

FR-R92B-62

Black

FR-R92B-62_Black

HL Road Frame – Black, 44

FR-R92B-44

Black

FR-R92B-44_Black

HL Road Frame – Black, 48

FR-R92B-48

Black

FR-R92B-48_Black

HL Road Frame – Black, 52

FR-R92B-52

Black

FR-R92B-52_Black

Men’s Sports Shorts, S

SH-M897-S

Black

SH-M897-S_Black

Touring-Panniers, Large

PA-T100

Grey

PA-T100_Grey

Cable Lock

LO-C100

NULL

LO-C100_neutral

Minipump

PU-0452

NULL

PU-0452_neutral

Mountain Pump

PU-M044

NULL

PU-M044_neutral

At least this way we have something to work with other than NULL.

Failure #11: Joining tables on nullable columns

Another area where we might get results other than what we expect is when we join data based on nullable columns. Because NULL value are considered non-values and because the database engine cannot match non-values, NULL values will show up in the join column only for outer joins. To demonstrate, let’s start by creating and populating a couple tables:

Notice that the CategoryAltID column in both tables contains NULL values. Now let’s create an inner join between those tables, based on that column:

Although the CategoryAltID column contains NULL values in both tables, they are not considered equal (for all the reasons we’ve discussed), which means they’re not included in the result set:

ProductID

ProductName

pCatAltID

cCatAltID

CategoryNumber

945

Front Derailleur

10

10

FR-R72Y-44

946

LL Touring Handlebars

20

20

FR-R72Y-48

948

Front Brakes

20

20

FR-R72Y-48

949

LL Crankset

10

10

FR-R72Y-44

Now let’s revise the statement to include a left outer join:

This time our results include all rows from the Product table, including those whose CategoryAltID column is NULL in both tables:

ProductID

ProductName

pCatAltID

cCatAltID

CategoryNumber

945

Front Derailleur

10

10

FR-R72Y-44

946

LL Touring Handlebars

20

20

FR-R72Y-48

947

HL Touring Handlebars

NULL

NULL

NULL

948

Front Brakes

20

20

FR-R72Y-48

949

LL Crankset

10

10

FR-R72Y-44

950

ML Crankset

NULL

NULL

NULL

The query engine includes NULL rows not because they’re considered matches, but rather because that’s the nature of an outer join and the way it returns all rows from the “outer” table. We can also try our statement with a full outer join:

Now our results include everything, regardless of the NULL values:

ProductID

ProductName

pCatAltID

cCatAltID

CategoryNumber

945

Front Derailleur

10

10

FR-R72Y-44

946

LL Touring Handlebars

20

20

FR-R72Y-48

947

HL Touring Handlebars

NULL

NULL

NULL

948

Front Brakes

20

20

FR-R72Y-48

949

LL Crankset

10

10

FR-R72Y-44

950

ML Crankset

NULL

NULL

NULL

NULL

NULL

NULL

NULL

FR-R92B-62

NULL

NULL

NULL

NULL

FR-R92B-52

If you consider the logic behind the query engine’s approach to NULL values, it’s not surprising we get the results we do. Like any place where NULL shows up, it comes down to understanding the data we could potentially be dealing with and what that means to the bigger picture. Whenever NULL enters into the equation, we must take it into consideration so we don’t end up with query results very different from what we expect. When we join data, a NULL in one column can cause us to miss important data in other columns, all because we did not think through the possible outcomes.

Failure #12: Creating a unique index without taking NULL into account

Up to this point, I’ve tried to emphasize how SQL Server uses NULL only to indicate that a value is missing. We cannot compare or add or multiply NULL values because we cannot compare or add or multiply missing values. When it comes to unique indexes, however, SQL Server seems to have a double standard, treating NULL values as equal values and consequently permitting only one NULL value in a unique index.

To demonstrate, let’s create and populate a simple table:

Now let’s try to create a unique index on the ProductNumber column:

Because the column contains multiple NULL values, the database engine returns the following error:

This might seem contradictory behavior on the part of SQL Server, given the non-value status it imposes on NULL everywhere else. After all, how can missing values be considered duplicates if there is no way to compare them?

Because of this limitation, we have to be particularly careful when we create our tables and indexes prior to populating them. We could be faced with similar error messages if we try to insert a row with a duplicate NULL value.

Fortunately, since SQL Server 2008, we’ve been able to create filtered indexes that let us sidestep the duplicate NULL issue:

The index is now created without a hitch. We can even insert additional duplicate NULL values:

Creating a filtered index is easy enough to do and is a handy way to sidestep the limitations of a unique index. Just be sure to refer to the SQL Server help topic “Create Filtered Indexes” to make certain you’re familiar with the limitations and restrictions on these types of indexes.

Failure #13: Sorting data on a nullable column

A result set that is not in the order you want might not seem such a big deal, but it can still be annoying when you sort your results and end up with a bunch of NULL values at the beginning. For example, the following SELECT statement sorts the data by the Color column:

The statement gives us the following results:

ProductID

ProductName

Color

843

Cable Lock

NULL

844

Minipump

NULL

845

Mountain Pump

NULL

837

HL Road Frame – Black, 62

Black

838

HL Road Frame – Black, 44

Black

839

HL Road Frame – Black, 48

Black

840

HL Road Frame – Black, 52

Black

841

Men’s Sports Shorts, S

Black

842

Touring-Panniers, Large

Grey

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

We can instead sort the data in descending order, but that would put the colors themselves in reverse order. Another work-around is to slip a CASE expression into our WHERE clause to trick the database engine:

The extra expression assigns 1 to the NULL rows and 0 to the rest so they’re sorted first. From there, the normal sorting occurs, as shown in the following results:

ProductID

ProductName

Color

837

HL Road Frame – Black, 62

Black

838

HL Road Frame – Black, 44

Black

839

HL Road Frame – Black, 48

Black

840

HL Road Frame – Black, 52

Black

841

Men’s Sports Shorts, S

Black

842

Touring-Panniers, Large

Grey

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

843

Cable Lock

NULL

844

Minipump

NULL

845

Mountain Pump

NULL

This might not be an earth-shattering solution, but it’s an easy enough way to move those NULL values to the bottom, if you happen to prefer them there.

Tags: , , ,

  • 49899 views

  • Rate
    [Total: 3    Average: 5/5]
  • Anonymous

    great!
    great!

  • Lsantos

    Great Article about NULL
    Great article with many useful examples and a very good sequence to explain step-by-step all of them.
    Thanks Robert

  • Anonymous

    Three valued logic.
    — A must read article.

    — Addition:
    — ‘Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.’

    DECLARE @Unknown varchar(30) — Used as the NULL
    DECLARE @A varchar(30) = 1,@B varchar(30) = 2

    SELECT ‘CONTRADICTION’ as result FROM (SELECT 1 X ) D WHERE (@A < @B) OR (@Unknown < @B)

    –Other systems can implement NULLs differently. (Equal to an empty string for example, or the lowest ‘date’ in a date format).
    –The interpretation of the NULL is often not clear. This can be model dependend or organisation/user dependend. This interpretation does not have to be wrong, but is often confusing.

    — Ben

  • Anonymous

    Three valued logic.
    — A must read article.

    — Addition:
    — ‘Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.’

    DECLARE @Unknown varchar(30) — Used as the NULL
    DECLARE @A varchar(30) = 1,@B varchar(30) = 2

    SELECT ‘CONTRADICTION’ as result FROM (SELECT 1 X ) D WHERE (@A < @B) OR (@Unknown < @B)

    –Other systems can implement NULLs differently. (Equal to an empty string for example, or the lowest ‘date’ in a date format).
    –The interpretation of the NULL is often not clear. This can be model dependend or organisation/user dependend. This interpretation does not have to be wrong, but is often confusing.

    — Ben

    • Charles Kangai

      Re: Three valued logic.
      The behaviour is easy to understand when one realises that NULL means UNKNOWN or MISSING. It is not equal to any value, not unequal to any value, not less than or greater than any value, and NULL is not equal to NULL! It is not confusing at all.

      If you try to make any such comparisons with NULL, e.g. @UNKNOWN

  • Anonymous

    re: Three valued logic
    Ben,

    In your example, @A is lexically less than @B, so the predicate returns TRUE. If you change the logical operator to AND, it returns FALSE, as expected.

    Where is the contradiction are you referring to?

    -Mike

  • Anonymous

    Contradicting the ‘line’, not contradicting Three valued logic.
    I do understand Three valued logic.

    ‘Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.’

    The result of the boolean expression in the example is based on a missing value but this boolean expression does not produce a Unknown. So it contradicts the quoted line.

    Yes it is still correct three valued logic, but it is slightly more subtle than dat a missing value in a boolean expression will always result in a UNKNOWN.
    For many people this is one of the confusing aspects when using expressions with missing values.

    Ben

  • Anonymous

    Three valued logic. A Unknown is not equal to a FALSE.
    In your example, @A is lexically less than @B, so the predicate returns TRUE. If you change the logical operator to AND, it returns FALSE, as expected.

    No it does not produce FALSE, because a NOT FALSE would produce a TRUE. The line would then produce a UNKNOWN (or NULL). A NOT NULL will still produce a NULL.
    It is rather important that a UNKNOWN is not equal to a False.

    Ben

  • Anonymous

    Three valued logic. A Unknown is not equal to a FALSE.
    In your example, @A is lexically less than @B, so the predicate returns TRUE. If you change the logical operator to AND, it returns FALSE, as expected.

    No it does not produce FALSE, because a NOT FALSE would produce a TRUE. The line would then produce a UNKNOWN (or NULL). A NOT NULL will still produce a NULL.
    It is rather important that a UNKNOWN is not equal to a False.

    Ben

  • Anonymous

    What to use instead of a NULL?
    The article was well written. Unfortunately, it left out the most important question: what to use instead of NULL? If data is unavailable or non-existent, and I need to indicate that fact, what’s the best solution?

    At work we took the approach of assigning a default value "N/A" for text-only columns, and "-9999" for numeric columns. Needless to say, problems arise when doing calculations on numeric columns, because "-9999" can distort the results. If we use NULL, then we are really not sure whether someone forgot to enter a value, or the value is non existent.

    Does anyone else have solution for handing NULLs in numeric columns?

    Note: we work with biological and ecological data.

    Ruperstland

    • Charles Kangai

      Re: What to use instead of a NULL?
      Don’t do that! See my post below. If the value is unknown then leave it as NULL. Do not enter some other value. If you enter a value like -9999 it means you know the value and it is -9999! NULL means the value is unknown or missing. Just know how NULL works, and that is what the blog author is trying to share.

  • Will Everitt

    NULLIF
    Thanks – had totally forgotten about NULLIF. Makes a much more elegant solution to divide by zero issues than the one I’ve implemented…

  • eremmel

    The NULL debate
    Check the option at https://www.simple-talk.com/opinion/opinion-pieces/chris-date-and-the-relational-model/ (an interview with Chris Date).

    Note his remarks about NULLs. In the end it pays off.

  • Charles Kangai

    NOT IN
    Excellent article.

    The only other gotcha I would add about NULLs is this:
    SELECT col1, col2 FROM Table WHERE col3 NOT IN (SELECT col4 FROM Table2);

    This is a big gotcha. If any values in col4 are NULL, the above query will not return any results. The query should be modified to:

    SELECT col1, col2 FROM Table WHERE col3 NOT IN (SELECT col4 FROM Table2 WHERE col4 IS NOT NULL);

  • Charles Kangai

    Replacing NULLs?
    The poster above who is asking what to use instead of NULL is missing the point. If you enter, e,g. -9999 instead of NULL the semantic has changed. NULL means the value is UNKOWN. If you enter -9999 it means you **know** the value and it is -9999! That is **very** different. The author of the blog is quite right. If you don’t know the value then it is NULL. Don’t replace it with something else. That is very bad. You want your data to reflect the truth. I often end up cleaning data using the NULLIF function where people have entered -9999 or ‘N/A’ instead of NULL.

  • Anonymous

    nice article
    I always use defaults where it makes sense to eliminate nulls where I can. This simplifies the logic.
    But can’t be used in every real world example.

  • Anonymous

    Re: What to use instead of a NULL?
    In case it’s helpful…
    We found the best thing, though clumsy, is to have a companion column that indicates the entry status of the value – so the value column would contain a NULL and the companion would contain a value indicating if it was ‘not recorded’, ‘refused’, ‘not applicable’, etc. (with a value indicating ‘recorded’ when the main column does contain a value).
    It a little messy, in that it means including checks on that column to make sense of the NULL where we need to, but we can differentiate between those different cases without contaminating the main data column.

  • Ron Moses

    a haiku…
    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • MedcotTony

    Meaning of NULL
    I think that rather than stating that NULL means ‘nothing’, it is better to think of it as meaning something like ‘I don’t know what the value is’. Then some of your other explanations make more sense. After all, in English adding nothing to 2 clearly results in 2, but no-one would expect adding ‘I don’t know what the value is’ to 2 to yield a sensible answer, and comparing any value to ‘I don’t know what the value is’ would be expected to result in ‘I don’t know what the result is’!

  • Anonymous

    A baseball reference
    Back in Jr High, my algebra teacher used this example to explain the concept of null to us.

    The great baseball announcer Ernie Harwell would often say something like, "And at the end of 3 innings, there’s no score at Tiger Stadium." My teacher countered with, "That’s wrong. There is a score, it’s 0 to 0. No runs have been scored, but there is a score. The score is not null."

  • MedcotTony

    Meaning of NULL
    I think that rather than stating that NULL means ‘nothing’, it is better to think of it as meaning something like ‘I don’t know what the value is’. Then some of your other explanations make more sense. After all, in English adding nothing to 2 clearly results in 2, but no-one would expect adding ‘I don’t know what the value is’ to 2 to yield a sensible answer, and comparing any value to ‘I don’t know what the value is’ would be expected to result in ‘I don’t know what the result is’!

  • Anonymous

    Meaning of NULL
    Since the NULL issue still persists even today, decades after the RDBMS was invented (not by MS) then it may be time to re-think the usage/value of NULL altogether. Users generally have no use for it, developers find it annoying and it is nothing more than another value that has no more importance in a system than any other value but unlike any other value can and does cause alot of headaches. It’s just a nuisance, time to have an off switch.

    Statisticians argue that NULL means a value was not entered, of course you can also have a value that means a value wasn’t entered too. So to them zero and NULL mean something different or a boolean having 3 states, true, false and NULL actually means something. However, utlimately I can force a column to NULL so it defeats the purpose of stating a value was never entered and makes it a false presumption. Is not the act of inserting a NULL value into a column thereby change the value to NULL, so how can NULL not be something? Very obtuse reasoning on NULL supporters part.

    To me it’s forcing a definition on something that is forced upon you and if it didn’t exist a way to define a column value as non-existent would be invented but invented in very, very unique cases only. In the other 99.9% of applications NULL is nothing more than a nuisance value and simply has no meaning other than to make unneccessary complications.

    Maybe the real question is not how to deal with NULL but why should we have to deal with it at all unless we really want to.

  • Anonymous

    Meaning of NULL
    Since the NULL issue still persists even today, decades after the RDBMS was invented (not by MS) then it may be time to re-think the usage/value of NULL altogether. Users generally have no use for it, developers find it annoying and it is nothing more than another value that has no more importance in a system than any other value but unlike any other value can and does cause alot of headaches. It’s just a nuisance, time to have an off switch.

    Statisticians argue that NULL means a value was not entered, of course you can also have a value that means a value wasn’t entered too. So to them zero and NULL mean something different or a boolean having 3 states, true, false and NULL actually means something. However, utlimately I can force a column to NULL so it defeats the purpose of stating a value was never entered and makes it a false presumption. Is not the act of inserting a NULL value into a column thereby change the value to NULL, so how can NULL not be something? Very obtuse reasoning on NULL supporters part.

    To me it’s forcing a definition on something that is forced upon you and if it didn’t exist a way to define a column value as non-existent would be invented but invented in very, very unique cases only. In the other 99.9% of applications NULL is nothing more than a nuisance value and simply has no meaning other than to make unneccessary complications.

    Maybe the real question is not how to deal with NULL but why should we have to deal with it at all unless we really want to.

  • Anonymous

    Meaning of NULL
    Since the NULL issue still persists even today, decades after the RDBMS was invented (not by MS) then it may be time to re-think the usage/value of NULL altogether. Users generally have no use for it, developers find it annoying and it is nothing more than another value that has no more importance in a system than any other value but unlike any other value can and does cause alot of headaches. It’s just a nuisance, time to have an off switch.

    Statisticians argue that NULL means a value was not entered, of course you can also have a value that means a value wasn’t entered too. So to them zero and NULL mean something different or a boolean having 3 states, true, false and NULL actually means something. However, utlimately I can force a column to NULL so it defeats the purpose of stating a value was never entered and makes it a false presumption. Is not the act of inserting a NULL value into a column thereby change the value to NULL, so how can NULL not be something? Very obtuse reasoning on NULL supporters part.

    To me it’s forcing a definition on something that is forced upon you and if it didn’t exist a way to define a column value as non-existent would be invented but invented in very, very unique cases only. In the other 99.9% of applications NULL is nothing more than a nuisance value and simply has no meaning other than to make unneccessary complications.

    Maybe the real question is not how to deal with NULL but why should we have to deal with it at all unless we really want to.

  • Anonymous

    Meaning of NULL
    Since the NULL issue still persists even today, decades after the RDBMS was invented (not by MS) then it may be time to re-think the usage/value of NULL altogether. Users generally have no use for it, developers find it annoying and it is nothing more than another value that has no more importance in a system than any other value but unlike any other value can and does cause alot of headaches. It’s just a nuisance, time to have an off switch.

    Statisticians argue that NULL means a value was not entered, of course you can also have a value that means a value wasn’t entered too. So to them zero and NULL mean something different or a boolean having 3 states, true, false and NULL actually means something. However, utlimately I can force a column to NULL so it defeats the purpose of stating a value was never entered and makes it a false presumption. Is not the act of inserting a NULL value into a column thereby change the value to NULL, so how can NULL not be something? Very obtuse reasoning on NULL supporters part.

    To me it’s forcing a definition on something that is forced upon you and if it didn’t exist a way to define a column value as non-existent would be invented but invented in very, very unique cases only. In the other 99.9% of applications NULL is nothing more than a nuisance value and simply has no meaning other than to make unneccessary complications.

    Maybe the real question is not how to deal with NULL but why should we have to deal with it at all unless we really want to.

  • Anonymous

    Meaning of NULL
    Since the NULL issue still persists even today, decades after the RDBMS was invented (not by MS) then it may be time to re-think the usage/value of NULL altogether. Users generally have no use for it, developers find it annoying and it is nothing more than another value that has no more importance in a system than any other value but unlike any other value can and does cause alot of headaches. It’s just a nuisance, time to have an off switch.

    Statisticians argue that NULL means a value was not entered, of course you can also have a value that means a value wasn’t entered too. So to them zero and NULL mean something different or a boolean having 3 states, true, false and NULL actually means something. However, utlimately I can force a column to NULL so it defeats the purpose of stating a value was never entered and makes it a false presumption. Is not the act of inserting a NULL value into a column thereby change the value to NULL, so how can NULL not be something? Very obtuse reasoning on NULL supporters part.

    To me it’s forcing a definition on something that is forced upon you and if it didn’t exist a way to define a column value as non-existent would be invented but invented in very, very unique cases only. In the other 99.9% of applications NULL is nothing more than a nuisance value and simply has no meaning other than to make unneccessary complications.

    Maybe the real question is not how to deal with NULL but why should we have to deal with it at all unless we really want to.

  • Toronto

    Put missing or unknown pieces of data on the waiting list…
    We may or may not agree with use of NULL markers, but what will always happen is missing or unknown data. That problem can be solved at least in two ways:
    – If you know, or suspect, than certain pieces of data will not be available at the moment of INSERTion of a new row, then that piece of data should not be in the table. Put it somewhere else, where it can safely be missing altogether until you get that piece of data. For example, you just hired an employee, and you want her car’s licence plate in order to issue parking label, so she can park in the company’s garage. Make it part of a separate table and enter it when it is available. If the new employee has no car, nothing happens – no entry is made into table EmpParkingLabels.
    – If something is really optional, not mandatory piece of information, like Comment, leave it in the table, even as NULL or with some default, ‘N/A’ or something, but please no zero length strings ( they can be more troublesome than NULLs )

    I did not make this up on my own. You can read it in the book "Applied Mathematics for Database Professionals", by Lex de Haan and Toon Koppelears, which I learned about on – SQL Central

    🙂

  • Toronto

    .. or you can place the NULLs under tight control
    By adding CHECK constraint, sometimes we can control NULLs. Imagine table Employee,
    CREATE TABLE Employees (EmpID int PK
    , Job varchar(15) NOT NULL
    , Certification varchar(50) NULL)

    Conditions:
    – Job can be any of the following: {‘dba’,’programmer’,’marketing’,’accountant’..)
    – We have a policy that database administrators (Job = ‘dba’) must be certified; nobody messes with our databases unless certified. This condition can be expressed by the following CHECK constraint:

    ALTER TABLE Employees
    ADD CCONSTRAINT [DB must be certified]
    CHECK ( NOT (Job = ‘dba’ AND Certification IS NULL ))

    Constraint [DB must be certified] demands that for Job = ‘dba’ there must be an entry in column Certification.

    If we want to prevent entries in column Certification for employees that are not ‘dba’, we can add one more constraint:

    ALTER TABLE Employees
    ADD CCONSTRAINT [Column Certification is empty for non-dba employees]
    CHECK ( NOT (Job <> ‘dba’ AND Certification IS NOT NULL ))

    This way, the NULLs have some meaning – ‘This employee is not DBA’. That is different from UNKNOWN or MISSING or NOY YET KNOWN.

    In the same book I mentioned in previous post, Applied Mathematics for Database Professionals, there is an example like this, with controlled NULLs. Authors rightly call that solution ‘a hack’, it should have been in a separate table. However, in MS SQL at least, it is not easy to make it mandatory to insert new record in a table when we hire or promote somebody to role of ‘dba’

    Other than that, avoid NULLs, at least in tables. OUTER JOINS will still introduce NULLs, but those we expect and understand – they too have meaning – ‘there is no matching row on the other side of the JOIN’.

  • goussarova

    why we need nulls
    Greate article, but it fails to explain why, in fact, do we need nulls if they are such a pain to deal with.

    – with referential integrity on, you can still add a record with null as a secondary key. This comes very handy at times. Without nulls, you are forced to create a fake primary key, which is not always easy.

    – in numeric columns where data can be negative, zero, or positive, it is impossible to create a neutral default for missing data. Without a neutral default, no one will ever know if the temperature was actuall 0, or you forgot to enter it, and the aggregates will be all screwed.
    Say, if you measured air temperature in Celsius in tropics, and half of your records are missing temperature value, and you use 0 as default, your agverage temperature will be about 15 degrees, which is not warm at all. That is why the aggregates are implemented they way they are, and that is why we have to have nulls.

  • cbowman

    COALESCE is slower than ISNULL
    Great article. I sent it around to all our programmers as required reading. We loved the idea of using COALESCE until we tested it and found it to be significantly slower than isnull. In our test a query that used isnull run in < 1 second but takes 5 seconds to run using COALESCE. Has anyone else seen similar results?