Monday, July 14, 2014

NULL in T-SQL

Friday before I left I had the need to find all entries of configurations that we are using in one of our tables. I wanted to list all of them in nicely formatted SQL result so that I could later use that query to build a tool to make our configurations easier and faster to put together. I was joining two tables, one we'll call Cars, the other we'll call Attributes. Here's a sample query to clarify:

SELECT c.AttributeId, c.Name, a.[Description]
FROM Cars c 
JOIN Attributes a ON a.AttributeId = c.AttributeId
WHERE c.AttributeId IS NOT NULL 

The results looked something like this:

AttributeId   |  Name   | Description
-----------------------------
1             | Ford    | Cool Car
1             | Chevy   | Cool Car
2             | Honda   | Yellow Car
2             | Mazda   | Yellow Car

See how the id is repeated twice? This was ok because I knew we had more Attributes than Cars. The tool was intended to expose the attributes, the Cars were only there to clarify how the Attributes were being used. No word yet on how I'll display it in the UI.

I knew that my Attributes had 36 results, but when I joined them with the cars, I counted 12. This means we had a lot of orphaned Attributes. Interestingly, I had a hard time finding all of the attributes that weren't being used. I thought I could simply do this:

SELECT *
FROM Attributes 
WHERE AttributeId NOT IN (
 SELECT AttributeId
 FROM Cars
)

The problem with this approach was that some of the Cars have NO Attributes. Meaning they have NULL entries. When you try to use NULL's in conjunction with the IN operator in T-SQL, it's roughly equivalent to doing this:

SELECT 'Not being used.'
FROM 1 (NotEqual) 2 AND
     1 (NotEqual) NULL

What I found is that when you're working with NULL's in SQL you have to be very careful of three value logic. NULL's don't have a value so they don't evaluate to anything. When this happens SQL will just omit the results that don't equal either true or false. Since an IN operator is roughly equivalent to a AND operator, this will evaluate to unknown and won't be included in the results. See this awesome table to make a little more sense out of this, the portion is green is our specific scenario:

p q p OR q p AND q p = q
true true true true true
true false true false false
true unknown true unknown unknown
false true true false false
false false false false true
false unknown unknown false unknown
unknown true true unknown unknown
unknown false unknown false unknown
known unknown unknown unknown unknown