Click here to monitor SSC
Av rating:
Total votes: 11
Total comments: 4


Kathi Kellenberger
SSN Matching Speed Phreakery
05 August 2010

On Ask.SQLServerCentral.com, a group of people interested in experimenting with heavily optimised SQL techniques try them out on a problem, using reasonbly large amounts of data. They aren't so interested in explaining the techniques, so Kathi continues on her mission to explain the lessons learned and the tips that can be derived.

This is the third in a series of articles (see Part 1 and Part 2) aiming to dissect and analyze the cleverest and fastest SQL solutions to the challenges posed by the Speed Phreak competitions, expose and explain the core set-based techniques on which they rely, and so make these techniques more widely-accessible to developers who need a faster and more scalable solution than the cursor can offer.

This time, Phil Factor has engaged some of the best minds in the T-SQL world to tackle the SSN matching SQL Problem, where entrants had to provide an SQL solution to establish the validity or otherwise of user-entered, 9-digit Social Security Numbers (SSNs), compared against a table of validated numbers, and according to a set of established matching rules.

The existing cursor-based solution, referred to affectionately as "The Fu Bar Code" worked, but took over 70 pain-filled minutes to execute. Astonishingly, the vast majority of entrants produced alternative set-based solutions that executed in well under one second, proving just how much time and processing power can be saved by a bit of brain power coupled with some solid set-based techniques. The winning SQL solution was submitted by Matt Whitfield which, on the sample data set, ran in under 0.5 s.

SSSIS and CLR-based solutions
There was very little in it, but on this occasion the fastest entry of all was based on a very clever CLR algorithm, submitted by Johan Åhlén (320 ms) and second placed entry exploited the multi-threading power of SSIS. These solutions will be covered in separate articles.

Although I didn't enter the competition, I decided to join in the fun. After investigating and explaining the supplied cursor solution, I will present the "Aunt Kathi" SQL solution, devised before I examined any of the competition entries, which attempts to walk the fine line between outright speed and ease-of-understanding (and therefore ease-of-maintenance), and performs comparably with many of the set-based solutions submitted to the competition.

Finally, I'll dissect the winning SQL solution from Matt, and point out where and how he managed to squeeze out a bit of extra speed.

The SSN Matching Rules

The sample data for the problem consists of two tables: one holding a set of user entered US social security numbers (UserEntered), and one holding a set of validated social security numbers (Validated). The goal is to produce a list of the user SSNs that exactly match the validated entries, along with those that are closely matched, but differ by one or two digits (due to user error during entry).

If a user-entered SSN has two non-matching digits, compared to a validated SSN, then it is only considered a close match if the non-matching digits are not adjacent. In other words, the non-matching digits must both be in odd positions or must both be in even positions. If a user-entered SSN matches a validated SSN exactly, then it cannot be used with any other validated SSNs. In other words, if there is an exact match then other validated SSNS that are a close match for that user-entered SSN should be excluded from the results. Conversely, a non-matching user SSN may closely match more than one validated SSNs.

The final results must contain the user SSN, the matching or closely matching validated SSN and a status. The status will be 0 for an exact match, 1 for one incorrect digit, or 2 for two incorrect digits. Any user SSNs that are not matches, or close matches, will not be part of the results. The final results must be sorted by validated SSN, user-submitted SSN, and status.

To illustrate how these rules work in practice, let’s look at some sample data and results. Table 1 contains a few validated SSNs from the Validated table and table 2 contains some UserEntered data; both display the odd and even position digits to aid in comparisons. Table 3 gives the results and an explanation for each entry.

Validated SSN

Odd Digits

Even Digits

007309486

07046

0398

007101415

07045

0111

007201415

07045

0211

Table 1: Validated SSN

User Entered SSN

Odd Digits

Even Digits

007309486

07046

0398

007309488

07048

0398

017201415

07045

1211

Table 2: User Entered SSN

Table 3 gives the results of a validation check between the entries in each table, and an explanation for each result.

User Entered SSN

Validated SSN

Status

Explanation

007309486

007309486

0

Exact match.

007309486

007101415

007201417

N/A

Don’t try to match additional entries if an exact match is found.

007309488

007309486

1

The even digits match, and there is

one incorrect digit in the odd positions.

007309488

007101415

007201417

N/A

Neither the even positions or odd

are an exact match.

017201415

007309486

N/A

Neither the even nor odd positions are an

exact match

017201415

007101415

2

The odd positions match, and there are

two incorrect digits in the even positions.

017201415

007201417

1

The odd positions match, and there is one

Incorrect digit in the even positions.

Table 3: The Results

Set up

To set up the challenge, follow these steps:

  • Create a database on a non-production SQL Server 2005 or later instance.
  • Download and unzip the Challenge5.zip file from here http://challenges.developerworkshop.net/challenge5.zip
  • Run the “Setup.sql” script in your new database to create two tables.
  • Run the “Populate.sql” script in your new database to populate the tables.

The Cursor-Based Solution

The cursor-based “Fu Bar Code”, provided as part of the challenge, gives the correct results but takes a very long time, about 70 minutes, to return those results, using the sample data. To save space, I am not going to provide the complete solution here, but you can find it in its entirety in the “Fu Bar.sql” script included with the Challenge5.zip file.

The opening section of the Fu Bar code (Listing 1) creates a temp table (#CheckedSSN) to hold the results and an accompanying non-clustered index. It then declares and opens the first cursor, to hold and examine each user-entered SSN. The current user-entered SSN value is retrieved using the FETCH statement and is stored in a variable (@SSN).

SET NOCOUNT ON

CREATE TABLE       #CheckedSSN

            (

                   vSSN CHAR(9) NOT NULL,

                   ueSSN CHAR(9) NOT NULL,

                   [Status] TINYINT NOT NULL

            )

CREATE

NONCLUSTERED

INDEX              IX_CheckedSSN

ON          #CheckedSSN

            (

                   ueSSN ASC

            )

INCLUDE            (

                   [Status]

            )

DECLARE     curUserEntered CURSOR FOR

            SELECT SSN

            FROM   dbo.UserEntered

DECLARE     @SSN CHAR(9)

OPEN curUserEntered

FETCH       NEXT

FROM curUserEntered

INTO @SSN

WHILE @@FETCH_STATUS = 0

     BEGIN

Listing 1: A temp table and a cursor

Once inside the WHILE loop, the code inserts one or more rows into the #CheckedSSN table for each SSN value. The SELECT clause of the INSERT statement is based on a derived table – that is a sub-query within the FROM clause that takes the place of a table or view name. The sub-query compares each position (1 through 9) from the variable to the position of the values in the Validated table. The code uses nine CASE expressions, assigning a 0 for a match or a 1 for a non-match. Inside each CASE expression, the SUBSTRING function is used to parse out one position at a time. The benefit of using the derived table, in this case, is that the code can refer to the results by the aliases pos1, pos2, and so on, in other parts of the statement, instead of repeating the CASE expressions.

If we add together the values held in each odd position, and each even position, then an exact match will be indicated by the result "Odd 0 and Even 0". Recall that, for a close match, at least the odd positions or even positions must match and that a maximum of two non-matched positions are allowed, which leads to four other possible outcomes: "Odd 2 and Even 0"; "Odd 0 and Even 2"; "Odd 1 and Even 0"; "Odd 0 and Even 1". Any other outcome indicates a non-match and the results should not be included. The rather-complicated WHERE clause makes sure that only the results for matches and close matches are inserted. The abbreviated code can be found in Listing 2.

            INSERT #CheckedSSN

                   (

                         vSSN,

                         ueSSN,

                         [Status]

                   )

            SELECT SSN AS vSSN,

                   @SSN AS ueSSN,

                   pos1 + pos2 + pos3 + pos4 + pos5 + pos6 + pos7 + pos8 + pos9
                                                                      AS [Status]

            FROM   (

                         SELECT SSN,

                                CASE SUBSTRING(SSN, 1, 1)

                                       WHEN SUBSTRING(@SSN, 1, 1) THEN 0

                                       ELSE 1

                                END AS pos1,

                                CASE SUBSTRING(SSN, 2, 1)

                                       WHEN SUBSTRING(@SSN, 2, 1) THEN 0

                                       ELSE 1

                                END AS pos2,

                         FROM   dbo.Validated

                   ) AS d

            WHERE  (

                         pos1 + pos3 + pos5 + pos7 + pos9 = 2

                         AND pos2 + pos4 + pos6 + pos8 = 0

                   )

                   OR

                   (

                         pos1 + pos3 + pos5 + pos7 + pos9 = 0

                         AND pos2 + pos4 + pos6 + pos8 = 2

                   )

                   OR

                   (

                         pos1 + pos3 + pos5 + pos7 + pos9 = 1

                         AND pos2 + pos4 + pos6 + pos8 = 0

                   )

                   OR

                   (

                         pos1 + pos3 + pos5 + pos7 + pos9 = 0

                         AND pos2 + pos4 + pos6 + pos8 = 1

                   )

                   OR

                   (

                         pos1 + pos3 + pos5 + pos7 + pos9 = 0

                         AND pos2 + pos4 + pos6 + pos8 = 0

                   )

Listing 2: The abbreviated INSERT statement

At the bottom of the loop, the next value from the cursor is stored in @SSN as would be expected. After the loop, the code closes the curUserEntered cursor…and then declares another cursor (curMatch)!

At this point, the temp table should have all the required rows. The only problem is that any exactly matching user-entered values may also have some closely matching rows which should not be included. The second cursor eliminates those unneeded rows by deleting any rows with a status greater than 0 if the user-entered value also has a matching row with a status of 0. Unfortunately, the deletions are accomplished one SSN at a time. Sadly, even though it is very easy to delete all the invalid rows with a single DELETE statement, it is fairly common to see code in production applications and scripts that avoid such simplicity in favor of the brute-force cursor.

Finally, the results are displayed. Listing 3 shows the closing section of the script.

DECLARE     curMatch CURSOR FOR

            SELECT ueSSN

            FROM   CheckedSSN

            WHERE  [Status] = 0

 

OPEN curMatch

 

FETCH       NEXT

FROM curMatch

INTO @SSN

 

WHILE @@FETCH_STATUS = 0

     BEGIN

            DELETE

            FROM   CheckedSSN

            WHERE  ueSSN = @SSN

                   AND [Status] > 0

 

            FETCH  NEXT

            FROM   curMatch

            INTO   @SSN

     END

 

CLOSE              curMatch

DEALLOCATE  curMatch

 

SELECT             vSSN,

            ueSSN,

            [Status]

FROM        CheckedSSN

ORDER BY    vSSN,

            ueSSN,

            [Status]

Listing 3: The second cursor and the SELECT

The “Aunt Kathi” Solution

Before delving into the winning solution I decided, this time, to devise my own solution first and see how well it performed. Listing 4 shows the “Aunt Kathi” solution, with not a cursor in sight.

--Create a temp table to hold the results

CREATE TABLE #Results(ueSSN CHAR(9), vSSN CHAR(9), Status int

)

 

--Insert the exact matches

INSERT INTO #Results(ueSSN, vSSN, Status)

SELECT ue.SSN, v.SSN, 0

FROM UserEntered AS ue

     JOIN Validated AS v ON ue.SSN = v.SSN

 

--Create a temp table to hold remainder of SSN

--with columns for each position

CREATE TABLE #ueSSN(SSN CHAR(9), Pos1 CHAR(1), Pos2 CHAR(1),

     Pos3 CHAR(1), Pos4 CHAR(1), Pos5 CHAR(1), Pos6 CHAR(1),

     Pos7 CHAR(1), Pos8 CHAR(1), Pos9 CHAR(1))

 

--Populate the #ueSSN table. Parse out the SSN.

--Leave out the exact matches!

INSERT INTO #ueSSN(SSN, Pos1, Pos2, Pos3, Pos4, Pos5, Pos6,

     Pos7, Pos8, Pos9)

SELECT SSN, SUBSTRING(SSN, 1, 1), SUBSTRING(SSN, 2, 1),

     SUBSTRING(SSN, 3, 1), SUBSTRING(SSN, 4, 1),

     SUBSTRING(SSN, 5, 1), SUBSTRING(SSN, 6, 1),

     SUBSTRING(SSN, 7, 1), SUBSTRING(SSN, 8, 1),

     SUBSTRING(SSN, 9, 1)

FROM UserEntered

WHERE SSN NOT IN (SELECT ueSSN FROM #Results)

 

 

--Create a temp table for the validated SSNs along

--with a column for each position

CREATE TABLE #vSSN(SSN CHAR(9), Pos1 CHAR(1), Pos2 CHAR(1),

     Pos3 CHAR(1), Pos4 CHAR(1), Pos5 CHAR(1), Pos6 CHAR(1),

     Pos7 CHAR(1), Pos8 CHAR(1), Pos9 CHAR(1))

    

--Insert all validated SSN into #vSSN

--Parse out the SSN

INSERT INTO #vSSN(SSN, Pos1, Pos2, Pos3, Pos4, Pos5, Pos6, 

     Pos7, Pos8, Pos9)

SELECT SSN, SUBSTRING(SSN, 1, 1), SUBSTRING(SSN, 2, 1),

     SUBSTRING(SSN, 3, 1), SUBSTRING(SSN, 4, 1),

     SUBSTRING(SSN, 5, 1), SUBSTRING(SSN, 6, 1),

     SUBSTRING(SSN, 7, 1), SUBSTRING(SSN, 8, 1),

     SUBSTRING(SSN, 9, 1)

FROM Validated

 

--For those that match on the EVEN positions,

--compare the odd positions.

--Insert the rows where the odd positions add up to < 3

;WITH Evens AS (

SELECT ue.SSN, v.SSN AS Validated, 

     CASE WHEN ue.Pos1 = v.Pos1 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos3 = v.Pos3 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos5 = v.Pos5 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos7 = v.Pos7 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos9 = v.Pos9 THEN 0 ELSE 1 END AS Status

FROM #ueSSN AS ue join #vSSN AS V

ON ue.Pos2 = v.Pos2 and ue.Pos4 = v.Pos4

     and ue.Pos6 = v.Pos6 and ue.Pos8 = v.Pos8)

INSERT INTO #Results(ueSSN, vSSN, Status)

SELECT SSN, Validated, Status

FROM Evens

WHERE Status < 3;

 

--For those that match on the ODD positions,

--compare the EVEN positions.

--Insert the rows where the EVEN positions add up to < 3

;WITH Odds AS (

SELECT ue.SSN, v.SSN AS Validated,

     CASE WHEN ue.Pos2 = v.Pos2 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos4 = v.Pos4 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos6 = v.Pos6 THEN 0 ELSE 1 END +

     CASE WHEN ue.Pos8 = v.Pos8 THEN 0 ELSE 1 END AS Status

FROM #ueSSN AS ue join #vSSN AS V

ON ue.Pos1 = v.Pos1 and ue.Pos3 = v.Pos3

     AND ue.Pos5 = v.Pos5 AND ue.Pos7 = v.Pos7

     AND ue.Pos9 = v.Pos9)

INSERT INTO #Results(ueSSN, vSSN, Status)

SELECT SSN, Validated, Status 

FROM Odds

WHERE Status < 3

 

--Display results

SELECT ueSSN, vSSN, Status

FROM #Results

ORDER BY ueSSN, vSSN, Status

Listing 4: The “Aunt Kathi” code

Just like the Fu Bar code, I decided to create a temp table to hold the results. Since the exact matches need no further processing, I decided to insert all of them up front. I knew that I was going to be comparing single positions for the rest of the entries, so I thought it would be more efficient to parse the digits out once and save the parsed-out values. I created temp tables for both the remaining user entered SSNs (#ueSSN) and the validated SSNs (#vSSN) with a column for each position. I inserted the user entered values, minus the exact matches, and used the SUBSTRING function to populate all the individual digits. I used the same technique to insert all of the validated entries into #vSSN.

A close match must match exactly on the even positions or the odd positions. When a column from one table must equal a value in a column from another table, you can join those tables on those columns. I could have created ODD and EVEN columns in my temp tables, but I decided to just join on multiple position columns. If the even positions matched, I just needed to figure out how many odd positions didn’t match and vice versa. In the SELECT clause, I used a CASE expression to assign 0 for matching digits or 1 for non-matching digits and added up the values aliasing the sum as Status. I wrapped this query in a common table expression (CTE) to avoid including the CASE expressions in my WHERE clause. I then only had to filter out any rows with Status values greather than two in the main query. The results of the two queries (one for even and one for odd) were inserted into the temp #Results table. Now the only thing left to do was display the results.

While my solution won’t win any awards, it is quite fast while still being easy to understand. I arrived at the solution by just thinking through what needed to be done and breaking the problem down to its components.

Matt Whitfield's Winning SQL Solution

Not surprisingly, the winner Matt broke the problem down in a similar way as the “Aunt Kathi” code, but without the use of temp tables. The solution is actually only one complex statement which utilizes CTEs, derived tables, and UNION queries. The code creates two CTEs, one for the user entered values (UserEnteredTmp) and one for the validated values (ValidatedTmp). Not only do the CTEs have the digits parsed out, they also have ODD and EVEN columns. To create the user entered CTE, he used nested derived tables and a NOT IN sub-query. The matching values were eliminated from the user entered CTE at this point. The second CTE is similar, but there is no need to eliminate anything. They both parse the positions with SUBSTRING and concatenate back together to create the ODD and EVEN columns. Listing 5 shows the CTE definitions.

WITH UserEnteredTmp ([u1], [u2], [u3], [u4], [u5],

                     [u6], [u7], [u8], [u9], [odds], [evens], [ssn])

AS

(

   SELECT [u1], [u2], [u3], [u4], [u5],

      [u6], [u7], [u8], [u9],

      [u1] + [u3] + [u5] + [u7] + [u9],

      [u2] + [u4] + [u6] + [u8], [SSN]

FROM

     (

       SELECT SUBSTRING(SSN, 1, 1) AS u1, SUBSTRING(SSN, 2, 1) AS u2,

           SUBSTRING(SSN, 3, 1) AS u3, SUBSTRING(SSN, 4, 1) AS u4,

           SUBSTRING(SSN, 5, 1) AS u5, SUBSTRING(SSN, 6, 1) AS u6,

           SUBSTRING(SSN, 7, 1) AS u7, SUBSTRING(SSN, 8, 1) AS u8,

           SUBSTRING(SSN, 9, 1) AS u9, [SSN]

       FROM  (SELECT [SSN] AS SSN

              FROM [dbo].[UserEntered]

              WHERE [SSN] NOT IN

                    (SELECT [SSN]

                     FROM [dbo].[Validated])

              ) idat

     ) odat

),

ValidatedTmp ([u1], [u2], [u3], [u4], [u5],

              [u6], [u7], [u8], [u9], [odds], [evens], [ssn])

AS

(

     SELECT [u1], [u2], [u3], [u4], [u5],

              [u6], [u7], [u8], [u9],

              [u1] + [u3] + [u5] + [u7] + [u9],

              [u2] + [u4] + [u6] + [u8], [ssn]

       FROM

     (

           SELECT SUBSTRING(SSN, 1, 1) AS u1, SUBSTRING(SSN, 2, 1) AS u2,

                  SUBSTRING(SSN, 3, 1) AS u3, SUBSTRING(SSN, 4, 1) AS u4,

                  SUBSTRING(SSN, 5, 1) AS u5, SUBSTRING(SSN, 6, 1) AS u6,

                  SUBSTRING(SSN, 7, 1) AS u7, SUBSTRING(SSN, 8, 1) AS u8,

                  SUBSTRING(SSN, 9, 1) AS u9, [SSN]

         FROM [dbo].[Validated]

     ) odat

)

Listing 5: The CTE definitions

After the CTE definitions comes the main query, which implements the matching logic. The actual meat of the main query is a derived table composed of a UNION SSN query. The first component of the union returns the exact matches. Matt chose to retrieve the exact matches using a sub-query of the validated SSNs, rather than by joining the two tables. Listing 6 shows the "exact matches" portion of the query.

SELECT [SSN] AS ueSSN, [SSN] AS vSSN, 0 AS [Status]

          FROM   (SELECT [dbo].[UserEntered].[SSN] AS SSN

                  FROM   [dbo].[UserEntered]

                  WHERE  [SSN] IN (SELECT [SSN]

                                   FROM   [dbo].[Validated])) AS idat

          UNION ALL

Listing 6: The exact matching rows

Following the exact match query, Matt has a query for the even position matches and a query for the odd position matches. Since he created Odd and Even columns, it was easy to join UserEnteredTmp to ValidatedTmp. Listing 7 shows the query used to find the odd position matches. Just as in the Aunt Kathi solution, he used CASE to determine the number of matches in even positions and added them up, for the Status column. The subsequent rows were then filtered so that only rows with a status less than three are included.

SELECT [ueSSN], [vSSN], [Status]

          FROM (

             SELECT [UserEnteredTmp].[ssn] AS ueSSN,

               [ValidatedTmp].[ssn] AS vSSN,

               CASE WHEN [ValidatedTmp].[u2] = [UserEnteredTmp].[u2] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u4] = [UserEnteredTmp].[u4] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u6] = [UserEnteredTmp].[u6] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u8] = [UserEnteredTmp].[u8] THEN 0 ELSE 1 END

             AS [Status]

            FROM   UserEnteredTmp INNER JOIN ValidatedTmp ON [UserEnteredTmp].[odds] =  

                        [ValidatedTmp].[odds]) AS idat

            WHERE  [Status] < 3

          UNION ALL

Listing 7: The odd position matches

The final section of code performs the even position matches, and the necessary ordering, as shown in Listing 8.

          SELECT [ueSSN], [vSSN], [Status]

          FROM (

            SELECT [UserEnteredTmp].[ssn] AS ueSSN,

              [ValidatedTmp].[ssn] AS vSSN,

               CASE WHEN [ValidatedTmp].[u1] = [UserEnteredTmp].[u1] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u3] = [UserEnteredTmp].[u3] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u5] = [UserEnteredTmp].[u5] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u7] = [UserEnteredTmp].[u7] THEN 0 ELSE 1 END

             + CASE WHEN [ValidatedTmp].[u9] = [UserEnteredTmp].[u9] THEN 0 ELSE 1 END

             AS [Status]

            FROM   UserEnteredTmp INNER JOIN

                   ValidatedTmp ON [UserEnteredTmp].[evens] =

                   [ValidatedTmp].[evens]) AS idat

            WHERE  [Status] < 3

         ) AS idd

ORDER BY [vSSN], [ueSSN], [Status];

Listing 8: The even position matches

An interesting general point to note about the code is the use of UNION ALL instead of just UNION. Use of UNION ALL ensures that any duplicate rows remain in the results. In this case, no actual duplicates should exist; SQL Server just had to combine the results of the three queries and didn’t need to perform the work of finding duplicates and removing them. As such, one may be tempted to use UNION but If you take a look at the execution plans between otherwise identical UNION queries, you will see that the UNION query uses a Merge Join to pull the individual results together, while a UNION ALL uses a more-efficient Concatenation, and so allows Matt to squeeze out just a little bit more speed.

Table 4 shows the results when the three methods were applied to the sample data.

Solution

Results

Fu Bar code (cursor)

70 to 75 minutes

Aunt Kathi code (temp tables)

< 1 second

Winning code (CTE)

391 milliseconds

Table 4: Compare the methods

Conclusion

Once again, a set based CTE solution outperformed both cursor and temp table solutions. The keys to this puzzle were to break it down into the three components and to parse out the 9 digit SSNs into their individual digits. Here are a few more points to be learned:

  • Spend some time thinking about the problem before you start writing queries.
  • Use some kind of virtual table (CTE or derived table) to avoid using expressions in the WHERE clause.
  • Avoid unneeded processing. In this example, don’t parse the matching rows.
  • Use UNION ALL if possible instead of just UNION.
  • Use CTEs to avoid temp tables where possible.


This article has been viewed 3972 times.
Kathi Kellenberger

Author profile: Kathi Kellenberger

Kathi Kellenberger is a Data Platform Technology Specialist with Microsoft in St. Louis. Before joining Microsoft she was a SQL Server MVP and database administrator for Bryan Cave LLP, a law firm headquartered in St. Louis, MO. She is author of "Beginning T-SQL 2008" (Apress 2009) and co-author of "Professional SQL Server 2005 Integration Services" (Wrox 2006) and contributer to "SQL Server MVP Deep Dives" (Manning 2009). Kathi enjoys speaking and writing about SQL Server and teaches the occasional SQL Server class. When she is not busy working with SQL Server, you might find her singing at the local karaoke bar or climbing the stairs in tall buildings.

Search for other articles by Kathi Kellenberger

Rate this article:   Avg rating: from a total of 11 votes.


Poor

OK

Good

Great

Must read
 
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject: INTERSECT/EXCEPT vs. IN or JOIN
Posted by: Tim Overlund (not signed in)
Posted on: Monday, August 09, 2010 at 12:54 PM
Message: Very nice analysis of the set-based solutions.

From a standpoint of code readability for set-based solutions, I prefer to use
SELECT [dbo].[UserEntered].[SSN] AS SSN
FROM [dbo].[UserEntered]
INTERSECT
SELECT [SSN]
FROM [dbo].[Validated]

and
SELECT [dbo].[UserEntered].[SSN] AS SSN
FROM [dbo].[UserEntered]
EXCEPT
SELECT [SSN]
FROM [dbo].[Validated]

I would assume that the actual query plans are essentially the same, but wonder if anyone else uses INTERSECT and EXCEPT as a replacement for IN or NOT IN with a subquery.

Subject: Let us pray that...
Posted by: Jeff Moden (view profile)
Posted on: Monday, August 09, 2010 at 5:00 PM
Message: ... everyone in the world insists on EXACT matches for SSN's. ;-)

Nice write-up again, Kathi! Hearty congrats to Matt for the great solution and to ALL the contestants.

I also have to say again, very well done Phil Factor. Not only are these reasonably "real world" problems but there are no artificial limitations placed in the requirements such as "must be done in a single query".

--Jeff Moden

Subject: INTERSECT/EXCEPT vs. IN or JOIN
Posted by: Tim Overlund (not signed in)
Posted on: Tuesday, August 10, 2010 at 5:28 PM
Message: Very nice analysis of the set-based solutions.

From a standpoint of code readability for set-based solutions, I prefer to use
SELECT [dbo].[UserEntered].[SSN] AS SSN
FROM [dbo].[UserEntered]
INTERSECT
SELECT [SSN]
FROM [dbo].[Validated]

and
SELECT [dbo].[UserEntered].[SSN] AS SSN
FROM [dbo].[UserEntered]
EXCEPT
SELECT [SSN]
FROM [dbo].[Validated]

I would assume that the actual query plans are essentially the same, but wonder if anyone else uses INTERSECT and EXCEPT as a replacement for IN or NOT IN with a subquery.

Subject: close match?
Posted by: hellobenchen (view profile)
Posted on: Monday, February 21, 2011 at 12:37 AM
Message: For a close match, if it just needs the two non-matching digits to be un-adjacent, we can still have one non-matching digit at an odd position (say pos 1) and another non-matching digit at an even position (say pos 6). Why we must have both the non-matching digits in odd positions, or both in even positions?

 










Phil Factor
Automated Script-generation with Powershell and SMO
 In the first of a series of articles on automating the process of building, modifying and copying SQL Server... Read more...



 View the blog
Using SQL Test Database Unit Testing with TeamCity Continuous Integration
 With database applications, the process of test and integration can be frustratingly slow because so... Read more...

SQL Source Control: The Development Story
 Often, there is a huge difference between software being easy to use, and easy to develop. When your... Read more...

How to Import Data from HTML pages
 It turns out that there are plenty of ways to get data into SQL Server from websites, whether the data... Read more...

SQL Scripts Manager: An Appreciation
 SQL Scripts Manager is Simple-Talk's present to its readers. William Brewer was an enthusiastic... Read more...

Hosted Team Foundation Server 2010 Review
 Team Foundation Server (TFS) has expanded its remit to support the whole software development process,... Read more...

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 Database design and implementation is the cornerstone of any data centric project (read 99.9% of... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Beginning SQL Server 2005 Reporting Services Part 2
 Continuing his in-depth tour of SQL Server 2005 Reporting Services, Steve Joubert demonstrates the most... Read more...

Creating CSV Files Using BCP and Stored Procedures
 Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing... Read more...

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.

Join Simple Talk