Creative Solutions by Using a Number Table

One irritating thing for the occasional SQL Server Developer is that the experts assume that you are familiar with the use of number tables when they write about techniques. We therefore asked Greg Larsen to spell it out in a way that would help the beginner and intrigue those already familar with the technique.

The SQL Server database engine does some things very well, but it is not optimized for iterative coding, such as stepping through a recordset one row at time: Therefore, it is always best to think of ways to perform a series of set-based operations to arrive at your resultset. In this article, I am going to explore the use of a number table, one that contains nothing more than a sequential set of numbers, to help resolve a couple of common business situations.

The Number Table

Every database should have a number table. A number table is just that; a table that contains a single column whose value is a unique number. The table contains a series of rows where the numbers start at 0, or 1, and increments by 1 to some define limit such as 1,000 or 10,000 depending how you plan to use the table. The range of numbers that you will need in your number table depends on what problem you are trying to solve.

There are a number of different methods to generate a table of numbers. Below is just one example that uses a CTE to populate a table named “Number” with 10,000 different numbers, where the first number starts at 1. I cannot take credit for developing this solution for generating a Number table. I originally got this code from a post by Itzik Ben-Gan.

This particular example can be use to populate a table that has number values from 1 to 4,924,967,296. All you need to do is change the “SELECT TOP” value in the INSERT statement to control the maximum count of numbers you want to generate. I think creating a 4 billion-row numbers table would be more than any one might need, but then again maybe not. Therefore, if you needed to create one with more the 5 billion rows just add a “L6” that does a CROSS JOIN to “L5”.

Now that we have a simple method to create a number table, let me go through a couple of examples that use this method to populate a number table, then use the table to quickly resolve a business problem using a set based solution.

Finding the Missing Dates

This example deals with how to find the missing dates in a set of records using a Number table. To provide some context to this example let us assume this is your business problem:

You are running a room rental business, known as “Rooms for Rent”. Each room is rented based on a specific date. Your database has a table named “RoomRent” that contains individual records for each day a room is rented. This table contains the name of the room and the date the room is rented. A new customer is trying to rent your “QuarterMaster” room and wants to know which dates are available, between 3/14/2011 and 3/18/2011. One last requirement is that you have is your can only make room rental reservations up to a year out (365 days).

To support this scenario let me first populate our RoomRent table and our Number table for this example. The RoomRent table will contain three different room rentals during the above-mentioned timeframe, two for the “QuarterMaster” room (3/15/2011, and 3/14/2011), and one for a room named “Commencement” (3/14/2011).

In order to solve this problem I will need a number table. The number table will make it easy to find days when the “Quartermaster” room is not rented. Your business requirement will dictate how many numbers you will need in your number table. Since rooms are only rented 365 days in advance, I will only need to populate my number table with 365 different number values. Here is the code to populate both the RoomRent table and the Number table:

Now that we have our tables to support our example, let me jump in and show you how to find the available rental dates for the “Quartermaster” room between 3/14/2010 and 3/18/2010. Here is the code to do that:

Now let me walk through this code.

To determine the available reservations dates, I needed to determine the number of days between the starting and ending period of the reservation request. In my scenario, the begindate is 3/14/2010 the enddate is 3/18/2010. To help to drive my query above I defined two local variables (@BeginDate and @EndDate) and set them to these dates. I then used these variables to select records from my Number table where the number of records returned was the number of dates between the @BeginDate and @EndDate, including the begin and ending dates in that calculation. In this example, that is 5 days, and the number values for the records returned start from 1 and go to 5. Here is the snippet of code from the query above that returned those Number table records:

Here you can see that I used the value of the N column (1, 2, 3, 4 or 5) in the Number table to calculate all the possible reservation dates, which I call CheckDate. I did this by calling the DATEADD function and using the N value from the Number table and the @BeginDate variable. To select only the first sequential five numbers from the Number table I used a WHERE clause. In that WHERE clause, I calculated the number of dates between @BeginDate and @EndDate and then selected a Number table records only if N was less than or equal to the number of dates between these dates. Which I already mentioned in this example is 5 days.

Now that I have all the possible reservation dates, it was a simple matter of performing a RIGHT OUTER JOIN with the RoomRent records. Anytime this join operation produced a NULL value for the “ReservationDate“, this was a day when the “QuarterMaster” room was not rented. Here are the results of the complete query above:

By using a number table to generate a few dates for the reservation date range in this scenario it became quite an easy task to identify the available dates when the “QuarterMaster” room would be available. A Number table can very useful in lots of different date range calculation. Let me move on and explore another situation where a number table helps to provide a simple solution.

Splitting a String based on Delimiter

If you have been writing T-SQL code for a while, you probably have come across a time when you need to take a string and split it apart into two different strings or a series of strings. In this example, I will be taking a string of people’s names and split them apart based on a delimiter character, a semicolon (“;”). Once again, let me put this example in context of the following business problem:

You have an event system that records events. For each event, your system records the individuals affected by the event. Each event might have one, two, or three individuals affected by an event. The individuals affected by an event are stored, unfortunately, in a single denormalised column within an event record, separated by a semicolon. Without being able to change the existing design of the database, you need to write a SELECT statement to display the event information where each individual is displayed on a separate row along with the event information.

This is extremely easy to do by using a Number table. The number values will be used to identify the positions of the semicolons. In order to set up this example the following script will be used to create three different event records. The first event record has 3 individual associated with it, the second only two individuals and the last event only has a single individual attached to it:

In addition to creating my event records, I also need to create a number table to support my business case. My Number table will be used to find semicolons. Since my code will need to use the Number table to look at each character in the “ImpactedIndividuals” column I need to have enough numbers to support the size of this column. Since this column is 150 characters in size, I will be generating a Numbers table that holds 200 different numbers, in reality I could have gotten by with a with only 151 numbers. Here is the code to generate my Number table:

To split apart the “ImpactedIndividuals” column, the Number table will be CROSS JOIN‘d with my Event records. I will only return Number records where the number value represents the column offset of the semicolon in the “ImpactedIndividuals” column of the Event record. Rather than show you the final code for this example, I will show you fragments of code that I will build upon to create the final example that will meet our requirement, as identified in the business case above.

This first code block will show you how to associate a number with the beginning position of each name in the “ImpactedIndividuals” column:

When I run this code, I get the following output:

The first character of each name was identify by performing a CROSS JOIN between my Event and Number table then using a WHERE constraint to only return those rows where “N” identified the positional offset of a semicolon. Note that I had to add a semi-column to the beginning of the “ImpactedIndividuals” column, so I could identify the character offset of the first name in this column, which will be 1 of course. This also allowed my code to identify the offset of each subsequent name as well. Keep in mind the value identified is the true column offset values for the semi-colon, without the concatenated semi-colon at the beginning.

Let me dig into this code a little closer so you can understand what is going on. First, the CROSS JOIN operation associated all 200 numbers, in my Number table, with each row in my Event table. Here is a sample of the first 15 rows returned from that CROSS JOIN operation:

If you looked at all the records, you would see that all 200 numbers would be associated with each one of my rows in my Event table. Thus, causing the result set of this CROSS JOIN operation to have 600 rows. Remember now, I only need to identify the starting offset of each name. Therefore in my query above I used a WHERE constraint to only return rows where the character offset identify by N (the value from the Number table) is a semicolon.  This constraint was performed using the following SUBSTRING function:

Having this WHERE constraint allowed me to create a record set that had one row for each name, and the value of N, which identified the starting offset of the name.

Now by modifying the above query slightly and adding a CHARINDEX function I can identify the ending offset of each name. Here is my slightly modified query:

In this query, I have identified the starting offset of each name by using the column alias of “BeginName” for the value “N”. To identify the ending offset of the name, I used the CHARINDEX function. In that CHARINDEX function I used N to identify where to start searching for the semi-colon. The position of “N” in the “ImpactedIndividuals” column just happens to be to first character of the next name immediately following the semi-colon. Since the last name in the “ImpactedIndividuals” column does not have a semicolon after it, I had to concatenate a semicolon to the end of this column value. This allowed me to find the ending offset of the last name. When this CHARINDEX function executes it will return the location of the semicolon after each name. Therefore, I subtract 1 from this value to identify the actual offset for the ending of the name, and then assign a column alias name of “EndName” to identify this value. Now we have both the beginning and ending offset of the impacted individuals.

I can now use the code above within my final query to meet our business requirement. This requirement was to return event information where each individual was identified on a separate row. Of course I had to modify the code above slightly so I could return the EventDate and EventComment columns. Here is my final query, which meets my business requirement:

Here is the output from this query:

By reviewing this query, you can see that I used the prior query as a table alias named “A“. I also used the “BeginName” and “EndName” columns within a SUBSTRING function to parse apart the “ImpactedIndividuals” column value to identify each impacted individual.

Using a number table made it quite simple to parse apart the “ImpactedIndividuals” column based on the presence of a semicolon. Without knowing this technique, it would be easy to fall into trying to use procedural logic to loop through the “ImpactedIndivdual” one character and records one at a time using a cursor.

Note: in SQL Server 2016, there is a built-in function to do this.

The Value of a Number Table

There are many creative uses for a Number table. I have demonstrated just two of those many uses. Using a number table can eliminate the complicated looping mechanisms needed to support your application logic with a set based solution. My simple examples show the value that a number table can bring to your arsenal of T-SQL tricks. Next time you are puzzled with how to identify, count and/or parse a set of records, see if using a number table provides you a more elegant solution.

Tags: , , , ,

  • 29099 views

  • Rate
    [Total: 2    Average: 4/5]
  • Anonymous

    Creative
    Good informative article with creative solution

  • mcflyamorim

    Good Article
    Hi,
    First of all congrats, the Number table are very very useful and sometimes people don´t understand why… you explain with useful samples and real world scenarios… great text.

    In the example about semicolon transformation, I think your solution is to much expensive to use in a big table, the cross join with the number table kill SQL Server performance. If you permit-me, I would like to show another solution I wrote…
    To see the difference let´s grow the table Events… the following code will insert 19683 rows in the table.

    INSERT INTO Event (ImpactedIndividuals, EventDate, EventComment)
    SELECT Event.ImpactedIndividuals, Event.EventDate, Event.EventComment FROM Event
    CROSS JOIN Event Tab1 CROSS JOIN Event Tab2 CROSS JOIN Event Tab3 CROSS JOIN Event Tab4
    CROSS JOIN Event Tab5 CROSS JOIN Event Tab6 CROSS JOIN Event Tab7 CROSS JOIN Event Tab8

    My approach is all about XML. I change the semicolon column to XML column and then I use the functions nodes and value to read the data…

    SELECT ID AS EventID,
    Tab1.ColXML.value(‘@Ind’, ‘VarChar(80)’) AS ImpactedIndividual,
    EventDate,
    EventComment
    FROM (SELECT *,
    CONVERT(XML, ‘<Test Ind=”‘ + Replace(ImpactedIndividuals, ‘;’,
    ‘”/><Test Ind=”‘) + ‘”/>’) AS ColXML
    FROM Event) AS Tab
    CROSS APPLY Tab.ColXML.nodes(‘/Test’) As Tab1 (ColXML)

    The query above takes 3 seconds to run and makes 189 page reads. 🙂
    Your query takes 41 seconds and makes 4029246 page reads.

    Again, thanks for the article, and keep on with the T-SQL tips, I enjoy read your articles.
    []s
    Fabiano Amorim

  • Jeff Moden

    Fabiano,

    How many rows did your Numbers table actually have in it?

  • Jeff Moden

    Fabiano,

    How many rows did your Numbers table actually have in it?

  • Jeff Moden

    Trust me that the problem ISN’T the Numbers table
    Like the title says…

    SELECT e.ID AS EventID,
    SUBSTRING(e.ImpactedIndividuals, t.N, CHARINDEX(‘;’, e.ImpactedIndividuals + ‘;’, t.N) – t.N) AS ImpactedIndividual,
    e.EventDate,
    e.EventComment
    FROM dbo.Numbers t
    CROSS JOIN dbo.Event e
    WHERE N < LEN(e.ImpactedIndividuals) + 1
    AND SUBSTRING(‘;’ + e.ImpactedIndividuals, t.N, 1) = ‘;’

    That also beats the tar out of XML solutions. 😉 Sometimes reads mean something… sometimes they don’t.

    This is one place where a CLR will actually do better but the Numbers/Tally table will be second best for 8k strings.

  • Jeff Moden

    Missing Clustered Index
    Greg,

    Most numbers tables (Itzek’s cascading CTE’s being the exception) need a good clustered index on the N column. I usually go the full route with any physical Tally tables I make. I make the N column NOT NULL and I make it the Clustered PK. Saves a lot of reads when you need to sort the output by N, as well.

    The biggest performance problem with your splitter solution is that you’re relying on only 200 rows in the Numbers table. You really need to add a length limiter for N like I did in my example a couple of posts up because most people have much larger Numbers/Tally tables and if you don’t limit the lookup of N in the WHERE clause, then things are going to come to a crawl as you read through a large number of values of N.

  • mcflyamorim

    Jeff
    Hi Jeff you are right, If I change the Number table to only 150 rows than the Greg’s solution seems to became faster than XML, but the number of page reads is still higher… But I know that is not always a problem.
    Well lesson learned, keep the number tables with the properly number of rows. Or limit with a where clause, using a N as a cluster key.
    Great debate :-);
    I still like the XML solution, but now I’ll also test the Greg’s solution :-)…
    Tks

  • Greg Larsen

    Here are some additional thoughts to ponder:
    • This article was not about what was the best way to parse a string into pieces based on semicolon delimiter. However, I do like all the stimulating discussion around different solutions. This article topic was about using a number or tally table to develop set based solutions to solve a problem. Although, XXXX did recommend an improvement to the code I provided by implementing a condition in the WHERE clause to limit the number of Number records that will be used in the CROSS JOIN operation. My article did mention I only needed 151 numbers for my example, but I created a number table that was larger than my business requirements (200 rows).

    • As you all pointed out, there are many different options for parsing apart a string based on a semicolon. Here are some of those different options: CLR, XML, Number table, and an iterative WHILE loop.

    • The best option for your environment really depended on a number of different factors, like business requirements, hardware, size if string being parsed, number of records being parsed, etc. You should test each option that meets your business requirements, data being parsed and environment, to determine which one these different methods performs best for your situation.

    Keep the comments coming! Most learn by examples, and new examples and thoughts make us write better code and better T-SQL programmers.

  • Greg Larsen

    Prior comment
    I’m sorry I about giving Jeff Moden an alias in prior post. Please replace “XXXX” with “Jeff Moden” in my prior comment.

  • HolgerSchmeling

    Credit to whom credit is due
    Greg,
    nice article and creative solutions – but did *you* develop all this? I’ve disovered, you aggregated some of the earlier publications from Itzik Ben Gan here. Don’t you think, It’d be fair to mention your sources?

  • Greg Larsen

    Credit to whom credit is due
    HolgerSchmeling,

    Thank you for the post, and you are quite right credit should be provided, and I thought I did that.

    At the beginning of the article I gave Itzik Ben Gan credit. But of course now that I read what I wrote, I can see how my credit to Itzik might not have covered all the techniques I used in my examples.

    Thank you for making sure Itzik gets the credit he deserves.

  • NikHughes

    Addition to Number Table
    Number tables can be expanded and more uses found. I’ve got an addition that I’ve found useful in my Number table. I create a VARCHAR(10) column called [Ordinal] and then post process to populate it:

    UPDATE Number SET Ordinal = CASE
    WHEN RIGHT(CONVERT(VARCHAR,N),2) = ’11’ THEN CONVERT(VARCHAR,N) + ‘th’
    WHEN RIGHT(CONVERT(VARCHAR,N),1) = ‘1’ THEN CONVERT(VARCHAR,N) + ‘st’
    WHEN RIGHT(CONVERT(VARCHAR,N),2) = ’12’ THEN CONVERT(VARCHAR,N) + ‘th’
    WHEN RIGHT(CONVERT(VARCHAR,N),1) = ‘2’ THEN CONVERT(VARCHAR,N) + ‘nd’
    WHEN RIGHT(CONVERT(VARCHAR,N),2) = ’13’ THEN CONVERT(VARCHAR,N) + ‘th’
    WHEN RIGHT(CONVERT(VARCHAR,N),1) = ‘3’ THEN CONVERT(VARCHAR,N) + ‘rd’
    ELSE CONVERT(VARCHAR,N) + ‘th’
    END