The Helper Table Workbench

Cursors and iterations are both renowned for slowing down Transact SQL code, but sometimes seem unavoidable. In this workbench, Robyn Page and Phil Factor demonstrate some set-based techniques for string manipulation and time interval-based reporting, which use helper tables rather than the dreaded cursor.

Sometimes, when writing TSQL code in functions or procedures, it is tempting to do iterations, or even worse, a cursor, when it isn’t really necessary. Cursors and iterations are both renowned for slowing down Transact SQL code. SQL Server just isn’t designed for it.

However, there is usually a way to do such operations in a set-based way. If you do so, then your routines will run a lot faster, with speed at least doubling. There are a lot of tricks to turning a problem that seems to require an iterative approach into a set-based operation, and we wish we could claim we’d invented one of them. Probably the most useful technique involves that apparently useless entity, the ‘helper’ table. This workshop will concentrate on this, because it is probably the most widely used.

The most common Helper table you’ll see is a table with nothing but the numbers in a sequence from 1 upwards. These tables have a surprising number of uses. Once you’ve understood the principles behind helper tables, then you’ll think of many more. We’ll be providing several examples where a helper table suddenly makes life easier. The objective is to show the principles so that you’ll try out something similar the next time you have to tackle a tricky operation in TSQL. As always, you’re encouraged to load the example script into Query Analyser or Management Studio, and experiment!

Our examples include:

Splitting Strings into table-rows, based on a specified delimiter
Encoding and decoding a string
Substituting values into a string
Extracting individual words from a string into a table
Extracting all the numbers in a string into a table
Removing all text between delimiters
Scrabble score
Moving averages
Getting the ‘Week beginning’ date in a table
Calculating the number of working days between dates.

Note. These examples use VARCHAR(8000) just so they compile on both SQL Server 2000 and 2005. If you are using SQL Server 2005, you’ll probably want to change them to VARCHAR(MAX).

Before we start, we’ll need a helper table of numbers. Our examples aren’t going to require high numbers, but we’ve parameterised the size of the table that the routine creates

Creating the helper table

Here is a routine that checks to see if such a ‘helper’ table called numbers exists, and, if not, creates it.

Once you have one of these tables, which we’ve seen described as the Transact SQL developer’s ‘Swiss Army Knife’, you will not want to be without it.

Splitting Strings into table-rows, based on a specified delimiter

Imagine you have a string which you want to break into words to make into a table.

This can be done very simply and quickly through the following single SQL Select statement:

(Make sure you have executed the spMaybeBuildNumberTable procedure first!)

You can then enshrine this principle into a table function that will take any delimiter to split a string. (We believe that the credit for this clever routine should go to Anith Sen).

This is the fastest means I have come across in TSQL to split a string into its components as rows. Try this, which give you a table with the integer and the nominal name.

Encoding and decoding a string

You can use the same principle for a lot of string operations.

Here is one that will URLencode a string so it can be used in a POST or GET HTTP operation. The string is converted into a table with one character per row and, after being operated on, it is re-assembled.

This sort of routine is a lot less complex than the iterative methods and is, as one would expect, a lot faster. Just to show that this is no fluke, here is the reverse function to decode a URL Query string.

Here is a simple function that acts as an int to hex converter.

It works this way:

And what about TSQL that converts hex to int?

And now we can test it out:

Substituting values into a string

Next, we have a function that uses these principles to do macro substitution of values into a string. It will work for replacing XHTML placeholders with a value, or producing error messages in a variety of languages. In fact, uses keep popping up for this sort of function. In this version, one can specify what strings are used for substitutions (the default is %1, %2, %3 etc,) and what you use as the delimiter of your list of macros and values.

There are several ways that we can use this routine in practical applications. Try out these and see what happens!

Extracting individual words from a string into a table

One can do rather cleverer things than this. For example, one can extract all the words from a string into a table, a row for each
word.

Extracting all the numbers in a string into a table

Even more useful than this is a function that picks out all the numbers from a string into a table. You can therefore easily pick out the third or fourth string simply, because the table has the order as well as the number itself. Were it not for the unary minus operator, this would have been a delightfully simple function.

If you are using this routine, you’ll want to cast these numbers into the number type of your choice. We supply them as strings.

Removing all text between delimiters

This is a handy little routine for looking at the strings in HTML code, but seems to earn its keep in a lot of other ways. You specify the opening and closing delimiter. At the moment, only single-character delimiters are allowed. Can anyone re-write it to allow multi-character delimiters?.

So we can try it out with brackets:

Or if you want to take out tags:

Scrabble Score

And as a slightly silly example of the sort of chore that crops up occasionally when analysing strings, character by character, here is a way of scoring a string for Scrabble, assuming it is all on ordinary squares!

And now we try it out!

Now we find out which are the highest scorers, assuming an illegal quantity of tiles. We use the WordList from Phil’s Blog on the Fireside Fun of Decapitations.

Don’t try running this without the WordList!

As well as slicing and dicing strings, once one has one’s helper table,
suddenly time-interval based reporting becomes much easier.

Moving averages

How about moving averages? Here is a simple select statement that gives you the moving average (over a week) of the number of log entries for every day for a year. This can be adapted to give weighted and exponential moving averages over arbitrary time periods. You use this technique for ironing out ‘noise’ from a graph in order to accentuate the underlying trend.

To execute this, you will need a table to try it on.

Now we can try out a moving average!

Getting the ‘Week beginning’ date in a table

Here is a UDF that lists all the Mondays (or whatever you want) between two dates.

Number of Working Days between two dates

Or, how about a UDF that tells you the number of working days between two dates? (you can alter it if Saturday and Sunday are not your days off!)

So how many working days until Christmas?

We can go on for ever with example of using a numeric Helper table but we won’t because most of you will have wandered off even before you reach this point. We hope that you’ll now take over and create some more examples, try them out against iterative solutions that do the same thing. We guarantee you’ll be pleased with the result!

See also other Workbenches at Simple-Talk

Robyn Page’s SQL Server DATE/TIME Workbench, Robyn Page
Date calculation and formatting in SQL Server can be surprisingly tricky. Robyn Page’s “hands-on” workbench will lead you through the minefield.

Robyn Page’s SQL Server String Manipulation Workbench, Robyn Page
String searching and manipulation in SQL Server can be error-prone and tedious… Unless you’re armed with the techniques described in Robyn’s string manipulation workbench…

SQL Server Error Handling Workbench, Grant Fritchey
Grant Fritchey steps into the workbench arena, with an example-fuelled examination of catching and gracefully handling errors in SQL 2000 and 2005, including worked examples of the new TRY..CATCH capabilities.

Robyn Page’s SQL Server Cursor Workbench, Robyn Page
The topic of cursors is the ultimate “hot potato” in the world of SQL Server. Everyone has a view on when they should and mainly should not be used. By example and testing Robyn Page proves that, when handled with care, cursors are not necessarily a “bad thing”.

Robyn Page’s SQL Server Data Validation Workbench, Robyn Page
Robyn Page provides essential techniques for ensuring the validity of the data being entered into your SQL Server tables.

Robyn Page’s Excel Workbench, Robyn Page and Phil Factor
The need to produce Excel reports from SQL Server is very common. Here, Robyn Page and Phil Factor present practical techniques for creating and manipulating Excel spreadsheets from SQL Server, using linked servers and T-SQL. The pièce de résistance is a stored procedure that uses OLE Automation…

Robyn Page’s SQL Server Security Workbench, Robyn Page and Phil Factor
Robyn Page and Phil Factor present practical T-SQL techniques for controlling access to sensitive information within the database, and preventing malicious SQL injection attacks.

Downloads

Tags: , , , , , , , , , , ,

  • 34105 views

  • Rate
    [Total: 42    Average: 4.5/5]
  • Anonymous

    Perfect timing!
    I’d used this technique for building calendars, but never thought of using it to split strings – I have to do this a lot from HTML-form posted data and this is going to make some of the SQL so much neater and faster/

    Thank you.

  • Robyn Page

    Re: Perfect Timing
    Yes, we got into this because all our existing string-parsing functions proved to be too slow dealing with the sort of XHTML Website work we were dealing with, and we did some timings……
    Phil and I are pleased that you find it useful.

  • Anonymous

    Perfect Timing
    Now if I can just find the reverse without using cursors then I’ll be even happier!

  • Phil Factor

    Re Perfect timing

  • Anonymous

    Great Stuff
    This is really interesting stuff that will help make some reporting tasks I’m about to delve into a lot easier/cleaner. Thanks so much!

    FYI, found a typo here:

    sellect SUBSTRING(@ValueList+@Delimiter, number,

    (should be “select”)

    Thanks again!

  • Anonymous

    Create numbers table without loop
    I am a huge fan of helper tables. I have learned some new ways to use a numbers table so thank you for that.

    However, what if you didn’t use a loop to create the numbers table. This is the method I use. You need a source of records and if you don’t have enough just cross join until you do.

    Like this..

    insert Numbers
    select top 10000 identity(int, 0, 1)
    into Numbers
    from master.information_schema.columns
    cross join master.information_schema.columns

  • Anonymous

    Trivia / Humour
    Interesting article, i would say very authentic also, especially coming from ‘page’ and ‘Fill factor’

  • Phil Factor

    Re: Trivia/Humour
    Like the old card game ‘Happy Families’, eh? It is absolutely true that there were two eminent Psychiatrists once, called Dr Batty and Dr Nutter. I admit that Phil Factor is a pseudonym, but ‘Page’ is coincidence.

  • Anonymous

    Set-based Numbers Table
    Use a set-based solution to create your Numbers table on SQL 2000 or SQL 2005:

    SELECT TOP 10000 Num = IDENTITY(INT, 1, 1)
    INTO Numbers
    FROM INFORMATION_SCHEMA.COLUMNS a
    CROSS JOIN INFORMATION_SCHEMA.COLUMNS b

    ALTER TABLE Numbers
    ADD CONSTRAINT PK_Numbers
    PRIMARY KEY CLUSTERED (Num)

  • GSquared

    Working Days Range
    On the working days range, I also have a table of holidays in my database. With a “Where not in …” statement run against that table, you can get even more accuracy.

    I originally tried having a function that would calculate holidays based on the rules for them (2nd Monday in month, etc.), but found the performance on it was far too poor. Would have been easier to maintain, and I only have 100 years worth of holidays in the table, but for performance reasons, I had to precalculate and store them in a table.

    Of course, the other advantage to a table is non-rule-based dates can be used. Rare, but they have come up.

  • aaa

    String Splitting
    A still faster way is to use an inline table-valued function:

    alter function rep.fnStringSplit
    (
    @values varchar(max)
    )
    returns table
    as
    return
    (
    select
    Value = substring(@values, N + 1, charindex(‘,’, @values, N + 1) – N – 1)
    from Numbers
    where N < len(@values)
    and substring(@values, N, 1) = ‘,’
    and len(substring(@values, N + 1, charindex(‘,’, @values, N + 1) – N – 1)) > 0
    )
    /*
    go
    select * from rep.fnStringSplit(‘1,20,30a,,4000,’)
    */
    go
    select * from rep.fnStringSplit(‘1,20,30a,,4000,’)

  • aaa

    String Splitting Alternative
    select *
    from xxx
    where charindex(‘,’ + cast(id as varchar) + ‘,’, ‘,1,20,30,,4000,’) > 0

  • Anonymous

    Questions About Routine For Parsing Numbers
    First, let me say that the article was eye-opening and conceptually fascinating. I already have a stored procedure that populates a temporary table by parsing a string into integer values (primarily for use when the string is a delimited list of keys), but my code used a while loop to look at each character individually so I was anxious to test drive a new stored procedure based on you example code.

    Question 1: Why, in the WHERE clause, do you do a REPLACE inside the LEN parameter (i.e. “LEN(REPLACE(@String,’ ‘,’!’))” )? Why not just LEN(@String) ?

    Question 2: Why, in the SUBSTRING function in the SELECT clause, do you concatenate a space to the beginning of @String (i.e. “RIGHT(‘ ‘+@String,number-1)” ) when number – 1 will never be greater than the length of string and you will never reach that space? Am I missing something obvious?

  • Robyn Page

    re: Questions About Routine For Parsing Numbers
    Answer 1: The LEN(REPLACE(@String,’ ‘,’!’)) was to get around the problem of trailing spaces not being counted in the length of the string when you use the LEN() function. It is not always necessary, but we tend to do it out of habit. In this case I think it probably is necessary, but please by all means prove me wrong. Having been caught out in the past I like to be sure of getting the real length of any string.

    Answer 2: This routine gave quite a struggle because of the problem of retaining the unary minus, (e.g. -243) and we may have left in a bit of code that is no longer necessary. Oops!

  • Anonymous

    Great Article
    I used one of the examples today!

  • Anonymous

    Great Article
    Thanks,
    but there are some problems with running total
    1) the very first day is out of scope
    to include we need
    [start]=DATEADD(DAY,number-1,’1 Jan 2006′),
    the same with end date
    2) it includes the next day with 00:00:00
    to NOT include we have to do for example like this
    [end]=DATEADD(ms, -3,DATEADD(DAY,number+7 – 1,’1 Jan 2006′) )

  • Anonymous

    multi char delimiter

    SET NOCOUNT ON
    declare @String VARCHAR(8000)
    declare @OpeningDelimiter VARCHAR(10)
    declare @ClosingDelimiter VARCHAR(10)

    SET @String='<<?xml version=”1.0″ encoding=”us-ascii”?>>
    <<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Strict//EN”
    “http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd”>>
    <<html xmlns=”http://www.w3.org/1999/xhtml”>>
    <<head>>
    <<title>>A poem<</title>>
    <</head>>
    <<body>>
    <<div style=”float left: width:300px;”>>
    <<h2>>
    Weather forecast
    <</h2>>
    <<p>>
    The rain it raineth every day<<br />>
    upon the just and unjust fellah<<br />>
    but mainly on the just, because,<<br />>
    the unjust pinched the just”s umbrella<<br />>
    <</p>>
    <</div>>
    <</body>>
    <</html>>

    SELECT @OpeningDelimiter='<<‘, @ClosingDelimiter=’>>’
    DECLARE @numbers table (number int)
    DECLARE @Flag bit, @Rez varchar(1000), @poz int, @ii INT
    SELECT @Flag=1, @Rez=”, @poz=1

    SELECT @ii=1 WHILE (@ii<=1000) BEGIN INSERT INTO @NUMBERS(NUMBER) SELECT @II SELECT @II=@II+1 END
    SELECT @Rez=@Rez+
    CASE
    WHEN substring(@String,Number,LEN(@OpeningDelimiter))=@OpeningDelimiter and @Flag=0 then SUBSTRING(@String,@poz,Number-@poz)
    WHEN substring(@String,Number,LEN(@ClosingDelimiter))=@ClosingDelimiter and @Flag=1 then ”
    WHEN @Flag=1 THEN SUBSTRING(@String,@poz,Number-@poz)
    ELSE ”
    END,
    @poz=CASE
    WHEN substring(@String,Number,LEN(@OpeningDelimiter))=@OpeningDelimiter and @Flag=0 then Number
    WHEN substring(@String,Number,LEN(@ClosingDelimiter))=@ClosingDelimiter and @Flag=1 then Number+LEN(@ClosingDelimiter)
    ELSE CASE @Flag WHEN 1 THEN Number ELSE Number+LEN(@ClosingDelimiter) END
    END,
    @Flag=CASE
    WHEN substring(@String,Number,LEN(@OpeningDelimiter))=@OpeningDelimiter and @Flag=0 then @Flag
    WHEN substring(@String,Number,LEN(@ClosingDelimiter))=@ClosingDelimiter and @Flag=1 then @Flag
    ELSE CASE @Flag WHEN 1 THEN 0 ELSE 1 END
    END
    FROM @Numbers WHERE
    substring(@String,Number,LEN(@OpeningDelimiter))=@OpeningDelimiter OR
    substring(@String,Number,LEN(@ClosingDelimiter))=@ClosingDelimiter
    SET @Rez=@Rez+SUBSTRING(@String,@poz,LEN (REPLACE(@string,’ ‘,’|’)))
    PRINT @Rez

    DECLARE @newString VARCHAR(8000)
    IF @OpeningDelimiter = @ClosingDelimiter
    BEGIN PRINT ‘NULL’ END
    IF @OpeningDelimiter+@ClosingDelimiter+@String IS NULL
    BEGIN PRINT @String END
    SELECT @NewString=”
    SELECT @newString =@newString +SUBSTRING(@String,number,1)
    FROM @numbers
    WHERE number<=LEN (REPLACE(@string,’ ‘,’|’))
    AND CHARINDEX (@OpeningDelimiter,@string+@OpeningDelimiter,number) <
    CHARINDEX (@ClosingDelimiter,@string+’ ‘+@closingDelimiter,number)
    AND number <> CHARINDEX (@OpeningDelimiter,@string,number)
    PRINT @NewString

  • Robyn Page

    Re: Multi-char delimiter
    That is really cool. I loved the flag trick! I’ve altered it slightly to make it a direct replacement for the single-character version. If you can please identify yourself to the editor, a Simple-Talk goodie bag will be yours as a prize! (worth it just for the beer-mat)

  • Anonymous

    Re:Re: Multi-char delimiter
    “If you can please identify yourself to the editor, a Simple-Talk goodie bag will be yours as a prize! (worth it just for the beer-mat)”

    vasc/sqlservercentral.com

    if you can ship to canada 🙂 I could identify myself to editor (real name /adress) a beer-mat would be really nice.

  • Tony Davis

    re: prize
    Hi there,

    We can ship to Canada. If you drop me a mail at editor@simple-talk.com with a mailing address then I can get the swag bag out to you (polo shirt, pen, keyring, beer mat, USB key).

    Cheers,

    Tony (Simple-Talk Ed.)

  • Phil Factor

    Swag
    The Red-Gate USB key is really cute. It flashes red whenever it is accessed. You will be the envy of your friends. The keyring has a beer-bottle opener.

  • user4301

    splitting strings
    Looks really fast since I’m ‘having’to use a cursor at the moment.
    1 Does anyone have the neat addon of also select the ‘patindex’ of the first character of each of the words (so f of first is 1 and s of second is 7).
    2 Does anyone have a resetting counter so that the returned result is “first,1,1″,”second,7,2”
    the purpose for this is to STUFF a leading zero in single digit months and days of dates that can be anywhere in a text line.
    3 “A line” is up to 1200 char long and typical days import is 0.75Mrows to 1.0Mrows
    Very many thanks

  • WebMister

    Re: Splitting strings
    The problem of using PATINDEX is that you are not told the end of the match, as you would in a true regular expression. However, if you are clear about the format of the date you could do it cursorless with patindex. You’d have to do some timings to make sure that this would be quicker than using a WHILE loop.

  • Jeff Moden

    Just a format test… please ignore…
    <code style=”font-size: 12px;”><span style=”color:green”>–===== Create and populate the Tally table on the fly<br> </span><span style=”color:blue”>; SELECT TOP </span><span style=”color:black”>11000 </span><span style=”color:green”>–equates to more than 30 years of dates<br>        </span><span style=”color:#434343″>IDENTITY</span><span style=”color:gray”>(</span><span style=”color:blue”>INT</span><span style=”color:gray”>,</span><span style=”color:black”>1</span><span style=”color:gray”>,</span><span style=”color:black”>1</span><span style=”color:gray”>) </span><span style=”color:blue”>AS </span><span style=”color:black”>N<br>   </span><span style=”color:blue”>INTO </span><span style=”color:black”>dbo.Tally<br>   </span><span style=”color:blue”>FROM Master</span><span style=”color:black”>.dbo.SysColumns sc1</span><span style=”color:gray”>,<br>        </span><span style=”color:blue”>Master</span><span style=”color:black”>.dbo.SysColumns sc2<br><br></span><span style=”color:green”>–===== Add a Primary Key to maximize performance<br>  </span><span style=”color:blue”>ALTER TABLE </span><span style=”color:black”>dbo.Tally<br>    </span><span style=”color:blue”>ADD CONSTRAINT </span><span style=”color:black”>PK_Tally_N <br>        </span><span style=”color:blue”>PRIMARY KEY CLUSTERED </span><span style=”color:gray”>(</span><span style=”color:black”>N</span><span style=”color:gray”>) </span><span style=”color:blue”>WITH FILLFACTOR = </span><span style=”color:black”>100<br><br></span><span style=”color:green”>–===== Allow the general public to use it<br>  </span><span style=”color:blue”>GRANT SELECT ON </span><span style=”color:black”>dbo.Tally </span><span style=”color:blue”>TO PUBLIC<br><br><br><br></span></code>

  • Jeff Moden

    How to format, please.
    Robyn and Phil,

    What is the best way to post the nicely formatted code similar to what Robyn posted a little more than a half dozen posts up from this one?

    Thanks folks.
    –Jeff Moden

  • Kevin

    uftSplitString

    Cool stuff.

    However… on the parameter split if parameter list happens to be exactually 8000 characters the substring in the select will fail. For example:

    DECLARE @Parameter VARCHAR(8000)
    SET @Parameter = REPLICATE(‘1,2,3,4,5,6,7,8,9,10’,1000)

    will produce @Parameter with a length of 8000 and cause the error. One can get around this buy doing something like checking the length and if it was 8000 use a select with a case statement looking for zero to be returned by charindex.

    K

  • Anonymous

    RE: Multi-char delimiter
    Although your article has been written some time ago, i only have discovered it recently. It is indeed of great help and running through most of your examples i have been able to understand this technique and discover lots of situations where i could have used it. Many thanks!
    Regarding the multi char delimiter. I have tried to implement my own function, a bit easier to follow and understand. It seems to me that it works, but perhaps I haven’t tested all scenarios 🙂

    ALTER FUNCTION [dbo].[ufsRemoveDelimited]
    (
    @String VARCHAR(8000)
    ,@OpeningDelimiter VARCHAR(20)
    ,@ClosingDelimiter VARCHAR(20)
    )
    RETURNS VARCHAR(8000)
    AS
    BEGIN
    DECLARE
    @NewString VARCHAR(MAX)
    ,@Poz INT;

    SELECT
    @NewString = ”
    ,@Poz = 0;

    SELECT
    @Poz = CASE WHEN SUBSTRING(@String, Number, LEN(@OpeningDelimiter)) = @OpeningDelimiter
    THEN CHARINDEX(@ClosingDelimiter, @String, Number) + LEN(@ClosingDelimiter) – 1
    ELSE @Poz
    END
    ,@NewString = @NewString + CASE WHEN Number > @Poz
    THEN SUBSTRING(@String, Number, 1)
    ELSE ”
    END
    FROM
    Numbers
    WHERE
    Number <= LEN(@String);

    RETURN @NewString;
    END

    PRINT dbo.ufsRemoveDelimited(‘<-head–>
    <-title–>A poem<-title–>
    <-head–>
    <-body–>
    <-h2–>Weather forecast<-h2–>
    <-p–>
    The rain it raineth every day<-br–>
    upon the just and unjust fellah<-Br–>
    but mainly on the just, because,<-br–>
    the unjust pinched the just”s umbrella<-br–>
    <-p–>
    <-body–>
    <-html–>BLA’

    ,'<-‘
    ,’–>’)