Lists With, or Without, Ranges in both T-SQL and PowerShell

Whether you are working in a procedural language like PowerShell or in T-SQL, there is something slightly bothersome about having to deal with parameters that are lists, or worse with ranges amongst the values. In fact, once you have a way of dealing with them, they can be convenient, especially when bridging the gulf between application and the database. Phil Factor shows how to deal with them.

 In this article, we’ll look at how one would handle lists of numbers with ranges in both SQL Server and PowerShell, converting lists both ways, and showing incidentally how to read and write them.  

 When a  list like ‘1,3,5,6,9’, or ’12 Jan 2016,14 Jan 2016, 18 Feb 2016’  contains a datatype that can be unambiguously sorted in the order of the values of the datatype, it becomes possible to imply a range. This will trim unwieldy lists significantly if they have a lot of contiguous values.  ‘1,2,3,4,5’ can be expressed as 1-5 (or 1..5). The starting integer is separated from the end integer in the range by a dash sign. This representation rather clashes with the minus sign, represented by the same symbol, but the comma removes the ambiguity. A number followed immediately by a ‘-‘ means that the ‘-‘is a range symbol.  As with SQL’s BETWEEN clause that selects rows, the range includes all the integers in the interval including both endpoints. Because the range syntax is intended to be a more compact form,  it is generally only used where there are at least three contiguous values.

A list of integers like this …

-6, -3, -2, -1, 0, 1, 3, 4, 5, 7, 8, 9, 10, 11, 14, 15, 17, 18, 19, 20

Is synonymous with the range expression

-6,-3-1,3-5,7-11,14,15,17-20

When you convert from one format to another, it is usual to remove any duplicates and to correct the order.  Lists are essentially ordered. If this ordering has meaning, (such as the order of components in which a part is assembled) the so-called ‘document order’, then ranges aren’t really possible. Lists with ranges are generally sorted in order of ascending value.

 In databases, we don’t generally need to worry about interpreting lists of numbers that are stored as datatypes within tables because, if we can safely regard them as ‘atomic’, we aren’t interested in what is inside them, and if they aren’t, they are actually lists of keys, and we store them differently, in tables as separate rows. However, it is quite possible to get them as parameters to a routine, (function or procedure) even though we have table-valued parameters nowadays. When you have a run of integers with many gaps and islands, it is handy to represent them as lists with ranges because they are much more compact to store.

PowerShell and lists

In PowerShell, of course, it is trivial to read a comma-delimited list of integers. Such a list is valid PowerShell. All we need to do is check that the input is valid and execute it. In this example, we square every value in the list

Processing a simple list

Processing a ranged list

If, instead, we are dealing with ranges as well as integers in the same list, the PowerShell is scarcely more difficult. PowerShell understands them already, but using a different syntax. We can very simply modify our previous routine to use ranges. The rather opaque RegEx expression here is merely checking that the list is a valid list and not an attempt to execute dodgy code.

This will work. (The most labour involved was in changing the regex to ensure that what is being executed has correct syntax)

Note that the expanded ranges are passed as arrays, which is why the second foreach-object  is needed in the pipeline

Converting a PowerShell ranged list to a simple list

If you just wanted to convert a ranged list that uses the PowerShell syntax  into a simple list, then this would do

Converting from a list with ranges to a simple list in PowerShell

So to expand a string list, all we need to do is to convert the conventional range syntax, ‘1-100’, to the PowerShell syntax of ‘(1..100)’ with the complication of the possible minus sign.

Again, the –cmatch operation looks a bit complicated but it is just a precaution you’d want before executing a string.

Converting from a simple list to a list with ranges in PowerShell

To convert from a list of numbers to a range, the reverse of what we’ve achieved, we use a different technique entirely, exploiting a pipeline. This allows us to do two of the most important tasks, taking out duplicates and ensuring that the list is correctly ordered.  Having done that, we can examine each number and use a state machine to determine what action you take. It would be considerably easier without having to make a range only with three or more numbers in a sequence. This could be more condensed but might then be tricky to understand.

 

SQL Server

Converting  a simple list to a table

In SQL Server, a fast way of reading a simple list into a table is to do this…

The only sort of list that isn’t entirely foreign to SQL Server is the XML list, so we have converted a simple comma-delimited list into an XML list and thence into SQL Server, ending up with a table variable containing all the integers in a column. Because this is the primary key, we can guarantee that they are unique.

Converting from a simple list to a list with ranges in T-SQL in SQL Server

With this technique of creating a table we can now convert a list of integers to  the list with ranges. One can do this with window functions but it is complicated by the condition that a pair of numbers in sequence don’t merit a range, only three or more. I’ve chosen first  a quirky update, as it was faster.

If you don’t like the ‘Quirky update’ technique, here is the more conventional approach

So now we are half way to success. All we have to do now is to unwrap a list containing ranges!

Converting a list with ranges to  a simple list to in T-SQL (SQL Server)

There is a simple iterative approach to doing this that is fine with small ranges, but it is likely to run out of puff with the sort of  lists and ranges you could be faced with.

SQL has the syntax to cope with ranges, in the BETWEEN sub clause of the WHERE clause. The IN subclause also uses simple lists. What we need, therefore, to do is to do an inner join with a number table to get that useful table full of the members of the list. We finish off by converting that table column back into a list. The downside is that this means we can’t use a function because the use of INSERT..EXEC isn’t allowed in a function because the query optimiser can’t be sure that it would be deterministic. We have to make it a procedure. The disadvantage is therefore in that we can’t use it for a whole table of lists. In that case, the iterative method wins out. In this example, I create a number table quickly on the fly, so it is only useful for the range I’ve specified for the table. If you were to use this method, you’d probable use a permanent number table with the required range.

Testing

With any programming, the most difficult part is working out how to test what you’ve done for accuracy and performance. As well as the basic assertion tests, I needed, perhaps, ten thousand or so lists in a SQL Server table as a unit test. I felt pretty sure that a good scheme was to convert them to ranged form, convert them back again, and then compare. I then used SQL Data Generator with  the following very simple reverse-regex to add ten thousand rows in the ‘Before’ column.

As well as making performance testing easier , it was able to flush out some rather more subtle bugs. 

Here was the code I used for the test when using the function for getting the integer list from the ranged list.

Which compared the original unranged list into a list with ranges, and then back again, comparing it with the original. When we try to do the same with the stored procedure version, we have to resort to RBAR which is so slow that you can nip out for a cuppa and a chat while it is doing it.

Conclusion

I must admit to find myself doing odd things in SQL Server occasionally. It is obvious, from what I’ve shown you here that operations like these are much easier in a language such as C# or PowerShell, yet sometimes it is just handy to be able to do them in SQL Server. I confess that string lists of integers fascinate me a bit because they can prove to be useful to communicate with applications, but up to now,  I’ve always shied away from encouraging ranges. Now that one comes across ranges more often, thanks to BI, it is nice to be sure that, in or out of the database, ranged lists can be coped with. As always, I’d be delighted to be told of an easier, more convenient or quicker was of doing this in SQL Server. It is certainly nice enough to know that it can be done.

References

For more articles like this, sign up to the fortnightly Simple-Talk newsletter.

  • 5526 views

  • Rate
    [Total: 5    Average: 4.8/5]
  • Ananda Kumar J

    Hi Phill,
    I am big fan of your master skills in Programming. with the little knowledge i have i found another way to achieve the above requirement. Can the above SQL Code be written as below?

    DECLARE @list VARCHAR(80)
    DECLARE @UNION_LIST VARCHAR(8000)
    SELECT @list=’-3,0,1,4,5,6,7,8,9,11,14,19,20′
    SELECT @UNION_LIST = ‘SELECT ‘ + REPLACE(@LIST,’,’,’ UNION ALL SELECT ‘)
    IF OBJECT_ID(‘TEMPDB..#NUMBERS’) IS NOT NULL
    DROP TABLE #NUMBERS

    CREATE TABLE #NUMBERS (NUMBER INT, MINVALUE INT)

    INSERT INTO #NUMBERS (NUMBER)
    EXEC( @UNION_LIST)

    UPDATE N1
    SET MINVALUE = N1.NUMBER
    FROM #NUMBERS N1
    LEFT JOIN #NUMBERS N2 ON N2.NUMBER = N1.NUMBER -1
    WHERE N2.NUMBER IS NULL

    WHILE EXISTS ( SELECT 1 FROM #NUMBERS WHERE MINVALUE IS NULL)
    UPDATE TOP (1) N2
    SET MINVALUE = N1.MINVALUE
    FROM #NUMBERS N1
    INNER JOIN #NUMBERS N2 ON N2.NUMBER = N1.NUMBER + 1
    WHERE N2.MINVALUE IS NULL

    SELECT CONVERT(NVARCHAR(20),MIN(NUMBER)) + ISNULL(‘ – ‘ + CONVERT(NVARCHAR(20),NULLIF(MAX(NUMBER),MIN(NUMBER))), ”)
    FROM #NUMBERS
    GROUP BY MINVALUE

    • @Ananda
      Yes, there are some ingenious techniques here. You are well on the way to a solution.
      To get the sequential list:
      The major problem with the sequential list solution is that you can only make a range if there at least three sequential numbers. You are creating a range when there are only two.
      Using the UNION ALL or VALUES technique to split a list is a bit slower than XML but the major problem is that you can’t then use it in a function. INSERT…EXEC is forbidden because it makes the function non-deterministic.
      This is a disadvantage in use. This is why I provided the range-to-list routine as a function as well as the procedure. (the test for the function took 14 secs for 10,000 strings whereas the same test with a procedure took 8 Minutes!). The problem with the technique you are using to find the ranges is going to be slow because you are requiring an UPDATE for each integer in the range. If, for example, the range went like this ‘1-50′ that would be 50 updates!It is quicker to …
      1/ add an identity column
      CREATE TABLE #NUMBERS ([rank] INT IDENTITY (1,1), NUMBER INT, MINVALUE INT)
      2/ and then get the ranges with this
      SELECT number-[rank],MIN(Number),MAX(number), COUNT(*) FROM #numbers GROUP BY number-[rank]
      You’ll see a way around the problem with this!
      To get the numbers:
      This is a promising idea. You seem to have a problem with negative number ranges here,
      Msg 245, Level 16, State 1, Line 15
      Conversion failed when converting the nvarchar value ’10–8’ to data type int.
      (I added one to your sample data) but on the other hand, assuming you use a better way of shredding the input, it would avoid having the INSERT-EXEC, which would limit you to a procedure, so it is well worth persevering with this.

      Phil

  • Colin Daley

    I have read only as far as the defintion of a range expression. Suppose you had a sequence like this:
    -6, -3, -2, -1, 1, 3, 4, 5. Would the range be represented as -6,-3–1,1,3-5, i.e.6,31,1,35 (where and are the same character)? Does your code process such a range?

    • -6,-3–1,1,3-5. Yes, it does, well..er.. I hope so!

      A dash follows a numeric character whereas a minus never does. That is how you distinguish them