Click here to monitor SSC
  • Av rating:
  • Total votes: 13
  • Total comments: 0
Dwain Camps

Condensing a Delimited List of Integers in SQL Server

28 August 2013

In real-world applications, it often makes sense to show denormalized data, such as delimited lists, within the application's user interface. Dwain Camps shows why, and how, the distribution business stores information about 'islands' in sequences in order to track the status of the shipping of a consignment. It makes a great SQL puzzle.

In my work I support a couple of companies that are engaged in distribution of one kind or another. The distribution business is generally about moving boxes from point A to point B. A group of boxes going from one shipper location to a final consignee is called a shipment or a consignment. It is important to ensure that every box within the consignment is delivered, and to spot cases where boxes (components of a shipment) haven’t yet been passed through a particular checkpoint. During the lifetime of a consignment within the distribution network, each consignment goes through a sequence of statuses. For example, there’s one called “MDE” which is the time at which information from the consignment note (customer, addresses, etc.) is encoded into the application. Many of the consignment status codes are recorded by scanning a bar code, usually on a package, so there are two types of status: package and consignment level. Included in the package’s bar code is a unique package number within the consignment, e.g., 4 of 9.

When a consignment enters or leaves a distribution center, it is a general practice to scan each package to a different status code. The application contains various forms for tracking the status of a shipment. When the form displays a consignment with many packages, this can result in a huge number of scans (rows) in the history, so in practice what is done is to simply construct a delimited list of package numbers that were each scanned to a status.

When you’ve got 999 packages in a consignment, this will still result in a pretty long list of package numbers.

Both in commerce and science, we already have a way of representing ordered lists of ‘broken sequences.’ This allows us to describe islands of unbroken sequences economically as, for example, ‘2-8, 10-16, 18-24, 26-32, 34-40, 42-48. Even better, we can intermix these with the usual delimited list. ‘1, 2, 5-9’. The unbroken sequence from 1 to 100 would be very economically rendered as ‘1-100’, and it is easy to understand.

Sometimes it makes sense to use this representation of a sequence within an application. How can we handle this in SQL, turning it into a form that is easily manipulated by that relational system, and then back again into a text-based, delimited list? I’ll be showing you how in this article. I’ll first suggest how to condense a delimited list of integer package numbers into a list of unbroken ‘islands,’ show how to determine when these islands are missing packages and then go in the other direction by ‘exploding’ a condensed list.

If you’re a SQL enthusiast, you may want to look ahead to “The Final Word” section to see what algorithms will be combined in this article’s examples.

Data Setup

In order to explain and demonstrate the concept of condensing a delimited list, we’ll need to start by creating some sample data. For this, we’ll create two tables:

  1. The first will contain a row set where the primary key the consignment number (a “class” of items) and also the package (or item) number within the class.
  2. The second table will contain the same consignment number key but in its second column will be a delimited list of the packages that appeared as separate rows in our first table.

Let’s create and populate the first table (#ConsignmentHistory). For the purposes of our examples, we’re ignoring the status code that we mentioned above, or really what we’re doing is assuming that all of these examples are scans to the same status code.

-- Create our Consignment History table
CREATE TABLE #ConsignmentHistory
(
	ConsignmentNo      VARCHAR(20),
	PkgNo              INT,
    PRIMARY KEY (ConsignmentNo, PkgNo)
);

-- Insert some small sequences of packages
INSERT INTO #ConsignmentHistory
SELECT '2322',1 UNION ALL SELECT '2322',3 UNION ALL SELECT '2322',4
UNION ALL SELECT '2343',1 UNION ALL SELECT '2343',2 UNION ALL SELECT '2343',5 
UNION ALL SELECT '2343',6
UNION ALL SELECT '7611',1 UNION ALL SELECT '7611',2 UNION ALL SELECT '7611',4 
UNION ALL SELECT '7611',6  
UNION ALL SELECT '5195',7 UNION ALL SELECT '5195',1 UNION ALL SELECT '5195',2 
UNION ALL SELECT '5195',5;

-- Insert a longer sequence of packages with some gaps
WITH Tally (n) AS (
    SELECT number
    FROM [master].dbo.spt_values
    WHERE [Type] = 'P' AND Number BETWEEN 1 AND 50)
INSERT INTO #ConsignmentHistory
SELECT '4385', n
FROM Tally
WHERE 1 <> n%8;

-- Insert the longest sequence of packages with no gaps
WITH Tally (n) AS (
    SELECT number
    FROM [master].dbo.spt_values
    WHERE [Type] = 'P' AND Number BETWEEN 1 AND 100)
INSERT INTO #ConsignmentHistory
SELECT '6377', n
FROM Tally;

SELECT *
FROM #ConsignmentHistory
ORDER BY ConsignmentNo, PkgNo;

Note that for each of the consignment numbers except the last, there exist some gaps in the package numbers. We can use the familiar FOR XML PATH method within a correlated sub-query to populate our second table, where the second column contains an ordered delimited list of the PkgNos.

-- A second table to store our de-normalized views of packages
CREATE TABLE #ConsHistoryDelimited
(
	ConsignmentNo       VARCHAR(20) PRIMARY KEY,
	PkgNo               VARCHAR(8000),
    StartPkgNo          INT,
    EndPkgNo            INT,
    CondensedPkgNos     VARCHAR(8000),
    MissingPkgNos       VARCHAR(8000)
);

-- Construct the delimited list using the standard FOR XML PATH method
INSERT INTO #ConsHistoryDelimited (ConsignmentNo, PkgNo, StartPkgNo, EndPkgNo)
SELECT ConsignmentNo, STUFF((
    SELECT ', ' + CAST(PkgNo AS VARCHAR(5))
    FROM #ConsignmentHistory b
    WHERE a.ConsignmentNo = b.ConsignmentNo
    ORDER BY b.ConsignmentNo
    FOR XML PATH('')), 1, 2, '')
    -- Ignore these for now (we'll find a use for them later)
    ,MIN(PkgNo), MAX(PkgNo)
FROM #ConsignmentHistory a
GROUP BY ConsignmentNo;

SELECT ConsignmentNo, PkgNo
FROM #ConsHistoryDelimited
WHERE ConsignmentNo IN ('2322','2343','7611','5195');

For the time being, we will ignore the last 4 columns in the #ConsHistoryDelimited table, but we will be using them later. We can see that for the SELECTed results, our delimited lists are quite short:

ConsignmentNo  PkgNo
2322           1, 3, 4
2343           1, 2, 5, 6
5195           1, 2, 5, 7
7611           1, 2, 4, 6

However if we were to examine the results for the last 2 consignments, you’d see that those lists are quite long.

Creating a Condensed List of Delimited Integer Items

One objective of this article is to take a delimited list of integers such as the ones we’ve stored in #ConsHistoryDelimited and condense them, in order to reduce the length of the string. Why would we want to do such a thing you might ask? A fair enough question, but first let’s start with our desired results set.

ConsignmentNo  PkgNo
2322           1, 3-4
2343           1-2, 5-6
5195           1-2, 4, 6-7
7611           1-2, 5
4385           2-8, 10-16, 18-24, 26-32, 34-40, 42-48, 50
6377           1-100

Because our first few lists are quite short to begin with, we haven’t achieved much of a reduction in length. We do see a difference for consignment number 7611 and a significant difference for consignment numbers 4385 and especially 6377. Such a shortening of the list of items could be quite useful if you need to display that list in your application’s front end. The condensed version may actually be easier for a viewer to grasp, especially if they want to know whether the list is complete or not. We have specifically included a space after the commas, so that when you display the delimited list in a form, the long list of items can be wrapped to multiple lines easily.

In order to condense our list, we must draw upon the concept of Gaps and Islands, but more specifically we need to group ranges (islands) of contiguous package numbers, similar to the way SQL MVPJeff Moden demonstrated you can Group Islands of Contiguous Dates. This technique was also demonstrated (possibly pre-dating Jeff’s article) by SQL MVP Itzik Ben-Gan in Chapter 5 of the book SQL Server MVP Deep Dives. Let’s return to our first table and calculate the islands utilizing this approach, which I’ve also heard referred to as the “staggered rows” approach to calculating islands:

-- Display the "islands" (ranges) of contiguous packages for each consignment
WITH Islands AS (
    SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo)
    FROM (
        SELECT ConsignmentNo, PkgNo
            -- This rn represents the "staggered rows"
            ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo)
        FROM #ConsignmentHistory) a
    GROUP BY ConsignmentNo, rn)
SELECT ConsignmentNo, StartPkgNo, EndPkgNo
FROM Islands
ORDER BY ConsignmentNo, StartPkgNo;

We chose this approach because it is pretty efficient and after a bit of study, reasonably easy to grasp conceptually. Here are the results:

ConsignmentNo   StartPkgNo   EndPkgNo
2322            1            1
2322            3            4
2343            1            2
2343            5            6
4385            2            8
4385            10           16
4385            18           24
4385            26           32
4385            34           40
4385            42           48
4385            50           50
5195            1            2
5195            5            5
5195            7            7
6377            1            100
7611            1            2
7611            4            4
7611            6            6

To construct our condensed list of package numbers, we need to combine the above Islands with the FOR XML PATH technique to generate the delimited list that is our desired results set. The only thing different than when we used FOR XML PATH when setting up the #ConsHistoryDelimited, is to CASE the islands that consist of a single PkgNo as a single integer rather than ending up with something like 3-3, as follows:

-- Using the islands, put the delimited list with hyphenated package number 
-- ranges into our table
WITH Islands AS (
    SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo)
    FROM (
        SELECT ConsignmentNo, PkgNo
            -- This rn represents the "staggered rows"
            ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo)
        FROM #ConsignmentHistory) a
    GROUP BY ConsignmentNo, rn)
UPDATE a
SET CondensedPkgNos=STUFF(( 
    SELECT ', ' + 
        CASE -- Include either a single Item or the range (hyphenated)
            WHEN StartPkgNo = EndPkgNo THEN CAST(StartPkgNo AS VARCHAR(5))
            ELSE CAST(StartPkgNo AS VARCHAR(5)) + '-' + CAST(EndPkgNo AS VARCHAR(5))
            END
    FROM Islands b
    WHERE a.ConsignmentNo = b.ConsignmentNo
    ORDER BY StartPkgNo
    FOR XML PATH('')), 1, 2, '')
FROM #ConsHistoryDelimited a;

We have chosen to UPDATE the corresponding column in our #ConsHistoryDelimited table with the condensed list of package numbers. Examination of the SELECTed results shows they are identical to the correct results we wanted to see!

Condensing a List of Delimited Items

Perhaps we already have the delimited list of items to start with and we simply want to condense it. Now that we have an inkling of how to do it, we can easily use Jeff Moden’s “community,” delimited-string splitter, the well-known and very popular, best-of-breed DelimitedSplit8K FUNCTION. In order to run the next example, you’ll need to download and install it from the linked article.

-- Starting with the Delimited list, construct the condensed delimited list
WITH Islands AS (
    SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo)
    FROM (
        SELECT ConsignmentNo, c.PkgNo
            ,rn=c.PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY c.PkgNo)
        -- Changes to this CTE start here
        FROM #ConsHistoryDelimited a
        CROSS APPLY dbo.DelimitedSplit8K(a.PkgNo, ',') b
        CROSS APPLY (SELECT CAST(Item AS INT)) c(PkgNo)
        -- And end here
        ) a
    GROUP BY ConsignmentNo, rn)
SELECT ConsignmentNo, a.PkgNo, CondensedPkgNos=STUFF(( 
    SELECT ', ' + 
        CASE -- Include either a single Item or the range (hyphenated) 
            WHEN StartPkgNo = EndPkgNo THEN CAST(StartPkgNo AS VARCHAR(5))
            ELSE CAST(StartPkgNo AS VARCHAR(5)) + '-' + CAST(EndPkgNo AS VARCHAR(5))
            END
    FROM Islands b
    WHERE a.ConsignmentNo = b.ConsignmentNo
    ORDER BY StartPkgNo
    FOR XML PATH('')), 1, 2, '')
FROM #ConsHistoryDelimited a;

Note that only a very minor modification (look for the comment “Changes to this CTE start here”) was required to the Islands Common Table Expression (CTE) to use the delimited list stored in #ConsHistoryDelimited and split that string using DelimitedSplit8K. The Cascaded CROSS APPLY is only there to avoid a few extra CASTs of Item (column output by DelimitedSplit8K) to INT.

Another Case to Consider

So what’s missing? More specifically, perhaps we’d like to see a condensed list of those package numbers that are missing for each consignment. That is the other half of the Gaps and Islands class of problems. We’d like to know what the gaps are. Any of the many methods for calculating gaps will do, but we’ll choose a somewhat obscure method of calculating gaps from islands using CROSS APPLY VALUES. Using the Islands CTE that we already have, we can construct a query that will deliver a condensed delimited list of both Gaps and Islands:

-- Our Islands CTE creates the ranges of package numbers
WITH Islands AS (
    SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo)
    FROM (
        SELECT ConsignmentNo, PkgNo
            -- This rn represents the "staggered rows"
            ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo)
        FROM #ConsignmentHistory) a
    GROUP BY ConsignmentNo, rn)
    -- Convert the Islands to Gaps (missing) packages using CROSS APPLY VALUES
    ,Islands2Gaps AS (
        SELECT ConsignmentNo, GapStartPkgNo=MIN(PkgNo), GapEndPkgNo=MAX(PkgNo)
        FROM (
            SELECT ConsignmentNo, PkgNo
                ,rn=ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY (SELECT NULL))/2
            FROM Islands a
            CROSS APPLY (VALUES (StartPkgNo-1),(EndPkgNo+1)) b(PkgNo)) a
        GROUP BY ConsignmentNo, rn
        HAVING COUNT(PkgNo) = 2)
-- We use correlated subqueries with FOR XML PATH to show how Islands and Gaps
-- can be created at the same time.
SELECT ConsignmentNo
    ,CondensedItemNos=STUFF(( 
        SELECT ', ' + 
            CASE -- Include either a single package or the range (hyphenated) 
                WHEN StartPkgNo = EndPkgNo THEN CAST(StartPkgNo AS VARCHAR(5))
                ELSE CAST(StartPkgNo AS VARCHAR(5)) + '-' + CAST(EndPkgNo AS VARCHAR(5))
                END
        FROM Islands b
        WHERE a.ConsignmentNo = b.ConsignmentNo
        ORDER BY StartPkgNo
        FOR XML PATH('')), 1, 2, '')
    ,MissingPkgNos=STUFF(( 
        SELECT ', ' + 
            CASE -- Include either a single package or the range (hyphenated) 
                WHEN GapStartPkgNo = GapEndPkgNo THEN CAST(GapStartPkgNo AS VARCHAR(5))
                ELSE CAST(GapStartPkgNo AS VARCHAR(5)) + '-' + CAST(GapEndPkgNo AS VARCHAR(5))
                END
        FROM Islands2Gaps b
        WHERE a.ConsignmentNo = b.ConsignmentNo
        ORDER BY GapStartPkgNo
        FOR XML PATH('')), 1, 2, '')
FROM #ConsHistoryDelimited a;

Note that we have reverted to the Islands CTE that draws its data directly from the #ConsignmentHistory table. The results output from this query are:

ConsignmentNo  CondensedItemNos                             MissingPkgNos
2322           1, 3-4                                       2
2343           1-2, 5-6	                                    3-4
4385           2-8, 10-16, 18-24, 26-32, 34-40, 42-48, 50   9, 17, 25, 33, 41, 49
5195           1-2, 5, 7                                    3-4, 6
6377           1-100                                        NULL
7611           1-2, 4, 6                                    3, 5

If we prefer, we can UPDATE the missing packages into our table.

-- Our Islands CTE creates the ranges of package numbers
WITH Islands AS (
    SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo)
    FROM (
        SELECT ConsignmentNo, PkgNo
            -- This rn represents the "staggered rows"
            ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo)
        FROM #ConsignmentHistory) a
    GROUP BY ConsignmentNo, rn)
    -- Convert the Islands to Gaps (missing) packages using CROSS APPLY VALUES
    ,Islands2Gaps AS (
        SELECT ConsignmentNo, GapStartPkgNo=MIN(PkgNo), GapEndPkgNo=MAX(PkgNo)
        FROM (
            SELECT ConsignmentNo, PkgNo
                ,rn=ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY (SELECT NULL))/2
            FROM Islands a
            CROSS APPLY (VALUES (StartPkgNo-1),(EndPkgNo+1)) b(PkgNo)) a
        GROUP BY ConsignmentNo, rn
        HAVING COUNT(PkgNo) = 2)
-- Update the missing package numbers column of our #ConsHistoryDelimited table
UPDATE a
SET MissingPkgNos=STUFF(( 
    SELECT ', ' + 
        CASE -- Include either a single package or the range (hyphenated) 
            WHEN GapStartPkgNo = GapEndPkgNo THEN CAST(GapStartPkgNo AS VARCHAR(5))
            ELSE CAST(GapStartPkgNo AS VARCHAR(5)) + '-' + CAST(GapEndPkgNo AS VARCHAR(5))
            END
    FROM Islands2Gaps b
    WHERE a.ConsignmentNo = b.ConsignmentNo
    ORDER BY GapStartPkgNo
    FOR XML PATH('')), 1, 2, '')
FROM #ConsHistoryDelimited a;

SELECT ConsignmentNo, MissingPkgNos
FROM #ConsHistoryDelimited;

Exploding the Condensed Delimited List

To explode a condensed, delimited list is to simply expand out any of the entries that include a hyphen. To do this, we’ll once again use DelimitedSplit8K (to split on comma separators) and the familiar concept of a Tally table. Pretty much any Tally table will do, whether it is a true table in your database or an in-line Tally table like I’ve done. To keep performance under control when using an in-line Tally table, make sure you don’t generate any more tally rows than you need, so check what I’ve done with TOP.

-- Exploding a condensed, delimited list
WITH Tally (n) AS (
    -- Restrict the rows in our 0-based Tally table to only what we need using TOP
    SELECT 0 UNION ALL
    SELECT TOP ((
            SELECT MAX(EndPkgNo)-MIN(StartPkgNo)
            FROM #ConsHistoryDelimited)
        ) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b)
SELECT ConsignmentNo, PkgNo=n
FROM #ConsHistoryDelimited a
-- First CA to split the condensed list on the comma delimiter
CROSS APPLY dbo.DelimitedSplit8K(CondensedPkgNos, ',') b
-- Now retrieve the range or make the single entry into a range
CROSS APPLY (
    SELECT StartPkgNo=LEFT(Item, CHARINDEX('-', Item + '-') - 1)
        ,EndPkgNo=CASE CHARINDEX('-', Item) 
                        WHEN 0 THEN Item 
                        ELSE RIGHT(Item, LEN(Item) - CHARINDEX('-', Item))
                        END
    ) c
-- Apply our Tally table to the range (the resulting n is our PkgNo)
CROSS APPLY (
    SELECT n=n+c.StartPkgNo
    FROM Tally
    WHERE n+c.StartPkgNo BETWEEN c.StartPkgNo AND c.EndPkgNo) d
ORDER BY ConsignmentNo, n;

You didn’t really think I was going to try something fancy and inefficient there, now did you? This works exactly the same whether your list is the islands or the gaps (change CondensedPkgNos to MissingPkgNos as the argument to DelimitedSplit8K if you don’t believe me).

The Final Word

Displaying a condensed delimited list can be most useful in the presentation layer of an application, particularly in cases where you may have long integer lists that you need to fit into just a small bit of screen real estate. While we don’t necessarily recommend storing them, there are ways that they can be handled and their manipulation is relatively simple.

In this article, we’ve combined the following concepts in various ways throughout these examples:

  • Creating a delimited list by using FOR XML PATH in a correlated subquery
  • Splitting a delimited list by using DelimitedSplit8K
  • Cascading CROSS APPLYs to create intermediate calculated values
  • Calculating islands (ranges) of contiguous sequence numbers
  • Using the islands to calculate gaps in (or missing) sequence numbers
  • Using an in-line Tally table with rows limited to only what is needed

While I am no teacher, I do know that the best way you, my valued readers, can learn is by throwing out a challenge problem to which you can apply the skills you learned in this article. So here it is:

Convert the condensed, delimited list of missing (gaps between) package numbers to the included (islands) of package numbers. The challenge is to use the CROSS APPLY VALUES method for converting Gaps to Islands that is described here to make it happen. In that method, you must know StartPkgNo and EndPkgNo, and that is why they’ve been provided in the table for you.

Post your solution (or any variant on the challenge that suits you) to the discussion thread to show you’ve done your homework. If I don’t hear back from anybody in a reasonable period of time, I’ll post my solution to the challenge.

As always, thanks for your attention and I hope you found this article to be informative.

Dwain Camps

Author profile:

Dwain Camps has been a project manager for many years. Because performance of applications can be a critical success factor for projects, he has been evangelizing on the need to develop highly performing SQL. By mentoring and authoring articles on SQL, he hopes to train a future generation of software engineers on the right and wrong ways to deliver SQL code. He also has a special interest in developing solutions to complex, data intensive problems using high performance SQL because the declarative nature of SQL allows development of algorithmically unique solutions that procedural languages may not be capable of.

Search for other articles by Dwain Camps

Rate this article:   Avg rating: from a total of 13 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.
 

Phil Factor
Searching for Strings in SQL Server Databases

Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google.... Read more...

 View the blog

Top Rated

Continuous Delivery and the Database
 Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of... Read more...

The SQL Server Sqlio Utility
 If, before deployment, you need to push the limits of your disk subsystem in order to determine whether... Read more...

The PoSh DBA - Reading and Filtering Errors
 DBAs regularly need to keep an eye on the error logs of all their SQL Servers, and the event logs of... Read more...

MySQL Compare: The Manual That Time Forgot, Part 1
 Although SQL Compare, for SQL Server, is one of Red Gate's best-known products, there are also 'sister'... Read more...

Highway to Database Recovery
 Discover the best backup and recovery articles on Simple-Talk, all in one place. Read more...

Most Viewed

Beginning SQL Server 2005 Reporting Services Part 1
 Steve Joubert begins an in-depth tour of SQL Server 2005 Reporting Services with a step-by-step guide... Read more...

Ten Common Database Design Mistakes
 If database design is done right, then the development, deployment and subsequent performance in... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Reading and Writing Files in SQL Server using T-SQL
 SQL Server provides several "standard" techniques by which to read and write to files but, just... Read more...

Concatenating Row Values in Transact-SQL
 It is an interesting problem in Transact SQL, for which there are a number of solutions and... Read more...

Why Join

Over 400,000 Microsoft professionals subscribe to the Simple-Talk technical journal. Join today, it's fast, simple, free and secure.