28 August 2013

Condensing a Delimited List of Integers in SQL Server

In real-world applications, it often makes sense to show denormalized data such as delimited lists within the application's user interface. Dwain Camps shows why, and how, the distribution business stores information about 'islands' in sequences in order to track the status of the shipping of a consignment. It makes a great SQL puzzle

In my work I support a couple of companies that are engaged in distribution of one kind or another. The distribution business is generally about moving boxes from point A to point B. A group of boxes going from one shipper location to a final consignee is called a shipment or a consignment. It is important to ensure that every box within the consignment is delivered, and to spot cases where boxes (components of a shipment) haven’t yet been passed through a particular checkpoint. During the lifetime of a consignment within the distribution network, each consignment goes through a sequence of statuses. For example, there’s one called “MDE” which is the time at which information from the consignment note (customer, addresses, etc.) is encoded into the application. Many of the consignment status codes are recorded by scanning a bar code, usually on a package, so there are two types of status: package and consignment level. Included in the package’s bar code is a unique package number within the consignment, e.g., 4 of 9.

When a consignment enters or leaves a distribution center, it is a general practice to scan each package to a different status code. The application contains various forms for tracking the status of a shipment. When the form displays a consignment with many packages, this can result in a huge number of scans (rows) in the history, so in practice what is done is to simply construct a delimited list of package numbers that were each scanned to a status.

When you’ve got 999 packages in a consignment, this will still result in a pretty long list of package numbers.

Both in commerce and science, we already have a way of representing ordered lists of ‘broken sequences.’ This allows us to describe islands of unbroken sequences economically as, for example, ‘2-8, 10-16, 18-24, 26-32, 34-40, 42-48. Even better, we can intermix these with the usual delimited list. ‘1, 2, 5-9’. The unbroken sequence from 1 to 100 would be very economically rendered as ‘1-100’, and it is easy to understand.

Sometimes it makes sense to use this representation of a sequence within an application. How can we handle this in SQL, turning it into a form that is easily manipulated by that relational system, and then back again into a text-based, delimited list? I’ll be showing you how in this article. I’ll first suggest how to condense a delimited list of integer package numbers into a list of unbroken ‘islands,’ show how to determine when these islands are missing packages and then go in the other direction by ‘exploding’ a condensed list.

If you’re a SQL enthusiast, you may want to look ahead to “The Final Word” section to see what algorithms will be combined in this article’s examples.

Data Setup

In order to explain and demonstrate the concept of condensing a delimited list, we’ll need to start by creating some sample data. For this, we’ll create two tables:

  1. The first will contain a row set where the primary key the consignment number (a “class” of items) and also the package (or item) number within the class.
  2. The second table will contain the same consignment number key but in its second column will be a delimited list of the packages that appeared as separate rows in our first table.

Let’s create and populate the first table (#ConsignmentHistory). For the purposes of our examples, we’re ignoring the status code that we mentioned above, or really what we’re doing is assuming that all of these examples are scans to the same status code.

Note that for each of the consignment numbers except the last, there exist some gaps in the package numbers. We can use the familiar FOR XML PATH method within a correlated sub-query to populate our second table, where the second column contains an ordered delimited list of the PkgNos.

For the time being, we will ignore the last 4 columns in the #ConsHistoryDelimited table, but we will be using them later. We can see that for the SELECTed results, our delimited lists are quite short:

However if we were to examine the results for the last 2 consignments, you’d see that those lists are quite long.

Creating a Condensed List of Delimited Integer Items

One objective of this article is to take a delimited list of integers such as the ones we’ve stored in #ConsHistoryDelimited and condense them, in order to reduce the length of the string. Why would we want to do such a thing you might ask? A fair enough question, but first let’s start with our desired results set.

Because our first few lists are quite short to begin with, we haven’t achieved much of a reduction in length. We do see a difference for consignment number 7611 and a significant difference for consignment numbers 4385 and especially 6377. Such a shortening of the list of items could be quite useful if you need to display that list in your application’s front end. The condensed version may actually be easier for a viewer to grasp, especially if they want to know whether the list is complete or not. We have specifically included a space after the commas, so that when you display the delimited list in a form, the long list of items can be wrapped to multiple lines easily.

In order to condense our list, we must draw upon the concept of Gaps and Islands, but more specifically we need to group ranges (islands) of contiguous package numbers, similar to the way SQL MVPJeff Moden demonstrated you can Group Islands of Contiguous Dates. This technique was also demonstrated (possibly pre-dating Jeff’s article) by SQL MVP Itzik Ben-Gan in Chapter 5 of the book SQL Server MVP Deep Dives. Let’s return to our first table and calculate the islands utilizing this approach, which I’ve also heard referred to as the “staggered rows” approach to calculating islands:

We chose this approach because it is pretty efficient and after a bit of study, reasonably easy to grasp conceptually. Here are the results:

To construct our condensed list of package numbers, we need to combine the above Islands with the FOR XML PATH technique to generate the delimited list that is our desired results set. The only thing different than when we used FOR XML PATH when setting up the #ConsHistoryDelimited, is to CASE the islands that consist of a single PkgNo as a single integer rather than ending up with something like 3-3, as follows:

We have chosen to UPDATE the corresponding column in our #ConsHistoryDelimited table with the condensed list of package numbers. Examination of the SELECTed results shows they are identical to the correct results we wanted to see!

Condensing a List of Delimited Items

Perhaps we already have the delimited list of items to start with and we simply want to condense it. Now that we have an inkling of how to do it, we can easily use Jeff Moden’s “community,” delimited-string splitter, the well-known and very popular, best-of-breed DelimitedSplit8K FUNCTION. In order to run the next example, you’ll need to download and install it from the linked article.

Note that only a very minor modification (look for the comment “Changes to this CTE start here”) was required to the Islands Common Table Expression (CTE) to use the delimited list stored in #ConsHistoryDelimited and split that string using DelimitedSplit8K. The Cascaded CROSS APPLY is only there to avoid a few extra CASTs of Item (column output by DelimitedSplit8K) to INT.

Another Case to Consider

So what’s missing? More specifically, perhaps we’d like to see a condensed list of those package numbers that are missing for each consignment. That is the other half of the Gaps and Islands class of problems. We’d like to know what the gaps are. Any of the many methods for calculating gaps will do, but we’ll choose a somewhat obscure method of calculating gaps from islands using CROSS APPLY VALUES. Using the Islands CTE that we already have, we can construct a query that will deliver a condensed delimited list of both Gaps and Islands:

Note that we have reverted to the Islands CTE that draws its data directly from the #ConsignmentHistory table. The results output from this query are:

If we prefer, we can UPDATE the missing packages into our table.

Exploding the Condensed Delimited List

To explode a condensed, delimited list is to simply expand out any of the entries that include a hyphen. To do this, we’ll once again use DelimitedSplit8K (to split on comma separators) and the familiar concept of a Tally table. Pretty much any Tally table will do, whether it is a true table in your database or an in-line Tally table like I’ve done. To keep performance under control when using an in-line Tally table, make sure you don’t generate any more tally rows than you need, so check what I’ve done with TOP.

You didn’t really think I was going to try something fancy and inefficient there, now did you? This works exactly the same whether your list is the islands or the gaps (change CondensedPkgNos to MissingPkgNos as the argument to DelimitedSplit8K if you don’t believe me).

The Final Word

Displaying a condensed delimited list can be most useful in the presentation layer of an application, particularly in cases where you may have long integer lists that you need to fit into just a small bit of screen real estate. While we don’t necessarily recommend storing them, there are ways that they can be handled and their manipulation is relatively simple.

In this article, we’ve combined the following concepts in various ways throughout these examples:

  • Creating a delimited list by using FOR XML PATH in a correlated subquery
  • Splitting a delimited list by using DelimitedSplit8K
  • Cascading CROSS APPLYs to create intermediate calculated values
  • Calculating islands (ranges) of contiguous sequence numbers
  • Using the islands to calculate gaps in (or missing) sequence numbers
  • Using an in-line Tally table with rows limited to only what is needed

While I am no teacher, I do know that the best way you, my valued readers, can learn is by throwing out a challenge problem to which you can apply the skills you learned in this article. So here it is:

Convert the condensed, delimited list of missing (gaps between) package numbers to the included (islands) of package numbers. The challenge is to use the CROSS APPLY VALUES method for converting Gaps to Islands that is described here to make it happen. In that method, you must know StartPkgNo and EndPkgNo, and that is why they’ve been provided in the table for you.

Post your solution (or any variant on the challenge that suits you) to the discussion thread to show you’ve done your homework. If I don’t hear back from anybody in a reasonable period of time, I’ll post my solution to the challenge.

As always, thanks for your attention and I hope you found this article to be informative.

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter


This post has been viewed 12144 times – thanks for reading.

  • Rate
    [Total: 14    Average: 4.6/5]
  • Share

Dwain Camps

View all articles by Dwain Camps

Related articles

Also in Database Administration

The SQL Server 2016 Query Store: Forcing Execution Plans using the Query Store

The SQL Server 2016 Query Store can give you valuable performance insights by providing several new ways of troubleshooting queries, studying their plans, exploring their context settings, and checking their performance metrics. However, it can also directly affect the performance of queries by forcing Execution Plans for specific queries.… Read more

Also in SQL

Relational Algebra and its implications for NoSQL databases

With the rise of NoSQL databases that are exploiting aspects of SQL for querying, and are embracing full transactionality, is there a danger of the data-document model's hierarchical nature causing a fundamental conflict with relational theory? We asked our relational expert, Hugh Bin-Haad to expound a difficult area for database theorists.… Read more

Also in SQL Server

SQL Server System Functions: The Basics

Every SQL Server Database programmer needs to be familiar with the System Functions. These range from the sublime (such as @@rowcount or @@identity) to the ridiculous (IsNumeric()) Robert Sheldon provides an overview of the most commonly used of them.… Read more

Also in Database Administration

3 Built-In Technologies for Centralizing SQL Server Administration

DBAs can be more effective in managing their workload by centralizing their procedures. There are several features of SQL server that can be harnessed to this end: PowerShell is only part of the solution: there is also Central Management Server, Master /Target Agent and the Remote Server Administration Tools. It's time to work out your objectives and pick the most suitable technologies to meet them. … Read more

Join Simple Talk

Join over 200,000 Microsoft professionals, and get full, free access to technical articles, our twice-monthly Simple Talk newsletter, and free SQL tools.

Sign up