The SQL of Gaps and Islands in Sequences

Some SQL problems are intriguing because, just when good methods emerge and are accepted, other alternative solutions are discovered. The fun of exploring problems such as 'Gaps and Islands' is all the greater when we have a thorough test-harness to try out the alternative solutions.

The word ‘Gaps’ in the title refers to gaps in sequences of values. Islands are unbroken sequences delimited by gaps. The ‘Gaps and Islands’ problem is that of using SQL to rapidly detect the unbroken sequences, and the extent of the gaps between them in a column.

Islands and gaps appear in all sorts of sequences, be they IDENTITY columns where some rows have been removed or dates that occur in sequence (but some are missing).  In all cases, the sequences do not contain duplicates. The ‘Gaps and Islands’ problem isn’t entirely an academic game, since a number of business processes demand some way of detecting gaps and islands in sequences. A typical example might occur in the express distribution business where a consignment has many packages numbered sequentially.  Typically you scan all packages when a consignment reaches the depot.  If packages are missing those represent the gaps.  So if you want to represent the event “consignment arrived at the depot” and list the package numbers that did arrive, you’d want to group them like 1-10, 15-18, etc. where those are the islands.

It is complex and interesting task to come up with a solution that performs and scales well.  Chapter 5 of SQL Server MVP Deep Dives, Gaps and Islands by Itzik Ben-Gan, is probably the best and most thorough explanation of the main solutions.  My interest was piqued by that article and I played about with the examples to try to gain a little better insight.  While doing so, I happened upon a rather different solution to the problem, and was sufficiently intrigued to develop a test harness to evaluate the way that the various algorithms performed and scaled.

Let’s dip our toe in the pond by looking at some sample data.

We’ve thrown in an ID column because you may need to retrieve islands and/or gaps across more than one record grouping.  However, this value is not used in the basic examples, but will be used in the later performance test harnesses. This produces a table of sample data (on the left) and I’ve added the results we should expect for islands (in the middle) and gaps (on the right):




ID SeqNo

1  1

1  2

1  5

1  6

1  8

1  9

1  10

1  12

1  20

1  21

1  25

1  26

ID StartSeqNo EndSeqNo

1  1          2

1  5          6

1  8          10

1  12         12

1  20         21

1  25         26

ID StartSeqNo EndSeqNo

1  3          4

1  7          7

1  11         11

1  13         19

1  22         24

In our example, islands are contiguous groups of integers.  In a way, gaps are the inverse of islands, as they are the endpoints between the islands. 

It is interesting to note that there will always be one less row of gaps than islands.

Some Islands Solutions

In the SQL MVP Deep Dives book, the author provides four solutions to the islands problem.  From his timing results, it seems that two of these seem to be much better than the others, based on the measure of elapsed time only.  Let’s take a look at these two.

We have modified the code provided to support the ID column, which may reflect different user IDs, machine IDs, or whatever for your particular case.  When you run this code, you’ll find that the results precisely match the expected results for islands shown in the table above.  It is important to note that in at least the second case, the sequence numbers must be unique within an ID in order for the solution to work.

The following performance was achieved by these two solutions across 1,000,000 rows, with the index being essential to achieving these results.  This basically confirms the findings in the book: solution #3 is faster.


Test Harness – 1,000,000 Rows


CPU (ms)

Elapsed Time (ms)

Logical IOs

Islands #1




Islands #3




In the resources file, you may run the script: Islands Test Harness #1.sql to confirm these results for yourself and see how they may vary based on your machine’s configuration.

The Traditional Gaps Solutions

The gaps problem, in my mind at least, is intrinsically a bit more challenging than islands because, in effect you need to “make up” the data points that are the endpoints of each gap; meaning that by definition for the gaps, the endpoints of the gap do not already exist in the data.

As I thumb through my extensively dog-eared copy of the SQL MVP Deep Dives book, I find there are four solutions proposed for gaps.  Of these, there are 3 which seem to perform to relatively the same order of magnitude when it comes to elapsed time.

Once again a check of the results confirms that they are identical to those shown in the preceding table for gaps.  Test harness Gaps Test Harness #1.sql confirms that solution #1 from SQL MVP Deep Dives is the fastest.


Test Harness – 1,000,000 Rows


CPU (ms)

Elapsed Time (ms)

Logical IOs

Gaps #1




Gaps #2




Gaps #3




An Alternative Gaps Solution

Since we have relatively a fast method of determining islands, perhaps it’s possible to convert that solution into one that will identify our gaps.  Let’s take a look at the islands results to see how it might be converted into gaps.


If we first insert two columns and fill the first with one less than the StartSeqNo and the second with one greater than the EndSeqNo, we can circle the numbers that represent the gaps.  Each pair of colored, circled numbers in the rightmost table perfectly represents our gaps.

This suggests to us that UNPIVOTing the islands might give us something to work with to arrive at the gaps.  There are at least three methods to un-pivot two columns: 1) using the SQL UNPIVOT keyword, 2) doing a UNION ALL between each of the columns and 3) using the CROSS APPLY VALUES (CAV) approach to UNPIVOT.  We’ll choose the latter because as this article: An Alternative (Better?) Method to UNPIVOT seems to demonstrate, that approach may be the swiftest.  We’ll also move the fastest islands bit of code magic into a Common Table Expression (CTE) so we can focus on our gaps solution.

We’ve thrown in a couple of row numbers to see if they might help us with the grouping we need.  These results are:

We certainly have the numbers in SeqNo that represent the endpoints of our gap and we even have a grouping column (m) that will allow us to group our gap start/end points, which we arrived at by the totally simplistic method of dividing by two.  Now all we have to do is figure out a way to remove the first and last row from the result set!

Since all the gap endpoints of interest are in groups of two, this brings to mind the HAVING clause.  Let’s give that a try.

Inspection of the results set clearly shows we’ve achieved our expected results – the gaps!

Performance Comparison of Gaps Solutions

Performance always counts so let’s see how well this solution performs in a sufficiently large test harness versus the solutions proposed in the MVP Deep Dives book. 

In the two test harness scripts we mentioned previously, the code to generate a large test data set looks like this:

While reviewing this article, Peter Larsson (alias PESO) suggested an alternative using CROSS APPLY VALUES that may improve the speed.  So let’s take a look at that method also.

Using the gaps solutions identified previously, a quick check of the results at 1,000,000 rows delivers these results (run Gaps Test Harness #2.sql).


Test Harness – 1,000,000 Rows


CPU (ms)

Elapsed Time (ms)

Logical IOs

SQL MVP Deep Dives Gaps #1




SQL MVP Deep Dives Gaps #2




SQL MVP Deep Dives Gaps #3




CROSS APPLY VALUES – Islands to Gaps




CAV – Islands to Gaps (by PESO)




Elapsed times for the new CAV (Cross Apply Values) methods appear quite competitive, and both logical IOs and CPU time appear improved.  In a moment we’ll see if these results can be reproduced and how they scale.

An Alternative Islands Solution

If it is possible to convert Islands to Gaps, perhaps it is also possible to convert Gaps to Islands.  Let’s consider a data transformation as follows:


For this solution, we’ll take the end point and add one, while we subtract one from the start point.  We need to add a row, and that row will be the first and last sequence number in our series.  Finally, we’ll un-pivot and add a row number that groups each resulting sequence number in pairs (note that you could either add 1 to or subtract 1 from the row number before dividing by 2).

So now, skipping the intermediate explanatory step as it is quite similar to what we did before, we can arrive at the following code to calculate islands from gaps:

The Gaps CTE above is the SQL MVP Deep Dives fastest (elapsed time) approach to gaps and the MinMax CTE is used to compute the additional row we’ll need to go from gaps to islands.

So how do these results compare to our fastest islands solutions at 1,000,000 rows?


Test Harness – 1,000,000 Rows


CPU (ms)

Elapsed Time (ms)

Logical IOs

SQL MVP Deep Dives Islands #1




SQL MVP Deep Dives Islands #3




CROSS APPLY VALUES – Gaps to Islands




The SQL MVP Deep Dives Islands solution #3 still wins the elapsed time race but our approach is a respectable showing, being solidly in second place in terms of elapsed time.  It appears that parallelism accounts for this positioning, as CPU for the CAV – Gaps to Islands solution exceeds the elapsed time.  These results may be reproduced by running Islands Test Harness #2.sql.


Using each #2 test harness, we’ll generate test data from 200,000 to 2,000,000 rows and chart the results for comparison.  You’ll see that the final results appear to be a bit mixed.




For the CAV (Cross Apply Values) solutions for islands to gaps, their ranking appears as #1 for CPU across the board, although it is a slim lead and the convergence at 2M rows suggests that it may start lagging behind above that.  For elapsed time, the CAV solutions appear to be basically tied with MVP DD S#1 up to about 1.2M rows, but then they fall to second place ranking above that row count.  In logical IOs they are the clear winner across the board (the SQL MVP Deep Dive solution #2 was omitted due to scaling of the chart).  MVP DD S#3 does get good marks for logical IOs.




For the CAV (Cross Apply Values) solution of gaps to islands, it manages to achieve a #2 ranking for CPU above 1M rows but lagged slightly behind prior to that.  For elapsed time, above 1.8M rows it seems to have achieved the #1 ranking of the 3 solutions (below 1.8M it is #2), but our scaling tests don’t go above 2M rows so we’re not sure how long it would maintain that lead.  However on logical IOs, it is a clear #3 (and would probably remain so) regardless of the number of rows in the test harness.

It also appears that for the CAV gaps to islands solution, we lost the parallelism that resulted in the highly favorable elapsed time result, possibly because SQL had cached a non-parallelized execution plan at the lower row counts.

Ultimately the CAV (Cross Apply Values) approaches may offer benefits under some conditions but your choice should be made based on a careful analysis of which solution works best for your constraints of data rows, parallelism and the specific resource that you most wish to conserve (CPU, elapsed time or logical IOs).

A Google search for anything similar to an approach like this for calculating islands and gaps came up empty, but we’d love to hear if anyone has published anything on it before or has other improvements to suggest.  And we’d most certainly be thrilled to hear from our valued readers if this solution has been of benefit to them. Perhaps in SQL 2012 someone can come up with something faster utilizing a window frame.  We look forward to hearing about that too.

 Timing tests run for this article were done in SQL Server 2008 R2 on a Dell Inspiron Core i5 2.4 GHz with 8GB of memory. 

 Until next time

The Excel file with the results of the test, and the graphs, is attached, as are the four SQL source files for the tests. Links to them can be found at the head of this article


Tags: ,


  • Rate
    [Total: 1    Average: 5/5]
  • Torben Nehmer

    A few sidenotes

    first, a quite interesting article you wrote here. What I miss however, are these things:

    First, you test scaling over a quite limited range, just one magnitude. However, things get really interesting if you add another (or even two) magnitudes. Getting Gaps in 200k rows is easy given potent hardware. Here, I have the problem to locate gaps of sizes between 2 and 2 Million in a dataset of approximately 400 million rows.

    What I have been using (thanks to the MS Press T-SQL Querying book) is approximately this:


    This is the best solution I could find so far in terms of IO (which is the limiting factor in such large datasets). The actual gap search is a pair of inexpensive Table Scans over the base table plus a single quick index seek to find the end of the range. This scales relativly well even in the millions of gaps but is especially fast if you do incremental processing – I need only a few thousand gaps at a time starting at a specified location in the sequence.

    The real key solution I a missing however is the new SQL 2012 way of life. Using window functions, the gaps problem gets really trivial and is downgraded to a single scan of the target table/index:


    The performance in both CPU-time, wall clock time and IO load is second to none.

    Best regards,

  • Torben Nehmer

    A few sidenotes II
    Hi again,

    I did a few more comparisons with the algorithms presented here.

    While CAV – Islands to Gaps (by PESO) is indeed interesting from the basic idea (it requires only a single scan of the data), the amount of sorting and aggregating required actually is contraproductive if you need only the first 10k gaps (which I use as an example, as mentioned above).

    PESOs Solution requires the Server to compute the complete resultset regardless what to do, moving around 7.5 GB of data in my test set with several hundred million rows. I cancelled the test run after 5 minutes, where the solution outlined above needs just two or three seconds to do the work.

    MVP Deep Dives Gaps #3 exhibits the same problem when limited by a TOP statement albeit for other reasons (I am still waiting for a actual execution plan to deduce this).

    Ultimately, it seems that the problem has quite a few facets depending on a) how big your dataset is and b) whether you always need the full results or just a window into the resultset.

    The first point is of particular importance, I guess, as all algorithms scale well here if I limit the dataset to a million rows.

    Best regards,

  • Dwain Camps

    RE: A few side notes
    Hi Torben,

    Thanks for taking a look.

    Yes, I did operate over a limited number of rows in my test harness. No excuses for that. For such large row sets as you are dealing with, testing any proposed solution is always an imperative and I apologize if I didn’t give you sufficient results to infer scalability to those heights.

    I wish I had SQL 2012 to use the solution you posted. I am aware of it but there are probably a few folks out there that haven’t upgraded yet where this solution might be useful (I hope).


    Looking for Gaps using SQL Server 2012 Lag function
    I have no idea how this compares for performance with you examples, however the new LAG() function introduced in 2012 certainly is easier to read.

    SELECT HDR_line , lag(hdr_line,1,0) OVER(ORDER BY hdr_line) AS lag_hdr
    FROM dbo.Claim_hdr
    ) dt
    WHERE hdr_line <> lag_hdr +1

  • Anonymous

    non-problem looking for a human to give it significance
    How about this? "sequences" should only be surrogate keys and never "human readable" references. To me this whole exercise is about a human wanting the computer to make sense. Use your business key to group your packages, let the computer use the sequence to group/separate its "packages".

  • Dwain Camps

    non-problem looking for a human to give it significance
    For a non-problem, there certainly is a lot of extant literature on solving it.

    Whether by design or lack thereof, frequently there are times when it is necessary to do this silly kind of stuff.

    Thankfully, it can be done and has been made a bit more intuitive in SQL 2012.

  • pavan6e

    Not solution for similar case, where I deal with Time in HH:MM format
    I have a table to record some activity with start time and end time in HH:MM (24 hr clock) format. Can you please help me with the query to get the gaps?? I have tried as mentioned below.

    SELECT ID, StartSeqNo=CONVERT(VARCHAR(5),(dateadd(mi, 15,(cur))),114), EndSeqNo=convert(varchar(5),datediff(MINUTE,1,nxt),114)
    FROM (
    SELECT ID, cur=starttime, nxt=(
    SELECT top 1 endtime FROM dbo.ABC AS B
    WHERE B.ID = A.ID AND B.endtime > A.starttime and personid=’XYX’
    and weekdate=’2013-09-09 00:00:00.000′ order by weekdate,starttime desc,endtime desc)
    FROM dbo.ABC AS A where personid=’XYX’
    and weekdate=’2013-09-09 00:00:00.000′) AS D
    WHERE cast(datediff(MINUTE,nxt,cur) as numeric(6,2))/60.00 >= 0;

    This is not giving me the expected result. Can you please help me with this?

  • Dwain.C

    Looking for Gaps using SQL Server 2012 Lag function
    Re: the LAG solution that eklein suggested, you might want to take a look at Wayne Sheffield’s blog comparison. He tested LEAD against a couple of other solutions and I added CROSS APPLY VALUES (islands to gaps) to his test harness and also tried LAG. Note that I just found this blog post:

    The LEAD solution was not a contender and I found that LAG had a similar performance characteristic in his test harness.

  • Dwain.C

    Not solution for similar case, where I deal with Time in HH:MM format
    Regarding an equivalent Islands to Gaps solution for time, the answer is yes this will work.

    The only difference is that you don’t add/subtract one from the islands start/end points. You either add/subtract 0 or 3 milliseconds (for a DATETIME), depending on your intent.

    To suggest a specific query for your case, I’d need you to supply some DDL and test data, and I’d be happy to do so.

  • Dwain.C

    RE: Looking for Gaps using SQL Server 2012 Lag function
    I ran across the following performance test of SQL LEAD as a gaps solution against a couple of others. LAG would be expected to perform similarly.

    I’ve personally tested LAG and LEAD against the solutions I proposed here and they weren’t even close performance-wise. Very nice, tight syntax though.

  • Jason G

    Thanks so much for this article. Being someone still on a 2008R2 box, this is still very relevant and helpful.