Kadane’s solution to the Maximum Subarray problem

Sometimes, you can be asked an awkward question about data that doesn't quite fit with the SQL way of thinking. Although window functions have made SQL more versatile, there are times when you have to reach for your favourite book of algorithms to code your way around an unusual reporting task

One interesting problem that you can come across occasionally is to find the contiguous sequence of numbers in a one-dimensional array of numbers that gives you the greatest sum value. Now that doesn’t sound particularly interesting until you are asked for a particular report like ‘How long was the period in which we were trading most profitably last year?’, or you maybe need to find the brightest part of an image. It is one of those techniques that like linear regression or ANOVA that you find all sorts of uses for once you understand it.

It is called the Maximum subarray problem, Kadane’s algorithm, The ‘Largest Sum Contiguous Subarray’ or the ‘Greatest subsequential sum’ algorithm. Learn it and you’ve aced four technical interview questions! I’ve never seen a SQL Solution.

Normally, the answer you’d give when you see a one-dimensional array of numbers like 45,12,32,90,48 would be ‘the whole lot contribute to the sum’, but that would assume that all the numbers were positive. What if they are not? They might, for example represent the difference from the mean. They might represent both input and output of a black-box system. Imagine that you were doing cashbook entries where cash was going in, and payments were going out, and you wanted to know the longest sequence of entries where you were accumulating money? What if you were doing just-in-time stocking of components and wanted to know those periods in which you were piling up components?

In the sequence of numbers −2, 1, −3, 4, −1, 2, 1, −5, 4 – the contiguous subarray with the largest sum is 4, −1, 2, 1, – with sum of 6.

for the sequence -1, 0, 15, 3, -9, 12, -4 – it would be 0, 15, 3, -9, 12 – with a sum of 21

for the sequence -1, -2, 3, 5, 6, -2, -1, 4, -4, 2, -2 – it would be 3, 5, 6, -2, -1, 4 – with a sum of 15

There are plenty of brute force ways of doing this but Kadane’s algorithm succeeds in being linear. This is given in the Wikipedia. Although it is used mostly for one-dimensional arrays, it will work with two-dimensional arrays as well if you simply sum up the rows, and then let the basic kadane algorithm run on the resulting one-dimensional array.

So how is this done in SQL? The simplest thing is to translate the algorithm into SQL directly, because it is very efficient. I’ve added to it slightly to give you the subarray as well as the sum. This works well-enough on small arrays and is linear. Ideally, you’d want a set-based method but I’ve never outgrown an iterative solution

Now there are several ways you might want to use this, but we want to have a version that we can unit-test easily as we work with the algorithm so we’d want to feed it lists whare the solution has been done already and see if it gives the correct solution. Normally, you’d probably take a table as a parameter and return a table result representing the subarray, but that is a bit messier to unit-test.

So there you have it.  As you can see, it is iterative, but grows linearly with the length of the array.  but you can use Quirky Update to speed it up considerably. I don’t know of a fast set-based algorithm but I’d be pleasantly surprised if you can come up with something faster than Kadane’s Algorithm.

  • 4716 views

  • Rate
    [Total: 3    Average: 5/5]
  • Shishir

    Last select query will not return for array (-2),(1),(-3),(4),(-1),(2),(-1),(-5),(4)

    I think we need to store start index value. I have modified code and added variable to store this.

    DECLARE @max_ending_here INT, –the total so far (the partial sum accumulator)

    @max_so_far INT, –The Maximum of the current solution

    @ii INT, –loop variable

    @iiMax INT, –length of the array

    @x INT, –the value of the current array element in the sequence (current row)

    @start INT, –the start point of the current solution

    @end INT, –the end – so far- of the current best solution

    @s int; — New variable to hold start index

    –so now we intialise all our variables

    SELECT @start = -1, @end = -1, @max_so_far = 0, @ii = 1,@s=1,

    @iiMax = MAX(#MaxSubarray.TheOrder)

    FROM #MaxSubarray;

    –iterate through the array (I know!)

    WHILE @ii @max_so_far

    SELECT @max_so_far = @max_ending_here, @end = @ii, @s = @start; — Storing to variable

    SELECT @ii = @ii + 1;

    –Now, if our accumulator has gone negative then we start again

    –at the next value following

    IF @max_ending_here < 0

    SELECT @max_ending_here = 0, @start = @ii;

    END;

    SELECT @max_so_far; — our highest positive sum fr4om a subsequence

    SELECT TheOrder,value –and list the subsequence that gave the highest sum

    FROM #MaxSubarray

    WHERE #MaxSubarray.TheOrder BETWEEN @s AND @end — change to @s

    ORDER BY #MaxSubarray.TheOrder;

    –if nothing in the array, the subarray is empty and the total is 0