Back in 2009 and 2010, Phil Factor hosted the SQL Speed Phreak Competition, in which programmers were challenged to construct the fastest solution to a series of common programming challenges. Unfailingly, the best solutions used set-based T-SQL techniques to achieve lightning-fast execution times, even when the data load scaled to millions of rows. Although fast, the solutions were often complex and hard to understand, and occasionally used slightly unorthodox techniques.
The Simple-Talk editors called in “Aunt Kathi” to help. My job was to explain as simply as I could the mechanics of these solutions, and so make them more accessible to developers who faced similar business problems, and needed a faster and more scalable solution than more conventional “row by row” processing techniques could offer. In each case, I also tried to offer an alternative solution that “learned” from the winning solution, but was simpler and therefore easier to maintain, but inevitably compromised a little on speed.
Skip forward a few years, and SQL Server 2012 has now introduced many powerful new T-SQL window functions and enhancements. Could they offer a simpler alternative to the original solutions that would match or perhaps even beat them on speed?
My previous article revisited the running total problem, where I proved that a solution that used the OVER clause and a ‘sliding window’ of data, with some aggregate functions, performed identically to the original solution, which relied on the unorthodox “query update” technique.
This article revisits the ‘FIFO (First In, First Out) Stock Inventory Problem’, a somewhat-simplified, but still realistic inventory accounting problem, where stock items are either purchased, sold, or returned, and where the oldest items in the queue of stock items (those that were the first to be purchased or returned) must be the first items sold. The requirement is to calculate the remaining count, and value, of each item after all stock transactions are performed.
The original winning solution from Dave Ballantyne is brilliant, but a brain twister for the uninitiated. It uses one statement comprised of several common table expressions (CTEs) and includes a couple of
CROSS APPLY type joins. Was there a simpler but equally-speedy window function-based solution? Let’s find out.
Understanding the FIFO Stock Inventory Problem
In this puzzle, stock items are purchased ( IN), returned ( RET), and sold ( OUT). Each type of stock item is identified by an
ArticleID. Each movement of stock in or out of the warehouse, due to a purchase, sale or return of a given type of stock, results in a row being added to the
Stock table, uniquely identified by a value in the
StockID identity column, and describing how many items of that type of stock were added or removed, the price for purchases, the date of the stock transaction, and so on.
The objective is to calculate the current value of the stock, by
ArticleID, after all transactions are complete. Prices are available only when items are purchased. If an item is returned, the most recent purchase price is used. Items are sold on a “first in, first out” basis, and the price to use is the price paid when the item was added to the inventory.
The following table shows an extract of data from the
Stock table, all relating to a single type of stock, and hopefully illustrating how the rules work.
|ArticleID||TranDate||TranCode||Items||Price||Current Count||Current Value||What Happened?|
|1001||2016-01-01||IN||10||$100||10||$1000||Added 10 at $100|
|1001||2016-01-02||IN||30||$101||40||$4030||Added 30 at $101|
|1001||2016-01-03||OUT||25||—||15||$1515||Added 10 at $100 and 15 at $101|
|1001||2016-01-04||RET||10||—||25||$2525||Added 10 at $101|
|1001||2016-01-05||IN||9||$99||34||$3416||Added 9 at $99|
|1001||2016-01-06||OUT||22||—||12||$1194||Sold 22 at $101|
The key to solving this puzzle is understanding that what is left after the transactions are complete is all that matters. You do not need to step through the transactions or even calculate the price at which items were sold.
I like to compare this puzzle to the common habit of cleaning out the spare change from one’s pockets at the end of each day. Instead of a change jar, imagine that you had a change queue on the dresser. Each night, you add the day’s change to the end of the queue. The next morning, you take some coins from the front of the queue if you think you would need them that day. If you want to know the current value of the coins, you only have to look at the existing coins in the queue to calculate the value. You do not need to know about the coins that have been previously removed. You only need to look at each remaining coin, figure out the value of each one, and add them up.
The FIFO Stock Inventory Problem is quite a bit more complex than coins on a dresser, but you still must concentrate on how many of each
ArticleID are left and the price when each item was added to stock. Here are the steps to solving the puzzle:
- Calculate a final count of the items by
ArticleID. This is easily done with an aggregate query. At the end of the transactions, there are 12 items left of
ArticleID1001 in the example above.
- Use a reverse running total of purchases and returns to figure out which transactions are responsible for the remaining items. A reverse running total is used because the purchases and returns at the end are responsible for the remaining stock.
- Find the date at which the reverse running total equals or exceeds the amount remaining at the end of all transactions. On 2016-01-04 the reverse running total, 19, exceeded the number remaining, 12.
- Sum the items from all transactions after that date, in this case the transaction on 2016-01-06, which was 9 items.
- On the date calculated in step 3, figure out how many of the items added to stock should be counted. This is the difference between the items remaining (12) and the previous running total (9). One way to calculate the previous reverse running total is by subtracting the items purchased or returned from the current reverse running total. Then the formula becomes Items Remaining – (Reverse Running Total – Items Purchased or Returned). For this example, that would be 12 – (19 – 10) which resolves to 3.
- Multiply remaining items by the price. For returned items, the price must be calculated. In the example above, the ten items returned are priced at $101 because that was the last purchase price.
|ArticleID||TrandDate||TranCode||Items||Price||Reverse Running Total||Source of Remaining Items|
Dave Ballantyne’s Original Winning Solution
Listing 1 shows Dave Ballantyne’s original solution. As part of his solution, he added several indexes, which was not against the rules. Also, evidently, creating the indexes didn’t count against the time to run the solution. On my laptop with 8 GB RAM, 4 procs and an SSD drive, the query took 1 second to run.
CREATENONCLUSTERED INDEX IX_Dave_General
INCLUDE ( Items, Price);
CREATENONCLUSTERED INDEX IX_Dave_Items
( ArticleID ASC,
TranDate ASC )
WHERE (TranCode IN('IN','RET'));
CREATENONCLUSTERED INDEX IX_Dave_Price
( ArticleID ASC,
INCLUDE ( Price)
AS (SELECT ArticleID ,
SUM(CASEWHEN TranCode ='OUT' THEN 0 - Items
GROUP BY ArticleID
AS (SELECT s.ArticleID,
( SELECT SUM(i.Items)
FROM dbo.StockAS i WITH( INDEX( IX_Dave_Items) )
WHERE i.ArticleID= s.ArticleID
AND i.TranCodeIN ( 'IN','RET' )
AND i.TranDate>= s.TranDate
) AS RollingStock,
s.Items AS ThisStock
FROM dbo.StockAS s
WHERE s.TranCodeIN ( 'IN','RET' )
/* Using the rolling balance above find the first stock movement in that meets (or exceeds) our required stock level */
/* and calculate how much stock is required from the earliest stock in */
AS (SELECT w.ArticleID,
+ LastPartialStock.StockToUseAS UseThisStock
FROM cteStockSum AS w
CROSS APPLY ( SELECTTOP ( 1)
z.ThisStockAS StockToUse ,
FROM cteReverseInSum AS z
WHERE z.ArticleID= w.ArticleID
AND z.RollingStock>= w.TotalStock
ORDER BY z.TranDateDESC
) AS LastPartialStock
/* Sum up the cost of 100% of the stock movements in after the returned stockid and for that stockid we need 'UseThisStock' items' */
y.TotalStockAS CurrentItems ,
SUM(CASEWHEN e.TranDate= y.TranDateTHEN y.UseThisStock
END * Price.Price)AS CurrentValue
FROM cteWithLastTranDate AS y
INNER JOIN dbo.StockAS e WITH( INDEX( IX_Dave_Items) )
ON e.ArticleID= y.ArticleID
AND e.TranDate>= y.TranDate
AND e.TranCodeIN ( 'IN','RET' )
CROSS APPLY (
/* Find the Price of the item in */SELECT TOP( 1 )
FROM dbo.StockAS p WITH
(INDEX ( IX_Dave_Price ) )
WHERE p.ArticleID= e.ArticleID
AND p.TranDate<= e.TranDate
AND p.TranCode= 'IN'
ORDER BY p.TranDate DESC
) AS Price
GROUP BY y.ArticleID ,
ORDER BY y.ArticleID;
You can find a complete description of this solution in my original article, but I’ll summarize the highlights. The solution starts with an aggregate query in
cteStockSum, calculating the total remaining stock (
TotalStock) of each
ArticleID, after all transactions have completed. It then uses a correlated sub-query to find the reverse running total of articles purchased or returned, in
Having calculated the final count and the reverse running total, you can find the date where the number of items at the end of the queue meets or exceeds the
TotalStock amount. This is done in the
cteLastTranDte CTE, using
CROSS APPLY and
1). On that date, you may need all of the items or just some of the items.
CROSS APPLY, all columns from the row can be returned. Dave takes advantage of this and figures out how many items are needed on that date. In the outer query, the value for each row is calculated by multiplying the items by the price. The query in the
CROSS APPLY returns the price for the latest purchase, which is then the price applied for any the returns in the remaining stock.
The Aunt Kathi Window Function Method
There is no one tool for the job when working with T-SQL, and using windowing functions is not an exception to this rule. I systematically tried replacing each of Dave’s calculations with the equivalent calculation using window function, but occasionally the performance suffered badly as a result.
Finally, I settled on the code shown in Listing 2, which is sort of a hybrid approach. On my laptop, the query took 2 seconds to run. I was never able to match the 1-second execution time of Dave’s original code.
AS (SELECT ArticleID ,
SUM(CASEWHEN TranCode ='OUT' THEN-Items
GROUP BY ArticleID
AS (SELECT StockID ,
SUM(Items)OVER (PARTITION BY ArticleID ORDER BY TranDate
ROWS BETWEEN CURRENTROW ANDUNBOUNDED FOLLOWING ) ReverseTotal
WHERE TranCode IN( 'IN','RET' )
AS (SELECT DISTINCT
LAST_VALUE(TranDate)OVER (PARTITION BY P.ArticleID ORDER BY TranDate
ROWS BETWEEN CURRENTROW ANDUNBOUNDED FOLLOWING )AS TheDate
FROM ItemEndTotal AS T
JOIN ReverseRunningTotal AS P
ON T.ArticleID= P.ArticleID
AND P.ReverseTotal>= T.FinalCount
SUM(CASEWHEN TheDate = TranDate
THEN FinalCount- ( ReverseTotal - Items )
END * PurchasePrice)AS Value
FROM ReverseRunningTotal RRT
ON RRT.ArticleID= FindDate.ArticleID
CROSS APPLY (SELECT TOP( 1 )
Price AS PurchasePrice
FROM ReverseRunningTotal AS R
WHERE RRT.ArticleID= R.ArticleID
AND TranCode= 'IN'
AND R.TranDate<= RRT.TranDate
ORDER BY TranDate DESC
) AS P
WHERE RRT.TranDate>= TheDate
GROUP BY RRT.ArticleID ,
ORDER BY RRT.ArticleID;
I started with an aggregate query in a CTE,
ItemEndTotal, which provides a final count of each
ArticleID. I could have eliminated the
ItemEndTotal CTE and instead used a windowing aggregate function in the second CTE, as follows:
SUM(CASE WHEN TranCode= 'OUT'THEN -ItemsELSE Items END)OVER(PARTITIONBY P.ArticleID)
This, unfortunately, caused the performance to suffer so I abandoned that idea.
ReverseRunningTotal CTE, I used an accumulating window aggregate to calculate the reverse running total. If you use this technique, be sure to specify the frame (
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) for the best performance. When running this step in isolation, the windowing function method out performs Dave’s method.
To find the date at which the reverse running total equals or exceeds the amount remaining at the end of all transactions, I used the
LAST_ROW function. The
LAST_ROW function allows you to access any column in the last row of a partition. I could have used the
LAST_ROW function multiple times to grab all of the other columns needed from that row and then calculated how many stock items would be needed on the date. However, nesting the
LAST_ROW expressions in a calculation made this part of the query extremely complex. Instead, I just performed the calculation in the outer query which ended up being much simpler. However, I believe this step is where my process took extra time over Dave’s solution. I needed to return only one row for each
ArticleID, so I had to use
GROUP BY would not have worked in this situation because aggregation happens before windowing functions. Nevertheless, the performance was still acceptable, so I left this windowing function in place.
The outer query also uses a
CROSS APPLY to find the prices for the returned items, which I just copied from Dave’s code. There is a way to use windowing functions to find the prices; Listing 3 shows a sample of the alternative code.
AS (SELECT ArticleID ,
MAX(CASEWHEN TranCode ='IN' THEN StockID
END)OVER (PARTITION BY ArticleID ORDER BY TranDate
ROWS BETWEEN UNBOUNDEDPRECEDING ANDCURRENT ROW) AS MaxID
WHERE TranCode IN( 'RET','IN' )
SELECT ArticleID ,
MAX(Price)OVER (PARTITION BY ArticleID, MaxID) AS ThePrice
This is a technique I learned from the great Itzik Ben-Gan. In the first level, I am using
MAX as an accumulating aggregate. I find the maximum
StockID for purchases only up to the current row. This means that the
MaxID for each return row will be the
StockID of the previous purchase row. In the outer query, by partitioning by
MaxID, the maximum price returned is the price I need. The technique is elegant but, in my tests, it slowed the query down quite a bit. The data had to be filtered first by the
TranCode and then sorted by
TranDate, so there is no way to create a good index for the query. Sorting is the bottleneck, and using this technique ends up adding several seconds to the overall time.
I still remember staring at Dave’s winning code for hours back in 2010 trying to figure out how his solution even worked. Eventually, I figured out that the stock items left, not the in and out transactions from the beginning, were what was important. In this article, I was able to apply windowing functions every step of the way. Unfortunately, I had to back out in a couple of places to get acceptable performance.
In my experience, windowing functions are great tools to help you solve complex business problems. Once you begin using them, you may find that you start coming up with set based solutions more quickly. Your queries will be easier to understand and maintain. Often, they will perform better than queries using traditional methods. Like anything else, you must test each solution to make sure that the performance is acceptable.