Bowled Over by SQL Window Functions

What better way to learn how to construct complex CHECK CONSTRAINTs, use the SQL 2012 window frame capability of the OVER clause and LEAD analytic function, as well as how to pivot rows into columns using a crosstab query? Create the SQL code to score Ten-pin Bowling, of course. Dwain Camps explains the how, what and why.

There is nothing like a real task to illustrate how to use the SQL 2012 LEAD analytical function, the new window frame facility of the OVER clause and a bit about constructing CHECK CONSTRAINTs.

When I was growing up, one of my favorite sports was bowling. It was only later in life after I moved overseas that I realized that bowling actually has another connotations, both in the game of cricket and in lawn Bowls. Bowling is, in Europe called Ten-pin bowling or skittles.

 In bowling, the way that the game is scored is rather complicated, with some arcane conventions. Because of this, it makes a good example to illustrate the power of the analytic functions in SQL 2012, especially when used together with constraints.

Scoring in Bowling

In bowling, the objective is to roll a ball down the lane and knock down all of the ten pins sitting at the end of the lane in each frame. Ideally, you do this in one roll of the ball, but you get two tries.

While in the modern age, we have computerized bowling lanes that tally and provide an overhead projection of your score in bowling, in the old days a bowling game was scored using a score sheet that looks like this.

2131-1-ab73eb1a-638c-4968-8a22-c4602cd3e

In bowling, balls are rolled and tallied in pairs, with the exception of the tenth frame where there are cases when you will be allowed to roll three balls. Each of the numbered boxes above represent a frame. There are three types of frame depending on the number of pins you knock down:

  • Strike – when the first ball you roll for the frame knocks down all of the ten pins (represented on the score sheet as an X).
  • Spare – when the first ball you roll knocks down (none or) some but not all of the ten pins, and you knock down the rest of the pins with your second ball (represented on the score sheet as a slash for the second ball).
  • Open – when you fail to knock down all ten pins on two attempts.

Strikes and spares are called “marks” (probably because they are represented by X’s and slashes), and if you get a mark in frame ten you’ll be allowed to roll three balls in that frame. We’ll explain the details of scoring in a moment, but for right now let’s focus on how many balls you can roll in a game.

The maximum number of balls you can roll in a game is 21, which results if you get a spare or open frame for the first nine frames, then roll a spare in the tenth frame allowing you to roll one additional ball in that frame. You’ll note that this is the same as the number of small boxes in the score sheet (two in each of the frames one through nine, plus three in the tenth frame).

The minimum number of balls you can roll in a game of bowling is 12, if you were to manage to roll a strike in each frame (knocking down all the pins on the first ball means you don’t need to throw another ball in that frame) and roll three strikes in the tenth frame. In this case, you would achieve a perfect game. The last two balls in the tenth frame are required to score the first strike in that frame.

Scoring for each frame depends on the type of frame you rolled:

  • Open – the frame score is the sum of the prior frame score plus the pin count for the two balls you rolled in the frame.
  • Spare – the frame score is the sum of the prior frame score plus ten (the spare) plus the next ball you roll, or the first ball of the next frame.
  • Strike – the frame score is the sum of the prior frame score plus ten (the strike) plus the number of pins you knock down on the next two balls. This could come from the next frame only or the next two frames if you roll two strikes in a row. In the tenth frame, if you roll a strike on your first ball you get to roll two more balls and these are all added together to get a maximum of 30 points for that frame.

I can still remember my dad teaching me how to score bowling!

Let’s look at a couple of example frames for Big Al and show how strikes and spares appear on the score sheet, and how this tallies out to a score.

2131-1-803597c5-3fbf-42e3-b4a8-d1c28d066

In this game, Big Al has rolled only nine pins (an open frame) in frame one and in the second frame he knocks down three pins on his first ball, plus seven more on his second ball (a spare). Frame two cannot be scored until he rolls one more ball.

2131-1-88c103cf-29b2-4dcd-8484-186bcc9c2

Let’s say that Big Al rolls a six in frame three on the spare in frame two (so now frame two can be scored), follows with another spare and then a strike in frame four as shown above. Sixteen (3 + 7 + 6) is added to his score for frame one to get 25 in frame 2, while his spare plus strike combination in frames three and four net him twenty additional points in frame three.

Now let’s say he gets on a roll and bowls another strike. Frame four cannot yet be scored until he rolls one more ball, because of the scoring rule that a strike scores ten pins plus the next two balls.

2131-1-aa4fb541-7b26-43f0-abe6-b31579c99

In bowling parlance, two strikes in a row is known as a “double.” So let’s say he really goes on a roll and bowls another strike.

2131-1-b505e0d2-893f-421b-a4e3-22866cf51

Now we can score frame four, but frames five and six cannot yet be scored. Big Al’s three strikes in a row is called a “triple” or more commonly a “turkey.” Now we’ll fill out frames seven, eight and nine, and score the results.

2131-1-2e87ad0c-90de-42c6-a650-c22c96ec6

At this point, Big Al is doing pretty good. All he needs is eight pins to bowl what is known as “scratch.” People that can average 180 pins or more per game are known as “scratch bowlers,” which in league bowling means they are assigned no handicap, which is usually calculated as 80% of 180 minus their league average.

Let’s look at a few possible outcomes for the tenth frame.

2131-1-933d0422-3df2-4d00-9417-d12e74297

In this case, Big Al did not get a strike or a spare: He got an open frame instead, so he doesn’t get to roll a third ball. Still, his game is quite good at a total score of 181.

2131-1-4d6622dc-71f1-4662-a3f5-3835e8e36

If Big Al rolls a spare in frame ten, he gets to roll a third ball. So his total score for that frame is nineteen. Another excellent game.

2131-1-2f6657e8-6404-4cb5-8496-34302f6c7

If he bowls a strike on his first ball, he gets to roll two more balls. In this case, he managed a spare on top of the strike, to add twenty to his score in frame nine.

2131-1-baa47118-3a35-4773-9533-d47fe48a1

A perfect tenth frame consists of another turkey, adding thirty points to his frame nine score for a total of 202. Bowling a 200 game in bowling is kind of a magic number. Any time you can do that you’ve had a great game!

Here is what a perfect game looks like.

2131-1-de92d5a1-6ad7-4a75-b9c7-e0e755243

Nearly as rare as a perfect game, there is this symmetrical combination of alternating strikes and spares, resulting in a total score of 200.

2131-1-2c0dc149-afa2-4ad5-a08b-4b4e995bc

This is called a “Dutch” game, for reasons I was never able to fathom. Like any other sport, bowling has its own lexicon of cool slang including gutter, curve and hook balls.

In case you’re wondering, I’ve never been a scratch bowler (without a handicap) but my high score was 233 and that came along with my personal best of seven strikes in a row! What can I say, I really messed up the other frames of that game to get only a 233.

Now that you’ve learned the ins-and-outs of scoring a game of bowling, let’s see how we might do this in SQL.

A Bowling Scores Table and some Validations

First let’s create a table to store our bowling scores, and provide you a drop statement you can use later to clean up your sandbox.

Note that the first ball in any particular frame must always be present (NOT NULL), while rolling additional balls in a frame may be optional (so they are allowed to be NULL).

A game in bowling only has ten frames, so we’ll add a CHECK CONSTRAINT for that.

The first ball in any frame can knock down as few as zero pins (known as a “gutter ball”) or as many as ten (a strike).

The CHECK CONSTRAINTs get a little more tricky at this point because they depend on whether you’re in the tenth frame or not.

Hopefully the comments above guide you through what is being checked to ensure a valid pin count for the second ball of each frame.

Once again, for the third ball we must break the check into pieces depending on whether it occurs in frame ten.

Because the PRIMARY KEY consists of game number, person ID and frame number, it won’t be possible to insert the same frame for a game/player more than once. About the only thing we can’t verify using CHECK CONSTRAINTs is a missing frame, or when the next frame is inserted but the second ball of the prior frame was not recorded (and that frame was not a strike).

As with any code you write, you should test your CHECK CONSTRAINTs thoroughly to make sure they don’t let any bad data get through. Here’s a series of test INSERTs that should all fail for the reasons specified in the comments.

A Quick Overview of the T-SQL LEAD Analytical Function

The T-SQL LEAD analytical function consists of the call to LEAD with one to three arguments supplied, and an OVER clause.

The second and third arguments are optional:

  • row_offset – default is 1 (return the next row).
  • default_value – default value returned if there is no next row (instead of NULL).

The OVER clause requires an ORDER BY, but PARTITION BY is optional.

Scoring a Bowling Game

Let’s start with some sample data that is captured on a per-ball basis, based on Big Al’s game above. The objective is to write a T-SQL query that will display the score for the frame even if the frame is incomplete. This is actually a bit harder to do than it would be if you had to insert the complete results for a frame.

We can begin creating our set-based scoring algorithm using the SQL 2012 window frame feature of the OVER clause to calculate a running sum across frames. As a starting point, we’ll mostly ignore strikes and spares, but we’ll create a flag (FrameType) for use later to tell us what kind of frame has been bowled.

Note that we have included some results (NextBall1, NextBall2 and NextBall3) that we are not yet using but will use later when we add the logic for scoring strikes and spares. The LEAD function returns a NULL value in cases where the next frame has not been recorded. We’ve also included a stub Common Table Expression (CTE) named FrameScore where we will place most of the scoring logic.

This gives us the correct result for scoring Big Al’s first three balls.

While the results are just what we need, our CASE logic in the final SELECT will require some additional work to handle the scoring when spares and strikes are the result in a frame.

On his next ball, Big Al gets seven more pins making a spare.

If we run the same query, these are our results:

We see that frame two has still not been scored, which is the correct result because Big Al must bowl another ball before it can be scored. On his next ball (on the first ball of frame three), Big Al knocks down six pins.

To score his spare in frame two properly, we must do two things to our query.

Our changes are:

  • In the CTE named FrameScore we changed the calculation of FramePins to be a CASE statement that looks ahead to the next ball to calculate total pins correctly when the total for two balls is ten (a spare).
  • We’ve modified the calculation of FrameScore to contain the score in a frame with a spare if the next ball has been rolled.

We see that the results are now calculated correctly for his spare.

When Big Al gets four more pins in the third frame to make his spare:

Our scoring algorithm remains correct.

Big Al then gets a strike in frame four.

Our scoring algorithm still works, but it will need some changes to make it score the next balls properly.

Here is that modified query.

We’ve added the scoring logic for a strike (adding the next two balls) in the FrameScore CTE and modified the CASE in the last select to display the score for a strike once it is completed. Let’s complete Big Al’s turkey with two more strikes.

Our results show that the first strike has now been scored properly, and the interested reader who is following along can verify that other combinations of pin counts also work.

We’ll dispense with the statements that add an individual ball at a time and simply insert the next three frames for Big Al’s game.

Big Al’s score now looks as follows (correct when matched against the scoring sheet we showed earlier):

Scoring the Tenth Frame

To score the tenth frame properly, just a little tweak is required to our query (in the PinCounts CTE).

The new logic that applies to frame ten can be seen as the third argument to LEAD (the result to return instead of NULL) for NextBall1 and NextBall2 in the PinCounts CTE.

Let’s see how this works for each of the outcomes of frame tens we suggested to complete Big Al’s game. I will show only the tenth frame for each.

The interested reader can verify that if frame ten is not complete, the FrameScore is NULL (for all possible cases).

Scoring a Game of Strike/Spare/Strike (in Consecutive Frames)

Let’s say that Big Al is on another roll in game two. We can insert a second game quite easily, making it consecutive strikes and spares.

Our bowling-scoring algorithm shows us this for the result:

Scoring a Perfect Game

In the final game of his three game series, Big Al rolls a perfect game, basically the game of his life!

If you ever do this in league bowling, trust me you will have too much to drink that night!

Averaging the Scores for a League Bowling Night

Typically in a league bowling night, each player bowls three games. To average all the games for the players, the same query can be used with some simple modifications (like encapsulating the former query into a second CTE).

The above query will show the three game averages for all players, assuming that you had that data available. In this case, we’ve used Big Al’s first game of 181 to calculate the average.

A Query to Display the Game Results Like a Score Sheet

Using the same three CTEs developed above (PinCounts, FrameScore, GameScores) we can construct a final SELECT that flattens the results for each player/game. The CTEs are omitted for brevity.

This is referred to as a crosstab query because the result pivots the rows to columns. The CROSS APPLY is used to calculate intermediate results that greatly simplifies the redundancy that would have otherwise been present in each MAX function.

If you’ve never seen or written a crosstab query before, we encourage you to add in the results for the total frame score where it makes sense to do so.

Alternative Data Modeling

A data modeling purist may argue that it would be better to have a BallID column in the table, with one column to record the pin score for that ball. Then BallID could be 1 or 2 in any frame, and 3 would be allowed in frame ten. This also allows for one row to be recorded in frames one through nine when a strike is obtained by the bowler.

However that approach would not allow for the same level of validations performed by the CHECK CONSTRAINTs that we’ve provided.

Another approach would be to include a column for FrameScore in the table. Since FrameScore can easily be calculated as we’ve shown, that is not a normalized data model. However if you chose to use that approach, you could also calculate the FrameScore in an AFTER INSERT, UPDATE TRIGGER. Because of the need to “look forward,” you would not be able to use a computed column.

Conclusions and What we have Learned

More than anything, I hope that you’ve concluded that learning complex SQL can be fun!

In this article, we have demonstrated quite a number of somewhat advanced T-SQL features including:

  • How to construct complex CHECK CONSTRAINTs to ensure that data being inserted into your target table is valid.
  • Using the SQL 2012 window frame capability of the OVER clause to calculating a running total.
  • Using the SQL 2012 LEAD analytic function to “look ahead” to later rows in a set.
  • Pivoting rows into columns using a crosstab query.

Because we have built up our query one little bit at a time working through a scoring example, it may be possible to further simplify the query and make it a little bit shorter. We’ll leave that as an exercise for our more intrepid readers, and we are interested to see those results posted to the comments section of this article.

Our methodology demonstrated how you can take a relatively complex problem and break it into sub-problems that you can solve one little bit at a time, and this is an essential skill in formulating and writing complex queries in any dialect of the SQL language.

Tags: , ,

  • 12368 views

  • Rate
    [Total: 12    Average: 4.5/5]
  • Anonymousest

    Just a slight correction.
    I will need to revisit this to learn further. It is thought overload on one sitting.

    However, I will politely point out that the minimum number of throws is 11, rather than 12. First 9 frames, all strikes. 10th frame, in the first two throws, you fail to knock all pins down.

    How about duck pin bowling?

  • Dwain.C

    Anonymousest’s minimum ball count
    Dude you got me! I think the scenario for the minimum number of balls in a game is likely something that no bowler would even want to think about, so that’s probably why I forgot about it.

    As to duck pin bowling, I recall my dad speaking of it but I’ve never seen it played so I can’t comment.

  • paschott

    Duck pin scoring
    Duck Pin is similar to bowling, but you get three balls per frame. The scoring then becomes Strike, Spare, or whatever you knock down. Strikes score 10 + the next two balls rolled. Spares score 10 + the next ball rolled. You still get no more than three rolls in the 10th frame. It’s a lot of fun to play – the balls are more like bocci balls in size, the pins are "cut down" ten pins so when knocked over, they look sort of like a duck and have less chance of knocking over other pins as they spin.

    It’s a Northeast/Eastern seaboard thing so finding it outside of those states is really unlikely. However if you get a chance to play and like bowling, it’s worth it for the experience.

    Great article and examples – scoring for bowling is a fun exercise for coding.

  • Dwain.C

    To paschott on Duck Pin Bowling
    I remember my dad talking about duck pin bowling when I was a kid, and the part about 3 balls per frame rings a bell.

    Since scoring sounds pretty similar to 10 pin bowling, I’d bet that the final query would look pretty close to the same, and that the same data structure could be used with some modifications to the constraints.

    Might be a good challenge for someone that wants to learn from this example. Thanks paschott!

  • Wilfred van Dijk

    Nice
    Discover the internals of the bowling computer! Nice approach for learning SQL

  • Anonymous

    Lead replacement?
    How would you accomplish what the lead function is doing on sql server 2008 r2? It is still a mainstay in corporate America.

  • Anonymous

    Lead replacement?
    Great article by the way. Really enjoyed the read.

  • Anonymous

    Nice Article
    I’d never really come across a use for LEAD, and you’ve shown me a great one! Nice article.

  • Farrell Thomas

    Good Example
    This was a great article that I could really relate to. Back in the day before electronic scoring, I was the only one of my bowling friends who knew how to keep score. Thanks for posting this.

  • Phil Factor

    Bumble Puppy
    At last I know how to score Bumble Puppy properly in the pub. (it is a form of skittles played by swinging a ball on a chain).

  • Dwain.C

    To: Anonymous (Lead Replacement)
    There are ways that LEAD can be simulated in older versions of SQL, but they are generally not as efficient without precisely the right indexing.

    The harder part actually might be the rolling totals part of the scoring. The whole query could become quite the bear in older SQL.

    Back in 2004 Andy Warren posted this challenge:
    http://www.sqlservercentral.com/articles/Miscellaneous/thebowlingchallenge/1440/

    I have not been able to locate the solutions that were submitted, but they should be interesting and answer the question about LEAD at the same time. I have pinged him via Twitter and if he sends a response I’ll post it here.

  • Dwain.C

    To Others
    To all of the others that seemed to like the article. Thanks for your support!

    When I thought of the idea (hadn’t seen Andy’s post that I referred to above prior to writing this) I knew it would be an excellent demo problem for the topics I covered. Solving it was a nice challenge. The solution provided is actually the simplest (on the surface) of the 3 that I came up with. Probably the easiest to explain too.

    Note that if anybody thinks of a cool challenge SQL problem that might be worthy of writing about as a SQL learning experience for others, I’d be proud to hear about it (especially if I can solve it). You can get in touch with me through this site or Twitter.

  • Anonymous

    10Pin is a Subset of Skittles
    Hi Dwain, as you seem interested in digging deeper, you should look at the Wikipedia entry for Skittles. This is not another name for ten-pin bowling but a generic term for a wide variety of games. Nine-pin was the most popular but the introduction of a tax specific to nine-pins made the most popular form chenge to ten. It would be interesting to devekop SQL to cope with all the varieties but your time until retirement might be better spent on some thing else!

  • Anonymous, well, actual RWillemain

    interesting application
    Thank you very much for this article.
    I might need to re-read abit more to really appreciate the whole of the example.

  • eremmel

    It might be written a little more condensed
    The solution focuses on the lead() of all three PinsBalls in the data model. Take only the next two values into consideration leads to a more condensed solution:
    WITH PinCounts as (
    select GameID, PlayerID, Frame, PinsBall1, PinsBall2, PinsBall3
    , PinsBall1/10 + (PinsBall1+coalesce(PinsBall2,0))/10 as FrameType
    , coalesce( lead(PinsBall1,1) over (PARTITION BY GameID, PlayerID ORDER BY Frame)
    , PinsBall3 ) as NextBall1
    , coalesce( lead(PinsBall2,1) over (PARTITION BY GameID, PlayerID ORDER BY Frame)
    , lead(PinsBall1,2) over (PARTITION BY GameID, PlayerID ORDER BY Frame)
    ) as NextBall2
    from wnf.BowlingScores
    )
    select GameID, PlayerID,Frame, PinsBall1, PinsBall2, PinsBall3, FrameType, NextBall1, NextBall2,
    CASE WHEN FrameType = 0 AND PinsBall2 IS NOT NULL
    OR FrameType = 1 AND NextBall1 IS NOT NULL
    OR FrameType >= 2 AND NextBall2 IS NOT NULL
    THEN
    sum (PinsBall1 + coalesce(PinsBall2,0)
    + case when FrameType >= 1 then NextBall1 else 0 end
    + case when FrameType >= 2 then NextBall2 else 0 end
    ) over ( PARTITION BY GameID, PlayerID
    ORDER BY Frame
    ROWS UNBOUNDED PRECEDING )
    END as Score
    from PinCounts
    order by 1,2,3;
    Note FrameType goes from 0,1,2 and 3 (in case of frame 10 with 10,10,10 score).

  • Dwain.C

    eremmel’s condensed version
    Thanks eremmel! Just goes to show there’s always more than one way to skin a SQL cat!