22 March 2012

Bin Packing Problems: The SQL

The 'bin packing' problem isn't just a fascination for computer scientists, but comes up in a whole range of real-world applications. It isn't that easy to come up with a practical, set oriented solution in SQL that gives a near-optimal result.

The bin packing problem is an NP-complete problem. It is a great way to make computer science students do some work and it is also useful in the real world. The basic problem statement is that you are given a set of (n) items. The items all have different volumes. We then have a supply of bins or boxes of the same size. The goal is to put the items into the bins and to minimizes the number of bins used.

As a real world application for this algorithm, look at television advertisements of arbitrary durations. Given time slots of, say, five minutes how do we pack the the most commercials into each time slot and maximize our daily profits. Think of free public service announcements or promo spots as unfilled bins.

Another example is scheduling tasks with known execution times on a set of identical machines. Minimize the number of machines needed for the whole project. This is actually the simplest form of a more complex problem. The task usually have an order of execution. The wire has to be on the wheel before the lug nuts, etc. But is another article for later.

To make the math easier, we assume that the items are sized in integer values from 1 to (k) where (k <= bin size). And to keep thing simple, the bins are size (v). I happen to like (v =12) because 12 has a lot of divisors and because I can use egg cartons and colored plastic eggs when I teach this problem.

It is obvious that we can not have an item of size zero. In spite of what people believe about suitcases, you cannot cram an oversized load into a bin. It is also obvious that if an item is the size of the bin then you have filled that bin and need to get to the next one. We know that the number of bins is between ceiling(sum(item_size)/ bin size) and (k), the number of items.

If the items are all of size one, then we can simply fill bins and use that lower limit of ceiling(sum(item_size) / bin_size). There are sophisticated algorithms for particular item sets, too.

But in the general case, we have to fall back on heuristics. Frankly, the good ones are done with procedural code and not SQL. But there are some quick heuristics that work very well in the real world.

The First Fit Algorithm provides ie fast, but often gives a non-optimal solution. It is just what you chink and the way that people pack a suitcase. This assumes that the items are in a queue and you have enough bins. Grab an item and put it into the first bin in which it will fit.

The First Fit Algorithm requires θ(n log n) execution time, where (n) is the number of items to be packed. This is the same as a non-stable sorting algorithm. You can boost this algorithm by first sorting the list of items in decreasing order (First-Fit Decreasing Algorithm).

This does not guarantee an optimal solution, and for longer queues, it may increase the running time because of the extra sorting. It is known, however, that there always exists at least one ordering of the items queue that produces an optimal solution. You just cannot find it easily. In 1973, Jeffrey Ullman (a very important name in computer science) proved that this algorithm can differ from an optimal packing by as much at 70%.That same year, S. S. Johnson showed that this strategy is never suboptimal by more than 22%, and furthermore that no efficient bin-packing algorithm can be guaranteed to do better than 22%.

Another version is if this algorithm is the First-Fit Increasing Algorithm, which is the same queue sorted from smallest to largest item size. It has all the same problems and limits.

The Best Fit Algorithm needs some smarts. We you get the next item in the queue, you look over the partially filled bins for one which will be completely full if the item is added to it. This sounds like it ought to be optimal. Sorry, but it can fail, too. As an exercise, try writing a query to pick subsets of (n) items that total to the bin target bin size. If you cannot fill a bin, insert itnto the gin that will have the smallest space left; the best fit.

The (n = 1) case is trivial and it is one configuration that you have to have in the final solution. The cases for (n > 1) have to be done with self-joins.

The Worst Fit Algorithm has an unfortunate name. We look at the available partially filled bins and put the item into the bin with the most empty space.

There is a website with animations of of various algorithms from the University of Arizona’s computer science department. It is a demonstration of the ICON programming language. It is as high level language that started off as a string manipulation language to replace SNOBOL and then it evolved.

SQL and Bins

Getting a model for bin packing in SQL is not that hard, but the constraints are harder because they have to be at different level of aggregation. :et me jump into my DDL and then explain it.

The size and the item numbers are constants which we load the set of items. Bin zero is essentially loose items on the floor, waiting to get packed. A VIEW of the bins will be handy, so let’s do that:

The WITH CHECK OPTION is not well-known but it has been a part of SQL since the SQL-92 Standards. The updatable view is protected from any update that would change a row in violation of the WHERE clause. Try to do this insertion:

leads to this error message:

However, this is just fine:

This might be easier to see with actual data, Start with this queue {6, 6, 5, 5, 5, 4, 4, 4, 2, 2, 2, 2, 3, 3, 7, 7, 5, 5, 8, 8, 4, 4, 5}   which gives us 110 units to go into bins of size of 9 units. If we did a first fit algorithm, the table would look like this:

Doing the math, the theoretical lower bound is 13 boxes.  This gave us 17 boxes which looks like this:

Box nbr

Packing

1

6

2

6

3

5

4

5

5

9

6

8

7

8

8

7

9

3

10

7

11

7

12

5

13

5

14

8

15

8

16

8

17

5

Can we actually hit that lower bin count? Maybe, but the only way to find out is to try all possible packing arrangements in the general case.

One other estimation we can make is to try to try all combinations of (k) items and get a heretical upper bound.  The basic approach is a self join and hope that you do not a lot of small items.

I wonder if I can pack three items in a b, so we try that:

 Let’s try packing all combinations of four items. It still works. Push this to four items per bin with an extension of the pattern we have been using, we get this query.

Again, we can extend the query to five items with this:

but now we get a failures at five items in a bin. Obviously, it would be better to start with a high value of (k) and keep reducing the value until you get a result back.

As it works out, with this data the Decreasing First Fit algorithm gives an optimal result. That queue would look like this: {8, 8, 7, 7, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 2, 2, 2, 2}. The results are like this:

Bin nbr

Packing

1

8

2

8

3

9

4

9

5

9

6

9

7

9

8

9

9

9

10

9

11

9

12

9

13

4

Do not be fooled by this!  If we has a bin of size 10 and five items of size 6, the theoretical lower bound would be three bins using ((5 * 6)/10). But it requires five bins no matter which algorithm you pick.

Without generating all possible permutations, see if you can come up with a practical, set oriented solution that gives a near-optimal result.

References:

  • Albers, S. and Mitzenmacher, M. “Average-Case Analysis of First Fit and Random Fit Bin Packing.” Random Structures Alg. 16, 240-259, 2000.
  • Coffman, E. G. Jr.; Garey, M. R.; and Johnson, D. S. “Approximation Algorithms for Bin-Packing–An Updated Survey.” In Algorithm Design for Computer System Design. Vienna: Springer-Verlag, pp. 49-106, 1984.
  • Garey, M. R.; Graham, R. L.; and Ullman, J. D. “An Analysis of Some Packing Algorithms.” In Combinatorial Algorithms. New York: Algorithmics Press, pp. 39-47, 1973.
  • Graham, R. L. “Bounds on Performance of Scheduling Algorithms.” In Computer and Job-Shop Scheduling Theory (Ed. E. G. Coffman Jr.). New York: Wiley, pp. 165-227, 1976.
  • Johnson, D. S. “Approximation Algorithms for Combinatorial Problems.” In J. Comput. System Sci. 9, 256-278, 1974.
  • Johnson, D. S. “Approximation Algorithms for Combinatorial Problems.” In Fifth Annual ACM Symposium on Theory of Computing (Austin, Tex., 1973). New York: Assoc. Comput. Mach., pp. 38-49, 1973.
  • Hoffman, P. The Man Who Loved Only Numbers: The Story of Paul ErdÅs and the Search for Mathematical Truth. New York: Hyperion, 1998.
  • Lewis, R.; 2009. “A General-Purpose Hill-Climbing Method for Order Independent Minimum Grouping Problems: A Case Study in Graph Colouring and Bin Packing”; Computers

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

This post has been viewed 18526 times – thanks for reading.

Tags: , ,

  • Rate
    [Total: 20    Average: 4.3/5]
  • Share

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.

View all articles by Joe Celko

  • Hugo Kornelis

    Bin packing
    “see if you can come up with a practical, set oriented solution that gives a near-optimal result.”

    I’ve done so (and published it) a long time ago already. See http://sqlblog.com/blogs/hugo_kornelis/archive/tags/Bin+Packing/default.aspx

  • Ross Presser

    Editing
    Mr. Celko, I appreciate the technical smarts of this article, but what the heck happened to the writing? There are a few dozen typos before I even get to the first bit of SQL code. The page you linked to has VISUALIZATIONS of various algorithms, not ANIMATIONS — you are not shown the process of packing, just the result. And the typos continue right through to the end … “now we get a failures”??? Please, let someone proofread before you submit an article.

  • Joseph

    typo, or religious mathematics?
    I’m curious what a “heretical upper bound” would be.

    Is it a bound that causes the Inquisitor to pay you a visit? Has it been banned by The Church?

    Or, sadly, is it just a typo for “theoretical”?

  • aikimark

    BIF
    Granted, my approach (Big-In-First) would have been more iterative than set, it should have a T-SQL implementation.

    Similar to your Bin table, create a table that has a capacity (9) column for the bin. Initialize the table with 23 rows. For big sets of data, define an index on the Capacity column.

    With a cursor (there, I said it), iterate through a descending sorted list of items, assigning the item to the first bin with capacity. A second (update) query is invoked to decrement the available space for that bin.

    Delete any bin table rows that still have their initial capacity (9). This produces the following configuration of bin assignments.

    —— Items ———- Sum Capacity
    8 8 1
    8 8 1
    7 2 9 0
    7 2 9 0
    6 3 9 0
    6 3 9 0
    5 4 9 0
    5 4 9 0
    5 4 9 0
    5 4 9 0
    5 4 9 0
    5 2 7 2
    2 2 7

  • aikimark

    sorry about the formatting
    The tab-delimited data looks so much nicer, trust me.

  • aikimark

    sorry about the formatting
    The tab-delimited data looks so much nicer, trust me.

  • Bryant

    Take the next step
    Here is an extension to the problem. Packing bins is only 1 part of the issue. My assumption is that you are packing them fore some reason other than esthetics…or theoretical problem-solving.

    Now that you have everything neatly packed away in the fewest number of bins, make them fit into a standard container, like a semi-trailer or a shipping container. You need to maximize the cubage used because anything less represents wasted fuel and lost revenue. That is a practical application.

  • Anonymous

    Knapsack Problem and free book download
    The Knapsack Problem is related to Bin Packing and it has been covered in other articles. A hueristic for Bin packing is to do it as a series of knapsack problems. Pack a knapsack, remove those items from the inventory and pack a second knapsack. Repeat until the inventory is empty.

    Ond toeh best books on this topic is “Knapsack Problems: Algorithms and Computer Implementation” by Martello and Toth (John Wiley and Sons, 1990) is avaialble as a pdf on line at http://www.or.deis.unibo.it/kp/Chapter1.pdf.

    The bad news is that the code is in unstructured FORTRAN.

  • Robert young

    Yet Another Step
    Bins
    Ship Containers

    then

    54 footer loaded for customer drops. This is bin packing meets the traveling salesman. The semi has to be: weight balanced, in delivery order, and maximally filled. People earn a nice living writing such subsystems.