The 'bin packing' problem isn't just a fascination for computer scientists, but comes up in a whole range of real-world applications. It isn't that easy to come up with a practical, set oriented solution in SQL that gives a near-optimal result.
The bin packing problem is an NP-complete problem. It is a great way to make computer science students do some work and it is also useful in the real world. The basic problem statement is that you are given a set of (n) items. The items all have different volumes. We then have a supply of bins or boxes of the same size. The goal is to put the items into the bins and to minimizes the number of bins used.
As a real world application for this algorithm, look at television advertisements of arbitrary durations. Given time slots of, say, five minutes how do we pack the the most commercials into each time slot and maximize our daily profits. Think of free public service announcements or promo spots as unfilled bins.
Another example is scheduling tasks with known execution times on a set of identical machines. Minimize the number of machines needed for the whole project. This is actually the simplest form of a more complex problem. The task usually have an order of execution. The wire has to be on the wheel before the lug nuts, etc. But is another article for later.
To make the math easier, we assume that the items are sized in integer values from 1 to (k) where (k <= bin size). And to keep thing simple, the bins are size (v). I happen to like (v =12) because 12 has a lot of divisors and because I can use egg cartons and colored plastic eggs when I teach this problem.
It is obvious that we can not have an item of size zero. In spite of what people believe about suitcases, you cannot cram an oversized load into a bin. It is also obvious that if an item is the size of the bin then you have filled that bin and need to get to the next one. We know that the number of bins is between ceiling(sum(item_size)/ bin size) and (k), the number of items.
If the items are all of size one, then we can simply fill bins and use that lower limit of ceiling(sum(item_size) / bin_size). There are sophisticated algorithms for particular item sets, too.
But in the general case, we have to fall back on heuristics. Frankly, the good ones are done with procedural code and not SQL. But there are some quick heuristics that work very well in the real world.
The First Fit Algorithm provides ie fast, but often gives a non-optimal solution. It is just what you chink and the way that people pack a suitcase. This assumes that the items are in a queue and you have enough bins. Grab an item and put it into the first bin in which it will fit.
The First Fit Algorithm requires θ(n log n) execution time, where (n) is the number of items to be packed. This is the same as a non-stable sorting algorithm. You can boost this algorithm by first sorting the list of items in decreasing order (First-Fit Decreasing Algorithm).
This does not guarantee an optimal solution, and for longer queues, it may increase the running time because of the extra sorting. It is known, however, that there always exists at least one ordering of the items queue that produces an optimal solution. You just cannot find it easily. In 1973, Jeffrey Ullman (a very important name in computer science) proved that this algorithm can differ from an optimal packing by as much at 70%.That same year, S. S. Johnson showed that this strategy is never suboptimal by more than 22%, and furthermore that no efficient bin-packing algorithm can be guaranteed to do better than 22%.
Another version is if this algorithm is the First-Fit Increasing Algorithm, which is the same queue sorted from smallest to largest item size. It has all the same problems and limits.
The Best Fit Algorithm needs some smarts. We you get the next item in the queue, you look over the partially filled bins for one which will be completely full if the item is added to it. This sounds like it ought to be optimal. Sorry, but it can fail, too. As an exercise, try writing a query to pick subsets of (n) items that total to the bin target bin size. If you cannot fill a bin, insert itnto the gin that will have the smallest space left; the best fit.
The (n = 1) case is trivial and it is one configuration that you have to have in the final solution. The cases for (n > 1) have to be done with self-joins.
The Worst Fit Algorithm has an unfortunate name. We look at the available partially filled bins and put the item into the bin with the most empty space.
There is a website with animations of of various algorithms from the University of Arizona's computer science department. It is a demonstration of the ICON programming language. It is as high level language that started off as a string manipulation language to replace SNOBOL and then it evolved.
SQL and Bins
Getting a model for bin packing in SQL is not that hard, but the constraints are harder because they have to be at different level of aggregation. :et me jump into my DDL and then explain it.
CREATE TABLE Bins
(item_nbr SMALLINT DEFAULT 0 NOT NULL PRIMARY KEY
CHECK(item_nbr > 0),
item_size SMALLINT DEFAULT 0 NOT NUlL
CHECK(item_size > 0),
bin_nbr SMALLINT DEFAULT 0 NOT NULL
CHECK(bin_nbr >= 0));
The size and the item numbers are constants which we load the set of items. Bin zero is essentially loose items on the floor, waiting to get packed. A VIEW of the bins will be handy, so let's do that:
CREATE VIEW Bins_Loads (bin_nbr, load_qty)
AS
SELECT bin_nbr, SUM (item_size)
FROM Bins
GROUP BY bin_nbr;
Let's set the bin size at 9. That means we need a constraint to enforce the condition that no bin is over-filled.
CREATE VIEW Bins2 (item_nbr, item_size, bin_nbr)
AS
SELECT item_nbr, item_size, bin_nbr
FROM Bins
WHERE 9
>= ALL (SELECT SUM (item_size)
FROM Bins
WHERE bin_nbr > 0
GROUP BY bin_nbr)
WITH CHECK OPTION ;
The WITH CHECK OPTION is not well-known but it has been a part of SQL since the SQL-92 Standards. The updatable view is protected from any update that would change a row in violation of the WHERE clause. Try to do this insertion:
INSERT INTO Bins2 (item_nbr, item_size, bin_nbr)
VALUES (99, 4, 1);
leads to this error message:
Msg 550, Level 16, State 1, Line 2
The attempted insert or update failed because the target view either specifies WITH CHECK OPTION or spans a view that specifies WITH CHECK OPTION and one or more rows resulting from the operation did not qualify under the CHECK OPTION constraint.
The statement has been terminated.
However, this is just fine:
INSERT INTO Bins2 (item_nbr, item_size, bin_nbr)
VALUES (99, 1, 1);
This might be easier to see with actual data, Start with this queue {6, 6, 5, 5, 5, 4, 4, 4, 2, 2, 2, 2, 3, 3, 7, 7, 5, 5, 8, 8, 4, 4, 5} which gives us 110 units to go into bins of size of 9 units. If we did a first fit algorithm, the table would look like this:
INSERT INTO Bins(item_nbr, item_size, bin_nbr)
VALUES (1, 6, 1), (2, 6, 2), (3, 5, 3),
(4, 5, 4), (5, 5, 5), (6, 4, 5),
(7, 4, 6), (8, 4, 6), (9, 4, 7),
(10, 2, 7), (11, 2, 7), (12, 2, 8),
(13, 2, 8), (14, 3, 8), (15, 3, 9),
(16, 7, 10), (17, 7, 11), (18, 5, 12),
(19, 5, 13),(20, 8, 14),(21, 8, 15),
(22, 4, 16), (23, 4, 16), (24, 5, 17);
Doing the math, the theoretical lower bound is 13 boxes. This gave us 17 boxes which looks like this:
Box nbr |
Packing |
1 |
6 |
2 |
6 |
3 |
5 |
4 |
5 |
5 |
9 |
6 |
8 |
7 |
8 |
8 |
7 |
9 |
3 |
10 |
7 |
11 |
7 |
12 |
5 |
13 |
5 |
14 |
8 |
15 |
8 |
16 |
8 |
17 |
5 |
Can we actually hit that lower bin count? Maybe, but the only way to find out is to try all possible packing arrangements in the general case.
One other estimation we can make is to try to try all combinations of (k) items and get a heretical upper bound. The basic approach is a self join and hope that you do not a lot of small items.
SELECT B1.item_nbr AS item_nbr_1, B2.item_nbr AS item_nbr_2,
(B1.item_size + B2.item_size) AS packing
FROM Bins AS B1, Bins AS B2
WHERE B1.item_nbr < B2.item_nbr
AND (B1.item_size + B2.item_size) <= 9;
I wonder if I can pack three items in a b, so we try that:
SELECT B1.item_nbr AS item_nbr_1,
B2.item_nbr AS item_nbr_2,
B3.item_nbr AS item_nbr_3,
(B1.item_size + B2.item_size + B3.item_size) AS packing
FROM Bins AS B1, Bins AS B2, Bins AS B3
WHERE B1.item_nbr < B2.item_nbr
AND B2.item_nbr < B3.item_nbr
AND (B1.item_size + B2.item_size + B3.item_size) <= 9;
Let's try packing all combinations of four items. It still works. Push this to four items per bin with an extension of the pattern we have been using, we get this query.
SELECT B1.item_nbr AS item_nbr_1,
B2.item_nbr AS item_nbr_2,
B3.item_nbr AS item_nbr_3,
B4.item_nbr AS item_nbr_4,
(B1.item_size + B2.item_size + B3.item_size+ B4.item_size) AS packing
FROM Bins AS B1, Bins AS B2, Bins AS B3, Bins AS B4
WHERE B1.item_nbr < B2.item_nbr
AND B2.item_nbr < B3.item_nbr
AND B3.item_nbr < B4.item_nbr
AND (B1.item_size + B2.item_size
B3.item_size + B4.item_size) <= 9;
Again, we can extend the query to five items with this:
SELECT B1.item_nbr AS item_nbr_1,
B2.item_nbr AS item_nbr_2,
B3.item_nbr AS item_nbr_3,
B4.item_nbr AS item_nbr_4,
B4.item_nbr AS item_nbr_5,
(B1.item_size + B2.item_size + B3.item_size
+ B4.item_size + B5.item_size) AS packing
FROM Bins AS B1, Bins AS B2, Bins AS B3,
Bins AS B4, Bins AS B5
WHERE B1.item_nbr < B2.item_nbr
AND B2.item_nbr < B3.item_nbr
AND B3.item_nbr < B4.item_nbr
AND B4.item_nbr < B5.item_nbr
AND (B1.item_size + B2.item_size + B3.item_size
+ B4.item_size + B5.item_size) <= 9;
but now we get a failures at five items in a bin. Obviously, it would be better to start with a high value of (k) and keep reducing the value until you get a result back.
As it works out, with this data the Decreasing First Fit algorithm gives an optimal result. That queue would look like this: {8, 8, 7, 7, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 2, 2, 2, 2}. The results are like this:
Bin nbr |
Packing |
1 |
8 |
2 |
8 |
3 |
9 |
4 |
9 |
5 |
9 |
6 |
9 |
7 |
9 |
8 |
9 |
9 |
9 |
10 |
9 |
11 |
9 |
12 |
9 |
13 |
4 |
Do not be fooled by this! If we has a bin of size 10 and five items of size 6, the theoretical lower bound would be three bins using ((5 * 6)/10). But it requires five bins no matter which algorithm you pick.
Without generating all possible permutations, see if you can come up with a practical, set oriented solution that gives a near-optimal result.
References:
- Albers, S. and Mitzenmacher, M. "Average-Case Analysis of First Fit and Random Fit Bin Packing." Random Structures Alg. 16, 240-259, 2000.
- Coffman, E. G. Jr.; Garey, M. R.; and Johnson, D. S. "Approximation Algorithms for Bin-Packing--An Updated Survey." In Algorithm Design for Computer System Design. Vienna: Springer-Verlag, pp. 49-106, 1984.
- Garey, M. R.; Graham, R. L.; and Ullman, J. D. "An Analysis of Some Packing Algorithms." In Combinatorial Algorithms. New York: Algorithmics Press, pp. 39-47, 1973.
- Graham, R. L. "Bounds on Performance of Scheduling Algorithms." In Computer and Job-Shop Scheduling Theory (Ed. E. G. Coffman Jr.). New York: Wiley, pp. 165-227, 1976.
- Johnson, D. S. "Approximation Algorithms for Combinatorial Problems." In J. Comput. System Sci. 9, 256-278, 1974.
- Johnson, D. S. "Approximation Algorithms for Combinatorial Problems." In Fifth Annual ACM Symposium on Theory of Computing (Austin, Tex., 1973). New York: Assoc. Comput. Mach., pp. 38-49, 1973.
- Hoffman, P. The Man Who Loved Only Numbers: The Story of Paul Erdős and the Search for Mathematical Truth. New York: Hyperion, 1998.
- Lewis, R.; 2009. "A General-Purpose Hill-Climbing Method for Order Independent Minimum Grouping Problems: A Case Study in Graph Colouring and Bin Packing"; Computers