I have a table which includes a column of decimal values, such as this:
id value size
-- ----- ----
1 100 .02
2 99 .38
3 98 .13
4 97 .35
5 96 .15
6 95 .57
7 94 .25
8 93 .15
What I need to accomplish is a little difficult to describe, so please bear with me. What I am trying to do is create an aggregate value of the size column which increments by 1 each time the preceding rows sum up to 1, when in descending order according to value. The result would look something like this:
id value size bucket
-- ----- ---- ------
1 100 .02 1
2 99 .38 1
3 98 .13 1
4 97 .35 1
5 96 .15 2
6 95 .57 2
7 94 .25 2
8 93 .15 3
My naive first attempt was to keep a running SUM and then CEILING that value, however it doesn't handle the case where some records' size end up contributing to the total of two separate buckets. The below example might clarify this:
id value size crude_sum crude_bucket distinct_sum bucket
-- ----- ---- --------- ------------ ------------ ------
1 100 .02 .02 1 .02 1
2 99 .38 .40 1 .40 1
3 98 .13 .53 1 .53 1
4 97 .35 .88 1 .88 1
5 96 .15 1.03 2 .15 2
6 95 .57 1.60 2 .72 2
7 94 .25 1.85 2 .97 2
8 93 .15 2.00 2 .15 3
As you can see, if I were to simply use CEILING on crude_sum record #8 would be assigned to bucket 2. This is caused by the size of records #5 and #8 being split across two buckets. Instead, the ideal solution is to reset the sum each time it reaches 1, which then increments the bucket column and begins a new SUM operation starting at the size value of the current record. Because the order of the records is important to this operation, I've included the value column, which is intended to be sorted in descending order.
My initial attempts have involved making multiple passes over the data, once to perform the SUM operation, once more to CEILING that, etc. Here is an example of what I did to create the crude_sum column:
SELECT
id,
value,
size,
(SELECT TOP 1 SUM(size) FROM table t2 WHERE t2.value<=t1.value) as crude_sum
FROM
table t1
Which was used in an UPDATE operation to insert the value into a table to work with later.
Edit: I'd like to take another stab at explaining this, so here goes. Imagine each record is a physical item. That item has a value associated with it, and a physical size less than one. I have a series of buckets with a volume capacity of exactly 1, and I need to determine how many of these buckets I will need and which bucket each item goes in according to the value of the item, sorted from highest to lowest.
A physical item cannot exist in two places at once, so it must be in one bucket or the other. This is why I can't do a running total + CEILING solution, because that would allow records to contribute their size to two buckets.
width_bucket()window function if I'm not mistaken. Does SQL Server have that? – Jun 24 '13 at 22:55distinct_countcomplicates things. Aaron Bertrand has a great summary of your options on SQL Server for this kind of windowing work. I have used the "quirky update" method to calculatedistinct_sum, which you can see here on SQL Fiddle, but this is unreliable. – Nick Chammas Jun 25 '13 at 17:26