Consider the following table:
ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL
---+----------+-----------+-----------+-----
 1 |        1 |         1 |    (null) |   3
 2 |        1 |         2 |    (null) |   2
 3 |        1 |         3 |    (null) |   1
 4 |        1 |         4 |         4 |   2
 5 |        1 |         5 |    (null) |   1
 6 |        2 |         1 |    (null) |   4
 7 |        2 |         2 |         2 |   3
 8 |        2 |         3 |    (null) |   4
 9 |        2 |         4 |    (null) |   2
10 |        2 |         5 |    (null) |   2
11 |        2 |         6 |    (null) |   4
12 |        2 |         7 |        14 |   2
13 |        2 |         8 |    (null) |   2
For each row, I need the cumulative sum of VAL over all previous rows (partitioned by GROUP_ID, ordered by ORDER_VAL). However, whenever a row has a non-NULL RESET_VAL, its cumulative sum should be that value instead, and the rows that follow should accumulate on top of the RESET_VAL rather than the true running sum. Note that each group can contain multiple reset values.
This is the result I expect for the above table:
ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL | CUMSUM
---+----------+-----------+-----------+-----+-------
 1 |        1 |         1 |    (null) |   3 |      0
 2 |        1 |         2 |    (null) |   2 |      3
 3 |        1 |         3 |    (null) |   1 |      5
 4 |        1 |         4 |         4 |   2 |      4
 5 |        1 |         5 |    (null) |   1 |      6
 6 |        2 |         1 |    (null) |   4 |      0
 7 |        2 |         2 |         2 |   3 |      2
 8 |        2 |         3 |    (null) |   4 |      5
 9 |        2 |         4 |    (null) |   2 |      9
10 |        2 |         5 |    (null) |   2 |     11
11 |        2 |         6 |    (null) |   4 |     13
12 |        2 |         7 |        14 |   2 |     14
13 |        2 |         8 |    (null) |   2 |     16
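To pin down the rule precisely, here is the procedural logic I'm after, sketched in Python (the function name and row layout are just for illustration, not part of my schema):

```python
def reset_cumsum(rows):
    """For rows sorted by (group_id, order_val), return the CUMSUM column:
    the running sum of VAL over previous rows in the group, except that a
    non-NULL RESET_VAL replaces the running sum from that row onward."""
    out = []
    running, prev_group = 0, None
    for _id, group_id, _order_val, reset_val, val in rows:
        if group_id != prev_group:          # new group: restart from zero
            running, prev_group = 0, group_id
        if reset_val is not None:           # reset: discard the true sum
            running = reset_val
        out.append(running)
        running += val                      # current VAL counts for later rows
    return out

data = [  # (ID, GROUP_ID, ORDER_VAL, RESET_VAL, VAL), already sorted
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]
print(reset_cumsum(data))  # [0, 3, 5, 4, 6, 0, 2, 5, 9, 11, 13, 14, 16]
```

This reproduces the CUMSUM column above; the question is how to express the same per-row carry-forward in SQL.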
If not for the reset values, I could use a window query:
SELECT temp.*,
       COALESCE(SUM(val) OVER (PARTITION BY group_id
                               ORDER BY order_val
                               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                0) AS cumsum
FROM temp;
I originally thought I could simply put RESET_VAL first in the COALESCE, but that doesn't work: it changes the value only for the reset row itself and does not reset the running sum for the rows that follow.
I also tried this solution, but it only resets the sum back to zero, not to a value from the column. Adapting it is proving non-trivial, because the reset value has to propagate to all following rows.
A recursive query seems like a natural fit, but I haven't been able to figure out how to write one yet.
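For what it's worth, this is the shape of the recursive CTE I've been attempting, run here against an in-memory SQLite database purely for illustration (my real database may differ). Note the big assumption: it joins on `order_val + 1`, so it only works if ORDER_VAL is a dense 1, 2, 3, ... sequence within each group; otherwise a ROW_NUMBER() step would be needed first.

```python
import sqlite3

# Build the sample table from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp (id INTEGER, group_id INTEGER, order_val INTEGER,
                   reset_val INTEGER, val INTEGER);
INSERT INTO temp VALUES
 (1,1,1,NULL,3),(2,1,2,NULL,2),(3,1,3,NULL,1),(4,1,4,4,2),(5,1,5,NULL,1),
 (6,2,1,NULL,4),(7,2,2,2,3),(8,2,3,NULL,4),(9,2,4,NULL,2),(10,2,5,NULL,2),
 (11,2,6,NULL,4),(12,2,7,14,2),(13,2,8,NULL,2);
""")

# Anchor: the first row of each group; its cumsum is RESET_VAL if present,
# else 0. Step: each next row takes RESET_VAL if present, else the previous
# row's cumsum plus the previous row's val.
rows = conn.execute("""
WITH RECURSIVE rec AS (
    SELECT id, group_id, order_val, reset_val, val,
           COALESCE(reset_val, 0) AS cumsum
    FROM temp
    WHERE order_val = 1
  UNION ALL
    SELECT t.id, t.group_id, t.order_val, t.reset_val, t.val,
           COALESCE(t.reset_val, rec.cumsum + rec.val)
    FROM temp t
    JOIN rec ON t.group_id = rec.group_id
            AND t.order_val = rec.order_val + 1
)
SELECT id, cumsum FROM rec ORDER BY group_id, order_val
""").fetchall()
print([c for _id, c in rows])  # [0, 3, 5, 4, 6, 0, 2, 5, 9, 11, 13, 14, 16]
```

Even if this is the right direction, I'm wary of it at scale: the recursion takes one step per row, and without an index on (group_id, order_val) each step could be a full scan, which seems painful at millions of rows.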
I should mention that the table I actually have to deal with is much larger than the example above (hundreds of thousands to a couple of million rows), so please point out any performance pitfalls in a proposed answer.