Efficient disk spilling for fully concurrent (non-partitioned) hash aggregation
Develop an efficient method for spilling fully concurrent (non-partitioned) hash-based GROUP BY aggregation to disk. Such operators are implemented with a shared global hash table and a vector of partial aggregates. An efficient spilling method would let them process datasets that exceed main-memory capacity without resorting to partitioning.
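To make the in-memory structure concrete, here is a minimal sketch of our reading of this design. It assumes the following: a shared index maps each group key to a global slot number, each thread adds its values into its own vector of partial aggregates at that slot, and the vectors are summed after all threads finish. A lock stands in for a real concurrent hash table, the aggregate is SUM, and the names `GlobalIndex`, `worker`, and `group_by_sum` are hypothetical. This is not the paper's implementation.

```python
import threading


class GlobalIndex:
    """Hypothetical shared key -> slot map; a lock stands in for a concurrent hash table."""

    def __init__(self):
        self._slots = {}
        self._lock = threading.Lock()

    def slot(self, key):
        # Return the existing slot for key, or assign the next free slot number.
        with self._lock:
            idx = self._slots.get(key)
            if idx is None:
                idx = len(self._slots)
                self._slots[key] = idx
            return idx

    def items(self):
        return self._slots.items()

    def __len__(self):
        return len(self._slots)


def worker(rows, index, local):
    # Each thread updates only its own vector of partial aggregates (SUM here).
    for key, value in rows:
        i = index.slot(key)
        if i >= len(local):
            local.extend([0] * (i + 1 - len(local)))
        local[i] += value


def group_by_sum(chunks):
    """Aggregate one input chunk per thread; returns {key: sum}."""
    index = GlobalIndex()
    partials = [[] for _ in chunks]  # one partial-aggregate vector per thread
    threads = [
        threading.Thread(target=worker, args=(chunk, index, local))
        for chunk, local in zip(chunks, partials)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge: sum the per-thread vectors slot by slot.
    totals = [0] * len(index)
    for local in partials:
        for i, v in enumerate(local):
            totals[i] += v
    return {key: totals[i] for key, i in index.items()}


print(group_by_sum([[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]))
```

In this sketch, every per-thread vector can grow to the full number of groups, so partial-aggregate memory scales with threads times groups. The slots are global indices rather than hash partitions, so there is also no obvious subset of groups that could be written out independently.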
References
For very large datasets at high thread counts, this overhead could be a concern, especially since there is no clear way to spill non-partitioned hash aggregations efficiently to disk.
— Global Hash Tables Strike Back! An Analysis of Parallel GROUP BY Aggregation
(2505.04153 - Xue et al., 7 May 2025) in Section 3.2, Thread Local Update Method