Conjectured universal 1.98 lower bound for MVP of mergeable, reproducible distinct-count sketches

Establish whether a universal lower bound of 1.98 holds for the memory-variance product (MVP) of approximate distinct-count data sketches that support both mergeability and reproducibility, by proving the lower bound or exhibiting a counterexample. Here, MVP is defined as the relative variance of an unbiased distinct-count estimate multiplied by the storage size in bits.

Background

The ExaLogLog paper (Ertl, 2024) defines the memory-variance product (MVP) as Var(\hat{n}/n) multiplied by the storage size in bits, and uses it as a metric to compare the space-efficiency of mergeable, reproducible approximate distinct-count sketches. HyperLogLog (HLL) with 6-bit registers has a theoretical MVP of 6.48, whereas the proposed ExaLogLog (ELL) achieves an MVP of around 3.67. At the same estimation error, ELL therefore needs about 43% less space, while still supporting constant-time insertions.
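The 43% figure follows directly from the MVP definition: at a fixed relative variance, the required size in bits is proportional to the MVP. The following is a minimal sketch of that arithmetic, not code from the paper. The helper names (`mvp`, `bits_for_error`, `space_saving`) are hypothetical, and the 1% target error is an arbitrary illustration. It also assumes the MVP is constant in the sketch size, which is an idealization.

```python
def mvp(relative_variance, size_bits):
    """Memory-variance product: Var(n_hat / n) times the sketch size in bits."""
    return relative_variance * size_bits


def bits_for_error(mvp_value, relative_std_error):
    """Bits needed to reach a target relative standard error.

    Assumes the MVP does not depend on the sketch size (an idealization).
    """
    return mvp_value / relative_std_error ** 2


def space_saving(mvp_old, mvp_new):
    """Fraction of space saved at equal relative variance."""
    return 1.0 - mvp_new / mvp_old


# MVP values quoted in the text above.
HLL_MVP = 6.48  # HyperLogLog with 6-bit registers (theoretical)
ELL_MVP = 3.67  # ExaLogLog (approximate)

saving = space_saving(HLL_MVP, ELL_MVP)

# Illustrative target: 1% relative standard error (hypothetical choice).
hll_bits = bits_for_error(HLL_MVP, 0.01)
ell_bits = bits_for_error(ELL_MVP, 0.01)

print(f"space saving of ELL over HLL: {saving:.1%}")
print(f"bits for 1% error: HLL ~ {hll_bits:.0f}, ELL ~ {ell_bits:.0f}")
```

Under these assumptions the saving evaluates to roughly 0.434, which matches the 43% quoted above.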

Within this context, the authors reference prior theoretical work that conjectures a universal lower bound of 1.98 for the MVP of sketches that support mergeability and reproducibility. Validating or refuting this conjecture is central to understanding the ultimate limits of space-efficiency for practical distributed counting sketches and would clarify how close current designs (including ELL, CPC, ULL, and HLL variants) are to the fundamental limit.
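To make "how close" concrete, the MVP values quoted in this summary can be compared against the conjectured bound. The snippet below is an illustrative sketch only. The `headroom` helper is a hypothetical name, and only the HLL and ELL figures stated in this document are used, since no MVP values for CPC or ULL are given here. It also presumes the conjecture holds, which is exactly what remains open.

```python
# Conjectured lower bound on the MVP (from the text; unproven).
CONJECTURED_BOUND = 1.98

# MVP values quoted in this document.
sketch_mvps = {
    "HLL (6-bit registers)": 6.48,
    "ELL": 3.67,
}


def headroom(mvp_value, bound=CONJECTURED_BOUND):
    """Return (ratio to the bound, largest further space saving).

    The saving is the fraction of space that a sketch attaining the
    bound would save at equal relative variance, if the bound is tight.
    """
    ratio = mvp_value / bound
    max_saving = 1.0 - bound / mvp_value
    return ratio, max_saving


for name, value in sketch_mvps.items():
    ratio, max_saving = headroom(value)
    print(f"{name}: {ratio:.2f}x the bound, up to {max_saving:.0%} further saving")
```

On these numbers, ELL sits at about 1.85 times the conjectured bound, so a sketch attaining 1.98 would still save roughly 46% of the space relative to ELL.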

References

A recent theoretical work conjectured a general lower bound of 1.98 for the MVP of sketches supporting mergeability and reproducibility [Pettie2021], which shows the potential for improvement.

ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale (2402.13726 - Ertl, 2024) in Introduction (Section 1)