
Single-Pass Streaming CSPs via Two-Tier Sampling

Published 2 Apr 2026 in cs.DS | (2604.01575v1)

Abstract: We study the maximum constraint satisfaction problem, Max-CSP, in the streaming setting. Given $n$ variables, the constraints arrive sequentially in an arbitrary order, with each constraint involving only a small subset of the variables. The objective is to approximate the maximum fraction of constraints that can be satisfied by an optimal assignment in a single pass. The problem admits a trivial near-optimal solution with $O(n)$ space, so the major open problem in the literature has been the best approximation achievable when limiting the space to $o(n)$. The answer to the question above depends heavily on the CSP instance at hand. The integrality gap $α$ of an LP relaxation, known as the BasicLP, plays a central role. In particular, a major conjecture of the area is that in the single-pass streaming setting, for any fixed $\varepsilon > 0$, (i) an $(α-\varepsilon)$-approximation can be achieved with $o(n)$ space, and (ii) any $(α+\varepsilon)$-approximation requires $Ω(n)$ space. In this work, we fully resolve the first side of the conjecture by proving that an $(α-\varepsilon)$-approximation of Max-CSP can indeed be achieved using $n^{1-Ω_\varepsilon(1)}$ space and in a single pass. Given that Max-DiCut is a special case of Max-CSP, our algorithm fully recovers the recent result of [ABFS26, STOC'26] via a completely different algorithm and proof. On a technical level, our algorithm simulates a suitable local algorithm on a reduced graph using a technique that we call two-tier sampling: the algorithm combines both edge sampling and vertex sampling to handle high- and low-degree vertices at the same time.

Summary

  • The paper introduces a single-pass streaming algorithm that achieves a (β_F−ε)-approximation for Max-CSPs using n^(1−Ω_ε(1)) space.
  • It employs a novel two-tier sampling strategy that decouples the handling of low-degree and high-degree variables via hash-based and reservoir sampling techniques.
  • The work resolves half of the streaming dichotomy conjecture, offering practical implications for large-scale CSP approximations in adversarial data streams.

Single-Pass Streaming CSPs via Two-Tier Sampling: An Authoritative Analysis

Problem Setting and Motivation

The paper "Single-Pass Streaming CSPs via Two-Tier Sampling" (2604.01575) addresses the space-efficient approximation of the maximum constraint satisfaction problem (Max-CSP) in the streaming model. In this regime, constraints arrive sequentially and must be processed using $o(n)$ space, where $n$ is the number of variables. The focus is on constructing a single-pass streaming algorithm that achieves an approximation ratio aligned with the integrality gap $\beta_F$ of the BasicLP relaxation. The central conjecture posited in the literature is a dichotomy: a $(\beta_F-\varepsilon)$-approximation is attainable in $o(n)$ space, while any $(\beta_F+\varepsilon)$-approximation requires linear space.

Main Contributions

The principal contribution is a single-pass streaming algorithm that, for any $\varepsilon>0$, achieves a $(\beta_F-\varepsilon)$-approximation for Max-CSP using $n^{1-\Omega_\varepsilon(1)}$ space. This resolves the first side of the streaming dichotomy conjecture for arbitrary CSPs, including instances with unbounded variable degree, a long-standing open problem. The algorithm is generic, leveraging local algorithms for bounded-degree instances via a carefully designed reduction that allows its deployment in the streaming setting without explicit construction of the reduced instance.

Technical Framework

BasicLP Relaxation and Integrality Gap

Max-CSP is relaxable via the BasicLP, whose integrality gap $\beta_F$ is conjectured to characterize the attainable approximation ratio in sublinear-space streaming. The integrality gap is formally defined as:

$$\beta_F = \inf_{I} \frac{\mathrm{OPT}(I)}{\mathrm{LP}(I)},$$

where $\mathrm{OPT}(I)$ is the value of the optimal assignment, and $\mathrm{LP}(I)$ is the LP relaxation value.
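For concreteness, one standard way of writing the BasicLP for a CSP over alphabet $[q]$ (notation ours; the paper's exact formulation may differ in details): the LP variables are a local distribution $\mu_C$ over assignments to each constraint's scope $V(C)$, together with marginals $x_{v,a}$, tied by consistency constraints:

```latex
\max \;\; \sum_{C} w_C \sum_{\sigma \in [q]^{V(C)}} \mu_C(\sigma)\,\mathbf{1}[\sigma \text{ satisfies } C]
\qquad \text{s.t.} \qquad
\sum_{\sigma :\, \sigma(v) = a} \mu_C(\sigma) = x_{v,a}
\quad \forall C,\; v \in V(C),\; a \in [q],
```

where each $\mu_C$ and each vector $(x_{v,a})_{a \in [q]}$ is constrained to be a probability distribution.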

Trevisan Reduction: Probabilistic Degree Bounding

Traditionally, a reduction by Trevisan converts high-degree CSPs to bounded-degree ones by creating variable and constraint copies and redistributing adjacency stochastically. However, explicitly constructing this reduced instance is infeasible in streaming: it requires linear memory, and the variable degrees must be known a priori.
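A schematic, offline sketch of this style of degree reduction (names and simplifications ours, not the paper's exact construction) makes the obstacle concrete: the routing step consumes the full degree table, which a single pass over the stream cannot supply in advance.

```python
import random

def trevisan_reduce(constraints, degree, rng=random):
    """Schematic offline degree reduction: replace each variable v with
    degree[v] copies and route every occurrence of v in a constraint to
    a uniformly random copy, so each copy has O(1) expected degree.
    Note that degree[v] is needed up front for every variable -- exactly
    the information unavailable mid-stream, which is what the paper's
    streaming simulation works around."""
    reduced = []
    for cons in constraints:  # cons: tuple of variables in the constraint
        # Replace each variable by a (variable, copy-index) pair.
        reduced.append(tuple((v, rng.randrange(degree[v])) for v in cons))
    return reduced
```

The reduced instance has the same constraints, but each one now touches random copies, which spreads the load of a high-degree variable evenly across its copies.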

Two-Tier Sampling

The novel technical advance is a two-tier sampling strategy that decouples the treatment of low-degree and high-degree variables:

  • Low-degree variables: sampled via hash functions, with all constraints adjacent to a sampled variable stored in full. This guarantees bounded neighborhood sizes and collects the information required to simulate the local algorithm.
  • High-degree variables: handled by constraint sampling; only a fraction of their constraint copies is sampled, allowing simulation of variable copies without constructing them in full.
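The two tiers above can be sketched in a single pass as follows (an illustrative simplification with parameters of our choosing, not the paper's algorithm; in particular we use a cryptographic hash in place of a $k$-wise independent family):

```python
import hashlib
import random

def hash_sampled(var, p_low, salt=""):
    """Consistent hash-based inclusion test: a variable is in the sample
    iff its hash falls below p_low, so every occurrence of the same
    variable in the stream makes the same decision."""
    h = hashlib.sha256((salt + str(var)).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2.0**64 < p_low

def two_tier_pass(stream, p_low, p_pair, rng=random):
    """One pass over the constraint stream, combining both tiers:
    - store every constraint incident to a hash-sampled variable
      (effective for low-degree variables, whose neighborhoods are small);
    - independently keep each constraint-variable pair with probability
      p_pair (covers high-degree variables, whose full neighborhoods
      would exceed the space budget)."""
    by_variable = {}   # sampled variable -> incident constraints seen so far
    pair_sample = []   # sampled (constraint, variable) pairs
    for cons in stream:                 # cons: tuple of variables
        for v in cons:
            if hash_sampled(v, p_low):
                by_variable.setdefault(v, []).append(cons)
            if rng.random() < p_pair:
                pair_sample.append((cons, v))
    return by_variable, pair_sample
```

The hash-based test is what makes the low-degree tier coherent: a variable's neighborhood is either collected entirely or not at all, which is what the downstream local-algorithm simulation needs.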

This sampling is refined so that, via clever post-processing, the distributed local algorithm (used as a black box) can be executed on collected neighborhoods, and the estimator is corrected for sampling bias by scaling contributions appropriately. The probabilistic nature is crucial to bounding both space and error.
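The bias-correction step is, in spirit, a Horvitz-Thompson estimate; the minimal sketch below is illustrative only (the paper's estimator additionally conditions on successful neighborhood reconstruction):

```python
def ht_estimate(sampled_values, inclusion_prob, num_constraints):
    """Horvitz-Thompson-style bias correction: each sampled constraint's
    contribution (e.g. 1 if the simulated local algorithm satisfies it,
    else 0) is divided by its inclusion probability, making the sum an
    unbiased estimate of the total satisfied weight; dividing by the
    number of constraints yields the satisfied fraction."""
    return sum(v / inclusion_prob for v in sampled_values) / num_constraints
```

For example, three sampled constraints with outcomes [1, 1, 0], each included with probability 0.5, in a stream of 10 constraints give an estimated satisfied fraction of (2 / 0.5) / 10 = 0.4.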

Algorithmic Details

The algorithm consists of three phases:

  1. Sketching: Variables are sampled using $k$-wise independent hash functions. Adjacent constraint copies are sampled either because they are adjacent to a sampled variable (low-degree case) or via sampling of constraint-variable pairs (high-degree case). Reservoir sampling selects constraints uniformly without replacement for evaluation.
  2. Reduction Simulation: The reduction is simulated online by incrementally constructing the sampled induced subgraph needed for the local algorithm.
  3. Aggregation and Estimation: For each sampled constraint, the algorithm attempts to reconstruct the constant-radius neighborhood in the reduced instance that the local algorithm queries. If all dependencies (variable copies) are present, the local algorithm is invoked, and its output is scaled to account for the probability that the neighborhood was captured.
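The reservoir-sampling primitive used in the sketching phase is standard; a minimal version (Algorithm R) looks like this:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: maintain a uniform random sample of k items from a
    stream of unknown length, without replacement, using O(k) space.
    Item i (0-indexed) replaces a uniformly random reservoir slot with
    probability k / (i + 1), which keeps the sample uniform at all times."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # uniform index in [0, i]
            if j < k:
                reservoir[j] = item     # replace a random slot
    return reservoir
```

Because the sample distribution is identical regardless of the order in which items arrive, this is one of the ingredients that makes the overall algorithm insensitive to adversarial stream order.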

The algorithm achieves $n^{1-\Omega_\varepsilon(1)}$ space usage with high probability, and its guarantees are independent of stream order, thanks to reservoir sampling, hash-based variable sampling, and the constraint sampling strategies.

Strong error bounds and variance control are achieved via the Chebyshev and Chernoff inequalities, ensuring that the estimator concentrates around the true optimum within an additive $\varepsilon$ fraction of the constraints.
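As a generic illustration of the Chebyshev step (with parameters of our choosing, not the paper's exact analysis): if $t$ independent copies $X_1,\dots,X_t$ of an unbiased estimator each have variance at most $\sigma^2$, then for the average $\bar X$,

```latex
\Pr\big[\,|\bar X - \mathbb{E}[\bar X]| \ge \varepsilon\,\big]
\;\le\; \frac{\mathrm{Var}(\bar X)}{\varepsilon^2}
\;=\; \frac{\sigma^2}{t\,\varepsilon^2},
```

so taking $t = O(\sigma^2/\varepsilon^2)$ repetitions drives the failure probability below any desired constant.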

Theoretical Implications

This result settles the first half of the streaming dichotomy conjecture for general CSPs, confirming that the BasicLP integrality gap indeed governs the approximation threshold for sublinear space single-pass algorithms in adversarial streams. Notably, it generalizes recent results for Max-DiCut and avoids algorithmic idiosyncrasies specific to that problem. The approach is modular with respect to the choice of local algorithm for bounded-degree instances, abstracting away specifics of individual CSPs.

Practically, the algorithm enables space-efficient deployment of streaming CSP approximation in large-scale environments, including network analysis, computational biology, and distributed optimization, where variable degrees can be highly non-uniform and constraints arrive in unpredictable order.

Comparison to Prior Work

Earlier works resolved the conjecture only for bounded-degree instances or via multi-pass algorithms. The fundamental advance here is the ability to process arbitrary CSPs in a single pass, regardless of degree distribution. Previous Max-DiCut-specific constructions relied on specialized submodular maximization subroutines; the present work replaces such dependencies with a generic local algorithm (e.g., from [Yoshida 2011]), thereby broadening applicability and simplifying analysis.

Future Directions

  • Space Lower Bounds: The second side of the dichotomy conjecture remains open; proving $\Omega(n)$ space lower bounds for $(\beta_F+\varepsilon)$-approximation in single-pass streaming is a pivotal next step.
  • Algorithmic Robustness: Exploring adaptive variants tolerant to variable alphabet size, constraint arity, and stream non-uniformity could enhance practical robustness.
  • Distributed Streaming: Extensions to distributed streaming models might further reduce resource requirements and enable real-time distributed CSP solutions.
  • Beyond CSPs: The two-tier sampling paradigm may transfer to other streaming combinatorial optimization tasks, motivating theoretical investigation across hypergraph and database query processing.

Conclusion

The paper provides a rigorous resolution of the streaming dichotomy for Max-CSPs in the single-pass regime, establishing that a $(\beta_F-\varepsilon)$-approximation is achievable in $n^{1-\Omega_\varepsilon(1)}$ space via a sophisticated two-tier sampling technique. The results implicitly recover prior special case analyses and shift theoretical understanding of streaming CSP approximation to a more unified, generic footing. This approach paves the way for further exploration of tight lower bounds and broader algorithmic generalizations within and beyond streaming constraint satisfaction.
