
Adaptively Robust Sketches

Updated 6 February 2026
  • Adaptively robust sketches are streaming algorithms for the resettable streaming model that use polylogarithmic space and differential privacy to resist adaptive adversaries.
  • They replace traditional linear sketches with dedicated Bernoulli sampling and the Binary Tree Mechanism to ensure strong prefix-max accuracy under dynamic updates and resets.
  • They enable robust and efficient approximation of various statistics such as cardinality, sum, and Bernstein statistics, supporting advanced data monitoring and unlearning applications.

Adaptively robust sketches are streaming algorithms designed for the resettable streaming model, providing polylogarithmic-space, adversary-resilient data summaries for dynamic systems with both increment and reset capabilities. These sketches address fundamental vulnerabilities of standard linear and composable sketching techniques under adaptive adversaries and enable robust approximation of a wide class of statistical queries, including (sub)linear moments and Bernstein statistics, while offering strong prefix-max accuracy guarantees and efficient memory usage (Cohen et al., 29 Jan 2026).

1. Resettable Streaming Model: Formalism and Scope

The resettable streaming model is characterized by a universe of keys $x \in \mathcal U$, each associated with a nonnegative value $v_x \ge 0$, and an update stream with two operations: $\mathrm{Inc}(x, \Delta)$, incrementing $v_x$ by $\Delta \ge 0$, and $\mathrm{Reset}(x)$, setting $v_x$ to zero. While the reset operation can be generalized to predicates over the key set, single-key resets are sufficient to establish information-theoretic lower and upper bounds.

At time $t$, the vector $(v_x^{(t)})_x$ naturally defines a broad family of streaming statistics:

$$F_t = \sum_x f(v_x^{(t)})$$

where $f$ can represent the indicator for nonzero entries (cardinality, $\ell_0$), the identity (sum, $\ell_1$), or any sublinear, soft-concave “Bernstein” function.
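The model and its statistics can be mirrored by a toy exact-state simulator; this is purely illustrative (the class and method names are ours, and a real sketch would never store the full vector):

```python
# Toy model of a resettable stream: the state is the full vector (v_x),
# from which any statistic F_t = sum_x f(v_x) can be computed exactly.
# The methods mirror the model's Inc(x, Delta) and Reset(x) operations.
from collections import defaultdict

class ResettableStream:
    def __init__(self):
        self.v = defaultdict(float)  # v_x >= 0 for every key x

    def inc(self, x, delta):
        assert delta >= 0
        self.v[x] += delta

    def reset(self, x):
        self.v.pop(x, None)  # Reset(x): set v_x back to zero

    def statistic(self, f):
        # F_t = sum over active keys of f(v_x); inactive keys contribute f(0) = 0
        return sum(f(vx) for vx in self.v.values() if vx > 0)

s = ResettableStream()
s.inc("a", 3.0); s.inc("b", 1.5); s.inc("a", 2.0)
s.reset("b")
card = s.statistic(lambda w: 1.0)   # l_0: number of active keys
total = s.statistic(lambda w: w)    # l_1: sum of active values
```

Here `card` is 1.0 (only `"a"` remains active) and `total` is 5.0, matching the exact statistics a sketch would try to approximate in small space.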

This model is especially relevant for applications requiring fine-grained reset or deletion support, such as resource monitoring with deletions and machine unlearning.

2. Vulnerabilities of Classical Sketches under Adaptive Attacks

Classical streaming sketches, including sampling-based and linear sketches, provide low-variance, unbiased estimates for $F_t$ in the non-adaptive (oblivious) setting but are vulnerable to adaptive adversaries. In the adaptive scenario, the adversary can issue updates based on intermediate estimates, exploiting the deterministic or information-leaking properties of the sketch’s internal randomness.

Key attacks include:

  • Re-insertion Attack (for Insertion-Only and Bernoulli Sampling): The adversary inserts a key and, if the sample size increases, immediately re-inserts it, reducing the effective probability of the key being retained and yielding a $p$-fold underestimation of cardinality.
  • Sample-and-Delete Attack (with Resets): For each new key, the adversary inserts it, checks the sample, and resets the key if it is present. The adversary empties the sample even as the underlying set size $|A|$ grows, leading to unbounded relative error.
  • Lower Bounds for Linear and Composable Sketches: All known union-composable or linear sketches for these statistics are subject to $\Omega(k^2)$-query universal attacks for sketches of size $k$, and thus require $\mathrm{poly}(T)$ size to resist adaptive streams of length $T$.

These vulnerabilities render oblivious or linear sketches fundamentally unsuitable for robust streaming in the resettable model.
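The sample-and-delete attack can be simulated directly against a plain (non-private) Bernoulli sample. The sketch below assumes, as in the attack description, that the adversary can observe after each update whether its key entered the sample; all names and constants are illustrative:

```python
# Sample-and-delete attack on an unprotected Bernoulli sample: every key
# that enters the sample is immediately reset, so the sample stays empty
# while the true active set keeps growing.
import random

random.seed(0)
p = 0.5
sample, active = set(), set()

for i in range(1000):
    key = f"k{i}"
    active.add(key)
    if random.random() < p:      # key enters the Bernoulli sample
        sample.add(key)
    if key in sample:            # adversary observes this and resets the key
        active.discard(key)
        sample.discard(key)

estimate = len(sample) / p       # the sketch's cardinality estimate: 0
truth = len(active)              # the true cardinality kept growing
```

The run ends with `estimate == 0` while roughly half the keys (those the sample never admitted) remain active, so the relative error is unbounded, exactly as the attack predicts.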

3. Adaptively Robust Sketching Framework: Differential Privacy and Binary Tree Mechanism

The adaptation to adversarial streaming hinges on two core design choices: abandoning composability and linearity in favor of dedicated sampling sketches, and shielding their internal randomness with differential privacy (DP). The key privacy tool is the Binary Tree Mechanism (BTM) for continual observation.

Fixed-Rate Robust Cardinality Sketch

  • Sampling paradigm: Maintain a Bernoulli sample $S_t$ in which each active key in $A_t$ enters $S_t$ with independent probability $p$.
  • Increment Logging: Instead of directly releasing $|S_t|$, release the increments $u_t = |S_t| - |S_{t-1}| \in \{-1, 0, +1\}$.
  • Noisy Aggregation: Feed the increments into the BTM, releasing

$$\tilde S_t = \sum_{t'=1}^{t} u_{t'} + \mathrm{Lap}\bigl(L \log T / \varepsilon_{\mathrm{dp}}\bigr)$$

with sensitivity $L = 2$ and DP parameter $\varepsilon_{\mathrm{dp}}$.

  • Estimate:

$$\hat N_t = \frac{\tilde S_t}{p}$$

Intuitive protection arises from the Laplace noise, ensuring that even with adaptive access to $\hat N_t$, an adversary cannot infer more than a small multiplicative factor regarding any key’s sampled status.
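The fixed-rate pipeline can be sketched end to end under some simplifications: one Laplace draw per dyadic BTM node, and consistent hash-based per-key coins (so re-insertions never re-flip a coin). Class and method names are ours; this is an illustration of the mechanism, not the paper's exact construction:

```python
# Bernoulli sampler whose released count is routed through a Binary Tree
# Mechanism: each dyadic interval node carries one Laplace draw, and a
# prefix query sums O(log T) complete nodes plus their noise.
import hashlib
import math
import random

class BinaryTreeMechanism:
    def __init__(self, T, eps_dp, sensitivity=2.0, seed=0):
        self.levels = max(1, math.ceil(math.log2(T)))
        self.scale = sensitivity * self.levels / eps_dp  # per-node noise scale
        self.rng = random.Random(seed)
        self.sums = {}    # (level, index) -> true partial sum
        self.noise = {}   # (level, index) -> Laplace noise, drawn once per node
        self.t = 0

    def add(self, u):
        self.t += 1
        for level in range(self.levels + 1):
            key = (level, (self.t - 1) >> level)
            self.sums[key] = self.sums.get(key, 0.0) + u
            if key not in self.noise:
                # Laplace(scale) as a difference of two Exp(1) variates
                e1 = -math.log(1.0 - self.rng.random())
                e2 = -math.log(1.0 - self.rng.random())
                self.noise[key] = self.scale * (e1 - e2)

    def prefix(self):
        # decompose [1, t] into O(log T) complete dyadic blocks
        total, start, rem = 0.0, 0, self.t
        for level in range(self.levels, -1, -1):
            size = 1 << level
            if rem >= size:
                key = (level, start >> level)
                total += self.sums.get(key, 0.0) + self.noise.get(key, 0.0)
                start, rem = start + size, rem - size
        return total

class RobustCardinalitySketch:
    def __init__(self, p, T, eps_dp, seed=0):
        self.p, self.seed = p, seed
        self.sample = set()
        self.btm = BinaryTreeMechanism(T, eps_dp, sensitivity=2.0, seed=seed)

    def _coin(self, key):
        # consistent per-key coin: re-inserting a key never re-flips it
        h = hashlib.sha256(f"{self.seed}:{key}".encode()).digest()
        return int.from_bytes(h[:8], "big") / 2.0**64 < self.p

    def update(self, op, key):
        before = len(self.sample)
        if op == "inc" and self._coin(key):
            self.sample.add(key)
        elif op == "reset":
            self.sample.discard(key)
        self.btm.add(len(self.sample) - before)  # u_t in {-1, 0, +1}

    def estimate(self):
        return self.btm.prefix() / self.p  # N-hat_t = S-tilde_t / p

sk = RobustCardinalitySketch(p=1.0, T=256, eps_dp=1e9, seed=0)
for i in range(100):
    sk.update("inc", i)
est = sk.estimate()  # with p = 1 and near-zero noise, close to 100
```

With a realistic $\varepsilon_{\mathrm{dp}}$ and $p < 1$, the released estimate carries Laplace noise of scale $O(L \log T / \varepsilon_{\mathrm{dp}})$ per query, which is what shields the coins from the adaptive adversary.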

Error Analysis and Robustness via DP-Generalization

Standard BTM analyses yield, uniformly for all $t$, with probability $1-\delta/2$:

$$|\tilde S_t - |S_t|| \le O\left(\frac{L}{\varepsilon_{\mathrm{dp}}} \log^{3/2} T \log\frac{T}{\delta}\right)$$

DP-generalization theorems ensure that, even under adaptive querying, the difference between $|S_t|$ and $p N_t$ remains tightly bounded (up to an additive $O(\varepsilon_{\mathrm{dp}} p N_t + 1/(\varepsilon_{\mathrm{dp}} \log(T/\delta)))$ for all $t$).

By judicious parameter selection:

  • $\varepsilon_{\mathrm{dp}} = \Theta(\varepsilon)$
  • $p = \Theta\left(\frac{\log^{3/2} T \log(T/\delta)}{\varepsilon^2 N_{\max}}\right)$

yields, for all $t$:

$$|\hat N_t - N_t| \le \varepsilon N_{\max}$$

where $N_{\max} = \max_{t'} N_{t'}$.

Adjustable-Rate Prefix-Max Accuracy

A fixed-$p$ scheme requires foreknowledge of $N_{\max}$. To circumvent this and guarantee

$$|\hat N_t - N_t| \le \varepsilon \max_{t' \le t} N_{t'}$$

at each $t$, the sketch adaptively halves $p$ so that the sample size never exceeds a fixed budget $k = O(\varepsilon^{-2} \log^{3/2} T \log(T/\delta))$. Subsampling and corresponding BTM updates ensure continued DP guarantees and error bounds, while maintaining $O(k)$ total space.
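The rate-halving step can be sketched in isolation (DP layer omitted): whenever the sample exceeds the budget $k$, halve $p$ and subsample the retained keys, so space stays $O(k)$ without knowing $N_{\max}$ in advance. The constants here are ours, chosen only for the simulation:

```python
# Adjustable-rate Bernoulli sampler: the budget k caps the sample size,
# and each overflow halves the sampling rate p via a 1/2-subsample, so
# every key is retained with probability exactly p overall.
import random

random.seed(2)
k, p = 64, 1.0
sample = set()
for i in range(10_000):          # 10,000 distinct key insertions
    if random.random() < p:
        sample.add(i)
    while len(sample) > k:       # budget exceeded: halve the rate, subsample
        p /= 2.0
        sample = {x for x in sample if random.random() < 0.5}

estimate = len(sample) / p       # unbiased estimate of the current cardinality
```

The final sample holds at most $k = 64$ keys while the estimate tracks the true cardinality of 10,000 up to the sampling variance $\approx \sqrt{N/p}$; in the full construction each subsampling round is also mirrored into the BTM so the released counts stay differentially private.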

4. Robustness for Sum and Bernstein Statistics

The framework generalizes to both sum ($\ell_1$) and Bernstein statistics.

  • Resettable Sum ($\ell_1$) Sketch: Uses a related sampler with clipping, deterministically includes large-value keys, and applies the BTM to normalized updates. The resulting sketch achieves

$$O\left(\varepsilon^{-2} \log^{11/2} T \log^2 \frac{1}{\delta}\right)$$

space for prefix-max error

$$|\hat F_t - F_t| \le \varepsilon \max_{t' \le t} F_{t'}$$

with probability $1-\delta$.

  • Bernstein Statistics: For any function $f$ admitting a Lévy–Khintchine representation,

$$f(w) = \int_0^\infty a(t)\left(1 - e^{-wt}\right)\,dt, \quad a(t) \ge 0,$$

e.g., $f(w) = w^p$ for $p \in (0,1)$, $\ln(1+w)$, soft-capping, etc., a known reduction expresses $f$ as a sum plus Max-Distinct over randomized mappings. Parallel application of $O(\log T)$ cardinality samplers and robust sum sketches yields overall prefix-max accuracy and $\mathrm{poly}(\varepsilon^{-1}, \log T, \log(1/\delta))$ space.

5. Error Guarantees and Lower Bounds

An information-theoretic lower bound via set-disjointness precludes pure relative error $|\hat N_t - N_t| \le \varepsilon N_t$ with sublinear space, even in the resettable model. However, the “prefix-max” error

$$|\hat N_t - N_t| \le \varepsilon \max_{t' \le t} N_{t'}$$

is simultaneously achievable in polylogarithmic space and often operationally sufficient, provided the target statistic seldom shrinks by more than a constant factor from prior maxima.

6. Technical Lemmas and Concentration under Adaptivity

Three essential technical tools underpin adaptively robust sketches:

  • Binary Tree Mechanism Accuracy ([Chan–Shi–Song’11]): For all $1 \le t \le T$, with $O(\log T)$ counters and Laplace noise per node, the noisy prefix sums satisfy

$$\Pr\left[\max_{1 \le t \le T} |\tilde S_t - S_t| \le O\left(\frac{L}{\varepsilon} \log^{3/2} T \log(T/\delta)\right)\right] \ge 1 - \delta$$

  • DP-Generalization for Bernoulli Sampling: For any DP mechanism applied to Bernoulli samples, the posterior probability that a specific key is in the sample remains within a factor $e^{\pm 2\varepsilon_{\mathrm{dp}}}$ of the base rate $p$. The absolute bias in the mean sample size remains $O(\varepsilon_{\mathrm{dp}}\, p\, \mathbb{E}[N_t])$.
  • Concentration under Adaptivity: By expressing the sample size as $S_t = \sum_{i \in A_t} B_i$ (with the $B_i$ i.i.d. Bernoulli), a Freedman-style martingale analysis augmented with DP-posterior stability establishes that, for all $t$,

$$|S_t - \mathbb{E}[S_t]| \le O\left(\sqrt{p N_{\max} \log(T/\delta)} + \log(T/\delta)\right)$$

with probability $1-\delta$.
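The shape of this bound is easy to see empirically in the non-adaptive baseline case. The Monte Carlo below (constants and the explicit numerical bound are ours, chosen loosely) checks that Bernoulli-sum deviations stay within roughly $\sqrt{pN \log(1/\delta)} + \log(1/\delta)$:

```python
# Monte Carlo illustration of Bernoulli-sum concentration: over many trials,
# the worst deviation |S - E[S]| of S = sum_i B_i, B_i ~ Bernoulli(p),
# stays within a sqrt(p*N*log) + log envelope.
import math
import random

random.seed(3)
p, N, trials = 0.01, 10_000, 100
worst = 0.0
for _ in range(trials):
    s = sum(1 for _ in range(N) if random.random() < p)
    worst = max(worst, abs(s - p * N))

# a loose numerical rendition of O(sqrt(p*N*log(T/delta)) + log(T/delta))
bound = 4.0 * math.sqrt(p * N * math.log(trials)) + math.log(trials)
```

The adaptive setting is harder precisely because the $B_i$ are no longer independent of the query sequence; the DP-posterior stability lemma is what restores a martingale structure so a Freedman-type bound of the same shape still applies.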

7. Summary of Contributions and Impact

Adaptively robust sketches provide the first polylog-space, provably adversary-resistant streaming algorithms for the resettable model, with strong prefix-max error guarantees for a large class of statistics. The blend of non-composable sampling frameworks and the binary tree mechanism for continual, differentially private release enables these robust properties, sidestepping the impossibility results for conventional sketching. These advances contribute foundational tools for adversarially robust, memory-efficient data processing in streaming applications with support for deletions and unlearning, with immediate implications for areas such as active monitoring and privacy-preserving analytics (Cohen et al., 29 Jan 2026).
