
Adaptively Robust Sketches

Updated 6 February 2026
  • Adaptively robust sketches are streaming algorithms for the resettable streaming model that use polylogarithmic space and differential privacy to resist adaptive adversaries.
  • They replace traditional linear sketches with dedicated Bernoulli sampling and the Binary Tree Mechanism to ensure strong prefix-max accuracy under dynamic updates and resets.
  • They enable robust and efficient approximation of various statistics such as cardinality, sum, and Bernstein statistics, supporting advanced data monitoring and unlearning applications.

Adaptively robust sketches are streaming algorithms designed for the resettable streaming model, providing polylogarithmic-space, adversary-resilient data summaries for dynamic systems with both increment and reset capabilities. These sketches address fundamental vulnerabilities of standard linear and composable sketching techniques under adaptive adversaries and enable robust approximation of a wide class of statistical queries, including (sub)linear moments and Bernstein statistics, while offering strong prefix-max accuracy guarantees and efficient memory usage (Cohen et al., 29 Jan 2026).

1. Resettable Streaming Model: Formalism and Scope

The resettable streaming model is characterized by a universe of keys $x \in \mathcal U$, each associated with a nonnegative value $v_x \ge 0$, and an update stream with two operations: $\mathrm{Inc}(x, \Delta)$, incrementing $v_x$ by $\Delta \ge 0$, and $\mathrm{Reset}(x)$, setting $v_x$ to zero. While the reset operation can be generalized to predicates over the key set, single-key resets are sufficient to establish information-theoretic lower and upper bounds.

At time $t$, the vector $(v_x^{(t)})_x$ naturally defines a broad family of streaming statistics:

$$F_t = \sum_x f(v_x^{(t)})$$

where $f$ can represent the indicator for nonzero entries (cardinality, $\ell_0$), the identity (sum, $\ell_1$), or any sublinear, soft-concave “Bernstein” function.
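The model and its statistics can be mirrored by a toy exact-state simulator; this is purely illustrative (the class and method names are ours, and a real sketch would never store the full vector):

```python
# Toy model of a resettable stream: the state is the full vector (v_x),
# from which any statistic F_t = sum_x f(v_x) can be computed exactly.
# The methods mirror the model's Inc(x, Delta) and Reset(x) operations.
from collections import defaultdict

class ResettableStream:
    def __init__(self):
        self.v = defaultdict(float)  # v_x >= 0 for every key x

    def inc(self, x, delta):
        assert delta >= 0
        self.v[x] += delta

    def reset(self, x):
        self.v.pop(x, None)  # Reset(x): set v_x back to zero

    def statistic(self, f):
        # F_t = sum over active keys of f(v_x); inactive keys contribute f(0) = 0
        return sum(f(vx) for vx in self.v.values() if vx > 0)

s = ResettableStream()
s.inc("a", 3.0); s.inc("b", 1.5); s.inc("a", 2.0)
s.reset("b")
card = s.statistic(lambda w: 1.0)   # l_0: number of active keys
total = s.statistic(lambda w: w)    # l_1: sum of active values
```

Here `card` is 1.0 (only `"a"` remains active) and `total` is 5.0, matching the exact statistics a sketch would try to approximate in small space.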

This model is especially relevant for applications requiring fine-grained reset or deletion support, such as resource monitoring with deletions and machine unlearning.

2. Vulnerabilities of Classical Sketches under Adaptive Attacks

Classical streaming sketches, including sampling-based and linear sketches, provide low-variance, unbiased estimates for $F_t$ in the non-adaptive (oblivious) setting but are vulnerable to adaptive adversaries. In the adaptive scenario, the adversary can issue updates based on intermediate estimates, exploiting the deterministic or information-leaking properties of the sketch’s internal randomness.

Key attacks include:

  • Re-insertion Attack (for Insertion-Only and Bernoulli Sampling): The adversary inserts a key and, if the sample size increases, immediately re-inserts it, reducing the effective probability of the key being retained and yielding a $p$-fold underestimation of cardinality.
  • Sample-and-Delete Attack (with Resets): For each new key, the adversary inserts it, checks the sample, and resets the key if it is present. The adversary empties the sample even as the underlying set size $|A|$ grows, leading to unbounded relative error.
  • Lower Bounds for Linear and Composable Sketches: All known union-composable or linear sketches for these statistics are subject to $\Omega(k^2)$-query universal attacks for sketches of size $k$, and thus require $\mathrm{poly}(T)$ size to resist adaptive streams of length $T$.

These vulnerabilities render oblivious or linear sketches fundamentally unsuitable for robust streaming in the resettable model.
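The sample-and-delete attack can be simulated directly against a plain (non-private) Bernoulli sample. The sketch below assumes, as in the attack description, that the adversary can observe after each update whether its key entered the sample; all names and constants are illustrative:

```python
# Sample-and-delete attack on an unprotected Bernoulli sample: every key
# that enters the sample is immediately reset, so the sample stays empty
# while the true active set keeps growing.
import random

random.seed(0)
p = 0.5
sample, active = set(), set()

for i in range(1000):
    key = f"k{i}"
    active.add(key)
    if random.random() < p:      # key enters the Bernoulli sample
        sample.add(key)
    if key in sample:            # adversary observes this and resets the key
        active.discard(key)
        sample.discard(key)

estimate = len(sample) / p       # the sketch's cardinality estimate: 0
truth = len(active)              # the true cardinality kept growing
```

The run ends with `estimate == 0` while roughly half the keys (those the sample never admitted) remain active, so the relative error is unbounded, exactly as the attack predicts.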

3. Adaptively Robust Sketching Framework: Differential Privacy and Binary Tree Mechanism

The adaptation to adversarial streaming hinges on two core design choices: abandoning composability and linearity in favor of dedicated sampling sketches, and shielding their internal randomness with differential privacy (DP). The key privacy tool is the Binary Tree Mechanism (BTM) for continual observation.

Fixed-Rate Robust Cardinality Sketch

  • Sampling paradigm: Maintain a Bernoulli sample $S_t$ in which each active key in $A_t$ enters $S_t$ with independent probability $p$.
  • Increment Logging: Instead of directly releasing $|S_t|$, release the increments $u_t = |S_t| - |S_{t-1}| \in \{-1, 0, +1\}$.
  • Noisy Aggregation: Feed the increments into the BTM, releasing

$$\tilde S_t = \sum_{t'=1}^{t} u_{t'} + \mathrm{Lap}\bigl(L \log T / \varepsilon_{\mathrm{dp}}\bigr)$$

with sensitivity $L = 2$ and DP parameter $\varepsilon_{\mathrm{dp}}$.

  • Estimate:

$$\hat N_t = \frac{\tilde S_t}{p}$$

Intuitive protection arises from the Laplace noise, ensuring that even with adaptive access to $\hat N_t$, an adversary cannot infer more than a small multiplicative factor regarding any key’s sampled status.
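The fixed-rate pipeline can be sketched end to end under some simplifications: one Laplace draw per dyadic BTM node, and consistent hash-based per-key coins (so re-insertions never re-flip a coin). Class and method names are ours; this is an illustration of the mechanism, not the paper's exact construction:

```python
# Bernoulli sampler whose released count is routed through a Binary Tree
# Mechanism: each dyadic interval node carries one Laplace draw, and a
# prefix query sums O(log T) complete nodes plus their noise.
import hashlib
import math
import random

class BinaryTreeMechanism:
    def __init__(self, T, eps_dp, sensitivity=2.0, seed=0):
        self.levels = max(1, math.ceil(math.log2(T)))
        self.scale = sensitivity * self.levels / eps_dp  # per-node noise scale
        self.rng = random.Random(seed)
        self.sums = {}    # (level, index) -> true partial sum
        self.noise = {}   # (level, index) -> Laplace noise, drawn once per node
        self.t = 0

    def add(self, u):
        self.t += 1
        for level in range(self.levels + 1):
            key = (level, (self.t - 1) >> level)
            self.sums[key] = self.sums.get(key, 0.0) + u
            if key not in self.noise:
                # Laplace(scale) as a difference of two Exp(1) variates
                e1 = -math.log(1.0 - self.rng.random())
                e2 = -math.log(1.0 - self.rng.random())
                self.noise[key] = self.scale * (e1 - e2)

    def prefix(self):
        # decompose [1, t] into O(log T) complete dyadic blocks
        total, start, rem = 0.0, 0, self.t
        for level in range(self.levels, -1, -1):
            size = 1 << level
            if rem >= size:
                key = (level, start >> level)
                total += self.sums.get(key, 0.0) + self.noise.get(key, 0.0)
                start, rem = start + size, rem - size
        return total

class RobustCardinalitySketch:
    def __init__(self, p, T, eps_dp, seed=0):
        self.p, self.seed = p, seed
        self.sample = set()
        self.btm = BinaryTreeMechanism(T, eps_dp, sensitivity=2.0, seed=seed)

    def _coin(self, key):
        # consistent per-key coin: re-inserting a key never re-flips it
        h = hashlib.sha256(f"{self.seed}:{key}".encode()).digest()
        return int.from_bytes(h[:8], "big") / 2.0**64 < self.p

    def update(self, op, key):
        before = len(self.sample)
        if op == "inc" and self._coin(key):
            self.sample.add(key)
        elif op == "reset":
            self.sample.discard(key)
        self.btm.add(len(self.sample) - before)  # u_t in {-1, 0, +1}

    def estimate(self):
        return self.btm.prefix() / self.p  # N-hat_t = S-tilde_t / p

sk = RobustCardinalitySketch(p=1.0, T=256, eps_dp=1e9, seed=0)
for i in range(100):
    sk.update("inc", i)
est = sk.estimate()  # with p = 1 and near-zero noise, close to 100
```

With a realistic $\varepsilon_{\mathrm{dp}}$ and $p < 1$, the released estimate carries Laplace noise of scale $O(L \log T / \varepsilon_{\mathrm{dp}})$ per query, which is what shields the coins from the adaptive adversary.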

Error Analysis and Robustness via DP-Generalization

Standard BTM analyses yield, uniformly for all $t$, with probability $1-\delta/2$:

$$|\tilde S_t - |S_t|| \le O\left(\frac{L}{\varepsilon_{\mathrm{dp}}} \log^{3/2} T \log\frac{T}{\delta}\right)$$

DP-generalization theorems ensure that, even under adaptive querying, the difference between $|S_t|$ and $p N_t$ remains tightly bounded (up to an additive $O(\varepsilon_{\mathrm{dp}} p N_t + 1/(\varepsilon_{\mathrm{dp}} \log(T/\delta)))$ for all $t$).

By judicious parameter selection:

  • $\varepsilon_{\mathrm{dp}} = \Theta(\varepsilon)$
  • $p = \Theta\left(\frac{\log^{3/2} T \log(T/\delta)}{\varepsilon^2 N_{\max}}\right)$

yields, for all $t$:

$$|\hat N_t - N_t| \le \varepsilon N_{\max}$$

where $N_{\max} = \max_{t'} N_{t'}$.

Adjustable-Rate Prefix-Max Accuracy

A fixed-$p$ scheme requires foreknowledge of $N_{\max}$. To circumvent this and guarantee

$$|\hat N_t - N_t| \le \varepsilon \max_{t' \le t} N_{t'}$$

at each $t$, the sketch adaptively halves $p$ so that the sample size never exceeds a fixed budget $k = O(\varepsilon^{-2} \log^{3/2} T \log(T/\delta))$. Subsampling and corresponding BTM updates ensure continued DP guarantees and error bounds, while maintaining $O(k)$ total space.
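The rate-halving step can be sketched in isolation (DP layer omitted): whenever the sample exceeds the budget $k$, halve $p$ and subsample the retained keys, so space stays $O(k)$ without knowing $N_{\max}$ in advance. The constants here are ours, chosen only for the simulation:

```python
# Adjustable-rate Bernoulli sampler: the budget k caps the sample size,
# and each overflow halves the sampling rate p via a 1/2-subsample, so
# every key is retained with probability exactly p overall.
import random

random.seed(2)
k, p = 64, 1.0
sample = set()
for i in range(10_000):          # 10,000 distinct key insertions
    if random.random() < p:
        sample.add(i)
    while len(sample) > k:       # budget exceeded: halve the rate, subsample
        p /= 2.0
        sample = {x for x in sample if random.random() < 0.5}

estimate = len(sample) / p       # unbiased estimate of the current cardinality
```

The final sample holds at most $k = 64$ keys while the estimate tracks the true cardinality of 10,000 up to the sampling variance $\approx \sqrt{N/p}$; in the full construction each subsampling round is also mirrored into the BTM so the released counts stay differentially private.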

4. Robustness for Sum and Bernstein Statistics

The framework generalizes to both sum ($\ell_1$) and Bernstein statistics.

  • Resettable Sum ($\ell_1$) Sketch: Uses a related sampler with clipping, deterministically includes large-value keys, and applies the BTM to normalized updates. The resulting sketch achieves

$$O\left(\varepsilon^{-2} \log^{11/2} T \log^2 \frac{1}{\delta}\right)$$

space for prefix-max error

$$|\hat F_t - F_t| \le \varepsilon \max_{t' \le t} F_{t'}$$

with probability $1-\delta$.

  • Bernstein Statistics: For any function $f$ admitting a Lévy–Khintchine representation,

$$f(w) = \int_0^\infty a(t)\left(1 - e^{-wt}\right)\,dt, \quad a(t) \ge 0,$$

e.g., $f(w) = w^p$ for $p \in (0,1)$, $\ln(1+w)$, soft-capping, etc., a known reduction expresses $f$ as a sum plus Max-Distinct over randomized mappings. Parallel application of $O(\log T)$ cardinality samplers and robust sum sketches yields overall prefix-max accuracy and $\mathrm{poly}(\varepsilon^{-1}, \log T, \log(1/\delta))$ space.

5. Error Guarantees and Lower Bounds

An information-theoretic lower bound via set-disjointness precludes pure relative error $|\hat N_t - N_t| \le \varepsilon N_t$ with sublinear space, even in the resettable model. However, the “prefix-max” error

$$|\hat N_t - N_t| \le \varepsilon \max_{t' \le t} N_{t'}$$

is simultaneously achievable in polylogarithmic space and often operationally sufficient, provided the target statistic seldom shrinks by more than a constant factor from prior maxima.

6. Technical Lemmas and Concentration under Adaptivity

Three essential technical tools underpin adaptively robust sketches:

  • Binary Tree Mechanism Accuracy ([Chan–Shi–Song’11]): For all $1 \le t \le T$, with $O(\log T)$ counters and Laplace noise per node, the noisy prefix sums satisfy

$$\Pr\left[\max_{1 \le t \le T} |\tilde S_t - S_t| \le O\left(\frac{L}{\varepsilon} \log^{3/2} T \log(T/\delta)\right)\right] \ge 1 - \delta$$

  • DP-Generalization for Bernoulli Sampling: For any DP mechanism applied to Bernoulli samples, the posterior probability that a specific key is in the sample remains within a factor $e^{\pm 2\varepsilon_{\mathrm{dp}}}$ of the base rate $p$. The absolute bias in the mean sample size remains $O(\varepsilon_{\mathrm{dp}}\, p\, \mathbb{E}[N_t])$.
  • Concentration under Adaptivity: By expressing the sample size as $S_t = \sum_{i \in A_t} B_i$ (with the $B_i$ i.i.d. Bernoulli), a Freedman-style martingale analysis augmented with DP-posterior stability establishes that, for all $t$,

$$|S_t - \mathbb{E}[S_t]| \le O\left(\sqrt{p N_{\max} \log(T/\delta)} + \log(T/\delta)\right)$$

with probability $1-\delta$.
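The shape of this bound is easy to see empirically in the non-adaptive baseline case. The Monte Carlo below (constants and the explicit numerical bound are ours, chosen loosely) checks that Bernoulli-sum deviations stay within roughly $\sqrt{pN \log(1/\delta)} + \log(1/\delta)$:

```python
# Monte Carlo illustration of Bernoulli-sum concentration: over many trials,
# the worst deviation |S - E[S]| of S = sum_i B_i, B_i ~ Bernoulli(p),
# stays within a sqrt(p*N*log) + log envelope.
import math
import random

random.seed(3)
p, N, trials = 0.01, 10_000, 100
worst = 0.0
for _ in range(trials):
    s = sum(1 for _ in range(N) if random.random() < p)
    worst = max(worst, abs(s - p * N))

# a loose numerical rendition of O(sqrt(p*N*log(T/delta)) + log(T/delta))
bound = 4.0 * math.sqrt(p * N * math.log(trials)) + math.log(trials)
```

The adaptive setting is harder precisely because the $B_i$ are no longer independent of the query sequence; the DP-posterior stability lemma is what restores a martingale structure so a Freedman-type bound of the same shape still applies.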

7. Summary of Contributions and Impact

Adaptively robust sketches provide the first polylog-space, provably adversary-resistant streaming algorithms for the resettable model, with strong prefix-max error guarantees for a large class of statistics. The blend of non-composable sampling frameworks and the binary tree mechanism for continual, differentially private release enables these robust properties, sidestepping the impossibility results for conventional sketching. These advances contribute foundational tools for adversarially robust, memory-efficient data processing in streaming applications with support for deletions and unlearning, with immediate implications for areas such as active monitoring and privacy-preserving analytics (Cohen et al., 29 Jan 2026).
