Resettable Streaming Model
- The resettable streaming model is a framework that supports non-monotonic updates by allowing both incremental increases and resets to zero, which is essential for handling deletions.
- It is vital for applications like active resource monitoring and machine unlearning, addressing challenges from adversarial updates and privacy constraints.
- Advanced algorithmic techniques, such as binary tree mechanisms and SAFE, leverage differential privacy to guarantee accurate, robust statistics with provable error bounds.
The resettable streaming model is a computational framework for streaming algorithms in which the value of each key in a universe can be increased (by increments) or reset to zero at arbitrary points in the input stream. This model, motivated by applications requiring support for deletion, such as active resource monitoring and machine unlearning, generalizes the standard streaming paradigm by enabling non-monotonic updates. Recent research investigates efficient, robust, and theoretically sound algorithms for estimating statistics of interest (e.g., cardinality, moments, soft-sublinear functionals) in the presence of adversarial update sequences, adaptive attacks, and privacy constraints (Cohen et al., 29 Jan 2026, Shen et al., 21 Jul 2025).
1. Formal Foundations of the Resettable Streaming Model
The resettable streaming model operates over a universe $\mathcal{U}$ of keys (potentially infinite) and maintains at each time step $t$ a nonnegative counter vector $c(t) = (c_k(t))_{k \in \mathcal{U}}$ representing the state of each key. The stream consists of updates, each of which is either:
- $(k, \Delta)$ with $\Delta > 0$: increment the counter of key $k$ by $\Delta$,
- $\mathrm{Reset}(k)$ (or, more generally, $\mathrm{Reset}(P)$): reset the counter for key $k$, or for all keys satisfying predicate $P$, to zero.
For any function $f: \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ with $f(0) = 0$, the statistic of interest at time $t$ is $F(t) = \sum_{k} f(c_k(t))$. Examples include the cardinality ($f(x) = \mathbf{1}[x > 0]$), sum ($f(x) = x$), sublinear moments ($f(x) = x^p$, $0 < p < 1$), and soft-capped statistics (smooth surrogates of $f(x) = \min(x, \tau)$) (Cohen et al., 29 Jan 2026). The resettable streaming model abstracts the online "forgetting" (unlearning) of data points by resetting their contributions to model updates (Shen et al., 21 Jul 2025).
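The update semantics above can be made concrete with an exact (non-sketched) reference implementation; the class name `ResettableStream` and its methods are illustrative, not from the cited papers:

```python
from collections import defaultdict

class ResettableStream:
    """Exact reference state for the resettable model (no sketching)."""

    def __init__(self):
        self.counters = defaultdict(float)  # c_k >= 0 for each active key k

    def inc(self, key, delta):
        # update (k, Delta): increment the counter of key k by Delta
        self.counters[key] += delta

    def reset(self, key=None, predicate=None):
        # Reset(k): zero one key; Reset(P): zero every key satisfying P
        if key is not None:
            self.counters.pop(key, None)
        elif predicate is not None:
            for k in [k for k in self.counters if predicate(k)]:
                del self.counters[k]

    def statistic(self, f):
        # F(t) = sum_k f(c_k(t)); since f(0) = 0, absent keys contribute nothing
        return sum(f(c) for c in self.counters.values())

s = ResettableStream()
s.inc("a", 3); s.inc("b", 1); s.inc("a", 2)
cardinality = s.statistic(lambda x: 1)  # f(x) = 1[x > 0]  -> 2
total = s.statistic(lambda x: x)        # f(x) = x         -> 6.0
s.reset("a")                            # key "a" is forgotten entirely
```

Any statistic of the form $\sum_k f(c_k)$ plugs in as a Python callable; sketching algorithms must approximate exactly this quantity in sublinear space.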
2. Adversarial Robustness and Streaming Unlearning
The model's semantics make it susceptible to adaptive adversarial attacks—scenarios where adversaries exploit knowledge of intermediate outputs to bias sketch-based estimators. Two key attack paradigms are:
- Re-insertion attack (insertion-only): The adversary inserts a key, observes whether it is in the sample, and then re-inserts it to force further bias in sample selection, degrading accuracy.
- Sample-and-delete attack (resettable): The adversary inserts a key, queries to see whether it was sampled, and if so, immediately deletes (resets) it, thereby manipulating the statistical properties of the estimator. In the cardinality case, such manipulation can drive the estimated count to zero while the true number of active keys remains large (Cohen et al., 29 Jan 2026).
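A toy version of the sample-and-delete attack against a naive Bernoulli-sampling cardinality sketch (illustrative only; the sketches and the attack analyzed in the paper are more involved) shows the failure mode:

```python
import random

def attack_naive_sampler(p=0.25, n=1000, seed=0):
    """Adversary inserts n keys; whenever the (leaked) sketch state shows
    a key was sampled, the adversary immediately resets that key. The
    sample empties out, so |sample|/p estimates 0 while many keys stay
    active."""
    rng = random.Random(seed)
    sample, active = set(), set()
    for k in range(n):
        active.add(k)                 # insert key k
        if rng.random() < p:          # naive sketch admits k with prob. p
            sample.add(k)
        if k in sample:               # adaptive query: "was k sampled?"
            sample.discard(k)         # ...if yes, reset (delete) k
            active.discard(k)
    return len(sample) / p, len(active)

estimate, true_cardinality = attack_naive_sampler()
# estimate is 0.0 although roughly (1 - p) * n keys remain active
```

The attack works because each query leaks one bit of the sketch's internal randomness; the DP-based defenses described below bound exactly this leakage.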
To guarantee correctness under all adaptive sequences, an algorithm is termed adaptively robust if it maintains, for all times $t$ and with high probability, $|\hat{F}(t) - F(t)| \le \epsilon \cdot \max_{t' \le t} F(t')$, where each update may depend on previous outputs.
Unlearning methods in this streaming model treat a sequence of deletion ("forgetting") requests as inducing a nonstationary process; each deletion request changes the effective empirical distribution, challenging both statistical estimation and model update procedures. The streaming-unlearning setting formalized in (Shen et al., 21 Jul 2025) requires models to closely approximate the ideal retrained model at each timestep without ever re-accessing the full original dataset.
3. Algorithmic Frameworks: Privacy and Robustness
Recent advances leverage differential privacy (DP) and continual observation mechanisms to construct adaptively robust sketches:
- Binary Tree Mechanism: Each update (an insertion or a deletion/reset) contributes a unit, which is aggregated into a prefix sum via a binary tree structure; each node in the tree adds Laplace noise calibrated to the global sensitivity. This mechanism guarantees $\varepsilon$-DP under unit-level change, providing privacy and shielding the sketch's internal randomness against adaptive attacks (Cohen et al., 29 Jan 2026).
- The sketch's output (e.g., for cardinality, sum, or more general moments) is derived from the tree's noisy aggregate, with accuracy guarantees that hold uniformly for all times $t$ ("prefix-max error") with high probability: $|\hat{F}(t) - F(t)| \le \epsilon \cdot \max_{t' \le t} F(t')$. Total space for both cardinality and sum estimation is polylogarithmic in the stream length and the inverse error.
- Streaming Unlearning as Distribution Shift (SAFE): In the context of machine unlearning, the SAFE algorithm formalizes unlearning as adapting to the distribution shift induced by removing points. Distributional ratios are tracked via incrementally updated Gaussian statistics in a random projection space and label marginal counts, maintained efficiently in the streaming setting (Shen et al., 21 Jul 2025).
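A minimal sketch of the binary tree (continual counting) mechanism, assuming unit updates $\pm 1$ and per-node Laplace noise of scale (number of levels)$/\varepsilon$; this is the generic textbook construction, not the paper's exact algorithm:

```python
import math
import random

def tree_noisy_prefix_sums(updates, eps=1.0, seed=0):
    """Release every prefix sum of a +/-1 update stream with DP noise.
    One changed update touches at most one node per level, so Laplace
    noise of scale levels/eps at each node yields eps-DP overall."""
    rng = random.Random(seed)
    T = len(updates)
    levels = max(1, math.ceil(math.log2(T)) + 1)
    scale = levels / eps

    def laplace():
        # inverse-CDF sampling of Laplace(0, scale), clipped away from 0 and 1
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        return scale * math.log(2 * u) if u < 0.5 else -scale * math.log(2 * (1 - u))

    # noisy sum for every dyadic block [j * 2^l, (j + 1) * 2^l)
    node = {}
    for l in range(levels):
        w = 1 << l
        for j in range(math.ceil(T / w)):
            node[(l, j)] = sum(updates[j * w:(j + 1) * w]) + laplace()

    # prefix sum at t = sum of O(log T) disjoint dyadic blocks covering [0, t)
    prefixes = []
    for t in range(1, T + 1):
        s, pos = 0.0, 0
        for l in reversed(range(levels)):
            w = 1 << l
            if pos + w <= t:
                s += node[(l, pos // w)]
                pos += w
        prefixes.append(s)
    return prefixes

stream = [1] * 50 + [-1] * 20   # e.g. 50 inserts, then 20 deletions/resets
noisy = tree_noisy_prefix_sums(stream, eps=2.0)
```

Each released prefix sum aggregates only $O(\log T)$ noisy nodes, which is the source of the polylogarithmic error and space overheads quoted above.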
The following table summarizes principal algorithmic components:
| Component | Role | Source |
|---|---|---|
| Binary Tree Mechanism | Prefix-sum aggregation with DP noise | (Cohen et al., 29 Jan 2026) |
| Streaming Adjustable-Rate | Cardinality/sublinear moment sketching | (Cohen et al., 29 Jan 2026) |
| SAFE | Efficient streaming unlearning with regret bounds | (Shen et al., 21 Jul 2025) |
4. Theoretical Guarantees and Metrics
Robust algorithms for the resettable streaming model are analyzed under adaptive adversaries and nonstationary data. The principal metrics include:
- Prefix-max error: For all times $t$ simultaneously, the estimation error is bounded by an $\epsilon$ fraction of the largest true statistic so far, $\max_{t' \le t} F(t')$.
- Space complexity: Polylogarithmic in the stream length and the inverse error $1/\epsilon$, for cardinality as well as sum estimation.
- Dynamic regret: For streaming unlearning, the cumulative discrepancy between the actual and ideal unlearned models, quantified as $\sum_{t=1}^{T} \lVert w_t - w_t^{*} \rVert$, is bounded in terms of $V_T$, the cumulative variation in the optimal solutions (i.e., $V_T = \sum_{t=2}^{T} \lVert w_t^{*} - w_{t-1}^{*} \rVert$) (Shen et al., 21 Jul 2025). The resulting rate matches the best-known nonstationary online optimization bounds even absent convexity.
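A toy tracking experiment (not SAFE itself; the quadratic loss and online gradient descent are assumptions made for illustration) makes the path-length dependence concrete:

```python
def dynamic_regret_demo(T=200, eta=0.5, drift=0.05):
    """Track a drifting optimum w*_t = drift * t under loss_t(w) = (w - w*_t)^2.
    Returns the cumulative discrepancy sum_t (w_t - w*_t)^2 and the path
    length V_T = sum_t |w*_t - w*_{t-1}| of the optimal solutions."""
    optima = [drift * t for t in range(1, T + 1)]
    w, discrepancy = 0.0, 0.0
    for w_star in optima:
        discrepancy += (w - w_star) ** 2   # measured before this round's update
        w -= eta * 2.0 * (w - w_star)      # online gradient descent step
    V_T = sum(abs(a - b) for a, b in zip(optima[1:], optima[:-1]))
    return discrepancy, V_T

disc, V_T = dynamic_regret_demo()
# with eta = 0.5 the learner lands on w*_t each round, so it always lags the
# drifting optimum by one step: discrepancy is approximately T * drift^2 = 0.5
```

The learner's residual error is governed entirely by how fast the optima move, mirroring the role of $V_T$ in dynamic-regret bounds.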
Empirical evaluation (e.g., on MNIST, CIFAR-10, TinyImageNet) confirms that adaptively robust algorithms match retrain-based gold standards in accuracy and deletion effectiveness, while affording speedups of $2\times$ and more (Shen et al., 21 Jul 2025).
5. Supported Statistics and Extensions
The resettable model supports computation of a wide class of statistics:
- Cardinality ($f(x) = \mathbf{1}[x > 0]$): Approximate the number of active keys with $(1 \pm \epsilon)$-relative accuracy, resisting adaptive attacks via DP-noised sketching and rate control.
- Sum ($f(x) = x$): Employs an entry-threshold scheme that uses exponential random variables and partitions the estimate into deterministic (revealed) and bounded-error (uncertain) parts aggregated with the tree mechanism.
- Bernstein/soft-sublinear statistics: Functions admitting a Bernstein-type representation $f(x) = \int_0^{\infty} (1 - e^{-sx})\, d\mu(s)$ can be handled via Laplace-transform decompositions into sum and "maxdistinct" estimators, both robustified using the framework above (Cohen et al., 29 Jan 2026).
- Machine unlearning: The SAFE algorithm tracks class-conditional statistics and marginal label counts under a stream of deletion requests, approximating the retrain solution without access to the original training data (Shen et al., 21 Jul 2025).
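As one concrete (assumed, illustrative) soft-capped statistic, the Bernstein function $\tilde f(x) = \tau(1 - e^{-x/\tau})$ is sandwiched between $(1 - 1/e)\min(x, \tau)$ and $\min(x, \tau)$, so it approximates the hard cap within a constant factor:

```python
import math

def soft_cap(x, tau):
    # smooth Bernstein-type surrogate for the hard cap min(x, tau)
    return tau * (1.0 - math.exp(-x / tau))

tau = 4.0
lower_const = 1.0 - 1.0 / math.e
# sandwich: (1 - 1/e) * min(x, tau) <= soft_cap(x, tau) <= min(x, tau)
sandwich_holds = all(
    lower_const * min(x, tau) <= soft_cap(x, tau) <= min(x, tau)
    for x in [0.0, 0.5, 1.0, 2.5, 3.0, 5.0, 10.0, 100.0]
)
```

The upper bound follows from $1 - e^{-u} \le u$, the lower one from concavity of $1 - e^{-u}$ on $[0, 1]$; this is why smooth "soft" surrogates can stand in for hard caps in the estimators above.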
6. Connections to Streaming Unlearning and Distributional Shift
The resettable streaming model provides a formal foundation for streaming approaches to machine unlearning. In these approaches, the original dataset $S_0$ defines the initial model $w_0$; successive data removal operations yield sets $S_0 \supseteq S_1 \supseteq S_2 \supseteq \cdots$, and the goal is to track an online solution $w_t$ closely approximating the true retrained model $w_t^{*}$ obtained by retraining on $S_t$. SAFE interprets these updates as a distributional shift problem and maintains sufficient statistics through efficient updates of Gaussian parameters (means, covariances) and label counts using only the information in current and deleted minibatches, with theoretical guarantees on regret and approximate unlearning (Shen et al., 21 Jul 2025).
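The sufficient-statistic bookkeeping can be sketched as follows (an illustrative simplification with hypothetical names, not SAFE's actual update rules): first and second moments of (projected) feature vectors are kept as running sums, and a deletion request subtracts exactly the contribution of the removed point:

```python
class StreamingMomentStats:
    """Running first/second moments of feature vectors that support exact
    downdates for deletion requests, without re-reading the original data."""

    def __init__(self, dim):
        self.n = 0
        self.s1 = [0.0] * dim                        # sum of z
        self.s2 = [[0.0] * dim for _ in range(dim)]  # sum of z z^T

    def _accumulate(self, z, sign):
        self.n += sign
        for i, zi in enumerate(z):
            self.s1[i] += sign * zi
            for j, zj in enumerate(z):
                self.s2[i][j] += sign * zi * zj

    def add(self, z):
        self._accumulate(z, +1)

    def delete(self, z):
        # unlearning request: reverse the point's contribution exactly
        self._accumulate(z, -1)

    def mean(self):
        return [s / self.n for s in self.s1]

stats = StreamingMomentStats(2)
stats.add([1.0, 2.0])
stats.add([3.0, -1.0])
stats.delete([1.0, 2.0])   # forget the first point
# stats.mean() is now exactly [3.0, -1.0]
```

Covariances follow from `n`, `s1`, and `s2`; SAFE-style methods additionally maintain such statistics per class label and in a random projection space (Shen et al., 21 Jul 2025).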
A plausible implication is that the resettable streaming formalism will underpin future work in data privacy, model management, and robust streaming computation in adversarial and dynamic environments. The uniform, prefix-max error guarantees enabled by adaptively robust resettable sketches position the model as the default abstraction when monitoring, deletion, and data right-to-be-forgotten operations must be performed at scale in streaming settings (Cohen et al., 29 Jan 2026, Shen et al., 21 Jul 2025).