
Two-Stage Update Scheme (TSUS)

Updated 27 January 2026
  • TSUS is a two-stage design that decomposes updates and estimations into sequential phases to balance accuracy, efficiency, and resource utilization.
  • It is rigorously analyzed using mathematical models in queueing theory, structured estimation, reinforcement learning, and erasure-coded storage systems.
  • By enabling targeted control of stage-specific parameters, TSUS mitigates system bottlenecks and optimizes trade-offs between timeliness, freshness, and computational resource demands.

The Two-Stage Update Scheme (TSUS) is a foundational architectural and algorithmic design pattern in modern information processing, estimation, reinforcement learning, and storage systems. It decomposes a complex update, estimation, or decision process into two sequential or functionally distinct stages, facilitating fine-grained trade-offs between accuracy, efficiency, freshness, stability, and resource utilization. This organization enables explicit control of stage-specific parameters and targeted mitigation of system bottlenecks, non-stationarities, and inefficiencies.

1. Canonical Architectures and Mathematical Models

TSUS is rigorously defined and mathematically analyzed in several contexts, including queueing networks with sequential multi-step processing, estimation in stochastic processes, hierarchical reinforcement learning, and erasure-coded storage systems.

  • Queueing Theory and Timeliness: In (Ramani et al., 2024), TSUS is formulated as a tandem M/M/1/1 system with updates generated “on-demand” and processed through two serial stages, each with exponential service times ($T_1 \sim \text{Exp}(\mu_1)$, $T_2 \sim \text{Exp}(\mu_2)$). The time-average Age of Information (AoI) is

$$\bar{\Delta}_{\mathrm{TSUS}} = \frac{1}{\mu_1} + \frac{1}{\mu_2}.$$

  • Structured Estimation: In (Liyanaarachchi et al., 26 Jan 2026), TSUS (here, a $p=2$ MAP estimator) governs state estimation for continuous-time Markov chains, with each stage corresponding to a time window: before a time threshold $\tau_i$, retain the last sample; after $\tau_i$, estimate the most likely state $i^*$. The threshold $\tau_i^*$ is the unique solution to $P_{i,i}(\tau) = P_{i,i^*}(\tau)$, where $P_{ij}(t)$ is the CTMC’s transition function.
  • Reinforcement Learning: In hierarchical RL (see (Wang et al., 2023)), TSUS orchestrates high-level (manager) policy updates in two stages, first restricting updates to successful low-level actions to avoid non-stationarity, before relaxing this constraint in later training epochs.
  • Cluster File Systems: In erasure-coded storage (Wei et al., 24 Apr 2025), TSUS divides data updates into (1) a synchronous log-append stage and (2) an asynchronous log recycling stage, converting random I/O into sequential operations and dramatically enhancing throughput and device longevity.

2. Analytical Properties and Performance Characterization

Queueing and AoI

TSUS’s AoI admits a closed-form expression,

$$\bar{\Delta}_{\mathrm{TSUS}} = \frac{1}{\mu_1} + \frac{1}{\mu_2},$$

with Stage 1 acting as an M/M/∞ source and Stage 2 as an M/M/1/1 queue under preemption. The additive decomposition reflects that the two stages’ mean delays combine linearly; speeding up a stage decreases its term but may increase wasted work due to preemption (Ramani et al., 2024).

Wasted Power and Preemption

In the tandem model, when a new Stage 1 completion preempts Stage 2, the partial effort of the preempted update is lost. The steady-state probability of a Stage 2 preemption is

$$\pi_b = \frac{\rho}{1+\rho}, \quad \text{with } \rho = \frac{\mu_1}{\mu_2}.$$
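This expression can be checked with a quick simulation (an illustration, not from the paper): because exponential service is memoryless, a preemptive restart leaves the distribution of the remaining Stage 2 service time unchanged, so the server simply alternates Exp($\mu_1$) idle periods and Exp($\mu_2$) busy periods, and the long-run busy fraction is $\rho/(1+\rho)$.

```python
import random

def busy_fraction(mu1, mu2, n_cycles=200_000, seed=0):
    """Estimate the long-run fraction of time Stage 2 is busy.

    Exponential service is memoryless, so a preemptive restart does not
    change the remaining time to the next Stage 2 completion; the server
    therefore alternates Exp(mu1) idle and Exp(mu2) busy periods.
    """
    rng = random.Random(seed)
    idle = sum(rng.expovariate(mu1) for _ in range(n_cycles))
    busy = sum(rng.expovariate(mu2) for _ in range(n_cycles))
    return busy / (idle + busy)

mu1, mu2 = 2.0, 3.0
rho = mu1 / mu2
est = busy_fraction(mu1, mu2)   # should be close to rho/(1+rho) = 0.4
```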

Optimization under Power Constraints

The TSUS power-budgeted AoI minimization problem is

$$\begin{aligned} \min_{\mu_1,\mu_2 \ge 0} \quad & \bar{\Delta}_{\mathrm{TSUS}}(\mu_1,\mu_2) \\ \text{s.t.} \quad & k_1(\mu_1)^\alpha + k_2(\mu_2)^\alpha \leq P_\mathrm{max}/E[C]^\alpha. \end{aligned}$$

Stage 1 is always active, so $k_1 = 1$; Stage 2 is busy with probability $\rho/(1+\rho)$. The problem reduces to a one-dimensional convex minimization in $\rho$, with the optimal allocation obtained from the KKT conditions (Ramani et al., 2024).
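The reduction to one dimension can be sketched numerically (a simplification under an assumed normalized budget $P = P_\mathrm{max}/E[C]^\alpha$, not the paper's derivation): substituting $\mu_2 = \mu_1/\rho$ and tightening the budget determines $\mu_1$ as a function of $\rho$ alone, after which the AoI $(1+\rho)/\mu_1$ can be minimized by a scalar search.

```python
import math

def aoi_at_rho(rho, P=1.0, alpha=2.0):
    """AoI when the power budget binds at service-rate ratio rho = mu1/mu2.

    k1 = 1 (Stage 1 always active), k2 = rho/(1+rho) (Stage 2 duty cycle);
    with mu2 = mu1/rho, the budget mu1**alpha + k2*mu2**alpha = P fixes mu1,
    and the AoI 1/mu1 + 1/mu2 equals (1+rho)/mu1.
    """
    k2 = rho / (1 + rho)
    mu1 = (P / (1 + k2 * rho ** (-alpha))) ** (1 / alpha)
    return (1 + rho) / mu1

def best_rho(lo=1e-3, hi=1e3, tol=1e-10):
    """Golden-section search in log(rho); relies on the paper's unimodality claim."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        c, d = b - invphi * (b - a), a + invphi * (b - a)
        if aoi_at_rho(math.exp(c)) < aoi_at_rho(math.exp(d)):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)
```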

Structured Estimation

In the TSUS estimator for information freshness (Liyanaarachchi et al., 26 Jan 2026), with system state $X(t)$ and the last observed sample taken at time $G(t)$,

$$\hat{X}(t) = \begin{cases} X(G(t)), & 0 \leq \delta(t) < \tau_i, \\ i^*, & \delta(t) \geq \tau_i, \end{cases}$$

where $\delta(t) = t - G(t)$ and $i^*$ is the state with the highest stationary probability. Optimization of the mean binary freshness (MBF) and of the AoI-minimized estimator proceeds via explicit renewal and spectral decompositions.
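For a concrete illustrative case (our own example, not the paper's general construction), consider a two-state CTMC with rates $\lambda$ ($0 \to 1$) and $\mu$ ($1 \to 0$), $\lambda > \mu$, so $i^* = 1$. The transition function is available in closed form, so the threshold equation $P_{0,0}(\tau) = P_{0,1}(\tau)$ can be solved explicitly, and the estimator itself is a one-line piecewise rule:

```python
import math

def two_state_threshold(lam, mu):
    """Threshold tau* for last-sample state i = 0 in a two-state CTMC
    with rates lam (0 -> 1) and mu (1 -> 0), assuming lam > mu so i* = 1.

    With s = lam + mu:
        P_00(t) = mu/s + (lam/s) * exp(-s*t)
        P_01(t) = (lam/s) * (1 - exp(-s*t))
    Setting P_00(tau) = P_01(tau) gives exp(-s*tau) = (lam - mu) / (2*lam).
    """
    s = lam + mu
    return -math.log((lam - mu) / (2 * lam)) / s

def tsus_estimate(last_sample, age, tau, i_star):
    """Two-stage rule: trust the last sample while age < tau, else output i*."""
    return last_sample if age < tau else i_star

lam, mu = 2.0, 1.0
tau = two_state_threshold(lam, mu)   # ln(4)/3, about 0.46
```

At $\tau^*$ both transition probabilities equal $1/2$ here, which is exactly the point where holding the stale sample stops being the MAP choice.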

3. Algorithmic Realizations and Practical Workflows

Deep RL: Hierarchical Policy Training

In (Wang et al., 2023), the TSUS algorithm manages manager-level Q-network updates:

  • Stage 1: Update the manager only on successful “push” low-level actions (positive reward) to avoid propagating transitions contaminated by highly random or ineffective push behavior.
  • Stage 2: Following a fixed epoch threshold ($\tau$), broaden updates to all non-random push transitions (i.e., “policy-driven” pushes).

Pseudocode for high-level TD update (simplified):

for each minibatch sample:
    if action == 'push':
        if epoch < τ:
            P = (reward > 0)                      # Stage 1: gate on successful pushes only
        else:
            P = (reward > 0) or not is_random     # Stage 2: also admit policy-driven pushes
    else:
        P = 1                                     # non-push transitions always contribute
    loss += P * Huber(delta)                      # P masks the Huber loss on the TD error delta

Empirical results demonstrate a +35 percentage point increase in success rate and 23% improvement in efficiency when using TSUS gating as opposed to naive updates (Wang et al., 2023).

Storage Systems: TSUS in Erasure-Coded Clusters

TSUS-based methods such as TSUE (Wei et al., 24 Apr 2025) are characterized by:

  • Stage 1: Synchronous data logging—each write is appended sequentially to a replicated log (latency on the order of microseconds), immediately acknowledging the client.
  • Stage 2: Asynchronous log recycling—merging updates and performing in-place erasure code updates in the background, exploiting spatio-temporal locality to reduce write amplification by a factor $R$ (the average number of merged writes per block).
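A toy model of the two stages' effect on I/O cost (illustrative only; TSUE's actual log format and recycling policy are more involved): Stage 1 appends every update to a log, and Stage 2 merges the log by block before issuing in-place erasure-code updates, so $R$ overwrites of one block cost a single read-modify-write.

```python
from collections import defaultdict

def naive_io(updates):
    """In-place scheme: one random read-modify-write (data + parity) per update."""
    return len(updates)

def two_stage_io(updates):
    """Stage 1 appends sequentially (cheap); Stage 2 merges the log so each
    distinct block needs only one in-place update during recycling."""
    merged = defaultdict(bytes)
    for block, payload in updates:
        merged[block] = payload      # later writes to a block supersede earlier ones
    return len(merged)

# 1000 updates hitting 100 hot blocks: merge factor R = 10
updates = [(i % 100, b"v%d" % i) for i in range(1000)]
```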

Experimental data: In Ali-Cloud workloads, TSUE achieves a 7.6× throughput improvement and up to 13× SSD lifetime extension by reducing random I/O and merging updates (Wei et al., 24 Apr 2025).

4. Applications in Numerical Schemes and Scientific Computing

Two-stage fourth-order (TSUS) time discretizations provide a compact and efficient alternative to multi-stage Runge–Kutta schemes for solving ODEs/PDEs, particularly in the presence of stiffness.

  • Explicit and Implicit TSUS: Both explicit (Zhang et al., 2022) and implicit (Huo, 1 Dec 2025) variants exist. The latter achieves A-stability and fourth-order temporal accuracy within two nonlinear (Newton-solved) stages:

$$\begin{aligned} u^{n+1/2} &= u^n + \text{weighted sum of } \mathcal{L}(u) \text{ and time derivatives at } n,\ n+1/2, \\ u^{n+1} &= u^n + \text{weighted sum of } \mathcal{L}(u) \text{ and time derivatives at } n,\ n+1/2,\ n+1. \end{aligned}$$

These methods achieve uniform fourth-order accuracy on stiff test problems, often with lower computational costs than classical four-stage Runge–Kutta integrators, especially for problems with a separation of time scales—a direct consequence of the staged temporal coupling (Huo, 1 Dec 2025, Zhang et al., 2022).

  • Stability: The explicit TSUS method is subject to a slightly more restrictive CFL condition than classic four-stage methods ($C_\text{TSUS} \approx 0.4$–$0.5$ vs. $0.6$ for RK4) (Zhang et al., 2022).
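To make the staged structure concrete, here is a minimal explicit two-stage step for an ODE $u' = f(u)$ using the standard explicit two-stage fourth-order coefficients with the Lax–Wendroff-type time derivative $f_t = f'(u)\,f(u)$; this is a generic sketch, not a transcription of either paper's scheme.

```python
import math

def two_stage_step(f, dfdu, u, dt):
    """One explicit two-stage fourth-order step for u' = f(u).

    Stage 1 advances to the half step using f and its time derivative;
    Stage 2 reuses the half-step time derivative to reach fourth order.
    """
    fn = f(u)
    ftn = dfdu(u) * fn                                   # f_t = f'(u) f(u) at t_n
    u_half = u + 0.5 * dt * fn + (dt ** 2 / 8.0) * ftn
    ft_half = dfdu(u_half) * f(u_half)                   # f_t at t_{n+1/2}
    return u + dt * fn + (dt ** 2 / 6.0) * (ftn + 2.0 * ft_half)

def solve(n_steps, T=1.0, u0=1.0):
    """Integrate u' = u on [0, T]; the exact answer is exp(T)."""
    dt, u = T / n_steps, u0
    for _ in range(n_steps):
        u = two_stage_step(lambda x: x, lambda x: 1.0, u, dt)
    return u

err20 = abs(solve(20) - math.e)
err40 = abs(solve(40) - math.e)
# halving dt should cut the error by roughly 2**4 = 16
```

For the linear test problem this update reproduces the fourth-order Taylor polynomial of $e^{\Delta t}$ per step, matching RK4's order with only two stages.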

5. Design Trade-Offs, Optimization, and Comparative Analysis

Key Trade-Offs

  • Throughput vs. Timeliness: In multi-stage update queues, increasing the speed of one stage may introduce wasted effort via preemption or buffer overflow; optimal AoI is achieved by balancing service rates under power or resource constraints (Ramani et al., 2024).
  • Freshness vs. Robustness: In estimation, TSUS enables “risk-averse” estimation by switching from stale to stationary prior as age increases, dominating the martingale estimator unless the initial state is always the most probable (Liyanaarachchi et al., 26 Jan 2026).
  • Exploration vs. Learning Stability: In hierarchical RL, stage gating by reward/policy confidence enables effective learning even with unstable low-level behaviors, directly addressing non-stationarity (Wang et al., 2023).
  • Latency vs. Endurance: In storage systems, TSUS transforms bursty random updates into merges that maximize device endurance and minimize user-perceived delay (Wei et al., 24 Apr 2025).

Comparative Analysis

| Domain | TSUS Structure | Principal Metric | Chief Benefit |
|---|---|---|---|
| Queueing/Updates | Sequential stages | Age of Information (AoI) | Minimal AoI, power-aware |
| Estimation/Freshness | Time-based staging | Mean Binary Freshness (MBF) | Strict dominance over baseline |
| Reinforcement Learning | Stage-based loss gating | Success rate, efficiency | Stability, non-stationarity mitigation |
| Storage Systems | Sync/async staging | Throughput, device lifetime | Burst absorption, endurance |

TSUS is generally superior to single-stage or naive parallel designs in timeliness (for equal power/throughput) and in stability/robustness (in learning), provided system parameters are chosen appropriately. In storage, sequential logging followed by asynchronous reconciliation delivers substantial throughput and durability gains, especially in SSD-based architectures (Wei et al., 24 Apr 2025).

6. Limitations and Extensions

  • TSUS’s gains depend critically on system-level parameters such as spatio-temporal locality, load, and the ratio of sequential to random I/O latency (Wei et al., 24 Apr 2025). In estimation, the advantage over martingale estimators vanishes when the “last-sample” state is highest-probability, or as sampling rate grows arbitrarily large (Liyanaarachchi et al., 26 Jan 2026).
  • In RL, the efficacy of stage thresholding (e.g., epoch τ\tau in (Wang et al., 2023)) may require empirical tuning; excessive caution may delay convergence, while insufficient filtering reinstates non-stationarity.
  • Extensions include multi-stage schemes ($p > 2$ in estimation), dynamic adjustment of staging rules, or adaptation to non-SSD or hardware-specific constraints.

In summary, TSUS functions as a general mechanism for the hierarchical decomposition of updates, inference, or control—enabling tractable analysis, optimization, and robust system performance across a wide spectrum of applications in information systems, storage architectures, statistical signal processing, and learning algorithms (Ramani et al., 2024, Liyanaarachchi et al., 26 Jan 2026, Wang et al., 2023, Wei et al., 24 Apr 2025, Huo, 1 Dec 2025, Zhang et al., 2022).
