Interleaved Randomized Benchmarking

Updated 29 January 2026

Interleaved Randomized Benchmarking (IRB) is a quantum protocol that estimates the error rate of targeted operations by interleaving them within randomized gate sequences from a unitary 2-design.
It fits exponential decays under a depolarizing noise model to extract decay constants that are insensitive to state preparation and measurement (SPAM) errors, so the resulting fidelity estimates do not depend on SPAM quality.
The protocol extends to dynamic circuits, non-Clifford gates, and leakage error characterization, making it a scalable benchmark for modern quantum devices.

Interleaved Randomized Benchmarking (IRB) is a quantum characterization protocol designed to estimate the average error rate of a specific quantum gate or operation by alternating (or “interleaving”) its application within sequences of randomly sampled gates from a unitary 2-design (typically the Clifford group). IRB provides an operationally meaningful, SPAM-robust, and scalable method to isolate and quantify the fidelity of individual operations, including Clifford and non-Clifford gates, mid-circuit measurements, resets, and general dynamic circuit primitives. The method and its rigorous guarantees underlie much of the modern literature on quantum benchmarking and have been extended to address time-dependent, gate-dependent, leakage, and coherent error mechanisms.

1. Principles and Protocol Structure

Interleaved Randomized Benchmarking builds on the theory of randomized benchmarking (RB), which leverages random gate sequences from a unitary 2-design to twirl general errors into (approximately) depolarizing channels. Standard RB provides an average gate error rate for a reference gate set (e.g., the Clifford group) by fitting the decay of the survival probability in sequences of increasing length:

$F_{\rm seq}(m) = A p^m + B$

Here $p$ is the depolarizing decay parameter, and $A,B$ absorb state preparation and measurement (SPAM) errors.
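As a minimal sketch of how the decay model above can be fitted, the following uses only the Python standard library. The function name `fit_rb_decay` and the grid-plus-golden-section search are illustrative choices, not the fitting routine of any cited work: for a fixed $p$ the model is linear in $A$ and $B$, so those two are solved in closed form and only $p$ is searched.

```python
import math

def fit_rb_decay(lengths, survival, grid=2000, refine_iters=60):
    """Least-squares fit of F(m) = A * p**m + B; returns (A, p, B)."""

    def profile(p):
        # For fixed p, solve the ordinary linear regression of survival on p**m.
        x = [p ** m for m in lengths]
        n = len(x)
        sx, sy = sum(x), sum(survival)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, survival))
        det = n * sxx - sx * sx
        if det <= 1e-300:  # degenerate design: all p**m numerically equal
            return float("inf"), 0.0, sy / n
        A = (n * sxy - sx * sy) / det
        B = (sy - A * sx) / n
        sse = sum((A * v + B - y) ** 2 for v, y in zip(x, survival))
        return sse, A, B

    # Coarse scan over p in (0, 1) to locate the best grid cell.
    step = 1.0 / (grid + 1)
    p0 = min((step * k for k in range(1, grid + 1)), key=lambda p: profile(p)[0])

    # Golden-section refinement inside the neighbouring cells.
    lo, hi = max(p0 - step, 1e-12), min(p0 + step, 1.0 - 1e-12)
    g = (math.sqrt(5) - 1) / 2
    for _ in range(refine_iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if profile(c)[0] < profile(d)[0]:
            hi = d
        else:
            lo = c
    p = 0.5 * (lo + hi)
    _, A, B = profile(p)
    return A, p, B

# Synthetic, noise-free data with assumed values A = 0.45, p = 0.97, B = 0.5.
lengths = [1, 2, 4, 8, 16, 32, 64, 100]
data = [0.45 * 0.97 ** m + 0.5 for m in lengths]
A_fit, p_fit, B_fit = fit_rb_decay(lengths, data)
```

On real data the survival probabilities carry shot noise and sequence-to-sequence scatter, so a weighted fit with uncertainty estimates would normally replace this unweighted version.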

Interleaved RB modifies this protocol by inserting a specific gate or block $\mathcal{C}$ (or, more generally, a dynamic circuit primitive $\mathcal{F}$) after each random gate. The procedure is as follows (Magesan et al., 2012, Shirizly et al., 2024):

For each sequence length $m$, generate multiple random sequences of $m$ gates $G_1, \ldots, G_m$ from the reference group (e.g., Clifford).
Build two sets of sequences:
- Reference: $G_1, G_2, \ldots, G_m$, followed by the unique group inverse.
- Interleaved: $G_1, \mathcal{C}, G_2, \mathcal{C}, \ldots, G_m, \mathcal{C}$, followed by the inverse of the full ideal sequence.
Implement each sequence on the device, measure the survival probability, and average over many randomizations.
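The sequence construction above can be sketched for a single qubit as follows. This is an illustration under stated assumptions, not the cited authors' code: the 24-element single-qubit Clifford group is generated from $H$ and $S$ as plain 2x2 matrices, the target is taken to be $S$, and all helper names (`phase_key`, `clifford_group`, `build_irb_pair`) are hypothetical.

```python
import random

SQ = 2 ** -0.5
H = ((SQ + 0j, SQ + 0j), (SQ + 0j, -SQ + 0j))
S = ((1 + 0j, 0j), (0j, 1j))
IDENTITY = ((1 + 0j, 0j), (0j, 1 + 0j))

def matmul(a, b):
    """2x2 complex matrix product a @ b."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def dagger(a):
    """Conjugate transpose, i.e. the inverse of a unitary."""
    return tuple(tuple(a[j][i].conjugate() for j in range(2)) for i in range(2))

def phase_key(u):
    """Hashable label identifying u up to a global phase."""
    flat = [z for row in u for z in row]
    ref = next(z for z in flat if abs(z) > 1e-9)
    unit = ref / abs(ref)
    return tuple((round((z / unit).real, 6), round((z / unit).imag, 6)) for z in flat)

def clifford_group():
    """Breadth-first closure of {H, S} modulo global phase."""
    group = {phase_key(IDENTITY): IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        new = []
        for u in frontier:
            for g in (H, S):
                v = matmul(g, u)
                if phase_key(v) not in group:
                    group[phase_key(v)] = v
                    new.append(v)
        frontier = new
    return list(group.values())

def compose(seq):
    """Total unitary of a sequence; later gates act on the left."""
    total = IDENTITY
    for g in seq:
        total = matmul(g, total)
    return total

def build_irb_pair(m, target, group, rng):
    """Return (reference, interleaved) gate lists, each ending with its inverse."""
    gates = [rng.choice(group) for _ in range(m)]
    reference = list(gates)
    interleaved = [x for g in gates for x in (g, target)]
    reference.append(dagger(compose(reference)))
    interleaved.append(dagger(compose(interleaved)))
    return reference, interleaved

group = clifford_group()
ref_seq, int_seq = build_irb_pair(5, S, group, random.Random(7))
```

Because the target here is itself a Clifford, the final inverse of the interleaved sequence is again a group element; on hardware it would be compiled to a single gate rather than applied as a raw matrix.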

The interleaved protocol yields two decay curves:

$F_{\mathrm{seq}}^{\mathrm{ref}}(m) = A p^m + B \qquad F_{\mathrm{seq}}^{\mathrm{int}}(m) = A' \tilde{p}^m + B'$

where $p$ and $\tilde{p}$ are the reference and interleaved decay rates, respectively.

The estimated average error rate of $\mathcal{C}$ (for a $d$-dimensional Hilbert space) is then (Magesan et al., 2012, Wallman et al., 2014):

$r_\mathcal{C}^{\mathrm{est}} = \frac{d-1}{d}\left(1 - \frac{\tilde{p}}{p}\right)$
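A short worked example of this estimate, with hypothetical decay parameters $p = 0.99$ and $\tilde{p} = 0.98$ chosen only for illustration; the function name is likewise an assumption.

```python
def interleaved_error_rate(p_ref, p_int, n_qubits=1):
    """r_C^est = (d - 1)/d * (1 - p_int/p_ref) with d = 2**n_qubits."""
    d = 2 ** n_qubits
    return (d - 1) / d * (1 - p_int / p_ref)

# Same decay parameters interpreted for a one-qubit and a two-qubit gate.
r_one_qubit = interleaved_error_rate(0.99, 0.98, n_qubits=1)   # about 0.00505
r_two_qubit = interleaved_error_rate(0.99, 0.98, n_qubits=2)   # about 0.00758
```

Only the ratio $\tilde{p}/p$ enters, so the estimate is most sensitive when the reference decay is itself close to 1.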

Confidence intervals and error bounds are rigorously derived, enabling quantitative fidelity statements even in the presence of gate-dependent and time-dependent Markovian noise (Wallman et al., 2014, Helsen et al., 2020).

2. Mathematical Foundations and Fidelity Extraction

The mathematical basis for IRB is the unitary twirl, which ensures that under the action of a 2-design (such as the Clifford group), arbitrary noise channels are depolarized. This allows the fitted exponential decay rates to be related directly to average gate fidelities. IRB relies on the following key relationships (Magesan et al., 2012, Helsen et al., 2018, Helsen et al., 2020):

Depolarizing Model: Averaged noise over a 2-design reduces to a single-parameter depolarizing channel:

$\mathcal{E}_{\mathrm{dep}}(\rho) = p \rho + (1-p)\frac{I}{d}$

Gate Fidelity: The average fidelity of $\mathcal{C}$ is obtained as

$F_{\mathrm{avg}}(\mathcal{C}) = \frac{(d-1)\tilde{p}/p + 1}{d}$

Robustness to SPAM: SPAM errors enter only through the fit coefficients $A$ and $B$ (and $A'$, $B'$) and do not affect the decay constants $p$, $\tilde{p}$.
Rigorous Bounds: The true gate infidelity lies within an interval $[r_\mathcal{C}^{\mathrm{est}}-E,\, r_\mathcal{C}^{\mathrm{est}}+E]$, where $E$ is a computable bound depending on the twirling group and sequence parameters (Wallman et al., 2014).
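The SPAM-robustness claim can be checked numerically in the idealized case where every gate acts as the same single-qubit depolarizing channel. The sketch below is a toy model, not a device simulation: the noisy initial state and measurement effect are made-up values, and the helper names are hypothetical. It recovers $p$ from three consecutive survival probabilities via $(F(3)-F(2))/(F(2)-F(1)) = p$, which follows from $F(m) = A p^m + B$.

```python
def depolarize(rho, p):
    """Apply rho -> p*rho + (1 - p)*I/2 to a trace-one real 2x2 matrix."""
    return [[p * rho[i][j] + ((1 - p) / 2 if i == j else 0.0) for j in range(2)]
            for i in range(2)]

def survival(rho0, effect, p, m):
    """Tr[effect * E^m(rho0)] for m applications of the depolarizing channel."""
    rho = rho0
    for _ in range(m):
        rho = depolarize(rho, p)
    return sum(effect[i][j] * rho[j][i] for i in range(2) for j in range(2))

def decay_ratio(rho0, effect, p):
    """Recover the decay constant from F(1), F(2), F(3)."""
    f1, f2, f3 = (survival(rho0, effect, p, m) for m in (1, 2, 3))
    return (f3 - f2) / (f2 - f1)

p_true = 0.98
# Perfect preparation of |0> and perfect projective readout.
ideal = decay_ratio([[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]], p_true)
# Imperfect preparation and readout (illustrative numbers only).
noisy = decay_ratio([[0.97, 0.05], [0.05, 0.03]], [[0.96, 0.0], [0.0, 0.05]], p_true)
```

Both `ideal` and `noisy` equal `p_true` up to floating-point rounding; the SPAM imperfections change only $A = \mathrm{Tr}[E\rho_0] - \mathrm{Tr}[E]/d$ and $B = \mathrm{Tr}[E]/d$.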

When multiple nontrivial irreducible representations or non-depolarizing noise are present, the protocol generalizes to multi-exponential fits or filtered postprocessing, but the gate fidelity is still extractable using representation-theoretic tools (Helsen et al., 2020, Helsen et al., 2018).

3. Extensions: Non-Clifford Gates, Dynamic Circuits, Leakage, and Measurement

IRB has been systematically extended along several axes:

Non-Clifford and Arbitrary Gates

Variants including 2-for-1 RB, CNOT-dihedral RB, and hybrid Monte-Carlo RB enable benchmarking of non-Clifford gates (e.g., T, CS gates) using arbitrary reference groups, as long as sufficient randomness and invertibility are maintained (Harper et al., 2016, Garion et al., 2020, Helsen et al., 2018, Chasseur et al., 2016). These methods preserve SPAM independence and enable estimation of average fidelity even when the target gate does not belong to the reference gate set.

Dynamic-Circuit Elements

Recent protocols interleave entire dynamic-circuit operation blocks, such as measurement, reset, and feedforward primitives, within RB sequences. By constructing identity blocks (e.g., $H$-CNOT, conditional Pauli operations with measurement and feedforward), one quantifies the error budgets for mid-circuit measurement, feedforward, and coherence effects (Shirizly et al., 2024). Error sources, including readout assignment, measurement-induced phase, and decoherence during idle windows, can be isolated and mitigated via dynamical decoupling techniques.

Leakage-Aware IRB

For systems with leakage (e.g., transmons with $|2\rangle$ states), IRB is modified by adding phase randomization layers to destroy residual coherences and by fitting multi-exponential models to capture population flow into and out of the computational subspace (Chasseur et al., 2015). The protocol robustly estimates both average gate error and leakage/seepage rates under mild assumptions.

Measurement-Based and Bias-Optimized IRB

Variants have also been demonstrated for measurement-based quantum computation (using cluster state measurements to generate unitary 2-designs) (Strydom et al., 2022), and for scenarios with biased noise (e.g., dephasing-favored architectures). In the latter, character-weighted survival probabilities and Z-group twirls replace full Pauli twirling to accommodate hardware constraints (Claes et al., 2022).

4. Statistical Confidence and Practical Implementation

IRB protocols exhibit exponential decay in survival probability, with finite-sampling variance that grows only polynomially in the sequence length and error rate. Explicit sample complexity formulas guarantee sub-percent estimation precision with moderate numbers of sequences and shots (Wallman et al., 2014). The median-of-means estimator, gate-set shadow tomography, and filtered RB post-processing further improve statistical robustness and minimize resource requirements (Silva et al., 21 Oct 2025).
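For readers unfamiliar with the median-of-means estimator mentioned above, here is a generic sketch of the textbook construction. It is not taken from the cited work, and the sample values (per-sequence survival probabilities with one outlier) are invented for illustration.

```python
import statistics

def median_of_means(samples, n_blocks):
    """Split samples into equal blocks, average each block, return the median.

    Any leftover samples that do not fill a complete block are ignored.
    """
    block = len(samples) // n_blocks
    means = [statistics.mean(samples[i * block:(i + 1) * block])
             for i in range(n_blocks)]
    return statistics.median(means)

# Twenty per-sequence survival estimates at one length, one of them corrupted.
per_sequence = [0.90, 0.91, 0.89, 0.90] * 5
per_sequence[0] = 0.0  # a single outlier, e.g. from a failed run

plain_mean = statistics.mean(per_sequence)          # pulled down by the outlier
robust = median_of_means(per_sequence, n_blocks=5)  # stays near 0.90
```

The outlier contaminates only one of the five block means, so the median is unaffected, whereas the plain mean shifts from about 0.90 to about 0.855.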

Modern implementations routinely use the following practical parameters (Shirizly et al., 2024, Magesan et al., 2012):

Sequence length: up to $m = 30$–$100$ gates.
Number of sequences per $m$: $20$–$50$.
Number of shots per sequence: $>300$.
Fitting: least-squares or subspace signal processing (e.g., MUSIC, ESPRIT).
Device: multi-qubit superconducting architectures up to $127$ qubits.
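To give a feel for what these parameters imply, the sketch below counts circuits and shots for one reference-plus-interleaved experiment and estimates the shot-noise standard error at a single sequence length. The choice of 10 distinct lengths, the survival probability of 0.9, and the function name are assumptions for illustration, and the error estimate deliberately ignores sequence-to-sequence variance.

```python
import math

def irb_budget(n_lengths, seqs_per_length, shots, survival=0.9):
    """Return (circuits, total_shots, shot-noise std. error at one length)."""
    circuits = 2 * n_lengths * seqs_per_length  # reference + interleaved
    total_shots = circuits * shots
    # Binomial standard error after pooling all shots at one sequence length.
    std_err = math.sqrt(survival * (1 - survival) / (seqs_per_length * shots))
    return circuits, total_shots, std_err

circuits, total_shots, std_err = irb_budget(n_lengths=10, seqs_per_length=30, shots=300)
# 600 circuits, 180000 shots, shot-noise standard error of roughly 0.003
```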

Coherent errors and gate-dependent cross-talk can induce systematic bias in the IRB estimator if not properly mitigated; recent work demonstrates the superiority of single-qubit Pauli-twirl protocols (“cycle benchmarking”) in suppressing such bias compared to multi-qubit Clifford twirls (Sannamoth et al., 31 Dec 2025).

5. Applications and Notable Results

Interleaved RB is the standard for high-fidelity benchmarking of critical quantum operations, including:

Single- and two-qubit gate errors in superconducting, ion-trap, and silicon devices (Magesan et al., 2012, Garion et al., 2020).
Mid-circuit measurement, reset, and feedforward primitives in dynamic circuits, isolating assignment error, $T_1/T_2$ decoherence, and cross-talk (Shirizly et al., 2024).
Non-Clifford gate benchmarking (T, controlled-S (CS), Toffoli) for fault-tolerant gate sets (Harper et al., 2016, Garion et al., 2020).
Measurement-based gate fidelities on large-scale cluster-state hardware (Strydom et al., 2022).
Leakage and bias characterization in advanced device candidates (Chasseur et al., 2015, Claes et al., 2022).
Benchmarking protocols adapted for NISQ hardware and circuits with stabilizer verification or native-gate synthesis (Derbyshire et al., 2021).

An illustrative experimental result: On the IBM 127-qubit “Eagle” processor, interleaved RB for dynamic-circuit primitives revealed an error $\epsilon_F \approx 10^{-2}$, dominated by measurement-assignment error and idling, which could be mitigated to the expected incoherent sum via feedforward-aware dynamical decoupling (Shirizly et al., 2024). Non-Clifford two-qubit gates (CS) were benchmarked to error rates below standard CNOTs and approaching $T_2$-limited performance (Garion et al., 2020). Cycle benchmarking yielded order-of-magnitude lower systematic uncertainty under coherent error drift compared to Clifford-twirl IRB (Sannamoth et al., 31 Dec 2025).

6. Limitations, Advanced Techniques, and Future Directions

Standard IRB protocols assume gate-independent, Markovian noise and may exhibit systematic bias in the presence of strong coherent or gate-dependent errors. Refined bounds and protocol variants, including hybrid (Monte Carlo) IRB, subspace signal-processing, cycle benchmarking, and character-filtered RB, address these limitations by:

Explicitly quantifying systematic uncertainty due to incomplete twirling (Sannamoth et al., 31 Dec 2025).
Filtering multi-exponential decay data to extract target gate contributions (Helsen et al., 2020, Helsen et al., 2018).
Utilizing gate-set shadow tomography and median-of-means estimators for optimal finite-sample confidence and reduced group sizes (Silva et al., 21 Oct 2025).
Implementing minimal or hardware-tailored gate sets for efficient on-device benchmarking (Silva et al., 21 Oct 2025, Claes et al., 2022, Shirizly et al., 2024).

Application of interleaved RB is expanding to multi-qubit correlated noise, dynamic circuit benchmarking, quantum error-correcting code performance, and adapted tomography-free fidelity estimation (Silva et al., 21 Oct 2025, Shirizly et al., 2024). Systematic study and minimization of error bias, leakage corrections, and crosstalk remain active topics of research.

7. Summary Table: Core Elements of Interleaved RB

Element	Purpose / Action	Key Formula or Concept
Reference RB Sequence	Baseline decay, depolarization parameter $p$	$F_{\rm seq}(m) = A p^m + B$
Interleaved RB Sequence	Alternate random gates and target $\mathcal{C}$, extract $\tilde{p}$	$F_{\rm seq}^{\rm int}(m) = A' \tilde{p}^m + B'$
Error Rate Estimate	Quantifies additional error due to $\mathcal{C}$	$r_\mathcal{C}^{\mathrm{est}} = \frac{d-1}{d}\left(1 - \frac{\tilde{p}}{p}\right)$
SPAM Robustness	Fit coefficients absorb SPAM errors without biasing the decay constants	$A, B$ (reference), $A', B'$ (interleaved)
Confidence Interval / Bound	Rigorous error bar on estimate	See explicit $E$ in (Wallman et al., 2014)
Extensions (Non-Clifford, Leakage, etc)	Embedded via sequence, fitting, or representation theory	Multi-exponential or filtered single-exponential fits

Interleaved Randomized Benchmarking thus provides a robust, scalable, and theoretically principled methodology for isolating and quantifying the fidelity of individual quantum operations (Clifford, non-Clifford, measurement, reset, or dynamic circuit) in experimental platforms, with rigorous confidence even in nonideal devices (Shirizly et al., 2024, Magesan et al., 2012, Helsen et al., 2018, Helsen et al., 2020, Sannamoth et al., 31 Dec 2025).