Even More Efficient Soft-Output Decoding with Extra-Cluster Growth and Early Stopping

Published 3 Feb 2026 in quant-ph | (2602.03336v1)

Abstract: In fault-tolerant quantum computing, soft outputs from real-time decoders play a crucial role in improving decoding accuracy, post-selecting magic states, and accelerating lattice surgery. A paper by Meister et al. [arXiv:2405.07433 (2024)] proposed an efficient method to evaluate soft outputs for cluster-based decoders, including the Union-Find (UF) decoder. However, in parallel computing environments, its computational complexity is comparable to or even surpasses that of the UF decoder itself, resulting in a substantial overhead. Furthermore, this method requires global information about the decoding graph, making it poorly suited for existing hardware implementations of the UF decoder on Field-Programmable Gate Arrays (FPGAs). In this paper, to alleviate these issues, we develop more efficient methods for evaluating high-quality soft outputs in cluster-based decoders by introducing several early-stopping techniques. Our central idea is that the precise value of a large soft output is often unnecessary in practice. Based on this insight, we introduce two types of novel soft-outputs: the bounded cluster gap and the extra-cluster gap. The former reduces the computational complexity of Meister's method by terminating the calculation at an early stage. Our numerical simulations show that this method achieves improved scaling with code distance $d$ compared to the original proposal. The latter, the extra-cluster gap, quantifies decoder reliability by performing a small, additional growth of the clusters obtained by the decoder. This approach offers the significant advantage of enabling soft-output computation without modifying the existing architecture of FPGA-implemented UF decoders. These techniques offer lower computational complexity and higher hardware compatibility, laying a crucial foundation for future real-time decoders with soft outputs.


Authors (6)

Summary

The paper introduces early stopping and extra-cluster growth techniques to drastically reduce the computational overhead of soft-output decoding for surface codes.
It uses the bounded cluster gap and extra-cluster gap metrics to shrink the Dijkstra search, reducing the number of explored nodes from O(d^3 log d) to nearly quadratic scaling in the code distance d.
Numerical simulations indicate that the methods are suited to real-time, hardware-friendly decoding on FPGA-based, multi-logical-qubit fault-tolerant architectures.

Efficient Soft-Output Decoding with Extra-Cluster Growth and Early Stopping

Introduction and Context

Soft-output metrics have become a central component of quantum error correction (QEC) workflows. They support enhanced post-selection, magic state distillation, dynamic decoder switching, and real-time adaptation in fault-tolerant quantum computing (FTQC). Cluster-based decoders, especially Union-Find (UF) decoders, are state-of-the-art for surface codes because of their parallelism and hardware efficiency, and they are often implemented on FPGAs. Prior work, specifically the cluster gap method of "Efficient soft-output decoders for the surface code" (Meister et al., 2024), made soft-output evaluation efficient but left a significant computational bottleneck in parallel settings: the cost of soft-output estimation can rival or exceed the decoding cost itself, and the method's need for global graph information hampers hardware implementations.

This paper introduces two complementary techniques, early stopping and extra-cluster growth, which yield the bounded cluster gap and the extra-cluster gap metrics. Both reduce computational overhead and make soft-output computation compatible with real-time, parallel, and hardware-accelerated decoding architectures.

Methods: Early Stopping and Extra-Cluster Growth

The analysis begins by reframing the performance bottlenecks in existing soft-output calculations for cluster-based decoders. Notably, practical post-selection and decoder-switching schemes rarely need the precise value of a large soft output; it is typically sufficient to determine whether the confidence metric crosses a fixed threshold.

Bounded Cluster Gap

Early stopping restricts Dijkstra's search in the contracted cluster graph: the search terminates once every remaining path exceeds a predetermined threshold $\epsilon_\mathrm{max}$. The result is the bounded cluster gap. When $\epsilon_\mathrm{max}$ is independent of the code distance, the search region shrinks drastically, especially in the regime of low physical error probability $p$. Complexity analysis shows that the number of explored nodes falls from $O(d^3\log d)$ to $O(d^2\log d)$ on average for low $p$, approaching quadratic scaling.
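The early-stopping idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name bounded_shortest_path, the adjacency-dict graph format, and the toy graph are all assumptions made for illustration. Zero-weight edges stand in for edges inside a contracted cluster.

```python
import heapq

def bounded_shortest_path(graph, source, target, eps_max):
    """Dijkstra from source to target that stops once distances exceed eps_max.

    graph: dict mapping node -> list of (neighbor, nonnegative weight).
    Returns (distance, visited_count); distance is None when the shortest
    path is longer than eps_max (the gap is only known to be above threshold).
    """
    dist = {source: 0.0}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if d > eps_max:
            # Every remaining candidate is at least this far away: stop early.
            return None, len(done)
        done.add(u)
        if u == target:
            return d, len(done)
        for v, w in graph.get(u, ()):
            nd = d + w
            # Never enqueue nodes that are already beyond the bound.
            if nd <= eps_max and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, len(done)

# Hypothetical toy graph: boundaries "L" and "R", one cluster {a, b}.
toy_graph = {
    "L": [("a", 1.0)],
    "a": [("L", 1.0), ("b", 0.0)],
    "b": [("a", 0.0), ("R", 2.0)],
    "R": [("b", 2.0)],
}
within_bound = bounded_shortest_path(toy_graph, "L", "R", eps_max=5.0)
beyond_bound = bounded_shortest_path(toy_graph, "L", "R", eps_max=2.0)
```

In this toy case the first call finds the path of length 3.0, while the second call gives up without ever visiting "R" and reports only that the gap exceeds the bound.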

Figure 1: Node exploration counts for the standard (dotted) and bounded (solid) cluster gap calculations, showing a significant reduction, especially at low $p$.

Numerical simulations confirm the performance improvement: for $p=0.05\%$, node visits are reduced by nearly two orders of magnitude compared to the original method, and the fitted exponent of the scaling with code distance $d$ drops from $\sim2.9$ to $\sim1.0$ in this low-$p$ regime.

Extra-Cluster Gap

While the bounded cluster gap reduces complexity, it still requires executing Dijkstra’s algorithm as a post-processing step. To achieve full hardware compatibility and minimal overhead, the authors propose the extra-cluster gap: after decoding, clusters are grown further by a small, bounded radius, reusing the cluster growth logic of the decoder itself. If a connection between the relevant code boundaries is detected within this limited extra growth, the soft output is computed; otherwise, the soft output is deemed to be above threshold.

Figure 2: Schematic showing standard cluster-based decoding, complementary gap, cluster gap, bounded search, and extra-cluster growth.

This approach requires no graph rewiring, shortest-path computation, or non-local operations, which allows immediate integration into FPGA-based architectures. Two variants are introduced:

Extra-cluster gap without cluster graph (w/o CG): Terminates with the minimal extra growth that connects boundaries.
Extra-cluster gap with cluster graph (w/ CG): If a connection is detected under bounded extra growth, the exact shortest path is computed within the cluster graph, preserving accuracy for marginal cases.
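A toy Python sketch of bounded extra growth, in the spirit of the variant without the cluster graph, is given below. It rests on several assumptions that do not come from the paper: unit-weight edges, synchronous half-edge growth of every cluster, boundaries treated as growing regions, and the hypothetical function name extra_growth_rounds. It is meant only to show how a union-find structure can report the minimal number of extra growth rounds, or give up once a bound is reached.

```python
def extra_growth_rounds(edges, clusters, left, right, max_rounds):
    """Rounds of half-edge growth needed to connect `left` and `right`.

    edges: list of (u, v) tuples; clusters: list of node sets from the decoder.
    Returns None when more than `max_rounds` rounds would be needed.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Assumption: both boundaries grow like clusters.
    occupied = {left, right}
    for cluster in clusters:
        nodes = list(cluster)
        occupied.update(nodes)
        for node in nodes[1:]:
            union(nodes[0], node)

    growth = {edge: 0 for edge in edges}  # half-edges grown so far (0..2)
    rounds = 0
    while find(left) != find(right):
        if rounds == max_rounds:
            return None  # bound reached: soft output deemed above threshold
        completed = []
        for u, v in edges:
            if growth[(u, v)] < 2:
                # Each occupied endpoint contributes one half-edge per round.
                growth[(u, v)] += (u in occupied) + (v in occupied)
                if growth[(u, v)] >= 2:
                    completed.append((u, v))
        for u, v in completed:
            union(u, v)
            occupied.update((u, v))
        rounds += 1
    return rounds

# Hypothetical chain L - a - b - c - d - R with one decoder cluster {b, c}.
chain = [("L", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "R")]
with_cluster = extra_growth_rounds(chain, [{"b", "c"}], "L", "R", max_rounds=3)
no_cluster = extra_growth_rounds(chain, [], "L", "R", max_rounds=3)
```

With the cluster present, two extra rounds suffice to bridge the boundaries; without it, the bound of three rounds is hit first and the function returns None.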

Rigorous theoretical analysis demonstrates:

For any instance where the original cluster gap is below threshold, the extra-cluster gap always identifies it ($g_\text{ec} \leq g_\text{c}$).
The w/ CG variant exactly matches the cluster gap in this case ($g_\text{eccg} = g_\text{c}$).
Above threshold, the computation aborts early, saving resources.

Numerical Results

Comprehensive numerical simulations using circuit-level noise models and large-scale UF decoding benchmarks validate the efficiency of the proposed schemes.

Node Exploration Efficiency

At $p=0.10\%$, the number of nodes visited by the bounded cluster gap calculation scales as $O(d^{2.31})$, versus $O(d^{2.88})$ for the original method.
For very low error rates, the search is local and essentially independent of $d$.
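Scaling exponents such as those quoted above are typically obtained from a straight-line fit on a log-log plot. The Python sketch below shows one way to do this with ordinary least squares. The function name fit_power_law_exponent is hypothetical, and the input is synthetic data generated from an exact power law purely to exercise the fit; it is not data from the paper.

```python
import math

def fit_power_law_exponent(distances, counts):
    """Least-squares slope of log(count) versus log(d), i.e. alpha in count ~ d**alpha."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# SYNTHETIC input: an exact power law with exponent 2.31, not measured data.
distances = [5, 9, 13, 17, 21, 25]
synthetic_counts = [3.0 * d ** 2.31 for d in distances]
alpha = fit_power_law_exponent(distances, synthetic_counts)
```

On noiseless synthetic input the fit recovers the generating exponent up to floating-point error; real simulation data would also call for error bars on the fitted slope.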

Extra-Cluster Method Performance

The fraction of samples whose soft output falls below threshold, and which therefore require an explicit soft-output calculation, decays exponentially with $d$, confirming the utility of the method for post-selection and dynamic decoder switching.
For example, at $d=25$ and $p=0.10\%$, the switching probability for dynamic decoder switching is $4 \times 10^{-10}$, so the high-accuracy decoder is invoked only rarely and the switching rate stays below the backlog threshold.
Figure 3: Maximum extra-cluster growth required is always bounded and significantly less than that for full UF decoding, minimizing latency for FPGA implementations.
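The role of a thresholded soft output in dynamic decoder switching can be sketched as follows. Everything here is hypothetical: the function decode_with_switching, the stub decoders, and the convention that a soft output of None means "early-stopped, known only to be above threshold" are illustrative assumptions, not an interface from the paper or from any decoder library.

```python
def decode_with_switching(syndrome, fast_decoder, strong_decoder, threshold):
    """Run the fast decoder and fall back to the strong one on low confidence.

    fast_decoder(syndrome) -> (correction, soft_output), where soft_output is
    None if the bounded computation stopped early (gap above threshold).
    strong_decoder(syndrome) -> correction.
    """
    correction, soft_output = fast_decoder(syndrome)
    if soft_output is None or soft_output >= threshold:
        # High confidence: the exact soft-output value is not needed.
        return correction, "fast"
    # Low confidence: hand the same syndrome to the high-accuracy decoder.
    return strong_decoder(syndrome), "strong"

# Stub decoders standing in for real ones (assumptions for illustration).
def fast_stub(syndrome):
    return "fast-correction", syndrome.get("gap")

def strong_stub(syndrome):
    return "strong-correction"

confident = decode_with_switching({"gap": None}, fast_stub, strong_stub, 2.0)
uncertain = decode_with_switching({"gap": 0.5}, fast_stub, strong_stub, 2.0)
```

The design point is that the fast path never needs the exact value of a large soft output, which is what makes early stopping and bounded growth sufficient for this use case.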

Applications and Scaling to Architecture with Multiple Logical Boundaries

A critical advantage of extra-cluster methods is that they scale to architectures with many logical qubits (as in spatial or temporal partitioning for lattice surgery or qLDPC codes). For $M$ logical boundaries, the complementary gap requires $O(M^2)$ computations and the cluster gap and bounded cluster gap require $O(M)$ computations, whereas the extra-cluster gap needs only a single bounded-growth operation, plus $O(1)$ additional cluster graph computations for the rare ambiguous cases.

Figure 4: Multiple logical boundary regions (blue rectangle) illustrate the overhead of conventional approaches versus the single-growth extra-cluster gap for soft-output calculation.

Practical and Theoretical Implications

These techniques fundamentally decouple the scaling of soft-output estimation from decoding cost, enabling:

Real-time soft-output decoding compatible with the throughput of hardware-accelerated (FPGA) UF decoders.
Scalable deployment for large code distances and QEC code architectures harboring many logical qubits or entangling regions.
Reliable operation within dynamic switching frameworks without inducing backlogs.

Theoretically, these results clarify the minimal sufficient computation required for high-confidence estimation: using only cluster-local operations suffices to identify all samples warranting further processing, and the accurate variant (w/ CG) matches the cluster gap when accuracy is critical.

Conclusion

The bounded cluster gap and the extra-cluster gap family of metrics sharply reduce the computational overhead of soft-output calculations in cluster-based decoders. The hardware-friendly extra-cluster gap, which leaves the existing decoder architecture unchanged, supports real-time soft-output decoding in large-scale FTQC scenarios and removes a key performance bottleneck. It offers practical benefits for decoder switching, post-selection protocols, and magic state distillation, and it may extend to general qLDPC decoders. Future research directions include hardware implementation and extension of the technique to non-CSS codes and correlated noise models.

References

"Efficient soft-output decoders for the surface code" (Meister et al., 2024)
"Decoder Switching: Breaking the Speed-Accuracy Tradeoff in Real-Time Quantum Error Correction" (Toshio et al., 29 Oct 2025)
"FPGA-Based Distributed Union-Find Decoder for Surface Codes" [liyanage2024heliosv2]
"Fault-Tolerant Postselection for Low-Overhead Magic State Preparation" [bombin2024faulttolerantmsp]
"Mitigating errors in logical qubits" [smith2024mitigatingerrorsinlq]