- The paper introduces early stopping and extra-cluster growth techniques to drastically reduce the computational overhead of soft-output decoding for surface codes.
- It leverages bounded cluster gap and extra-cluster gap metrics to scale down Dijkstra’s search complexity, improving node exploration efficiency from O(d^3 log d) to nearly quadratic scaling.
- Numerical simulations confirm that the methods enable real-time, hardware-friendly decoding ideal for FPGA-based and multi-logical qubit fault-tolerant architectures.
Efficient Soft-Output Decoding with Extra-Cluster Growth and Early Stopping
Introduction and Context
Soft-output metrics have become a central component of quantum error correction (QEC) workflows. They are instrumental for enhanced post-selection, magic state distillation, dynamic decoder switching, and real-time adaptation in fault-tolerant quantum computing (FTQC). Cluster-based decoders, especially Union-Find (UF) decoders, are state-of-the-art for surface codes because of their parallelism and hardware efficiency, and they are often implemented on FPGAs. Prior work, the cluster gap method of "Efficient soft-output decoders for the surface code" (Meister et al., 2024), made soft-output evaluation efficient but introduced a significant computational bottleneck, especially in parallel settings: the overhead of soft-output estimation could rival or exceed the decoding cost itself, and its need for global graph information hampered hardware implementations.
This paper introduces two complementary techniques, early stopping and extra-cluster growth, which yield the bounded cluster gap and the extra-cluster gap metrics. Both substantially reduce computational overhead and make soft-output computation compatible with real-time, parallel, and hardware-accelerated decoding architectures.
Methods: Early Stopping and Extra-Cluster Growth
The analysis begins by reframing the performance bottlenecks in existing soft-output calculations for cluster-based decoders. The key observation is that practical post-selection and decoder-switching schemes rarely need the exact value of a large soft output; it is typically sufficient to determine whether the confidence metric crosses a fixed threshold.
Bounded Cluster Gap
Early stopping restricts Dijkstra's search in the contracted cluster graph: the search terminates once every remaining path exceeds a predetermined threshold ϵ_max. This yields the bounded cluster gap. The search region shrinks drastically when ϵ_max is independent of the code distance d, especially in the regime of low physical error probability p. Complexity analysis shows that the number of explored nodes drops from O(d^3 log d) to O(d^2 log d) on average for low p, approaching quadratic scaling.
Figure 1: Node exploration count for standard (dotted) and bounded (solid) cluster gap calculations indicates significant reduction, especially at low p.
Numerical simulations confirm the performance improvement: for p=0.05%, node visits are reduced by nearly two orders of magnitude compared to the original method, and the fitted scaling exponent in the code distance d drops from ∼2.9 to ∼1.0 in the low-p regime.
While the bounded cluster gap reduces complexity, it still requires executing Dijkstra's algorithm as a post-processing step. To achieve full hardware compatibility and minimal overhead, the authors propose the extra-cluster gap: after decoding, clusters are grown further by a small, bounded radius, reusing the cluster growth logic of the decoder itself. If a connection between the relevant code boundaries is detected within this limited extra growth, the soft output is computed; otherwise, the result is deemed above threshold.
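The early-stopping search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `bounded_gap`, the adjacency-list layout, and the toy graph are assumptions made for illustration, and the graph stands in for an already contracted cluster graph with non-negative edge weights. The search returns as soon as the smallest tentative distance exceeds `eps_max`, and it never enqueues paths longer than `eps_max`.

```python
import heapq


def bounded_gap(graph, sources, targets, eps_max):
    """Dijkstra's search with early stopping (illustrative sketch).

    graph   : dict mapping node -> list of (neighbor, weight), weights >= 0
    sources : nodes of the starting boundary
    targets : nodes of the opposite boundary
    eps_max : threshold above which the exact value is not needed

    Returns (gap, visited_count); gap is None when no source-to-target
    path of length <= eps_max exists.
    """
    targets = set(targets)
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > eps_max:
            # Every remaining path is longer than the threshold: stop early.
            return None, len(visited)
        if u in visited:
            continue
        visited.add(u)
        if u in targets:
            return d, len(visited)
        for v, w in graph.get(u, ()):
            nd = d + w
            # Paths that already exceed the threshold are never enqueued.
            if nd <= eps_max and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, len(visited)


# Hypothetical toy graph: boundary L, two intermediate nodes, boundary R.
graph = {
    "L": [("a", 1.0)],
    "a": [("L", 1.0), ("b", 1.0)],
    "b": [("a", 1.0), ("R", 1.0)],
    "R": [("b", 1.0)],
}
gap_found = bounded_gap(graph, ["L"], ["R"], eps_max=3.0)
gap_cut = bounded_gap(graph, ["L"], ["R"], eps_max=2.0)
```

With `eps_max=3.0` the search reaches the opposite boundary and reports the exact gap; with `eps_max=2.0` it gives up after visiting fewer nodes and only reports that the gap is above threshold, which is all a threshold-based post-selection rule needs.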
Figure 2: Schematic showing standard cluster-based decoding, complementary gap, cluster gap, bounded search, and extra-cluster growth.
This approach does not require graph rewiring, shortest-path computation, or non-local operations, which allows immediate integration into FPGA-based architectures. Two variants are introduced:
- Extra-cluster gap without cluster graph (w/o CG): Terminates with the minimal extra growth that connects boundaries.
- Extra-cluster gap with cluster graph (w/ CG): If a connection is detected under bounded extra growth, the exact shortest path is computed within the cluster graph, preserving accuracy for marginal cases.
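The idea behind the variant without a cluster graph can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' algorithm: edges have unit weight, every region grows by half an edge per round (in the style of Union-Find cluster growth), each logical boundary is represented by a single virtual node, and the function name `extra_cluster_rounds` and its arguments are hypothetical. The function reports the first extra-growth round at which the two boundaries join, or `None` if they stay apart within `max_rounds`.

```python
def extra_cluster_rounds(edges, grown_edges, boundary_a, boundary_b, max_rounds):
    """Bounded extra growth after decoding (illustrative sketch).

    edges       : list of (u, v) undirected unit-weight edges
    grown_edges : edges already fully grown by the decoder (the clusters);
                  each must appear in `edges` with the same orientation
    Returns the first round at which the boundaries are connected,
    0 if they already are, or None if not within max_rounds.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    growth = {e: 0 for e in edges}      # 0, 1 or 2 half-edges grown
    active = {boundary_a, boundary_b}   # nodes that belong to a growing region
    for u, v in grown_edges:
        growth[(u, v)] = 2
        union(u, v)
        active.update((u, v))

    if find(boundary_a) == find(boundary_b):
        return 0
    for round_no in range(1, max_rounds + 1):
        snapshot = set(active)  # growth in a round uses the state at its start
        for (u, v), g in growth.items():
            if g >= 2:
                continue
            g = min(2, g + (u in snapshot) + (v in snapshot))
            growth[(u, v)] = g
            if g == 2:  # edge fully grown: merge the regions at its ends
                union(u, v)
                active.update((u, v))
        if find(boundary_a) == find(boundary_b):
            return round_no
    return None  # treated as "above threshold"


# Hypothetical chain: boundary A, two bulk nodes, boundary B.
chain = [("A", "m1"), ("m1", "m2"), ("m2", "B")]
no_cluster = extra_cluster_rounds(chain, [], "A", "B", max_rounds=3)
too_short = extra_cluster_rounds(chain, [], "A", "B", max_rounds=2)
with_cluster = extra_cluster_rounds(chain, [("m1", "m2")], "A", "B", max_rounds=3)
```

The sketch uses only the merge and growth steps a cluster-based decoder already performs, which is the property that makes the approach attractive for hardware. A variant with a cluster graph would, on detecting a connection, follow up with an exact shortest-path computation restricted to the merged region.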
Rigorous theoretical analysis demonstrates:
- For any instance where the original cluster gap is below threshold, the extra-cluster gap always identifies it (g_ec ≤ g_c).
- The w/ CG variant exactly matches the cluster gap in this case (g_ec^cg = g_c).
- Above threshold, the computation aborts early, saving resources.
Numerical Results
Comprehensive numerical simulations using circuit-level noise models and large-scale UF decoding benchmarks validate the efficiency of the proposed schemes.
Node Exploration Efficiency
- At p=0.10%, the number of nodes visited in the bounded cluster gap calculation scales as O(d^2.31), versus O(d^2.88) for the original method.
- For very low error rates, the search is local and essentially independent of d.
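Scaling exponents such as those quoted above are typically obtained from a straight-line fit on a log-log plot of node count against code distance. The following sketch shows one way to do this with ordinary least squares in plain Python. The function name `fit_power_law_exponent` is an assumption, and the data below are synthetic, generated from an exact power law with an illustrative prefactor; they are not measurements from the paper.

```python
import math


def fit_power_law_exponent(distances, node_counts):
    """Least-squares slope of log(node_count) versus log(distance)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(n) for n in node_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data only: an exact power law with exponent 2.31 and a made-up
# prefactor, used to check that the fit recovers the exponent.
distances = [5, 9, 13, 17, 21]
synthetic_counts = [5.0 * d ** 2.31 for d in distances]
exponent = fit_power_law_exponent(distances, synthetic_counts)
```

On real simulation data the points scatter around the line, so the fitted exponent should be reported together with the range of distances and the error rate it was fitted at.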
Applications and Scaling to Architectures with Multiple Logical Boundaries
A critical advantage of extra-cluster methods is that they scale to architectures with many logical qubits (as in spatial or temporal partitioning for lattice surgery or qLDPC codes). For M logical boundaries, the complementary gap requires O(M^2) computations, and the cluster gap and bounded cluster gap require O(M). The extra-cluster gap needs only a single bounded-growth operation, with at most O(1) additional cluster graph computations for the rare ambiguous cases.
Figure 4: Multiple logical boundary regions (blue rectangle) illustrate the overhead of conventional approaches versus the single-growth extra-cluster gap for soft-output calculation.
Practical and Theoretical Implications
These techniques fundamentally decouple the scaling of soft-output estimation from decoding cost, enabling:
- Real-time soft-output decoding compatible with the throughput of hardware-accelerated (FPGA) UF decoders.
- Scalable deployment for large code distances and QEC code architectures harboring many logical qubits or entangling regions.
- Reliable operation within dynamic switching frameworks without inducing backlogs.
Theoretically, these results clarify the minimal sufficient computation required for high-confidence estimation: using only cluster-local operations suffices to identify all samples warranting further processing, and the accurate variant (w/ CG) matches the cluster gap when accuracy is critical.
Conclusion
The bounded cluster gap and extra-cluster gap family of metrics deliver sharp reductions in computational overhead for soft-output calculations in cluster-based decoders. The architecture-blind, hardware-friendly extra-cluster gap enables real-time, reliable soft-output decoding in large-scale FTQC scenarios, removing a key performance bottleneck. Its adoption offers immediate practical benefits for decoder switching, post-selection protocols, and magic state distillation, and the approach extends to general qLDPC decoders. Future research directions include hardware implementation and extension of the technique to non-CSS codes and correlated noise models.
References
- "Efficient soft-output decoders for the surface code" (Meister et al., 2024)
- "Decoder Switching: Breaking the Speed-Accuracy Tradeoff in Real-Time Quantum Error Correction" (Toshio et al., 2025)
- "FPGA-Based Distributed Union-Find Decoder for Surface Codes" [liyanage2024heliosv2]
- "Fault-Tolerant Postselection for Low-Overhead Magic State Preparation" [bombin2024faulttolerantmsp]
- "Mitigating errors in logical qubits" [smith2024mitigatingerrorsinlq]