
Neural Minimum Weight Perfect Matching

Updated 5 January 2026
  • NMWPM is a hybrid quantum error correction decoder that leverages neural networks to predict syndrome-dependent edge weights for the classical MWPM algorithm.
  • It integrates a Graph Neural Network and Transformer encoder to capture both local and global features of the syndrome graph, producing dynamic weights for matching.
  • NMWPM achieves improved logical error rates on toric and rotated surface codes under independent and depolarizing noise models while maintaining a near distance-independent parameter count.

Neural Minimum Weight Perfect Matching (NMWPM) is a data-driven hybrid decoding framework designed to improve quantum error correction (QEC) performance by integrating learned syndrome-dependent edge weights with the classical Minimum Weight Perfect Matching (MWPM) algorithm. The method combines a Graph Neural Network (GNN) and a Transformer encoder to predict the probability that each edge in a syndrome graph should be included in the correction, translating these probabilities into edge weights for MWPM. NMWPM maintains the provable correctness of MWPM while leveraging machine learning to enhance logical error rate (LER) performance across quantum code families, particularly toric and rotated surface codes under independent and depolarizing noise models (Peled et al., 1 Jan 2026).

1. Architectural Design and Inference Pipeline

NMWPM retains the MWPM algorithm for final decoding but introduces the Quantum Weight Predictor (QWP), a neural component tasked with producing syndrome-dependent edge weights. The architecture consists of:

  • GNN Backbone (TransformerConv): Encodes local spatial and topological syndrome features through multiple stacked graph attention layers on the defect syndrome graph.
  • Transformer Encoder: Captures long-range and global dependencies between candidate error chains (edges) across the entire lattice.
  • Nonlinear Weight Mapping: The QWP outputs edge-inclusion probabilities $p_{ij}$ for each directed edge; the final MWPM edge weights are constructed as $w_{ij} = -\ln p'_{ij}$, with $p'_{ij} = \max(p_{ij}, p_{ji})$ for the undirected edge.

During inference, the following sequence is performed:

  1. Construct the defect graph $G=(V,E)$ from the raw syndrome $S$, including feature-rich representations for nodes and edges.
  2. Run the QWP on the syndrome to output $p_{ij}$ for every edge.
  3. Aggregate probabilities and convert them to MWPM weights via the negative log transform.
  4. Execute the classical MWPM algorithm on $G$ using these dynamic weights to produce the correction.
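The four inference steps above can be sketched on a toy syndrome graph. Everything here is illustrative: `nmwpm_decode` and the stand-in probability table are hypothetical (the paper uses the trained QWP), and `networkx`'s Blossom-based matcher stands in for a production MWPM solver.

```python
# Sketch of the NMWPM inference pipeline, assuming stand-in QWP outputs.
import math
import networkx as nx

def nmwpm_decode(defects, qwp_probs):
    """Match defect nodes by running classical MWPM on -ln(p) weights.

    defects   : list of defect (syndrome) node ids
    qwp_probs : dict mapping directed edge (i, j) -> predicted inclusion
                probability p_ij in (0, 1]
    """
    G = nx.Graph()
    G.add_nodes_from(defects)
    seen = set()
    for (i, j), p_ij in qwp_probs.items():
        if (j, i) in seen:
            continue
        seen.add((i, j))
        # Aggregate the two directed predictions into one undirected edge.
        p = max(p_ij, qwp_probs.get((j, i), p_ij))
        # Negative-log transform: high probability -> low MWPM weight.
        G.add_edge(i, j, weight=-math.log(p))
    # Classical MWPM (Blossom algorithm) on the dynamically weighted graph.
    return nx.min_weight_matching(G)

# Four defects; the stand-in QWP is confident about pairing (0,1) and (2,3).
probs = {(0, 1): 0.95, (1, 0): 0.90, (2, 3): 0.92, (3, 2): 0.88,
         (0, 2): 0.10, (2, 0): 0.12, (1, 3): 0.08, (3, 1): 0.09,
         (0, 3): 0.05, (3, 0): 0.05, (1, 2): 0.06, (2, 1): 0.06}
matching = nmwpm_decode([0, 1, 2, 3], probs)
```

Because the weights enter only through the matcher, any exact MWPM implementation can be substituted without changing the learned component.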

2. Feature Engineering and Representation

The method distinguishes between local feature encoding (via the GNN) and global reasoning (via the Transformer):

  • Node Features: For each stabilizer $i$, the input consists of two-dimensional coordinates $(x_i, y_i)$, a stabilizer type indicator (X vs. Z), the distance to the lattice center $\rho_i$, a sinusoidal positional encoding $PE_i$, and a learned embedding $r_i$. Inactive stabilizers retain only their positional encoding. All features are projected through small MLPs into a hidden dimension $d_{\text{hidden}}$.
  • Edge Features: For an ordered pair $(i \rightarrow j)$, features include the Manhattan distance $d_{ij}$, displacements $(\Delta x, \Delta y)$, and an error-type flag $\tau_{\text{edge}}$. The scalar $d_{ij}$ is embedded and concatenated with the result of an MLP applied to the remaining features.
  • Local Aggregation: An $L_{\text{layers}}$-layer GNN (TransformerConv) with multi-head attention, residual updates, and feedforward MLPs using GELU activation produces refined node embeddings $h_i^{(L)}$.
  • Global Edge Reasoning: Vectors $u_{ij} = [h_i^{(L)} \Vert h_j^{(L)} \Vert e'_{ij}]$ are formed for each edge and the sequence is fed to an $L_{\text{enc}}$-layer Transformer encoder, enabling the model to assess global edge competition before producing output probabilities via a final sigmoid projection.
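The paper specifies a sinusoidal positional encoding $PE_i$ for each stabilizer but not its exact form; the sketch below assumes the standard Transformer-style construction applied per coordinate, which may differ from the authors' choice.

```python
# Minimal sketch of a sinusoidal positional encoding for stabilizer
# coordinates (x_i, y_i), assuming the standard Transformer-style form.
import numpy as np

def sinusoidal_pe(position, d_model=16):
    """Map a scalar lattice position to a d_model-dimensional encoding."""
    pe = np.zeros(d_model)
    # Geometrically spaced frequencies, as in the original Transformer.
    div = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe[0::2] = np.sin(position * div)
    pe[1::2] = np.cos(position * div)
    return pe

def node_pe(x, y, d_model=16):
    # Encode each coordinate separately and concatenate.
    return np.concatenate([sinusoidal_pe(x, d_model),
                           sinusoidal_pe(y, d_model)])

pe = node_pe(3, 5)  # 32-dimensional encoding for the stabilizer at (3, 5)
```

Such an encoding lets inactive stabilizers still contribute a position signal, consistent with the node-feature description above.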

3. Integration with Minimum Weight Perfect Matching

Classically, MWPM seeks a perfect matching $M^*$ of $G=(V,E)$ with nonnegative weights $\{w_{uv}\}$ that minimizes the total weight:

$$M^* = \arg\min_{M \in \mathcal{M}} \sum_{(u,v) \in M} w_{uv}$$

In NMWPM, the weights are determined dynamically:

$$p_{uv} = \operatorname{QWP}\text{ output}, \quad w_{uv} = -\ln p_{uv}$$

Higher output probability yields a lower weight, biasing MWPM toward selecting the corresponding edge. This scheme allows MWPM to remain the underlying decoding routine, while adjusting its input weights flexibly based on the observed syndrome.
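A small numeric illustration of the negative-log rule: as the QWP probability rises, the MWPM weight falls monotonically, so confident edges are preferred by the matcher.

```python
# Negative-log weight rule: higher probability -> lower MWPM weight.
import math

weights = {p: -math.log(p) for p in (0.99, 0.9, 0.5, 0.1)}
# A near-certain edge (p = 0.99) costs ~0.01; an unlikely one (p = 0.1)
# costs ~2.3, making it far less attractive to the matcher.
```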

4. Differentiable Proxy Loss and Training Strategy

Because the MWPM (Blossom) algorithm is non-differentiable, the training objective is recast as a proxy binary edge-classification task. Let $p_i$ denote the predicted edge-inclusion probability for the $i$th directed edge, with $y_i \in \{0,1\}$ as the ground-truth indicator. Given $d_e = 2|E|$ directed edges, the loss is:

  • Binary Cross-Entropy: $\mathrm{BCE}(p, y) = -\sum_{i=1}^{d_e} [\, y_i \log p_i + (1-y_i)\log(1-p_i) \,]$
  • Entropy Regularization: $H(p) = -\sum_i [\, p_i \log p_i + (1-p_i)\log(1-p_i) \,]$
  • Full Loss: $\mathcal{L} = \mathrm{BCE}(p, y) + \lambda H(p)$, with $\lambda = 0.01$

This composite objective encourages peaked confidence in edge predictions, yielding probability distributions that are strongly bimodal. Such peaky distributions, in turn, foster clear separations in MWPM weights, resulting in improved matching accuracy.
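The composite objective can be written out directly; this plain-Python sketch evaluates the loss as defined above (the paper trains with the same formula, though in a deep-learning framework rather than scalar Python).

```python
# Proxy loss: BCE plus entropy regularization over directed-edge predictions.
import math

def proxy_loss(p, y, lam=0.01):
    """L = BCE(p, y) + lam * H(p), with p the predicted edge-inclusion
    probabilities and y the 0/1 ground-truth labels."""
    bce = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for pi, yi in zip(p, y))
    ent = -sum(pi * math.log(pi) + (1 - pi) * math.log(1 - pi) for pi in p)
    return bce + lam * ent

# Confident, correct predictions give a small loss; probabilities lingering
# near 0.5 are penalized by both the BCE and the entropy term.
loss_peaked = proxy_loss([0.99, 0.01], [1, 0])
loss_flat = proxy_loss([0.55, 0.45], [1, 0])
```

Minimizing the entropy term alongside the BCE is what pushes the output distribution toward the bimodal shape described above.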

5. Training Data Generation and Optimization Procedure

Ground-truth labels for supervised training are created via the following process:

  • Error Simulation: Simulate random physical errors under independent or depolarizing noise models.
  • Clustering and Extraction: Group qubits by shared stabilizers and extract the syndrome endpoints.
  • Local Matching: Apply local MWPM per cluster based on lattice distance to propose a matching $M_{\text{local}}$.
  • Correction Validation: If $M_{\text{local}}$ leads to a logical error, permutation or brute-force search is used to yield a valid matching.
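The brute-force search over pairings can be sketched for a single small cluster; `brute_force_matching` is illustrative only (its cost is exponential in the defect count, which is why the paper applies it per cluster), and the paper's clustering and logical-error validation steps are not reproduced here.

```python
# Brute-force minimum-distance pairing of defects in one cluster,
# using Manhattan distance on lattice coordinates as the cost.
from itertools import permutations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def brute_force_matching(defects):
    """Return (index pairs, cost) of the perfect matching that minimizes
    total Manhattan distance. Exponential; viable only per small cluster."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(defects))):
        # Canonicalize to avoid rescoring equivalent pairings.
        pairs = sorted(tuple(sorted(perm[i:i + 2]))
                       for i in range(0, len(perm), 2))
        cost = sum(manhattan(defects[i], defects[j]) for i, j in pairs)
        if cost < best_cost:
            best, best_cost = pairs, cost
    return best, best_cost

defects = [(0, 0), (0, 1), (4, 4), (5, 4)]
pairs, cost = brute_force_matching(defects)
```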

Training data is sampled across code sizes and physical error rates $p \in (0, 0.2)$, generating approximately $10^5$–$10^6$ examples per epoch. Optimization employs Adam with batch size 32 and an initial learning rate of $9 \times 10^{-5}$, decayed to $10^{-5}$ via cosine annealing. Each epoch consists of 500 mini-batches, with total training running for 200–1000 epochs depending on code size. Experiments were run on an NVIDIA L40 GPU with 48 GB of memory. Major hyperparameter settings are:

  • $d_{\text{hidden}} = 128$
  • $L_{\text{layers}} = 4$ (GNN), $K = 4$ attention heads
  • $L_{\text{enc}} = 2$ (Transformer)
  • $\lambda = 0.01$ (entropy regularization)

6. Benchmark Performance and Parameter Efficiency

Quantitative comparison was conducted against classical MWPM (distance-based), BPOSD-2, and QECCT (state-of-the-art Transformer decoder):

| Code & noise model | Threshold $p_{\text{th}}$ (NMWPM) | Thresholds (competing methods) |
| --- | --- | --- |
| Toric, independent | $\simeq 10.95\%$ | MWPM: $10.3\%$, BPOSD-2: $10.8\%$, QECCT: $10.7\%$, ML bound: $11.0\%$ |
| Toric, depolarizing | $\simeq 17.9\%$ | MWPM/BPOSD-2: $16.0\%$, QECCT: $17.8\%$, ML bound: $18.9\%$ |

For toric codes of size $L = 6, 8, 10$ under depolarizing noise, at $p > 0.12$ NMWPM reduces the logical error rate by $17$–$50\%$ relative to traditional MWPM and exceeds the performance of QECCT at $L = 10$. On rotated surface codes (depolarizing noise), NMWPM outperforms MWPM and matches or slightly surpasses QECCT at larger code distances.

NMWPM exhibits a nearly distance-independent parameter count of approximately $3.9$ million, compared to QECCT's $6.71$ million at $L = 10$.

7. Component Insights, Scalability, and Future Prospects

Although no full ablation table is provided, analysis indicates that entropy regularization drives edge probabilities to a highly bimodal distribution, directly aiding MWPM by separating candidate edges for selection. Complexity analysis shows that the GNN scales as $\mathcal{O}(N d_{\text{hidden}}^2 + E d_{\text{hidden}})$ and the Transformer as $\mathcal{O}(E^2 d_{\text{hidden}} + E d_{\text{hidden}}^2)$, which may constrain practical scalability to large code distances without further innovations such as sparsification or approximate attention (e.g., BigBird).

The ground-truth generation process involves heuristic assignment and brute-force correction steps; refinement or replacement with exact methods could alleviate potential label noise. Current experiments focus on toric and rotated surface codes under independent and depolarizing channels; extension to correlated noise or QLDPC codes is a prospective direction. In hardware-constrained regimes, efficiency optimizations (e.g., network pruning or quantization) and rapid on-the-fly adaptation are promising avenues for practical deployment.

NMWPM exemplifies the integration of algorithmic and data-driven approaches in QEC decoding—combining the reliability of MWPM with syndrome-adaptive, learned edge weighting to achieve enhanced logical error rates and noise thresholds across multiple code families and error models (Peled et al., 1 Jan 2026).
