Neural Minimum Weight Perfect Matching
- NMWPM is a hybrid quantum error correction decoder that leverages neural networks to predict syndrome-dependent edge weights for the classical MWPM algorithm.
- It integrates a Graph Neural Network and Transformer encoder to capture both local and global features of the syndrome graph, producing dynamic weights for matching.
- NMWPM achieves improved logical error rates on toric and rotated surface codes under different noise models while maintaining near distance-independent parameter efficiency.
Neural Minimum Weight Perfect Matching (NMWPM) is a data-driven hybrid decoding framework designed to improve quantum error correction (QEC) performance by integrating learned syndrome-dependent edge weights with the classical Minimum Weight Perfect Matching (MWPM) algorithm. The method combines a Graph Neural Network (GNN) and a Transformer encoder to predict the probability that each edge in a syndrome graph should be included in the correction, translating these probabilities into edge weights for MWPM. NMWPM maintains the provable correctness of MWPM while leveraging machine learning to enhance logical error rate (LER) performance across quantum code families, particularly toric and rotated surface codes under independent and depolarizing noise models (Peled et al., 1 Jan 2026).
1. Architectural Design and Inference Pipeline
NMWPM retains the MWPM algorithm for final decoding but introduces the Quantum Weight Predictor (QWP), a neural component tasked with producing syndrome-dependent edge weights. The architecture consists of:
- GNN Backbone (TransformerConv): Encodes local spatial and topological syndrome features through multiple stacked graph attention layers on the defect syndrome graph.
- Transformer Encoder: Captures long-range and global dependencies between candidate error chains (edges) across the entire lattice.
- Nonlinear Weight Mapping: The QWP outputs an edge-inclusion probability for each directed edge; the final MWPM edge weights are constructed via a negative log transform, $w_e = -\log p_e$, with the probability $p_e$ for the undirected edge obtained by aggregating the two directed-edge predictions.
During inference, the following sequence is performed:
- Construct the defect graph $G = (V, E)$ from the raw syndrome $s$, including feature-rich representations for nodes and edges.
- Run the QWP on the syndrome to output an inclusion probability $p_e$ for every edge.
- Aggregate the directed-edge probabilities and convert them to MWPM weights via the negative log transform $w_e = -\log p_e$.
- Execute the classical MWPM algorithm on $G$ with these dynamic weights to produce the correction.
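The inference steps above can be sketched in miniature. Here the QWP is replaced by a hypothetical probability table and a brute-force matcher stands in for the Blossom algorithm; only the negative-log weight transform follows the paper directly:

```python
import math
from typing import Dict, List, Tuple

def probs_to_weights(edge_probs: Dict[Tuple[int, int], float],
                     eps: float = 1e-9) -> Dict[Tuple[int, int], float]:
    """Negative-log transform: high inclusion probability -> low MWPM weight."""
    return {e: -math.log(max(p, eps)) for e, p in edge_probs.items()}

def min_weight_perfect_matching(nodes: List[int], weights):
    """Brute-force minimum-weight perfect matching over a complete graph
    (illustrative stand-in for the Blossom algorithm used in practice)."""
    def pairings(remaining):
        if not remaining:
            yield []
            return
        first, rest = remaining[0], remaining[1:]
        for i, partner in enumerate(rest):
            for tail in pairings(rest[:i] + rest[i + 1:]):
                yield [(first, partner)] + tail

    best, best_cost = None, float("inf")
    for m in pairings(list(nodes)):
        cost = sum(weights[tuple(sorted(e))] for e in m)
        if cost < best_cost:
            best, best_cost = m, cost
    return best, best_cost

# Hypothetical QWP output: inclusion probability per undirected defect-pair edge.
edge_probs = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.2,
              (1, 3): 0.3, (0, 3): 0.1, (1, 2): 0.15}
weights = probs_to_weights(edge_probs)
matching, cost = min_weight_perfect_matching([0, 1, 2, 3], weights)
# The high-probability edges (0,1) and (2,3) receive the lowest weights
# and are selected by the matcher.
```

Because the transform is monotone decreasing, the matcher's preference order is exactly the reverse of the QWP's confidence order.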
2. Feature Engineering and Representation
The method distinguishes between local feature encoding (via the GNN) and global reasoning (via the Transformer):
- Node Features: For each stabilizer, the input consists of two-dimensional lattice coordinates, a stabilizer type indicator (X vs. Z), the distance to the lattice center, a sinusoidal positional encoding, and a learned embedding. Inactive stabilizers retain only their positional encoding. All features are projected through small MLPs into a shared hidden dimension.
- Edge Features: For an ordered pair of stabilizers, features include the Manhattan distance, the coordinate displacements, and an error-type flag. The flag is embedded and concatenated to the result of an MLP applied to the remaining features.
- Local Aggregation: A stacked GNN (TransformerConv) with multi-head attention, residual updates, and feedforward MLPs using GELU activations produces refined node embeddings.
- Global Edge Reasoning: For each edge, a vector is formed from the endpoint embeddings and edge features, and the resulting sequence is fed to a Transformer encoder, enabling the model to assess global edge competition before producing output probabilities via a final sigmoid projection.
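A minimal sketch of the feature construction described above, with an assumed sinusoidal encoding form and illustrative feature ordering (the paper's exact layouts and dimensions are not reproduced here):

```python
import math

def node_features(x, y, stab_type, lattice_size, dim=8):
    """Per-stabilizer features: coordinates, X/Z type flag, distance to the
    lattice center, and a small sinusoidal positional encoding (assumed form)."""
    cx = cy = (lattice_size - 1) / 2.0
    dist_center = math.hypot(x - cx, y - cy)
    pe = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        pos = x + y * lattice_size  # flatten 2D coordinate to a scalar position
        pe += [math.sin(pos * freq), math.cos(pos * freq)]
    return [x, y, float(stab_type == "X"), dist_center] + pe

def edge_features(u, v, error_type=0):
    """Ordered-pair features: Manhattan distance, displacements, error-type flag."""
    dx, dy = v[0] - u[0], v[1] - u[1]
    return [abs(dx) + abs(dy), dx, dy, error_type]

f = node_features(1, 2, "X", 5)        # 4 scalar features + 8 PE entries
e = edge_features((1, 2), (3, 3))      # [manhattan, dx, dy, flag]
```

In the actual architecture these raw features are further projected by small MLPs into the hidden dimension before entering the GNN.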
3. Integration with Minimum Weight Perfect Matching
Classically, MWPM seeks a perfect matching $M$ of the graph $G = (V, E)$ with nonnegative edge weights $w_e$ that minimizes the total weight:

$$M^* = \arg\min_{M} \sum_{e \in M} w_e.$$

In NMWPM, the weights are determined dynamically from the predicted probabilities:

$$w_e = -\log p_e.$$
Higher output probability yields a lower weight, biasing MWPM toward selecting the corresponding edge. This scheme allows MWPM to remain the underlying decoding routine, while adjusting its input weights flexibly based on the observed syndrome.
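As a concrete numeric illustration of this negative-log mapping, a confidently predicted edge receives a much smaller weight than an unlikely one:

$$p_e = 0.9 \;\Rightarrow\; w_e = -\log 0.9 \approx 0.105, \qquad p_e = 0.1 \;\Rightarrow\; w_e = -\log 0.1 \approx 2.303.$$

Since the transform is strictly decreasing, the matcher's ranking of edges mirrors the QWP's confidence ranking exactly.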
4. Differentiable Proxy Loss and Training Strategy
Because the MWPM (Blossom) algorithm is non-differentiable, the training objective is recast as a proxy binary edge-classification task. Let $p_i$ denote the predicted edge-inclusion probability for the $i$-th directed edge, with $y_i \in \{0, 1\}$ as the ground-truth indicator. Given $N$ directed edges, the loss is:
- Binary Cross-Entropy: $\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\right]$
- Entropy Regularization: $\mathcal{L}_{\mathrm{ent}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, p_i \log p_i + (1 - p_i)\log(1 - p_i)\,\right]$
- Full Loss: $\mathcal{L} = \mathcal{L}_{\mathrm{BCE}} + \lambda\,\mathcal{L}_{\mathrm{ent}}$, with $\lambda > 0$ weighting the regularizer
This composite objective encourages peaked confidence in edge predictions, yielding probability distributions that are strongly bimodal. Such peaky distributions, in turn, foster clear separations in MWPM weights, resulting in improved matching accuracy.
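The composite objective can be implemented in a few lines. The sketch below uses pure Python; `lam` is a placeholder value, since the paper's entropy coefficient was not recoverable from this text:

```python
import math

def bce(p, y, eps=1e-12):
    """Per-edge binary cross-entropy against a 0/1 label."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def binary_entropy(p, eps=1e-12):
    """Binary entropy of a single edge-inclusion probability."""
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))

def qwp_loss(probs, labels, lam=0.1):
    """Composite proxy loss: mean BCE plus lam times mean binary entropy.
    Minimizing the entropy term pushes predictions toward 0 or 1,
    producing the bimodal probability distributions the decoder relies on."""
    n = len(probs)
    l_bce = sum(bce(p, y) for p, y in zip(probs, labels)) / n
    l_ent = sum(binary_entropy(p) for p in probs) / n
    return l_bce + lam * l_ent

peaky = qwp_loss([0.99, 0.01], [1, 0])  # confident, correct predictions
flat = qwp_loss([0.6, 0.4], [1, 0])     # hesitant predictions, same labels
```

Even with identical label agreement, the hesitant predictions incur a larger loss, both through BCE and through the entropy penalty.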
5. Training Data Generation and Optimization Procedure
Ground-truth labels for supervised training are created via the following process:
- Error Simulation: Simulate random physical errors under independent or depolarizing noise models.
- Clustering and Extraction: Group qubits by shared stabilizers and extract the syndrome endpoints.
- Local Matching: Apply local MWPM per cluster based on lattice distance to propose a matching $M$.
- Correction Validation: If $M$ leads to a logical error, a permutation or brute-force search is used to yield a valid matching.
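The label-generation steps can be sketched as a toy pipeline: a greedy nearest-neighbour pairing stands in for the per-cluster local MWPM proposal, and the validated matching is then turned into per-edge binary training labels. Both helpers are illustrative simplifications, not the paper's exact routines:

```python
def manhattan(a, b):
    """Lattice (Manhattan) distance between two defect coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_local_matching(defects):
    """Greedily pair each defect with its nearest remaining neighbour;
    a simplified stand-in for the per-cluster local MWPM proposal step."""
    remaining = list(defects)
    matching = []
    while remaining:
        u = remaining.pop(0)
        v = min(remaining, key=lambda w: manhattan(u, w))
        remaining.remove(v)
        matching.append((u, v))
    return matching

def matching_to_labels(edges, true_matching):
    """Ground-truth indicator per candidate edge: 1 if the (undirected)
    edge appears in the validated matching, else 0."""
    chosen = {tuple(sorted(e)) for e in true_matching}
    return [int(tuple(sorted(e)) in chosen) for e in edges]

defects = [(0, 0), (0, 1), (3, 3), (4, 3)]
m = greedy_local_matching(defects)

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
labels = matching_to_labels(edges, [(1, 0), (2, 3)])
```

The resulting 0/1 labels are exactly the targets $y_i$ used in the proxy binary cross-entropy objective.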
Training data is sampled across code sizes and physical error rates. Optimization employs Adam with batch size 32 and an initial learning rate decayed via cosine annealing. Each epoch consists of 500 mini-batches, with total training running for 200–1000 epochs depending on code size. Experiments were run on an NVIDIA L40 GPU with 48 GB of memory. Major hyperparameter settings include:
- GNN depth and number of attention heads
- Transformer encoder depth
- Entropy regularization coefficient $\lambda$
6. Benchmark Performance and Parameter Efficiency
Quantitative threshold comparisons were conducted for toric codes under independent and depolarizing noise, evaluating NMWPM against classical distance-based MWPM, BPOSD-2, QECCT (a state-of-the-art Transformer decoder), and the maximum-likelihood (ML) decoding bound. In both noise settings, NMWPM's threshold improves on the distance-based MWPM baseline.
For toric codes under depolarizing noise, NMWPM reduces the logical error rate by at least $17\%$ relative to traditional MWPM at the tested physical error rates, and exceeds the performance of QECCT. On rotated surface codes (depolarizing noise), NMWPM outperforms MWPM and matches or slightly surpasses QECCT at larger code distances.
NMWPM exhibits a nearly distance-independent parameter count of approximately $3.9$ million, compared to QECCT's $6.71$ million parameters.
7. Component Insights, Scalability, and Future Prospects
Although no full ablation table is provided, analysis indicates that entropy regularization drives edge probabilities toward a highly bimodal distribution, directly aiding MWPM by separating candidate edges for selection. Complexity analysis shows that the GNN scales linearly in the number of graph edges, while the Transformer's full self-attention over candidate edges scales quadratically; the latter may constrain practical scalability for large code distances without further innovations such as sparsification or approximate attention (e.g., BigBird).
The ground-truth generation process involves heuristic assignment and brute-force correction steps; refinement or replacement with exact methods could alleviate potential label noise. Current experiments focus on toric and rotated surface codes under independent and depolarizing channels; extension to correlated noise or QLDPC codes is a prospective direction. In hardware-constrained regimes, efficiency optimizations (e.g., network pruning or quantization) and rapid on-the-fly adaptation are promising avenues for practical deployment.
NMWPM exemplifies the integration of algorithmic and data-driven approaches in QEC decoding—combining the reliability of MWPM with syndrome-adaptive, learned edge weighting to achieve enhanced logical error rates and noise thresholds across multiple code families and error models (Peled et al., 1 Jan 2026).