Tensor Network Framework for Quantum Kernels

Updated 9 February 2026

The paper presents a tensor network framework that represents quantum kernel circuits via optimized tensor contractions to achieve polynomial scaling for large qubit systems.
Hybrid quantum–classical dual kernel modeling is introduced to combine quantum feature map expressivity with classical kernel stability, improving machine learning performance.
Advanced contraction algorithms and GPU slicing strategies enable scalable, near state-vector fidelity simulations for circuits with hundreds of qubits.

A tensor network framework for quantum kernel circuit simulation delivers an efficient, scalable methodology for evaluating quantum kernels, fundamentally lowering the computational cost associated with simulating high-qubit, data-encoded quantum circuits. Such frameworks enable systematic and large-scale quantum kernel analysis and facilitate hybrid quantum-classical kernel models that leverage both quantum feature map expressivity and classical model robustness for machine learning tasks. This approach is particularly impactful in quantum machine learning, where practical simulation of circuits with hundreds of qubits, such as for image-classification datasets like Fashion-MNIST, can be achieved on classical hardware at near state-vector fidelity with polynomial resources (Sam et al., 1 Feb 2026).

1. Tensor Network Representation of Quantum Kernel Circuits

In quantum kernel methods, a classical input $x\in \mathbb{R}^n$ is embedded into a quantum state via a parameterized circuit $U(x)$ , typically acting on $n$ qubits, producing the feature-map state $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ . The quantum kernel between two inputs $x_i$ , $x_j$ is defined as the squared overlap $K_q(x_i, x_j) = |\langle\psi(x_j)|\psi(x_i)\rangle|^2$ . Practically, this is computed using a "compute–uncompute" circuit $W(x_i, x_j) = U(x_j)^\dagger U(x_i)$ , and the kernel is evaluated as the squared modulus of the all-zero amplitude: $K_q(x_i, x_j) = |\langle 0|^{\otimes n} W(x_i, x_j) |0\rangle^{\otimes n}|^2.$

The entire overlap circuit is mapped to a tensor-network (TN) graph, where single-qubit gates are rank-2 tensors and two-qubit gates are rank-4 tensors. The connectivity of the gates defines the tensor contraction network. For certain topologies, such as the Block-Product-State (BPS) brick-wall circuit with only nearest-neighbor entangling gates, the resulting tensor network exhibits low entanglement, allowing for compact representations as Matrix-Product States (MPS) with moderate maximum bond dimensions (often $\chi \leq 4\text{--}8$ ), even for $U(x)$ 0 qubits (Sam et al., 1 Feb 2026, Brennan et al., 2021, Chen et al., 2023, Zhang et al., 2022).

2. Gate Application, Contraction Algorithms, and Implementation

The evaluation of a quantum kernel entry consists of constructing the TN for $U(x)$ 1 and $U(x)$ 2, forming a combined network with $U(x)$ 3 layers. An optimized contraction path is then determined—e.g., using hypergraph partitioning algorithms such as FlowCutter or KaHyPar, or heuristic path optimizers like cotengra. The actual contraction may be performed efficiently by controlling the growth of bond dimension: after each two-site gate is contracted into an MPS-like chain, an SVD is applied and truncated to a maximal $U(x)$ 4 reflective of the circuit's entangling structure.

For large-scale evaluation, tensor network slicing is employed, wherein large indices are split to create multiple smaller subtasks, each fitting into the hardware memory (CPU or GPU). The entire $U(x)$ 5 kernel matrix is computed by assigning $U(x)$ 6 tiles to parallel GPU workers, further reducing per-worker memory pressure (Sam et al., 1 Feb 2026, Brennan et al., 2021, Chen et al., 2023, Zhang et al., 2022).

3. Scalability and Computational Complexity

The computational cost for each kernel overlap is $U(x)$ 7 in time and $U(x)$ 8 in memory, where $U(x)$ 9 is the number of qubits, $n$ 0 is the circuit depth, and $n$ 1 is the largest bond dimension encountered. The full $n$ 2 kernel matrix is computed in $n$ 3, potentially parallelized across $n$ 4 GPUs for wall-clock scaling $n$ 5. Empirically, for Fashion-MNIST ( $n$ 6, $n$ 7 up to 784, $n$ 8, $n$ 9), single-overlap times on a GPU are $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 0 s, and a $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 1 block can be built in under 1 hour using 4 GPUs, maintaining per-GPU memory below 3 GB via slicing (Sam et al., 1 Feb 2026).

In contrast to generic quantum circuits, where state vector simulation scales exponentially in $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 2, tensor network frameworks exploit circuit structure—moderate depth and local connectivity—to maintain polynomial scaling, provided $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 3 remains bounded. QuantEx and related platforms demonstrate exascale simulation capabilities through optimized contraction planning, slicing, and multilevel parallelism (Brennan et al., 2021, Chen et al., 2023, Zhang et al., 2022).

4. Quantum–Classical Dual Kernel Methodology

A key extension is the construction of a hybrid quantum–classical dual kernel: $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 4 where $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 5 is the quantum kernel and $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 6 is a classical RBF kernel. The mixing parameter $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 7 is treated as a hyperparameter, optimized—along with SVM penalty and RBF width—via cross-validation (Sam et al., 1 Feb 2026).

The feature-to-qubit mapping is one-to-one: each (possibly PCA-reduced and min–max scaled) feature $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 8 is encoded via $|\psi(x)\rangle = U(x) |0\rangle^{\otimes n}$ 9 rotations, typically preceded by Hadamard gates, ensuring direct interpretability and systematic scaling up to $x_i$ 0 qubits.

The dual kernel SVM classifier operates via identical dual formulations, with the kernel combined as above and multiclass handled via One-Against-One schemes (Sam et al., 1 Feb 2026).

5. Experimental Evaluation and Performance Analysis

Using Fashion-MNIST, the tensor network framework enables quantum kernel evaluation and comparison with classical and dual kernels for feature map sizes $x_i$ 1 through $x_i$ 2, with a direct mapping from PCA features to qubits. The classification pipeline includes data flattening, normalization, PCA, min–max scaling, and SVM training with cross-validated penalties and kernel hyperparameters.

Test accuracy for the pure quantum kernel degrades for $x_i$ 3 due to concentration effects, while the pure classical kernel remains stable. The dual kernel retains high accuracy across all $x_i$ 4, with superior performance relative to either baseline. At $x_i$ 5, confusion matrix-based metrics indicate dual kernel accuracy $x_i$ 6 (classical $x_i$ 7, quantum $x_i$ 8), with F1 and recall similarly aligned.

Tuning the mixing parameter $x_i$ 9 reveals that quantum contributions dominate ( $x_j$ 0) for $x_j$ 1, with classical contributions becoming dominant at higher $x_j$ 2, indicating the classical kernel’s role in mitigating degradation from concentration and hardware noise while preserving quantum advantages at small $x_j$ 3 (Sam et al., 1 Feb 2026).

6. Software Frameworks and Practical Implementations

Prominent tensor network simulation platforms for quantum kernel analysis include:

Framework	Language	Distinct Features
QuantEx	Julia	Exascale TN/graph decompositions, MPI+GPU, slicing, treewidth optimization
TeD-Q	Python	Auto-diff, Ray+KaHyPar hypergraph search, GPU slicing, hybrid ML-QML
TensorCircuit	Python	JAX/TF/PyTorch backend, JIT+AD, cotengra optimization, slicing

These frameworks build the TN corresponding to a given quantum circuit, optimize contraction order via hypergraph partitioning or other heuristics, and provide multi-GPU distributed execution. Slicing and memory control strategies maintain scalability for high-qubit, shallow circuits typical of quantum kernel methods (Brennan et al., 2021, Chen et al., 2023, Zhang et al., 2022).

Tensor-network platforms also expose differentiability, enabling end-to-end gradient-based learning (via auto-diff, parameter-shift, or finite-difference), and batch evaluation through vectorized parallelism (vmap), enabling practical, large-scale quantum kernel and hybrid kernel matrix computation.

7. Limitations, Trade-Offs, and Future Prospects

Tensor network contraction for kernel simulation is most effective for circuits of low depth and local connectivity, where $x_j$ 4 remains small. As depth or non-locality increases, bond dimensions may grow prohibitively, degrading performance. Approximate contractions via SVD truncation or low-rank methods can mitigate some overheads but introduce accuracy trade-offs (Sam et al., 1 Feb 2026, Zhang et al., 2022). Automatic detection of reusable contraction subgraphs and the extension of contraction-path heuristics to broad classes of parametrized circuits remain open research areas.

A key insight is that embedding a quantum kernel in a tunable, linearly mixed dual kernel with a classical anchor systematically balances quantum expressivity and classical stability. The dual kernel mitigates overfitting and the exponential concentration of pure quantum kernels at high dimensions, enabling robust generalization and performance across a broad spectrum of feature dimensions (Sam et al., 1 Feb 2026).

Tensor network frameworks thus constitute a critical toolset for advancing quantum machine learning beyond the limits of brute-force state vector simulations, making it possible to prototype and analyze quantum kernel methods at scales relevant to practical machine learning and beyond the near-term quantum hardware frontiers.