Quantum Kernel and State Overlap

Updated 29 December 2025

A quantum kernel is a similarity function defined by the Hilbert–Schmidt inner product of quantum states or, for pure states, by the squared inner product; it serves as the foundation for quantum-enhanced kernel methods.
State overlap estimation uses protocols like the SWAP test, Bell-Basis Algorithm, and Schur measurements, each balancing circuit depth, measurement complexity, and noise resilience.
Quantum kernels integrate with machine learning by forming kernel matrices for SVMs and regression tasks, while careful error analysis guides noise mitigation and protocol optimization.

A quantum kernel is a similarity function between data points implemented via the overlap of quantum states in a high-dimensional Hilbert space, most commonly taking the form of the Hilbert–Schmidt inner product Tr[ρ_x ρ_{x′}] or, for pure states, the squared modulus of the inner product |⟨ψ(x)|ψ(x′)⟩|². State overlap estimation is thus the foundational primitive for quantum-enhanced kernel methods in machine learning, quantum benchmarking, and quantum information processing. These overlaps provide both a formally valid Mercer kernel and a direct connection to quantum fidelities, enabling support vector machines, kernel ridge regression, hypothesis testing, and discrimination tasks to be instantiated on quantum hardware.

1. Mathematical Definition and Properties

Given a classical input domain and an encoding (quantum feature map) Φ: x↦ρ_x associating each point x with a density operator ρ_x, the canonical quantum kernel is

$K(x,x′) = \operatorname{Tr}[ρ_x ρ_{x′}]$

For pure states, ρ_x = |ψ(x)⟩⟨ψ(x)|, the kernel reduces to the squared state overlap:

$K(x, x′) = |\langle ψ(x) | ψ(x′) \rangle|^2$

This kernel is symmetric, K(x, x′) = K(x′, x), and positive semidefinite: for any complex coefficients c_i,

$\sum_{i,j} \bar{c}_i\,c_j\,K(x_i, x_j) = \left\| \sum_i c_i\,ρ_{x_i} \right\|_{HS}^2 \geq 0$

ensuring that the associated Gram matrix is positive semidefinite for all sample sets—a requirement for kernel methods (Park et al., 2020, Beigi, 2022).
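
As a minimal numerical sketch of these properties, the Python snippet below builds a Gram matrix for a hypothetical single-qubit feature map and evaluates the quadratic form above for arbitrary complex coefficients. The map `feature_state` and all function names are illustrative assumptions, not taken from the cited works.

```python
import cmath
import math

def feature_state(x):
    """Hypothetical feature map (an assumption): x -> cos(x/2)|0> + e^{ix} sin(x/2)|1>."""
    return [math.cos(x / 2), cmath.exp(1j * x) * math.sin(x / 2)]

def kernel(x, y):
    """Pure-state quantum kernel K(x, y) = |<psi(x)|psi(y)>|^2."""
    amp = sum(a.conjugate() * b for a, b in zip(feature_state(x), feature_state(y)))
    return abs(amp) ** 2

def gram_matrix(xs):
    """Gram matrix K_ij = K(x_i, x_j) for a list of inputs."""
    return [[kernel(xi, xj) for xj in xs] for xi in xs]

def quadratic_form(K, c):
    """Evaluate sum_ij conj(c_i) c_j K_ij, which should be real and non-negative."""
    n = len(c)
    return sum(c[i].conjugate() * c[j] * K[i][j] for i in range(n) for j in range(n))
```

Calling `quadratic_form(gram_matrix(xs), c)` for any sample `xs` and complex vector `c` should return a value whose real part is non-negative up to rounding, which is the positive-semidefiniteness condition in numerical form.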

2. Physical Realizations of Overlap Estimation

State overlap can be estimated using several quantum algorithms and measurement strategies, each with trade-offs in circuit depth, measurement complexity, and resilience to hardware noise. The SWAP test is the most common; the main protocols are:

SWAP test: Uses an ancilla qubit and controlled-SWAP gate; outputs the probability of ancilla outcome |0⟩, from which |⟨ψ|φ⟩|² or Tr[ρσ] is estimated (Park et al., 2020, Beigi, 2022).
Bell-Basis Algorithm (BBA): Employs parallel CNOT and Hadamard gates, followed by Z-basis measurement and classical post-processing, to estimate the overlap at constant circuit depth irrespective of qubit number (Cincio et al., 2018).
Collective/Schur measurements: For multiple copies per state, the optimal measurement decomposes the state into Schur–Weyl irreducible subspaces, yielding the maximum Fisher information and lowest mean square error, especially for small overlaps or high dimension (Fanizza et al., 2019).
Quasiprobabilistic estimators: Use only local POVMs and classical post-processing, removing the need for multi-qubit gates at the cost of a moderate increase in sample complexity, and outperforming SWAP test circuits for n ≳ 2–3 qubits under hardware noise (Guerini et al., 2021).
Photonic protocols: Leverage linear optics (Schur, optical SWAP via Hong–Ou–Mandel interference, or projective tomography) to benchmark resource cost and measurement variance in the estimation of F(ρ,σ) (Zhan et al., 2024).
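
The SWAP test read-out can be illustrated with a small classical simulation. The sketch below does not simulate the circuit itself. It assumes the textbook relation P(ancilla = 0) = (1 + |⟨ψ|φ⟩|²)/2, samples that binary outcome a finite number of times, and inverts the relation. Function names and the `rng` argument are illustrative choices.

```python
import random

def inner(psi, phi):
    """Inner product <psi|phi> of two state vectors given as lists of amplitudes."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

def swap_test_estimate(psi, phi, shots, rng):
    """Estimate |<psi|phi>|^2 from `shots` simulated SWAP-test runs."""
    overlap = abs(inner(psi, phi)) ** 2
    p0 = 0.5 * (1.0 + overlap)  # ideal probability of ancilla outcome 0
    zeros = sum(1 for _ in range(shots) if rng.random() < p0)
    # Invert p0 = (1 + overlap) / 2 using the observed frequency of outcome 0.
    return 2.0 * zeros / shots - 1.0
```

For example, `swap_test_estimate([1, 0], [2 ** -0.5, 2 ** -0.5], 20000, random.Random(0))` returns a value close to the true overlap 0.5. Because the estimator is built from a single Bernoulli frequency, its variance is 4·p0·(1 − p0)/shots = (1 − c²)/shots, where c is the true overlap.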

The table below summarizes resource scaling for several primitive overlap estimation protocols:

Protocol	Circuit Depth	Gate/Resource Scaling	Optimality/Comments
SWAP Test	O(n)	Controlled-SWAP, ancilla	Baseline; relative error grows as the overlap c→0
BBA	O(1)	2n CNOTs + Hadamards, no ancilla	Constant depth; classically post-processed (Cincio et al., 2018)
Schur/Collective	poly(N+M, log d)	Schur transform, block-wise	MSE-optimal for pure states (Fanizza et al., 2019)
Quasiprobabilistic	O(1)	n single-qubit rotations, measurements	Sample complexity O(2^{1.1n}/ε²)
Photonic OST/SCM	O(1)	Linear optics/interference, detectors	Dimension-independent variance
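
To make the "classically post-processed" entry for the BBA concrete, the sketch below computes the exact expectation of the standard Bell-basis (destructive SWAP) estimator for two n-qubit pure states: transversal CNOTs from register A to register B, Hadamards on A, then a sign given by the parity of the bitwise AND of the two measured bit strings. This is assumed to match the construction in Cincio et al. (2018). The function names are illustrative, and the code evaluates outcome probabilities exactly rather than sampling shots.

```python
def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def bell_basis_overlap(psi, phi):
    """Exact expectation of the Bell-basis overlap estimator for n-qubit pure states."""
    dim = len(psi)            # dim = 2**n
    n = dim.bit_length() - 1
    # Joint amplitudes of |psi>|phi>, indexed as amp[a][b].
    amp = [[pa * pb for pb in phi] for pa in psi]
    # Transversal CNOTs (control A_k, target B_k): |a, b> -> |a, b XOR a>.
    cnot = [[amp[a][b ^ a] for b in range(dim)] for a in range(dim)]
    # Hadamard on every qubit of register A (Walsh-Hadamard transform over a).
    norm = 2 ** (-n / 2)
    had = [[norm * sum((-1) ** popcount(a & k) * cnot[a][b] for a in range(dim))
            for b in range(dim)] for k in range(dim)]
    # Post-processing: outcome (k, b) contributes sign (-1)^{popcount(k AND b)}.
    return sum((-1) ** popcount(k & b) * abs(had[k][b]) ** 2
               for k in range(dim) for b in range(dim))
```

The returned value should agree with |⟨ψ|φ⟩|² computed directly from the amplitudes. On hardware the same sign rule would be applied to sampled bit strings and averaged, so only the two-gate-layer circuit runs on the device.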

3. Quantum Kernel Integration in Machine Learning

Quantum kernel methods import the principles of classical kernel-based learning, such as SVMs, into the quantum context by constructing a kernel matrix $K_{ij}$ whose entries are overlaps of quantum feature-mapped data. In the quantum setting:

Prepare quantum feature states |ψ(x_i)⟩ for each data point x_i, using variational circuits or channel-based encodings (Coelho et al., 2025, Tancara et al., 2022).
For all pairs, estimate the overlap via quantum measurement protocols, populating the Gram matrix $K_{ij}=|\langle ψ(x_i)|ψ(x_j)\rangle|^2$.
The resulting kernel enables standard kernel algorithms—SVM, kernel ridge regression, clustering, etc.—to run using the quantum-evaluated similarity.

For example, the SVM dual optimization problem proceeds identically, simply substituting the quantum kernel for its classical counterpart (Park et al., 2020, Tancara et al., 2022).
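
For concreteness, the standard soft-margin SVM dual with the quantum kernel substituted reads as follows. The labels y_i ∈ {−1, +1}, dual variables α_i, regularization constant C, and bias b are the usual SVM notation and are introduced here as assumptions, since the text above does not fix symbols:

$\max_{α}\ \sum_i α_i - \tfrac{1}{2}\sum_{i,j} α_i α_j\, y_i y_j\, K(x_i, x_j) \quad \text{subject to} \quad 0 \le α_i \le C,\ \ \sum_i α_i y_i = 0$

$f(x) = \operatorname{sign}\Big( \sum_i α_i\, y_i\, K(x_i, x) + b \Big)$

Only the entries K(x_i, x_j) and K(x_i, x) require quantum overlap estimation; the optimization over α is classical.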

Advanced protocols allow the kernel to be systematically “tailored” by raising overlaps to an integer power or by introducing arbitrary weights, achieved using multiple copies and superpositions on quantum registers (Blank et al., 2019).

4. Error Analysis and Noise Robustness

The physical implementation of overlap-based quantum kernels introduces both statistical and hardware noise:

Statistical error arises from finite sampling: if each SWAP test is repeated m times per entry, the mean square error for a Gram matrix entry scales as O(1/m).
Depolarizing noise and gate infidelity perturb state preparation: under depolarizing noise of strength p acting on the full D-dimensional feature state, the state is mapped as $\tilde ρ_x = (1-p)\,ρ_x + p\,I/D$, leading to kernel suppression:

$\tilde K_{ij} = (1-p)^2 K_{ij} + q/D, \quad q = 1-(1-p)^2$

The excess risk of regularized quantum kernel classifiers remains bounded and preserves the favorable $O(1/N)$ learning curve, with a noise prefactor scaling as $(1-p)^2$, provided $m = O(\ln(N/\delta))$ shots are taken per entry (Beigi, 2022).
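
The suppression formula can be checked numerically, and under the strong assumptions that p is known and the noise is exactly of this depolarizing form, it can be inverted to recover the noiseless entry. The sketch below uses plain nested lists for density matrices; all function names are illustrative, and the inversion step is a simple algebraic rearrangement rather than a method taken from the cited works.

```python
def density(psi):
    """Density matrix |psi><psi| as a nested list."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

def depolarize(rho, p):
    """Apply rho -> (1 - p) rho + p I / D."""
    dim = len(rho)
    return [[(1 - p) * rho[i][j] + (p / dim if i == j else 0.0)
             for j in range(dim)] for i in range(dim)]

def hs_inner(a, b):
    """Hilbert-Schmidt inner product Tr[a b] (real for Hermitian a, b)."""
    dim = len(a)
    return sum(a[i][j] * b[j][i] for i in range(dim) for j in range(dim)).real

def noisy_kernel(k, p, dim):
    """Closed form (1 - p)^2 K + q / D with q = 1 - (1 - p)^2."""
    return (1 - p) ** 2 * k + (1 - (1 - p) ** 2) / dim

def mitigate(k_noisy, p, dim):
    """Invert the closed form; assumes p is known and the noise model is exact."""
    return (k_noisy - (1 - (1 - p) ** 2) / dim) / (1 - p) ** 2
```

Computing `hs_inner(depolarize(density(psi), p), depolarize(density(phi), p))` should match `noisy_kernel(k, p, dim)` for the noiseless value `k`. In practice the division by (1 − p)² in `mitigate` amplifies shot noise, so the inversion becomes less useful as p grows.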

Optimality: For small overlaps in high dimensions (where typically |⟨ψ|φ⟩|² ∼ O(1/d)), the SWAP test and similar pairwise methods become exponentially inefficient in MSE, whereas collective Schur sampling and adaptive estimators achieve quadratic improvements in relative error (Fanizza et al., 2019, Zhan et al., 2024).

5. Experimental Benchmarks and Photonic Implementations

Recent photonic experiments benchmarked four primary overlap estimation strategies: tomography–tomography, tomography–projection, optical SWAP, and Schur collective measurement. The variances and resource costs were compared for qubits and higher-dimensional (d-level) systems. The Schur measurement and optical SWAP have a dimension-independent MSE v(c,N) = (1−c²)/N, where c denotes the overlap, and require only linear optics or static projective elements. Tomography-based protocols scale worse with d and demand more resources (Zhan et al., 2024).
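
Read as a sample-size rule, the quoted MSE gives the number of repetitions needed for a target precision. The helper below is a small sketch under the assumption that v(c,N) = (1−c²)/N holds exactly and that the target is a root-mean-square error `eps`. The function name and the interpretation of N as a repetition count are assumptions.

```python
import math

def copies_needed(c, eps):
    """Smallest N with (1 - c**2) / N <= eps**2, assuming v(c, N) = (1 - c**2) / N."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("overlap c must lie in [0, 1]")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return max(1, math.ceil((1.0 - c * c) / eps ** 2))
```

The count is largest for small overlaps and shrinks toward 1 as c approaches 1. For small c the absolute error target should be chosen relative to c itself, which is where the unfavorable scaling of pairwise methods appears.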

Noise models and small-scale superconducting implementations confirm that circuit depth is the limiting factor in accurate kernel estimation on near-term devices: protocols with constant or minimal depth deliver empirically lower RMS error (Cincio et al., 2018, Guerini et al., 2021, Blank et al., 2019).

6. Advanced Kernel Tailoring and Theoretical Extensions

The flexibility of quantum kernel construction allows for:

Kernel exponentiation and weighting: By preparing n copies of feature states, the physical observable becomes the n-th power of fidelity, sharpening decision boundaries in classification (Blank et al., 2019).
Ensemble averaging: Varying weights or exponents across runs allows for integration over ensembles or adaptive kernel learning (Park et al., 2020).
Integration with variational circuits: The use of parameterized circuits combined with kernel target alignment cost functions enables trainable kernel evaluation, with low-rank approximations (e.g. Nyström method) reducing circuit executions for kernel-matrix construction (Coelho et al., 2025).
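
The effect of using multiple copies can be verified directly on state vectors: the overlap of the n-fold tensor powers of two states equals the n-th power of the single-copy overlap. The sketch below checks this identity classically. It illustrates only the underlying algebra, not the register-and-superposition circuit of Blank et al. (2019), and the function names are illustrative.

```python
def kron(u, v):
    """Tensor (Kronecker) product of two state vectors given as lists."""
    return [a * b for a in u for b in v]

def overlap(psi, phi):
    """Squared overlap |<psi|phi>|^2."""
    return abs(sum(complex(a).conjugate() * b for a, b in zip(psi, phi))) ** 2

def tensor_power(psi, n):
    """State vector of n copies of psi."""
    out = [1.0]
    for _ in range(n):
        out = kron(out, psi)
    return out

def powered_kernel(psi, phi, n):
    """Kernel obtained from n copies; equals overlap(psi, phi) ** n."""
    return overlap(tensor_power(psi, n), tensor_power(phi, n))
```

Since overlaps lie in [0, 1], raising them to a power n > 1 pushes intermediate similarities toward zero while leaving identical states at 1, which is the sense in which the decision boundary is sharpened.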

7. Outlook and Regimes of Applicability

Quantum kernel and state overlap estimation strategies are a central resource across quantum learning, verification, and discrimination tasks. Their practical utility in NISQ devices—where circuit depth and hardware noise limit precision—is largely determined by the choice of overlap estimation protocol and the specifics of the quantum mapping. Constant-depth, minimal-gate-count protocols and collective strategies offer empirical and theoretical advantages for both small training sets and high-dimensional or low-overlap regimes (Cincio et al., 2018, Fanizza et al., 2019, Guerini et al., 2021, Zhan et al., 2024). Further improvements may result from hybrid methods, optimized measurements, and adaptive sampling strategies.

A plausible implication is that as quantum hardware continues to scale, the relative advantages of collective or quasiprobabilistic overlap estimators will become more pronounced, and methods for kernel matrix compression and noise mitigation will be required to maintain learning performance (Coelho et al., 2025, Beigi, 2022). Open questions center on noise-resilient estimator design, kernel function optimization, and the generalization properties of quantum kernel classifiers in realistic device environments.