
HyperRBM: Conditional Neural Quantum States

Updated 31 January 2026
  • HyperRBM is a neural quantum state model that uses a hypernetwork to condition RBM biases on external control parameters.
  • It enables efficient interpolation of quantum states and computes diagnostics like fidelity susceptibility to pinpoint phase transitions.
  • The approach reduces both experimental and computational resources by sharing RBM weights across parameter values for scalable tomography.

HyperRBM refers to a class of neural quantum state models in which a hypernetwork conditions a Restricted Boltzmann Machine (RBM) on external control parameters, enabling the representation of an entire manifold of quantum states—most notably, families of ground states indexed by system Hamiltonian parameters. The central innovation is the use of hypernetwork-modulated bias terms within the RBM architecture while retaining a shared parameterization of the weight matrix. HyperRBMs directly address the prohibitive cost of traditional point-wise quantum state tomography, where a separate RBM must be trained at each value of a control parameter, by producing a single, continuous parametric model of the quantum state landscape. This conditional formulation yields efficient, scalable, and differentiable models that can interpolate physical observables and compute sensitivity diagnostics such as fidelity susceptibility across extended phase diagrams (Tonner et al., 28 Jan 2026).

1. Wavefunction Ansatz and Probabilistic Formulation

In the HyperRBM approach, a family of real, nonnegative ground-state wavefunctions $\Psi(s \mid g)$, with $s \in \{0,1\}^N$ a spin configuration and $g$ a Hamiltonian parameter (e.g., transverse field strength), is represented as follows. A conditional RBM incorporates $M$ hidden units $h \in \{0,1\}^M$ and defines the joint energy

$$E_\theta(s, h \mid g) = -\sum_{i=1}^N b_i(g)\, s_i - \sum_{j=1}^M c_j(g)\, h_j - \sum_{i,j} s_i W_{ij} h_j,$$

where $W \in \mathbb{R}^{N \times M}$ is a weight matrix shared across $g$ and $b(g), c(g)$ are hypernetwork-modulated bias vectors. Marginalizing over the hidden units yields the free energy

$$F_\theta(s \mid g) = -\sum_{i} b_i(g)\, s_i - \sum_{j} \mathrm{softplus}\big((W^\top s)_j + c_j(g)\big),$$

with $\mathrm{softplus}(x) = \log(1 + e^x)$. The conditional probability is

$$p_\theta(s \mid g) = \frac{e^{-F_\theta(s \mid g)}}{Z_\theta(g)},$$

and the wavefunction ansatz is defined as $\Psi_\theta(s \mid g) = \sqrt{p_\theta(s \mid g)}$, appropriate for stoquastic states where the wavefunction can be chosen real and non-negative (Tonner et al., 28 Jan 2026).
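To make the ansatz concrete, the following sketch evaluates the free energy and the resulting conditional distribution by exact enumeration for a toy model. All sizes and parameter values are illustrative assumptions; `np.logaddexp(0, x)` serves as a numerically stable softplus, and the bias vectors stand in for the hypernetwork outputs $b(g), c(g)$ at one fixed $g$.

```python
import numpy as np

def free_energy(s, W, b, c):
    """F_theta(s|g): biases b, c are assumed already modulated by the
    hypernetwork at the current g; W is the shared weight matrix."""
    # logaddexp(0, x) = log(1 + e^x) is a stable softplus
    return -s @ b - np.logaddexp(0.0, W.T @ s + c).sum()

# Toy sizes (hypothetical): N = 3 visible spins, M = 2 hidden units.
rng = np.random.default_rng(0)
N, M = 3, 2
W = rng.normal(size=(N, M))
b = rng.normal(size=N)   # stands in for b(g)
c = rng.normal(size=M)   # stands in for c(g)

# Exact conditional distribution p_theta(s|g): enumerate all 2^N states.
states = np.array([[(k >> i) & 1 for i in range(N)]
                   for k in range(2 ** N)], float)
F = np.array([free_energy(s, W, b, c) for s in states])
p = np.exp(-F) / np.exp(-F).sum()   # normalized by Z_theta(g)
psi = np.sqrt(p)                    # wavefunction ansatz Psi = sqrt(p)
print(p.sum(), (psi ** 2).sum())    # both should be 1
```

For the small systems studied here ($N \leq 16$), this exact enumeration is exactly how global quantities like the partition function can be checked; for larger $N$, Monte Carlo sampling replaces the sum.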

2. Hypernetwork Architecture and FiLM Modulation

Flexibility across control parameters is achieved via a hypernetwork (a small multilayer perceptron) that implements Feature-wise Linear Modulation (FiLM) of the RBM biases. The hypernetwork accepts the parameter $g \in \mathbb{R}$ (or a small vector $\lambda \in \mathbb{R}^d$ in the generalized case) and outputs four vectors: $\gamma^b(g), \beta^b(g) \in \mathbb{R}^N$ for visible-bias modulation and $\gamma^c(g), \beta^c(g) \in \mathbb{R}^M$ for hidden-bias modulation. These are applied as

$$b(g) = \big(1 + \gamma^b(g)\big) \odot b_{\mathrm{base}} + \beta^b(g),$$

$$c(g) = \big(1 + \gamma^c(g)\big) \odot c_{\mathrm{base}} + \beta^c(g),$$

where $\odot$ denotes elementwise multiplication. The weight matrix $W$ remains fixed. The network typically consists of an input layer for $g$, one hidden layer (width $H$), and four output heads for the FiLM coefficients.
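A minimal FiLM hypernetwork of this shape can be sketched as follows. The layer sizes, the tanh nonlinearity, and all parameter names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def film_hypernetwork(g, params):
    """Tiny MLP mapping a scalar g to the four FiLM coefficient vectors.
    One hidden layer of width H, four linear output heads (a sketch)."""
    h = np.tanh(params["W1"] * g + params["b1"])   # hidden layer, width H
    gamma_b = params["Wgb"] @ h + params["bgb"]    # visible-bias scale
    beta_b  = params["Wbb"] @ h + params["bbb"]    # visible-bias shift
    gamma_c = params["Wgc"] @ h + params["bgc"]    # hidden-bias scale
    beta_c  = params["Wbc"] @ h + params["bbc"]    # hidden-bias shift
    return gamma_b, beta_b, gamma_c, beta_c

def modulate(b_base, c_base, coeffs):
    """Apply FiLM: b(g) = (1 + gamma_b) * b_base + beta_b, and likewise c."""
    gamma_b, beta_b, gamma_c, beta_c = coeffs
    return (1.0 + gamma_b) * b_base + beta_b, (1.0 + gamma_c) * c_base + beta_c

# Hypothetical sizes: N = 4 visible, M = 3 hidden, hypernetwork width H = 8.
rng = np.random.default_rng(1)
N, M, H = 4, 3, 8
params = {"W1": rng.normal(size=H),        "b1": rng.normal(size=H),
          "Wgb": rng.normal(size=(N, H)),  "bgb": np.zeros(N),
          "Wbb": rng.normal(size=(N, H)),  "bbb": np.zeros(N),
          "Wgc": rng.normal(size=(M, H)),  "bgc": np.zeros(M),
          "Wbc": rng.normal(size=(M, H)),  "bbc": np.zeros(M)}
b_g, c_g = modulate(rng.normal(size=N), rng.normal(size=M),
                    film_hypernetwork(0.7, params))
print(b_g.shape, c_g.shape)
```

Because the map $g \mapsto (b(g), c(g))$ is a smooth composition of affine maps and tanh, the resulting conditional model is differentiable in $g$, which is what the derivative-based diagnostics below rely on.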

This design ensures a differentiable, smoothly varying latent representation across the parameter manifold, facilitating interpolation and derivative-based analyses such as fidelity susceptibility (Tonner et al., 28 Jan 2026).

3. Training Objective and Optimization Methodology

The training objective is the negative log-likelihood over observed pairs $(s^{(i)}, g^{(i)})$, corresponding to minimizing the Kullback–Leibler divergence $D_{\mathrm{KL}}(q_{\mathrm{data}} \,\|\, p_\theta)$, where $q_{\mathrm{data}}(s \mid g)$ is the empirical distribution of measurement outcomes. For stoquastic models,

$$L(\theta) = -\sum_{i=1}^{N_s} \log p_\theta\big(s^{(i)} \mid g^{(i)}\big).$$

The gradient decomposes as

$$\nabla_\theta L = \mathbb{E}_{(s,g) \sim \mathrm{data}}\big[\nabla_\theta F_\theta(s \mid g)\big] - \mathbb{E}_{(s,g) \sim \mathrm{model}}\big[\nabla_\theta F_\theta(s \mid g)\big],$$

with the positive phase computed over empirical samples, and the negative phase estimated by running short block-Gibbs chains (Contrastive Divergence, typically CD-$k$ with $k = 10$ or $k = 20$) conditioned on the current $g$. Training uses the Adam optimizer with an inverse-sigmoid learning-rate schedule and is mini-batched over $(s, g)$ pairs.
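A CD-$k$ estimate of the weight gradient for a single $(s, g)$ pair can be sketched as below. This is an illustration under stated assumptions (biases already conditioned on $g$, gradient of the log-likelihood with respect to $W$ only), not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient_W(s_data, W, b, c, k=10, rng=None):
    """CD-k estimate of the log-likelihood gradient w.r.t. W for one
    (s, g) pair; b, c are assumed already modulated at the current g."""
    rng = rng if rng is not None else np.random.default_rng()
    # Positive phase: mean hidden activations given the data sample.
    h_data = sigmoid(W.T @ s_data + c)
    # Negative phase: k steps of block Gibbs, chain started at the data.
    s = s_data.copy()
    for _ in range(k):
        h = (rng.random(c.size) < sigmoid(W.T @ s + c)).astype(float)
        s = (rng.random(b.size) < sigmoid(W @ h + b)).astype(float)
    h_model = sigmoid(W.T @ s + c)
    # <s h^T>_data - <s h^T>_model: ascent direction on the log-likelihood.
    return np.outer(s_data, h_data) - np.outer(s, h_model)

# Hypothetical toy sizes and data.
rng = np.random.default_rng(2)
N, M = 4, 3
W = rng.normal(size=(N, M))
b, c = rng.normal(size=N), rng.normal(size=M)
s0 = (rng.random(N) < 0.5).astype(float)
grad_W = cd_k_gradient_W(s0, W, b, c, k=10, rng=rng)
print(grad_W.shape)
```

In the full model the same chain rule carries the gradient back through the FiLM coefficients into the hypernetwork parameters; only the $W$-gradient is shown here for brevity.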

If the underlying physics induces a symmetry (e.g., the $\mathbb{Z}_2$ symmetry for $g < J$ in the transverse-field Ising model), a symmetrized free energy

$$F_{\mathrm{sym}}(s \mid g) = -\log\!\big[e^{-F_\theta(s \mid g)} + e^{-F_\theta(1 - s \mid g)}\big]$$

can be introduced, paired with an augmented Gibbs sampler involving a latent bit $u \in \{0,1\}$ to enforce symmetry regularization (Tonner et al., 28 Jan 2026).
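Computed naively, the two exponentials in $F_{\mathrm{sym}}$ can overflow; a stable sketch uses `np.logaddexp` (toy sizes and values are assumptions):

```python
import numpy as np

def free_energy(s, W, b, c):
    # F_theta(s|g) with biases already conditioned on g
    return -s @ b - np.logaddexp(0.0, W.T @ s + c).sum()

def sym_free_energy(s, W, b, c):
    """F_sym(s|g) = -log[exp(-F(s|g)) + exp(-F(1-s|g))], evaluated
    via logaddexp to avoid overflow in the exponentials."""
    return -np.logaddexp(-free_energy(s, W, b, c),
                         -free_energy(1.0 - s, W, b, c))

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 2))
b, c = rng.normal(size=3), rng.normal(size=2)
s = np.array([1.0, 0.0, 1.0])
# F_sym is invariant under the global spin flip s -> 1 - s by construction.
print(np.isclose(sym_free_energy(s, W, b, c),
                 sym_free_energy(1.0 - s, W, b, c)))
```

The symmetrization makes the model assign equal probability to a configuration and its global flip, which is exactly the $\mathbb{Z}_2$ structure of the ferromagnetic phase.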

4. Computational Considerations and Scaling

HyperRBM models trained on lattices of up to $N = 16$ spins, with approximately 10 support parameter points and 20,000 measurement samples per support point, complete within 30 minutes on a single CPU, with peak memory usage below $16\,\mathrm{GB}$. Each mini-batch contains $(s, g)$ pairs sampled uniformly across the parameter support, and the negative-phase chains are partially reinitialized at random each epoch to improve mixing.

Relative to point-wise RBM tomography, HyperRBMs require only a fixed number of samples and training runs for the entire parameter manifold, yielding order-of-magnitude reductions in both experimental and computational resource requirements as the parameter dimensionality or support grows (Tonner et al., 28 Jan 2026).

5. Empirical Evaluation on Quantum Systems

In applications to the transverse-field Ising model on chains and periodic 2D lattices, HyperRBMs are benchmarked against exact Lanczos diagonalization. Several diagnostics are evaluated:

  • Observable Estimation: Longitudinal magnetization $\langle \sigma^z \rangle$ is estimated by Monte Carlo sampling under $p_\theta(s \mid g)$. Transverse magnetization $\langle \sigma^x \rangle$ is computed via a local-estimator trick, leveraging the Monte Carlo samples and the RBM's analytic structure.
  • Global Fidelity: For $N \leq 16$, the partition function $Z_\theta(g)$ and the fidelity $F(g) = |\langle \Psi_{\mathrm{ED}}(g) \mid \Psi_\theta(g) \rangle|^2$ are evaluated by direct summation.
  • Fidelity Susceptibility: The model's differentiable dependence on $g$ enables estimation of the fidelity susceptibility,

$$\chi_F(g) = \frac{1}{4}\, \mathrm{Var}_{s \sim p_\theta(\cdot \mid g)}\big[\partial_g F_\theta(s \mid g)\big],$$

via Monte Carlo sampling and automatic differentiation, permitting direct detection of quantum phase transitions through peaks in $\chi_F(g)$.

  • Entanglement Entropy: The second Rényi entropy $S_2(A) = -\log \mathrm{Tr}(\rho_A^2)$ is estimated using the swap-operator replica trick, providing access to block entanglement properties.
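As an illustration of the fidelity-susceptibility diagnostic, the sketch below evaluates $\chi_F(g)$ exactly for a toy conditional RBM whose visible bias depends linearly on $g$ (an assumption chosen for simplicity), using a central finite difference in place of automatic differentiation:

```python
import numpy as np

def free_energy(s, W, b, c):
    return -s @ b - np.logaddexp(0.0, W.T @ s + c).sum()

def fidelity_susceptibility(g, W, b_of_g, c, eps=1e-5):
    """chi_F(g) = (1/4) Var_{s ~ p(.|g)}[d_g F(s|g)], evaluated exactly
    over all 2^N configurations; d_g F via central finite difference."""
    N = W.shape[0]
    states = np.array([[(k >> i) & 1 for i in range(N)]
                       for k in range(2 ** N)], float)
    F = np.array([free_energy(s, W, b_of_g(g), c) for s in states])
    w = np.exp(-(F - F.min()))     # stable unnormalized weights
    p = w / w.sum()
    dF = np.array([(free_energy(s, W, b_of_g(g + eps), c)
                    - free_energy(s, W, b_of_g(g - eps), c)) / (2 * eps)
                   for s in states])
    mean = p @ dF
    return 0.25 * (p @ (dF - mean) ** 2)

# Hypothetical toy model: b(g) = b0 + g * db, everything else fixed.
rng = np.random.default_rng(4)
N, M = 4, 3
W = rng.normal(size=(N, M))
c = rng.normal(size=M)
b0, db = rng.normal(size=N), rng.normal(size=N)
chi = fidelity_susceptibility(0.5, W, lambda g: b0 + g * db, c)
print(chi >= 0.0)   # a variance, hence non-negative
```

In the actual workflow the expectation is taken over Monte Carlo samples and $\partial_g F_\theta$ comes from backpropagation through the hypernetwork rather than a finite difference.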

Empirically, HyperRBMs achieve fidelities exceeding 99% across both phases and through the critical region. With as few as three parameter support points, local observables are interpolated across 21 test values with near-exact accuracy, and the fidelity susceptibility accurately reproduces the location of the quantum phase transition without any prior knowledge of the critical field (Tonner et al., 28 Jan 2026).

6. Comparison to Point-wise RBM Tomography and Advantages

Traditional RBM-based tomography necessitates independent model training at each parameter value, leading to $O(\#g)$ training runs and measurement counts. HyperRBMs replace this with a single conditional probability model $p_\theta(s \mid g)$ valid across the entire parameter sweep. The approach enables:

  • Efficient sharing of statistical strength between proximal parameter values.
  • Reduction in experimental measurement and computational resources.
  • Construction of a differentiable state manifold permitting direct computation of parametric diagnostics (e.g., fidelity susceptibility).
  • Comparable or superior fidelity relative to point-wise tomography, with reduced total data and consistent accuracy across the full phase diagram (Tonner et al., 28 Jan 2026).

7. Implications, Limitations, and Prospects

HyperRBMs unify neural-network quantum state tomography with hypernetwork-driven parametric modeling, yielding simultaneous state reconstruction across a manifold of Hamiltonian parameters. This framework retains the scalability and efficient Monte Carlo sampling properties of RBMs while obviating the need for repetitive, point-wise retraining. A plausible implication is the extension of such models to higher-parameter-dimensional systems, conditional quantum processes, or time-dependent tomography, although explicit data for these settings has not appeared.

Key limitations include the requirement that the target family of quantum states be representable within the RBM manifold, and that measurement data be sufficiently dense at the chosen support points to enable successful interpolation. The current evidence is primarily for stoquastic ground states on moderate lattice sizes; scalability beyond $N \sim 16$ has not yet been presented.

HyperRBMs constitute a notable development for quantum device validation, phase diagram exploration, and scenarios where parametric characterization is essential (Tonner et al., 28 Jan 2026).
