
HyperRBM: Conditional Neural Quantum States

Updated 31 January 2026
  • HyperRBM is a neural quantum state model that uses a hypernetwork to condition RBM biases on external control parameters.
  • It enables efficient interpolation of quantum states and computes diagnostics like fidelity susceptibility to pinpoint phase transitions.
  • The approach reduces both experimental and computational resources by sharing RBM weights across parameter values for scalable tomography.

HyperRBM refers to a class of neural quantum state models in which a hypernetwork conditions a Restricted Boltzmann Machine (RBM) on external control parameters, enabling the representation of an entire manifold of quantum states—most notably, families of ground states indexed by system Hamiltonian parameters. The central innovation is the use of hypernetwork-modulated bias terms within the RBM architecture while retaining a shared parameterization of the weight matrix. HyperRBMs directly address the prohibitive cost of traditional point-wise quantum state tomography, where a separate RBM must be trained at each value of a control parameter, by producing a single, continuous parametric model of the quantum state landscape. This conditional formulation yields efficient, scalable, and differentiable models that can interpolate physical observables and compute sensitivity diagnostics such as fidelity susceptibility across extended phase diagrams (Tonner et al., 28 Jan 2026).

1. Wavefunction Ansatz and Probabilistic Formulation

In the HyperRBM approach, a family of real, nonnegative ground-state wavefunctions $\Psi(s \mid g)$, with $s \in \{0,1\}^N$ a spin configuration and $g$ a Hamiltonian parameter (e.g., transverse field strength), is represented as follows. A conditional RBM incorporates $M$ hidden units $h \in \{0,1\}^M$ and defines the joint energy

$$E_\theta(s, h \mid g) = -\sum_{i=1}^N b_i(g)\, s_i - \sum_{j=1}^M c_j(g)\, h_j - \sum_{i,j} s_i W_{ij} h_j,$$

where $W \in \mathbb{R}^{N \times M}$ is a weight matrix shared across $g$ and $b(g), c(g)$ are hypernetwork-modulated bias vectors. Marginalizing over the hidden units yields the free energy

$$F_\theta(s \mid g) = -\sum_{i} b_i(g)\, s_i - \sum_{j} \mathrm{softplus}\big((W^\top s)_j + c_j(g)\big),$$

with $\mathrm{softplus}(x) = \log(1 + e^x)$. The conditional probability is

$$p_\theta(s \mid g) = \frac{e^{-F_\theta(s \mid g)}}{Z_\theta(g)},$$

and the wavefunction ansatz is defined as $\Psi_\theta(s \mid g) = \sqrt{p_\theta(s \mid g)}$, appropriate for stoquastic states where the wavefunction can be chosen real and non-negative (Tonner et al., 28 Jan 2026).
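To make the ansatz concrete, the following sketch evaluates the free energy and the resulting conditional distribution by exact enumeration for a toy model. All sizes and parameter values are illustrative assumptions; `np.logaddexp(0, x)` serves as a numerically stable softplus, and the bias vectors stand in for the hypernetwork outputs $b(g), c(g)$ at one fixed $g$.

```python
import numpy as np

def free_energy(s, W, b, c):
    """F_theta(s|g): biases b, c are assumed already modulated by the
    hypernetwork at the current g; W is the shared weight matrix."""
    # logaddexp(0, x) = log(1 + e^x) is a stable softplus
    return -s @ b - np.logaddexp(0.0, W.T @ s + c).sum()

# Toy sizes (hypothetical): N = 3 visible spins, M = 2 hidden units.
rng = np.random.default_rng(0)
N, M = 3, 2
W = rng.normal(size=(N, M))
b = rng.normal(size=N)   # stands in for b(g)
c = rng.normal(size=M)   # stands in for c(g)

# Exact conditional distribution p_theta(s|g): enumerate all 2^N states.
states = np.array([[(k >> i) & 1 for i in range(N)]
                   for k in range(2 ** N)], float)
F = np.array([free_energy(s, W, b, c) for s in states])
p = np.exp(-F) / np.exp(-F).sum()   # normalized by Z_theta(g)
psi = np.sqrt(p)                    # wavefunction ansatz Psi = sqrt(p)
print(p.sum(), (psi ** 2).sum())    # both should be 1
```

For the small systems studied here ($N \leq 16$), this exact enumeration is exactly how global quantities like the partition function can be checked; for larger $N$, Monte Carlo sampling replaces the sum.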

2. Hypernetwork Architecture and FiLM Modulation

Flexibility across control parameters is achieved via a hypernetwork (a small multilayer perceptron) that implements Feature-wise Linear Modulation (FiLM) of the RBM biases. The hypernetwork accepts the parameter $g \in \mathbb{R}$ (or a small vector $\lambda \in \mathbb{R}^d$ in the generalized case) and outputs four vectors: $\gamma^b(g), \beta^b(g) \in \mathbb{R}^N$ for visible-bias modulation and $\gamma^c(g), \beta^c(g) \in \mathbb{R}^M$ for hidden-bias modulation. These are applied as

$$b(g) = \big(1 + \gamma^b(g)\big) \odot b_{\mathrm{base}} + \beta^b(g),$$

$$c(g) = \big(1 + \gamma^c(g)\big) \odot c_{\mathrm{base}} + \beta^c(g),$$

where $\odot$ denotes elementwise multiplication. The weight matrix $W$ remains fixed. The network typically consists of an input layer for $g$, one hidden layer (width $H$), and four output heads for the FiLM coefficients.
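A minimal FiLM hypernetwork of this shape can be sketched as follows. The layer sizes, the tanh nonlinearity, and all parameter names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def film_hypernetwork(g, params):
    """Tiny MLP mapping a scalar g to the four FiLM coefficient vectors.
    One hidden layer of width H, four linear output heads (a sketch)."""
    h = np.tanh(params["W1"] * g + params["b1"])   # hidden layer, width H
    gamma_b = params["Wgb"] @ h + params["bgb"]    # visible-bias scale
    beta_b  = params["Wbb"] @ h + params["bbb"]    # visible-bias shift
    gamma_c = params["Wgc"] @ h + params["bgc"]    # hidden-bias scale
    beta_c  = params["Wbc"] @ h + params["bbc"]    # hidden-bias shift
    return gamma_b, beta_b, gamma_c, beta_c

def modulate(b_base, c_base, coeffs):
    """Apply FiLM: b(g) = (1 + gamma_b) * b_base + beta_b, and likewise c."""
    gamma_b, beta_b, gamma_c, beta_c = coeffs
    return (1.0 + gamma_b) * b_base + beta_b, (1.0 + gamma_c) * c_base + beta_c

# Hypothetical sizes: N = 4 visible, M = 3 hidden, hypernetwork width H = 8.
rng = np.random.default_rng(1)
N, M, H = 4, 3, 8
params = {"W1": rng.normal(size=H),        "b1": rng.normal(size=H),
          "Wgb": rng.normal(size=(N, H)),  "bgb": np.zeros(N),
          "Wbb": rng.normal(size=(N, H)),  "bbb": np.zeros(N),
          "Wgc": rng.normal(size=(M, H)),  "bgc": np.zeros(M),
          "Wbc": rng.normal(size=(M, H)),  "bbc": np.zeros(M)}
b_g, c_g = modulate(rng.normal(size=N), rng.normal(size=M),
                    film_hypernetwork(0.7, params))
print(b_g.shape, c_g.shape)
```

Because the map $g \mapsto (b(g), c(g))$ is a smooth composition of affine maps and tanh, the resulting conditional model is differentiable in $g$, which is what the derivative-based diagnostics below rely on.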

This design ensures a differentiable, smoothly varying latent representation across the parameter manifold, facilitating interpolation and derivative-based analyses such as fidelity susceptibility (Tonner et al., 28 Jan 2026).

3. Training Objective and Optimization Methodology

The training objective is the negative log-likelihood over observed pairs $(s^{(i)}, g^{(i)})$, corresponding to minimizing the Kullback–Leibler divergence $D_{\mathrm{KL}}(q_{\mathrm{data}} \,\|\, p_\theta)$, where $q_{\mathrm{data}}(s \mid g)$ is the empirical distribution of measurement outcomes. For stoquastic models,

$$L(\theta) = -\sum_{i=1}^{N_s} \log p_\theta\big(s^{(i)} \mid g^{(i)}\big).$$

The gradient decomposes as

$$\nabla_\theta L = \mathbb{E}_{(s,g) \sim \mathrm{data}}\big[\nabla_\theta F_\theta(s \mid g)\big] - \mathbb{E}_{(s,g) \sim \mathrm{model}}\big[\nabla_\theta F_\theta(s \mid g)\big],$$

with the positive phase computed over empirical samples, and the negative phase estimated by running short block-Gibbs chains (Contrastive Divergence, typically CD-$k$ with $k = 10$ or $k = 20$) conditioned on the current $g$. Training uses the Adam optimizer with an inverse-sigmoid learning-rate schedule and is mini-batched over $(s, g)$ pairs.
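A CD-$k$ estimate of the weight gradient for a single $(s, g)$ pair can be sketched as below. This is an illustration under stated assumptions (biases already conditioned on $g$, gradient of the log-likelihood with respect to $W$ only), not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient_W(s_data, W, b, c, k=10, rng=None):
    """CD-k estimate of the log-likelihood gradient w.r.t. W for one
    (s, g) pair; b, c are assumed already modulated at the current g."""
    rng = rng if rng is not None else np.random.default_rng()
    # Positive phase: mean hidden activations given the data sample.
    h_data = sigmoid(W.T @ s_data + c)
    # Negative phase: k steps of block Gibbs, chain started at the data.
    s = s_data.copy()
    for _ in range(k):
        h = (rng.random(c.size) < sigmoid(W.T @ s + c)).astype(float)
        s = (rng.random(b.size) < sigmoid(W @ h + b)).astype(float)
    h_model = sigmoid(W.T @ s + c)
    # <s h^T>_data - <s h^T>_model: ascent direction on the log-likelihood.
    return np.outer(s_data, h_data) - np.outer(s, h_model)

# Hypothetical toy sizes and data.
rng = np.random.default_rng(2)
N, M = 4, 3
W = rng.normal(size=(N, M))
b, c = rng.normal(size=N), rng.normal(size=M)
s0 = (rng.random(N) < 0.5).astype(float)
grad_W = cd_k_gradient_W(s0, W, b, c, k=10, rng=rng)
print(grad_W.shape)
```

In the full model the same chain rule carries the gradient back through the FiLM coefficients into the hypernetwork parameters; only the $W$-gradient is shown here for brevity.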

If the underlying physics induces a symmetry (e.g., the $\mathbb{Z}_2$ symmetry for $g < J$ in the transverse-field Ising model), a symmetrized free energy

$$F_{\mathrm{sym}}(s \mid g) = -\log\!\big[e^{-F_\theta(s \mid g)} + e^{-F_\theta(1 - s \mid g)}\big]$$

can be introduced, paired with an augmented Gibbs sampler involving a latent bit $u \in \{0,1\}$ to enforce symmetry regularization (Tonner et al., 28 Jan 2026).
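Computed naively, the two exponentials in $F_{\mathrm{sym}}$ can overflow; a stable sketch uses `np.logaddexp` (toy sizes and values are assumptions):

```python
import numpy as np

def free_energy(s, W, b, c):
    # F_theta(s|g) with biases already conditioned on g
    return -s @ b - np.logaddexp(0.0, W.T @ s + c).sum()

def sym_free_energy(s, W, b, c):
    """F_sym(s|g) = -log[exp(-F(s|g)) + exp(-F(1-s|g))], evaluated
    via logaddexp to avoid overflow in the exponentials."""
    return -np.logaddexp(-free_energy(s, W, b, c),
                         -free_energy(1.0 - s, W, b, c))

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 2))
b, c = rng.normal(size=3), rng.normal(size=2)
s = np.array([1.0, 0.0, 1.0])
# F_sym is invariant under the global spin flip s -> 1 - s by construction.
print(np.isclose(sym_free_energy(s, W, b, c),
                 sym_free_energy(1.0 - s, W, b, c)))
```

The symmetrization makes the model assign equal probability to a configuration and its global flip, which is exactly the $\mathbb{Z}_2$ structure of the ferromagnetic phase.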

4. Computational Considerations and Scaling

HyperRBM models trained on lattices of up to $N = 16$ spins, with approximately 10 support parameter points and 20,000 measurement samples per support point, complete within 30 minutes on a single CPU, with peak memory usage below $16\,\mathrm{GB}$. Each mini-batch contains $(s, g)$ pairs sampled uniformly across the parameter support, and the negative-phase chains are partially reinitialized at random each epoch to improve mixing.

Relative to point-wise RBM tomography, HyperRBMs require only a fixed number of samples and training runs for the entire parameter manifold, yielding order-of-magnitude reductions in both experimental and computational resource requirements as the parameter dimensionality or support grows (Tonner et al., 28 Jan 2026).

5. Empirical Evaluation on Quantum Systems

In applications to the transverse-field Ising model on chains and periodic 2D lattices, HyperRBMs are benchmarked against exact Lanczos diagonalization. Several diagnostics are evaluated:

  • Observable Estimation: Longitudinal magnetization $\langle \sigma^z \rangle$ is estimated by Monte Carlo sampling under $p_\theta(s \mid g)$. Transverse magnetization $\langle \sigma^x \rangle$ is computed via a local-estimator trick, leveraging the Monte Carlo samples and the RBM's analytic structure.
  • Global Fidelity: For $N \leq 16$, the partition function $Z_\theta(g)$ and the fidelity $F(g) = |\langle \Psi_{\mathrm{ED}}(g) \mid \Psi_\theta(g) \rangle|^2$ are evaluated by direct summation.
  • Fidelity Susceptibility: The model's differentiable dependence on $g$ enables estimation of the fidelity susceptibility,

$$\chi_F(g) = \frac{1}{4}\, \mathrm{Var}_{s \sim p_\theta(\cdot \mid g)}\big[\partial_g F_\theta(s \mid g)\big],$$

via Monte Carlo sampling and automatic differentiation, permitting direct detection of quantum phase transitions through peaks in $\chi_F(g)$.

  • Entanglement Entropy: The second Rényi entropy $S_2(A) = -\log \mathrm{Tr}(\rho_A^2)$ is estimated using the swap-operator replica trick, providing access to block entanglement properties.
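As an illustration of the fidelity-susceptibility diagnostic, the sketch below evaluates $\chi_F(g)$ exactly for a toy conditional RBM whose visible bias depends linearly on $g$ (an assumption chosen for simplicity), using a central finite difference in place of automatic differentiation:

```python
import numpy as np

def free_energy(s, W, b, c):
    return -s @ b - np.logaddexp(0.0, W.T @ s + c).sum()

def fidelity_susceptibility(g, W, b_of_g, c, eps=1e-5):
    """chi_F(g) = (1/4) Var_{s ~ p(.|g)}[d_g F(s|g)], evaluated exactly
    over all 2^N configurations; d_g F via central finite difference."""
    N = W.shape[0]
    states = np.array([[(k >> i) & 1 for i in range(N)]
                       for k in range(2 ** N)], float)
    F = np.array([free_energy(s, W, b_of_g(g), c) for s in states])
    w = np.exp(-(F - F.min()))     # stable unnormalized weights
    p = w / w.sum()
    dF = np.array([(free_energy(s, W, b_of_g(g + eps), c)
                    - free_energy(s, W, b_of_g(g - eps), c)) / (2 * eps)
                   for s in states])
    mean = p @ dF
    return 0.25 * (p @ (dF - mean) ** 2)

# Hypothetical toy model: b(g) = b0 + g * db, everything else fixed.
rng = np.random.default_rng(4)
N, M = 4, 3
W = rng.normal(size=(N, M))
c = rng.normal(size=M)
b0, db = rng.normal(size=N), rng.normal(size=N)
chi = fidelity_susceptibility(0.5, W, lambda g: b0 + g * db, c)
print(chi >= 0.0)   # a variance, hence non-negative
```

In the actual workflow the expectation is taken over Monte Carlo samples and $\partial_g F_\theta$ comes from backpropagation through the hypernetwork rather than a finite difference.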

Empirically, HyperRBMs achieve fidelities exceeding 99% across both phases and through the critical region. With as few as three parameter support points, local observables are interpolated across 21 test values with near-exact accuracy, and the fidelity susceptibility accurately reproduces the location of the quantum phase transition without any prior knowledge of the critical field (Tonner et al., 28 Jan 2026).

6. Comparison to Point-wise RBM Tomography and Advantages

Traditional RBM-based tomography necessitates independent model training at each parameter value, leading to $O(\#g)$ training runs and measurement counts. HyperRBMs replace this with a single conditional probability model $p_\theta(s \mid g)$ valid across the entire parameter sweep. The approach enables:

  • Efficient sharing of statistical strength between proximal parameter values.
  • Reduction in experimental measurement and computational resources.
  • Construction of a differentiable state manifold permitting direct computation of parametric diagnostics (e.g., fidelity susceptibility).
  • Comparable or superior fidelity relative to point-wise tomography, with reduced total data and consistent accuracy across the full phase diagram (Tonner et al., 28 Jan 2026).

7. Implications, Limitations, and Prospects

HyperRBMs unify neural-network quantum state tomography with hypernetwork-driven parametric modeling, yielding simultaneous state reconstruction across a manifold of Hamiltonian parameters. This framework retains the scalability and efficient Monte Carlo sampling properties of RBMs while obviating the need for repetitive, point-wise retraining. A plausible implication is the extension of such models to higher-parameter-dimensional systems, conditional quantum processes, or time-dependent tomography, although explicit data for these settings has not appeared.

Key limitations include the requirement that the target family of quantum states be representable within the RBM manifold, and that measurement data be sufficiently dense at the chosen support points to enable successful interpolation. The current evidence is primarily for stoquastic ground states on moderate lattice sizes; scalability beyond $N \sim 16$ has not yet been presented.

HyperRBMs constitute a notable development for quantum device validation, phase diagram exploration, and scenarios where parametric characterization is essential (Tonner et al., 28 Jan 2026).
