Rapid mixing in positively weighted restricted Boltzmann machines
Published 1 Apr 2026 in cs.DS, cs.LG, and math.PR | (2604.00963v1)
Abstract: We show polylogarithmic mixing time bounds for the alternating-scan sampler for positively weighted restricted Boltzmann machines. This is done via analysing the same chain and the Glauber dynamics for ferromagnetic two-spin systems, where we obtain new mixing time bounds up to the critical thresholds.
The paper presents sharp polylogarithmic mixing time bounds for the alternating-scan sampler in positively weighted RBMs, advancing theoretical guarantees.
It employs a direct Markov chain analysis using typical-case aggregate strong spatial mixing and potential functions to control correlation decay.
The results connect theoretical sampling efficiency with practical MCMC-based training, lending rigorous support to the alternating-scan protocols used to train RBMs.
Polylogarithmic Mixing Time in Positively Weighted Restricted Boltzmann Machines
Introduction and Context
The paper "Rapid mixing in positively weighted restricted Boltzmann machines" (2604.00963) establishes sharp quantitative mixing time bounds for the alternating-scan sampler in restricted Boltzmann machines (RBMs) with strictly positive weights. RBMs serve as fundamental stochastic models in many deep learning pipelines, notably as building blocks for deep belief nets. Their MCMC-based training protocols, especially those using the alternating-scan sampler (alternately updating visible and hidden units), fundamentally rely on mixing properties to achieve accurate parameter estimation.
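As a concrete reference point, the alternating-scan sampler can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: units take values in {0, 1}, the RBM distribution is proportional to exp(v^T W h + a^T v + b^T h), and the names alternating_scan, W, a and b are hypothetical. One pass of the loop body is one full scan (all hidden units, then all visible units), which is the natural unit for the polylogarithmic bounds discussed below.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def alternating_scan(W, a, b, v, num_scans, rng=None):
    """Run num_scans full scans of the alternating-scan (block Gibbs) sampler.

    Hypothetical interface: W[i][j] couples visible unit i to hidden unit j,
    a[i] and b[j] are visible and hidden biases, and all units are in {0, 1}.
    """
    rng = rng or random.Random()
    n_vis, n_hid = len(a), len(b)
    v = list(v)
    h = [0] * n_hid
    for _ in range(num_scans):
        # Given v, the hidden units are conditionally independent: update them all at once.
        h = [int(rng.random() < sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(n_vis))))
             for j in range(n_hid)]
        # Given h, the visible units are conditionally independent: update them all at once.
        v = [int(rng.random() < sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(n_hid))))
             for i in range(n_vis)]
    return v, h


# Usage: three visible and two hidden units with positive weights.
W = [[1.0, 0.5], [0.5, 1.0], [2.0, 0.1]]
v, h = alternating_scan(W, [0.0] * 3, [0.0] * 2, [0, 0, 0], num_scans=10, rng=random.Random(0))
```

Because the bipartite structure makes each layer conditionally independent given the other, every update within a half-scan can be done in parallel, which is why the sampler is attractive in practice.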
Previous theoretical results on mixing times for RBMs either required both upper and lower bounds on interaction strengths or focused on special graph structures, and no prior rigorous result established efficient mixing for general positively weighted RBMs with unbounded interactions. This work removes these restrictions: for RBMs with strictly positive interactions, the alternating-scan sampler has mixing time polylogarithmic in the system size, up to uniqueness thresholds inherited from ferromagnetic two-spin systems.
Main Results
Polylogarithmic Mixing for Alternating-Scan Sampler
The central theorem states that for any RBM on n variables whose nonzero weights are all at least some constant c > 0 and whose biases are all nonnegative, the mixing time of the alternating-scan sampler, measured in full scans, satisfies
$$t_{\mathrm{mix}}(\epsilon) = O\big((\log n)^{C(c)} \log(1/\epsilon)\big),$$
where C(c) is a constant depending on the lower bound c on the weights. Importantly, no upper bound on the interaction weights is required, and the topology of the bipartite interaction graph is arbitrary.
Notably, this result is obtained without recourse to reductions to the Ising model or random cluster models; rather, a direct Markov chain analysis is developed in the context of the corresponding ferromagnetic two-spin system.
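To see why a positively weighted RBM is a ferromagnetic two-spin system, note that with {0, 1} units the factor exp(W_ij v_i h_j) equals e^{W_ij} when v_i = h_j = 1 and equals 1 otherwise. The sketch below makes this bookkeeping explicit. It assumes {0, 1} units and an illustrative convention (edge interaction matrix [[β, 1], [1, γ]] with the external field λ on spin 1) that may differ from the paper's parameterization; the function names are hypothetical.

```python
import math


def rbm_weight(W, a, b, v, h):
    """Unnormalized RBM weight exp(v^T W h + a^T v + b^T h) for {0, 1} units."""
    energy = sum(W[i][j] * v[i] * h[j] for i in range(len(a)) for j in range(len(b)))
    energy += sum(ai * vi for ai, vi in zip(a, v)) + sum(bj * hj for bj, hj in zip(b, h))
    return math.exp(energy)


def rbm_to_two_spin(W, a, b):
    """Rewrite RBM parameters as a two-spin system on the bipartite graph.

    Assumed convention: each edge has interaction matrix [[beta, 1], [1, gamma]]
    (rows and columns indexed by spins 0, 1) and each vertex a field lambda on spin 1.
    """
    edges = {(i, j): (1.0, math.exp(W[i][j]))  # (beta, gamma) = (1, e^{W_ij})
             for i in range(len(a)) for j in range(len(b)) if W[i][j] != 0}
    return edges, [math.exp(ai) for ai in a], [math.exp(bj) for bj in b]


def two_spin_weight(edges, lam_vis, lam_hid, v, h):
    """Unnormalized two-spin weight of the joint configuration (v, h)."""
    w = 1.0
    for (i, j), (beta, gamma) in edges.items():
        if v[i] == h[j] == 0:
            w *= beta
        elif v[i] == h[j] == 1:
            w *= gamma
    for lam, s in zip(lam_vis + lam_hid, list(v) + list(h)):
        w *= lam ** s
    return w


# Usage: a 2x2 RBM with nonnegative weights and biases.
edges, lam_vis, lam_hid = rbm_to_two_spin([[0.7, 1.3], [0.0, 2.0]], [0.2, 0.0], [0.5, 1.0])
```

Under this convention a positive weight W_ij gives β = 1 and γ = e^{W_ij} > 1, so βγ > 1 on every edge (the ferromagnetic condition), and a nonnegative bias gives a field λ ≥ 1. The resulting edge parameters generally differ from edge to edge.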
Generalization to Ferromagnetic Two-Spin Systems
The analysis extends to general ferromagnetic two-spin systems on bipartite graphs, parameterized by (β, γ, λ), where polylogarithmic mixing persists for λ < γ/β (a regime corresponding to weak external fields). Specifically, Glauber dynamics has mixing time O(n·polylog(n)) and the alternating-scan sampler has mixing time O(polylog(n)), both near-optimal in the system size when the parameters are constant.
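For the general two-spin setting, the following is a minimal sketch of single-site (heat-bath) Glauber dynamics. It assumes an illustrative convention in which each edge carries the matrix [[β, 1], [1, γ]] and each vertex a field λ on spin 1; this convention and the names prob_one and glauber are assumptions, not the paper's notation. Each step resamples one uniformly random vertex from its conditional distribution given its neighbors, so O(n·polylog(n)) Glauber steps amount to polylogarithmically many sweeps' worth of single-site updates.

```python
import math
import random


def prob_one(beta, gamma, lam, spins, nbrs):
    """P(spin = 1 | neighbours) for edge matrix [[beta, 1], [1, gamma]] and field lam on spin 1.

    Computed from log-odds so that large degrees do not overflow.
    """
    k1 = sum(spins[u] for u in nbrs)          # neighbours currently in state 1
    k0 = len(nbrs) - k1                       # neighbours currently in state 0
    # d = log(w0 / w1) with w1 = lam * gamma**k1 and w0 = beta**k0
    d = k0 * math.log(beta) - math.log(lam) - k1 * math.log(gamma)
    if d > 0:
        z = math.exp(-d)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(d))


def glauber(adj, beta, gamma, lam, spins, steps, rng=None):
    """Single-site heat-bath Glauber dynamics on a graph given as adjacency lists."""
    rng = rng or random.Random()
    spins = list(spins)
    for _ in range(steps):
        v = rng.randrange(len(adj))           # pick a uniformly random vertex
        spins[v] = int(rng.random() < prob_one(beta, gamma, lam, spins, adj[v]))
    return spins


# Usage: a 4-cycle (bipartite), ferromagnetic since beta * gamma > 1.
cycle = [[1, 3], [0, 2], [1, 3], [0, 2]]
sample = glauber(cycle, beta=1.5, gamma=2.0, lam=1.0, spins=[0, 0, 0, 0], steps=1000, rng=random.Random(0))
```

Working with log-odds keeps the conditional probabilities finite even at the very large degrees that the paper's setting allows.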
Beyond the uniqueness threshold, the paper identifies a further critical threshold, above which sampling is known or conjectured to be computationally intractable, and establishes new mixing time bounds for Glauber dynamics all the way up to that threshold: one bound for the chain started from the all-ones state and another for arbitrary initial conditions. This bridges the gap between the known algorithmic and hardness thresholds for sampling ferromagnetic two-spin systems.
Technical Approach
Direct Markov Chain Analysis and Typical-Case SSM
A key technical innovation is the analysis directly in terms of Markov chain dynamics on the RBM or two-spin configuration space—eschewing reductionist approaches that map parameter regimes to classical Ising or random cluster models. The paper develops a local-to-global argument for mixing time bounds based on "typical-case" aggregate strong spatial mixing (ASSM), in which one only requires local decay of correlations under boundary conditions encountered with high probability by the chain, rather than worst-case boundary settings.
This typical-case ASSM is supported by a careful probabilistic analysis of the induced self-avoiding walk (SAW) tree neighborhoods (a generalization of Weitz's construction) and of correlation decay when the underlying bipartite interaction graph has large or unbounded degrees.
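The marginal ratios that drive this kind of analysis come from the two-spin tree recursion, which is what Weitz's construction reduces a general graph to. A minimal sketch, assuming the illustrative convention of an edge matrix [[β, 1], [1, γ]] and a field λ on spin 1, with a hypothetical tree_ratio helper: it computes exact root marginals on a tree, but it does not build the SAW tree itself, nor the boundary conditions the paper analyses.

```python
def tree_ratio(children, root, beta, gamma, lam):
    """Exact ratio R = P(root = 1) / P(root = 0) for a two-spin system on a rooted tree.

    children maps each vertex to its list of children. With edge matrix
    [[beta, 1], [1, gamma]] and field lam on spin 1, the recursion is
        R_v = lam * prod over children w of (gamma * R_w + 1) / (R_w + beta),
    and a leaf has R = lam.
    """
    r = lam
    for w in children.get(root, []):
        r_w = tree_ratio(children, w, beta, gamma, lam)
        r *= (gamma * r_w + 1.0) / (r_w + beta)
    return r


# Usage: the path 0 - 1 - 2 rooted at vertex 0.
ratio = tree_ratio({0: [1], 1: [2]}, 0, beta=2.0, gamma=3.0, lam=1.0)
marginal = ratio / (1.0 + ratio)   # P(root = 1)
```

Each factor (γR_w + 1)/(R_w + β) is the child's contribution to the root's log-odds, so correlation decay amounts to showing that perturbing a deep R_w changes the root ratio very little; potential-function arguments measure this contraction in a transformed coordinate.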
Influence Bounds, Potential Functions, and Burn-in Construction
The authors introduce a global all-to-one influence bound on the system for the relevant uniqueness regime, extend this to local blocks, and use carefully constructed potential functions (in the sense of recursive marginal ratios) to control the propagation of pinning effects through general interaction graphs. The local mixing arguments are then "bootstrapped" probabilistically across the configuration space, with explicit control over warm-start times (burn-in), to achieve the global polylogarithmic mixing guarantee.
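To make "influence" concrete: one common definition of the influence of u on v is |P(σ_v = 1 | σ_u = 1) − P(σ_v = 1 | σ_u = 0)|, and the all-to-one influence on v sums this over all u ≠ v. The brute-force sketch below computes it exactly on tiny graphs, assuming an edge matrix [[β, 1], [1, γ]] and a field λ on spin 1. The helper names are hypothetical, and enumerating all 2^n states is only a checking tool, not the block-level bounds the paper proves.

```python
import itertools


def weight(adj, beta, gamma, lam, s):
    """Unnormalized two-spin weight: beta per 0-0 edge, gamma per 1-1 edge, lam per spin 1."""
    w = 1.0
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            if u < v:                        # count each undirected edge once
                if s[u] == s[v] == 0:
                    w *= beta
                elif s[u] == s[v] == 1:
                    w *= gamma
    return w * lam ** sum(s)


def marginal_one(adj, beta, gamma, lam, v, pin):
    """P(sigma_v = 1 | pinned spins) by exhaustive enumeration (tiny graphs only)."""
    num = den = 0.0
    for s in itertools.product((0, 1), repeat=len(adj)):
        if any(s[u] != x for u, x in pin.items()):
            continue                         # inconsistent with the pinning
        w = weight(adj, beta, gamma, lam, s)
        den += w
        if s[v] == 1:
            num += w
    return num / den


def all_to_one_influence(adj, beta, gamma, lam, v):
    """Sum over u != v of |P(sigma_v=1 | sigma_u=1) - P(sigma_v=1 | sigma_u=0)|."""
    return sum(abs(marginal_one(adj, beta, gamma, lam, v, {u: 1})
                   - marginal_one(adj, beta, gamma, lam, v, {u: 0}))
               for u in range(len(adj)) if u != v)
```

For a single edge with β = γ = 2 and λ = 1, pinning the neighbor to 1 gives P(σ_v = 1) = 2/3 and pinning it to 0 gives 1/3, so the influence is 1/3. A bound on this column sum that holds uniformly is what lets local perturbations stay controlled globally.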
Notably, monotonicity properties of the ferromagnetic model are leveraged to provide monotone grand couplings, yielding aggregate coupling contraction at the right scale after suitable burn-in, making the analysis robust to arbitrary initializations.
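The monotone grand coupling mentioned above is easy to illustrate with the standard construction: run two copies of Glauber dynamics from the all-ones and all-zeros states, and drive both with the same random vertex and the same uniform variable at each step. Because the conditional probability of spin 1 is nondecreasing in the neighbors' spins when βγ > 1, the top chain stays above the bottom one, and a chain started from any other state, driven by the same randomness, stays sandwiched between them. This is a minimal sketch under an assumed convention (edge matrix [[β, 1], [1, γ]], field λ on spin 1) with hypothetical names. It is the textbook monotone coupling, not the paper's aggregate contraction argument.

```python
import math
import random


def prob_one(beta, gamma, lam, spins, nbrs):
    """P(spin = 1 | neighbours); nondecreasing in the neighbours' spins when beta * gamma > 1."""
    k1 = sum(spins[u] for u in nbrs)
    # d = log(w0 / w1) = deg*log(beta) - log(lam) - k1*log(beta*gamma), decreasing in k1
    d = len(nbrs) * math.log(beta) - math.log(lam) - k1 * math.log(beta * gamma)
    if d > 0:
        z = math.exp(-d)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(d))


def coalescence_time(adj, beta, gamma, lam, max_steps, rng):
    """Run the monotone grand coupling from all-ones and all-zeros; return the meeting step."""
    top = [1] * len(adj)        # maximal configuration
    bottom = [0] * len(adj)     # minimal configuration
    for t in range(1, max_steps + 1):
        v = rng.randrange(len(adj))   # shared vertex choice
        u = rng.random()              # shared uniform variable
        top[v] = int(u < prob_one(beta, gamma, lam, top, adj[v]))
        bottom[v] = int(u < prob_one(beta, gamma, lam, bottom, adj[v]))
        # Monotonicity: top >= bottom coordinate-wise is preserved by every update.
        assert all(x >= y for x, y in zip(top, bottom))
        if top == bottom:
            return t
    return None  # did not coalesce within max_steps


# Usage: a ferromagnetic 4-cycle.
cycle = [[1, 3], [0, 2], [1, 3], [0, 2]]
t = coalescence_time(cycle, beta=1.2, gamma=1.2, lam=1.0, max_steps=100_000, rng=random.Random(0))
```

Once the extreme chains meet, every coupled chain has met, so the coalescence time bounds the mixing time from an arbitrary start; this is why monotonicity makes the analysis robust to initialization.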
Numerical and Complexity-Related Implications
The results show that, in RBMs and ferromagnetic two-spin systems with strictly positive weights and weak external fields, sampling is efficient: the alternating-scan sampler needs only polylogarithmically many scans, and Glauber dynamics needs O(n·polylog(n)) single-site updates. This lends rigorous support to empirical protocols used for large-scale RBM training, such as short-run contrastive divergence updates, at least in the positive-weight regime.
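On the training side, the short-run contrastive divergence protocol mentioned above can be sketched as follows: starting from a data vector, run k alternating scans and update the parameters by the difference between data-driven and sample-driven statistics (Hinton's CD-k). This is a minimal pure-Python sketch under assumed conventions ({0, 1} units, hypothetical names cd_k_update, W, a and b, and hidden probabilities rather than hidden samples in the statistics); it is not taken from the paper. The mixing guarantees discussed here apply only while all nonzero weights stay at least c and all biases stay nonnegative, which a plain gradient update does not enforce by itself.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cd_k_update(W, a, b, v0, k, lr, rng):
    """One CD-k update of a binary RBM, modifying W, a, b in place; returns the negative sample."""
    n_vis, n_hid = len(a), len(b)

    def hid_probs(v):
        return [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(n_vis))) for j in range(n_hid)]

    def vis_probs(h):
        return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(n_hid))) for i in range(n_vis)]

    ph0 = hid_probs(v0)            # positive phase: driven by the data vector
    v, ph = list(v0), ph0
    for _ in range(k):             # k alternating scans started at the data
        h = [int(rng.random() < p) for p in ph]
        v = [int(rng.random() < p) for p in vis_probs(h)]
        ph = hid_probs(v)
    # Gradient estimate: data statistics minus short-run sample statistics.
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * ph0[j] - v[i] * ph[j])
        a[i] += lr * (v0[i] - v[i])
    for j in range(n_hid):
        b[j] += lr * (ph0[j] - ph[j])
    return v


# Usage: one CD-1 step on a 2x2 RBM.
W, a, b = [[0.5, 0.5], [0.5, 0.5]], [0.0, 0.0], [0.0, 0.0]
v_neg = cd_k_update(W, a, b, [1, 0], k=1, lr=0.05, rng=random.Random(0))
```

The quality of the negative-phase statistics depends on how close k scans bring the chain to stationarity, which is exactly the quantity a mixing-time bound controls.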
Additionally, the results clarify the qualitative distinction between the relaxation and bottleneck phenomena in positively and negatively weighted RBMs: whereas the anti-ferromagnetic (negative weight) case is computationally hard above the uniqueness threshold, the positive-weight case remains tractable without restrictive norm constraints.
Broader Implications and Future Directions
The methodology provides a new direct framework for studying high-dimensional mixing in monotone graphical models where classical worst-case correlation decay does not hold, notably in heterogeneous or unbounded-degree graphs. It yields algorithmic advances for both parameter learning and approximate counting in ferromagnetic systems beyond previous limitations.
Open directions identified in the paper include:
Removing the lower bound c > 0 on positive weights (currently an artifact of the proof).
Extending the near-optimal mixing bound for Glauber dynamics from its current regime all the way up to the computational threshold.
Adapting the technique for other monotone models (e.g., log-concave distributions, multi-spin systems) where local mixing cannot be enforced under worst-case boundaries.
Sharpening ASSM from typical-case to more refined forms such as entropic or spectral independence, potentially exploiting average-case expansion properties.
Conclusion
This work rigorously establishes that for positively weighted RBMs (with weights bounded below by a constant and nonnegative biases) and for ferromagnetic two-spin systems within the uniqueness regime, the practically relevant alternating-scan sampler and Glauber dynamics converge rapidly to stationarity: the alternating-scan sampler mixes in polylogarithmically many scans, and Glauber dynamics in O(n·polylog(n)) single-site updates. This closes a longstanding theoretical gap in the understanding of the MCMC training dynamics of RBMs and provides a new suite of analytical tools for high-dimensional monotone MCMC analysis. The direct Markov chain approach and typical-case spatial mixing represent significant methodological contributions, likely extensible well beyond the specific context of RBMs.