Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models
Published 13 Jan 2026 in cs.LG and cs.CL | (2601.08893v1)
Abstract: We introduce Spectral Generative Flow Models (SGFMs), a physics-inspired alternative to transformer-based LLMs. Instead of representing text or video as sequences of discrete tokens processed by attention, SGFMs treat generation as the evolution of a continuous field governed by constrained stochastic dynamics in a multiscale wavelet basis. This formulation replaces global attention with local operators, spectral projections, and Navier--Stokes-like transport, yielding a generative mechanism grounded in continuity, geometry, and physical structure. Our framework provides three key innovations: (i) a field-theoretic ontology in which text and video are unified as trajectories of a stochastic partial differential equation; (ii) a wavelet-domain representation that induces sparsity, scale separation, and computational efficiency; and (iii) a constrained stochastic flow that enforces stability, coherence, and uncertainty propagation. Together, these components define a generative architecture that departs fundamentally from autoregressive modeling and diffusion-based approaches. SGFMs offer a principled path toward long-range coherence, multimodal generality, and physically structured inductive bias in next-generation generative models.
The paper introduces Spectral Generative Flow Models (SGFMs), which use SPDEs and wavelet transforms to replace vectorized LLMs.
It employs a multiscale wavelet-domain representation to enable efficient and unified treatment of modalities like text and video.
The approach achieves computational efficiency and robust regularization through continuous field representations and physics-informed constraints.
Spectral Generative Flow Models: A Physics-Inspired Generative Architecture
Overview and Motivation
The paper "Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized LLMs" (2601.08893) introduces Spectral Generative Flow Models (SGFMs) as an alternative to transformer-based LLMs, particularly vectorized LLMs (vLLMs). The motivation arises from fundamental limitations of the token-centric, attention-driven autoregressive paradigm, which lacks continuity, collapses uncertainty at every step, and relies on weak inductive bias. By drawing on mathematical frameworks from statistical physics and turbulence theory, specifically stochastic partial differential equations (SPDEs) modeled in a wavelet basis, the SGFM approach supplants symbolic sequence modeling with field-theoretic, continuous representations governed by physics-inspired dynamics.
The authors contend that by reframing generation as stochastic evolution in function space, continuity, local geometric structure, and long-range coherence are induced architecturally, not merely learned from data. The result is a generative mechanism that is computationally more efficient, modality-agnostic, and structurally regularized.
Field-Theoretic Ontology and SPDE-Based Generation
SGFMs cast text and video as trajectories of a continuous field u evolving over a domain Ω across time t, with the generative process determined by an SPDE:
du = \left[ -P(u \cdot \nabla u) + \nu \Delta u + f_\theta(u) \right] dt + \sigma \, dW_t,
where P enforces constraints (e.g., incompressibility), ν is the viscosity, f_θ is a learned forcing term, and W_t is a driving Wiener process.
This approach eliminates the discrete symbolic manipulation inherent to LLMs, replacing it with a continuous set of degrees of freedom and constrained local evolution. Semantic and syntactic content are encoded as continuous flows, with uncertainty propagated dynamically rather than collapsed autoregressively.
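One discretized step of this SPDE can be sketched with an Euler–Maruyama update on a periodic 1D grid. This is a minimal illustration, not the paper's implementation: the projection P is taken as the identity, and `forcing` is a stand-in for the learned term f_θ.

```python
import numpy as np

def euler_maruyama_step(u, dt, nu, sigma, forcing, rng, dx=1.0):
    """One Euler--Maruyama step of du = [-P(u . grad u) + nu*Lap(u) + f(u)] dt + sigma dW.

    1D periodic grid; the projection P is taken as the identity here.
    """
    # central differences on a periodic grid
    grad_u = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)
    lap_u = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    advection = -u * grad_u                 # -P(u . grad u) with P = identity
    drift = advection + nu * lap_u + forcing(u)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(u.shape)  # sigma dW_t
    return u + drift * dt + noise

rng = np.random.default_rng(0)
u0 = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
u1 = euler_maruyama_step(u0, dt=1e-3, nu=0.1, sigma=0.01,
                         forcing=lambda u: np.zeros_like(u), rng=rng)
```

With a small time step, the field evolves smoothly under advection and dissipation while the Wiener increment injects the stochasticity that the paper uses to carry uncertainty forward.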
Wavelet-Domain Multiscale Representation
To induce sparsity, tractability, and explicit scale separation, SGFMs represent the generative field u in an orthonormal wavelet basis:
u(x,t) = \sum_{j,k} c_{j,k}(t) \, \psi_{j,k}(x)
Wavelet coefficients c_{j,k}, indexed by scale j and location k, allow the model to decompose global semantics (coarse scales) and local detail (fine scales). This mirrors multiscale energy cascades in turbulence, providing both computational efficiency (O(N log N) operations for transforms and projections) and a mechanism for conditional, stochastic generation constrained to appropriate scales.
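The scale separation can be made concrete with one level of the orthonormal Haar transform, the simplest wavelet basis (the paper does not fix a particular wavelet family, so Haar here is an illustrative choice):

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar transform: O(N) work per level.

    Returns (coarse, detail): coarse coefficients carry global structure,
    detail coefficients carry local variation, as in the c_{j,k} expansion.
    """
    s = np.asarray(signal, dtype=float)
    pairs = s.reshape(-1, 2)
    coarse = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return coarse, detail

def haar_idwt(coarse, detail):
    """Inverse of haar_dwt (perfect reconstruction for an orthonormal basis)."""
    out = np.empty(2 * coarse.size)
    out[0::2] = (coarse + detail) / np.sqrt(2)
    out[1::2] = (coarse - detail) / np.sqrt(2)
    return out

x = np.array([4.0, 4.0, 4.0, 4.0, 1.0, 5.0, 1.0, 5.0])
c, d = haar_dwt(x)
assert np.allclose(haar_idwt(c, d), x)  # orthonormality => exact reconstruction
```

Note that the constant first half of `x` produces zero detail coefficients, while the oscillating second half concentrates its energy at the fine scale: exactly the sparsity and scale separation the representation is meant to induce.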
The wavelet domain also facilitates enforcement of physical constraints, as operators such as gradients, Laplacians, and projections admit sparse representations, and boundary enforcement integrates seamlessly.
Unified Multimodal Architecture
A signature claim of SGFMs is dimensional universality: the same generative pipeline applies across domains—text (2D), video (3D), and beyond—with no need for tokenization or architectural changes. Only the dimensionality of Ω is altered, unlike transformer-based LLMs that require bespoke treatment for each modality. This unification is achieved by employing the same SPDE and wavelet machinery, with prompts and conditioning information translated into initial/boundary values.
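The claimed dimensional universality is easy to illustrate: the differential operators in the SPDE can be written once, agnostic to the dimensionality of Ω. A minimal sketch (the function name and periodic boundary choice are assumptions, not the paper's code):

```python
import numpy as np

def laplacian(u, dx=1.0):
    """Dimension-agnostic periodic Laplacian: the same code serves 1D signals,
    2D text-like fields, or 3D video volumes; only u.ndim changes."""
    out = np.zeros_like(u)
    for axis in range(u.ndim):
        out += (np.roll(u, -1, axis) - 2 * u + np.roll(u, 1, axis)) / dx**2
    return out

field_1d = np.random.default_rng(1).standard_normal(32)
field_3d = np.random.default_rng(2).standard_normal((8, 8, 8))
assert laplacian(field_1d).shape == field_1d.shape
assert laplacian(field_3d).shape == field_3d.shape
```

In this spirit, switching modality changes only the shape of the array handed to the pipeline, not the operators themselves.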
Generative Dynamics via Wavelet-Space Diffusion
The stochastic evolution of generation is achieved through diffusion in wavelet coefficient space:
dc = s_\theta(c, \tau) \, d\tau + \sqrt{2} \, dW_\tau,
where s_θ(c, τ) is a score function conditioned on multiscale coefficients and physical parameters, and W_τ is a Wiener process. Sampling proceeds via reverse-time SDE integration, optionally augmented with a physics-guided correction:
c \leftarrow c - \eta \, \nabla_c E(W^{-1}[c]),
with E penalizing violations of physical constraints.
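A single sampler iteration, with the reverse-time update followed by the correction step, can be sketched as follows. The stand-in `score` and the toy constraint energy are assumptions for illustration; the paper's learned s_θ and physical energy E would replace them.

```python
import numpy as np

def corrected_sample_step(c, score, tau, dtau, eta, energy_grad, rng):
    """One reverse-time update in wavelet-coefficient space followed by
    the physics-guided correction c <- c - eta * grad_c E(W^{-1}[c])."""
    # Euler--Maruyama reverse-time step with sqrt(2) noise scale
    c = c + score(c, tau) * dtau + np.sqrt(2 * dtau) * rng.standard_normal(c.shape)
    # gradient-descent correction toward the constraint manifold
    return c - eta * energy_grad(c)

# toy constraint energy E(c) = (sum c)^2 / (2N), penalizing a nonzero mean;
# its gradient w.r.t. each coefficient is mean(c)
energy_grad = lambda c: np.full_like(c, c.mean())
score = lambda c, tau: -c        # stand-in: score of a standard Gaussian
rng = np.random.default_rng(0)
c = rng.standard_normal(128)
for tau in np.linspace(1.0, 0.0, 50):
    c = corrected_sample_step(c, score, tau, dtau=0.02, eta=0.5,
                              energy_grad=energy_grad, rng=rng)
```

The correction plays the role of a soft projection: each iteration nudges the coefficients back toward configurations with low constraint energy without halting the stochastic flow.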
The training objective combines diffusion-based score matching, physics residual minimization (enforcing conformity with SPDE constraints), and boundary conditioning, resulting in an overall loss:
\mathcal{L} = \mathcal{L}_{\mathrm{diff}} + \lambda_R \left\lVert P(R(u)) \right\rVert^2 + \lambda_B \mathcal{L}_{\mathrm{BC}}
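The three-term objective can be sketched numerically. All concrete choices below are assumptions: the residual R(u) is approximated by a discrete Laplacian as a smoothness surrogate, P is the identity, and both the score target and boundary target are supplied directly.

```python
import numpy as np

def total_loss(score_pred, score_target, u, boundary_pred, boundary_target,
               lam_R=0.1, lam_B=1.0, dx=1.0):
    """Sketch of L = L_diff + lam_R * ||P(R(u))||^2 + lam_B * L_BC.

    L_diff: score-matching MSE; R(u): toy PDE residual (discrete Laplacian
    here, as a smoothness surrogate); L_BC: MSE on boundary values.
    """
    l_diff = np.mean((score_pred - score_target) ** 2)
    residual = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # R(u) surrogate
    l_phys = np.mean(residual ** 2)                              # ||P(R(u))||^2, P = identity
    l_bc = np.mean((boundary_pred - boundary_target) ** 2)
    return l_diff + lam_R * l_phys + lam_B * l_bc
```

The weights λ_R and λ_B trade off data fidelity against physical and boundary conformity; the loss vanishes only when all three terms are simultaneously satisfied.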
Rigorous Structural and Computational Analysis
SGFMs offer improved computational scaling over transformers: all major operations (wavelet transforms, differential operators, projections, advection) admit O(N) or O(N log N) complexity, as opposed to the O(N²) cost of attention. This efficiency is especially pronounced for long contexts or high-resolution domains, and is analytically verified in the paper's appendices.
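The asymptotic gap is easy to quantify with back-of-the-envelope operation counts (constant factors ignored; attention is taken as N² pairwise scores and the wavelet pipeline at the paper's stated N log₂ N bound):

```python
import math

def attention_ops(n: int) -> int:
    # N^2 pairwise interactions per attention layer (constants ignored)
    return n * n

def wavelet_ops(n: int) -> int:
    # N log2(N) for transforms and projections, the paper's stated bound
    return n * int(math.log2(n))

for n in (1_024, 65_536):
    ratio = attention_ops(n) // wavelet_ops(n)
    print(f"N={n}: attention/wavelet ratio = {ratio}")
# N=1024 gives a ratio of 102; N=65536 gives 4096
```

The ratio grows roughly as N / log N, which is why the advantage compounds precisely in the long-context regime where attention is most expensive.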
Stability of the generative process is ensured by the inclusion of dissipation (Laplacian term), robust constraints (incompressibility), energy bounds, and provable well-posedness of SPDEs under standard regularity assumptions. These features promote bounded, physically admissible evolution, suppressing hallucinations and instability.
Theoretical and Practical Implications
SGFMs embody a post-transformer paradigm where:
Attention is obsolete: Global coherence arises from integrating local dynamics, multiscale interactions, and constraint-mediated projections.
Unified treatment of modalities: All modalities—text, video, and physical data—are modeled as stochastic evolution in function space, with domain dimensionality as the only variable.
Strong inductive bias: Physical priors circumvent the need for massive data scale, reducing sample complexity and improving extrapolation/generalization.
Continuous semantics and uncertainty: Meaning is represented as a continuous field, allowing smooth interpolation, rigorous uncertainty propagation, and explicit regularity control.
Explicit constraint satisfaction: Logically valid and physically plausible configurations are enforced by design, not by post-hoc filtering.
Limitations and Future Directions
SGFMs introduce substantial mathematical complexity: SPDE-driven sampling, spectral transforms, and projection operations demand sophisticated numerical and software infrastructure. Overly rigid constraints may suppress creative or linguistic diversity not amenable to physical regularization. Scaling empirical validation to vLLM benchmarks, refining adaptive resolution schemes, and developing hardware acceleration remain open engineering challenges. Further work is needed to seamlessly integrate symbolic reasoning constraints without undermining field-theoretic coherence.
Integration with stochastic control and optimal transport theory suggests new approaches to reasoning, conditioning, and planning. Adaptive spectral resolution may yield models that dynamically allocate computational resources, focusing capacity where semantic or visual complexity is highest.
Conclusion
SGFMs represent a foundational shift in generative modeling: abandoning discrete, attention-centric transformer architectures in favor of continuous, physically structured dynamics in wavelet spectral space. By leveraging SPDEs and multiscale field-theoretic representations, they enable unified, efficient, and structurally regularized generative models that apply across modalities. This approach lays the groundwork for principled post-transformer architectures, with broad implications for scalability, generalization, and theoretical analysis in AI.