
Stochastic Information Geometry

Updated 25 January 2026
  • Stochastic Information Geometry is a field that employs Riemannian and dual-flat structures, such as Fisher–Rao and Wasserstein metrics, to quantify uncertainty and statistical distances in evolving stochastic models.
  • It leverages tools like entropy-regularized optimal transport, mirror descent, and natural gradient methods to analyze non-equilibrium, high-dimensional, and distributed systems.
  • This framework enables rigorous uncertainty quantification, robust inference, and efficient algorithmic design, with applications spanning thermodynamics, quantum systems, and spatial networked data.

Stochastic information geometry studies the interplay between information-theoretic metrics and the probabilistic dynamics induced by stochastic processes, particularly in non-equilibrium, high-dimensional, and distributed systems. Central to this field is the use of Riemannian and dual-flat geometric structures—such as the Fisher–Rao and Wasserstein metrics—to quantify statistical distances, uncertainty, and efficiency of estimators, as well as to guide algorithmic inference in settings ranging from time-dependent stochastic models and thermodynamics to probabilistic learning, geometry-aware aggregation, and bandit optimization.

1. Geometric Structures of Stochastic Models

Stochastic information geometry extends classical information geometry by equipping spaces of probability measures—parameterized by evolving random processes—with one or more Riemannian metrics, providing intrinsic notions of distance, curvature, and geodesics.

  • The Fisher–Rao metric defines a Riemannian structure on parametric statistical manifolds (e.g., exponential families), with infinitesimal length element for a parametric model $p(x;\theta)$

$$ds^2 = g_{ij}(\theta)\, d\theta^i d\theta^j = \mathbb{E}_{p(\cdot;\theta)}\big[\partial_i \ln p(X;\theta)\, \partial_j \ln p(X;\theta)\big]\, d\theta^i d\theta^j$$

yielding distinguishability measures and statistical line elements for time-evolving PDFs $p(x,t)$ (Ito, 2017, Tenkès et al., 2017).

  • For manifolds of Gaussian distributions $\mathcal{G} = \{\mathcal{N}(\mu, \Sigma)\}$, the Fisher–Rao metric specializes to:

$$ds^2 = d\mu^\top\,\Sigma^{-1}\,d\mu + \frac{1}{2}\mathrm{Tr}\big(\Sigma^{-1}\, d\Sigma\, \Sigma^{-1}\, d\Sigma\big)$$

The 2-Wasserstein metric induces a second Riemannian structure:

$$W_2^2\big(\mathcal{N}(\mu_1,\Sigma_1), \mathcal{N}(\mu_2,\Sigma_2)\big) = \|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\big[\Sigma_1 + \Sigma_2 - 2\big(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\big)^{1/2}\big]$$

(Ghatak, 24 Aug 2025).
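Both Gaussian structures above have closed forms that can be evaluated directly. The sketch below (function names `fisher_rao_ds2` and `w2_gaussian` are illustrative, and NumPy/SciPy are assumed) computes the Fisher–Rao line element and the closed-form $W_2$ distance between Gaussians:

```python
import numpy as np
from scipy.linalg import sqrtm

def fisher_rao_ds2(mu, Sigma, dmu, dSigma):
    """Squared Fisher–Rao line element on the Gaussian manifold:
    ds^2 = dmu^T Sigma^{-1} dmu + 0.5 * Tr(Sigma^{-1} dSigma Sigma^{-1} dSigma)."""
    Sinv = np.linalg.inv(Sigma)
    A = Sinv @ dSigma
    return float(dmu @ Sinv @ dmu + 0.5 * np.trace(A @ A))

def w2_gaussian(mu1, S1, mu2, S2):
    """2-Wasserstein distance between N(mu1, S1) and N(mu2, S2),
    using the closed-form expression for the covariance term."""
    r2 = sqrtm(S2)
    # (S2^{1/2} S1 S2^{1/2})^{1/2}; sqrtm may carry tiny imaginary noise
    cross = np.real(sqrtm(r2 @ S1 @ r2))
    val = np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(val, 0.0)))
```

For equal covariances the trace term vanishes and $W_2$ reduces to the Euclidean distance between the means, a convenient sanity check on any implementation.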

2. Statistical Distance and Information Length in Stochastic Dynamics

For a family of time-evolving densities $p(z,t)$ governed by stochastic processes:

  • The infinitesimal statistical distance is grounded in the second-order expansion of the Kullback–Leibler divergence,

$$d\ell^2 = \int dz\, \frac{[\partial_t p(z,t)]^2}{p(z,t)}\,(dt)^2$$

yielding a Riemannian metric on the one-dimensional statistical manifold $\{p(\cdot, t)\}$ with $g_{tt}(t) = \int dz\, (\partial_t p)^2 / p$ (Tenkès et al., 2017).

  • The information length $\mathcal{L}(T)$ integrates this rate over time,

$$\mathcal{L}(T) = \int_0^T dt\, \sqrt{g_{tt}(t)} = \int_0^T \frac{dt}{\tau(t)}$$

where $1/\tau(t) \equiv \sqrt{E(t)}$, with $E(t) = g_{tt}(t)$, is the instantaneous rate of information change. This length quantifies the number of statistically distinguishable states traversed by the system, providing a unifying, coordinate-independent measure for non-equilibrium processes (Tenkès et al., 2017, Ito et al., 2018, Guel-Cortez et al., 2023).

  • In multivariate Gaussian processes, the information rate is

$$\Gamma^2 = \dot\mu^\top\Sigma^{-1}\dot\mu + \frac{1}{2}\mathrm{Tr}\left[\left(\Sigma^{-1}\dot\Sigma\right)^2\right]$$

(Guel-Cortez et al., 2023).
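The information length is straightforward to compute numerically for a Gaussian path. A minimal sketch (the function name `information_length` is illustrative; NumPy is assumed) applies the scalar case of the rate above, $\Gamma^2 = \dot\mu^2/\sigma^2 + \tfrac{1}{2}\big((\sigma^2)'/\sigma^2\big)^2$, and integrates it by the trapezoidal rule:

```python
import numpy as np

def information_length(mu_t, var_t, t):
    """Information length L(T) = ∫_0^T Γ(t) dt for a 1-D Gaussian path
    (mu(t), sigma^2(t)), with Γ^2 = mu'^2/sigma^2 + 0.5*((sigma^2)'/sigma^2)^2."""
    mu_dot = np.gradient(mu_t, t)      # finite-difference time derivatives
    var_dot = np.gradient(var_t, t)
    gamma = np.sqrt(mu_dot**2 / var_t + 0.5 * (var_dot / var_t) ** 2)
    # Trapezoidal rule for ∫ gamma dt
    return float(np.sum(0.5 * (gamma[1:] + gamma[:-1]) * np.diff(t)))

# Example protocol: mean drifts at unit speed, variance fixed at sigma^2 = 4
t = np.linspace(0.0, 2.0, 2001)
L = information_length(t, 4.0 * np.ones_like(t), t)
# Exact value for this protocol: L(T) = T / sigma = 2 / 2 = 1
```

With the variance frozen, the rate reduces to $\Gamma = |\dot\mu|/\sigma$, so the numerical result can be checked against $T/\sigma$ exactly.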

3. Stochastic Information Geometry in Distributed and Networked Systems

Recent advances integrate stochastic geometry (spatial Poisson point processes, PPP) with information geometry for inference and aggregation in spatial networks (Ghatak, 24 Aug 2025):

  • Distributions (“marks”) are attached to each random node $x \in \Phi$ of the PPP, typically as Gaussians $\mathcal{N}(\mu_x, \Sigma_x)$.
  • Fréchet means over these random fields are defined by minimizing either the population expectation or the empirical average of squared information-geometric distances (Fisher–Rao or 2-Wasserstein):

$$\bar{p} = \arg\min_{q \in \mathcal{G}} F(q), \qquad F(q) = \mathbb{E}_\Phi\big[ d^2(q, p_X) \big]$$

Concentration theorems quantify deviation and uncertainty due to the sampling randomness (Ghatak, 24 Aug 2025).

  • Geometry-aware aggregation blends node reports into a global belief, downweighting unreliable sensors via precision-weighted averaging in the manifold of Gaussians, achieving robust fusion and explicit error bounds (Ghatak, 24 Aug 2025).
  • Semantic compression protocols subsample the PPP to achieve a target semantic-fidelity (Fréchet mean distortion), with explicit distortion and subsampling rate guarantees (Ghatak, 24 Aug 2025).
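Precision-weighted averaging of Gaussian node reports has a standard closed form, which the sketch below implements (the function name `precision_weighted_fuse` is illustrative and not from the cited paper; NumPy is assumed). High-variance sensors are automatically downweighted because their precisions are small:

```python
import numpy as np

def precision_weighted_fuse(means, covs):
    """Fuse node reports N(mu_i, Sigma_i) into one Gaussian by precision
    weighting:
        Lambda = sum_i Sigma_i^{-1}
        mu     = Lambda^{-1} sum_i Sigma_i^{-1} mu_i
    Unreliable (high-variance) sensors receive small weight automatically."""
    precisions = [np.linalg.inv(S) for S in covs]
    Lam = sum(precisions)
    Sigma_fused = np.linalg.inv(Lam)
    mu_fused = Sigma_fused @ sum(P @ m for P, m in zip(precisions, means))
    return mu_fused, Sigma_fused

# Two 1-D reports: a sharp sensor at 0 (var 1) and a noisy one at 10 (var 100)
mu, S = precision_weighted_fuse(
    [np.array([0.0]), np.array([10.0])],
    [np.array([[1.0]]), np.array([[100.0]])],
)
# Fused mean sits near the reliable sensor: 10/101 ≈ 0.099
```

The fused covariance is never larger than the sharpest input, reflecting the information gained by combining reports.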

4. Stochastic Information Geometry in Thermodynamics and Statistical Physics

The geometric approach links non-equilibrium thermodynamics to information geometry:

  • The statistical line element in the Fisher–Rao metric is related to entropy production and observable fluctuations in both classical Markov and quantum master equations (Ito, 2017, Bettmann et al., 2024).
  • Geometric thermodynamics studies the decomposition of entropy production into excess and housekeeping parts, as well as trade-off relations (“thermodynamic uncertainty relations”)—all expressed as Pythagorean projections onto convex submanifolds of trajectory distributions (Ito, 2022, Ito et al., 2018).
  • A dually flat structure enables the derivation of second-law inequalities and speed–cost tradeoffs for nonequilibrium transformations. For any protocol,

$$\mathcal{L}^2 \leq \Sigma, \qquad \tau \geq \frac{\mathcal{L}^2}{\Sigma}$$

where $\Sigma$ is the cumulative thermodynamic cost and $\mathcal{L}$ the statistical length (Ito, 2017).

5. Dually Flat Manifolds and Algorithmic Methods

Dually flat geometry, established by Bregman divergences and exponential families, underpins efficient stochastic learning and inference algorithms:

  • The mirror descent and natural gradient methods exploit the Riemannian structure induced by Bregman divergences, achieving asymptotic Cramér–Rao efficiency in estimation (Raskutti et al., 2013).
  • For exponential family models, the mirror descent update is equivalent to natural gradient descent on the dual mean-parameter manifold.
  • Monte Carlo Information Geometry provides practical stochastic approximations of these geometries for models with intractable partition functions, enabling efficient algorithmic computation of geodesics, projections, and clustering (Nielsen et al., 2018).
  • Applications include geometry-aware k-means on mixtures, natural evolution strategies, and stochastic natural gradient descent for both continuous and combinatorial parameter spaces (Malagò et al., 2014).
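The mirror descent/natural gradient equivalence can be demonstrated concretely for a Bernoulli model, a minimal sketch under illustrative choices (target `x_bar`, step size `eta`). For the natural (logit) parameter $\theta$ with log-partition $A(\theta) = \log(1+e^\theta)$, the mirror descent step updates the mean parameter by $\mu \leftarrow \mu - \eta\, \partial_\theta L$, which coincides exactly with natural gradient descent on $\mu$ under the Fisher metric $1/(\mu(1-\mu))$:

```python
# Target: MLE of a Bernoulli mean from the sample average x_bar
x_bar = 0.8
eta = 0.5

# Mirror descent with the log-partition A(theta) = log(1 + e^theta) as
# mirror map: mu <- mu - eta * dL/dtheta, where dL/dtheta = mu - x_bar
mu_md = 0.5
for _ in range(50):
    grad_theta = mu_md - x_bar
    mu_md = mu_md - eta * grad_theta

# Natural gradient descent on the mean parameter mu: the Fisher
# information in mu is 1/(mu(1-mu)), and dL/dmu = (mu - x_bar)/(mu(1-mu))
mu_ng = 0.5
for _ in range(50):
    grad_mu = (mu_ng - x_bar) / (mu_ng * (1.0 - mu_ng))
    mu_ng = mu_ng - eta * (mu_ng * (1.0 - mu_ng)) * grad_mu

# Both trajectories are identical and converge to the MLE x_bar
```

The cancellation in the natural gradient step (`mu*(1-mu)` against its reciprocal in the gradient) is exactly the chain-rule identity $\partial_\theta L = \nabla^2 A(\theta)\, \partial_\mu L$ underlying the general equivalence.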

6. Hierarchies of Projections, Interactions, and Stochastic Additivity

Stochastic information geometry formalizes the decomposition of statistical and thermodynamic quantities using projections in dually flat spaces:

  • Core information-theoretic measures (mutual information, transfer entropy, stochastic interaction) emerge as Kullback–Leibler projections onto e-flat submanifolds defined by various patterns of independence or “disconnectedness” (Oizumi et al., 2015).
  • In stochastic thermodynamics, entropy production of coupled systems decomposes into additive (marginal), interaction (non-additive), and global terms—all interpreted as distances to flat submanifolds of trajectory distributions (Ito et al., 2018).
  • The failure of additivity in non-bipartite networks is quantified by the “stochastic interaction,” itself a KL-divergence projection, yielding a geometric hierarchy that unifies statistical dependence and thermodynamic irreversibility.
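The simplest instance of such a projection is mutual information, which is the KL divergence from a joint distribution to the submanifold of independent (product) distributions. A minimal sketch for a discrete joint (the function name `mutual_information` is illustrative; NumPy is assumed):

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) = KL( p(x,y) || p(x) p(y) ) in nats: the divergence from the
    e-flat submanifold of independent joints (the 'disconnected' model)."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    prod = px @ py                        # product-of-marginals projection
    mask = pxy > 0                        # 0 * log 0 = 0 convention
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / prod[mask])))

# Perfectly correlated bits: I(X;Y) = log 2 nats
pxy = np.array([[0.5, 0.0],
                [0.0, 0.5]])
```

An independent joint (any outer product of marginals) gives zero, confirming that the product submanifold is exactly the zero set of this divergence.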

7. Significance and Applications

Stochastic information geometry provides:

  • Rigorous uncertainty quantification, error bounds, and geometric uncertainty relations in both classical and quantum dynamics (Ghatak, 24 Aug 2025, Bettmann et al., 2024, Melo et al., 18 Jan 2026).
  • Geometry-aware methods for distributed inference, semantic communication, and control in spatially random systems, with explicit robustness to heterogeneity (Ghatak, 24 Aug 2025).
  • A unified geometric theory synthesizing stochastic processes, statistical inference, optimization, and non-equilibrium thermodynamics through the properties of information-geometric manifolds.

This framework underpins principled algorithmic design for geometry-aware learning, optimal control, and robust inference in high-dimensional stochastic environments, particularly where statistical structure and spatial or temporal randomness play intertwined roles (Ghatak, 24 Aug 2025, Malagò et al., 2014, Raskutti et al., 2013).
