Learning interacting particle systems from unlabeled data

Published 2 Apr 2026 in stat.ML, cs.LG, and math.NA | (2604.02581v1)

Abstract: Learning the potentials of interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled data collected at discrete time points lack trajectory information due to limitations in data collection methods or privacy constraints. We address this challenge by introducing a trajectory-free self-test loss function that leverages the weak-form stochastic evolution equation of the empirical distribution. The loss function is quadratic in potentials, supporting parametric and nonparametric regression algorithms for robust estimation that scale to large, high-dimensional systems with big data. Systematic numerical tests show that our method outperforms baseline methods that regress on trajectories recovered via label matching, tolerating large observation time steps. We establish the convergence of parametric estimators as the sample size increases, providing a theoretical foundation for the proposed approach.



Summary

The paper introduces a trajectory-free quadratic loss that efficiently learns interaction and external potentials from unlabeled snapshot data.
It leverages both parametric and neural network methods to bypass trajectory recovery and directly optimize the self-test loss, ensuring robust performance even with coarse time discretizations.
Key theoretical guarantees, including convergence error rates and condition number controls, validate the method's practical applicability to real-world particle systems.

Learning Interacting Particle Systems from Unlabeled Data: A Technical Analysis

Problem Formulation and Limitations of Existing Approaches

This paper formulates and solves the inverse problem of learning the interaction and external potentials of a stochastic interacting particle system from unlabeled snapshot data. The absence of trajectory or particle-label information poses substantial statistical and computational challenges. Formally, the system consists of $N$ particles in $\mathbb{R}^d$ evolving according to an Itô SDE driven by standard Brownian motions $W_t^i$, with unknown interaction potential $\Phi$ and unknown external potential $V$:

$d X_t^i = -\frac{1}{N} \sum_{j\neq i} \nabla \Phi(X_t^i-X_t^j) dt - \nabla V(X_t^i) dt + \sigma dW_t^i.$

Of primary interest is the case where only sequences of unordered ensembles (snapshots $\mathcal{D} = \{X_{t_\ell}^{\pi,m}\}$) are observed at discrete times, with particle identities hidden by unknown permutations $\pi_t$.
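To make the data model concrete, the following is a minimal sketch of how unlabeled snapshots could be generated: an Euler-Maruyama discretization of the SDE above, with every stored snapshot shuffled by a fresh random permutation. The quadratic potentials, parameter values, and function names (`grad_V`, `grad_Phi`, `simulate_unlabeled`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def grad_V(x):
    # Hypothetical external potential V(x) = |x|^2 / 2, so grad V(x) = x.
    return x

def grad_Phi(r):
    # Hypothetical even interaction potential Phi(r) = |r|^2 / 4, so grad Phi(r) = r / 2.
    return 0.5 * r

def drift(X):
    """Drift of every particle for a configuration X of shape (N, d)."""
    N = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]       # diff[i, j] = X^i - X^j
    pair = grad_Phi(diff)                      # shape (N, N, d)
    pair[np.arange(N), np.arange(N)] = 0.0     # exclude the j = i term
    return -pair.sum(axis=1) / N - grad_V(X)

def simulate_unlabeled(N=50, d=2, sigma=0.5, dt=0.01, n_obs=5, stride=20, seed=0):
    """Euler-Maruyama simulation; each stored snapshot is independently shuffled."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, d))
    snapshots = [rng.permutation(X, axis=0)]
    for step in range(1, n_obs * stride + 1):
        noise = rng.normal(size=(N, d))
        X = X + drift(X) * dt + sigma * np.sqrt(dt) * noise
        if step % stride == 0:
            # Particle labels are lost here: rows are reordered at random.
            snapshots.append(rng.permutation(X, axis=0))
    return np.stack(snapshots)                 # shape (n_obs + 1, N, d)

snapshots = simulate_unlabeled()
print(snapshots.shape)
```

Because the drift is built from sums over all other particles, relabeling the particles only reorders its rows, which is why the shuffled snapshots still carry the information needed to learn the potentials.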

Classical approaches (maximum likelihood estimation (MLE), energy-based, Bayesian inverse SDE, and optimal transport (OT)-based methods) rely on labeled trajectories or on accurate matching between adjacent snapshots. These methods either perform poorly for large observation intervals, where label ambiguity is significant, or are computationally prohibitive. Distribution-matching and mean-field approaches break down unless $N$ is prohibitively large, and direct trajectory recovery via optimal assignment is highly inaccurate or ill-posed under strong diffusion and non-trivial dynamics.

Trajectory-Free Quadratic Self-Test Loss: Construction and Properties

The central technical contribution is a trajectory-free loss function that is quadratic in the potentials. It is constructed from the exact weak-form stochastic PDE satisfied by the empirical measure, which removes the need for individual particle trajectories. Specifically, by Itô's formula, the empirical distribution of the particles,

$\mu^N_t(x) = \frac{1}{N} \sum_{i=1}^N \delta_{X_t^i}(x),$

evolves according to a closed-form weak-form stochastic PDE involving linear functionals of the unknown potentials $\Phi$ and $V$. Testing this PDE against the candidate potentials themselves (hence "self-test") and integrating over the observed data yields a quadratic loss that can be minimized over any chosen function class. The resulting objective combines three terms:

a dissipative quadratic form in the gradients of the potentials,
a diffusion/Laplacian correction,
an energy exchange term, computable per snapshot.
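One way to see how a loss with these three ingredients can arise is sketched below. The notation ($g_i$, $\ell_i$, $E_{\phi,v}$, $\mathcal{L}$) is illustrative and assumes that $\Phi$ and the candidate $\phi$ are even functions; the paper's own normalization and notation may differ.

```latex
% Candidate potentials (\phi, v) evaluated on a configuration X = (X^1, ..., X^N).
\[
g_i^{\phi,v} = \nabla v(X^i) + \frac{1}{N}\sum_{j\neq i}\nabla\phi(X^i - X^j),
\qquad
\ell_i^{\phi,v} = \Delta v(X^i) + \frac{1}{N}\sum_{j\neq i}\Delta\phi(X^i - X^j),
\]
\[
E_{\phi,v}(X) = \frac{1}{N}\sum_{i=1}^{N} v(X^i)
  + \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j\neq i}\phi(X^i - X^j).
\]
% Ito's formula for E_{\phi,v}(X_t), using dX_t^i = -g_i^{\Phi,V} dt + \sigma dW_t^i:
\[
dE_{\phi,v}(X_t) = -\frac{1}{N}\sum_{i} g_i^{\phi,v}\cdot g_i^{\Phi,V}\,dt
  + \frac{\sigma^2}{2N}\sum_{i}\ell_i^{\phi,v}\,dt
  + \frac{\sigma}{N}\sum_{i} g_i^{\phi,v}\cdot dW_t^i .
\]
% A loss built only from permutation-invariant sums over each snapshot:
\[
\mathcal{L}(\phi,v) = \int_0^T \frac{1}{N}\sum_{i}\bigl|g_i^{\phi,v}\bigr|^2 dt
  - \sigma^2\int_0^T \frac{1}{N}\sum_{i}\ell_i^{\phi,v}\,dt
  + 2\bigl[E_{\phi,v}(X_T) - E_{\phi,v}(X_0)\bigr].
\]
% Substituting the integrated Ito identity into the last term:
\[
\mathcal{L}(\phi,v) = \int_0^T \frac{1}{N}\sum_{i}\bigl|g_i^{\phi,v} - g_i^{\Phi,V}\bigr|^2 dt
  - \int_0^T \frac{1}{N}\sum_{i}\bigl|g_i^{\Phi,V}\bigr|^2 dt
  + \frac{2\sigma}{N}\int_0^T \sum_{i} g_i^{\phi,v}\cdot dW_t^i .
\]
```

In this sketch the first term is the dissipative quadratic form, the second the Laplacian correction, and the third the energy exchange between the first and last snapshot. The final identity shows that, up to a mean-zero martingale, the loss is smallest when the candidate gradients coincide with the true ones.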

Crucially, no velocity or trajectory estimates are required (in contrast to MLE or drift regression methods).
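As a hedged illustration of this point, the sketch below evaluates a three-term loss of the kind described above (dissipation, Laplacian correction, energy exchange) using only per-snapshot sums, and checks that shuffling the particles inside every snapshot leaves the value unchanged. The function `self_test_loss`, its normalization, the trapezoidal time quadrature, and the quadratic candidate potentials are all assumptions made for the example; the snapshots are random placeholder data rather than simulated dynamics.

```python
import numpy as np

def self_test_loss(snaps, dt_obs, sigma, v, grad_v, lap_v, phi, grad_phi, lap_phi):
    """Trapezoidal-rule sketch of a three-term loss for one snapshot sequence.

    snaps has shape (L, N, d); phi is assumed even. Only sums over particles
    and pairs within a single snapshot appear, so labels are never used.
    """
    L, N, d = snaps.shape
    diff = snaps[:, :, None, :] - snaps[:, None, :, :]     # (L, N, N, d)
    off = ~np.eye(N, dtype=bool)                           # mask selecting j != i
    g = grad_v(snaps) + (grad_phi(diff) * off[None, :, :, None]).sum(axis=2) / N
    lap = lap_v(snaps) + (lap_phi(diff) * off[None]).sum(axis=2) / N
    energy = v(snaps).mean(axis=1) + (phi(diff) * off[None]).sum(axis=(1, 2)) / (2 * N**2)

    def trapezoid(f):
        return dt_obs * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])

    dissipation = trapezoid((g**2).sum(axis=2).mean(axis=1))
    laplacian = trapezoid(lap.mean(axis=1))
    exchange = energy[-1] - energy[0]
    return dissipation - sigma**2 * laplacian + 2.0 * exchange

# Hypothetical candidates: v(x) = a|x|^2/2 and phi(r) = b|r|^2/2.
a, b = 1.0, 0.5
quad = dict(
    v=lambda x: 0.5 * a * (x**2).sum(axis=-1),
    grad_v=lambda x: a * x,
    lap_v=lambda x: a * x.shape[-1] * np.ones(x.shape[:-1]),
    phi=lambda r: 0.5 * b * (r**2).sum(axis=-1),
    grad_phi=lambda r: b * r,
    lap_phi=lambda r: b * r.shape[-1] * np.ones(r.shape[:-1]),
)

rng = np.random.default_rng(0)
snaps = rng.normal(size=(11, 20, 2))       # placeholder snapshots (L=11, N=20, d=2)
shuffled = np.stack([rng.permutation(s, axis=0) for s in snaps])
loss_a = self_test_loss(snaps, 0.1, 0.5, **quad)
loss_b = self_test_loss(shuffled, 0.1, 0.5, **quad)
print(np.isclose(loss_a, loss_b))
```

No finite difference between consecutive snapshots is taken anywhere: time enters only through the quadrature weights and the difference of the two endpoint energies.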

Figure 1: Workflow of both estimation algorithms using the self-test loss; the left path is least squares regression with basis expansion, the right path is neural-network-based, both leveraging the quadratic self-test loss.

Key Properties:

Exact for any finite $N$, not a mean-field or infinite-$N$ approximation,
Quadratic and convex in the parameters when potentials are linearly parameterized,
Well-posed minimization, with Fréchet derivative zero at the true potentials (up to a vanishing martingale term),
Robust to coarse time discretizations; supports both parametric and nonparametric architectures.

Algorithms: Parametric and Neural Network Implementations

Two estimator classes are described:

Parametric Least Squares Regression: Potentials are expanded in basis sets (e.g., polynomials, RBFs). The loss remains quadratic, so minimization reduces to regularized normal equations; the computational cost is dominated by basis cross-terms and pairwise interactions. Tikhonov (ridge) regularization with a carefully selected penalty is necessary because the interaction block is poorly conditioned, particularly for large $N$.
Neural Network Regression: Potentials are parameterized by deep networks with smooth activations (e.g., MLPs with Softplus), so that the second derivatives entering the loss are well defined. All derivatives (notably Laplacians) are supplied by automatic differentiation. Batching and stochastic optimization allow scaling to high-dimensional and non-radial settings. Unlike basis approaches, the NN method does not require prior knowledge of the system's structure.
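The parametric path can be sketched on a toy problem. The example below assumes one basis function per potential ($|x|^2/2$ for both $V$ and $\Phi$), simulates shuffled snapshots with known coefficients, assembles a $2\times 2$ normal matrix by trapezoidal quadrature, and solves the ridge-regularized normal equations. The names `simulate` and `fit`, the normalization of the normal equations, and all parameter values are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def simulate(a, b, sigma, M, N, d, T, dt_sim, stride, rng):
    """Euler-Maruyama for V = a|x|^2/2, Phi = b|x|^2/2; returns shuffled snapshots."""
    X = rng.normal(size=(M, N, d)) + 2.0            # off-centre start (a demo choice)
    rows = np.arange(M)[:, None]

    def shuffled(X):
        idx = rng.random((M, N)).argsort(axis=1)    # one random permutation per sample
        return X[rows, idx]

    snaps = [shuffled(X)]
    for k in range(1, int(round(T / dt_sim)) + 1):
        # For these quadratic potentials the pairwise sum collapses to X - mean(X).
        drift = -a * X - b * (X - X.mean(axis=1, keepdims=True))
        X = X + drift * dt_sim + sigma * np.sqrt(dt_sim) * rng.normal(size=X.shape)
        if k % stride == 0:
            snaps.append(shuffled(X))
    return np.stack(snaps)                          # (L, M, N, d)

def fit(snaps, dt_obs, sigma, ridge=1e-8):
    """Solve (A + ridge * I) c = rhs for the two basis coefficients."""
    L, M, N, d = snaps.shape
    centred = snaps - snaps.mean(axis=2, keepdims=True)
    G = np.stack([snaps, centred])                  # gradient features, (2, L, M, N, d)
    A_t = np.einsum('klmnd,jlmnd->lkj', G, G) / (M * N)
    A = dt_obs * (0.5 * A_t[0] + A_t[1:-1].sum(axis=0) + 0.5 * A_t[-1])
    lap = np.array([d, d * (N - 1) / N])            # Laplacian terms of the two bases
    energy = 0.5 * (G**2).sum(axis=-1).mean(axis=(2, 3))    # basis energies, (2, L)
    rhs = 0.5 * sigma**2 * dt_obs * (L - 1) * lap - (energy[:, -1] - energy[:, 0])
    return np.linalg.solve(A + ridge * np.eye(2), rhs)

rng = np.random.default_rng(0)
snaps = simulate(a=1.0, b=0.5, sigma=0.5, M=100, N=30, d=2,
                 T=1.0, dt_sim=0.002, stride=5, rng=rng)
coef = fit(snaps, dt_obs=0.01, sigma=0.5)
print(coef)                                         # estimates of (a, b)
```

The right-hand side uses only the first and last snapshot energies plus a Laplacian term, so no pairing of particles across snapshots is ever attempted. With richer bases the same structure holds, but the normal matrix grows and the choice of ridge penalty matters more.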

Both methods bypass all explicit trajectory recovery steps, such as optimal transport or Sinkhorn matching, and do not require finite-difference velocity estimates that plague MLE approaches.
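For the neural-network path, the loss needs first and second derivatives of the network with respect to its input, which is one reason a smooth activation such as Softplus is convenient. The paper relies on automatic differentiation; as a stand-in, the sketch below writes out the gradient and Laplacian of a one-hidden-layer Softplus network in closed form. The class `SoftplusNet` and its initialization are hypothetical and chosen only for illustration.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)          # log(1 + exp(z)), numerically stable

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))

class SoftplusNet:
    """v(x) = w2 . softplus(W1 x + b1): a one-hidden-layer toy potential."""

    def __init__(self, d, width, rng):
        self.W1 = rng.normal(size=(width, d)) / np.sqrt(d)
        self.b1 = rng.normal(size=width)
        self.w2 = rng.normal(size=width) / np.sqrt(width)

    def value(self, x):                  # x: (n, d) -> (n,)
        return softplus(x @ self.W1.T + self.b1) @ self.w2

    def grad(self, x):                   # x: (n, d) -> (n, d)
        s = sigmoid(x @ self.W1.T + self.b1)         # derivative of softplus
        return (s * self.w2) @ self.W1

    def laplacian(self, x):              # x: (n, d) -> (n,)
        s = sigmoid(x @ self.W1.T + self.b1)
        # Second derivative of softplus is s * (1 - s); hidden unit k
        # contributes it times the squared norm of row k of W1.
        return (s * (1.0 - s) * self.w2) @ (self.W1**2).sum(axis=1)

rng = np.random.default_rng(0)
net = SoftplusNet(d=3, width=16, rng=rng)
x = rng.normal(size=(5, 3))
print(net.grad(x).shape, net.laplacian(x).shape)
```

With a non-smooth activation such as ReLU the second derivative would vanish almost everywhere, and the Laplacian term of the loss would carry no information about the potential.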

Theoretical Guarantees: Convergence and Error Rates

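A rigorous non-asymptotic error analysis is provided for the parametric estimator. The error bound combines a statistical term that decays as the number of samples grows with a discretization bias that shrinks with the observation time step. The order of this bias depends on whether the time integrals in the loss are approximated by Riemann quadrature or by the trapezoidal rule. The analysis presents the resulting statistical rate as the minimal one attainable in the absence of labels.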
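The role of the quadrature rule can be illustrated in isolation. The sketch below compares a left-endpoint Riemann sum with the trapezoidal rule on a smooth, deterministic test integrand and estimates each rule's convergence order by halving the step. It is a textbook illustration under that smoothness assumption, with hypothetical helper names; it does not reproduce the paper's exponents, which concern integrals along stochastic paths.

```python
import numpy as np

def riemann(f, dt):
    return dt * f[:-1].sum()                       # left-endpoint rule

def trapezoid(f, dt):
    return dt * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])

def observed_order(rule, n=50):
    """Estimate the convergence order by halving the step on a smooth integrand."""
    exact = 0.5 * (1.0 - np.exp(-2.0))             # integral of exp(-2t) over [0, 1]
    errors = []
    for steps in (n, 2 * n):
        t = np.linspace(0.0, 1.0, steps + 1)
        errors.append(abs(rule(np.exp(-2.0 * t), 1.0 / steps) - exact))
    return np.log2(errors[0] / errors[1])

print(observed_order(riemann), observed_order(trapezoid))
```

For this integrand the Riemann sum behaves as a first-order rule and the trapezoidal rule as a second-order one, which is why the choice of rule can change how quickly a discretization bias shrinks with the observation step.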

Coercivity and identifiability are guaranteed under broad conditions (weak cross-correlations, finite basis moments, exchangeability). Conditioning degrades as $N$ grows, driven by the condition number of the interaction block (see Figure 2).

For large (but finite) $N$, the condition number of the quadratic form is dominated by the interaction block, but the method remains statistically efficient. The nonparametric (NN) estimator is validated empirically; its theoretical analysis is left for future work.
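Why regularization helps with an ill-conditioned normal matrix can be seen in a toy setting. The sketch below builds the $2\times 2$ Gram matrix of two gradient features, $X^i$ and $X^i - \bar{X}$ (the features of quadratic basis functions for $V$ and $\Phi$), from centred i.i.d. Gaussian particles. As $N$ grows the centre of mass concentrates, the two features become nearly collinear, and the condition number rises; a ridge penalty lowers it. This is an assumed toy mechanism with hypothetical function names, not the paper's experiment or its scaling law.

```python
import numpy as np

def gram_matrix(N, M=200, d=2, seed=0):
    """Normal matrix of the features X^i and X^i - mean(X) for one snapshot."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(M, N, d))                 # centred i.i.d. particles (toy data)
    centred = X - X.mean(axis=1, keepdims=True)
    G = np.stack([X, centred])                     # (2, M, N, d)
    return np.einsum('kmnd,jmnd->kj', G, G) / (M * N)

def ridge_condition_number(A, lam):
    """Condition number of the Tikhonov-regularized matrix A + lam * I."""
    return np.linalg.cond(A + lam * np.eye(A.shape[0]))

for N in (10, 100, 1000):
    A = gram_matrix(N)
    print(N, ridge_condition_number(A, 0.0), ridge_condition_number(A, 1e-2))
```

For a symmetric positive semidefinite matrix, adding a positive multiple of the identity can only reduce the ratio of largest to smallest eigenvalue, at the price of a bias that grows with the penalty.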

Figure 3: Scaling of the estimator error under different time integration schemes, illustrating the statistical error and the scheme-dependent discretization bias.

Figure 4: Convergence for discrete-time models. The estimator error is limited by an intrinsic model bias when the simulation and observation steps coincide.

Figure 2: Scaling of the condition numbers of the normal matrix for increasing $N$. The conditioning is worse for $\Phi$ than for $V$, matching theory.

Empirical Results: Superiority in the Unlabeled Setting

The new self-test approach is systematically benchmarked against multiple baselines:

Labeled MLE: Ideal but unavailable in the unlabeled regime; achieves the lowest error for very small observation time steps thanks to perfect label information.
Sinkhorn MLE: Recovers pseudo-labels by OT-based assignment; label mismatches accumulate rapidly for large observation time steps or under strong diffusion, severely degrading estimation and making computation prohibitively expensive.
Self-Test (LSE and NN): Achieves lower or comparable errors once the observation time step is no longer very small, outperforming both labeled and Sinkhorn MLE as the time step increases and remaining robust even at very coarse observation intervals (Table 1, Figure 5).

Figure 5: Non-radial potential recovery; the self-test NN achieves low relative gradient error.
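The failure mode of label matching can be reproduced in a small, hedged experiment. The sketch below implements a generic log-domain Sinkhorn iteration with squared-distance cost, reads pseudo-labels off the transport plan, and measures how many are correct when particles undergo pure diffusion between two snapshots. The function names, the regularization `eps`, and the diffusion-only dynamics are assumptions; the baseline's actual implementation is not specified in this summary.

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_labels(X0, X1, eps=1e-3, n_iter=200):
    """Pseudo-labels from entropic OT with squared-distance cost (log domain)."""
    n = X0.shape[0]
    log_K = -((X0[:, None, :] - X1[None, :, :])**2).sum(axis=2) / eps
    f = np.zeros(n)
    g = np.zeros(n)
    for _ in range(n_iter):
        # Alternate scaling updates so rows and columns carry mass 1/n.
        f = -np.log(n) - logsumexp(log_K + g[None, :], axis=1)
        g = -np.log(n) - logsumexp(log_K + f[:, None], axis=0)
    log_plan = f[:, None] + log_K + g[None, :]
    return log_plan.argmax(axis=1)                  # most likely partner of each particle

def matching_accuracy(step_std, n=50, d=2, seed=0):
    """Fraction of correct pseudo-labels after diffusion with the given displacement."""
    rng = np.random.default_rng(seed)
    X0 = rng.uniform(size=(n, d))
    perm = rng.permutation(n)
    X1 = (X0 + step_std * rng.normal(size=(n, d)))[perm]   # X1[k] is particle perm[k]
    guess = sinkhorn_labels(X0, X1)                 # guess[i]: row of X1 matched to i
    return np.mean(perm[guess] == np.arange(n))

print(matching_accuracy(1e-3), matching_accuracy(1.0))
```

When the displacement between snapshots is small compared with the spacing between particles, the nearest partner is almost always the right one; once diffusion moves particles farther than that spacing, most pseudo-labels are wrong, and any regression on the recovered trajectories inherits those errors.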

Additional experiments recover challenging non-radial, ill-conditioned, and singular potentials, where only the self-test NN is universally applicable. Performance does not degrade catastrophically on models that violate technical regularity assumptions (Figures 6, 7), showing strong empirical robustness.

Figure 6: Estimation of interaction gradients for sharply varying or singular potentials; the self-test NN and parametric methods provide accurate recovery without labels.

Figure 7: Recovery performance on various radial test potentials, including those with large gradient magnitudes or lack of smoothness, confirming estimation reliability.

Implications, Limitations, and Outlook

Practical Implications:

Enables robust and scalable inference of interaction laws from population snapshots in physical, biological, and social systems, the typical empirical context in which trajectories are unavailable.
Avoids the computational and statistical pitfalls of label and velocity recovery.
Demonstrated robustness to measurement gaps and high-dimensionality, enabling application to real-world experimental and observational datasets.

Theoretical Implications:

Establishes that trajectory/label recovery is not required for consistent potential learning with finite $N$, providing sharp error bounds matching known rates for labeled data up to a constant multiplicative bias.
The loss structure and self-testing test families could be adapted for related inverse problems with empirical measure observations.

Limitations and Future Work:

The current framework assumes homogeneous particles; extending it to multi-species systems without type labels remains an open problem and will likely require fundamentally new ideas (e.g., type-coupled weak-form PDEs).
Singular potentials with strong short-range divergences necessitate further analysis and possibly adaptive regularization.
Neural network estimator theory (rates, landscape geometry, implicit regularization) remains undeveloped and is a crucial future direction.
Sharp minimax lower bounds for the unlabeled setting to formally quantify the statistical cost of the missing label/trajectory information would close a core theoretical gap.

Conclusion

The trajectory-free self-test quadratic loss for interacting particle systems resolves a fundamental limitation in the scientific learning of stochastic dynamics from unlabeled data. It delivers efficient, theoretically justified, and empirically robust parameter recovery under challenging real-world conditions, with broad immediate applications in experimental and computational sciences.

Reference: "Learning interacting particle systems from unlabeled data" (2604.02581)
