DNN Uncertainty Propagation Overview
- DNN Uncertainty Propagation is the systematic quantification and transformation of input, model, and output uncertainties in deep neural networks.
- It employs analytical, moment-matching, sampling, and hybrid methods to approximate uncertainty distributions efficiently.
- This approach improves robustness in safety-critical applications such as autonomous navigation and calibrated predictive inference.
Deep Neural Network (DNN) Uncertainty Propagation refers to the systematic modeling, transformation, and quantification of uncertainties, whether originating from the input, the parameters, or the intrinsic outputs, as they traverse the hidden layers and computations of a deep neural architecture. The topic is critical for scientific computing, safety-critical systems, autonomous decision-making, and calibrated predictive inference, and it draws on rigorous mathematical tools from Bayesian inference, dynamical systems, numerical analysis, and optimization.
1. Taxonomy and Sources of Uncertainty in DNNs
DNN uncertainties decompose into at least three principal categories: input (aleatoric or data uncertainty), parameter/model (epistemic), and output (predictive/posterior). Input uncertainty refers to the randomness or lack of precision in observed features or input fields, such as sensor noise or measurement variability. Epistemic uncertainty arises from incomplete knowledge of model parameters—manifest in weight posterior distributions in Bayesian neural networks or from finite data regimes. Output or predictive uncertainty synthesizes the propagated effect of all upstream sources on the final classification, regression, or generative outputs.
Uncertainty propagation in DNNs thus addresses two central questions: (i) How does input noise or parameter randomness mathematically transfer through deep nonlinear compositions? and (ii) How can one efficiently approximate, sample, or bound the resulting output distributions for practical evaluation and calibration? Modern approaches are classified according to the source(s) they model and the propagation mechanisms they employ: analytical (moment matching, local linearization, SDEs), sampling (MC/Unscented Transform), interval and set-based, or hybrid optimization techniques.
2. Analytical and Moment-Matching Propagation Schemes
Deterministic analytical propagation is rooted in classic probabilistic numerics and signal processing. The Extended Kalman Filter (EKF) approach treats each network layer as a nonlinear process in a state-space model. Starting with a Gaussian input $x_0 \sim \mathcal{N}(\mu_0, \Sigma_0)$, propagation alternates mean and covariance updates ($\mu_\ell$, $\Sigma_\ell$) per layer, linearizing nonlinear activations via Jacobians:

$$\mu_{\ell+1} = f_\ell(\mu_\ell), \qquad \Sigma_{\ell+1} = J_\ell \Sigma_\ell J_\ell^\top + Q_\ell,$$

where $J_\ell = \partial f_\ell / \partial x \big|_{x=\mu_\ell}$ is the layer's Jacobian evaluated at the mean, and $Q_\ell$ encapsulates process or model noise (Titensky et al., 2018). For ReLU networks $J_\ell$ is a diagonal matrix masked by the local ReLU derivatives. This formulation efficiently synthesizes the effect of input covariance and process noise across deep stacks, yielding output mean/variance approximations that closely mimic Monte Carlo sampling at reduced computational cost. Lightweight Probabilistic Networks (LPN) enforce diagonal approximations of the covariances, further reducing complexity at the expense of neglecting cross-correlations (Daruna et al., 2023).
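As an illustrative sketch of this layerwise recursion (not code from the cited papers; `ekf_layer` and the toy weights are hypothetical names), the following propagates a Gaussian through affine-plus-ReLU layers, linearizing the ReLU at the pre-activation mean:

```python
import numpy as np

def ekf_layer(mu, Sigma, W, b, Q=None):
    """Propagate a Gaussian (mu, Sigma) through an affine layer followed by
    ReLU, linearizing the activation at the pre-activation mean."""
    mu_a = W @ mu + b                    # affine step is exact for Gaussians
    Sigma_a = W @ Sigma @ W.T
    if Q is not None:
        Sigma_a = Sigma_a + Q            # optional process/model noise
    mask = (mu_a > 0).astype(float)      # local ReLU derivative at the mean
    J = np.diag(mask)                    # diagonal Jacobian of the activation
    return np.maximum(mu_a, 0.0), J @ Sigma_a @ J.T

# Two-layer demo with fixed random weights.
rng = np.random.default_rng(0)
mu, Sigma = np.array([1.0, -0.5]), 0.1 * np.eye(2)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)
mu, Sigma = ekf_layer(mu, Sigma, W1, b1)
mu, Sigma = ekf_layer(mu, Sigma, W2, b2)
```

Because only a mean vector and covariance matrix are carried per layer, the cost is a handful of matrix products, in contrast to forwarding many Monte Carlo samples.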
Layerwise moment-matching extends to assumed density filtering (ADF), passing first and second moments through each transformation, with closed-form updates for affine, ReLU, and sigmoid layers (Das et al., 2023). For non-Gaussian input uncertainties, recent work models Cauchy and more general stable laws, applying TV-optimal local linearization at each nonlinearity to guarantee controlled approximation error (Petersen et al., 2024).
3. Sampling-Based and Hybrid Propagation Methods
Sampling-based schemes empirically approximate uncertainty transfer by pushing random perturbations through the network. The Unscented Transform (UT) generates $2n+1$ sigma points from an $n$-dimensional input distribution, evaluating the entire network per point to reconstruct the output mean and covariance (Daruna et al., 2023). Brute-force Monte Carlo (MC) draws random input samples, forwarding each through the network, but incurs significant computational overhead, especially for high-fidelity uncertainty estimation.
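A minimal unscented-transform sketch, assuming the classic $\kappa$-parameterized symmetric sigma-point scheme (names are illustrative); a linear map serves as a sanity check, since the UT is exact there:

```python
import numpy as np

def unscented_propagate(f, mu, Sigma, kappa=1.0):
    """Generate 2n+1 sigma points, push each through the full network f,
    and reassemble the output mean and covariance."""
    n = mu.size
    L = np.linalg.cholesky((n + kappa) * Sigma)         # matrix square root
    pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))   # symmetric weights
    w[0] = kappa / (n + kappa)                          # center-point weight
    ys = np.array([f(p) for p in pts])
    mean = w @ ys
    diff = ys - mean
    return mean, (w[:, None] * diff).T @ diff

# Sanity check: for a linear map the UT reproduces mean/covariance exactly.
A = np.array([[2.0, 0.0], [1.0, 3.0]])
Sigma_in = 0.05 * np.eye(2)
m, C = unscented_propagate(lambda x: A @ x, np.array([1.0, -1.0]), Sigma_in)
```

Unlike layerwise linearization, each sigma point traverses the entire nonlinear network, so skip connections and cross-layer dependencies are handled for free at the cost of $2n+1$ forward passes.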
Hybrid methods mediate between sampling and analytic linearization. The factor-graph formalism frames the propagation process as joint MAP inference over a reduced graphical model that represents the DNN's computational topology (Daruna et al., 2023). By replicating input variables for multiple stochastic draws and coupling outputs via the full network mapping, and then jointly optimizing over all variables and constraints using Gauss–Newton or iSAM2 solvers, this framework captures nonlocal nonlinear dependencies and skip connections while exploiting analytic gradients. Empirical benchmarking shows that such hybrid factor graphs consistently outperform both layerwise linearization and pure sampling in output covariance fidelity (as measured by 2-Wasserstein distance to MC reference) across image and signal domains.
4. Parameter/Model Uncertainty: Bayesian and Information-Form Approaches
Classical Bayesian deep learning propagates uncertainty at the parameter/weight level, representing the weight posterior as a high-dimensional Gaussian. The sparse information (INF) form (Lee et al., 2020) represents the precision matrix in spectral-sparsified Kronecker eigenbases, enabling low-rank sampling via Woodbury identities. Sampling from such posteriors, or linearizing the network about the MAP weights and projecting covariances via input-output Jacobians, facilitates efficient MC or analytical output uncertainty estimation. Posterior samples are pushed through the network to build empirical predictive distributions, or moments are propagated via

$$\Sigma_y = J_w \Sigma_w J_w^\top,$$

where $J_w$ is the Jacobian of the network outputs with respect to the weights, evaluated at the MAP estimate, and $\Sigma_w$ is the weight-posterior covariance. The cost of this approach scales with the rank of the spectral factorization rather than the full weight dimension, dramatically reducing memory and computation over a dense full covariance.
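As a sketch of projecting a low-rank weight-posterior covariance to output space through the weight Jacobian (a toy finite-difference construction, not the INF implementation; all names are illustrative), assuming the covariance is factorized as $U\,\mathrm{diag}(s)\,U^\top$:

```python
import numpy as np

def output_cov_from_weight_posterior(f, w_map, U, s, eps=1e-5):
    """Linearize the network outputs about the MAP weights and project a
    low-rank weight covariance U diag(s) U^T to output space via the
    weight Jacobian, built here by forward differences."""
    y0 = f(w_map)
    p = w_map.size
    J = np.empty((y0.size, p))
    for i in range(p):                         # forward-difference Jacobian
        dw = np.zeros(p)
        dw[i] = eps
        J[:, i] = (f(w_map + dw) - y0) / eps
    JU = J @ U                                 # only the rank-r columns matter
    return y0, (JU * s) @ JU.T                 # = J U diag(s) U^T J^T

# Sanity check on a linear "network", where the projection is exact.
rng = np.random.default_rng(1)
M = rng.standard_normal((2, 4))                     # output-vs-weight map
U, _ = np.linalg.qr(rng.standard_normal((4, 2)))    # rank-2 eigenbasis
s = np.array([0.5, 0.1])                            # posterior eigenvalues
y0, Sy = output_cov_from_weight_posterior(lambda w: M @ w, np.ones(4), U, s)
```

Projecting through the factor $JU$ rather than forming the dense weight covariance is what keeps memory and computation proportional to the retained rank.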
For certain structured nonlinearities (e.g., soft-thresholding in unrolled sparse coding), the output distribution admits closed-form updates of non-Gaussian family types such as spike-and-slab, with learned weights propagating the parameters of these distributions layer to layer (Kuzin et al., 2018).
5. Input Uncertainty and Unified Uncertainty Propagation
Explicit input-uncertainty propagation starts from a model of the input as a random vector $x \sim \mathcal{N}(\mu_x, \Sigma_x)$, or in more general distributional or bounded (interval) forms, aiming to characterize how such disturbances transfer through the nonlinear computational graph. First-order Taylor expansion yields layerwise mean/covariance propagation as

$$\mu_{\ell+1} = f_\ell(\mu_\ell), \qquad \Sigma_{\ell+1} = J_\ell \Sigma_\ell J_\ell^\top,$$

with $J_\ell$ the Jacobian of layer $\ell$ evaluated about the mean input (Valdenegro-Toro et al., 2024). This method extends to classification networks by propagating through the logits and reconstructing predictive distributions at the output via sampling or moment-matching.
Unified frameworks that combine input, model, and data uncertainties run multiple stochastic forward passes (e.g., MC-dropout for model uncertainty) at the input mean, and within each pass propagate input uncertainty (via analytic Jacobians) and aleatoric (data) variance. Total predictive variance decomposes into input-induced, epistemic (model), and aleatoric (data) components,

$$\sigma^2_{\text{total}} = \sigma^2_{\text{input}} + \sigma^2_{\text{epistemic}} + \sigma^2_{\text{aleatoric}},$$

allowing precise isolation and quantification of each source (Valdenegro-Toro et al., 2024).
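A toy sketch of this three-way decomposition, with each stochastic forward pass represented by a callable returning a mean and an aleatoric variance (the scheme and all names are illustrative, not the cited implementation):

```python
import numpy as np

def predictive_variance(passes, x, Sigma_x, h=1e-4):
    """Decompose scalar predictive variance into input-induced, epistemic,
    and aleatoric parts. Each element of `passes` is one stochastic forward
    pass (e.g., a fixed dropout mask) returning (mean, aleatoric_var)."""
    n = x.size
    mus, alea, inp = [], [], []
    for f in passes:
        m0, v0 = f(x)
        # Finite-difference input Jacobian of this pass's mean output.
        g = np.array([(f(x + h * np.eye(n)[i])[0] - m0) / h for i in range(n)])
        mus.append(m0)
        alea.append(v0)
        inp.append(g @ Sigma_x @ g)          # input-induced term, J Sigma J^T
    var_input = float(np.mean(inp))
    var_epi = float(np.var(mus))             # spread of means across passes
    var_alea = float(np.mean(alea))          # average predicted data noise
    return var_input, var_epi, var_alea, var_input + var_epi + var_alea

# Toy "passes": three linear models with perturbed weights, fixed data noise.
ws = [np.array([1.0, 0.0]), np.array([1.2, 0.0]), np.array([0.8, 0.0])]
passes = [(lambda z, w=w: (float(w @ z), 0.1)) for w in ws]
vi, ve, va, vt = predictive_variance(passes, np.array([1.0, 2.0]), 0.04 * np.eye(2))
```

The three returned components sum to the total by construction, which is what allows each source to be reported and diagnosed separately.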
6. Set-based and Interval Propagation Techniques
In scenarios with epistemic incertitude or only bounded input errors, interval-valued arithmetic propagates lower and upper bounds through each linear and nonlinear layer using sharp interval analysis rules (Betancourt et al., 2021). For affine steps, the set of possible outputs is constructed as the tightest enclosure per component; for monotonic activations, the interval propagates via their envelope. Interval-valued backpropagation extends gradient optimization to interval parameters. While guaranteeing that all physically possible predictions are contained within the output intervals, such methods may suffer from dependency-driven over-approximation in deep chains, mitigated by design choices (monotonicity, small weights, regularization) and possible use of affine-arithmetic or zonotopic representations.
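The affine and monotonic-activation rules described above admit a short NumPy sketch of plain interval arithmetic (illustrative names; affine-arithmetic or zonotopic variants would track dependencies more tightly):

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Tightest componentwise enclosure of W x + b over the box [lo, hi]:
    positive weights pick the matching bound, negative weights the opposite."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def interval_relu(lo, hi):
    """Monotonic activation: apply it to both interval endpoints."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# One affine + ReLU step on a 2-D input box.
lo, hi = np.array([-1.0, 0.5]), np.array([1.0, 1.5])
W, b = np.array([[1.0, -2.0], [0.5, 1.0]]), np.zeros(2)
lo1, hi1 = interval_relu(*interval_affine(lo, hi, W, b))
print(lo1, hi1)  # [0. 0.] [0. 2.]
```

Every true output is guaranteed to lie inside `[lo1, hi1]`, but chaining many such steps can inflate the box, which is the dependency-driven over-approximation noted above.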
7. Task-Specific Frameworks and Empirical Validation
Domain-adapted uncertainty propagation frameworks further decompose uncertainty for composite tasks, such as in image registration-based segmentation. Here, transformation and appearance uncertainties are insufficient indicators for segmentation error; bespoke auxiliary DNNs are trained to predict voxelwise aleatoric variance from observable residual maps, and epistemic uncertainty is estimated as the entropy of propagated label probabilities under sampled transformations (Chen et al., 2024). The resulting uncertainty maps correlate much more strongly with actual label-propagation error than transformation-level uncertainties, substantiating the necessity of tailored propagation schemes for downstream error control.
Experimental results across high-dimensional surrogate modeling (e.g., CNN surrogates for stochastic PDEs (Luo et al., 2019)), wireless localization (Salihu et al., 2021), digital-twin monitoring (Das et al., 2023), and robotics navigation (Arnez et al., 2022) consistently demonstrate that DNNs equipped with principled uncertainty propagation outperform deterministic baselines in both calibration and task-relevant reliability metrics. Hybrid analytical-sampling methods yield the most accurate uncertainty quantification at minimal computational excess, and are robust to architectural and input noise complexities.
Table: Representative Uncertainty Propagation Methods
| Method/Class | Formalism/Key Step | Reference |
|---|---|---|
| EKF/ADF Layerwise Propagation | Jacobian-based moment recursion | (Titensky et al., 2018) |
| Input-Model Unified Propagation | Taylor expansion per stochastic forward pass | (Valdenegro-Toro et al., 2024) |
| Factor-Graph Hybrid Optimization | Global joint MAP, marginalization over network DAG | (Daruna et al., 2023) |
| Sparse Information Posterior | Woodbury-sampled Kronecker spectral factorization | (Lee et al., 2020) |
| Interval Arithmetic | Set-based bounding layerwise via interval rules | (Betancourt et al., 2021) |
| SDE Perspective | Neural SDE discretization (drift/diffusion nets) | (Kong et al., 2020) |
In summary, DNN uncertainty propagation comprises a spectrum of mathematically rigorous techniques for moving uncertainty through the nonlinear and compositional structure of deep networks. The selection and integration of these techniques must account for source structure (input or model), architectural constraints (dense, convolutional, residual, graph), and downstream requirements (predictive calibration, error detection, control). Recent advances elucidate the trade-offs and effectiveness of sampling, analytic linearization, hybrid optimization, and set-based approximation, providing a foundation for reliable, uncertainty-aware deep learning in both research and engineering practice.