- The paper introduces Probabilistic Functional Neural Networks (ProFnet), which integrates deep networks with Gaussian Processes in a latent space to model and forecast high-dimensional functional time series.
- Empirical results show ProFnet achieves superior point forecast accuracy and provides reliable uncertainty quantification compared to baselines on real-world high-dimensional functional time series data.
- ProFnet offers a scalable and unified framework for modeling high-dimensional functional time series, jointly handling temporal and spatial dependencies while providing essential probabilistic forecasts.
Probabilistic Functional Neural Networks (ProFnet), as introduced in "Probabilistic Functional Neural Networks" (2503.21585), represent a deep learning framework designed for modeling and forecasting High-dimensional Functional Time Series (HDFTS). This class of data involves observations that are functions (e.g., curves, surfaces) recorded over time across numerous spatial units or entities. HDFTS presents significant modeling challenges arising from inherent nonlinearity, potential nonstationarity in temporal dynamics, and high dimensionality, particularly when the number of spatial units (H) exceeds the number of time points (T). ProFnet aims to address these challenges by integrating feedforward neural network architectures with probabilistic modeling principles, specifically leveraging Gaussian Processes (GPs) within the latent space to capture temporal dependencies and provide uncertainty quantification.
Methodology of ProFnet
The ProFnet architecture consists of three interconnected blocks: an Encoding Block, a Probabilistic Block, and a Generator Block. It processes input functional data Xt,h(u) for region h at time t to generate probabilistic forecasts X^t′,h′(u) for region h′ at a future time t′.
Encoding Block
This block transforms the input functional data and associated spatial context into fixed-dimensional vector representations.
- Functional Encoding: The input function Xt,h(u) is processed by a functional learning layer. This layer computes inner products between the input function and a set of learnable functional coefficients {βl(u)}, l = 1, …, L, often represented using a basis expansion (e.g., B-splines, Fourier basis); the coefficients of this expansion are learned parameters. The output is the vector of inner products [⟨Xt,h, β1⟩, …, ⟨Xt,h, βL⟩], which can be followed by standard fully connected layers to yield a deep functional representation Wx.
- Spatial Encoding: The region indicator h (or other spatial covariates) is mapped to a latent representation Wh. This is typically achieved using an embedding layer or matrix factorization techniques, potentially followed by fully connected layers, allowing the model to learn similarities and relationships between regions.
- Concatenation: The functional and spatial representations are concatenated to form a unified encoded representation W=[Wx,Wh], capturing both the functional shape and the spatial context of the input observation at time t.
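The encoding steps above can be sketched numerically. This is a minimal numpy illustration, not the paper's implementation: the grid, sizes, basis coefficients, and embedding matrix are all hypothetical stand-ins for quantities that ProFnet would learn end-to-end, and the inner products are approximated by a simple Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M grid points per curve, L functional coefficients,
# D_h spatial embedding dimensions, H regions (47 as in the Japan data).
M, L, D_h, H = 100, 8, 4, 47
u = np.linspace(0.0, 1.0, M)
du = u[1] - u[0]

x = np.sin(2 * np.pi * u)              # stand-in for one observed curve X_{t,h}(u)
beta = rng.normal(size=(L, M))         # learnable coefficients beta_l(u) on the same grid

# Functional encoding: inner products <X_{t,h}, beta_l>, approximated by a Riemann sum.
w_x = (beta * x).sum(axis=1) * du      # shape (L,)

# Spatial encoding: an embedding lookup for region h (a learnable matrix in practice).
embedding = rng.normal(size=(H, D_h))
h = 12
w_h = embedding[h]                     # shape (D_h,)

# Concatenation into the unified representation W = [W_x, W_h].
w = np.concatenate([w_x, w_h])
print(w.shape)                         # (12,)
```

In the full model, `w` would then be passed through further fully connected layers before reaching the probabilistic block.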
Probabilistic Block
This core component introduces probabilistic modeling for temporal dynamics using Gaussian Processes within the latent space.
- Latent Gaussian Processes: The model assumes the temporal evolution is governed by a set of K independent Gaussian Processes, {GPk(t)}, k = 1, …, K. Each GPk(t) is defined by a mean function (often zero) and a covariance function (kernel), such as the squared exponential kernel k(t, t′) = σk² exp(−(t − t′)² / (2ρk²)).
- Parameter Learning: Crucially, the parameters defining the state or conditional distribution of these GPs are not fixed but are learned functions of the encoded input representation W. A feedforward neural network (the parameter learning block) maps W to the necessary parameters (e.g., parameters influencing the mean μk(t) or the kernel parameters σk, ρk at the input time t). This allows the GP dynamics to adapt based on the specific input function and region.
- Probabilistic Forecasting: To forecast from time t to t′, the model leverages the conditional properties of GPs. Given the state of the GPs implicitly determined by the input Xt,h(u) via W and the learned parameters, the model samples from the conditional distribution P(z(t′) ∣ state at t), where z(t′) = (z1(t′), …, zK(t′))ᵀ is a K-dimensional latent vector representing the state of the K GPs at the forecast time t′. This sampling inherently captures the uncertainty in the temporal evolution: multiple Monte Carlo (MC) samples of z(t′) are drawn from the conditional distribution to approximate the forecast distribution in the latent space.
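The GP conditioning and Monte Carlo step can be sketched for the simplest case. This is a hedged simplification: it conditions each of K independent zero-mean GPs on a single latent value at time t using the standard Gaussian conditional formulas, with hypothetical kernel parameters standing in for the ones the parameter learning block would produce; the paper describes its conditioning scheme only at a high level.

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp(t1, t2, sigma, rho):
    """Squared exponential kernel k(t, t') = sigma^2 * exp(-(t - t')^2 / (2 rho^2))."""
    return sigma**2 * np.exp(-((t1 - t2) ** 2) / (2 * rho**2))

# Hypothetical learned parameters for K = 3 independent latent GPs.
sigma = np.array([1.0, 0.8, 1.2])
rho = np.array([2.0, 5.0, 1.0])
K = len(sigma)

t, t_prime = 10.0, 12.0          # condition at time t, forecast at time t'
z_t = rng.normal(size=K)         # latent state at time t (assumed output of the encoder)

# Zero-mean GP conditioning, per component k:
#   z_k(t') | z_k(t) ~ N( k(t',t)/k(t,t) * z_k(t),  k(t',t') - k(t',t)^2 / k(t,t) )
k_tt = sq_exp(t, t, sigma, rho)
k_ts = sq_exp(t, t_prime, sigma, rho)
k_ss = sq_exp(t_prime, t_prime, sigma, rho)

cond_mean = (k_ts / k_tt) * z_t
cond_var = k_ss - k_ts**2 / k_tt           # shrinks toward sigma^2 as |t' - t| grows

# Monte Carlo sampling of the latent forecast state.
n_mc = 500
z_samples = cond_mean + np.sqrt(cond_var) * rng.normal(size=(n_mc, K))
print(z_samples.shape)                     # (500, 3)
```

Note how the conditional variance grows with the forecast gap t′ − t, which is what drives the wider prediction intervals at longer horizons.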
Generator Block
This block maps the sampled latent state back to the functional data space.
- Input: Takes a sampled latent vector z(t′) from the probabilistic block and the target region's encoded representation Wh′ (obtained via the spatial encoding for the target region h′) as input.
- Functional Generation: Utilizes feedforward neural network layers to transform the combined input [z(t′),Wh′] into the parameters of a basis expansion (similar to the functional encoding, but in reverse) or directly predicts the values of the function X^t′,h′(u) at specific points u. This generates a single functional forecast corresponding to the input latent sample z(t′). By generating forecasts for each MC sample of z(t′), a distribution of functional forecasts is obtained.
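A toy version of the generator can make the mapping concrete. This sketch is illustrative only: the two weight matrices stand in for layers ProFnet would learn, and a Fourier-like basis substitutes for whatever output expansion the trained model uses. Feeding every MC sample through it yields the distribution of functional forecasts described above.

```python
import numpy as np

rng = np.random.default_rng(1)

K, D_h, L, M = 3, 4, 8, 100     # latent dim, spatial embedding dim, basis size, grid size
u = np.linspace(0.0, 1.0, M)

# Output basis (Fourier-like, as a stand-in for B-splines or a learned basis).
basis = np.stack([np.sin((l + 1) * np.pi * u) for l in range(L)])   # (L, M)

# Hypothetical generator weights (learned end-to-end in the actual model).
W1 = rng.normal(size=(16, K + D_h)) * 0.3
W2 = rng.normal(size=(L, 16)) * 0.3

def generate(z, w_h):
    """Map one latent sample and the target region's embedding to a forecast curve."""
    hidden = np.tanh(W1 @ np.concatenate([z, w_h]))
    coeffs = W2 @ hidden                  # basis coefficients of the forecast
    return coeffs @ basis                 # predicted function values on the grid

# One curve per MC sample of z(t') gives a distribution of functional forecasts.
z_samples = rng.normal(size=(200, K))
w_h_target = rng.normal(size=D_h)        # encoded representation of target region h'
forecasts = np.stack([generate(z, w_h_target) for z in z_samples])  # (200, M)

point_forecast = forecasts.mean(axis=0)                   # ProFnet-mean style point forecast
lower, upper = np.quantile(forecasts, [0.025, 0.975], axis=0)  # 95% prediction band
print(forecasts.shape, point_forecast.shape)              # (200, 100) (100,)
```

The empirical quantiles across MC samples are one simple way to turn the sampled curves into prediction intervals of the kind evaluated on the mortality data.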
Training Procedure
ProFnet is trained end-to-end using stochastic gradient descent or variants like Adam. The objective function is analogous to that of a Variational Autoencoder (VAE), balancing reconstruction accuracy with regularization of the latent space. The loss function typically comprises two terms:
- Reconstruction Loss: Measures the discrepancy between the generated forecast X^t′,h′(u) (using the mean of the MC samples, or a single sample, during training) and the true target function Xt′,h′(u). A common choice is the Mean Squared Error (MSE) integrated over the functional domain u: E[∫ (X^t′,h′(u) − Xt′,h′(u))² du].
- KL Divergence Term: Acts as a regularizer on the latent space. It encourages the approximate posterior distribution of the latent GP variables at time t′, conditioned on the input at time t (implicitly represented via the learned parameters), denoted Q(Z(t′) ∣ Xt,h), to remain close to the prior distribution P(Z(t′)) defined by the K unconditional independent GPs. This term is computed as DKL(Q(Z(t′) ∣ Xt,h) ‖ P(Z(t′))).
The combination enables the model to learn meaningful latent representations that capture the temporal dynamics while ensuring the generated forecasts align with the observed data. A key advantage noted is the "lag-free" training capability, where the model can be trained simultaneously for various forecast horizons (t→t′) without needing separate models or retraining for each specific lag δ=t′−t.
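A minimal numeric sketch of this VAE-style objective, under the assumption (common for such models, though not spelled out in the summary) that both Q and P are diagonal Gaussians over the K latent variables, with the integral over u approximated on a grid; all inputs here are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def profnet_style_loss(x_hat, x_true, u, mu_q, var_q, mu_p, var_p, kl_weight=1.0):
    """Reconstruction MSE integrated over u, plus the KL regulariser on the latent GPs."""
    du = u[1] - u[0]
    recon = np.sum((x_hat - x_true) ** 2) * du      # approximates ∫ (x_hat - x)^2 du
    return recon + kl_weight * gaussian_kl(mu_q, var_q, mu_p, var_p)

# Tiny numeric check with made-up values for one (t, h) -> (t', h') pair.
u = np.linspace(0.0, 1.0, 50)
x_true = np.sin(2 * np.pi * u)
x_hat = x_true + 0.1                    # a slightly biased forecast
loss = profnet_style_loss(
    x_hat, x_true, u,
    mu_q=np.zeros(3), var_q=np.ones(3) * 0.5,   # conditional (posterior) latent state
    mu_p=np.zeros(3), var_p=np.ones(3),         # unconditional GP prior at t'
)
print(loss)
```

Because the loss is written for an arbitrary (t, t′) pair, training batches can mix different horizons δ = t′ − t, which is the "lag-free" property noted above.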
The practical applicability of ProFnet was evaluated using Japan's prefectural mortality rate data (an HDFTS dataset with H=47) and simulation studies (2503.21585).
Japan Mortality Rate Forecasting
- Predictive Accuracy: ProFnet demonstrated superior point forecast accuracy compared to several baseline methods, including Functional Linear Models (FLM), Neural Operators (NOP), Unified Functional Time Series (UFTS), Multivariate Functional Time Series (MFTS), and Multivariate Functional Locally Stationary Time Series (MFLTS). Using the mean of the Monte Carlo samples (ProFnet-mean) consistently yielded the lowest Mean Squared Forecast Error (MSFE) across forecast horizons of δ = 1, 5, and 10 years, with reported improvements in MSFE relative to the next best model.
- Uncertainty Quantification: The probabilistic forecasts generated via Monte Carlo sampling provided reliable prediction intervals. The empirical coverage probabilities for nominal 95% intervals were reported to be high, achieving 98.3% for δ=1, 95.8% for δ=5, and 85.6% for δ=10. While coverage decreased for longer horizons, it remained substantial.
- Computational Efficiency: The feedforward architecture resulted in faster training times compared to RNN-based approaches evaluated, highlighting its scalability benefits for high-dimensional datasets.
- Interpretability: The framework allowed for deriving directional regional associations based on forecast quality metrics (e.g., coverage probability), offering potential insights into spatial dependencies beyond simple geographic proximity.
Simulation Studies
- Confirmed the model's ability to achieve target coverage probabilities in controlled settings.
- Investigated the impact of the number of latent GPs (K). Performance generally improved with increasing K, although potential for overfitting exists if K becomes excessively large.
- Demonstrated favorable scalability characteristics: training time was observed to increase approximately linearly with K, and the model maintained efficient convergence.
Implications for Practice
ProFnet offers several practical advantages for modeling and forecasting HDFTS:
- Scalability: Its feedforward structure makes it computationally more tractable than recurrent architectures, especially when the number of spatial units (H) is large. This is crucial for many real-world HDFTS applications (e.g., environmental monitoring, regional economics, public health).
- Unified Modeling: It jointly models temporal dynamics and spatial dependencies within a single framework, avoiding the need to build separate models for each spatial unit or complex multi-stage modeling pipelines.
- Probabilistic Forecasts: The integration of GPs and Monte Carlo sampling provides a principled mechanism for quantifying forecast uncertainty. This yields prediction intervals and distributional forecasts, which are essential for risk assessment and decision-making under uncertainty.
- Flexibility: The model can inherently handle nonstationary data without explicit preprocessing steps like detrending, and its "lag-free" training capability enhances efficiency when forecasts across multiple horizons are required.
In conclusion, ProFnet provides a valuable tool for practitioners dealing with complex HDFTS data. By combining the representation learning capabilities of deep neural networks with the probabilistic inference strengths of Gaussian Processes, it delivers accurate point forecasts while also offering robust uncertainty quantification, addressing key limitations of previous methods in this domain. Its scalability and flexibility further enhance its applicability to large-scale, real-world problems.