Anisotropic Heatmap Regression
- Anisotropic Heatmap Regression is a framework that approximates spatially varying functions by summing anisotropic multivariate splats, enhancing local adaptivity and interpretability.
- The method employs parameterized anisotropic Gaussian bumps with positive-definite covariance matrices, optimized via Wasserstein–Fisher–Rao gradient flows for precise function fitting.
- Empirical results on 1D and 2D tasks show lower error than traditional baselines (wavelet interpolation, MLPs, Kolmogorov–Arnold networks, RBF methods, and PINNs) while preserving geometric interpretability.
Anisotropic heatmap regression is a regression framework in which the target is a spatially varying function (commonly a heatmap) and the predictor is modeled as a sum of anisotropic multivariate “splats”: parametric bump functions, such as anisotropic Gaussians, each with its own orientation and scale controlled by a positive-definite covariance matrix. This approach, formalized in the Splat Regression Model (SRM) and optimized via gradient flows in the Wasserstein–Fisher–Rao (WFR) metric geometry, yields locally adaptive, interpretable, and highly expressive function approximations that are particularly effective in low-dimensional settings (Daniels et al., 18 Nov 2025).
1. Anisotropic Splat Primitives
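Let $\Omega \subseteq \mathbb{R}^d$ denote the input domain. The basic primitive is the anisotropic splat function $\kappa_{\mu,A}$, defined as the push-forward of an isotropic “mother” density $\kappa$ (often the standard Gaussian) under the affine map $z \mapsto \mu + Az$:
$$\kappa_{\mu,A}(x) = \frac{1}{|\det A|}\,\kappa\big(A^{-1}(x-\mu)\big).$$
For the Gaussian mother density,
$$\kappa(z) = (2\pi)^{-d/2}\exp\big(-\tfrac12\|z\|^2\big),$$
leading to the explicit multivariate Gaussian form
$$\kappa_{\mu,A}(x) = (2\pi)^{-d/2}\det(\Sigma)^{-1/2}\exp\big(-\tfrac12(x-\mu)^\top\Sigma^{-1}(x-\mu)\big), \qquad \Sigma = AA^\top.$$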
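Here, $\mu \in \mathbb{R}^d$ is the center and $\Sigma = AA^\top$ is a positive-definite covariance matrix encoding both local scale (via its eigenvalues) and orientation (via its eigenvectors). Two interchangeable views of $\Sigma$, namely $\Sigma = AA^\top$ for a full-rank $A$, or the spectral decomposition $\Sigma = U\Lambda U^\top$ with $U$ orthogonal and $\Lambda$ diagonal with positive entries, let each splat adapt to local anisotropy.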
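The push-forward definition and the explicit Gaussian form can be checked against each other numerically. The sketch below assumes NumPy and a Gaussian mother density; `splat_pushforward` and `splat_gaussian` are hypothetical helper names. It evaluates the same splat both ways and confirms they agree when $\Sigma = AA^\top$.

```python
import numpy as np


def splat_pushforward(x, mu, A):
    """Splat as the push-forward of N(0, I) under z -> mu + A z (hypothetical helper).

    x: (n, d) query points, mu: (d,) center, A: (d, d) invertible matrix.
    """
    d = mu.shape[0]
    z = np.linalg.solve(A, (x - mu).T).T                   # z = A^{-1}(x - mu), shape (n, d)
    mother = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(z**2, axis=1))
    return mother / abs(np.linalg.det(A))                  # change-of-variables factor 1/|det A|


def splat_gaussian(x, mu, Sigma):
    """The same splat written as the multivariate Gaussian density N(mu, Sigma)."""
    d = mu.shape[0]
    diff = x - mu
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)  # (x-mu)^T Sigma^{-1} (x-mu)
    return (2 * np.pi) ** (-d / 2) / np.sqrt(np.linalg.det(Sigma)) * np.exp(-0.5 * quad)


# Usage: an anisotropic 2D splat evaluated both ways at a few random points.
A = np.array([[2.0, 0.0], [0.5, 0.3]])
mu = np.array([0.1, -0.2])
x = np.random.default_rng(0).normal(size=(5, 2))
print(np.allclose(splat_pushforward(x, mu, A), splat_gaussian(x, mu, A @ A.T)))  # True
```

The two forms agree because $\det(AA^\top)^{1/2} = |\det A|$ and $(x-\mu)^\top(AA^\top)^{-1}(x-\mu) = \|A^{-1}(x-\mu)\|^2$.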
2. Splat Regression Model Architecture
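The Splat Regression Model approximates a target mapping $f: \Omega \to \mathbb{R}^p$ by forming a finite mixture of $k$ anisotropic splats,
$$\hat f(x) = \sum_{i=1}^{k} v_i\,\kappa_{\mu_i,A_i}(x),$$
with
- $v_i \in \mathbb{R}^p$: amplitude (output weight) vector for the $i$-th splat,
- $\mu_i \in \mathbb{R}^d$: center,
- $A_i \in \mathbb{R}^{d\times d}$: anisotropy matrix.
Parameter counting per splat yields $p$ in $v_i$, $d$ in $\mu_i$, and $d^2$ in $A_i$ (or $d(d+1)/2$ for a symmetric parameterization), i.e., $p + d + d^2$ parameters per splat in the unconstrained case.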
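A forward pass of the model and the per-splat parameter count can be sketched as follows. This assumes NumPy and a Gaussian mother density; `srm_predict` and `params_per_splat` are hypothetical names, and the array layout (amplitudes `V` of shape `(k, p)`, centers `Mu` of shape `(k, d)`, anisotropy matrices `A` of shape `(k, d, d)`) is an illustrative choice.

```python
import numpy as np


def srm_predict(x, V, Mu, A):
    """Evaluate f_hat(x) = sum_i v_i * kappa_{mu_i, A_i}(x) with Gaussian splats (hypothetical helper).

    x: (n, d) inputs, V: (k, p) amplitudes, Mu: (k, d) centers, A: (k, d, d) anisotropy matrices.
    Returns an (n, p) array of predictions.
    """
    d = x.shape[1]
    A_inv = np.linalg.inv(A)                                               # (k, d, d)
    Z = np.einsum("kab,knb->kna", A_inv, x[None, :, :] - Mu[:, None, :])  # z = A_i^{-1}(x - mu_i)
    K = np.exp(-0.5 * np.sum(Z**2, axis=-1))                               # (k, n), unnormalized
    K *= (2 * np.pi) ** (-d / 2) / np.abs(np.linalg.det(A))[:, None]       # push-forward factor
    return K.T @ V                                                         # (n, k) @ (k, p) -> (n, p)


def params_per_splat(d, p, symmetric=False):
    """p amplitude entries + d center coordinates + d^2 (or d(d+1)/2) anisotropy entries."""
    return p + d + (d * (d + 1) // 2 if symmetric else d * d)


# Usage: k = 3 random splats in d = 2 with scalar output (p = 1).
rng = np.random.default_rng(0)
k, d, p = 3, 2, 1
V = rng.normal(size=(k, p))
Mu = rng.uniform(-1, 1, size=(k, d))
A = np.eye(d) + 0.1 * rng.normal(size=(k, d, d))
x = rng.uniform(-1, 1, size=(10, d))
print(srm_predict(x, V, Mu, A).shape)                                   # (10, 1)
print(params_per_splat(2, 1), params_per_splat(2, 1, symmetric=True))   # 7 6
```

The prediction is linear in the amplitudes and nonlinear only in the geometry $(\mu_i, A_i)$, which is what lets the amplitudes and the splat shapes be optimized with different step sizes later on.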
3. Parameterization and Anisotropy Constraints
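Positive-definiteness of $\Sigma_i$ is enforced via parameterizations such as:
- unconstrained full-rank $A_i$ (yielding $\Sigma_i = A_iA_i^\top$),
- Cholesky decomposition: $\Sigma_i = L_iL_i^\top$ (with $L_i$ lower-triangular with positive diagonal),
- Eigen-decomposition: $\Sigma_i = U_i\Lambda_iU_i^\top$ with $\Lambda_i = \operatorname{diag}(\lambda_{i1},\dots,\lambda_{id})$ and all $\lambda_{ij} > 0$.
In the eigen-parameterization, unconstrained log-eigenvalues $\log\lambda_{ij}$ parameterize the scale, while rotation matrices $U_i \in SO(d)$ (parameterizable via the Lie algebra $\mathfrak{so}(d)$ of skew-symmetric matrices, e.g., $U_i = \exp(S_i)$ with $S_i^\top = -S_i$) encode orientation. This structure lets each splat stretch and align locally with elongated features in the underlying data.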
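In two dimensions the three parameterizations can be written out directly, each mapping unconstrained real parameters to a positive-definite $\Sigma$. The sketch assumes NumPy, and the helper names are hypothetical. The rotation uses the closed form of the matrix exponential of the skew-symmetric generator $\theta J$ with $J = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$, which is the rotation by angle $\theta$; that closed form only holds for $d = 2$.

```python
import numpy as np


def sigma_full_rank(A):
    """Unconstrained full-rank A -> Sigma = A A^T (positive definite whenever A is invertible)."""
    return A @ A.T


def sigma_cholesky(raw):
    """Three unconstrained reals -> Sigma = L L^T, L lower-triangular with positive diagonal."""
    l11, l21, l22 = raw
    L = np.array([[np.exp(l11), 0.0], [l21, np.exp(l22)]])  # exp keeps the diagonal positive
    return L @ L.T


def sigma_eigen(log_eigs, theta):
    """Log-eigenvalues and a rotation angle -> Sigma = U diag(lambda) U^T with U in SO(2).

    U = exp(theta * J) for the skew-symmetric generator J = [[0, -1], [1, 0]] of so(2),
    which in 2D is the rotation matrix by angle theta.
    """
    c, s = np.cos(theta), np.sin(theta)
    U = np.array([[c, -s], [s, c]])
    return U @ np.diag(np.exp(log_eigs)) @ U.T


# Usage: a splat stretched 2:1 (in standard deviation) along the 30-degree direction.
S = sigma_eigen(np.log([4.0, 1.0]), np.pi / 6)
print(np.linalg.eigvalsh(S))   # approximately [1. 4.]
```

Because every output is positive definite for any real input, these parameterizations can be optimized with unconstrained gradient steps.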
4. Wasserstein–Fisher–Rao Gradient Flow Optimization
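Optimization proceeds in the non-Euclidean space of mixing measures, lifting the splat parameters to an atomic measure
$$\rho = \sum_{i=1}^{k} w_i\,\delta_{\theta_i}, \qquad f_\rho(x) = \int v\,\kappa_{\mu,A}(x)\,\mathrm{d}\rho(v,\mu,A) = \sum_{i=1}^{k} w_i v_i\,\kappa_{\mu_i,A_i}(x),$$
with each atom $\theta_i = (v_i, \mu_i, A_i)$ carrying mass $w_i \ge 0$ (absorbing $w_i$ into $v_i$ recovers $\hat f$). The population loss
$$\mathcal{L}(\rho) = \mathbb{E}_{x\sim P}\big[\ell\big(f_\rho(x), f(x)\big)\big],$$
with $P$ the input distribution and $\ell$, e.g., the squared error $\ell(y, y') = \tfrac12\|y - y'\|^2$, is minimized via the Wasserstein–Fisher–Rao gradient flow, which decomposes tangent directions into a mass-teleportation (Fisher–Rao) component and a transport (Wasserstein) component. The gradients are:
- Fisher–Rao (mass) gradient, the first variation of $\mathcal{L}$ evaluated at atom $\theta_i$:
$$g_i^{\mathrm{FR}} = \frac{\delta\mathcal{L}}{\delta\rho}(\theta_i) = \mathbb{E}_x\big[\langle r(x), v_i\rangle\,\kappa_{\mu_i,A_i}(x)\big],$$
- Wasserstein (parameter) gradients, the gradients of that first variation with respect to the atom parameters:
$$\nabla_{v_i} = \mathbb{E}_x\big[r(x)\,\kappa_{\mu_i,A_i}(x)\big], \quad \nabla_{\mu_i} = \mathbb{E}_x\big[\langle r(x), v_i\rangle\,\nabla_\mu\kappa_{\mu_i,A_i}(x)\big], \quad \nabla_{A_i} = \mathbb{E}_x\big[\langle r(x), v_i\rangle\,\nabla_A\kappa_{\mu_i,A_i}(x)\big],$$
where $r(x) = \nabla_y\ell\big(y, f(x)\big)$ at $y = f_\rho(x)$ (for the squared error, $r(x) = f_\rho(x) - f(x)$). In practice, stochastic gradient steps or particle birth-death schemes are applied over minibatches of inputs $x \sim P$.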
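For Gaussian splats these gradients have closed forms, using $\nabla_\mu\kappa_{\mu,A}(x) = \kappa_{\mu,A}(x)\,A^{-\top}z$ and $\nabla_A\kappa_{\mu,A}(x) = \kappa_{\mu,A}(x)\,A^{-\top}(zz^\top - I)$ with $z = A^{-1}(x-\mu)$. The sketch below assumes NumPy, scalar outputs, the squared-error loss with a factor of one half, and minibatch averages in place of the expectation over $x$. It also assumes one simple explicit discretization of the flow: a multiplicative update $w_i \leftarrow w_i\exp(-\eta\,g_i)$ of each mass, with $g_i$ the Fisher–Rao (mass) gradient, and a plain gradient step on each atom. That discretization is an illustrative choice, not necessarily the scheme used by Daniels et al., and all function names are hypothetical.

```python
import numpy as np


def splat_values(X, Mu, A):
    """Gaussian splat values kappa_i(x_j) (shape (k, n)), z = A_i^{-1}(x_j - mu_i), and A_i^{-1}."""
    d = X.shape[1]
    A_inv = np.linalg.inv(A)                                      # (k, d, d)
    Z = np.einsum("kab,knb->kna", A_inv, X[None] - Mu[:, None])  # (k, n, d)
    norm = (2 * np.pi) ** (-d / 2) / np.abs(np.linalg.det(A))    # (k,)
    return norm[:, None] * np.exp(-0.5 * (Z**2).sum(-1)), Z, A_inv


def loss(X, y, w, v, Mu, A):
    """Minibatch estimate of E[0.5 * (f_rho(x) - y)^2] with f_rho = sum_i w_i v_i kappa_i."""
    K, _, _ = splat_values(X, Mu, A)
    return 0.5 * np.mean(((w * v) @ K - y) ** 2)


def wfr_gradients(X, y, w, v, Mu, A):
    """Fisher-Rao gradient (first variation at each atom) and Wasserstein gradients in (v, mu, A)."""
    n = len(y)
    K, Z, A_inv = splat_values(X, Mu, A)
    r = (w * v) @ K - y                       # residual r(x_j) = f_rho(x_j) - y_j
    c = r[None, :] * K                        # r(x_j) * kappa_i(x_j), shape (k, n)
    Az = np.einsum("kba,knb->kna", A_inv, Z)  # A_i^{-T} z_ij
    g_fr = v * c.mean(1)                      # E[r * v_i * kappa_i]
    g_v = c.mean(1)                           # E[r * kappa_i]
    g_mu = v[:, None] * np.einsum("kna,kn->ka", Az, c) / n
    g_A = v[:, None, None] * (np.einsum("kna,knb,kn->kab", Az, Z, c) / n
                              - c.mean(1)[:, None, None] * A_inv.transpose(0, 2, 1))
    return g_fr, g_v, g_mu, g_A


def wfr_step(X, y, w, v, Mu, A, lr=1e-2):
    """One explicit step: multiplicative (Fisher-Rao) mass update, gradient (Wasserstein) atom update."""
    g_fr, g_v, g_mu, g_A = wfr_gradients(X, y, w, v, Mu, A)
    return w * np.exp(-lr * g_fr), v - lr * g_v, Mu - lr * g_mu, A - lr * g_A
```

A convenient sanity check: the derivative of `loss` with respect to a mass $w_i$ equals the Fisher–Rao gradient, while the derivatives with respect to $v_i$, $\mu_i$, and $A_i$ equal $w_i$ times the corresponding Wasserstein gradient, so all four can be compared against finite differences.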
5. Workflow for Anisotropic Heatmap Regression
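When the regression target is a heatmap $H$ sampled on a 2D grid $\{x_j\}_{j=1}^{N} \subset \mathbb{R}^2$, the workflow comprises:
a) Select a mother splat $\kappa$ (typically the 2D standard Gaussian).
b) Initialize $k$ splats as small isotropic Gaussians on a grid of centers $\mu_i$, with initial $A_i = \sigma_0 I_2$ (so $\Sigma_i = \sigma_0^2 I_2$) for a small $\sigma_0 > 0$, and small initial amplitudes $v_i$.
c) Specify the loss $\mathcal{L} = \frac{1}{2N}\sum_{j=1}^{N}\big(\hat f(x_j) - H(x_j)\big)^2$.
d) Compute the errors $r_j = \hat f(x_j) - H(x_j)$ and form the WFR gradients from Section 4 via Monte-Carlo minibatching.
e) Update the amplitudes, centers, anisotropy matrices, and atom masses via Adam or SGD on the combined WFR gradients.
f) Read off the learnt covariance $\Sigma_i = A_iA_i^\top$, which encodes local anisotropic scaling: large eigenvalues elongate the splat along the corresponding eigenvectors, aligning it with elongated heatmap features.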
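The workflow can be sketched end to end on a synthetic heatmap. Every specific choice here is an assumption for illustration: the target (an elongated Gaussian ridge on a $32\times32$ grid over $[-1,1]^2$), the $4\times4$ grid of $k = 16$ initial splats with $\sigma_0 = 0.3$, zero initial amplitudes, the minibatch size, and the per-group step sizes. It uses NumPy only, and plain minibatch SGD for the atoms plus a multiplicative Fisher–Rao update for the masses, rather than Adam.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: an elongated Gaussian ridge at 30 degrees on a 32x32 grid over [-1, 1]^2.
g = np.linspace(-1.0, 1.0, 32)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)            # grid points x_j, (1024, 2)
t = np.pi / 6
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
P = R @ np.diag([1 / 0.5, 1 / 0.02]) @ R.T                          # precision matrix of the ridge
H = np.exp(-0.5 * np.einsum("na,ab,nb->n", X, P, X))               # heatmap values H(x_j)

# (a, b) Gaussian mother splat; 16 small isotropic splats on a 4x4 grid with zero amplitudes.
centers = np.linspace(-0.75, 0.75, 4)
Mu = np.stack(np.meshgrid(centers, centers), axis=-1).reshape(-1, 2)
k = len(Mu)
A = np.tile(0.3 * np.eye(2), (k, 1, 1))                             # Sigma_i = 0.09 * I
v = np.zeros(k)
w = np.full(k, 1.0 / k)                                             # atom masses


def forward(Xb, w, v, Mu, A):
    """Predictions sum_i w_i v_i kappa_i(x) plus the intermediates needed for the gradients."""
    A_inv = np.linalg.inv(A)
    Z = np.einsum("kab,knb->kna", A_inv, Xb[None] - Mu[:, None])
    K = np.exp(-0.5 * (Z**2).sum(-1)) / (2 * np.pi * np.abs(np.linalg.det(A)))[:, None]
    return (w * v) @ K, K, Z, A_inv


def full_loss(w, v, Mu, A):
    """(c) L = 1/(2N) * sum_j (f_hat(x_j) - H(x_j))^2 over the whole grid."""
    return 0.5 * np.mean((forward(X, w, v, Mu, A)[0] - H) ** 2)


loss0 = full_loss(w, v, Mu, A)
lr_v, lr_geo, lr_w = 5.0, 0.03, 0.1           # hypothetical step sizes per parameter group
for _ in range(300):
    idx = rng.choice(len(X), size=256, replace=False)    # (d) Monte-Carlo minibatch
    f, K, Z, A_inv = forward(X[idx], w, v, Mu, A)
    cw = (f - H[idx])[None] * K                          # r_j * kappa_i(x_j)
    Az = np.einsum("kba,knb->kna", A_inv, Z)             # A_i^{-T} z
    g_fr = v * cw.mean(1)
    g_v = cw.mean(1)
    g_mu = v[:, None] * np.einsum("kna,kn->ka", Az, cw) / len(idx)
    g_A = v[:, None, None] * (np.einsum("kna,knb,kn->kab", Az, Z, cw) / len(idx)
                              - cw.mean(1)[:, None, None] * A_inv.transpose(0, 2, 1))
    # (e) multiplicative Fisher-Rao step on the masses, SGD (Wasserstein) step on each atom
    w = w * np.exp(-lr_w * g_fr)
    v, Mu, A = v - lr_v * g_v, Mu - lr_geo * g_mu, A - lr_geo * g_A

print(f"full-grid loss: {loss0:.4f} -> {full_loss(w, v, Mu, A):.4f}")

# (f) eigendecomposition of each learnt Sigma_i = A_i A_i^T (axis variances and directions)
eig_vals, eig_vecs = np.linalg.eigh(A @ A.transpose(0, 2, 1))
```

The step sizes are separated by parameter group because the amplitude, geometry, and mass directions have quite different curvature; an adaptive optimizer such as Adam, as in step e), removes most of this manual tuning.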
Performance is tracked via held-out mean-squared error, and qualitative assessment is aided by drawing each splat as the ellipse $\{x : (x-\mu_i)^\top\Sigma_i^{-1}(x-\mu_i) = 1\}$ to examine its alignment with heatmap structures.
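The ellipse overlay and the held-out error can be computed directly from each learnt center and covariance. In this NumPy sketch (hypothetical helper names), the unit level set of a splat is traced as $\mu_i + U_i\Lambda_i^{1/2}(\cos t, \sin t)^\top$ from the eigendecomposition $\Sigma_i = U_i\Lambda_iU_i^\top$, and the semi-axes and orientation are read off the same decomposition; the returned points can be passed to any plotting library.

```python
import numpy as np


def splat_ellipse(mu, Sigma, n=100):
    """Points on the level set (x - mu)^T Sigma^{-1} (x - mu) = 1 of a 2D splat, shape (n, 2)."""
    lam, U = np.linalg.eigh(Sigma)                       # Sigma = U diag(lam) U^T, lam ascending
    t = np.linspace(0.0, 2 * np.pi, n)
    circle = np.stack([np.cos(t), np.sin(t)])            # unit circle, shape (2, n)
    return (mu[:, None] + U @ (np.sqrt(lam)[:, None] * circle)).T


def splat_axes(Sigma):
    """Semi-major axis, semi-minor axis, and major-axis angle in degrees within [0, 180)."""
    lam, U = np.linalg.eigh(Sigma)
    angle = np.degrees(np.arctan2(U[1, 1], U[0, 1])) % 180.0  # column 1 = largest eigenvalue
    return float(np.sqrt(lam[1])), float(np.sqrt(lam[0])), float(angle)


def heldout_mse(pred, target):
    """Mean-squared error on held-out grid points."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))


# Usage: a splat with variances 4 and 1 whose major axis points at 30 degrees.
R = np.array([[np.cos(np.pi / 6), -np.sin(np.pi / 6)], [np.sin(np.pi / 6), np.cos(np.pi / 6)]])
Sigma = R @ np.diag([4.0, 1.0]) @ R.T
print(splat_axes(Sigma))    # approximately (2.0, 1.0, 30.0)
```

Taking the angle modulo 180 degrees removes the sign ambiguity of eigenvectors, so the reported orientation is stable across runs.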
6. Empirical Performance and Comparative Analysis
Empirical results indicate that anisotropic splat models offer substantial benefits on low-dimensional approximation and regression tasks:
- In a 1D multiscale interpolation problem, a splat model learns an adaptive interpolation grid, outperforming Haar-wavelet interpolation and matching Chebyshev methods on nonuniform domains.
- On a 2D regression task with anisotropic splats, models achieve an order of magnitude lower error than multilayer perceptrons (MLPs) or Kolmogorov–Arnold networks of comparable size by leveraging local orientational adaptation.
- On physics-informed regression (e.g., interfaces of the Allen–Cahn equation), anisotropic splat models fit boundary layers and curved interfaces more accurately and with fewer parameters than isotropic radial basis function (RBF) methods or standard physics-informed neural networks (PINNs) (Daniels et al., 18 Nov 2025).
A plausible implication is that the learned anisotropic parameters confer model capacity that remains interpretable and resistant to over-parameterization in low dimensions.
7. Interpretability, Adaptivity, and Applications
Anisotropic heatmap regression via Splat Regression Models yields weighted sums of ellipsoidal bump functions with learnable centers, amplitudes, and anisotropy matrices. End-to-end learning with WFR gradients preserves interpretability: each splat models a localized structure with explicit geometric meaning in the input domain. Visualizing splats as ellipses shows how the model aligns with and adapts to structured regions of the data, especially where the phenomena are elongated, curved, or otherwise anisotropic. This enables flexible and accurate solutions to diverse approximation, estimation, and inverse problems in which local adaptivity and geometric structure are paramount (Daniels et al., 18 Nov 2025).