Anisotropic Heatmap Regression

Updated 11 February 2026
  • Anisotropic Heatmap Regression is a framework that approximates spatially varying functions by summing anisotropic multivariate splats, enhancing local adaptivity and interpretability.
  • The method employs parameterized anisotropic Gaussian bumps with positive-definite covariance matrices, optimized via Wasserstein–Fisher–Rao gradient flows for precise function fitting.
  • Empirical results demonstrate that this approach outperforms traditional techniques in 1D and 2D tasks by reducing error and preserving geometric interpretability.

Anisotropic heatmap regression is a regression framework in which the target is a spatially varying function (commonly a heatmap) and the predictor is modeled as a sum of anisotropic multivariate “splats”: parametric bump functions such as anisotropic Gaussians, each with its own orientation and scale controlled by a positive-definite covariance matrix. This approach, formalized in the Splat Regression Model (SRM) and optimized via gradient flows in the Wasserstein–Fisher–Rao (WFR) metric geometry, yields locally adaptive, interpretable, and highly expressive function approximations that are particularly effective in low-dimensional settings (Daniels et al., 18 Nov 2025).

1. Anisotropic Splat Primitives

Let $x \in \mathbb{R}^d$ denote a point of the input domain. The basic primitive is the anisotropic splat function $\varphi(x;\mu,\Sigma)$, defined as the push-forward of an isotropic “mother” density $\rho(z)$ (often the standard Gaussian) under an affine transformation:

$$\varphi(x;\mu,\Sigma) = |\det \Sigma|^{-1/2} \,\rho\big(\Sigma^{-1/2}(x-\mu)\big)$$

For the Gaussian mother density,

$$\rho(z) = (2\pi)^{-d/2} \exp(-\|z\|^2/2)$$

leading to the explicit multivariate Gaussian form

$$\varphi(x;\mu,\Sigma) = (2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\big(-\tfrac12(x-\mu)^\top \Sigma^{-1}(x-\mu)\big)$$

Here, $\mu \in \mathbb{R}^d$ is the center and $\Sigma \in \mathbb{S}_+^d$ is a positive-definite covariance matrix encoding both local scale (via eigenvalues) and orientation (via eigenvectors). Flexible factorizations, either $\Sigma = A A^\top$ for full-rank $A$ or the spectral decomposition $\Sigma = R \Lambda R^\top$ with $R \in SO(d)$ and diagonal $\Lambda$, enable local adaptation to anisotropy.
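As an illustration of the explicit Gaussian form, the following minimal sketch evaluates a 2D splat from the spectral parameters of $\Sigma$. The function name `gaussian_splat_2d` and the choice to describe $R$ by a single angle `theta` are assumptions made for this example, not notation from the source.

```python
import math

def gaussian_splat_2d(x, mu, theta, lam1, lam2):
    """Evaluate phi(x; mu, Sigma) for Sigma = R diag(lam1, lam2) R^T in 2D.

    theta is the rotation angle of R; lam1, lam2 > 0 are the eigenvalues of Sigma.
    """
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates of (x - mu) in the eigenbasis of Sigma, i.e. R^T (x - mu).
    u = c * dx + s * dy
    v = -s * dx + c * dy
    # Mahalanobis form (x - mu)^T Sigma^{-1} (x - mu).
    quad = u * u / lam1 + v * v / lam2
    # |Sigma|^{-1/2} = (lam1 * lam2)^{-1/2}; (2 pi)^{-d/2} = 1 / (2 pi) for d = 2.
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(lam1 * lam2))

# A splat elongated along the 45-degree direction (illustrative values).
theta = math.pi / 4
r = 0.5
along = gaussian_splat_2d((r * math.cos(theta), r * math.sin(theta)), (0, 0), theta, 1.0, 0.01)
across = gaussian_splat_2d((-r * math.sin(theta), r * math.cos(theta)), (0, 0), theta, 1.0, 0.01)
```

At equal distance from the center, the value along the long axis (`along`) is much larger than the value across it (`across`), which is the anisotropy encoded by the eigenvalues and eigenvectors of $\Sigma$.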

2. Splat Regression Model Architecture

The Splat Regression Model approximates a target mapping $f^\star : \mathbb{R}^d \to \mathbb{R}^p$ by forming a finite mixture of $k$ anisotropic splats,

$$f(x) = \sum_{i=1}^k w_i\,\varphi(x;\mu_i,\Sigma_i)$$

with

  • $w_i \in \mathbb{R}^p$: amplitude (output weight) vector for the $i$-th splat,
  • $\mu_i \in \mathbb{R}^d$: center,
  • $\Sigma_i \in \mathbb{S}_+^d$: anisotropy matrix.

Parameter counting per splat yields $p$ in $w_i$, $d$ in $\mu_i$, and $d^2$ in the factor $A_i$ of $\Sigma_i = A_iA_i^\top$ (or $d(d+1)/2$ for a symmetric parameterization).
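The mixture and the parameter count can be sketched in a few lines for $d = 2$. This is a minimal illustration, assuming the Gaussian mother density and the factor form $\Sigma_i = A_iA_i^\top$; the names `splat_from_A`, `srm_predict`, and `params_per_splat` are hypothetical helpers, not part of any published implementation.

```python
import math

def splat_from_A(x, mu, A):
    """Push-forward of the 2D standard Gaussian under z -> A z + mu, evaluated at x."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21  # must be non-zero (A full rank)
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # z = A^{-1} (x - mu), using the closed-form 2x2 inverse.
    z1 = (a22 * dx - a12 * dy) / det
    z2 = (-a21 * dx + a11 * dy) / det
    return math.exp(-0.5 * (z1 * z1 + z2 * z2)) / (2.0 * math.pi * abs(det))

def srm_predict(x, splats):
    """f(x) = sum_i w_i * phi(x; mu_i, A_i A_i^T); each w_i is a length-p list."""
    p = len(splats[0][0])
    out = [0.0] * p
    for w, mu, A in splats:
        phi = splat_from_A(x, mu, A)
        for j in range(p):
            out[j] += w[j] * phi
    return out

def params_per_splat(p, d, symmetric=False):
    """Parameter count per splat: p (weights) + d (center) + d^2 or d(d+1)/2."""
    return p + d + (d * (d + 1) // 2 if symmetric else d * d)

# Two splats with scalar output (p = 1); all values are illustrative.
splats = [
    ([2.0], (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    ([-1.0], (1.0, 0.5), ((0.5, 0.2), (0.0, 0.1))),
]
value = srm_predict((0.2, 0.1), splats)
```

For $p = 1$ and $d = 2$, `params_per_splat` gives 7 parameters per splat with a full factor $A_i$ and 6 with a symmetric parameterization.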

3. Parameterization and Anisotropy Constraints

Positive-definiteness of $\Sigma_i$ is enforced via parameterizations such as:

  • $A_i$ unconstrained full-rank (yielding $\Sigma_i=A_iA_i^\top$),
  • Cholesky decomposition: $\Sigma_i=L_iL_i^\top$ (with $L_i$ lower-triangular),
  • Eigen-decomposition: $\Sigma_i=R_i\operatorname{diag}(\lambda_{i1},\ldots,\lambda_{id})R_i^\top$ with all $\lambda_{ij}>0$.

In the eigen-parameterization, unconstrained log-eigenvalues parameterize the scale, while rotation matrices $R_i$ (in $SO(d)$, parameterizable via the Lie algebra) encode orientation. This structure enables each splat to locally stretch and align with elongated features in the underlying data.
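The Cholesky and eigen-decomposition options can be sketched for $d = 2$ as maps from unconstrained real numbers to a positive-definite $\Sigma$. Exponentiating the Cholesky diagonal is an assumption of this sketch (the source only requires $L_i$ to be lower-triangular), and both function names are hypothetical.

```python
import math

def sigma_from_cholesky(a, b, c):
    """Map unconstrained (a, b, c) to a 2x2 positive-definite Sigma = L L^T.

    L = [[exp(a), 0], [b, exp(c)]]; the exponential keeps the diagonal positive,
    which is an assumption of this sketch rather than something the source specifies.
    """
    l11, l21, l22 = math.exp(a), b, math.exp(c)
    return ((l11 * l11, l11 * l21),
            (l11 * l21, l21 * l21 + l22 * l22))

def sigma_from_eigen(theta, log_lam1, log_lam2):
    """Sigma = R diag(exp(log_lam1), exp(log_lam2)) R^T with R a rotation by theta."""
    lam1, lam2 = math.exp(log_lam1), math.exp(log_lam2)
    c, s = math.cos(theta), math.sin(theta)
    off_diag = c * s * (lam1 - lam2)
    return ((c * c * lam1 + s * s * lam2, off_diag),
            (off_diag, s * s * lam1 + c * c * lam2))

# Any real inputs give a valid covariance (illustrative values).
sigma_chol = sigma_from_cholesky(-0.5, 1.3, 0.2)
sigma_eig = sigma_from_eigen(0.7, math.log(4.0), math.log(0.25))
```

Because every real input maps to a valid covariance, a generic unconstrained optimizer can update these parameters without a projection step.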

4. Wasserstein–Fisher–Rao Gradient Flow Optimization

Optimization proceeds in the non-Euclidean space of mixing measures, lifting the splat parameters $\{w_i,\mu_i,\Sigma_i\}_{i=1}^k$ to an atomic measure

$$\mu\in \mathcal{P}(\mathbb{R}^p\times \rho(\mathbb{R}^d))$$

with each atom $(w_i, \rho_{A_i, b_i})$. From here on, $\mu$ denotes this mixing measure, the center of the $i$-th splat (written $\mu_i$ above) is written $b_i$, and $\rho_{A_i,b_i} = \varphi(\cdot\,; b_i, A_iA_i^\top)$ is the $i$-th splat. The population loss,

$$F(f_\mu) = \mathbb{E}_{x\sim \pi}\big[L(f_\mu(x), y(x))\big]$$

(where LL is e.g., squared error), is minimized via the Wasserstein–Fisher–Rao gradient flow, decomposing tangent directions into a mass-teleportation (Fisher–Rao) and a transport (Wasserstein) component. The gradients are:

  • Fisher–Rao (mass) gradient:

$$\delta^{FR}F(\mu)(w_i,A_i,b_i) = \int \langle \delta F(x),w_i \rangle\,\rho_{A_i, b_i}(x)\, \pi(dx) - \mathbb{E}_j \int \langle \delta F(x),w_j \rangle\,\rho_{A_j, b_j}(x)\, \pi(dx)$$

  • Wasserstein (parameter) gradients:

$$\begin{aligned} \partial_{w}F &= \int \delta F(x)\, \rho_{A_i, b_i}(x)\, \pi(dx) \\ \partial_{A}F &= -\int \langle \delta F(x), w_i \rangle \left[ I + \nabla_x \log \rho_{A_i, b_i}(x)\, (x-b_i)^\top \right] A_i^{-\top}\, \rho_{A_i, b_i}(x)\, \pi(dx) \\ \partial_{b}F &= -\int \langle \delta F(x), w_i \rangle\, \nabla_x \log \rho_{A_i, b_i}(x)\, \rho_{A_i, b_i}(x)\, \pi(dx) \end{aligned}$$

where $\delta F(x) = \partial L / \partial f$ evaluated at $f_\mu$. In practice, stochastic gradient steps or particle birth-death schemes are applied over minibatches of $x \sim \pi$.
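The weight and center gradients can be sanity-checked numerically. The sketch below does this in 1D with a Gaussian mother density and squared-error loss, replacing the integral against $\pi$ by a mean over a uniform grid and comparing against central finite differences. The target function, parameter values, and helper names are illustrative assumptions; the scale gradient is left out to keep the example short.

```python
import math

def rho(x, a, b):
    """1D Gaussian splat rho_{a,b}(x) = |a|^{-1} rho((x - b) / a)."""
    z = (x - b) / a
    return math.exp(-0.5 * z * z) / (abs(a) * math.sqrt(2.0 * math.pi))

def model(x, params):
    """f(x) = sum_i w_i * rho_{a_i, b_i}(x) for params given as (w, a, b) triples."""
    return sum(w * rho(x, a, b) for w, a, b in params)

def loss(params, xs, ys):
    """Empirical squared-error loss F = (1 / 2N) * sum_x (f(x) - y(x))^2."""
    return sum((model(x, params) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def grads_w_b(params, xs, ys, i):
    """Closed-form dF/dw_i and dF/db_i, with the integral replaced by a grid mean."""
    w, a, b = params[i]
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        delta = model(x, params) - y   # delta F(x) = dL/df for squared error
        r = rho(x, a, b)
        dlog = -(x - b) / (a * a)      # d/dx log rho_{a,b}(x) for a Gaussian
        gw += delta * r
        gb += -delta * w * dlog * r
    return gw / len(xs), gb / len(xs)

xs = [j / 40 for j in range(41)]
ys = [math.sin(2 * math.pi * x) for x in xs]
params = [(0.3, 0.2, 0.25), (-0.2, 0.15, 0.7)]  # (w_i, a_i, b_i), arbitrary values

def finite_diff(i, slot, eps=1e-6):
    """Central finite difference of the loss in one parameter of splat i."""
    up = [list(p) for p in params]
    down = [list(p) for p in params]
    up[i][slot] += eps
    down[i][slot] -= eps
    return (loss(up, xs, ys) - loss(down, xs, ys)) / (2 * eps)

gw, gb = grads_w_b(params, xs, ys, 0)
fd_w, fd_b = finite_diff(0, 0), finite_diff(0, 2)
```

The pairs `(gw, fd_w)` and `(gb, fd_b)` should agree up to finite-difference error, which confirms the sign convention of the center gradient.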

5. Workflow for Anisotropic Heatmap Regression

When the regression target is a heatmap $y : [0,1]^2 \to \mathbb{R}$ sampled on a 2D grid, the workflow comprises:

  a) Select a mother splat $\rho$ (typically the 2D standard Gaussian).
  b) Initialize $k$ splats as small isotropic Gaussians on a grid of centers $b_i$, with initial $w_i\approx 0$ and $A_i\approx \alpha I$.
  c) Specify the loss: $F=\tfrac12\sum_{x \in \mathrm{grid}}(f_\mu(x) - y(x))^2$.
  d) Compute the error $\delta F(x)=f_\mu(x)-y(x)$ and form the above gradients via Monte-Carlo minibatching.
  e) Update $\{w_i, b_i, A_i\}$ via Adam or SGD on the combined WFR gradients.
  f) The learnt covariance $\Sigma_i=A_iA_i^\top$ encodes local anisotropic scaling: large eigenvalues elongate the splat, aligning it with elongated heatmap features.
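A deliberately reduced sketch of steps (b) to (e) is given below. It trains only the amplitudes $w_i$, keeps the centers and $A_i = \alpha I$ fixed, uses the full grid instead of minibatches, and substitutes plain gradient descent for Adam or SGD on the combined WFR gradients. The toy target, grid size, step size, and all names are assumptions chosen for illustration.

```python
import math

def splat(x, b, alpha):
    """Isotropic 2D Gaussian splat with A = alpha * I, centered at b."""
    r2 = (x[0] - b[0]) ** 2 + (x[1] - b[1]) ** 2
    return math.exp(-0.5 * r2 / alpha ** 2) / (2.0 * math.pi * alpha ** 2)

# Toy heatmap on a 16x16 grid over [0,1]^2: a ridge elongated along the x-axis.
n = 16
grid = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
target = [math.exp(-0.5 * (((x - 0.5) / 0.3) ** 2 + ((y - 0.5) / 0.08) ** 2))
          for x, y in grid]

# (b) 3x3 grid of centers, small isotropic splats, zero initial amplitudes.
centers = [(cx, cy) for cx in (1 / 6, 1 / 2, 5 / 6) for cy in (1 / 6, 1 / 2, 5 / 6)]
alpha = 0.15
weights = [0.0] * len(centers)
# Splat values on the grid are fixed here because only the weights are trained.
basis = [[splat(x, b, alpha) for x in grid] for b in centers]

def residual(weights):
    """(d) delta F(x) = f(x) - y(x) at every grid point."""
    return [sum(w * phi[m] for w, phi in zip(weights, basis)) - target[m]
            for m in range(len(grid))]

def loss(weights):
    """(c) F = 1/2 * sum over the grid of squared residuals."""
    return 0.5 * sum(r * r for r in residual(weights))

initial_loss = loss(weights)
lr = 2e-4  # small fixed step chosen by hand for this toy problem
for _ in range(200):
    delta = residual(weights)
    # (e) dF/dw_i = sum_x delta F(x) * rho_i(x); plain gradient descent step.
    grad = [sum(d * p for d, p in zip(delta, phi)) for phi in basis]
    weights = [w - lr * g for w, g in zip(weights, grad)]
final_loss = loss(weights)
```

With the shapes frozen the problem is a linear least-squares fit, so the loss decreases but the isotropic splats cannot match the elongated ridge well; updating $b_i$ and $A_i$ as in step (e) is what lets the splats stretch along it.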

Performance is tracked via held-out mean-squared error, and qualitative assessment is aided by visualizing the ellipses $\{x : (x-b_i)^\top\Sigma_i^{-1}(x-b_i)=c\}$ to examine alignment with heatmap structures.
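For plotting, each ellipse is determined by the eigen-decomposition of $\Sigma_i$: the semi-axes have lengths $\sqrt{c\,\lambda}$ along the corresponding eigenvectors. A minimal sketch for the 2D case follows, using the closed-form eigenvalues of a symmetric $2\times 2$ matrix; the function name `ellipse_from_sigma` is hypothetical.

```python
import math

def ellipse_from_sigma(sigma, c=1.0):
    """Semi-axes and orientation of {x : (x-b)^T Sigma^{-1} (x-b) = c} for 2x2 Sigma.

    Returns (major, minor, angle) with angle in radians, measured from the
    x-axis to the major axis. Sigma is assumed symmetric positive-definite.
    """
    (s11, s12), (_, s22) = sigma
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (s11 + s22)
    half_gap = math.hypot(0.5 * (s11 - s22), s12)
    lam_max, lam_min = mean + half_gap, mean - half_gap
    # Direction of the eigenvector belonging to lam_max.
    angle = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    return math.sqrt(c * lam_max), math.sqrt(c * lam_min), angle

# Covariance with eigenvalues 4 and 1, long axis along the 45-degree direction.
major, minor, angle = ellipse_from_sigma(((2.5, 1.5), (1.5, 2.5)))
```

The returned triple can be passed to any plotting routine that draws an ellipse from its center $b_i$, two semi-axes, and a rotation angle.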

6. Empirical Performance and Comparative Analysis

Empirical results indicate that anisotropic splat models offer substantial benefits on low-dimensional approximation and regression tasks:

  • In a 1D multiscale interpolation problem, a splat model with $k=30$ learns an adaptive interpolation grid, outperforming Haar-wavelet interpolation and matching Chebyshev methods on nonuniform domains.
  • On a 2D regression task with $f(x,y)=\sin(3\pi \sqrt{x})\cos(3\pi y)$ and $k=100$ anisotropic splats, models achieve an order of magnitude lower error than comparably sized multilayer perceptrons (MLPs) or Kolmogorov–Arnold networks by leveraging local orientational adaptation.
  • On physics-informed regression (e.g., Allen–Cahn equation interfaces on $[0,1]^2$), anisotropic splat models fit boundary layers and curved interfaces more accurately and with fewer parameters than isotropic radial basis function (RBF) methods or standard physics-informed neural networks (PINNs) (Daniels et al., 18 Nov 2025).

A plausible implication is that the learned anisotropic parameters confer model capacity that remains interpretable and resistant to over-parameterization in low dimensions.

7. Interpretability, Adaptivity, and Applications

Anisotropic heatmap regression via Splat Regression Models yields weighted sums of ellipsoidal bump functions, with learnable centers, amplitudes, and anisotropy matrices. WFR-gradient-based end-to-end learning preserves interpretability: each splat models a localized structure with explicit geometric meaning in $\Sigma_i$. Visualization of splats as ellipses elucidates how the model aligns and adapts to structured regions of the data, especially in cases exhibiting elongated, curved, or otherwise anisotropic phenomena. This approach enables flexible and accurate solutions to diverse approximation, estimation, and inverse problems where local adaptivity and geometric structure are paramount (Daniels et al., 18 Nov 2025).

References (1)
