Graph and Simplicial Complex Prediction Gaussian Process via the Hodgelet Representations

Published 16 May 2025 in cs.LG and cs.AI (arXiv:2505.10877v1)

Abstract: Predicting the labels of graph-structured data is crucial in scientific applications and is often achieved using graph neural networks (GNNs). However, when data is scarce, GNNs suffer from overfitting, leading to poor performance. Recently, Gaussian processes (GPs) with graph-level inputs have been proposed as an alternative. In this work, we extend the Gaussian process framework to simplicial complexes (SCs), enabling the handling of edge-level attributes and attributes supported on higher-order simplices. We further augment the resulting SC representations by considering their Hodge decompositions, allowing us to account for homological information, such as the number of holes, in the SC. We demonstrate that our framework enhances the predictions across various applications, paving the way for GPs to be more widely used for graph and SC-level predictions.

Summary

  • The paper introduces a novel Gaussian Process framework built on Hodgelet representations for topology-aware prediction on graphs and simplicial complexes.
  • It leverages wavelet transforms and Hodge decomposition to construct robust multi-scale features that outperform conventional GNNs in data-scarce scenarios.
  • Experimental results on synthetic and real datasets demonstrate superior performance in tasks such as vector field classification and molecular property prediction.

This paper, "Graph and Simplicial Complex Prediction Gaussian Process via the Hodgelet Representations" (2505.10877), introduces a novel approach for predicting labels on graph-structured data and on its generalization, simplicial complexes (SCs), particularly addressing the challenge of scarce data, where traditional Graph Neural Networks (GNNs) may struggle with overfitting. The core idea is to leverage Gaussian Processes (GPs) combined with rich, topology-aware representations of the input structures, which the authors term "Hodgelet representations".

The paper motivates this approach by highlighting the limitations of existing methods: GNNs require large datasets and can be difficult to interpret, while many graph kernels struggle with scalability, high-dimensional attributes, or fixed-size inputs. GPs, conversely, are data-efficient, non-parametric, and provide uncertainty estimates, making them suitable for small-to-medium data regimes. The challenge for GPs is constructing a robust representation of the graph or SC input that captures both structural and attribute information.

The proposed solution extends the use of graph signal processing techniques, specifically the wavelet transform, to simplicial complexes and integrates it with the Hodge decomposition. A simplicial complex generalizes a graph by including higher-order interactions represented by $k$-simplices (vertices are 0-simplices, edges are 1-simplices, triangles are 2-simplices, etc.). Attributed SCs have signals or attributes associated with simplices of different dimensions.
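To make these objects concrete, the boundary structure of a small attributed SC can be sketched in plain Python (an illustrative example, not code from the paper). The incidence matrix $\mathbf{B}_k$ records, with signs given by a chosen orientation, which $(k-1)$-simplices bound each $k$-simplex; the final check verifies the fundamental identity $\mathbf{B}_k \mathbf{B}_{k+1} = \mathbf{0}$ ("the boundary of a boundary is empty").

```python
# Illustrative sketch (not from the paper): incidence matrices for a small
# 2-complex -- vertices 0..3, five edges, and one filled triangle (0,1,2).
# Writing simplices with increasing vertex order fixes an orientation.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# B1: vertex-to-edge incidence; -1 at the tail, +1 at the head of each edge.
B1 = [[0] * len(edges) for _ in vertices]
for e, (u, v) in enumerate(edges):
    B1[u][e], B1[v][e] = -1, 1

# B2: edge-to-triangle incidence; signs follow the simplicial boundary
# d[v0,v1,v2] = [v1,v2] - [v0,v2] + [v0,v1].
B2 = [[0] * len(triangles) for _ in edges]
for t, (a, b, c) in enumerate(triangles):
    B2[edges.index((b, c))][t] = 1
    B2[edges.index((a, c))][t] = -1
    B2[edges.index((a, b))][t] = 1

# Fundamental property of boundary operators: B_k @ B_{k+1} = 0.
assert all(x == 0 for row in matmul(B1, B2) for x in row)
```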

The methodology hinges on the Hodge Laplacian, a discrete operator on SCs that generalizes the graph Laplacian. The Hodge Laplacian $\mathbf{L}_k$ for $k$-simplices decomposes into a lower Laplacian $\mathbf{L}_k^{\text{low}} = \mathbf{B}_k^\top \mathbf{B}_k$ and an upper Laplacian $\mathbf{L}_k^{\text{up}} = \mathbf{B}_{k+1} \mathbf{B}_{k+1}^\top$, where the $\mathbf{B}_k$ are incidence matrices. This structure enables the Hodge decomposition of the attribute space $\mathbb{R}^{N_k}$ into three orthogonal subspaces: exact, co-exact, and harmonic.
\begin{equation}
\mathbb{R}^{N_k} = \mathrm{im}(\mathbf{B}_k^{\top}) \oplus \mathrm{im}(\mathbf{B}_{k+1}) \oplus \mathrm{ker}(\mathbf{L}_k)
\end{equation}
The exact subspace captures signals analogous to curl-free fields, the co-exact subspace captures divergence-free fields, and the harmonic subspace captures fields that are both curl-free and divergence-free. Crucially, the dimension of the harmonic subspace relates to the number of $k$-dimensional "holes" in the complex (Betti numbers), providing topological insight.
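A quick illustrative check of the harmonic subspace (not from the paper): for a hollow triangle (three vertices, three edges, no filled 2-simplex), the edge Hodge Laplacian $\mathbf{L}_1$ annihilates the flow circulating around the one hole, so $\dim \mathrm{ker}(\mathbf{L}_1) = 1$ matches the first Betti number.

```python
# Illustrative sketch (not from the paper): the hollow triangle's edge
# Hodge Laplacian and its harmonic (circulating) flow.
edges = [(0, 1), (0, 2), (1, 2)]

# B1: vertex-to-edge incidence; -1 at the tail, +1 at the head of each edge.
B1 = [[0] * len(edges) for _ in range(3)]
for e, (u, v) in enumerate(edges):
    B1[u][e], B1[v][e] = -1, 1

# With no triangles, B2 = 0, so L1 = L1_low = B1^T B1.
L1 = [[sum(B1[v][i] * B1[v][j] for v in range(3)) for j in range(3)]
      for i in range(3)]

# Circulation 0 -> 1 -> 2 -> 0: +1 on (0,1) and (1,2), -1 on (0,2)
# (that edge is traversed against its orientation).
x = [1, -1, 1]
Lx = [sum(L1[i][j] * x[j] for j in range(3)) for i in range(3)]
# Lx == [0, 0, 0]: x lies in ker(L1), i.e. it is harmonic; the one-dimensional
# harmonic subspace matches the single hole (Betti number b1 = 1).
```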

The Hodgelet representation is constructed by applying a wavelet transform to the simplex attributes within each of these Hodge subspaces separately.

  1. Compute the Hodge Laplacians $\mathbf{L}_k$.
  2. Perform the eigendecomposition of $\mathbf{L}_k$, partitioning eigenvectors $\mathbf{U}_k$ and eigenvalues $\boldsymbol{\Lambda}_k$ according to the exact, co-exact, and harmonic subspaces: $\mathbf{U}_k = [\mathbf{U}_{ke} \;\, \mathbf{U}_{kc} \;\, \mathbf{U}_{kh}]$ and $\boldsymbol{\Lambda}_k = \mathrm{diag}(\boldsymbol{\Lambda}_{ke}, \boldsymbol{\Lambda}_{kc}, \boldsymbol{\Lambda}_{kh})$.
  3. Define wavelet filters $w_{k\bullet}(\lambda)$ (where $\bullet$ is $e$, $c$, or $h$) based on scaling and wavelet functions and learnable parameters $P_{k\bullet}$.
  4. Apply the filtered operators to the attribute vector $\mathbf{x}_k$: $\hat{\mathbf{x}}_{ke} = \mathbf{U}_{ke} w_{ke}(\boldsymbol{\Lambda}_{ke}) \mathbf{U}_{ke}^\top \mathbf{x}_k$, and similarly for $\hat{\mathbf{x}}_{kc}$ and $\hat{\mathbf{x}}_{kh}$. These are the Hodgelet coefficients.
  5. Pool or aggregate the coefficients to obtain permutation-invariant scalar values. The paper uses the squared 2-norm ($A_{k\bullet} = \|\hat{\mathbf{x}}_{k\bullet}\|^2$) as a simple example, but suggests others such as sum, min, max, or weighted combinations.
  6. Concatenate the aggregated values across multiple wavelet filters (to capture information at different scales) to form the final Hodgelet representation vectors $\mathbf{r}_{ke}, \mathbf{r}_{kc}, \mathbf{r}_{kh}$ for each simplex dimension $k$ and Hodge component. For multi-dimensional attributes, this process is applied to each attribute dimension independently and the results are stacked.
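The steps above can be sketched on a toy example (illustrative only, not the paper's code): the edge Laplacian of a hollow triangle is small enough that its eigendecomposition can be written down by hand, and a fixed heat-kernel-style filter $w(\lambda) = e^{-s\lambda}$ stands in for the paper's learnable wavelet filters.

```python
import math

# Toy illustration (not the paper's code) of steps 2-5 on a hollow triangle
# with edge order (0,1), (0,2), (1,2). Its edge Laplacian is
#   L1 = [[2, 1, -1], [1, 2, 1], [-1, 1, 2]],
# with a hand-derived eigendecomposition:
#   eigenvalue 0 (harmonic: the hole) with the cycle [1, -1, 1]/sqrt(3),
#   eigenvalue 3 (exact/gradient, multiplicity 2) on the orthogonal complement.
s3, s2, s6 = math.sqrt(3), math.sqrt(2), math.sqrt(6)
eigs = [
    (0.0, [1 / s3, -1 / s3, 1 / s3]),   # harmonic eigenpair
    (3.0, [1 / s2, 1 / s2, 0.0]),       # exact eigenpair
    (3.0, [-1 / s6, 1 / s6, 2 / s6]),   # exact eigenpair
]

def w(lam, s=0.5):
    """Stand-in low-pass filter; the paper learns wavelet filters instead."""
    return math.exp(-s * lam)

def hodgelet_energy(x, pairs, s=0.5):
    """Filter x within the span of the given eigenpairs, pool with ||.||^2."""
    coeffs = [w(lam, s) * sum(u_i * x_i for u_i, x_i in zip(u, x))
              for lam, u in pairs]
    return sum(c * c for c in coeffs)

x = [1.0, -1.0, 1.0]                # a purely circulating edge flow
A_h = hodgelet_energy(x, eigs[:1])  # harmonic pooled energy
A_e = hodgelet_energy(x, eigs[1:])  # exact (gradient) pooled energy
# The flow lives entirely in the harmonic subspace: A_h = ||x||^2 = 3, A_e = 0.
```

Concatenating such pooled energies over several filter scales $s$ would yield the representation vectors $\mathbf{r}_{k\bullet}$ of step 6.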

The representation for an entire attributed SC is then formed by combining the Hodgelet representations for different simplex dimensions $k$ and Hodge components. These combined representations are used as inputs to a standard GP. The GP kernel $\kappa(S, S')$ between two SCs $S$ and $S'$ is defined additively or multiplicatively over the simplex dimensions and, within each dimension, additively over the Hodge components:
\begin{align*}
\kappa_{\mathrm{add}}(S, S') &= \sum_{k=1}^{K} \Big(\kappa_{ke}(\mathbf{r}_{ke}, \mathbf{r}_{ke}') + \kappa_{kc}(\mathbf{r}_{kc}, \mathbf{r}_{kc}') + \kappa_{kh}(\mathbf{r}_{kh}, \mathbf{r}_{kh}')\Big) \\
\kappa_{\mathrm{prod}}(S, S') &= \prod_{k=1}^{K} \Big(\kappa_{ke}(\mathbf{r}_{ke}, \mathbf{r}_{ke}') + \kappa_{kc}(\mathbf{r}_{kc}, \mathbf{r}_{kc}') + \kappa_{kh}(\mathbf{r}_{kh}, \mathbf{r}_{kh}')\Big)
\end{align*}
where the $\kappa_{k\bullet}$ are standard Euclidean kernels such as the RBF kernel. The hyperparameters of these kernels, along with the wavelet parameters $P_{k\bullet}$, are jointly optimized by maximizing the GP's marginal likelihood (or the ELBO for classification).
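A minimal sketch of this kernel structure (illustrative, not the paper's implementation), using RBF base kernels on toy pooled representations:

```python
import math

# Illustrative sketch (not the paper's code) of the additive / product kernel
# structure over simplex dimensions and Hodge components.
def rbf(r, rp, lengthscale=1.0):
    """RBF base kernel on pooled representation vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(r, rp))
    return math.exp(-d2 / (2 * lengthscale ** 2))

def kappa_add(S, Sp):
    """S, Sp: dicts k -> (r_ke, r_kc, r_kh); sum over dimensions & components."""
    return sum(rbf(r, rp) for k in S for r, rp in zip(S[k], Sp[k]))

def kappa_prod(S, Sp):
    """Product over dimensions of the per-dimension additive kernel."""
    out = 1.0
    for k in S:
        out *= sum(rbf(r, rp) for r, rp in zip(S[k], Sp[k]))
    return out

# Two identical toy SC representations with K = 2 simplex levels:
S = {0: ([1.0], [0.5], [0.0]), 1: ([2.0], [0.3], [1.0])}
# At zero distance each RBF term is 1, so kappa_add(S, S) = 6 (six terms)
# and kappa_prod(S, S) = 9 (3 components per level, 3 * 3).
```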

Practical Implementation Details:

  • Simplicial Complexes: Represented by vertex, edge, and triangle sets ($V, E, T$ for a 2-complex) and incidence matrices $\mathbf{B}_k$ that capture connections between $(k-1)$- and $k$-simplices. Orientation is needed for the incidence matrices, but the final representations are isomorphism-invariant due to pooling.
  • Hodge Laplacians & Eigendecomposition: This is the most computationally intensive step, scaling cubically with the number of $k$-simplices $N_k$. For graphs ($k=1$), $N_1$ is the number of edges; for 2-complexes, $N_2$ is the number of triangles. For small-to-medium SCs this is tractable; for larger structures, approximation methods (e.g., polynomial approximations) might be necessary. The eigendecomposition is a one-time cost per SC.
  • Wavelet Filters: Defined by scaling and wavelet functions (e.g., inspired by polynomial approximations of filters). The parameters (e.g., dilation/scaling factors $\alpha_k, \beta_{kl}$) are learned during training. Multiple filters capture different frequency bands.
  • Aggregation: The choice of aggregation function (energy/squared 2-norm, sum, min, max) is a design decision and can impact performance depending on whether the task requires capturing magnitude, total value, or extreme values of the Hodgelet coefficients. For instance, the sum might be better if the task depends on the overall 'flow' or parity of the signal.
  • Gaussian Process: Standard GP implementations (such as GPyTorch) can be used. The kernel must implement the proposed additive/product structure, combining standard Euclidean kernels applied to the pooled Hodgelet representations. Without inducing points, training cost is cubic in the number of training examples (SCs), as is typical for GPs; variational inference with inducing points is therefore recommended for scalability in the number of SCs. The paper uses variational GPs for classification tasks.
  • Hyperparameter Tuning: A key advantage is the joint optimization of GP kernel hyperparameters and wavelet parameters via marginal likelihood/ELBO maximization, providing a principled approach compared to validation-based tuning for GNNs.
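The objective behind that joint optimization can be sketched concretely (illustrative, not from the paper): the GP log marginal likelihood $\log p(\mathbf{y}) = -\tfrac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \tfrac{1}{2}\log|K| - \tfrac{n}{2}\log 2\pi$, evaluated here via a hand-rolled Cholesky factorization.

```python
import math

# Illustrative sketch (not from the paper): evaluating the GP log marginal
# likelihood, the quantity maximized with respect to kernel and wavelet
# hyperparameters. K is the (PSD) kernel matrix over training SCs.
def cholesky(K):
    """Lower-triangular L with K = L L^T (no pivoting; K assumed PD)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def log_marginal_likelihood(K, y):
    L = cholesky(K)
    alpha = solve_lower(L, y)          # ||alpha||^2 = y^T K^-1 y
    n = len(y)
    return (-0.5 * sum(a * a for a in alpha)
            - sum(math.log(L[i][i]) for i in range(n))   # -1/2 log|K|
            - 0.5 * n * math.log(2 * math.pi))

# With K = I the quadratic term reduces to -||y||^2 / 2 and log|K| = 0.
y = [1.0, -2.0]
lml = log_marginal_likelihood([[1.0, 0.0], [0.0, 1.0]], y)
```

In practice one would feed gradients of this quantity (e.g., via automatic differentiation in a library such as GPyTorch) to an optimizer over both kernel and wavelet parameters.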

Real-World Applications and Experimental Results:

The paper demonstrates the framework's effectiveness across diverse tasks and datasets:

  1. Vector Field Classification (Synthetic):
    • Application: Analyzing vector fields discretized onto triangular meshes (simplicial 2-complexes) with edge attributes.
    • Tasks: Classifying fields as predominantly divergence-free vs. curl-free (related to exact/co-exact decomposition) and classifying vortex flow direction (positive/negative circulation).
    • Findings: HTGP models (using Hodge decomposition) consistently outperform WTGP (without decomposition) and models that convert edge attributes to vertex attributes on line graphs. This highlights the importance of preserving edge structure and leveraging the Hodge decomposition for tasks related to topological properties captured by the decomposition.
  2. TUDataset and MoleculeNet (Real-world):
    • Application: Molecular/biological graph prediction (e.g., predicting mutagenicity, AIDS antiviral screening, solubility).
    • Tasks: Binary classification (MUTAG, AIDS) and regression (FreeSolv, ESOL). These datasets have vertex and often edge attributes.
    • Findings: Models incorporating edge attributes (WTGP/HTGP hybrid and edge-only) generally perform better than vertex-only models and are competitive with or outperform strong GNN baselines, especially on these small-to-medium datasets. HTGP (using Hodge decomposition) often outperforms WTGP, showing the value of this richer representation.
  3. PowerGraph (Real-world):
    • Application: Predicting stability in electrical power grids represented as attributed graphs with vertex and edge attributes.
    • Tasks: Binary classification, multi-class classification, and regression related to grid stability. Data is downsampled to fit the small data regime.
    • Findings: HTGP models (hybrid and edge-only) achieve the best performance across all three tasks, significantly surpassing GNN baselines. This reinforces the benefits of combining edge information with the Hodge decomposition for real-world network prediction tasks, particularly in data-scarce settings.

Practical Implications:

The paper provides a practical framework for implementing GP models for graph and SC-level prediction, especially valuable when large labeled datasets are not available. It shows how to translate theoretical concepts from topological data analysis (Hodge decomposition) and graph signal processing (wavelets) into concrete feature engineering steps for machine learning models. The ability to handle multi-dimensional vertex and edge attributes and varying graph/SC sizes, and to provide interpretable results, makes it a strong alternative or complement to GNNs in specific application contexts.

The computational cost of eigendecomposition is a key consideration; careful profiling and potential use of approximation methods will be necessary for large-scale applications. The choice of aggregation function and kernel hyperparameters requires empirical tuning but can be done in a principled way through GP optimization.

In summary, the Hodgelet representation combined with Gaussian Processes offers a powerful, data-efficient, and topology-aware method for prediction on graphs and simplicial complexes with attributes, demonstrating superior performance over GNNs and simpler GP approaches on various tasks, particularly in data-scarce environments.
