Global Neighbor Sampling with Caching
- The paper introduces a global caching mechanism that reuses neighbor samples to accelerate computations in both Monte Carlo PDE solvers and graph neural network training.
- It employs efficient cache construction and kernel-based reweighting strategies to minimize redundant computations and reduce estimator variance.
- It demonstrates significant practical improvements, achieving up to 2×–14× speedups while maintaining accuracy in complex simulation and training tasks.
Global neighbor sampling with caching is a family of algorithms designed to accelerate sampling-based computations in large-scale problems, ranging from stochastic PDE solvers to distributed and mixed CPU–GPU training of graph neural networks (GNNs). These approaches use a global cache of samples or node data to reduce redundant computation, minimize data movement, or decrease estimator variance, in contrast to purely local or pointwise methods.
1. Fundamental Concepts and Definitions
Global neighbor sampling with caching operates on the core principle of constructing a reusable global set of samples (or node data) that efficiently supports repeated evaluation or learning queries across a domain or a graph. In the canonical setting of solving Laplace’s equation via stochastic representations on a domain $\Omega$, the method builds a cache of spatial “centers” coupled with stored walk data. Each walk samples a Brownian motion path outward from its center, storing both the first-exit and boundary-hit points, enabling flexible reuse through kernel-based reweighting to estimate the solution at nearby locations. In large-scale GNN training (e.g., SALIENT++ and GNS), the cache consists of selected high-utility node features, which reside in GPU memory and support repeated, importance-weight-corrected neighbor sampling for minibatch computation (Czekanski et al., 2024, Kaler et al., 2023, Dong et al., 2021).
2. Formal Problem Statement and Theoretical Basis
In the Monte Carlo Laplace setting, the problem is to approximate the solution of $\Delta u = 0$ in $\Omega$ with boundary data $u = g$ on $\partial\Omega$, via the stochastic representation $u(x) = \mathbb{E}[g(B_\tau)]$, where $B_t$ is Brownian motion started at $x$ and $\tau$ its first exit time from $\Omega$ (Theorem 2.1 in (Czekanski et al., 2024)). The global neighbor sampling approach constructs a grid-aligned covering of centers $\{c_i\}$ and, for each $c_i$, stores independent Walk-on-Spheres (WOS) trajectories $(y_i^{(j)}, z_i^{(j)})$, where $y_i^{(j)}$ is the first exit point from the sphere around $c_i$ and $z_i^{(j)}$ the eventual boundary hit. At a query point $x$ sufficiently deep in $\Omega$, the estimator is
$$\hat{u}(x) = \frac{\sum_{i,j} w_{i,j}(x)\, g(z_i^{(j)})}{\sum_{i,j} w_{i,j}(x)},$$
with local reweighting $w_{i,j}(x) \propto K(x, y_i^{(j)})$, where $K$ is the Poisson kernel on the sphere around $c_i$.
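The Poisson-kernel reweighting admits a compact closed form: for a sphere of radius $r$ around a center $c$, the ratio of the Poisson kernel at $x$ to the uniform exit density at $c$ equals $1$ when $x = c$ and tilts toward exit points near $x$ otherwise. A minimal sketch (symbol names are illustrative, not taken from the paper):

```python
import math

def poisson_weight(x, c, y, r, d=2):
    """Importance weight for reusing a walk cached at center c to estimate at x.

    The cached walk's first exit point y is uniform on the sphere of radius r
    around c; multiplying by the Poisson kernel ratio P(x, y) / P(c, y)
    makes the cached sample unbiased for any query point x inside the sphere.
    """
    xc2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))      # |x - c|^2
    xy = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))  # |x - y|
    return (r ** 2 - xc2) * r ** (d - 2) / xy ** d
```

At the center itself the weight collapses to $1$, recovering the plain WOS average; as $x$ approaches the sphere boundary the weights become more skewed, which is the variance–radius tradeoff discussed below.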
In GNN training, the analogous problem is to minimize the data movement and memory footprint in mini-batched, multi-hop neighbor sampling. Vertex-wise inclusion probabilities (VIPs) are computed for each vertex as the probability that it appears in a $k$-hop sample rooted at a particular partition. The static caching policy selects the highest-VIP nodes for replication on each worker, enabling efficient local and remote sampling (Kaler et al., 2023, Dong et al., 2021).
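A rough proxy for these inclusion probabilities can be computed by propagating mass from a partition's training nodes through the normalized adjacency. The sketch below is a simplified stand-in, not the exact VIP recursion of SALIENT++:

```python
import numpy as np

def vip_proxy(adj, train_nodes, hops=2):
    """Approximate vertex-inclusion-probability scores by propagating mass
    from a partition's training nodes for `hops` steps and accumulating it.
    Higher scores mark vertices more likely to appear in k-hop mini-batches.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.maximum(deg, 1.0)         # row-stochastic transition matrix
    mass = np.zeros(n)
    mass[train_nodes] = 1.0 / len(train_nodes)
    score = mass.copy()
    for _ in range(hops):
        mass = mass @ P                    # one hop of neighbor expansion
        score += mass
    return score
```

Ranking vertices by such a score and caching the top fraction per worker is the essence of the static policy.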
3. Algorithmic Structure and Pseudocode
Monte Carlo PDE (Laplace) Global Neighbor Sampling
Cache Construction (Offline):
- Cover $\Omega$ with an $h$-spaced grid and select the grid points inside the domain as centers $\{c_i\}$.
- For each $c_i$, run WOS walks and cache the resulting first-exit/boundary-hit pairs $(y_i^{(j)}, z_i^{(j)})$.
Query (Online):
- For a query point $x$, find the nearby cached centers via spatial indexing.
- Aggregate the cached walk outcomes with Poisson-kernel reweighting to form the estimator $\hat{u}(x)$.
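The two phases can be sketched end-to-end. The following is a minimal, self-contained illustration on the unit disk, with the grid covering reduced to explicit center tuples for brevity; it is not the authors' implementation, and the 2-D Poisson kernel ratio is used for the reweighting:

```python
import math
import random

def wos_walk(start, g, eps=1e-3, rng=random):
    """One Walk-on-Spheres trajectory on the unit disk (illustrative domain).

    Returns (first_exit_point, boundary_value): the exit point of the first
    sphere step, and g evaluated at the eventual boundary hit."""
    x, y = start
    first_exit = None
    while True:
        r = 1.0 - math.hypot(x, y)         # distance to the unit circle
        if r < eps:                        # close enough: project to boundary
            n = math.hypot(x, y)
            return first_exit, g(x / n, y / n)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x, y = x + r * math.cos(theta), y + r * math.sin(theta)
        if first_exit is None:
            first_exit = (x, y)

def build_cache(centers, g, walks=2000, rng=random):
    """Offline phase: cache (first-exit, boundary-value) pairs per center."""
    return {c: [wos_walk(c, g, rng=rng) for _ in range(walks)] for c in centers}

def query(x, cache, centers):
    """Online phase: Poisson-kernel-reweighted average of cached walks.

    For a center c with first-step radius r, the 2-D Poisson kernel ratio
    (r^2 - |x-c|^2) / |x-y|^2 makes each cached boundary value unbiased
    for the solution at the query point x."""
    num = den = 0.0
    for c in centers:
        r = 1.0 - math.hypot(*c)           # first-step sphere radius at c
        if math.dist(x, c) >= r:
            continue                       # x lies outside this center's sphere
        for y_exit, gz in cache[c]:
            w = (r * r - math.dist(x, c) ** 2) / math.dist(x, y_exit) ** 2
            num += w * gz
            den += w
    return num / den
```

Querying with the harmonic function $g(x, y) = x$ on the boundary recovers $u(x, y) = x$ inside the disk up to Monte Carlo error, which is a quick sanity check on the reweighting.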
Distributed/Mixed CPU–GPU GNN Training
Cache Construction:
- Compute per-node sampling probabilities (degree-based or via propagation of initial distribution from training nodes).
- Sample a global cache of nodes to reside in GPU memory, proportional to their utility.
- For each mini-batch, perform in-GPU neighbor sampling, preferring cached nodes and applying importance correction where needed (Dong et al., 2021).
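The cache-preferring step with importance correction can be illustrated in isolation. This is a hedged sketch, not the GNS implementation; the `boost` knob and function names are assumptions:

```python
import numpy as np

def biased_neighbor_sample(neighbors, cached, fanout, boost=4.0, rng=None):
    """Sample `fanout` neighbors with replacement, preferring cached nodes.

    Cached neighbors get `boost` times the base sampling probability; each
    draw carries the importance weight p_uniform / p_biased so that a
    weighted aggregation over the sample stays unbiased.
    """
    rng = rng or np.random.default_rng()
    nbrs = np.asarray(neighbors)
    raw = np.where(np.isin(nbrs, cached), boost, 1.0)
    p = raw / raw.sum()                    # biased sampling distribution
    idx = rng.choice(len(nbrs), size=fanout, p=p)
    uniform = 1.0 / len(nbrs)
    weights = uniform / p[idx]             # importance correction per draw
    return nbrs[idx], weights
```

Cached draws receive down-weighted contributions (they are over-sampled) and uncached draws up-weighted ones, so the expected aggregate matches uniform neighbor sampling.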
SALIENT++ / VIP-Based Distributed Caching:
- Given graph partitions and mini-batch scheme, compute the exact VIPs for each remote vertex.
- Statically rank remote vertices by VIP and cache the highest-VIP candidates up to the allowed replication factor.
- The cache remains static per epoch (or several epochs). Queries to uncached nodes are rare and handled asynchronously, overlapping communication with computation.
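The ranking-and-selection step itself is simple; a sketch of the static policy (local vertices need no replication, so only remote vertices compete for the budget):

```python
import numpy as np

def select_cache(vip_scores, local_nodes, budget):
    """Static cache policy: replicate the `budget` highest-VIP remote vertices.

    `vip_scores[v]` estimates the probability vertex v appears in a mini-batch
    sample rooted at this partition; `local_nodes` are already resident.
    """
    remote = np.setdiff1d(np.arange(len(vip_scores)), local_nodes)
    order = remote[np.argsort(vip_scores[remote])[::-1]]   # descending VIP
    return set(order[:budget].tolist())
```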
Pseudocode snippets outlining these phases are provided in the original sources (Czekanski et al., 2024, Kaler et al., 2023, Dong et al., 2021).
4. Data Structures, Sampling Strategies, and Complexity
The cache for Laplace’s equation consists of the per-center walk tuples, indexed spatially (e.g., by a kd-tree, or a uniform grid with cell size matched to the query radius) to support efficient retrieval of the cached centers near each query point. In GNN methods, the GPU cache includes a dense feature tensor indexed by position, adjacency lists of cached neighbors for each node, and mapping tables for gather/scatter during neighbor aggregation (Dong et al., 2021).
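A uniform grid with cell size equal to the query radius is the simplest such index, since a radius query then only inspects the surrounding block of cells. A minimal 2-D sketch (class and method names are illustrative):

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform-grid spatial index for cached centers: cell size equals the
    query radius, so a radius query only inspects the 3x3 surrounding cells."""

    def __init__(self, radius):
        self.r = radius
        self.cells = defaultdict(list)

    def _cell(self, p):
        return tuple(int(math.floor(c / self.r)) for c in p)

    def insert(self, p):
        self.cells[self._cell(p)].append(p)

    def neighbors(self, q):
        cx, cy = self._cell(q)
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for p in self.cells.get((cx + dx, cy + dy), []):
                    if math.dist(p, q) <= self.r:
                        out.append(p)
        return out
```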
Neighbor selection strategies include:
- Equal-weight scheme: restricts the neighbor set to a radius within which per-walk variance bounds hold, then averages with equal weights.
- Inverse-variance weighting: expands the set to all nearby centers, with weights chosen to minimize the variance of the combined estimate (Czekanski et al., 2024).
- VIP-based selection (GNN): statically weights and ranks cache candidates by their expected inclusion probability, minimizing communication (Kaler et al., 2023).
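For independent unbiased estimators, weighting each by the inverse of its variance is the classical minimum-variance combination, which is the principle behind the second scheme. A small sketch:

```python
def inverse_variance_combine(estimates, variances):
    """Combine independent unbiased per-center estimates with weights
    proportional to 1/variance (the minimum-variance linear combination).

    Returns (combined_estimate, variance_of_combined_estimate)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    value = sum(w * e for w, e in zip(weights, estimates)) / total
    combined_var = 1.0 / total             # always <= min(variances)
    return value, combined_var
```

A high-variance far-away center thus contributes little but never hurts, which is why the full-kernel scheme can safely include all neighbors.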
Offline cache-construction cost scales with the number of centers times the walks per center, each WOS walk terminating after $O(\log(1/\varepsilon))$ sphere steps for boundary tolerance $\varepsilon$; the online query cost is just the neighbor aggregation, so the end-to-end cost grows linearly in the number of evaluations (Czekanski et al., 2024). For GNNs, the cache holds only a small fraction of all node features, and the communication reduction is proportional to the cache hit rate (Dong et al., 2021).
5. Variance Reduction and Statistical Guarantees
In the Laplace setting, the variance of the reused-walk estimator via global neighbor sampling is provably reduced: with $k$ effective neighbor centers contributing walks, variance drops by a factor on the order of $k$ compared to pointwise estimation (Theorem 4.3, (Czekanski et al., 2024)), with the constant controlled by a bound on the range of $g$. Lemma 4.2 quantifies the variance–neighbor-radius tradeoff, while cache covering and neighbor selection determine the effective $k$ for variance stacking.
For GNNs, theoretical analysis shows that once the cache size and sampling fan-out satisfy the condition stated in (Dong et al., 2021), the mean-squared error of gradients under cached sampling matches the order of that from the full node-wise sampler. The convergence rate of stochastic gradient descent is thus preserved for a sufficiently large cache.
6. Empirical and Practical Assessment
Empirical evaluations demonstrate the variance and error reduction in Laplace-PDE applications: on the tested domains, equal-weighted and variance-weighted estimators achieve markedly lower error than the original WOS under a fixed walk budget, and error remains stable as the number of evaluation points grows when walk reuse is employed (Czekanski et al., 2024).
In large-scale GNN settings, global neighbor sampling with caching yields significant speedups over node-wise sampling and over layer-based sampling such as LADIES, with negligible or no loss in model accuracy. High cache hit rates are typical at replication fractions around $0.3$, and communication overhead becomes negligible due to pipelined overlap (Kaler et al., 2023, Dong et al., 2021). Table-based results in the primary references summarize epoch times, F1 scores, and the dependence on cache size across several publicly available datasets.
| Application | Method (Cache) | Speedup | Accuracy Impact |
|---|---|---|---|
| Laplace equation | Global neighbor sampling (WOS) | Lower error at fixed budget | Reduced variance |
| GNN (Products) | GNS, SALIENT++ | Up to 2×–14× | Matches baseline F1 |
7. Trade-offs and Tuning Considerations
Key tuning parameters for successful deployment are cache size (e.g., the spatial grid spacing $h$, or the replication fraction in distributed GNN training) and the neighbor-selection radius or weight function. Decreasing $h$ increases cache memory and offline cost but allows more accurate near-boundary queries and greater variance reduction. Expanding the neighbor radius (or broadening the weight function) raises the effective number of walks participating in each estimate and thus reduces variance, with diminishing returns once per-walk variance grows. In distributed GNN contexts, the replication fraction directly controls the memory–bandwidth trade-off and is typically set to achieve a high cache hit rate.
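The replication-fraction choice can be guided by a simple calculation: under the assumption (hypothetical, for illustration) that access frequency is proportional to VIP score, caching the top fraction of vertices captures the corresponding share of accesses:

```python
import numpy as np

def hit_rate_curve(vip_scores, fractions):
    """Expected cache hit rate vs. replication fraction, assuming accesses
    are proportional to VIP scores: caching the top-scoring fraction of
    vertices captures that fraction's share of the total score mass."""
    s = np.sort(np.asarray(vip_scores, dtype=float))[::-1]
    cum = np.cumsum(s) / s.sum()
    n = len(s)
    return {f: float(cum[max(int(f * n), 1) - 1]) for f in fractions}
```

Because VIP distributions on real graphs are heavily skewed, such a curve typically saturates early, which is why modest replication fractions already yield high hit rates.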
By appropriate parameter selection, global neighbor sampling with caching transforms methods with otherwise superlinear or communication-bound scaling into computationally efficient, linear-in-query (or epoch) algorithms with substantial variance and bandwidth reduction (Czekanski et al., 2024, Kaler et al., 2023, Dong et al., 2021).