One-Side Edge Sampling (OES) Overview

Updated 18 January 2026

OES is a family of methods that enables reconstruction of signals or uniform sampling of graph edges using one-sided data, achieved through oversampling conditions and predictive frameworks.
It extends classical sampling theory by guaranteeing exact recovery below the Nyquist rate, leveraging Hardy space and outer function concepts.
OES adapts to sublinear-time graph algorithms and GNN training, offering efficient edge sampling and regularization that reduces computational overhead and mitigates over-smoothing.

One-Side Edge Sampling (OES) denotes a family of sampling and signal reconstruction techniques in which complete or probabilistically uniform inference is achieved using only an asymmetric, semi-infinite, or confidence-directed subset of data samples or graph edges. OES is instantiated in diverse contexts, including classical signal processing, sublinear-time algorithms for edge sampling in large graphs, and graph neural network (GNN) training regimes. Central concepts include leveraging oversampling, prediction theory, or edge confidence heterogeneity to achieve exact or near-uniform recovery or estimation while operating on only one “side” of an initially symmetric or two-sided structure.

1. OES in Classical Sampling Theory

OES arises in the context of the Nyquist–Shannon–Kotelnikov sampling theorem for band-limited signals. The classical result states that a function $f \in L^2(\mathbb{R})$ with Fourier support in $[-\Omega, \Omega]$ can be reconstructed exactly from its two-sided, uniformly spaced samples $\{f(nT)\}_{n\in\mathbb{Z}}$ for any $T \leq \pi/\Omega$ .

OES (Dokuchaev, 2016) (Dokuchaev, 2016) asserts that for any sampling period $0 < T < \pi/\Omega$ , the semi-infinite one-sided sample sequence $\{f(kT)\}_{k\geq 0}$ uniquely determines $f$ . The key oversampling condition ( $T < \pi/\Omega$ ) is essential: at critical rate, uniqueness fails.

The explicit reconstruction is

$f(t) = \sum_{k=0}^{\infty} f(kT)\, h(t-kT)$

where $h$ is a causal kernel constructed via outer function theory and inverse Fourier transform: $h(t)=\frac{1}{2\pi} \int_{-\Omega}^{\Omega} H(\omega)\,e^{i\omega t} d\omega,\quad H(\omega)=\frac{1}{G(e^{i\omega T})}$ with $G(e^{i \omega T}) = T^{-1}F\big(\frac{\omega}{T}\big)$ , and $F$ the Fourier transform of $f$ .

This result exploits Hardy space theory—specifically, the Szegő–Kolmogorov predictability theorem—which ensures future coefficients determine the analytic function in $H^2(|z|>1)$ , so missing ("negative index") samples can be recovered as linear combinations of the one-sided tail. Oversampling opens a "spectral hole," enabling stable one-sided prediction.

2. Sublinear-Time One-Side Edge Sampling in Large Graphs

A core application of OES is in sublinear-time algorithmics for edge-uniform sampling in massive graphs (Tětek, 2020, Tětek et al., 2021). The objective is to output a (multi-)graph edge $e \in E$ chosen exactly or approximately uniformly at random, using only local access to vertex degrees and neighborhoods—denying direct uniform edge queries.

The OES approach works as follows (paraphrased pseudocode from (Tětek, 2020, Tětek et al., 2021)):

Subroutine Sampling_Attempt(k):
    u0 = RandomVertex()
    if Degree(u0) > θ: return FAIL
    j = Uniform(1, θ)
    u1 = Neighbor(u0, j)
    if u1 is null: return FAIL
    for i in 2..k:
        if Degree(u_{i-1}) ≤ θ: return FAIL
        r = Uniform(1, Degree(u_{i-1}))
        u_i = Neighbor(u_{i-1}, r)
    return (u_{k-1}, u_k)

The main routine repeatedly chooses

k

uniformly from

\{1, ..., \ell\}

, attempts a sample, and repeats until success.

Key parameters:

"Light" vertex: degree $\leq \theta = \lceil \sqrt{2m} \rceil$
"Heavy" vertex: degree $> \theta$
$\ell = \lceil \log_2 (1/\epsilon) \rceil + 1$ , target accuracy parameter

This walk combines direct sampling of light-edges ( $k=1$ ) and a one-sided "walk" among heavy vertices for $k ≥ 2$ , such that—via rejection sampling and coupling arguments—each edge is sampled $\epsilon$ -close to uniform in total variation. The expected number of probes is $O(n \log(1/\epsilon) / \sqrt{m})$ .

Exact uniform sampling is achieved by combining this approximate routine with a low-probability correction path, obtaining $O(n \log n / \sqrt{m})$ expected time (Tětek, 2020).

3. OES under Varied Graph Neighborhood Access Models

OES algorithms are adapted to three graph access archetypes (Tětek et al., 2021):

Indexed neighbor access: Each neighbor or degree is accessed via an index, corresponding to standard graph APIs.
Full neighborhood access: The entire adjacency list is retrievable per query, aligning with some web APIs.
Hash‐ordered neighbor access: A global hash function determines neighbor access sequence, enabling constant-time “peek-next” operations and efficient subgraph exploration even at large scale.

This generalization enables efficient OES implementation in external memory, in distributed environments, and under practical API constraints (e.g., Twitter or Wikipedia limits).

Preprocessing (when feasible) sorts adjacency lists into hash order or maintains per-vertex BSTs keyed by hash. Sampling requires only $O(1)$ neighbor-accesses per successful sample attempt on average.

4. OES as Edge Dropout in Graph Neural Networks

A distinct strand of OES is confidence-based edge dropout for regularizing GNN training (Trieu, 11 Jan 2026). In this context, OES denotes a process where, during each training epoch (for up to $n$ epochs), edges with high model classification confidence (and predicted correctly) are retained, while the remainder are randomly sub-sampled at a user-determined ratio. The intended effects are:

Reduced per-epoch computational cost (as the number of active edges is decreased).
Dampened over-smoothing: Less dense subgraphs decrease the rate at which repeated GNN convolutions force node representations toward homogeneity.
Mitigated over-fitting: The random sampling introduces input variability analogous to data augmentation.

Mathematically, for edge $e_{u,v}$ , define

$c(e_{u,v}) = \max\bigl(y_{\mathrm{pos}}(u,v),\,y_{\mathrm{neg}}(u,v)\bigr)$

where the $y$ 's are model logits for the two classes. The top- $p$ percentile of these confidence scores on correctly predicted edges is identified, and a fraction $r$ is uniformly retained: $E' = \{e_{u,v} \in E : c(e_{u,v}) \ge P_p \wedge \hat y_{u,v} = y_{u,v}\}$ with $|E_{\mathrm{OES}}| = \lfloor r|E'| \rfloor$ , and the training proceeds on $(V, E_{\mathrm{OES}})$ (Trieu, 11 Jan 2026).

Empirical results show F1-score gains (up to 8.5 points on GIN+EU) and 8–20% training time reduction on financial fraud datasets.

Theoretical justifications invoke spectral arguments: dropping edges increases effective resistance (thus decreasing the second eigenvalue $\lambda_2$ ) and raises the over-smoothing depth threshold.

5. Theoretical Foundations and Proof Sketches

In the classical signal processing setting, OES is enabled by over-sampling ( $T < \pi/\Omega$ ), which ensures the spectral support does not fill a half-circle and the corresponding Z-transform is outer in the Hardy space $H^2(|z|>1)$ . The deterministic Szegő–Kolmogorov theorem implies predictability from the one-sided sequence.

In graph sampling, correctness hinges on controlling the bias between light and heavy edges using a mixture of "flat" and "one-sided" walks, with the bias decaying exponentially with walk-length ( $O(\log(1/\epsilon))$ is sufficient for $\epsilon$ -closeness). Pointwise deviation in output probabilities is bounded by $\epsilon/m$ (Tětek et al., 2021, Tětek, 2020).

For GNNs, spectral bounds connect the edge set structure to the rate of over-smoothing and subspace collapse, with OES-altered graphs provably improving these quantities.

6. Practical Implications and Implementation Considerations

In signal reconstruction, OES eliminates the need to buffer symmetric, two-sided sample windows: a one-sided tail suffices, reducing storage and latency, with numeric implementation via rational approximation or spectral factorization (Dokuchaev, 2016).

For graph analysis, OES methodologies are particularly relevant at web scale or for on-disk graphs, where full-graph access is infeasible. The sampled walks adapt to degree heterogeneity and can utilize API-efficient neighbor listing (Tětek et al., 2021).

In GNN training, OES requires only confidence and label information per-edge at each epoch, with drop/sampling hyperparameters ( $p, r, n$ ) easily tuned. Ablation shows excessive reduction (low $p$ ), or hyper-restrictive sampling ( $r \downarrow 0.01$ ), degrades accuracy, suggesting that balance is crucial (Trieu, 11 Jan 2026).

7. Variants and Extensions

OES extends the Nyquist–Shannon paradigm to one-sided, streaming, and predictive applications by exploiting oversampling. In algorithmic graph theory, it bridges the gap between models allowing direct uniform edge queries and those limited to vertex-neighborhood access, achieving optimal or near-optimal sublinear time.

A plausible implication is that OES concepts may inform further developments in streaming algorithms, distributed sampling, or kernel-based reconstruction, especially where only temporally or structurally asymmetric data subsamples are available.

References

Dokuchaev, N. "On Nyquist-Shannon Theorem with one-sided half of sampling sequence" (Dokuchaev, 2016)
Trieu, D. "Graph Neural Network with One-side Edge Sampling for Fraud Detection" (Trieu, 11 Jan 2026)
Dell, H., Lapinskas, J. "Sampling an Edge Uniformly in Sublinear Time" (Tětek, 2020)
H{\"u}ffner, F., et al., "Edge Sampling and Graph Parameter Estimation via Vertex Neighborhood Accesses" (Tětek et al., 2021)