
Groupby Neighbors Random Walk (GNRW)

Updated 23 January 2026
  • The paper introduces GNRW, a higher-order Markov chain sampling method that leverages transition history to stratify neighbor groups and reduce query counts.
  • It preserves the stationary distribution of simple random walks while achieving lower asymptotic variance and faster convergence.
  • Empirical results on social networks demonstrate a 30–50% reduction in estimation error compared to baseline methods, validating its efficiency improvements.

Groupby Neighbors Random Walk (GNRW) is a higher-order Markov chain-based sampling method designed to improve the efficiency of random walk-based analytics over large online social networks, where the available query primitives typically only expose node neighbor queries. Unlike the baseline simple random walk (SRW), which suffers from slow mixing and high estimator variance due to its memoryless selection of neighbors, GNRW leverages the walk’s transition history to induce systematic stratification over neighbor groups, thereby reducing the number of queries required to achieve a specified estimation accuracy. The method achieves this efficiency gain without altering the stationary distribution of the walk, providing a statistically valid “drop-in” alternative to SRW for network sampling tasks (Zhou et al., 2015).

1. Formal Construction and Transition Dynamics

Let $G=(V,E)$ be an undirected graph. The GNRW defined on $G$ is a Markov chain whose current state at step $n$ is $X_n \in V$, with history-sensitive structures assigned to every directed edge $(u \to v)$.

Given a fixed grouping function $g$ and a node $v \in V$, partition the neighborhood $N(v)$ as $g(N(v)) = \{S_1, S_2, \dots, S_m\}$, where the $S_i$ are pairwise disjoint and $\bigcup_i S_i = N(v)$. These groups are typically defined by a measured attribute (e.g., degree or other node attribute values).
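As a concrete sketch of such a partition (the helper name `group_neighbors` and the toy degree table are assumed for illustration, not taken from the paper), neighbors can be bucketed by an attribute value such as degree:

```python
from collections import defaultdict

# Hypothetical grouping function: put neighbors that share the same
# attribute value (here, degree) into one group.
def group_neighbors(neighbors, attribute):
    groups = defaultdict(set)
    for w in neighbors:
        groups[attribute(w)].add(w)
    return list(groups.values())

# Assumed toy data: degrees of the neighbors of some node v.
deg = {1: 2, 2: 3, 3: 1, 4: 3, 5: 2}
parts = group_neighbors([1, 2, 3, 4, 5], lambda w: deg[w])

# The groups are pairwise disjoint and cover N(v), as the definition requires.
assert set().union(*parts) == {1, 2, 3, 4, 5}
assert sum(len(S) for S in parts) == 5
```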

For each directed pair $(u,v)$, maintain:

  • $S(u,v)$: the set of neighbor groups from $g(N(v))$ already chosen on prior $u \to v$ transitions.
  • For every $S_i$, a set $b_{S_i}(u,v)$ of neighbors within $S_i$ already visited after a $u \to v$ transition (both structures implement sampling "without replacement").

The transition at step $n$, from $u = X_{n-2}$ and $v = X_{n-1}$, selects $w \in N(v)$ by:

  1. Identifying the remaining eligible groups $CS = \{ S_i : S_i \notin S(u,v) \}$; if $CS = \emptyset$, resetting $S(u,v) = \emptyset$ so that all groups become eligible again.
  2. Selecting $S^* \in CS$ with probability

$$\mathbb{P}\{S^* = S_i \mid u \to v\} = \frac{|S_i|}{|N(v)| - \sum_{S_j \in S(u,v)} |S_j|}$$

  3. Within $S^*$, letting $U = S^* \setminus b_{S^*}(u,v)$. If $U \neq \emptyset$, pick $w$ uniformly from $U$; otherwise, reset $b_{S^*}(u,v) = \emptyset$ and sample $w$ uniformly from $S^*$. Update $b_{S^*}(u,v)$ and $S(u,v)$ accordingly.

The resulting two-level stratification, first over groups and then within groups, defines the transition kernel

$$P[X_n = w \mid X_{n-2}=u,\; X_{n-1}=v] = \frac{|S_i|}{|N(v)| - \sum_{S_j \in S(u,v)} |S_j|} \times \frac{1}{|S_i| - |b_{S_i}(u,v)|}$$

for $w \in S_i \setminus b_{S_i}(u,v)$ with $S_i \notin S(u,v)$.

Both $S(u,v)$ and each $b_{S_i}(u,v)$ are reset to $\emptyset$ once exhausted.
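The two-level probabilities above can be sanity-checked on a toy state (the group sizes and memory sets below are assumed for illustration): summing the kernel over all eligible neighbors must give exactly 1.

```python
from fractions import Fraction

# Assumed toy state for one directed edge (u, v).
groups = {"S1": {1, 2, 3}, "S2": {4, 5}, "S3": {6}}   # g(N(v))
used_groups = {"S3"}                                   # S(u,v)
visited = {"S1": {2}, "S2": set(), "S3": {6}}          # b_{S_i}(u,v)

# Denominator of the group-selection probability: |N(v)| - sum of used group sizes.
remaining = sum(len(S) for name, S in groups.items() if name not in used_groups)

prob = {}
for name, S in groups.items():
    if name in used_groups:
        continue
    p_group = Fraction(len(S), remaining)              # first level: pick group
    unvisited = S - visited[name]
    for w in unvisited:
        prob[w] = p_group * Fraction(1, len(unvisited))  # second level: pick node

# The stratified kernel is a proper distribution over eligible neighbors.
assert sum(prob.values()) == 1
```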

2. Implementation: Pseudocode

The GNRW sampler is initialized with arbitrary starting nodes $x_0, x_1$ (with $x_1 \in N(x_0)$) and a grouping function $g$:

Input: x₀, x₁ (starting nodes); grouping function g; N (sample size)

Data structures:
    For each directed edge (u,v):
        S(u,v)                 # groups used so far
        For each Sᵢ ∈ g(N(v)):
            bᵢ(u,v)            # neighbors chosen so far from Sᵢ

for i = 2 … N do
    u ← x_{i-2};  v ← x_{i-1}
    {S₁, …, S_m} ← g(N(v))
    CS ← { Sᵢ ∈ {S₁, …, S_m} : Sᵢ ∉ S(u,v) }
    if CS == ∅:
        S(u,v) ← ∅
        CS ← {S₁, …, S_m}
    # pick group S* ∈ CS with probability proportional to its size
    total_size ← Σ |Sᵢ| for Sᵢ ∈ CS
    choose S* ∈ CS with probability |S*| / total_size
    # within S*, pick a neighbor without replacement
    U ← S* \ b_{S*}(u,v)
    if U ≠ ∅:
        w ← Uniform(U)
        b_{S*}(u,v) ← b_{S*}(u,v) ∪ {w}
    else:
        w ← Uniform(S*)
        b_{S*}(u,v) ← {w}      # reset this group’s memory, keeping w
    S(u,v) ← S(u,v) ∪ {S*}
    x_i ← w
Output: {x₀, …, x_N}
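A minimal executable rendering of the pseudocode above, assuming an adjacency-dict graph interface; names such as `gnrw`, `S_used`, and `b` are ours, not the paper's, and this is a sketch rather than a reference implementation:

```python
import random
from collections import defaultdict

def gnrw(adj, g, x0, x1, n_samples, rng=None):
    """Sketch of a GNRW sampler. `adj` maps a node to its neighbor set;
    `g` partitions a neighbor list into disjoint groups (list of sets)."""
    rng = rng or random.Random()
    S_used = defaultdict(set)   # S(u,v): group indices already chosen after u->v
    b = defaultdict(set)        # b_i(u,v): members already taken from group i
    walk = [x0, x1]
    while len(walk) < n_samples:
        u, v = walk[-2], walk[-1]
        groups = g(sorted(adj[v]))          # stable order keeps indices consistent
        CS = [i for i in range(len(groups)) if i not in S_used[(u, v)]]
        if not CS:                          # every group used: reset group memory
            S_used[(u, v)].clear()
            CS = list(range(len(groups)))
        # level 1: pick a group with probability proportional to its size
        k = rng.choices(CS, weights=[len(groups[i]) for i in CS])[0]
        # level 2: pick uniformly among not-yet-visited members of that group
        U = groups[k] - b[(u, v, k)]
        if not U:                           # group exhausted: reset its memory
            b[(u, v, k)].clear()
            U = set(groups[k])
        w = rng.choice(sorted(U))
        b[(u, v, k)].add(w)
        S_used[(u, v)].add(k)
        walk.append(w)
    return walk

# Usage on a toy 4-cycle; singleton groups make GNRW behave like CNRW.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
walk = gnrw(adj, lambda nbrs: [{x} for x in nbrs], 0, 1, 50, random.Random(7))
assert len(walk) == 50
assert all(walk[t + 1] in adj[walk[t]] for t in range(len(walk) - 1))
```

Every step of the walk issues exactly one neighbor lookup on `adj`, mirroring the one "get neighbors" query per transition discussed below.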

3. Stationarity, Asymptotic Variance, and Statistical Properties

GNRW provably preserves the stationary distribution of the SRW: $\pi(v) = \deg(v) / (2|E|)$. This is established via path block analysis: every return to a directed edge $u \to v$ subdivides the walk into blocks. Under GNRW, the outgoing neighbors of $v$ are selected in a stratified "without replacement" fashion, both across groups and within each group, so over the long run every neighbor $j \in N(v)$ is chosen with frequency $1/|N(v)|$ and the marginal frequency at any node $v$ matches SRW.
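The claimed stationary distribution can be computed directly for a small assumed graph:

```python
# pi(v) = deg(v) / (2|E|) for a toy undirected graph (assumed example data).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2   # each edge counted twice
pi = {v: len(nbrs) / (2 * num_edges) for v, nbrs in adj.items()}

assert abs(sum(pi.values()) - 1.0) < 1e-12   # a proper distribution
assert pi[1] == 3 / 8                        # the highest-degree node is visited most
```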

For any bounded $f: V \to \mathbb{R}$, the empirical mean $\hat{\mu}_n = (1/n) \sum_{t=1}^n f(X_t)$ under GNRW achieves asymptotic variance $V_\infty^{GNRW}(\hat{\mu}) \leq V_\infty^{SRW}(\hat{\mu})$, a consequence of stratified block sampling [Neal 2004]. The estimator is therefore at least as efficient as under SRW, and strictly more efficient for attributes aligned with the chosen grouping.
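In practice, walk samples are turned into node-level averages by reweighting, since $\pi(v) \propto \deg(v)$ rather than uniform. The sketch below uses a plain SRW for brevity (GNRW samples plug into the same self-normalized estimator); the graph and function names are assumed:

```python
import random

def srw(adj, start, n, rng):
    """Simple random walk of length n over an adjacency-dict graph."""
    walk, v = [start], start
    for _ in range(n - 1):
        v = rng.choice(sorted(adj[v]))
        walk.append(v)
    return walk

def mean_over_nodes(walk, adj, f):
    """Self-normalized importance-weighted estimate of (1/|V|) * sum_v f(v),
    correcting for the walk's degree-proportional stationary distribution."""
    weights = [1.0 / len(adj[x]) for x in walk]
    return sum(wi * f(x) for wi, x in zip(weights, walk)) / sum(weights)

# Assumed toy graph; true average degree is (2 + 3 + 2 + 1) / 4 = 2.0.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
truth = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
walk = srw(adj, 0, 200_000, random.Random(42))
est = mean_over_nodes(walk, adj, lambda v: len(adj[v]))
assert abs(est - truth) < 0.05
```

A lower-variance chain such as GNRW tightens this estimate at the same query budget, which is the efficiency claim being made.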

4. Computational and Query-Efficiency Analysis

GNRW issues one "get neighbors" API query per transition, identical to SRW and related methods such as NB-SRW and CNRW. The additional bookkeeping (principally, maintaining $S(u,v)$ and the $b_{S_i}(u,v)$ for each directed edge $(u,v)$) can be implemented with two hash maps supporting amortized $O(1)$ access and update, giving total space and time overhead $O(K)$ after $K$ steps.

Crucially, since GNRW reduces both burn-in and estimator variance, the number of transitions (i.e., queries) required for a target estimation error $\epsilon$ is no greater than under SRW. Both the "burn-in" to stationarity and the sample count needed for the desired estimator confidence are reduced in practice by GNRW-induced stratification (Zhou et al., 2015).

5. Empirical Performance and Results

Experimental comparisons on real and synthetic datasets demonstrate the empirical gains of GNRW over SRW, NB-SRW, and CNRW. Relative estimation errors after a fixed number of queries are consistently smallest for GNRW:

| Dataset / Task | SRW | NB-SRW | CNRW | GNRW |
|---|---|---|---|---|
| Google Plus (avg. degree @ 500 queries) | 0.085 | 0.080 | 0.058 | 0.048 |
| Yelp (avg. degree @ 300 queries) | 0.33 | 0.29 | 0.26 | 0.20 |
| Yelp (avg. review count @ 300 queries) | 0.38 | 0.34 | 0.31 | 0.23 |
| Facebook subgraph (KL div. @ 400 queries) | 0.18 | 0.16 | 0.11 | 0.08 |

For synthetic barbell graphs, estimation error reductions of 30–50% are observed for GNRW versus SRW and NB‐SRW. In all tests:

  • Estimation errors are ordered GNRW < CNRW ≤ NB-SRW < SRW ≪ MHRW (Metropolis–Hastings random walk).
  • The maximum improvement is achieved when the grouping attribute matches the estimation target attribute (e.g., degree for average degree estimation).

6. Impact of Grouping Function and Practical Considerations

The grouping function $g$ critically determines stratification depth and estimator efficiency. When $m = |N(v)|$ (i.e., each group is a singleton), GNRW reduces to CNRW. Grouping by a relevant node attribute, such as degree ("GNRW-by-Degree") for degree-based analytics or review count for review-based estimation, substantially improves mixing across targeted features and thus accelerates convergence. Randomized groupings (e.g., via hashing) offer limited improvements.

Group counts $m$ that are too large (over-stratification) incur unnecessary overhead; too small, and stratification loses its benefit. Empirically, values $m \in [5, 20]$ work well for social networks with typical node degrees in the range $50$–$200$.
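One straightforward way to hit a target group count $m$ is equal-frequency bucketing on the grouping attribute; the helper below is an illustrative assumption on our part, not a construction from the paper:

```python
import math

# Hypothetical helper: bucket neighbors into at most m groups by
# equal-frequency cuts on an attribute such as degree.
def quantile_groups(neighbors, attribute, m):
    ordered = sorted(neighbors, key=attribute)
    size = math.ceil(len(ordered) / m)    # bucket size for ~equal-count groups
    return [set(ordered[i:i + size]) for i in range(0, len(ordered), size)]

parts = quantile_groups(list(range(12)), attribute=lambda w: w, m=4)
assert len(parts) == 4                    # never more than m groups
assert sum(len(S) for S in parts) == 12   # disjoint cover of the neighbor set
```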

A plausible implication is that for practical deployment, aligning groupings with the primary statistics of interest yields the greatest sampling efficiency (Zhou et al., 2015).

