Fast-Block-Select Algorithm
- The paper introduces a heuristic that rapidly identifies near-optimal shared block selections to maximize joint log-likelihood in multi-graph stochastic block models.
- It employs a greedy, injective block selection procedure to efficiently navigate exponential candidate spaces while circumventing NP-hard ILP formulations.
- Empirical results show runtime falling from over 8 hours with ILP to under one second, with ARI scores of 0.75–0.90 against 0.95–1.0 for ILP in practical settings.
The Fast-Block-Select algorithm refers to a class of greedy block selection procedures that rapidly identify optimal or near-optimal block assignments in large-scale partitioning and inference problems. Most prominently, in the context of shared stochastic block modeling (SSBM) over multiple graphs, Fast-Block-Select provides a scalable heuristic for selecting shared blocks, circumventing the computational hardness of integer linear programming (ILP) formulations. This paradigm has emerged as a practical response to the NP-hardness and inapproximability of “shared block detection” in multi-graph SBMs (Kumpulainen et al., 2024). While the term may be occasionally overloaded in the literature—for instance, in network clustering or compressed bitmap implementations (Grabowski et al., 2016)—the defining hallmark is the rapid, injective, greedy construction of block assignments to optimize a statistical objective.
1. Algorithmic Objective and Formal Setup
In the SSBM setting, the goal is to select $s$ block vectors across $n$ input graphs $G_1, \dots, G_n$, each partitioned into blocks by $\mathbf{b}_k$, in order to maximize the joint log-likelihood $\mathit{LLH} = \sum_{k=1}^n \log P(G_k\mid \Theta_k, \mathbf{b}_k)$ subject to injective mapping constraints between graphs and shared parameterization for the selected blocks. Each candidate block vector picks one block from every graph, so the candidates span the product space of the per-graph block sets. Block-pair parameters are re-estimated in closed form for both private and shared assignments, yielding one log-likelihood score for private blocks and one for shared blocks, each based on empirical edge and non-edge counts. The objective is equivalently an NP-hard combinatorial maximization over valid $s$-sized injective subsets (Kumpulainen et al., 2024).
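The joint log-likelihood can be illustrated with a minimal sketch. It assumes an undirected, simple, Bernoulli SBM in which every block pair keeps its own (private) maximum-likelihood edge probability; the function names `bernoulli_term`, `graph_llh`, and `joint_llh` are hypothetical and not taken from the paper, and shared parameterization is not modeled here.

```python
import math
from itertools import combinations

def bernoulli_term(edges: int, pairs: int) -> float:
    """Maximized Bernoulli log-likelihood for one block pair (assumed model)."""
    if pairs == 0 or edges == 0 or edges == pairs:
        return 0.0  # MLE is p = 0 or p = 1, so the term is log 1 = 0
    p = edges / pairs
    return edges * math.log(p) + (pairs - edges) * math.log(1 - p)

def graph_llh(nodes, edges, block_of):
    """log P(G_k | Theta_k, b_k) with Theta_k set to its closed-form MLE."""
    edge_set = {frozenset(e) for e in edges}
    counts = {}  # (block_a, block_b) -> [edge count, node-pair count]
    for u, v in combinations(nodes, 2):
        key = tuple(sorted((block_of[u], block_of[v])))
        entry = counts.setdefault(key, [0, 0])
        entry[1] += 1
        entry[0] += frozenset((u, v)) in edge_set
    return sum(bernoulli_term(m, total) for m, total in counts.values())

def joint_llh(graphs):
    """Sum of per-graph log-likelihoods; graphs is a list of (nodes, edges, block_of)."""
    return sum(graph_llh(nodes, edges, block_of) for nodes, edges, block_of in graphs)
```

Sharing a block across graphs would replace several of these private terms with a single term computed from combined counts, which is the quantity the selection procedure trades off.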
2. Heuristic Procedure and Pseudocode
Fast-Block-Select employs a greedy, iterative construction of the selected set $S$. At each step, the procedure scans the candidate set $T$ to find the vector whose addition induces the maximal increase (or minimal drop) in log-likelihood, computed via incremental updates to the shared scores and removal or merging of the corresponding private contributions. Candidate vectors sharing any block index with vectors already in $S$ are eliminated to maintain injectivity. The process repeats until $|S| = s$. The annotated pseudocode is as follows (Kumpulainen et al., 2024):
    Procedure Fast-Block-Select(s, {G_k, b_k}_{k=1}^n):
        Input: s, graphs G_k, block partitions b_k
        Precompute all per-block and per-pair scores
        Initialize S = ∅, T = all block-vectors in product space
        while |S| < s:
            for r in T:
                Δ(r) = net increase in objective if r is appended to S
            select r* with maximal Δ(r)
            S ← S ∪ {r*}
            remove from T any candidate sharing block indices with r*
        return S
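The pseudocode above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the `gain` callback is a hypothetical stand-in for the incremental log-likelihood computation, and the early exit when the candidate set empties is an added safeguard.

```python
from itertools import product

def fast_block_select(s, blocks_per_graph, gain):
    """Greedy injective selection of up to s block vectors.

    blocks_per_graph: one list of block indices per graph.
    gain(selected, r): hypothetical callback returning the change in the
        objective if vector r is appended to the current selection.
    """
    selected = []
    candidates = set(product(*blocks_per_graph))
    while len(selected) < s and candidates:
        # Sorting makes tie-breaking deterministic (first maximum wins).
        best = max(sorted(candidates), key=lambda r: gain(selected, r))
        selected.append(best)
        # Injectivity: drop every candidate that reuses one of best's block
        # indices in the same graph (this also removes best itself).
        candidates = {r for r in candidates
                      if all(a != b for a, b in zip(r, best))}
    return selected

# Usage with a made-up, selection-independent gain table for two graphs.
table = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}
picked = fast_block_select(2, [[0, 1], [0, 1]], lambda S, r: table[r])
print(picked)  # [(0, 0), (1, 1)]
```

Passing the current selection to `gain` leaves room for gains that depend on previously chosen vectors, as the incremental updates described above would require.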
3. Mathematical Structure and Underlying Scores
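The algorithm’s efficiency and correctness depend on precise calculation of the relevant scores:
- For each graph $G_k$, a private block-pair contribution, computed from that graph's own edge and non-edge counts.
- For shared blocks, a shared contribution, computed from the edge and non-edge counts of all graphs that share the block.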
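One plausible closed form for these two scores is sketched below. It is an assumption, not a formula recovered from the paper: it takes a Bernoulli SBM with maximum-likelihood edge probabilities, writes $m$ for edge counts and $\bar m$ for non-edge counts of a block pair $(a, b)$ in graph $k$, and uses the convention $0 \log 0 = 0$.

```latex
% Private score of block pair (a, b) in graph k (assumed Bernoulli model)
\hat p^{(k)}_{ab} = \frac{m^{(k)}_{ab}}{m^{(k)}_{ab} + \bar m^{(k)}_{ab}}, \qquad
\ell^{(k)}_{ab} = m^{(k)}_{ab} \log \hat p^{(k)}_{ab}
                + \bar m^{(k)}_{ab} \log\bigl(1 - \hat p^{(k)}_{ab}\bigr)

% Shared score: the same expression on counts pooled over the graphs
% whose block pairs are mapped together
M = \sum_{k} m^{(k)}, \qquad \bar M = \sum_{k} \bar m^{(k)}, \qquad
\ell^{\mathrm{shared}} = M \log \frac{M}{M + \bar M}
                       + \bar M \log \frac{\bar M}{M + \bar M}
```

Under this assumption the shared score never exceeds the sum of the private scores it replaces, which is consistent with the procedure looking for the maximal increase or the minimal drop.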
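The candidate pool is the product of the per-graph block sets, so its size grows exponentially with the number of graphs $n$. In practice the relevant problem sizes are small to moderate, allowing storage and precomputation of the scores in compact tables. This enables real-time greedy selection in large graphs.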
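The growth of the candidate pool can be made concrete with a short enumeration. The helper `candidate_pool` and the choice of five blocks per graph are illustrative assumptions.

```python
from itertools import product
from math import prod

def candidate_pool(block_counts):
    """All block vectors: one block index per graph."""
    return list(product(*(range(b) for b in block_counts)))

# With 5 blocks per graph, the pool holds 5**n vectors for n graphs.
sizes = {n: len(candidate_pool([5] * n)) for n in (2, 3, 4)}
print(sizes)  # {2: 25, 3: 125, 4: 625}
```

Each added graph multiplies the pool by its block count, which is why precomputed score tables stay practical only while these counts remain modest.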
4. Complexity, Theoretical Properties, and Limitations
The running time is driven by the $s$ selection rounds, each of which scans the remaining candidates, and the candidate pool shrinks after every selection. The underlying optimization is NP-hard; Theorem 1 in Kumpulainen et al. (2024) proves inapproximability to any constant factor. Fast-Block-Select therefore has no worst-case optimality guarantee: suboptimal block selection is theoretically possible, although its empirical efficacy has been validated in practical settings.
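A toy case shows how a greedy choice can miss the optimum. The sketch assumes two graphs and a made-up additive gain matrix, where `gain[a][b]` is the value of sharing block `a` of the first graph with block `b` of the second. Additive gains are a simplification: the real objective couples block pairs, which is what makes it NP-hard.

```python
from itertools import permutations

# Hypothetical gains; rows are blocks of graph 1, columns blocks of graph 2.
gain = [[10.0, 9.0],
        [9.0, 0.0]]

def greedy_total(gain):
    """Repeatedly take the largest remaining entry, then drop its row and column."""
    rows = set(range(len(gain)))
    cols = set(range(len(gain[0])))
    total = 0.0
    while rows and cols:
        pairs = [(a, b) for a in sorted(rows) for b in sorted(cols)]
        a, b = max(pairs, key=lambda ab: gain[ab[0]][ab[1]])
        total += gain[a][b]
        rows.remove(a)
        cols.remove(b)
    return total

def best_total(gain):
    """Exhaustive search over all injective matchings (square matrix assumed)."""
    k = len(gain)
    return max(sum(gain[a][p[a]] for a in range(k))
               for p in permutations(range(k)))

print(greedy_total(gain), best_total(gain))  # 10.0 18.0
```

Greedy commits to the entry worth 10 and is then forced to take the entry worth 0, while the two entries worth 9 together give 18.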
5. Empirical Results and Practical Utility
Experiments on synthetic benchmarks and real-world graphs (e.g., large Wikipedia link networks) demonstrate that Fast-Block-Select runs orders of magnitude faster than exact ILP methods (under one second versus over 8 hours), while typically coming close to optimal assignment quality as measured by the Adjusted Rand Index (ARI: 0.75–0.90 versus 0.95–1.0 for ILP). For synthetic “planted” SBMs, greedy selection consistently outperforms random assignment (ARI 0.2–0.5) (Kumpulainen et al., 2024).
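For readers unfamiliar with the metric, the Adjusted Rand Index can be computed with the standard pair-counting formula, sketched here in pure Python. How the paper derives the two labelings being compared is not stated in this summary, so the inputs below are simply two generic label sequences.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Standard pair-counting ARI between two labelings of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no item pairs to disagree on
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (pairs_both - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```

The index is 1 for identical partitions regardless of label names and is close to 0 for unrelated ones, which is the scale the reported ARI ranges refer to.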
6. Illustrative Example
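Consider two graphs ($n = 2$), each with two blocks. In the first selection round, the algorithm computes the gain for each candidate shared block:
- Private scores: one for the chosen block in each of the two graphs.
- Shared score: one for the merged block.
- Log-likelihood gain from merging: the shared score compared against the two private scores.

The highest gain is chosen, conflicting candidates are removed, and selection continues for larger $s$ (Kumpulainen et al., 2024).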
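A numeric version of this example is sketched below. All counts are made up for illustration, only within-block edge densities are scored, and the score is the Bernoulli maximum-likelihood term, which is an assumption rather than the paper's stated formula.

```python
import math
from itertools import product

def score(edges, pairs):
    """Maximized Bernoulli log-likelihood for given edge and node-pair counts."""
    if edges in (0, pairs):
        return 0.0
    p = edges / pairs
    return edges * math.log(p) + (pairs - edges) * math.log(1 - p)

# Hypothetical within-block (edges, node pairs) counts, indexed [graph][block].
counts = [[(8, 10), (1, 10)],   # graph 1: dense block 0, sparse block 1
          [(3, 10), (7, 10)]]   # graph 2: sparse block 0, dense block 1

gains = {}
for a, b in product(range(2), range(2)):
    (m1, n1), (m2, n2) = counts[0][a], counts[1][b]
    private = score(m1, n1) + score(m2, n2)   # two separate parameters
    shared = score(m1 + m2, n1 + n2)          # one parameter on pooled counts
    gains[(a, b)] = shared - private          # never positive under this model

best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # (0, 1) -0.134
```

Pairing the two dense blocks costs the least log-likelihood, so `(0, 1)` is selected first; the injectivity rule then leaves `(1, 0)` as the only remaining candidate.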
7. Related Algorithms and Interdisciplinary Context
While the Fast-Block-Select term is closely associated with SSBM selection, analogous “fast block select” procedures appear in hierarchical block model inference (Park et al., 2017), compressed bitmap search (Grabowski et al., 2016), and active set methods for regularized regression (Santis et al., 2014). In each, rapid identification and update of relevant blocks form a core computational strategy, though mathematical details and objectives differ. In SSBM, the concept is uniquely tied to maximizing multi-graph joint likelihood under block-sharing constraints.
Summary Table: Key Attributes of Fast-Block-Select
| Attribute | SSBM (shared SBM) | HSBM (hierarchical SBM) | Bitmap/block selection |
|---|---|---|---|
| Objective | Maximize joint log-likelihood with injective shared blocks | Maximize marginal likelihood, hierarchical assignment | Accelerate select/rank queries |
| Block selection mechanism | Greedy, injective vector selection | Dynamic programming + coordinate ascent | Precomputed block offsets |
| Complexity per step | | | popcount ops |
| Guarantee | None (heuristic); NP-hard | Local convergence, Bayes pruning | Pareto-optimal space-time tradeoff |
The Fast-Block-Select paradigm is a practically validated approach to block assignment in multi-graph and hierarchical models: it remains tractable on dense, large-scale real-world data and stays empirically close to the optimum when exact solvers are computationally infeasible (Kumpulainen et al., 2024, Park et al., 2017, Grabowski et al., 2016, Santis et al., 2014).