Fast-Block-Select Algorithm

Updated 19 January 2026

Kumpulainen et al. (2024) introduce a heuristic that rapidly identifies near-optimal shared block selections to maximize joint log-likelihood in multi-graph stochastic block models.
It employs a greedy, injective block selection procedure to navigate an exponentially large candidate space without solving the NP-hard ILP formulation.
Reported experiments show runtime dropping from over 8 hours with ILP to under one second, with ARI scores of 0.75–0.90 against 0.95–1.0 for ILP.

The Fast-Block-Select algorithm refers to a class of greedy block selection procedures that rapidly identify block assignments, often near-optimal ones, in large-scale partitioning and inference problems. Most prominently, in the context of shared stochastic block modeling (SSBM) over multiple graphs, Fast-Block-Select provides a scalable heuristic for selecting $s$ shared blocks without solving an integer linear programming (ILP) formulation. It emerged as a practical response to the NP-hardness and inapproximability of “shared block detection” in multi-graph SBMs (Kumpulainen et al., 2024). The term is occasionally overloaded in the literature, for instance in network clustering or compressed bitmap implementations (Grabowski et al., 2016), but the defining hallmark is the rapid, injective, greedy construction of block assignments to optimize a statistical objective.

1. Algorithmic Objective and Formal Setup

In the SSBM setting, the goal is to select $s$ block vectors $S = \{r^{(1)}, \ldots, r^{(s)}\}$ across $n$ input graphs, where graph $G_k$ is partitioned into $B_k$ blocks, so as to maximize the joint log-likelihood $\mathit{LLH} = \sum_{k=1}^n \log P(G_k \mid \Theta_k, \mathbf{b}_k)$. The maximization is subject to injective mapping constraints between graphs (no block of any graph may appear in more than one selected vector) and to a shared parameterization for the selected blocks. Each candidate $r = (r_1, \ldots, r_n)$ is an element of the product space $\mathcal{T} = [B_1] \times \cdots \times [B_n]$. Block-pair parameters are re-estimated in closed form for both private and shared assignments, yielding log-likelihood scores $U^k_{ij}$ for private blocks and $Q_{rt}$ for shared blocks, based on empirical edge and non-edge counts $C^k_{ij}$ and $F^k_{ij}$. The objective is thus an NP-hard combinatorial maximization over valid injective subsets $S \subseteq \mathcal{T}$ of size $s$ (Kumpulainen et al., 2024).
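To make the candidate space and the injectivity constraint concrete, the following is a minimal Python sketch. The helper names `candidate_space` and `is_injective` and the zero-based block indices are assumptions made for illustration; they do not come from the paper.

```python
from itertools import product

def candidate_space(block_counts):
    """All block vectors r = (r_1, ..., r_n) with r_k in range(B_k)."""
    return list(product(*(range(b) for b in block_counts)))

def is_injective(selection):
    """True if no block of any graph appears in two selected vectors."""
    if not selection:
        return True
    n = len(selection[0])
    return all(
        len({r[k] for r in selection}) == len(selection)
        for k in range(n)
    )

T = candidate_space([3, 2])            # n = 2 graphs, B_1 = 3, B_2 = 2
print(len(T))                          # 6, i.e. B_1 * B_2
print(is_injective([(0, 0), (1, 1)]))  # True
print(is_injective([(0, 0), (0, 1)]))  # False: block 0 of graph 1 is reused
```

The size of the candidate list is the product of the $B_k$, which is why exhaustive search over $s$-sized subsets quickly becomes impractical.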

2. Heuristic Procedure and Pseudocode

Fast-Block-Select employs a greedy, iterative construction of $S$ . At each step, the procedure scans the candidate set $T \subseteq \mathcal{T}$ to find the vector $r^*$ whose addition induces maximal increase (or minimal drop) in log-likelihood, computed via incremental updates to $Q_{r t}$ and removal or merging of private $U^k_{ij}$ contributions. Candidate vectors sharing any block index with vectors already in $S$ are eliminated to maintain injectivity. The process repeats until $|S| = s$ . The annotated pseudocode is as follows (Kumpulainen et al., 2024):

Procedure Fast-Block-Select(s, {G_k, b_k}_{k=1}^n):
    Input: s, graphs G_k, block partitions b_k
    Precompute all per-block and per-pair scores
    Initialize S = ∅, T = all block-vectors in product space
    while |S| < s:
        for r in T:
            Δ(r) = net increase in objective if r is appended to S
        select r* with maximal Δ(r)
        S ← S ∪ {r*}
        remove from T any candidate sharing block indices with r*
    return S

Complexity per iteration is dominated by candidate scans ($O(n|S|)$ per candidate) and injective pruning ($O(n \sum_k (B_k - 1))$) (Kumpulainen et al., 2024). No guarantee on approximation ratio is provided; the procedure is entirely heuristic.
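The greedy loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `gain` callback, the zero-based block indices, and the toy `weights` table are hypothetical. In the SSBM setting, `gain` would be computed from the $Q_{rt}$ and $U^k_{ij}$ scores.

```python
from itertools import product

def fast_block_select(s, block_counts, gain):
    """Greedily pick up to s block vectors, keeping the selection injective.

    block_counts: [B_1, ..., B_n].
    gain(r, S): hypothetical callback returning the change in the
    objective if candidate r is appended to the current selection S.
    """
    S = []
    T = list(product(*(range(b) for b in block_counts)))
    while len(S) < s and T:
        # Scan every remaining candidate and keep the best one.
        best = max(T, key=lambda r: gain(r, S))
        S.append(best)
        # Injective pruning: drop candidates that reuse any block of
        # `best` (this also drops `best` itself).
        T = [r for r in T if all(rk != bk for rk, bk in zip(r, best))]
    return S

# Toy gain that ignores S (illustration only, not the SSBM objective).
weights = {(0, 0): -1.0, (0, 1): -0.2, (1, 0): -0.5, (1, 1): -3.0}
selection = fast_block_select(2, [2, 2], lambda r, S: weights[r])
print(selection)  # [(0, 1), (1, 0)]
```

The `and T` guard stops the loop early if pruning empties the pool, which happens whenever $s$ exceeds the smallest $B_k$, since each selected vector consumes one block from every graph.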

3. Mathematical Structure and Underlying Scores

The algorithm’s efficiency and correctness depend on precise calculation of the relevant scores:

For each graph $k$ , private block-pair contribution:

$U^k_{ij} = C^k_{ij} \log \theta^k_{ij} + F^k_{ij} \log(1-\theta^k_{ij}), \quad \theta^k_{ij} = \frac{C^k_{ij}}{C^k_{ij} + F^k_{ij}}$

For shared blocks $r, t$ :

$Q_{rt} = \sum_{k=1}^n \left[ C^k_{r_k t_k} \log \theta_{rt} + F^k_{r_k t_k} \log(1-\theta_{rt}) \right], \quad \theta_{rt} = \frac{\sum_k C^k_{r_k t_k}}{\sum_k (C^k_{r_k t_k} + F^k_{r_k t_k})}$

The candidate pool $\mathcal{T}$ has $\prod_k B_k$ elements and therefore grows exponentially with $n$, but in practice $n$ is small and $B_k$ moderate, allowing storage and precomputation of $C^k_{ij}, F^k_{ij}, U^k_{ij}, Q_{rt}$ in compact tables. Each greedy step then reduces to table lookups and incremental updates, which keeps selection fast even on large graphs.
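The two score formulas can be evaluated directly from the count tables. The sketch below assumes nested lists `C[k][i][j]` and `F[k][i][j]` for the edge and non-edge counts, zero-based block indices, and the convention $0 \log 0 = 0$; the function names and the example counts are made up for illustration.

```python
import math

def bernoulli_llh(c, f):
    """c*log(theta) + f*log(1 - theta) at theta = c / (c + f).

    Uses the convention 0 * log(0) = 0 so empty or saturated
    block pairs do not raise a math domain error.
    """
    total = c + f
    if total == 0:
        return 0.0
    theta = c / total
    llh = 0.0
    if c > 0:
        llh += c * math.log(theta)
    if f > 0:
        llh += f * math.log(1 - theta)
    return llh

def private_score(C_k, F_k, i, j):
    """U^k_ij for one graph, from its own counts."""
    return bernoulli_llh(C_k[i][j], F_k[i][j])

def shared_score(C, F, r, t):
    """Q_rt: counts pooled over all graphs, one shared theta_rt."""
    c = sum(C[k][r[k]][t[k]] for k in range(len(C)))
    f = sum(F[k][r[k]][t[k]] for k in range(len(C)))
    return bernoulli_llh(c, f)

# Made-up counts for n = 2 graphs with two blocks each.
C = [[[8, 1], [1, 6]], [[7, 2], [2, 5]]]
F = [[[2, 9], [9, 4]], [[3, 8], [8, 5]]]
u1 = private_score(C[0], F[0], 0, 0)
u2 = private_score(C[1], F[1], 0, 0)
q = shared_score(C, F, (0, 0), (0, 0))
print(u1, u2, q)
```

Because $\theta_{rt}$ is the same for every graph, the sum over $k$ in $Q_{rt}$ collapses to a single Bernoulli log-likelihood on the pooled counts, which is what `shared_score` computes.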

4. Complexity, Theoretical Properties, and Limitations

The total running time is $O(n s^2 \prod_k B_k)$ for $s$ selections. This is an upper bound, since pruning shrinks the candidate pool after each selection. The underlying optimization is NP-hard; Theorem 1 in (Kumpulainen et al., 2024) proves inapproximability to any constant factor. Fast-Block-Select therefore has no worst-case optimality guarantee: suboptimal block selection is theoretically possible, and its usefulness rests on empirical validation in practical settings.
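One way to see where the total comes from, taking the per-candidate scan cost $O(n|S|)$ from Section 2 as given and bounding the pool size by $|\mathcal{T}| = \prod_k B_k$ in every iteration, is to sum the scan cost over the $s$ iterations (the pruning term $O(n \sum_k (B_k - 1))$ is dominated by the scan and is omitted):

$\sum_{j=1}^{s} O(n j) \cdot \prod_k B_k = O\!\left(n \cdot \frac{s(s+1)}{2} \cdot \prod_k B_k\right) = O\!\left(n s^2 \prod_k B_k\right)$

Here $j$ is an assumed index for the iteration number, used as a stand-in for the size of $S$ during that iteration.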

5. Empirical Results and Practical Utility

Experiments on synthetic benchmarks and real-world graphs (e.g., large Wikipedia link networks, $B_k=20$, $s=2$) show Fast-Block-Select running orders of magnitude faster than exact ILP methods (under one second versus over 8 hours), at a moderate cost in assignment quality as measured by the Adjusted Rand Index (ARI 0.75–0.90 versus 0.95–1.0 for ILP). For synthetic “planted” SBMs, greedy selection consistently outperforms random assignment (ARI 0.2–0.5) (Kumpulainen et al., 2024).
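For readers unfamiliar with the metric, the Adjusted Rand Index compares two labelings of the same items and corrects for chance agreement: 1.0 means identical partitions up to relabeling, and values near 0 mean agreement no better than random. The following is a minimal standard-library sketch of the usual pair-counting formula. How the labelings are derived from a selection $S$ is not specified here, so the label lists are purely illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same items (pair-counting form)."""
    n = len(labels_a)
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two items: treated as perfect agreement
    # Pairs placed together in both labelings, in a, and in b.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / total
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings have one cluster
    return (together - expected) / (maximum - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```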

6. Illustrative Example

Consider two graphs ( $n=2$ ), each with two blocks. With $s=1$ , the algorithm computes the gain for each candidate shared block:

Private scores: $U^1_{AA} = -2.501$ , $U^2_{XX} = -2.616$
Shared score: $Q_{(A,X),(A,X)} = -3.767$
Log-likelihood gain from merging $A$ and $X$: $-3.767 - (-5.117) = +1.350$, where $-5.117 = U^1_{AA} + U^2_{XX}$ is the sum of the two private scores.

The candidate with the highest gain is chosen, conflicting candidates are removed, and selection continues for larger $s$ (Kumpulainen et al., 2024).

While the Fast-Block-Select term is closely associated with SSBM selection, analogous “fast block select” procedures appear in hierarchical block model inference (Park et al., 2017), compressed bitmap search (Grabowski et al., 2016), and active set methods for $\ell_1$-regularized regression (Santis et al., 2014). In each, rapid identification and update of relevant blocks forms a core computational strategy, though mathematical details and objectives differ. In SSBM, the concept is specifically tied to maximizing multi-graph joint likelihood under block-sharing constraints.

Summary Table: Key Attributes of Fast-Block-Select

Attribute	SSBM (shared SBM)	HSBM (hierarchical SBM)	Bitmap/block selection
Objective	Maximize joint log-likelihood with $s$ injective shared blocks	Maximize marginal likelihood, hierarchical assignment	Accelerate select/rank queries
Block selection mechanism	Greedy, injective vector selection	Dynamic programming + coordinate ascent	Precomputed block offsets
Complexity per step	$O(n s \prod_k B_k)$ ($O(n s^2 \prod_k B_k)$ total)	$O(m \log K + nK)$	$O(\ell/64)$ popcount ops
Guarantee	None (heuristic); NP-hard	Local convergence, Bayes pruning	Pareto-optimal space-time tradeoff

In summary, Fast-Block-Select is a practical heuristic for shared block selection in multi-graph SBMs. In the reported experiments it runs in under one second where exact ILP solvers take hours, and it stays reasonably close to ILP assignment quality, but it carries no worst-case guarantee. Related fast block selection strategies appear in hierarchical block models, compressed bitmaps, and active set methods (Kumpulainen et al., 2024; Park et al., 2017; Grabowski et al., 2016; Santis et al., 2014).


References

Kumpulainen et al. (2024). From your Block to our Block: How to Find Shared Structure between Stochastic Block Models over Multiple Graphs.

Grabowski et al. (2016). Rank and select: Another lesson learned.

Park et al. (2017). Fast and reliable inference algorithm for hierarchical stochastic block models.

Santis et al. (2014). A Fast Active Set Block Coordinate Descent Algorithm for $\ell_1$-regularized least squares.
