Dynamic Coarse-to-Fine Inference
- Dynamic coarse-to-fine inference is an adaptive strategy that uses inexpensive coarse evaluations to filter and select candidates for focused fine-grained analysis.
- It employs hierarchical representations and confidence-based gating mechanisms to dynamically refine predictions across domains such as vision, probabilistic modeling, and SLAM.
- Empirical studies show that this approach dramatically reduces computational load with minimal accuracy loss, making it ideal for resource-sensitive and real-time applications.
Dynamic coarse-to-fine inference refers to a family of algorithmic strategies in which inference or prediction is performed adaptively at progressively finer levels of detail, leveraging computationally cheap coarse-grained computations to filter, prune, or select promising candidates before invoking expensive fine-grained procedures. This paradigm has been systematically formalized and widely adopted across domains including probabilistic inference, computer vision, sequential recommendation, coreference resolution, and real-time dynamic systems. Distinct from static multi-resolution methods, dynamic coarse-to-fine inference selects or refines on the fly, often coupled to confidence signals, gating functions, or data-adaptive heuristics that optimize both efficiency and accuracy.
1. Formal Principles and Algorithmic Structure
Coarse-to-fine inference algorithms exploit hierarchical or multi-resolution representations of the problem space. The general workflow involves:
- Constructing a hierarchy of abstraction levels, such as spatial patchings in vision (Liu et al., 29 Nov 2025), block groupings in probabilistic spaces (Stuhlmüller et al., 2015), or discrete class bins in regression/classification tasks (Vaquero et al., 2018).
- Performing fast, approximate inference at a coarse resolution to identify areas or candidates of interest.
- Dynamically refining only those subsets of the search space that warrant further scrutiny, typically conditioned on a confidence, pruning, or importance score.
- Optionally iterating the process, with further fine-grained sub-selection or local refinement.
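The steps above can be condensed into a minimal, domain-agnostic sketch; `coarse_score`, `fine_infer`, and the gate `tau` are illustrative names, not drawn from any cited system:

```python
def coarse_to_fine(candidates, coarse_score, fine_infer, tau):
    """Generic dynamic coarse-to-fine loop: cheap scoring first,
    expensive inference only on candidates that pass the gate."""
    results = {}
    for c in candidates:
        s = coarse_score(c)              # cheap, approximate evaluation
        if s >= tau:                     # confidence/importance gate
            results[c] = fine_infer(c)   # expensive fine-grained refinement
        else:
            results[c] = s               # keep the coarse estimate
    return results

# Toy usage: only candidates whose coarse score clears the gate are refined.
out = coarse_to_fine(
    candidates=[1, 2, 3, 4],
    coarse_score=lambda x: x / 4.0,
    fine_infer=lambda x: x * 10,
    tau=0.5,
)
```

The gate can equally be a learned function or an adaptive threshold; the structural point is that fine computation is spent only where the coarse pass warrants it.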
Mathematically, these procedures often involve explicit mappings between coarse and fine variables. For instance, in probabilistic programs, coarsening and refinement functions (c_v, r_v) are introduced, constructing a chain of latent variables that mediates the transition between abstraction levels (Stuhlmüller et al., 2015).
2. Applications Across Domains
Probabilistic Inference and Graphical Models
In probabilistic programs, dynamic coarse-to-fine SMC exploits user-defined coarsening and refinement mappings to generate a sequence of intermediate distributions, facilitating efficient particle exploration and resampling along abstraction levels. This leads to enhanced mixing in high-dimensional models (e.g., Ising model, depth-from-disparity MRF, factorial HMM) and supports adaptive adjustment of coarsening depth based on particle effective sample size (Stuhlmüller et al., 2015).
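A toy sketch of the adaptive element, assuming a simple chain of tempered Gaussian score levels and a resample-when-ESS-drops rule; the function names and the local refinement move are illustrative, not the paper's algorithm:

```python
import math
import random

def ess(weights):
    """Effective sample size of normalized importance weights."""
    return 1.0 / sum(w * w for w in weights)

def coarse_to_fine_smc(score_levels, refine, n=200, ess_frac=0.5, seed=0):
    """Sketch: particles move through increasingly fine target levels;
    resampling fires only when ESS falls below a fraction of n."""
    rng = random.Random(seed)
    particles = [rng.uniform(-3, 3) for _ in range(n)]   # coarse init
    for score in score_levels:
        weights = [score(p) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        if ess(weights) < ess_frac * n:                  # adaptive resample
            particles = rng.choices(particles, weights=weights, k=n)
        particles = [refine(p, rng) for p in particles]  # local refinement move
    return particles

# Toy chain of targets that sharpen toward 0 (sigma = 2.0, 1.0, 0.5).
levels = [lambda x, s=s: math.exp(-(x * x) / (2 * s * s))
          for s in (2.0, 1.0, 0.5)]
final = coarse_to_fine_smc(levels, refine=lambda p, r: p + r.gauss(0, 0.1))
```

As the levels sharpen, resampling concentrates the particle population near the fine target's mode without ever running the fine score on a degenerate set.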
Graphical model optimization leverages a multi-scale cascade of learned classifiers to reduce the solution space, dynamically transitioning from coarse pruned spaces to finer representations (Conejo et al., 2014).
Dense Prediction and Reasoning
Joint coarse-and-fine neural architectures for dense pixel-wise estimation (optical flow) combine a coarse discrete classification head with a fine continuous regression head, decomposing the solution into a rough localization (via class bins) and a data-driven residual. This reduces both endpoint error and training time, and allows “dynamic inference”, for example adjusting bin granularity at runtime or during training according to confidence, with explicit calibration of the regression range via classification uncertainty (Vaquero et al., 2018).
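A minimal sketch of the coarse-bin-plus-residual decomposition described above; `logits`, `bin_centers`, and `residual` are hypothetical stand-ins for the two heads' outputs:

```python
def coarse_fine_predict(logits, bin_centers, residual):
    """Sketch: a discrete head picks the most likely bin (coarse),
    a continuous head adds a small offset within it (fine)."""
    k = max(range(len(logits)), key=lambda i: logits[i])  # coarse argmax
    return bin_centers[k] + residual, k

pred, k = coarse_fine_predict(
    logits=[0.1, 2.3, 0.4],           # coarse classification scores
    bin_centers=[-10.0, 0.0, 10.0],   # e.g. displacement bins
    residual=0.7,                     # fine regressed offset
)
# pred == 0.7, k == 1
```

Because the residual only has to cover one bin's width, its regression range can be narrowed (or recalibrated) whenever the classification head is confident.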
Sequential Models and Attention Mechanisms
In coreference resolution, a dynamic coarse-to-fine pipeline begins with coarse mention pruning and aggressive antecedent pruning via bilinear factors, then applies expensive feedforward antecedent scoring to the surviving candidates. A dynamic, multi-iteration attention-based refinement is performed only on this reduced set, achieving higher F1 with up to 5× fewer computations (Lee et al., 2018).
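A sketch of the two-stage antecedent scoring, with toy stand-ins (`recency`, `salience`) for the bilinear and feedforward scorers; the structure, not the scoring functions, is the point:

```python
def coarse_to_fine_antecedents(mentions, bilinear_score, fine_score, keep=3):
    """Sketch: a cheap bilinear factor ranks all earlier mentions,
    then the expensive scorer runs only on the top-`keep` survivors."""
    results = {}
    for j, _ in enumerate(mentions):
        candidates = list(range(j))                 # antecedents precede j
        coarse = sorted(candidates,
                        key=lambda i: bilinear_score(i, j), reverse=True)
        pruned = coarse[:keep]                      # aggressive coarse pruning
        scores = {i: fine_score(i, j) for i in pruned}  # expensive pass
        results[j] = max(scores, key=scores.get) if scores else None
    return results

# Toy scorers: coarse prefers recent antecedents, fine prefers later indices.
recency = lambda i, j: -(j - i)
salience = lambda i, j: i
links = coarse_to_fine_antecedents(["a", "b", "c", "d"],
                                   recency, salience, keep=2)
```

The expensive scorer sees at most `keep` candidates per mention instead of all earlier mentions, which is where the reported speedup comes from.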
For encoder-decoder models with attention, dynamic coarse-to-fine attention first predicts a coarse grid or block (via learned or sparsemax-distributed weights, or sampling with REINFORCE), then restricts fine-grained attention within the selected region(s), reducing O(HW)-cost full attention to O(√HW) while maintaining most of the accuracy (Deng et al., 2016).
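A sketch of hard coarse-to-fine attention on a 1-D score vector, assuming mean-pooled block scores for the coarse pass; a real model would score blocks with learned weights (or sample the block via REINFORCE):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def coarse_to_fine_attention(scores, block):
    """Sketch of hard coarse-to-fine attention: pick the highest-scoring
    block (coarse), then run full softmax attention only inside it (fine)."""
    block_scores = [sum(scores[i:i + block]) / block      # coarse pass
                    for i in range(0, len(scores), block)]
    b = max(range(len(block_scores)), key=lambda i: block_scores[i])
    lo = b * block
    fine = softmax(scores[lo:lo + block])                 # fine pass
    return b, fine

b, w = coarse_to_fine_attention([0.1, 0.2, 3.0, 3.1, 0.0, 0.1], block=2)
# selects block 1 (scores 3.0, 3.1); w is a distribution over that block only
```

The fine softmax runs over `block` cells rather than the full grid, which is the source of the O(HW) to roughly O(√HW) reduction cited above.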
Similarly, in multimodal fusion tasks, hierarchical dynamic gating mechanisms combine global modality summaries (coarse) with token-level details (fine), shifting reliance according to context via a learned sigmoid gate (Huang et al., 22 Sep 2025).
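A minimal sketch of sigmoid-gated coarse/fine fusion; the scalar gate parameterization (`w`, `b` applied to the feature difference) is an illustrative simplification of a learned projection over the full context:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(coarse, fine, w, b):
    """Sketch: a sigmoid gate blends a global (coarse) modality summary
    with token-level (fine) features, elementwise."""
    out = []
    for c, f in zip(coarse, fine):
        g = sigmoid(w * (c - f) + b)     # gate leans toward the stronger cue
        out.append(g * c + (1 - g) * f)  # convex combination of the streams
    return out

fused = gated_fusion(coarse=[1.0, 0.0], fine=[0.0, 1.0], w=2.0, b=0.0)
```

Because the gate is learned, reliance can shift smoothly between the coarse summary and fine tokens depending on instance noise or cross-modal ambiguity.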
Vision and Perception Systems
In Transformer-based and Mamba-based vision models, coarse-to-fine inference is realized by first embedding large image patches for a rapid coarse pass. Only when coarse confidence is low are select regions recursively re-embedded at finer resolution, with selection controlled by learned importance scores and confidence thresholds (e.g., α-patch refinement, β-EMA smoothing, etc.). This enables adaptive allocation of FLOPs, maintaining accuracy while reducing computation by up to 47% on ImageNet (Liu et al., 29 Nov 2025). Real-time detection architectures such as CF-DETR further combine these mechanisms with scheduling frameworks to guarantee deadlines on critical object detection, selectively triggering fine processing only in ambiguous or high-criticality regions and batching such subtasks for efficiency (Shin et al., 29 May 2025).
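The confidence-gated refinement loop can be sketched as follows; `coarse_model`, `fine_model`, and the per-patch importance scores are hypothetical stand-ins for the cited systems' components:

```python
def adaptive_patch_inference(patches, coarse_model, fine_model,
                             conf_threshold=0.9, top_alpha=0.25):
    """Sketch of confidence-gated patch refinement: run a cheap coarse
    pass; if its confidence clears the threshold, stop early; otherwise
    re-process only the top-alpha most important patches at fine scale."""
    pred, conf, importance = coarse_model(patches)
    if conf >= conf_threshold:
        return pred                        # early exit on a confident pass
    k = max(1, int(top_alpha * len(patches)))
    ranked = sorted(range(len(patches)),
                    key=lambda i: importance[i], reverse=True)
    selected = [patches[i] for i in ranked[:k]]
    return fine_model(pred, selected)      # refine only selected patches

# Toy models: an unconfident coarse pass triggers refinement of 2 patches,
# a confident one exits early.
coarse = lambda ps: ("cat", 0.6, [0.1 * p for p in ps])
fine = lambda pred, sel: (pred, sel)
refined = adaptive_patch_inference([1, 5, 3, 2], coarse, fine, top_alpha=0.5)

confident = lambda ps: ("dog", 0.95, [0.0] * len(ps))
early = adaptive_patch_inference([1, 2], confident, fine)
```

FLOPs scale with how often the gate fires, so easy inputs pay only the coarse cost while hard inputs trigger fine processing on a small patch subset.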
Dynamic Visual SLAM and Tracking
In visual SLAM, such as CFP-SLAM, object- and keypoint-level static probabilities are first initialized at the object level (via semantic and geometric cues) and then dynamically refined using reprojection errors, epipolar errors, and clustering (DBSCAN) within detected regions (e.g., “person” boxes). Weighted optimization is performed in two passes per frame, with multi-level probability updates ensuring robust localization under both high and low dynamic scenarios (Hu et al., 2022).
Molecular Dynamics and Non-Equilibrium Systems
Dynamic coarse-to-fine path-space variational inference compresses fine-scale stochastic processes into surrogate coarse SDEs. Optimization minimizes path-space Kullback-Leibler divergence or relative entropy rate, enabling transferability to new observables via information-theoretic inequalities (CKP, goal-oriented divergences). This procedure encompasses force-matching and path-likelihood maximization in data-driven settings (Harmandaris et al., 2015).
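In one standard formulation (a sketch; the symbols P for the fine path measure and Q^θ for the parametrized coarse surrogate are notational assumptions), the surrogate is fit by minimizing the relative entropy rate between path measures:

```latex
\mathcal{H}\big(P \,\|\, Q^{\theta}\big)
  = \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}_{P}\!\left[\log \frac{dP\big|_{[0,T]}}{dQ^{\theta}\big|_{[0,T]}}\right],
\qquad
\theta^{*} = \arg\min_{\theta}\, \mathcal{H}\big(P \,\|\, Q^{\theta}\big).
```

Normalizing by the horizon T makes the objective well defined for stationary non-equilibrium processes, where the unnormalized path-space divergence grows linearly in time.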
3. Mathematical Formulations and Pseudocode Patterns
Dynamic coarse-to-fine inference is typically described by explicit selection/refinement or pruning formulae, thresholding rules, and gating mechanisms. Examples include:
- In dense estimation (Vaquero et al., 2018), the prediction decomposes into a selected coarse bin plus a fine offset:

  ŷ = b_k* + Δy,

  where b_k* is the center of the coarse class bin chosen by the classification head and Δy is the fine regressed residual.
- Confidence-based dynamic forwarding in MambaScope (Liu et al., 29 Nov 2025):
```
if q_hat >= eta:
    return y_hat   # coarse prediction is confident enough: early exit
else:
    # Score, select, and refine top-alpha patches
    # ...
```
- Attention-based dynamic selection (Deng et al., 2016): the attention weight on cell i factors into a coarse block weight and a fine within-block weight, roughly α_{t,i} = α^coarse_{t,b(i)} · α^fine_{t,i}, with coarse selection via sparsemax or REINFORCE.
- Gated fusion in multimodal attention (Huang et al., 22 Sep 2025): a learned sigmoid gate g = σ(W [h_coarse; h_fine]) blends global and token-level streams as h = g ⊙ h_coarse + (1 − g) ⊙ h_fine.
- NPFP** scheduling for object detection (Shin et al., 29 May 2025): real-time deadlines enforced by separate coarse and fine subtasks, with batch assignment according to periodic release times and priority order.
4. Dynamic and Adaptive Aspects
Distinct from static pruning or hardwired cascades, dynamic coarse-to-fine inference actively adapts at runtime using:
- Confidence signals (e.g., coarse softmax probability, attention weight magnitudes, criticality scores) to determine whether and where to refine (Liu et al., 29 Nov 2025, Shin et al., 29 May 2025).
- Learned gates or adaptive thresholds that shift reliance between coarse and fine signal sources depending on instance noise, cross-modal ambiguity, or data sparsity (Huang et al., 22 Sep 2025, Li et al., 2022).
- Dynamic selection or early stopping, halting inference at a coarse pass where possible and recursively expanding only “complex” regions or ambiguous candidates.
In probabilistic settings, adaptation can be driven by diagnostics such as effective sample size (ESS), variance, or time-budget, with possible insertion or skipping of refinement levels on-the-fly (Stuhlmüller et al., 2015).
5. Computational and Empirical Benefits
Dynamic coarse-to-fine inference achieves substantial, in some cases order-of-magnitude, reductions in computational complexity and resource usage:
- In image-to-markup generation, hard C2F attention reduces typical per-token attention computations from 355 to 38 (9× reduction) with only a 3–4% absolute drop in accuracy (Deng et al., 2016).
- In vision Mamba, adaptive refinement halves the FLOPs (2.7G vs. 5.0G) while maintaining 99–100% of baseline classification accuracy (Liu et al., 29 Nov 2025).
- In dense prediction, combining coarse classification with fine regression yields 16% lower endpoint error on standard benchmarks (Vaquero et al., 2018).
- In sequential recommendation, explicit intent modeling in CaFe improves NDCG@5 by 35–53% on highly sparse datasets (Li et al., 2022).
- Real-time SLAM pipelines maintain high accuracy (order-of-magnitude ATE reductions in high-dynamic scenes) while running at ≥23 FPS (Hu et al., 2022).
- In real-time scheduling, frameworks like NPFP** guarantee deadlines for critical operations, efficiently utilizing slack for opportunistic fine inference and batch scheduling to further reduce latency (Shin et al., 29 May 2025).
6. Design Variants and Trade-offs
Dynamic coarse-to-fine methods offer different design axes:
- Hard vs. Soft Selection: Hard attention or sampling yields the greatest speedups, with minor accuracy drop; soft/hierarchical strategies maintain full accuracy but forgo complexity reduction (Deng et al., 2016).
- Heuristic vs. Learned Pruning: Pruning or region selection can use pre-specified importance scores, learned attention weights, or hybrid classifiers (Liu et al., 29 Nov 2025, Conejo et al., 2014).
- Granularity Adaptation: Bin width, patch size, number of attention hops, confidence thresholds, and batch sizes can be tuned or learned to maximize the efficiency-accuracy Pareto frontier (Liu et al., 29 Nov 2025, Shin et al., 29 May 2025, Hu et al., 2022).
- Hierarchical or Multimodal Fusion: Models may dynamically shift weight between global (coarse) and local (fine) signals by learned gating, as in multimodal intent recognition (Huang et al., 22 Sep 2025).
7. Limitations and Empirical Observations
Empirical results indicate that coarse-to-fine strategies generally introduce only small accuracy degradations—typically in the 3–4% range—when maximal acceleration is sought. Hierarchical “soft-soft” or attention-based methods can retain baseline accuracy but at the cost of less reduction in complexity (Deng et al., 2016). Effectiveness is highest under conditions of high data sparsity or where non-informative regions dominate (e.g., in visual tasks with large homogeneous backgrounds, or recommendation from implicit feedback) (Li et al., 2022, Liu et al., 29 Nov 2025). In dynamic environments (e.g., real-time SLAM or AV perception), dynamic coarse-to-fine inference ensures robust operation under both worst-case and mean-case load, adapting its inference depth according to scene complexity or object criticality (Hu et al., 2022, Shin et al., 29 May 2025).
In summary, dynamic coarse-to-fine inference unifies a broad family of hierarchical, adaptive computational procedures. Its key advantages are systematic efficiency gains, competitive or improved accuracy under tight resource constraints, and the capacity to adapt inference granularity and resource allocation conditionally on observed data and task-driven priorities. Widely validated across probabilistic modeling, vision, NLP, and real-time dynamic systems, dynamic coarse-to-fine inference is an essential design paradigm in contemporary large-scale and resource-sensitive AI systems.