SGRS: Saliency-Guided Rejection Sampling
- The paper introduces SGRS, a novel method that leverages per-token gradient-attention saliency to dynamically reject ungrounded tokens in LVLMs.
- It computes a saliency score by fusing attention weights with input gradients and applies adaptive thresholding to filter out hallucinations.
- Empirical results demonstrate reduced hallucination rates and improved factual accuracy on benchmarks when SGRS is integrated in the decoding process.
Saliency-Guided Rejection Sampling (SGRS) is an inference-time filtering framework developed for large vision-language models (LVLMs) to mitigate hallucinations during autoregressive generation. SGRS leverages a per-token gradient–attention saliency metric to dynamically reject candidate tokens that are weakly grounded in the model’s recent context, thereby filtering out predictions with a heightened risk of factual incoherence or hallucination. The method is formulated within the LVLMs-Saliency framework, which quantifies the visual grounding strength of each output token by fusing self-attention weights with their input gradients (Zhang et al., 28 Jan 2026).
1. Saliency Score Definition and Computation
At each autoregressive decoding step $t$ in a pretrained LVLM, SGRS computes a scalar saliency score $s_t(v)$ for every candidate next token $v$. The process involves:
- Extraction of self-attention weight matrices $A^{(l,h)}$ for each layer $l$ and attention head $h$.
- Computation of the cross-entropy loss $\mathcal{L}(v)$ for a one-hot label vector corresponding to token $v$ over the softmax logits.
- Backpropagation of this loss to obtain gradients $\nabla_{A^{(l,h)}} \mathcal{L}(v)$.
- Construction of saliency matrices as $S^{(l,h)} = M \odot \big(A^{(l,h)} \odot \nabla_{A^{(l,h)}} \mathcal{L}(v)\big)$, where $\odot$ denotes the Hadamard product and $M$ applies causal masking, so only previous positions can contribute.
- Aggregation and normalization across heads to yield $\bar{S}^{(l)}$ for each layer.
- Pooling over a set of target layers $\mathcal{L}_T$ (typically the middle-to-deep layers) and all previous output positions to yield a scalar saliency score per candidate:

$$s_t(v) = \frac{1}{|\mathcal{L}_T|\,(t-1)} \sum_{l \in \mathcal{L}_T} \sum_{j < t} \bar{S}^{(l)}_{t,j}$$
This saliency score quantifies the extent to which the next token is grounded in preceding outputs via the current model gradients.
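As a concrete sketch, the attention–gradient fusion described above can be expressed in a few lines of NumPy. The array shapes, the mean-over-heads aggregation, and the function name are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def saliency_score(attn, grads, target_layers, t):
    """Fuse attention weights with their gradients into a scalar saliency
    score for the token at position t.

    attn, grads: arrays of shape (layers, heads, seq, seq), where grads holds
    the backpropagated loss gradients w.r.t. the attention weights.
    """
    seq = attn.shape[-1]
    # Strictly-causal mask: only positions before t may contribute.
    causal = np.tril(np.ones((seq, seq)), k=-1)
    fused = attn * grads * causal          # Hadamard product, then masking
    per_layer = fused.mean(axis=1)         # aggregate over attention heads
    # Pool over the chosen layers and all previous output positions.
    return float(per_layer[target_layers, t, :t].mean())
```

A candidate whose attention mass and gradient signal align on prior positions receives a high score; a candidate whose saliency map collapses scores near zero.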
2. Context-Adaptive Thresholding
Instead of a static threshold, SGRS determines an adaptive acceptance criterion for token grounding at each step. For decoding position $t$, the acceptance threshold is given by:

$$\tau_t = \lambda \cdot \frac{1}{|W_t|} \sum_{i \in W_t} s_i$$

where $W_t$ is a window covering the $w$ most recent output tokens and $\lambda$ is a scaling factor. This context-adaptive threshold reflects recent model behavior and requires the saliency of any new candidate to exceed a fraction of the local average saliency history.
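A minimal sketch of this windowed threshold, folding in the minimal acceptance floor mentioned later as a stabilization strategy; the parameter names (`lam`, `floor`) and default values are assumptions taken from the hyperparameter table:

```python
from collections import deque

def adaptive_threshold(history, lam=0.6, floor=0.05):
    """Scale the mean saliency over the recent window by lam,
    never dropping below the minimal acceptance floor."""
    if not history:
        return floor                      # no history yet at the first steps
    return max(lam * sum(history) / len(history), floor)

# Usage: a bounded deque naturally implements the sliding window.
recent = deque(maxlen=5)                  # saliency history window w = 5
```

Each accepted token's saliency is appended to `recent`, so the threshold tracks the model's own recent grounding behavior rather than a fixed global constant.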
3. Rejection Sampling Procedure
The rejection sampling procedure of SGRS operates as follows:
- Compute logits and select the top-$K$ most probable candidate tokens $\mathcal{C}_t$.
- For up to $R$ trials, sample a candidate $v$ from $\mathcal{C}_t$ proportionally to its softmax probability. Compute the candidate’s saliency score $s_t(v)$ and compare it to the context threshold $\tau_t$.
- If $s_t(v) \geq \tau_t$, accept $v$ as the next output token $y_t$. Otherwise, remove $v$ from $\mathcal{C}_t$ and repeat.
- If no token is accepted after $R$ trials, a fallback token is selected (fallback mode).
- Emit $y_t$ and increment $t$.
The method includes stabilization strategies such as exponential moving average smoothing of the threshold $\tau_t$, a minimal acceptance floor to prevent over-rejection, and a bound on rejection trials to avoid inference stagnation (Zhang et al., 28 Jan 2026).
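The loop above can be sketched as follows. The saliency function is passed in as a callable, and the fallback rule (returning the most salient rejected candidate) is an assumption, since the paper's exact fallback choice is not reproduced here:

```python
import numpy as np

def sgrs_step(probs, saliency_fn, threshold, k=20, max_trials=5, rng=None):
    """One SGRS decoding step: sample among the top-k candidates, rejecting
    any whose saliency falls below the context threshold.

    probs: next-token distribution (1-D array).
    saliency_fn: token_id -> scalar saliency score.
    """
    rng = rng or np.random.default_rng()
    candidates = list(np.argsort(probs)[::-1][:k])   # top-k by probability
    fallback, best_sal = candidates[0], -np.inf
    for _ in range(max_trials):
        if not candidates:
            break
        p = np.array([probs[c] for c in candidates])
        v = candidates[rng.choice(len(candidates), p=p / p.sum())]
        s = saliency_fn(v)
        if s >= threshold:
            return v                                  # grounded: accept
        if s > best_sal:                              # track best rejected token
            fallback, best_sal = v, s
        candidates.remove(v)                          # rejected: resample
    return fallback                                   # bounded-trial fallback
```

Because trials are bounded and rejected candidates are removed from the pool, the step always terminates, which is the "inference stagnation" safeguard described above.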
| Hyperparameter | Default Value | Purpose |
|---|---|---|
| $K$ | 20 | Top-K sampling size |
| $R$ | 5 | Max rejection trials |
| $w$ | 5 | Saliency history window |
| $\lambda$ | 0.6 | Threshold scaling factor |
| $\mathcal{L}_T$ | Middle/deep layers | Target layers for saliency pooling |
| $\tau_{\min}$ | 0.05 | Minimal acceptance floor |
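For reference, the defaults above can be gathered into a single configuration mapping; the key names here are illustrative, not the paper's:

```python
# Default SGRS hyperparameters from the table above (key names are assumed).
SGRS_DEFAULTS = {
    "top_k": 20,        # candidate pool size per decoding step
    "max_trials": 5,    # bound on rejection attempts
    "window": 5,        # saliency history window for the adaptive threshold
    "lambda": 0.6,      # threshold scaling factor
    "tau_min": 0.05,    # minimal acceptance floor
}
```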
4. Theoretical and Empirical Motivation
Empirical analysis demonstrates a pronounced relationship between saliency scores and output factuality. Specifically:
- Mean saliency of correct tokens is approximately $0.66$, versus approximately $0.35$ for hallucinated tokens, across LLaVA-1.5-7B, Qwen2-VL-7B, and Intern-VL-7B.
- Hallucination probability decreases monotonically with increasing token saliency.
- Artificially suppressing the saliency signal elevates the hallucination rate (Zhang et al., 28 Jan 2026).
While no formal theorem asserts that low saliency implies hallucination with probability 1, the negative correlation between token saliency and hallucination probability provides an actionable operational filter.
5. Practical Implementation and Considerations
SGRS requires one backward pass per candidate per decoding step, increasing per-token latency by 30–40% compared to greedy decoding. For efficiency, SGRS can be applied selectively—restricted to tokens with high factuality risk—or combined with Local Coherence Reinforcement (LocoRE) to limit computational overhead (Zhang et al., 28 Jan 2026). The softmax sampling temperature is kept at 1.0 throughout.
Variants include:
- Smoothing via exponential moving average,
- Setting a floor for acceptance,
- Bounded rejection trials,
- Fallback mode on rejection exhaustion.
Practical hyperparameter values were tuned using held-out splits of the CHAIR and POPE hallucination detection benchmarks.
6. Experimental Results and Comparative Effectiveness
SGRS, combined with LocoRE, was assessed on standard benchmarks against standard top-K and nucleus sampling and eight additional plug-and-play hallucination mitigation techniques. For LLaVA-1.5-7B:
- CHAIR hallucination rate: substantially lower for SGRS+LocoRE than for the baseline.
- POPE-F1 score: higher for SGRS+LocoRE than for the baseline.
Comparable improvements were obtained across Qwen2-VL and Intern-VL families. On the MME general benchmark, SGRS+LocoRE improved the “Existence” and “Position” subtasks by 5–7 points compared to greedy sampling. Qualitative examples show SGRS’s ability to reject visually ungrounded (hallucinatory) tokens, such as “blue,” when saliency maps collapse, retaining candidates (“gray,” “watch”) exhibiting strong context dependency (Zhang et al., 28 Jan 2026).
7. Significance and Broader Implications
SGRS offers an interpretable, gradient-linked mechanism for online filtering of weakly grounded tokens in LVLMs, reducing hallucination while preserving fluency and downstream task performance. Its granularity and context sensitivity derive from leveraging intrinsic model dynamics—attention and gradient propagation—rather than relying solely on heuristics or external post-hoc filters. Implementation remains computationally more intensive than greedy decoding, but the method remains practical given its substantial factuality gains, particularly under controlled token selection or in conjunction with LocoRE. This suggests avenues for future research in combining gradient-aware and structural saliency signals for robust LVLM decoding (Zhang et al., 28 Jan 2026).