
Bayes-Optimal Per-Query Gating

Updated 27 January 2026
  • The paper establishes a Bayesian framework for per-query gating that selects between language model predictions and retrieval evidence using risk minimization.
  • It leverages entropy pursuit and trust-based penalties to balance accuracy and reliability in both active preference learning and retrieval-augmented generation.
  • Empirical results demonstrate computational efficiency and improved factuality, validating hybrid geometric-semantic approaches under varied query conditions.

A Bayes-optimal per-query gate is a statistical mechanism for selecting, on a query-by-query basis, between competing sources of prediction or evidence under a formal Bayesian risk-minimization criterion. Such gates have been established in both active preference learning ("Bayes-Optimal Entropy Pursuit" (Pallone et al., 2017)) and retrieval-augmented generation (RAG) settings ("A Note on k-NN Gating in RAG" (Biau et al., 20 Jan 2026)). In these frameworks, the gate determines which information source—internal model or retrieved memory—should be trusted for each query, taking into account uncertainty, data quality, and downstream objectives such as entropy reduction, misclassification error, or factuality.

1. Mathematical Frameworks for Per-Query Gating

In active choice-based preference learning (Pallone et al., 2017), the system models user preferences via a linear classifier $\bm\theta\in\mathbb{R}^d$ with a Bayesian prior $\mu_0$. At each time $k$, an $m$-way choice query is constructed, and the user selects the most preferred item. Observation likelihoods are specified by a noise-channel matrix $P\in\mathbb{R}^{m\times m}$. The posterior on $\bm\theta$ is updated using Bayes' rule given the user's possibly noisy response.

In retrieval-augmented language modeling (Biau et al., 20 Jan 2026), a query $x\in\mathbb{R}^d$ has an unknown label $y\in\mathcal{Y}$ and is processed by both:

  • A frozen base LM, yielding $q_0(\cdot\mid x)$,
  • A $k$-NN retriever on a memory bank, producing $\hat{r}^{(k)}(\cdot\mid x)$.

A gating function $\lambda(x)\in[0,1]$ yields the prediction mixture
$$p_\lambda(y\mid x) = (1-\lambda(x))\,q_0(y\mid x) + \lambda(x)\,\hat{r}^{(k)}_y(x).$$
The gate is optimized to minimize expected cross-entropy to the ground-truth conditional distribution, penalized by a retrieval-trust term $w_{\mathrm{fact}}(x)$.
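The mixture above is a simple convex combination of the two label distributions. A minimal sketch in NumPy (the function name and the numerical values are illustrative, not from the paper):

```python
import numpy as np

def mixture_prediction(q0, r_hat, lam):
    """Blend the base-LM and k-NN retriever distributions with gate weight lam.

    q0, r_hat : arrays of shape (n_labels,), each summing to 1
    lam       : gate value lambda(x) in [0, 1]
    """
    return (1.0 - lam) * q0 + lam * r_hat

# Example: the gate half-trusts retrieval for this query.
q0 = np.array([0.7, 0.2, 0.1])      # base LM distribution q_0(.|x)
r_hat = np.array([0.1, 0.8, 0.1])   # retriever distribution r_hat^(k)(.|x)
p = mixture_prediction(q0, r_hat, lam=0.5)
# p = [0.4, 0.5, 0.1]; a convex combination of distributions is itself a distribution.
```

Because the combination is convex, $p_\lambda(\cdot\mid x)$ is automatically a valid probability distribution for any $\lambda(x)\in[0,1]$.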

2. Bayes-Optimal Policy Derivation

The Bayes-optimal per-query gate is derived by minimizing a risk or loss function that is pointwise decomposable in $x$ (Biau et al., 20 Jan 2026). For each $x$,

$$J(\lambda;x) = \sum_{y} P_{Y|X}(y\mid x)\bigl[-\log p_\lambda(y\mid x)\bigr] + \zeta\,\lambda(x)\bigl(1-w_{\mathrm{fact}}(x)\bigr),$$

where $\zeta$ is a regularization parameter and $w_{\mathrm{fact}}(x)$ measures retrieval reliability:
$$w_{\mathrm{fact}}(x) = \frac{1}{k}\sum_{j=1}^{k}\exp\bigl(-\|x-U_{(j)}(x)\|^2\bigr).$$
For hard gating, the Bayes-optimal rule is
$$\lambda^\star(x) = \begin{cases} 1 & \text{if } \ell_r(x) + \zeta\bigl(1-w_{\mathrm{fact}}(x)\bigr) < \ell_0(x), \\ 0 & \text{otherwise}, \end{cases}$$
where $\ell_0(x)$ and $\ell_r(x)$ are the population cross-entropies for the LM and retriever, respectively.

In the entropy pursuit setting (Pallone et al., 2017), the Bayes-optimal policy for query selection is provably greedy with respect to mutual information: at each step it selects the query that maximizes expected posterior entropy reduction (equivalently, the mutual information between the preference vector $\bm\theta$ and the observation).
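For a finite candidate set, the greedy step is a direct mutual-information computation. A sketch with a discretized posterior over candidate $\bm\theta$ values (the setup is illustrative; the paper works with a continuous posterior):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def greedy_query(posterior, likelihoods):
    """Pick the query maximizing I(theta; response) under the current posterior.

    posterior   : (n_theta,) posterior weights over candidate theta values
    likelihoods : (n_queries, n_theta, m) response likelihoods P(response | theta, query)
    """
    best_q, best_mi = None, -np.inf
    for q, L in enumerate(likelihoods):
        marginal = posterior @ L                    # predictive response distribution
        cond = sum(posterior[t] * entropy(L[t]) for t in range(len(posterior)))
        mi = entropy(marginal) - cond               # I(theta; response) for query q
        if mi > best_mi:
            best_q, best_mi = q, mi
    return best_q, best_mi

# Two candidate thetas; query 0 separates them perfectly, query 1 is uninformative.
posterior = np.array([0.5, 0.5])
likelihoods = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
])
q, mi = greedy_query(posterior, likelihoods)
```

The greedy step selects query 0, whose mutual information equals the full prior entropy $\log 2$, while the uninformative query scores 0.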

3. Role of Trust, Penalization, and Memory Alignment

The retrieval-trust weight $w_{\mathrm{fact}}(x)$ encodes the geometric reliability of retrieved evidence. It approaches 1 in dense, in-distribution regions and falls toward 0 for out-of-support or noisy queries. The penalty $\zeta\,\lambda(x)\bigl(1-w_{\mathrm{fact}}(x)\bigr)$ in the gating loss discourages the gate from relying on retrieval in low-trust regions, thus providing a statistical guard against spurious or misleading evidence (Biau et al., 20 Jan 2026).

A hybrid geometric-semantic model accounts for both covariate shift (in $X$ vs. the reference memory $U$) and label corruption in memory, modulating both the retrieval distribution and the associated trust quantities. Under such shifts, $w_{\mathrm{fact}}(x)$ decays exponentially in the distance from $x$ to the memory support, ensuring the optimal gate contracts toward baseline model reliance in off-support or adversarial regions.

4. Information-Theoretic Objectives and Guarantees

Bayes-optimal per-query gates often optimize information-theoretic objectives:

  • In entropy pursuit (Pallone et al., 2017), the posterior differential entropy $H_k(\bm\theta)$ is minimized; the expected one-step entropy reduction equals the mutual information between observation and latent parameter.
  • The maximal per-step entropy reduction is bounded by the "channel capacity" $C(P)$ determined by the predictive distribution and noise channel: $$C(P) := \sup_{u\in\Delta^m}\bigl\{ h(u^T P) - u^T h(P) \bigr\}.$$ If query alternatives can be constructed from a continuum, greedy entropy pursuit attains the linear rate of entropy decrease $H(\bm\theta) - H_K \geq K\,C(P)$. Sensitivity results ensure robust performance even when the attained predictive distribution only approximates the global optimum.
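The capacity expression above can be estimated numerically by searching over the simplex $\Delta^m$. A Monte-Carlo sketch (the sampling scheme and sample count are my own choices, not from the paper):

```python
import numpy as np

def channel_capacity(P, n_samples=50_000, seed=0):
    """Monte-Carlo estimate of C(P) = sup_u { h(u^T P) - u^T h(P) } over the simplex."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(P.shape[0]), size=n_samples)  # candidate predictive distributions u
    M = U @ P                                               # induced response distributions u^T P
    safe = lambda A: np.where(A > 0, A, 1.0)                # avoid log(0); 0*log(0) -> 0
    ent_out = -np.sum(M * np.log(safe(M)), axis=1)          # h(u^T P)
    row_ent = -np.sum(P * np.log(safe(P)), axis=1)          # h of each channel row
    return float(np.max(ent_out - U @ row_ent))

# Noiseless 2x2 channel: each response reveals one full bit, so C(P) = log 2.
cap = channel_capacity(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

For the identity channel the row entropies vanish and the supremum is attained at the uniform $u$, so the estimate converges to $\log 2 \approx 0.693$ nats; a noisier $P$ shrinks the capacity and hence the attainable per-step entropy reduction.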

Misclassification error is fundamentally lower-bounded by posterior entropy via Fano's inequality:
$$\mathbb{E}^\pi_{\mathrm{miss}}(k) \geq \frac{H(W\mid S) - I^\pi(\bm\theta; Y_{1:k}) - 1}{\log_2 n},$$
indicating that per-query Bayesian entropy control directly governs error rates.
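The bound is elementary arithmetic once the entropy and mutual-information terms are in hand. A worked toy example (the numbers are illustrative, not from the paper):

```python
import math

def fano_lower_bound(prior_entropy_bits, mutual_info_bits, n):
    """Fano-style lower bound on misclassification probability over n alternatives.

    The bound is clipped at 0, since probabilities cannot be negative.
    """
    return max(0.0, (prior_entropy_bits - mutual_info_bits - 1.0) / math.log2(n))

# 16 equally likely alternatives (4 bits of prior entropy); queries so far
# have delivered 2 bits of information about theta.
lb = fano_lower_bound(prior_entropy_bits=4.0, mutual_info_bits=2.0, n=16)
# lb = (4 - 2 - 1) / log2(16) = 1/4 = 0.25
```

No policy can push the misclassification rate below 25% in this state; only further information gain (raising $I^\pi(\bm\theta; Y_{1:k})$) loosens the bound, which is exactly what per-query entropy control targets.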

5. Statistical Hallucination, Discordance, and Large-Sample Limits

A discordance-based hallucination criterion quantifies local disagreement between LM predictions and retrieval evidence, weighted by retrieval trust (Biau et al., 20 Jan 2026):
$$\mathcal{H}_{\mathrm{disc}}(q_0; x) = w_{\mathrm{fact}}(x)\bigl[1 - q_0(y_r(x)\mid x)\bigr],$$
where $y_r(x)$ is the modal retriever label. The optimal gating solution reduces this discordance only if retrieval meaningfully improves over the LM in well-trusted regions. Asymptotically, in the aligned regime with $k\to\infty$, the empirical retriever converges to the true conditional $P_{Y|X}$ and gates nontrivially only at points where the Bayes error is strictly less than the LM error.
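The criterion is cheap to evaluate per query: take the retriever's modal label, read off the LM's probability for it, and weight the complement by trust. A minimal sketch with hypothetical values:

```python
import numpy as np

def discordance(q0, r_hat, trust):
    """Trust-weighted disagreement H_disc(q0; x) between LM and retrieval.

    q0, r_hat : label distributions from the base LM and the k-NN retriever
    trust     : retrieval-trust weight w_fact(x) in [0, 1]
    """
    y_r = int(np.argmax(r_hat))          # modal retriever label y_r(x)
    return trust * (1.0 - float(q0[y_r]))

q0 = np.array([0.7, 0.2, 0.1])          # LM puts little mass on label 1
r_hat = np.array([0.1, 0.8, 0.1])       # retriever strongly favors label 1
score = discordance(q0, r_hat, trust=0.9)
# score = 0.9 * (1 - 0.2) = 0.72: a high-trust, high-disagreement query
```

When trust is near 0 the score collapses regardless of disagreement, matching the intuition that out-of-support retrieval evidence should neither trigger a hallucination flag nor open the gate.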

A plausible implication is that the capacity for hallucination mitigation via retrieval is structurally limited by the agreement between the LM and the true Bayes rule in high-density regions.

6. Empirical Performance and Computational Considerations

In choice-based preference learning (Pallone et al., 2017), empirical evaluation on large document sets demonstrates that entropy pursuit outperforms the knowledge-gradient (KG) policy on posterior entropy and is significantly more computationally efficient. For misclassification error, KG may yield marginal gains in low-noise, weak-prior regimes, but the two policies are nearly indistinguishable under moderate noise or a strong prior. The computational advantage of entropy pursuit stems from evaluating only $O\!\left(\binom{N}{m}\right)$ candidate sets versus $O\!\left(\binom{N}{m}\binom{N}{n}\right)$ for KG.

7. Scope, Limitations, and Generalizations

The Bayes-optimal per-query gating formalism is broadly applicable wherever two or more sources of predictive evidence must be reconciled at inference time under uncertainty. In both the preference learning and RAG contexts, generalizations to hybrid models—combining geometric and semantic corruption—are mathematically natural via the proposed trust and reliability terms. However, performance depends on adequate estimation of underlying densities, calibration of penalty hyperparameters ($\zeta$), and availability of high-quality memory support.

In sum, the Bayes-optimal per-query gate provides a principled, risk-minimizing mechanism for balancing model fluency, retrieval grounding, and statistical reliability on a per-query basis, with strong information-theoretic and statistical guarantees in a variety of learning and inference frameworks (Pallone et al., 2017, Biau et al., 20 Jan 2026).
