
Agentic Experience Search Strategy

Updated 15 January 2026
  • Agentic Experience Search Strategy is a framework that enables LLMs to autonomously decide when, how, and what to search by balancing exploration and exploitation.
  • It employs formal error metrics like over-search (OSR) and under-search (USR) along with per-token uncertainty estimation to guide dynamic retrieval decisions.
  • Reinforcement learning with confidence gating (β-GRPO) refines search policies, reducing inefficiencies and improving answer accuracy across diverse QA benchmarks.

Agentic Experience Search Strategy enables LLMs and related agents to perform autonomous, multi-step information retrieval and reasoning by dynamically deciding when, how, and what to search to maximize efficiency and answer accuracy. This paradigm formalizes the agent’s interaction with external search tools—as opposed to static, pre-retrieved contexts—requiring the agent to balance exploration (searching for new evidence) and exploitation (reasoning over known information), while learning from experience how its own uncertainty and behavior affect downstream outcomes.

1. Formal Definitions and Error Taxonomy

The agentic experience search process is modeled as a discrete sequence of steps $T = \{s_1, \dots, s_N\}$, where at each step $t$ the agent chooses either to retrieve external information (a retrieval step $s_t^R$) or to proceed with internal reasoning (a non-retrieval step $s_t^{NR}$). Associated with each step are a sub-query, a retrieval result (if any), and a sub-answer.
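The per-step bookkeeping above can be captured in a small record type. A minimal sketch; the field names are illustrative, not from the paper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    """One step s_t of a trajectory T = {s_1, ..., s_N}."""
    sub_query: str                   # the sub-question posed at step t
    is_retrieval: bool               # True for a retrieval step s_t^R, False for s_t^{NR}
    retrieval_result: Optional[str]  # retrieved passage, or None for non-retrieval steps
    sub_answer: str                  # the agent's sub-answer a_t

# A two-step trajectory: one internal-reasoning step, then one retrieval step.
trajectory = [
    Step("Who wrote Hamlet?", False, None, "William Shakespeare"),
    Step("When was Hamlet first performed?", True, "performed c. 1600-1601", "around 1601"),
]
```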

Over-search is quantitatively defined as issuing a search when the correct sub-answer $a_t^*$ could have been generated using only the model's internal knowledge and context:

$$\text{OSR} = \frac{|\{t : s_t^R \text{ and } a_t(M) = a_t^*\}|}{|\{t : s_t^R\}|}$$

where $a_t(M)$ is the sub-answer produced under search-disabled conditions.

Under-search is the failure to retrieve when external evidence is necessary:

$$\text{USR} = \frac{|\{t : s_t^{NR} \text{ and } a_t \neq a_t^*\}|}{|\{t : s_t^{NR}\}|}$$

Empirical analyses find that agentic RAG systems over-search in 20–28% of search steps and under-search in up to 72% of non-search steps, regardless of dataset or backbone (Wu et al., 22 May 2025). Systematic monitoring of OSR and USR curves is critical for diagnosing agent inefficiency beyond aggregate metrics.
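Given step-level annotations (whether a step searched, its sub-answer, the gold sub-answer, and the answer the model produces with search disabled), OSR and USR reduce to two ratios. A minimal sketch, with illustrative dictionary keys:

```python
def over_search_rate(steps):
    """OSR: fraction of retrieval steps whose sub-answer would also have been
    correct with search disabled (a_t(M) == a_t^*)."""
    retrieval = [s for s in steps if s["is_retrieval"]]
    if not retrieval:
        return 0.0
    redundant = [s for s in retrieval if s["answer_no_search"] == s["gold"]]
    return len(redundant) / len(retrieval)

def under_search_rate(steps):
    """USR: fraction of non-retrieval steps whose sub-answer is wrong (a_t != a_t^*)."""
    non_retrieval = [s for s in steps if not s["is_retrieval"]]
    if not non_retrieval:
        return 0.0
    wrong = [s for s in non_retrieval if s["answer"] != s["gold"]]
    return len(wrong) / len(non_retrieval)

steps = [
    {"is_retrieval": True,  "answer_no_search": "Paris", "answer": "Paris", "gold": "Paris"},  # over-search
    {"is_retrieval": True,  "answer_no_search": "Lyon",  "answer": "Paris", "gold": "Paris"},  # justified search
    {"is_retrieval": False, "answer": "1789", "gold": "1789"},  # correct internal reasoning
    {"is_retrieval": False, "answer": "1790", "gold": "1789"},  # under-search
]
```

Tracking these two curves during training, rather than only end-task accuracy, is what exposes the inefficiencies described above.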

2. Uncertainty Estimation and Decision Confidence

Misguided search actions are inseparable from the agent's ability to estimate its own knowledge boundaries. For each emitted search query, Wu et al. (22 May 2025) compute a confidence score $c_t$ as the minimum token probability among the generated search-query tokens:

$$c_t = \min_{i \in \text{search tokens}} p(q_{t,i})$$

High $c_t$ values robustly predict not only high-quality search actions but also overall answer correctness, with $c_t$-based filtering yielding absolute gains of 6 percentage points in exact match. Per-token search confidence therefore provides an actionable signal for reward shaping and policy optimization.
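Since decoders typically expose per-token log-probabilities, $c_t$ is a single line of post-processing. A minimal sketch (the log-prob values are invented, and the 0.4 filter threshold mirrors the paper's tuned $\beta$):

```python
import math

def search_confidence(token_logprobs):
    """c_t: the minimum probability p(q_{t,i}) over the tokens of one
    emitted search query, given their log-probabilities."""
    return min(math.exp(lp) for lp in token_logprobs)

# Log-probs for the four tokens of a hypothetical search query.
logprobs = [-0.05, -0.22, -1.6, -0.10]
c_t = search_confidence(logprobs)  # dominated by the least certain token
keep = c_t >= 0.4                  # filter out low-confidence searches
```

Taking the minimum rather than the mean makes the score sensitive to a single uncertain token, which is the desired behavior: one shaky entity name in a query is enough to flag the whole search action.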

3. Reinforcement Learning for Agentic Search Policy

To minimize sub-optimal search decisions, agentic experience search systems employ reinforcement learning with explicit uncertainty-aware reward structures. $\beta$-GRPO (Group Relative Policy Optimization with confidence gating) rewards the policy $\pi_\theta$ only when a trajectory is both correct and every search action in it meets a confidence threshold $\beta$, i.e., the trajectory-level confidence $c(\tau)$, the minimum $c_t$ over its search actions, satisfies $c(\tau) \geq \beta$:

$$R_\beta(\tau) = \begin{cases} 1 & \text{if } a_N = a^* \text{ and } c(\tau) \geq \beta \\ 0 & \text{otherwise} \end{cases}$$

The expected return is then:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}[R_\beta(\tau)]$$

This directly reinforces high-certainty, correct search behavior, reducing OSR by over 1 ppt and USR by over 7 ppt while boosting answer EM by 4 ppt over strong baselines (Wu et al., 22 May 2025).
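The gate composes two checks: final-answer correctness and a per-search confidence floor. A minimal sketch of $R_\beta$ and the group-relative baseline GRPO uses (the clipped policy-ratio update and KL term are omitted; interfaces are illustrative):

```python
def gated_reward(final_answer, gold, search_confidences, beta=0.4):
    """R_beta(tau): 1 iff a_N == a* and every search confidence c_t >= beta."""
    correct = final_answer == gold
    confident = all(c >= beta for c in search_confidences)
    return 1.0 if (correct and confident) else 0.0

def group_relative_advantages(group_rewards):
    """GRPO-style advantage: each rollout's reward minus the group mean."""
    mean_r = sum(group_rewards) / len(group_rewards)
    return [r - mean_r for r in group_rewards]

# K = 5 rollouts of one query; only confident-and-correct rollouts earn reward.
rewards = [
    gated_reward("Paris", "Paris", [0.9]),       # correct, confident -> 1
    gated_reward("Paris", "Paris", [0.3]),       # correct, low c_t   -> 0 (gated)
    gated_reward("Lyon",  "Paris", [0.8]),       # wrong              -> 0
    gated_reward("Paris", "Paris", [0.7, 0.5]),  # correct, confident -> 1
    gated_reward("Lyon",  "Paris", [0.2]),       # wrong              -> 0
]
advantages = group_relative_advantages(rewards)
```

The gate means a lucky guess backed by an unconfident search earns no reward, so the policy cannot hack the reward by searching indiscriminately.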

4. Workflow Initialization, Training, and Implementation

A typical training pipeline first bootstraps the agent with supervised fine-tuning on search-enabled transcripts to teach “how” to search, ensuring reasonable initial actions. RL with $\beta$-GRPO then optimizes the “when to search” policy by:

  • Batch-generating multiple rollouts per query ($K = 5$)
  • Evaluating each rollout for correctness and confidence gating
  • Sampling all search decisions at temperature $T = 1$ (maximum entropy)
  • Applying static retrievers (e.g., E5 on Wikipedia) for evaluation repeatability
  • Tuning $\beta$ by small-scale ablation; $\beta = 0.4$ maximizes EM

This two-stage procedure both avoids cold-start training failures and prevents reward hacking due to low-confidence search actions.
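The $\beta$ ablation in the last bullet amounts to training (or evaluating) once per candidate threshold and keeping the EM maximizer. A sketch; apart from the reported 34.4% at $\beta = 0.4$, the EM values below are invented for illustration:

```python
def tune_beta(em_by_beta):
    """Small-scale ablation: pick the confidence gate beta that maximizes
    dev-set exact match. Keys are candidate thresholds; values are the EM
    obtained after training with that gate."""
    return max(em_by_beta, key=em_by_beta.get)

# Illustrative sweep; only the 34.4% figure is from the paper.
em_by_beta = {0.2: 33.0, 0.4: 34.4, 0.6: 33.5, 0.8: 31.9}
best = tune_beta(em_by_beta)  # -> 0.4
```

Too low a gate lets noisy searches through to the reward; too high a gate discards correct trajectories and starves the policy of signal, which is why an intermediate value wins.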

5. Empirical Performance Across QA Benchmarks

Search Wisely (Wu et al., 22 May 2025) evaluates the agentic experience search strategy on seven diverse QA datasets. Key findings:

  • Baseline agentic RAG with RL already outperforms non-agentic baselines (prompting, SFT)
  • $\beta$-GRPO yields an average EM of 34.4% (+4 ppt over non-gated RL)
  • OSR drops from 21.10% to 19.89% while USR drops from 42.04% to 34.71%
  • The efficiency gains arise from better search decision selectivity, not fewer total searches

From these results, Wu et al. (22 May 2025) extract actionable recommendations:

  • Quantitative uncertainty estimation (min-token-prob over search spans) should inform all retrieval decisions
  • Reward structures must balance exploration (allow uncertain searches) and exploitation (discourage unnecessary/low-confidence searches)
  • Bootstrapping with supervised search traces is required before RL
  • Step-wise error curves (OSR/USR) complement end-task metrics, revealing search-path inefficiencies
  • Confidence-threshold tuning allows robust adaptation to new data domains

Together, agentic experience search strategies formalize error types, link decision quality to internal uncertainty, and implement confidence-gated RL to produce agents that “know when they know.” This enables judicious, cost-effective external evidence search and is foundational for scalable, reliable retrieval-augmented LLM systems.
