Iterated In-Context Learning

Updated 7 February 2026
  • Iterated In-Context Learning is a method that repeatedly leverages LLM prompts to iteratively refine outputs, improving generalization, diversity, and policy performance.
  • Its algorithmic variants, including ICICL, enhanced attention with meta-gradient accumulation, and MCMC sampling, systematically enrich context and induce diverse responses.
  • Empirical studies demonstrate significant gains in API testing, dialogue accuracy, and policy improvement, while highlighting trade-offs such as iteration count and computational overhead.

Iterated In-Context Learning (IICL) and its algorithmic variants formalize the process of repeatedly leveraging in-context learning, often within LLMs, to achieve improved generalization, diversity, posterior approximation, or policy optimization across a range of domains. These methods extend the standard in-context learning paradigm—wherein a model is prompted once with input–output demonstrations—by introducing iterative mechanisms, such as multi-pass attention, Markov chains, or repeated prompting, to accumulate richer information or sample complex posterior or prior distributions.

1. Foundational Paradigms of Iterated In-Context Learning

Standard in-context learning (ICL) provides a frozen LLM with a prompt comprising a small set of input–output pairs (“shots”) and a query, relying on the model to generalize the observed mapping to new inputs via pattern continuation. Formally, with $k$ demonstrations $(x_i, y_i)$, $i = 1 \dots k$, and a new input $x^*$, the prompt takes the form

$$x_1 \to y_1;\ \dots;\ x_k \to y_k;\ x^* \to$$

The LLM then produces $\hat{y}^*$. However, in standard ICL, the model typically processes all demonstrations and the query in a single forward pass, with no opportunity to accumulate knowledge across multiple iterations or to refine its internal state through reprocessing.
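
The prompt schematic above can be sketched as a simple formatting helper; the arrow template is the one used in the schematic, while a real system would use whatever template the target LLM expects:

```python
# Minimal sketch of standard (single-pass) ICL prompt construction.
def build_icl_prompt(demos, query):
    """Format k demonstration pairs plus a query as one prompt string."""
    shots = "; ".join(f"{x} -> {y}" for x, y in demos)
    return f"{shots}; {query} ->"

prompt = build_icl_prompt([("cat", "chat"), ("dog", "chien")], "bird")
print(prompt)  # cat -> chat; dog -> chien; bird ->
```

The frozen model then continues the pattern to produce $\hat{y}^*$ for the final query.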

Iterated In-Context Learning (IICL) generalizes this by chaining multiple model calls or recursive internal operations—e.g., repeated attention or sampling—so each step can incorporate or propagate new information, extract implicit knowledge, or gradually converge to a target solution (Yang et al., 2023, Zhu et al., 2024, Jain et al., 9 Apr 2025, Merwe et al., 20 Aug 2025).

2. Algorithmic Realizations and Mechanisms

Multiple forms of iteration have been proposed:

2.1 Iterated-Calls In-Context Learning (ICICL)

ICICL extends ICL to generate multiple, diverse examples for tasks such as API specification by sequentially prompting the model with a variety of contexts. The key steps are:

  • Retrieval: Compute similarity scores $s(p, p^*)$ between target and banked parameters.
  • Greedy Phase: Use the $k$ most similar contexts to prompt the model (temperature $\tau = 0$), yielding a high-probability example $e_0$.
  • Diversity Phase: For $K$ iterations, sample $k-1$ contexts (weighted by similarity), prepend $e_0$, and prompt the model at higher temperature ($\tau = 0.5$), generating additional examples $e_i$.
  • Postprocessing: Filter by type, deduplicate, prioritize coverage, and select up to $m$ final examples using BERT embedding similarity (Jain et al., 9 Apr 2025).
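
The two-phase loop can be sketched as follows. Here `call_llm` and `similarity` are stand-ins of my own naming, not the paper's API: in practice they would wrap an actual model endpoint and the embedding-based score $s(p, p^*)$, and the final deduplication stands in for the full postprocessing pipeline.

```python
import random

def icicl(target, bank, call_llm, similarity, k=4, K=3):
    """Return one greedy example plus up to K higher-temperature diverse examples."""
    scored = sorted(bank, key=lambda p: similarity(p, target), reverse=True)
    # Greedy phase: k most similar contexts, temperature 0.
    e0 = call_llm(context=scored[:k], target=target, temperature=0.0)
    examples = [e0]
    weights = [similarity(p, target) for p in scored]
    for _ in range(K):
        # Diversity phase: sample k-1 contexts weighted by similarity,
        # prepend e0, and sample at a higher temperature.
        ctx = random.choices(scored, weights=weights, k=k - 1)
        examples.append(call_llm(context=[e0] + ctx, target=target, temperature=0.5))
    # Deduplicate while preserving order (placeholder for full postprocessing).
    return list(dict.fromkeys(examples))
```

The greedy example anchors the diversity phase, so the later samples stay on-distribution while the raised temperature spreads them out.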

2.2 Iterative Enhanced Attention and Meta-Gradient Accumulation

This mechanism, introduced as “Deep-Thinking” in ICL, decouples demonstration accumulation from test-time inference. Demonstration tokens are forwarded through an $L$-layer Transformer for $T$ steps, during which meta-gradients aggregate in Key/Value matrices of self-attention layers:

  • For each iteration $t$ and layer $l$, the historical $K^{(t-1),l}, V^{(t-1),l}$ accumulate via momentum-based updates and learning-rate scaling using meta-gradients from previous passes.
  • At test time, the “learned” $K^T, V^T$ are prepended to the query example, augmenting per-token test keys/values, thus embedding demonstration-derived context directly in the inference flow.
  • This forward-only adaptation refines the analogical mapping between demonstrations and queries without weight updates (Yang et al., 2023).
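
A loose numerical sketch of the accumulation step, assuming we treat the change in K/V produced by re-forwarding the demonstrations as the “meta-gradient”; the momentum and learning-rate constants (`beta`, `eta`) are illustrative hyperparameters, not the paper's exact update rule:

```python
import numpy as np

def deep_thinking_kv(forward_kv, demos, T=5, eta=0.1, beta=0.9):
    """Accumulate demonstration K/V over T forward passes (no weight updates)."""
    K, V = forward_kv(demos)                    # initial forward pass
    mK, mV = np.zeros_like(K), np.zeros_like(V)
    for _ in range(T - 1):
        K_new, V_new = forward_kv(demos, K, V)  # re-forward with historical K/V
        mK = beta * mK + (K_new - K)            # momentum over meta-gradients
        mV = beta * mV + (V_new - V)
        K, V = K + eta * mK, V + eta * mV       # learning-rate-scaled update
    return K, V                                 # "learned" K^T, V^T for test time
```

The returned matrices would then be prepended to the query's keys/values at inference, as described above.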

2.3 Markov Chain Monte Carlo and Bayesian Sampling via IICL

IICL can be formalized as an MCMC procedure operating over hypothesis and data spaces with LLMs approximating posterior and likelihood updates:

  • Each iteration alternates:
    • Posterior sampling: $h_t \sim p(h \mid d_{t-1})$, using the LLM as an implicit posterior sampler.
    • Data generation: $d_t \sim p(d \mid h_t)$, where the experimenter provides the likelihood.
  • The chain defined by this two-step update converges (under ergodicity) to the model’s implicit prior $p(h)$, enabling direct prior elicitation from a frozen LLM (Zhu et al., 2024).
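
The alternation above reduces to a short loop. Both samplers here are stand-ins: `posterior_sample` would prompt the LLM with the current data and parse a hypothesis, while `likelihood_sample` encodes the experimenter-supplied likelihood.

```python
def iicl_chain(d0, posterior_sample, likelihood_sample, steps=1000, burn_in=100):
    """Run the h/d alternation; the post-burn-in h-samples estimate the implicit prior p(h)."""
    d, hs = d0, []
    for t in range(steps):
        h = posterior_sample(d)   # h_t ~ p(h | d_{t-1}), via the LLM
        d = likelihood_sample(h)  # d_t ~ p(d | h_t), experimenter-defined
        if t >= burn_in:
            hs.append(h)
    return hs
```

A histogram of the returned hypotheses approximates the stationary marginal $p(h)$, i.e., the model's implicit prior.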

2.4 Iterative Policy Improvement via In-Context Learning

In dynamic manipulation and reinforcement learning settings, iterated in-context learning is leveraged to predict incremental improvements to parametric policies:

  • A dataset $\mathcal{D} = \{(\theta^j, e^j, \Delta\theta^j)\}$ is constructed from recorded policy parameters, rollout errors, and optimal parameter deltas.
  • At each test-time iteration, the current state is encoded, the nearest demonstration examples are retrieved, and the LLM predicts the next policy increment $\Delta\theta^i$ in-context (no parameter updates), enabling online iterative improvement (Merwe et al., 20 Aug 2025).
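
The test-time loop can be sketched as below. `rollout` and `llm_predict_delta` are placeholder names: the first would execute the parametric policy and return an error signal, and the second stands in for the LLM prompted in-context with the retrieved $(\theta, e, \Delta\theta)$ demonstrations.

```python
import numpy as np

def improve_policy(theta0, dataset, rollout, llm_predict_delta, iters=10):
    """Iteratively apply predicted increments to policy parameters (no weight updates)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        e = rollout(theta)
        # Retrieve the nearest demonstrations by rollout error, then ask the
        # (stubbed) LLM for the next increment in-context.
        demos = sorted(dataset, key=lambda d: abs(d[1] - e))[:5]
        theta = theta + llm_predict_delta(theta, e, demos)
    return theta
```

Because the model only predicts increments from retrieved examples, improvement happens entirely online, with the LLM frozen throughout.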

3. Evaluation Metrics, Empirical Outcomes, and Ablations

Intrinsic evaluation metrics for iterated in-context learning include type-correctness (appropriately typed outputs), uniqueness (distinct example rate), diversity (e.g., mean pairwise BERT cosine distance), and manual correctness (fraction of human-validated outputs). Extrinsic metrics measure impact on downstream tasks, such as:

  • API fuzzing: coverage and response rates, e.g., branch coverage improvement (+116%), changes in 2xx/4xx response proportions.
  • Dialogue systems: improvements in slot/intent filling accuracy, e.g., normalized slot filling from 0.76 to 0.81.
  • Human understanding: time-to-completion and task accuracy in cURL call generation (50% time reduction) (Jain et al., 9 Apr 2025).
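
The diversity metric named above (mean pairwise cosine distance over example embeddings) is straightforward to compute; the text uses BERT embeddings, but the calculation is the same for any fixed-size embedding:

```python
import numpy as np

def mean_pairwise_cosine_distance(emb):
    """emb: (n, d) array of embeddings; returns mean (1 - cosine similarity) over distinct pairs."""
    X = np.asarray(emb, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sims = X @ X.T                                    # pairwise cosine similarities
    iu = np.triu_indices(len(X), k=1)                 # distinct unordered pairs
    return float(np.mean(1.0 - sims[iu]))
```

Identical examples score 0; orthogonal embeddings score 1, so higher values indicate a more diverse example set.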

Ablation studies reveal that iteration count, context diversity, and meta-gradient momentum each contribute to accuracy and robustness, with over-iteration risking overfitting.

In policy improvement (Merwe et al., 20 Aug 2025), iterative ICL consistently outperforms random shooting, Bayesian optimization, and KNN baselines, with convergence typically in 5–10 in-context iterations across simulated and real-world manipulation tasks.

4. Theoretical Underpinnings and Formal Properties

The Markov chain operator in IICL takes the form:

$$T(h' \mid h) = \int_{d \in D} p(h' \mid d)\, p(d \mid h)\, \mathrm{d}d$$

with stationary marginal $p(h)$. The duality between meta-gradient accumulation in iterative enhanced attention and gradient descent is empirically observed, with norm decay patterns matching classical optimization. This supports the view of IICL as a form of forward-only implicit fine-tuning or meta-learning within the constraints of a frozen model (Yang et al., 2023, Zhu et al., 2024).
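
The stationarity of the prior under this operator can be verified on a toy discrete example, assuming exact Bayesian posterior and likelihood tables over finite hypothesis and data spaces (the probabilities below are arbitrary illustrative values):

```python
import numpy as np

prior = np.array([0.3, 0.7])                   # p(h) over two hypotheses
lik = np.array([[0.9, 0.1],                    # p(d|h): row = h, column = d
                [0.2, 0.8]])
evidence = prior @ lik                         # p(d) = sum_h p(d|h) p(h)
posterior = lik * prior[:, None] / evidence    # p(h|d) by Bayes' rule, shape (h, d)
T = lik @ posterior.T                          # T[h, h'] = sum_d p(d|h) p(h'|d)
assert np.allclose(prior @ T, prior)           # the prior is stationary under T
```

The final assertion holds for any consistent prior/likelihood pair, which is exactly the fixed-point property that makes iterated sampling a valid prior-elicitation procedure.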

5. Applications and Impact

Iterated in-context learning algorithms have been validated in:

  • Automatic example generation for web APIs, leading to increases in testing branch coverage and downstream model generalization (Jain et al., 9 Apr 2025).
  • Elicitation of implicit Bayesian priors from LLMs, matching human empirical priors in domains such as causal reasoning and proportion estimation (Zhu et al., 2024).
  • In-context iterative policy improvement, enabling state-of-the-art learning in data-scarce manipulation settings without fine-tuning, for both simulated and physical robots (Merwe et al., 20 Aug 2025).
  • Enhanced few-shot classification and reasoning performance in both small and large transformer models, with gains up to 47% in small model settings across diverse NLP tasks (Yang et al., 2023).

6. Limitations and Prospects for Extension

Observed limitations include dependence on the representativeness of demonstration or parameter banks, computational overhead from multiple model calls or forward passes, risk of overfitting at high iteration counts, and, in IICL for prior elicitation, sensitivity to the assumed likelihood function and LLM world model biases (Jain et al., 9 Apr 2025, Zhu et al., 2024, Yang et al., 2023).

Potential research directions involve adaptive context or iteration scheduling, integration with coverage-driven selection, active human-in-the-loop vetting, expanded support for structured outputs (e.g., trees, graphs), and further theoretical grounding of implicit posterior sampling properties in LLMs (Jain et al., 9 Apr 2025, Zhu et al., 2024).

7. Synthesis and Outlook

Iterated in-context learning unifies a set of techniques that exploit multiple passes—across model calls, prompt contexts, or internal computation—to amplify the inductive and inferential capabilities of LLMs without gradient-based adaptation. By leveraging retrieval, meta-gradient accumulation, and algorithmic formulations rooted in MCMC, IICL delivers systematic improvements in sample diversity, prior elicitation, policy adaptation, and analogical reasoning, broadening the scope of rapid adaptation for sequence models in computational linguistics, program synthesis, scientific reasoning, and robotics (Yang et al., 2023, Zhu et al., 2024, Jain et al., 9 Apr 2025, Merwe et al., 20 Aug 2025).