Dual-Process AUQ in Agentic Control

Updated 10 February 2026

Dual-Process AUQ is a framework that actively quantifies uncertainty using fast, local assessments and slower, reflective deliberation.
It integrates dual control systems to dynamically gate execution, mitigating error propagation and improving agent reliability.
Empirical results show improved trajectory calibration, success rates, and AUROC scores, demonstrating AUQ's practical impact in autonomous systems.

Dual-Process Agentic Uncertainty Quantification (AUQ) denotes a set of methodologies and frameworks for principled uncertainty estimation and control in agentic systems, with a special focus on long-horizon, multi-step LLM agents and autonomous control policies. The dual-process paradigm integrates both fast, local uncertainty assessment and slower, deliberative reflection or aggregation, capturing the accumulative and compounding nature of uncertainty in sequential decision-making. AUQ transforms uncertainty quantification (UQ) from a passive, diagnostic sensor into an active, bi-directional controller that can dynamically gate execution, escalation, or self-reflection, thereby directly mitigating unrecoverable epistemic error propagation (the “Spiral of Hallucination”) and enhancing agent reliability (Zhang et al., 22 Jan 2026).

1. Theoretical Foundations of Dual-Process AUQ

Dual-process AUQ frameworks are founded on the decomposition of predictive uncertainty in agentic systems, distinguishing between uncertainty intrinsic to the agent’s current state or model (epistemic), uncertainty irreducible due to environmental or observational noise (aleatoric), and uncertainty propagating through sequential multi-step processes.

Aleatoric vs. Epistemic Uncertainty: Aleatoric uncertainty ( $U_A$ ) quantifies stochasticity or noise inherent to the environment or sensors, while epistemic uncertainty ( $U_E$ ) captures ignorance in the model’s knowledge—uncertainty which could, in principle, be reduced with more data. This decomposition is formalized via predictive variance breakdown:

$\mathrm{Var}[y|x] \approx U_A(x) + U_E(x)$

where $U_A$ is computed as expected within-model variance and $U_E$ as the variance across predictive means of an ensemble of models (Acharya et al., 2022).
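If the ensemble is treated as an equally weighted mixture, the law of total variance makes this decomposition exact. The mixture variance equals the mean of the members' variances plus the population variance of their means. The sketch below computes both terms for a single input from hypothetical per-member outputs. The numbers are illustrative and not taken from Acharya et al., and `decompose_uncertainty` is a name introduced here.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split the predictive variance at one input x into (U_A, U_E).

    means[m]: predictive mean of ensemble member m at x
    variances[m]: within-model predictive variance of member m at x
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one (mean, variance) pair per ensemble member")
    u_a = fmean(variances)  # aleatoric: expected within-model variance
    u_e = pvariance(means)  # epistemic: population variance of the members' means
    return u_a, u_e


# Five hypothetical ensemble members predicting the same scalar target
means = [1.0, 1.2, 0.9, 1.1, 0.8]
variances = [0.05, 0.04, 0.06, 0.05, 0.05]
u_a, u_e = decompose_uncertainty(means, variances)
print(f"U_A = {u_a:.3f}, U_E = {u_e:.3f}, total = {u_a + u_e:.3f}")
```

In this example the members agree on the noise level but disagree on the mean, so part of the total comes from the epistemic term. More training data would be expected to shrink that term but not $U_A$.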

Intrinsic vs. Extrinsic Uncertainty in Multi-Step Reasoning: In LLM-based multi-step agents, the uncertainty of the output $y_t$ at step $t$, $H(y_t \mid x)$, decomposes into

$H(y_t|x) = H(y_t|y_{1:t-1}, x) + \sum_{i=1}^{t-1} I(y_t; y_i \mid y_{i+1:t-1}, x)$

Here, the first term is internal (intrinsic) uncertainty, characterizing current-step unpredictability given the trajectory so far, and the sum quantifies extrinsic uncertainty, i.e., how much uncertainty is inherited from prior steps via mutual information. This formalism underpins contemporary AUQ algorithms such as UProp and SAUP (Duan et al., 20 Jun 2025, Zhao et al., 2024).
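The decomposition is the chain rule of entropy combined with the chain rule for conditional mutual information, and it can be checked numerically on a small case. The sketch below assumes a hypothetical joint distribution over three binary steps with $x$ held fixed. It confirms that the intrinsic term plus the summed extrinsic terms recovers $H(y_3 \mid x)$. This is a brute-force check over an enumerated distribution. It is not the sampling-based estimator that UProp uses.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical joint distribution p(y1, y2, y3 | x) over three binary steps.
# x is held fixed throughout, so it is dropped from the notation below.
weights = [4, 1, 1, 2, 1, 3, 2, 6]
joint = {
    ys: w / sum(weights)
    for ys, w in zip(itertools.product([0, 1], repeat=3), weights)
}


def entropy(idx):
    """Joint entropy (nats) of the step variables at 0-based positions idx."""
    marg = defaultdict(float)
    for ys, p in joint.items():
        marg[tuple(ys[i] for i in idx)] += p
    return -sum(p * math.log(p) for p in marg.values() if p > 0)


def cond_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(target + given) - entropy(given)


def cond_mi(a, b, given):
    """I(a; b | given) = H(a | given) - H(a | b, given)."""
    return cond_entropy(a, given) - cond_entropy(a, b + given)


t = 3                 # decompose the uncertainty of the third step
target = [t - 1]      # 0-based position of y_t
total = entropy(target)
intrinsic = cond_entropy(target, list(range(t - 1)))
# Sum over i = 1..t-1 of I(y_t; y_i | y_{i+1:t-1}, x), using 0-based positions
extrinsic = sum(
    cond_mi(target, [i - 1], list(range(i, t - 1))) for i in range(1, t)
)

print(f"H(y_t|x) = {total:.4f}  intrinsic = {intrinsic:.4f}  extrinsic = {extrinsic:.4f}")
assert math.isclose(total, intrinsic + extrinsic)
```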

2. Canonical Architectures and Control Mechanisms

A defining feature of dual-process AUQ is the operationalization of uncertainty as an active control signal via complementary mechanisms. Several architectures instantiate this principle:

System 1 (Fast, Implicit / Local): Embodied in “Uncertainty-Aware Memory” (UAM), System 1 injects verbalized confidence scores $\hat{c}_t\in [0,1]$ and natural-language explanations $\hat{e}_t$ into the agent’s context history at each step:

$M_t = \{ (o_i, a_i, \hat{c}_i, \hat{e}_i)\}_{i=0}^{t-1}$

The model’s next output is implicitly conditioned on this memory, with self-attention mechanisms damping overconfident continuation and steering exploration away from known knowledge gaps (Zhang et al., 22 Jan 2026).
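The source specifies only that the tuples $(o_i, a_i, \hat{c}_i, \hat{e}_i)$ are written into the agent's context. The sketch below shows one way such a memory could be stored and serialized into a prompt. The class names, the text format, and the example observations are all hypothetical. Any damping of overconfident continuations comes from the LLM attending to this text, and the code only builds the context.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    observation: str   # o_i
    action: str        # a_i
    confidence: float  # verbalized confidence c_i in [0, 1]
    explanation: str   # verbalized explanation e_i


@dataclass
class UncertaintyAwareMemory:
    """Hypothetical container for M_t = {(o_i, a_i, c_i, e_i)}, i = 0..t-1."""

    entries: list = field(default_factory=list)

    def append(self, observation, action, confidence, explanation):
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self.entries.append(MemoryEntry(observation, action, confidence, explanation))

    def render(self):
        """Serialize the memory as plain text for the next prompt (format is assumed)."""
        return "\n".join(
            f"Step {i}: observation={e.observation!r} action={e.action!r} "
            f"confidence={e.confidence:.2f} explanation={e.explanation!r}"
            for i, e in enumerate(self.entries)
        )


memory = UncertaintyAwareMemory()
memory.append("drawer 1 is closed", "open drawer 1", 0.9, "drawer 1 is in reach")
memory.append("drawer 1 is empty", "go to shelf 2", 0.4, "unsure where the mug is kept")
print(memory.render())
```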

System 2 (Slow, Explicit / Global): “Uncertainty-Aware Reflection” (UAR) is triggered when confidence falls below a threshold $\tau$ , typically invoking targeted inference-time reflection or alternative policy sampling. The agent engages in meta-reasoning by conditioning on rational cues distilled from $\hat{e}_t$ to address its own flagged uncertainties, often using best-of-N diversified sampling and consistency-weighted selection:

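$S_{\text{cons}}(a) = \frac{1}{N} \sum_{n=1}^{N} \hat{c}_n \cdot \mathbb{1}(a_n \equiv a)$

where $a_n$ is the $n$-th of $N$ sampled candidate actions, $\hat{c}_n$ is its verbalized confidence, and $\mathbb{1}(a_n \equiv a)$ indicates that $a_n$ is equivalent to $a$. The candidate with the highest score is selected.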

This process balances computational efficiency with deep deliberation, adapting reflectivity to epistemic necessity (Zhang et al., 22 Jan 2026).
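The consistency-weighted score can be computed by accumulating confidence mass per equivalence class of actions. In the sketch below, equivalence is approximated by a case- and whitespace-insensitive string match. This is an assumption, and a real agent would likely need a semantic comparison. The function names and sampled actions are hypothetical.

```python
from collections import defaultdict


def normalize(action):
    """Assumed equivalence test: case- and whitespace-insensitive string match."""
    return " ".join(action.lower().split())


def consistency_select(candidates):
    """Return (best_action, scores) under S_cons(a) = (1/N) * sum_n c_n * 1(a_n == a).

    candidates: list of (action, confidence) pairs, one per sampled branch.
    """
    if not candidates:
        raise ValueError("need at least one sampled candidate")
    n = len(candidates)
    scores = defaultdict(float)
    for action, confidence in candidates:
        scores[normalize(action)] += confidence / n
    best = max(scores, key=scores.get)
    return best, dict(scores)


samples = [
    ("go to shelf 2", 0.6),
    ("Go to shelf 2", 0.5),
    ("open drawer 3", 0.9),
    ("go to  shelf 2", 0.4),
]
best, scores = consistency_select(samples)
print(best, scores)
```

Here the single most confident sample (0.9) loses to three mutually consistent samples whose combined confidence is larger. Weighting by confidence as well as by agreement produces this behavior.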

Integration via Dual-Process Policy: The two systems are integrated into a policy

$\pi_{\text{dual}}(a \mid h_t) = \begin{cases} \pi_{\text{fwd}}(a \mid M_t) & \text{if } \hat{c}_t \geq \tau \\ \pi_{\text{inv}}(a \mid h_t, \hat{e}_t) & \text{if } \hat{c}_t < \tau \end{cases}$

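Here $h_t$ is the full interaction history, $\pi_{\text{fwd}}$ is the fast System 1 policy conditioned on the uncertainty-aware memory $M_t$, and $\pi_{\text{inv}}$ is the System 2 reflective policy conditioned on the history and the explanation $\hat{e}_t$. The design is typically training-free, relying on LLMs’ inherent metacognitive ability and architectural memory (Zhang et al., 22 Jan 2026).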
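The gating rule reduces to a single threshold comparison per step. The sketch below routes a step to one of two policy callables. The stub policies stand in for LLM calls and are hypothetical, as is the threshold value $\tau = 0.6$. The source does not fix a value for $\tau$.

```python
def dual_policy(confidence, tau, memory_text, history_text, explanation,
                fast_policy, reflective_policy):
    """Route one step per pi_dual: System 1 if c_t >= tau, else System 2."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= tau:
        # pi_fwd(a | M_t): continue directly from the uncertainty-aware memory
        return "system1", fast_policy(memory_text)
    # pi_inv(a | h_t, e_t): reflect on the flagged explanation before acting
    return "system2", reflective_policy(history_text, explanation)


# Stub policies standing in for LLM calls (hypothetical)
def fast(memory_text):
    return "open drawer 2"


def reflect(history_text, explanation):
    return f"re-plan after reflecting on: {explanation}"


print(dual_policy(0.85, 0.6, "M_t ...", "h_t ...", "drawer layout known", fast, reflect))
print(dual_policy(0.30, 0.6, "M_t ...", "h_t ...", "mug location unknown", fast, reflect))
```

The boundary case $\hat{c}_t = \tau$ goes to System 1, which matches the $\geq$ in the policy definition.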

System	Role	Control Signal
UAM	Fast, unconstrained step execution	Confidence and explanation memory
UAR	Gated reflection, deep deliberation	Confidence threshold, rational cue prompt

3. Quantification Methodologies

Methods for uncertainty estimation in dual-process AUQ span generative modeling, ensemble statistics, information-theoretic estimators, and behavioral surrogates.

Ensemble Variational Methods: Competency- and uncertainty-aware agents are constructed from ensembles of recurrent conditional VAEs. Epistemic uncertainty is the variance of the predictive means across ensemble members, while aleatoric uncertainty is the expected within-model variance over latent samples (Acharya et al., 2022).
Mutual Information (MI) Propagation: Algorithms such as UProp directly decompose per-step uncertainty into intrinsic and extrinsic terms. Pointwise mutual information (PMI) is estimated efficiently over sampled trajectories using kernel smoothing and branch sampling, yielding estimates of uncertainty propagation that are computationally tractable yet theoretically principled (Duan et al., 20 Jun 2025).
Situational Aggregation and Surrogate Weights: Frameworks such as SAUP introduce situational awareness weights $w_t$ at each step, calculated from context distances or learned via HMM surrogates, allowing weighted global aggregation of per-step uncertainty:

$U_{\text{agent}} = \sqrt{\frac{1}{N}\sum_{t=1}^N (w_t u_t)^2}$

where $u_t$ is the entropy- or likelihood-based local UQ score at step $t$ and $N$ is the number of steps in the trajectory (Zhao et al., 2024).

Self-Verbalized Uncertainty and Reflection: Instruction-tuned LLMs can be prompted to output self-assessed confidence scores and natural-language explanations alongside each predicted action. AUQ treats these as explicit, active control signals in subsequent reasoning (Zhang et al., 22 Jan 2026).
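The SAUP aggregation formula is a weighted root-mean-square of the per-step scores. It therefore emphasizes steps with large weighted uncertainty more than an arithmetic mean would. The sketch below implements the formula directly. `position_weights` is a hypothetical linear-in-position weighting chosen only for illustration. It is not SAUP's distance-based or HMM-based surrogate, and the per-step scores are invented.

```python
import math


def saup_aggregate(step_scores, weights):
    """U_agent = sqrt((1/N) * sum_t (w_t * u_t)^2) over an N-step trajectory."""
    if len(step_scores) != len(weights) or not step_scores:
        raise ValueError("need one weight per step and at least one step")
    n = len(step_scores)
    return math.sqrt(sum((w * u) ** 2 for w, u in zip(weights, step_scores)) / n)


def position_weights(n):
    """Hypothetical stand-in surrogate: weight grows linearly with step position."""
    return [(t + 1) / n for t in range(n)]


u = [0.2, 0.5, 0.9]  # hypothetical per-step entropy-based scores u_t
print(saup_aggregate(u, [1.0] * len(u)))           # unweighted RMS
print(saup_aggregate(u, position_weights(len(u)))) # position-weighted RMS
```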

4. Empirical Performance and Calibration

Empirical validation of dual-process AUQ is performed on agentic reasoning, tool-use, and real-world task benchmarks, targeting trajectory-level calibration, success rates, and robust discriminative power.

Trajectory Calibration Metrics: AUQ achieves strong performance on closed-loop tasks such as ALFWorld and WebShop, reducing trajectory-ECE (expected calibration error) from 0.264 (ReAct baseline) to 0.109 (UAM-only) and 0.174 (full dual AUQ). Similar improvements are observed in trajectory Brier scores (Zhang et al., 22 Jan 2026).
Success Rate Gains: On ALFWorld, AUQ increases success rate from 63.6% (ReAct) and 69.5% (CoT-SC) to 74.3% (Dual). On WebShop, the rate improves from 29.3% to 42.9% (Zhang et al., 22 Jan 2026).
Open-ended Research Tasks: On DeepResearch Bench, dual-process AUQ achieves the highest RACE score (52.1 vs. 50.6 for next best) by leveraging targeted System 2 reflection for flagged epistemic gaps (Zhang et al., 22 Jan 2026).
Discriminative Power: AUQ consistently attains AUROC scores above 0.96 for end-state belief discrimination (Zhang et al., 22 Jan 2026). SAUP obtains up to 20% absolute AUROC improvements in ranking incorrect answers relative to the best prior single-step UQ methods (Zhao et al., 2024). UProp improves AUROC by 2–11 points over baselines on multi-step benchmarks (Duan et al., 20 Jun 2025).
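The calibration metrics above are reported without a computation recipe. A common reading, assumed here, scores one confidence per episode (for example, the agent's final verbalized confidence) against binary task success. Under this reading, ECE uses 10 equal-width bins. The sketch below follows that reading. The function names and episode data are hypothetical.

```python
def trajectory_brier(confidences, successes):
    """Mean squared gap between final confidence and binary task success."""
    if len(confidences) != len(successes) or not confidences:
        raise ValueError("need one confidence per trajectory")
    return sum((c - y) ** 2 for c, y in zip(confidences, successes)) / len(confidences)


def trajectory_ece(confidences, successes, n_bins=10):
    """Binned ECE: sum over bins of |B|/n * |mean confidence - success rate|."""
    if len(confidences) != len(successes) or not confidences:
        raise ValueError("need one confidence per trajectory")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, successes):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            success_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_conf - success_rate)
    return ece


# Hypothetical episodes: final verbalized confidence and success (1) or failure (0)
confs = [0.8, 0.8, 0.8, 0.8, 0.8]
succ = [1, 1, 1, 0, 0]
print(trajectory_ece(confs, succ), trajectory_brier(confs, succ))
```

In this example the agent claims 0.8 confidence but succeeds on only 60% of episodes, so the ECE reflects a 0.2 overconfidence gap.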

5. Application Domains and Integration Patterns

Dual-process AUQ underpins a range of applications in autonomous agents, LLM-based planning, risk-aware access control, and multi-agent systems.

Agentic Planning and Tool Use: AUQ enables selective reflection, deferred execution, and human-in-the-loop escalations by mapping uncertainty signals into actionable control policies (Zhang et al., 22 Jan 2026, Zhao et al., 2024, Duan et al., 20 Jun 2025).
Competency Assessment: Ensemble-based AUQ explicitly communicates both aleatoric and epistemic uncertainties to users for competency and safety assessment, facilitating risk-sensitive planning and the triggering of emergency fallbacks (Acharya et al., 2022).
Access Control Architectures: In security-critical enterprise domains, AUQ supports risk-adaptive TBAC (Task-Based Access Control) using LLMs as risk and UQ judges: composite risk and model uncertainty are computed for each requested action, with two-dimensional thresholds gating auto-approval vs. escalation, thus enforcing more trustworthy, least-privilege policies (Fleming et al., 13 Oct 2025).
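The two-dimensional gate can be expressed as a joint threshold test. An action is auto-approved only when the judged risk and the judge's uncertainty are both low. The sketch assumes both scores are normalized to [0, 1] and that thresholds are inclusive. The names `gate_action` and `Decision` and the threshold values are hypothetical. The LLM judges that produce the scores are out of scope.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto-approve"
    ESCALATE = "escalate"


def gate_action(risk, uncertainty, risk_max, uncertainty_max):
    """Two-dimensional gate: approve only when risk and judge uncertainty are both low."""
    for name, value in (("risk", risk), ("uncertainty", uncertainty)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if risk <= risk_max and uncertainty <= uncertainty_max:
        return Decision.AUTO_APPROVE
    return Decision.ESCALATE


# Hypothetical thresholds and judge outputs for three requested actions
RISK_MAX, UNC_MAX = 0.3, 0.2
print(gate_action(0.10, 0.05, RISK_MAX, UNC_MAX))  # low risk, confident judge
print(gate_action(0.10, 0.60, RISK_MAX, UNC_MAX))  # low risk, but uncertain judge
print(gate_action(0.70, 0.05, RISK_MAX, UNC_MAX))  # confidently high risk
```

The second case shows what the uncertainty dimension adds. A risk-only gate would approve that request, but here a low risk score from an uncertain judge is escalated rather than trusted.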

Application	AUQ Role	Control Outcome
LLM agent (multi-step)	Verbalized UQ, reflection gating	Step routing, reflection, memory
Autonomous robotics	Aleatoric/epistemic decomposition	Conservative planning, flagging
Access control (TBAC)	Risk + model uncertainty thresholds	Policy approval, escalation

6. Computational Considerations and Scalability

Practical deployment of dual-process AUQ requires balancing statistical tightness against computational tractability:

Ensemble Size and Sampling: Ensembles of 5–10 models with 10–20 latent samples per model suffice for real-time planning at rates up to 10 Hz; larger ensembles yield diminishing marginal improvements (Acharya et al., 2022).
Best-of-N, Trajectory Sampling: Reflection and MI estimators employ batch sampling per step or per candidate branch, efficiently parallelizable for moderate N (2–10 branches) (Duan et al., 20 Jun 2025).
Overheads and Surrogates: HMM-based surrogates in SAUP—responsible for weighting steps in uncertainty aggregation—require negligible resources compared to the LLM inference call itself. When resources are constrained, plain distance or position surrogates can be used without significant performance degradation (Zhao et al., 2024).
Training-Free Protocols: Most dual-process AUQ instantiations operate entirely at inference-time, without updating model weights or requiring retraining, relying on protocol-augmented prompting and dynamic memory (Zhang et al., 22 Jan 2026).

7. Interpretability, Human Oversight, and Trust

A core value proposition of dual-process AUQ is facilitating interpretability and human trust:

By separating and reporting distinct uncertainty components, AUQ frameworks provide operators with actionable representations (e.g., “environment too noisy” vs. “model is out-of-depth”) (Acharya et al., 2022).
In access control, a two-dimensional (risk, uncertainty) point enables more robust, context-sensitive policy enforcement, while audit trails and calibration routines are incorporated for transparency and post-hoc analysis (Fleming et al., 13 Oct 2025).
Selective reflection and escalation protocols serve as safeguards, ensuring that ambiguous or high-stakes situations trigger deeper deliberation or defer to human judgment.

Taken together, these results suggest that AUQ marks a paradigm shift from passive uncertainty diagnostics to integrated, agent- and task-contingent uncertainty management, with measurable advances in trajectory-level calibration, task performance, and safe deployment across diverse agentic domains (Zhang et al., 22 Jan 2026, Duan et al., 20 Jun 2025, Zhao et al., 2024, Fleming et al., 13 Oct 2025, Acharya et al., 2022).