Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration

Published 9 Apr 2026 in cs.MA, cs.AI, and cs.LG | (2604.07911v1)

Abstract: Multi-agent LLM orchestration systems suffer from context pollution: when N concurrent agents compete for the orchestrator's context window, each agent's task state, partial outputs, and pending questions contaminate the steering interactions of every other agent, degrading decision quality. We introduce Dynamic Attentional Context Scoping (DACS), a mechanism in which the orchestrator operates in two asymmetric modes. In Registry mode it holds only lightweight per-agent status summaries (<=200 tokens each), remaining responsive to all agents and the user. When an agent emits a SteeringRequest, the orchestrator enters Focus(a_i) mode, injecting the full context of agent a_i while compressing all other agents to their registry entries. Context isolation is agent-triggered, asymmetric, and deterministic: the context window contains exactly F(a_i) + R_{-i} during steering, eliminating cross-agent contamination without requiring context compression or retrieval. We evaluate DACS across four experimental phases totalling 200 trials: Phase 1 tests N in {3,5,10} (60 trials); Phase 2 tests agent heterogeneity and adversarial dependencies (60 trials); Phase 3 tests decision density up to D=15 (40 trials); Phase 4 uses autonomous LLM agents for free-form questions (40 trials, Claude Haiku 4.5). Across all 8 synthetic scenarios, DACS achieves 90.0–98.4% steering accuracy versus 21.0–60.0% for a flat-context baseline (p < 0.0001 throughout), with wrong-agent contamination falling from 28–57% to 0–14% and context efficiency ratios of up to 3.53x. The accuracy advantage grows with N and D; keyword matching is validated by LLM-as-judge across all phases (mean kappa=0.909). DACS outperforms the flat-context baseline by +17.2pp at N=3 (p=0.0023) and +20.4pp at N=5 (p=0.0008) in Phase 4, with the advantage growing with N confirmed by two independent judges.

Authors (1)

Summary

  • The paper demonstrates that DACS, a deterministic and agent-triggered protocol, reaches 90.0–98.4% steering accuracy (versus 21.0–60.0% for a flat-context baseline) while cutting wrong-agent contamination from 28–57% to 0–14%.
  • It uses a dual-mode architecture (Registry and Focus) that keeps a lightweight registry entry for every agent and loads one agent's complete context only while that agent is being steered.
  • Empirical analyses across diverse and high-density scenarios validate DACS's robust performance improvements and sub-linear context scaling for multi-agent LLM environments.

Dynamic Attentional Context Scoping for Multi-Agent LLM Orchestration: Mechanism and Empirical Analysis

Motivation and Problem Statement

The orchestration of multiple concurrent LLM agents, in which an orchestrator LLM coordinates, steers, and integrates the activity of numerous specialized agents, poses acute scaling challenges due to "context pollution." As the number of agents ($N$) increases, each agent's intermediate state, task-specific outputs, and pending queries compete for space in the orchestrator's context window. In flat-context architectures, this mutual contamination erodes per-agent steering accuracy, decreases context efficiency, and induces cross-agent leakage (i.e., the orchestrator's responses to one agent are semantically or lexically polluted by the states of others). Prior context management research addresses single-agent memory overflow, tool/retrieval bloat, or system-level memory compression but does not implement per-agent dynamic isolation for concurrent, steerable agents.

DACS: Mechanism Design and Theoretical Properties

Dynamic Attentional Context Scoping (DACS) is introduced as a deterministic, agent-triggered protocol for context isolation in multi-agent orchestration. The orchestrator LLM switches between two primary asymmetric operational modes:

  • Registry Mode: The orchestrator holds a lightweight per-agent registry ($\leq 200$ tokens/entry), updated in real time with status, minimal task summaries, and progress. This mode supports system responsiveness and efficient agent monitoring.
  • Focus($a_i$) Mode: Upon receipt of a SteeringRequest from agent $a_i$, the orchestrator loads $a_i$'s complete context (task details, steering history, current output) and compresses all other agent contexts to their respective registry entries. The context window thus contains $F(a_i) + R_{-i}$, ensuring all other agent state is minimized.
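The two modes can be sketched as a small context builder. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `AgentState`, `Orchestrator`, `steering_request`, `release_focus`, and `build_context` are hypothetical, and token counts are approximated by whitespace-separated words rather than a real tokenizer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

REGISTRY_TOKEN_LIMIT = 200  # per-entry cap stated in the text


def approx_tokens(text: str) -> int:
    """Crude token estimate (assumption): whitespace-separated words."""
    return len(text.split())


@dataclass
class AgentState:
    agent_id: str
    registry_entry: str  # short status summary, one element of R
    full_context: str    # task details, steering history, current output: F(a_i)

    def __post_init__(self) -> None:
        if approx_tokens(self.registry_entry) > REGISTRY_TOKEN_LIMIT:
            raise ValueError(f"registry entry for {self.agent_id} exceeds limit")


@dataclass
class Orchestrator:
    agents: Dict[str, AgentState] = field(default_factory=dict)
    focus: Optional[str] = None  # None means Registry mode

    def register(self, agent: AgentState) -> None:
        self.agents[agent.agent_id] = agent

    def steering_request(self, agent_id: str) -> None:
        """Agent-triggered switch into Focus(a_i) mode."""
        if agent_id not in self.agents:
            raise KeyError(agent_id)
        self.focus = agent_id

    def release_focus(self) -> None:
        """Return to Registry mode once steering is done."""
        self.focus = None

    def build_context(self) -> List[str]:
        """Registry mode: every registry entry.
        Focus mode: F(a_i) first, then the registry entries R_{-i}."""
        parts: List[str] = []
        if self.focus is not None:
            parts.append(self.agents[self.focus].full_context)
        for agent_id, agent in self.agents.items():
            if agent_id != self.focus:
                parts.append(agent.registry_entry)
        return parts
```

Because `build_context` is a pure function of the registry and the current focus, the window's contents at steering time are fully determined by which agent raised the request, which is the sense in which the isolation is deterministic.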

Context isolation is agent-triggered, respecting agent urgency (with preemption for high-priority requests), and ensures focus context scales sub-linearly with $N$ and decision density $D$. The focus-mode context size grows as $|F| + (N-1)|r|$, where $|F|$ is the size of the focused agent's full context and $|r|$ is the size of one registry entry, and the context efficiency ratio (flat baseline/DACS) increases monotonically with $N$ and $D$.
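To make the size formula concrete, the sketch below computes the focus-mode size and the efficiency ratio for a few agent counts. It rests on two assumptions that the text does not confirm: the flat baseline is modeled as holding every agent's full context ($N|F|$), and all agents have the same $|F|$ and $|r|$. The function names and the token figures used in the usage note are hypothetical.

```python
def focus_context_size(full_tokens: int, registry_tokens: int, n_agents: int) -> int:
    """|F| + (N - 1)|r|: one full context plus registry entries for the rest."""
    return full_tokens + (n_agents - 1) * registry_tokens


def flat_context_size(full_tokens: int, n_agents: int) -> int:
    """Assumed flat baseline: every agent's full context is present, N|F|."""
    return n_agents * full_tokens


def efficiency_ratio(full_tokens: int, registry_tokens: int, n_agents: int) -> float:
    """Flat-baseline size divided by DACS focus-mode size."""
    return flat_context_size(full_tokens, n_agents) / focus_context_size(
        full_tokens, registry_tokens, n_agents
    )
```

For example, with hypothetical 1500-token full contexts and 200-token registry entries, `focus_context_size(1500, 200, 3)` is 1900 tokens against 4500 for the assumed flat baseline. Under these assumptions the ratio rises with $N$ whenever $|F| > |r|$ and stays below $|F|/|r|$.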

Experimental Design

Empirical analysis spans four experimental phases using an open-source orchestration harness, with all agent-orchestrator interactions fully observable at the token level:

  1. Phase 1 (Agent Count Scaling): $N \in \{3, 5, 10\}$ agents; canonical mix of code, research, and data agents. Measures how context pollution and accuracy scale with $N$.
  2. Phase 2 (Agent Diversity): Homogeneous (shared vocabulary), maximally diverse (disjoint domains), and adversarial cascade (pipeline dependency) scenarios test the protocol under various agent interaction structures.
  3. Phase 3 (Decision Density): Hold $N$ fixed and scale up decisions-per-agent $D$ (up to $D=15$) to probe context and accuracy degradation under long interaction sequences.
  4. Phase 4 (Ecological Validity): All scripted agent stubs are replaced with autonomous LLM agents (Claude Haiku 4.5), emitting free-form steering requests, evaluated with LLM-as-judge protocols (Claude Haiku and GPT-4o-mini).

Three metrics are core: steering accuracy (keyword/LLM-judged correctness per agent decision-point), wrong-agent contamination (keyword leakage from non-target agents), and actual orchestrator context size at each steering point.
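The two keyword-based metrics can be sketched as follows. This is an illustrative reading of the description above, not the paper's scoring code: the function names, the case-insensitive substring matching rule, and the per-response binary contamination flag are assumptions.

```python
from typing import Dict, Iterable, List


def contains_any(text: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive substring check for any keyword."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def steering_accuracy(responses: List[str], expected: List[List[str]]) -> float:
    """Fraction of decision points whose response contains an expected keyword."""
    if not responses:
        return 0.0
    hits = sum(contains_any(r, kws) for r, kws in zip(responses, expected))
    return hits / len(responses)


def contamination_rate(
    responses: List[str], targets: List[str], agent_keywords: Dict[str, List[str]]
) -> float:
    """Fraction of responses that mention keywords owned by a non-target agent."""
    if not responses:
        return 0.0
    leaked = 0
    for response, target in zip(responses, targets):
        others = [kws for agent, kws in agent_keywords.items() if agent != target]
        if any(contains_any(response, kws) for kws in others):
            leaked += 1
    return leaked / len(responses)
```

A response can be counted as both correct and contaminated under this scheme, since the two checks are independent: it may contain the expected keyword for its target agent and also leak a keyword from another agent.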

Principal Findings

Context Pollution and Accuracy Dynamics

In all synthetic scenarios, DACS yields a steering accuracy of 90.0–98.4% versus 21.0–60.0% for a flat-context baseline (p < 0.0001). The advantage grows with agent count: the accuracy delta increases from +36.7 pp ($N=3$) to +69.0 pp ($N=10$). DACS also suppresses contamination rates, achieving 0–14% (vs. 28–57% for the baseline). Critically, the context efficiency ratio grows with scale, reaching up to 3.53× (Figure 1).

Figure 1: DACS vs. flat-context baseline across $N \in \{3, 5, 10\}$, highlighting sharp improvements in steering accuracy, substantial reductions in cross-agent contamination, and aggressive context compression under DACS.

Agent Diversity and Pipeline Dependency

DACS maintains its advantage across both homogeneous and heterogeneous agent sets. For maximally diverse agents, steering accuracy remains at 96.0% with zero contamination, while the baseline collapses to 37.0% accuracy with >50% cross-agent leakage. In the pipeline (cascade) scenario, where theoretical advantages for the flat baseline may exist, DACS still outperforms by +37.3 pp (Figure 2).

Figure 2: Phase 2 (agent diversity): DACS maintains accuracy and eliminates contamination across homogeneous, crossfire, and cascade settings, with context stability tied to enforced isolation.

Decision Density Amplification

As decision density $D$ increases (up to $D=15$ in Phase 3), the baseline accuracy degrades sharply, falling from 60.0% to 44.2%, whereas DACS shows stable or even improved performance (up to 98.4%). This reflects the compounding effect of interaction history length on context pollution and highlights DACS's sub-linear context growth and resilience (Figure 3).

Figure 3: Decision density scaling (Phase 3) demonstrates the compounding degradation of the baseline as steering history lengthens, while DACS preserves both accuracy and context compactness.

Autonomous LLM Agents Validation

In fully autonomous agent settings (Phase 4), DACS yields a +17.2 pp accuracy delta vs. the baseline at $N=3$ and +20.4 pp at $N=5$, corroborated by both Claude Haiku and GPT-4o-mini judges. The advantage persists, though absolute accuracy decreases under the more demanding, free-form query regime (reflecting increased ambiguity and evaluation stringency). The context efficiency advantage carries over as well (Figure 4).

Figure 4: Real-agent validation: DACS sustains its accuracy/context advantage against the baseline in a live LLM-agent regime (Phase 4), consistent with findings in synthetic, scripted settings.
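Agreement between two sets of correctness labels (two judges, or keyword matching versus an LLM judge) is commonly summarized with Cohen's kappa. The sketch below assumes that is the statistic behind the reported kappa, which the text does not state explicitly. The function name `cohens_kappa` and the label encoding are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items on which the raters match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, so it is a stricter check than the plain fraction of matching labels when one label dominates.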

Theoretical and Practical Implications

The empirical evidence supports several theoretical claims:

  • Context isolation is the principal causal variable in multi-agent steering accuracy. Unlike prior work focused on hierarchical routing or message fidelity tiering, DACS’s deterministic isolation directly targets cross-agent contamination at its moment of impact.
  • Sub-linear context scaling is not a byproduct of architectural decomposition but an enforceable protocol-level guarantee. This ensures that as systems scale to large $N$ and high $D$, context windows remain within model limitations without sacrificing agent steering fidelity.
  • Design orthogonality: DACS is fully compatible with hierarchical agent architectures and can be composed with approaches like AgentOrchestra’s planning delegation (Zhang et al., 14 Jun 2025) or context tiering.

For practical deployment, these findings indicate that orchestrator-centric LLM systems operating in tool-rich, high-parallelism environments should privilege deterministic, agent-triggered context isolation over opportunistic or reactive compression/eviction heuristics. Further, measuring and controlling actual context content at steering time supersedes aggregate memory or cache metrics for correctness and alignment.

Future Directions

Several axes require further exploration: extension to larger agent counts $N$ and higher interaction densities, generalization across model families (including low-context or instruction-tuned architectures), refinement of contamination metrics (from binary cross-agent keyword detection to clause-specific leakage analysis), and integration with production user-facing responsiveness evaluations.

Conclusion

Dynamic Attentional Context Scoping deterministically eliminates context pollution by agent-triggered, asymmetric context window isolation at multi-agent orchestration steering points. Across a battery of synthetic and real-agent trials, the protocol produces substantial, robust increases in steering accuracy, strong suppression of contamination, and highly efficient context growth. Its formalization and open-source availability offer a direct path to scalable, interpretable, and accurate multi-agent LLM systems, with broad applicability to tool orchestration, workflow management, and agent collaboration in large-scale environments.

