RP-ReAct: Multi-Agent Enterprise Automation

Updated 3 February 2026
  • RP-ReAct is a multi-agent architecture that decouples high-level planning (RPA) from low-level execution (PEA) to achieve reliable enterprise task automation.
  • It utilizes a context-saving mechanism that limits LLM token usage by offloading extensive tool outputs, thereby preventing context overflow.
  • The design incorporates continuous dynamic replanning and empirical metrics to ensure robust performance and stability in complex, large-scale environments.

RP-ReAct is a multi-agent architecture designed for complex enterprise task automation that disentangles high-level reasoning from low-level tool execution, enabling reliable coordination of multiple tools and effective management of limited LLM context windows. The design fundamentally addresses trajectory instability and context overflow in traditional monolithic plan-execute loops, supporting robust, generalizable autonomous agents suitable for demanding enterprise environments (Molinari et al., 3 Dec 2025).

1. System Structure and Agent Roles

The RP-ReAct framework consists of two primary agentic components:

  1. Reasoner-Planner Agent (RPA): Responsible for all high-level planning, decomposition of the user goal $G$ into sub-steps, and ongoing strategic reasoning driven by a large reasoning model (LRM).
  2. Proxy-Execution Agent (PEA): Executes sub-steps handed off from the RPA by interacting with external tools and APIs through an iterative ReAct loop. PEAs employ a context-saving mechanism to strictly bound LLM context consumption by offloading large tool outputs to external storage, providing only token-limited previews in LLM memory.

Interaction proceeds in a dialogue-like handshake: at each step $t$, the RPA emits a sub-question $s_t$ to the PEA, which returns a (possibly truncated) result $r_t$. This exchange is repeated, allowing for replanning, correction, and context refresh, until the goal is achieved or the step budget is exhausted. The architecture supports both single and multi-agent deployments, though the canonical instantiation focuses on a single RPA supervising one or more PEAs [(Molinari et al., 3 Dec 2025), § Methodology Fig. 1].
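The handshake described above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: `run_handshake`, `plan_next`, `execute`, the toy stand-ins, and the default budget of 5 are all hypothetical names and values chosen for the example, and the planner and executor are plain functions rather than LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# Each history entry pairs a sub-question with the result returned for it.
History = List[Tuple[str, str]]


def run_handshake(
    goal: str,
    plan_next: Callable[[str, History], Optional[str]],  # hypothetical RPA policy
    execute: Callable[[str], str],                        # hypothetical PEA entry point
    max_steps: int = 5,                                   # assumed step budget
) -> History:
    """Alternate RPA sub-questions and PEA results until done or out of budget."""
    history: History = []
    for _ in range(max_steps):
        sub_question = plan_next(goal, history)
        if sub_question is None:  # the planner signals that the goal is achieved
            break
        result = execute(sub_question)
        history.append((sub_question, result))
    return history


# Toy stand-ins so the sketch runs without an LLM or real tools.
def toy_planner(goal: str, history: History) -> Optional[str]:
    steps = ["look up the order id", "fetch the order status"]
    return steps[len(history)] if len(history) < len(steps) else None


def toy_executor(sub_question: str) -> str:
    return f"done: {sub_question}"


trace = run_handshake("report the order status", toy_planner, toy_executor)
for question, answer in trace:
    print(question, "->", answer)
```

Note that the planner only ever sees the list of sub-questions and results, which mirrors the point that the RPA's context stays free of raw tool output.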

2. Formal Process and Control Flow

The formal operational process is as follows:

  • RPA Planning: At step $t$ with history $H_t = \{(s_i, r_i)\}_{i<t}$, the RPA emits $s_t = \pi_R(H_t, G)$. Upon receiving $r_t$, the RPA applies a success test $\delta(r_t) \in \{0,1\}$. If $\delta(r_t) = 0$, the RPA triggers a replanning procedure that selects the next sub-question using an estimate of task-progress utility.
  • PEA ReAct Loop: For each sub-question, the PEA runs a ReAct cycle for up to a fixed number of steps: iteratively "think" (generate internal reasoning), "act" (select a tool or application), "execute" (run the tool and obtain its output), apply context-saving if the output exceeds the token threshold, and update the conversational context. The result $r_t$ is either a truncated tool output or a reference to externally stored data, minimizing context window overflow [(Molinari et al., 3 Dec 2025), Algorithmic Pseudocode §3].

This division means RPA’s context remains concise and focused strictly on sub-questions and results, while execution complexity and noisy tool outputs are abstracted away in the PEA.
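The PEA side of this process can be sketched as a bounded think/act/execute loop. The code below is an illustrative assumption rather than the paper's pseudocode: `pea_react`, the `Decision` tuple, the `think` callback, the tool registry, and the default limits are hypothetical, and context-saving is reduced to character truncation to keep the example short.

```python
from typing import Callable, Dict, List, Tuple

# A decision is ("act", tool_name, tool_input) or ("finish", answer, "").
Decision = Tuple[str, str, str]


def pea_react(
    sub_question: str,
    think: Callable[[str, List[str]], Decision],  # hypothetical LLM call
    tools: Dict[str, Callable[[str], str]],       # hypothetical tool registry
    max_steps: int = 4,                           # assumed PEA step budget
    preview_chars: int = 80,                      # assumed preview size (characters, not tokens)
) -> str:
    """Run a capped ReAct loop and return the result handed back to the RPA."""
    context: List[str] = []
    for _ in range(max_steps):
        kind, name, arg = think(sub_question, context)  # "think" and "act"
        if kind == "finish":
            return name
        try:
            output = tools[name](arg)                   # "execute"
        except Exception as exc:                        # tool errors stay inside the PEA
            output = f"ERROR: {exc}"
        if len(output) > preview_chars:                 # simplified context-saving
            output = output[:preview_chars] + " [truncated]"
        context.append(f"{name}({arg}) -> {output}")    # update the PEA context
    return "ERROR: step budget exhausted"


# Toy stand-ins: one lookup, then finish with the last observation.
def toy_think(sub_question: str, context: List[str]) -> Decision:
    if not context:
        return ("act", "lookup", "alice")
    return ("finish", context[-1], "")


toy_tools = {"lookup": lambda key: "row " * 100}
answer = pea_react("find alice's records", toy_think, toy_tools)
print(answer)
```

Errors raised by tools are caught and turned into observations, so malformed tool responses are handled inside the PEA and never reach the planner directly.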

3. Context-Saving Mechanism

A critical innovation is the context-saving strategy within the PEA. When a tool output exceeds a token threshold, the PEA stores the full output externally (database, file, or object store) and returns only a preview, along with a retrieval variable, to the LLM context. In terms of context cost:

  • Context cost without saving: every tool output enters the LLM context in full, so the cost is the total token count of all tool outputs.
  • With context-saving: each oversized output contributes only its preview and retrieval variable, so the cost per tool call is capped near the preview size.

This mechanism yields a context cost that grows linearly with the number of tool calls (number of tool calls × preview tokens) rather than with the full size of the outputs. Empirical savings are substantial for large table/text outputs, ensuring operational viability for agents employing narrow-window open-weight LLMs [(Molinari et al., 3 Dec 2025), §2.3, §4.2].
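A small sketch can make the cost difference concrete. Everything here is an assumption for illustration: the `ContextSaver` class, its `threshold` and `preview` defaults, the in-memory dictionary standing in for external storage, and whitespace splitting as a crude proxy for LLM tokenization.

```python
from typing import Dict


def count_tokens(text: str) -> int:
    """Crude token proxy: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


class ContextSaver:
    """Offload oversized tool outputs and hand back a preview plus a retrieval variable."""

    def __init__(self, threshold: int = 50, preview: int = 10) -> None:
        self.threshold = threshold       # assumed token threshold
        self.preview = preview           # assumed preview length in tokens
        self.store: Dict[str, str] = {}  # stands in for a database, file, or object store

    def save(self, tool_output: str) -> str:
        tokens = tool_output.split()
        if len(tokens) <= self.threshold:
            return tool_output           # small outputs pass through unchanged
        key = f"var_{len(self.store)}"
        self.store[key] = tool_output    # full output kept outside the LLM context
        preview_text = " ".join(tokens[: self.preview])
        return f"{preview_text} ... [full output in {key}]"

    def load(self, key: str) -> str:
        return self.store[key]


saver = ContextSaver(threshold=50, preview=10)
outputs = [" ".join(f"cell{i}" for i in range(1000)) for _ in range(3)]

cost_without = sum(count_tokens(o) for o in outputs)
cost_with = sum(count_tokens(saver.save(o)) for o in outputs)
print(cost_without, cost_with)
```

With saving enabled, each large output adds only the preview and a short pointer to the context, while the full data stays retrievable through `load`.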

4. Dynamic Replanning and Trajectory Stability

RP-ReAct employs continuous dynamic replanning: after each execution result, the binary success function $\delta(r_t)$ lets the RPA adjust its strategy or re-issue sub-steps as needed. Metrics for empirical stability are defined using the trajectory accuracy of each model and agent:

  • rtr_t1
  • rtr_t2
  • rtr_t3
  • Coverage Product Score: rtr_t4

Empirical studies report lower trajectory standard deviation and higher rtr_t5 versus baselines, particularly on hard ToolQA tasks, supporting robustness across LLM and tool variations [(Molinari et al., 3 Dec 2025), Table 3].
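The replanning behavior in this section can be sketched as follows. This is a hedged illustration only: `success` is a hypothetical stand-in for the binary success function (here a simple error-prefix check), and `solve_with_replanning`, the toy planner, the toy replanner, and the step budget of 6 are assumptions rather than details from the paper.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]


def success(result: str) -> int:
    """Assumed stand-in for the binary success test: 1 if the result is usable, else 0."""
    return 0 if result.startswith("ERROR") or not result.strip() else 1


def solve_with_replanning(
    goal: str,
    plan: Callable[[str, History], Optional[str]],    # hypothetical: next planned sub-question
    replan: Callable[[str, History], Optional[str]],  # hypothetical: revised sub-question after failure
    execute: Callable[[str], str],                    # hypothetical PEA entry point
    max_steps: int = 6,                               # assumed step budget
) -> History:
    history: History = []
    sub_question = plan(goal, history)
    for _ in range(max_steps):
        if sub_question is None:
            break
        result = execute(sub_question)
        history.append((sub_question, result))
        # Continue with the plan on success, otherwise ask for a revised step.
        chooser = plan if success(result) == 1 else replan
        sub_question = chooser(goal, history)
    return history


# Toy stand-ins: the first phrasing fails, the revised one succeeds.
def toy_plan(goal: str, history: History) -> Optional[str]:
    solved = sum(success(r) for _, r in history)
    steps = ["query sales table", "sum the revenue column"]
    return steps[solved] if solved < len(steps) else None


def toy_replan(goal: str, history: History) -> Optional[str]:
    failed_question, _ = history[-1]
    return failed_question + " (retry with explicit schema)"


def toy_execute(sub_question: str) -> str:
    if sub_question == "query sales table":
        return "ERROR: unknown table"
    return "ok: " + sub_question


replan_trace = solve_with_replanning("total revenue", toy_plan, toy_replan, toy_execute)
for question, answer in replan_trace:
    print(success(answer), question)
```

Failed steps remain in the history, so the planner can take them into account when choosing the revised sub-question.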

5. Architectural Design Decisions

The RP-ReAct architecture results from several explicit decisions:

  • Full Decoupling: Separation of high-level planning (reasoning, intent) from low-level execution (tool interaction, API error handling). The RPA is insulated from LLM context pollution due to noisy, verbose, or malformed tool responses (cf. “context-drift”).
  • Budgeted Steps: RPA and PEA each use a capped number of planning/execution steps (e.g., rtr_t6), guaranteeing upper bounds on latency and preventing unbounded loop execution [(Molinari et al., 3 Dec 2025), Experimental Setup].
  • External Storage Abstraction: Supports tool APIs returning arbitrarily large data objects, with only pointer variables ever carried in LLM memory.
  • Multi-Agent Compatibility: The architecture is readily extensible to multiple PEAs acting under a shared RPA, which suggests applicability to scenarios with high tool diversity or parallelizable subtasks.

These design elements collectively yield a system that is modular, robust to model scaling, and empirically generalizable across diverse task domains.
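One way the multi-PEA extension might look is a dispatcher that routes independent sub-questions to different executors in parallel. The source does not describe such a dispatcher, so this is purely a sketch: `dispatch`, the `peas` registry, and the toy executors are hypothetical, and thread-based concurrency is just one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def dispatch(
    sub_questions: List[Tuple[str, str]],   # (pea_name, sub_question) pairs
    peas: Dict[str, Callable[[str], str]],  # hypothetical registry of PEA entry points
) -> List[str]:
    """Run independent sub-questions concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(peas))) as pool:
        futures = [pool.submit(peas[name], question) for name, question in sub_questions]
        return [future.result() for future in futures]


# Toy executors standing in for tool-specific PEAs.
toy_peas = {
    "db": lambda q: "db answered: " + q,
    "api": lambda q: "api answered: " + q,
}

results = dispatch(
    [("db", "count open tickets"), ("api", "fetch exchange rate")],
    toy_peas,
)
print(results)
```

Collecting results in submission order keeps the history the planner sees deterministic even when the executors finish at different times.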

6. Empirical Evaluation and Application Domains

RP-ReAct was evaluated on the multi-domain ToolQA benchmark, using six open-weight reasoning models. Results demonstrate:

  • Superior overall and task-specific performance versus various monolithic and tool-integrated baselines.
  • Improved generalization to unseen domains, attributed to context isolation and replanning.
  • Robustness and stability across LLM scales—lower variance in task trajectories.

The agentic paradigm is directly applicable in enterprise environments characterized by strict privacy requirements (local LLMs), heterogeneous toolchains (DB, spreadsheet, API, Python), and frequent large-output scenarios. The architecture supports deployments requiring strong modularity and compositional reasoning over complex workflows [(Molinari et al., 3 Dec 2025), § Results].

7. Summary Table of RP-ReAct Key Components

| Component | Function | Notable Feature |
|---|---|---|
| Reasoner-Planner | High-level planning, step-wise | Maintains clean subgoal context |
| Proxy-Execution | Sub-step execution, ReAct loop | Context-saving, external storage |
| Dynamic Replanning | Trajectory correction after feedback | Ensures goal-achievement robustness |
| Context Management | Manages tool output in LLM memory | Enables narrow-window deployment |
