RP-ReAct: Multi-Agent Enterprise Automation

Updated 3 February 2026

RP-ReAct is a multi-agent architecture that decouples high-level planning (RPA) from low-level execution (PEA) to achieve reliable enterprise task automation.
It utilizes a context-saving mechanism that limits LLM token usage by offloading extensive tool outputs, thereby preventing context overflow.
The design incorporates continuous dynamic replanning, and its robustness and stability in complex, large-scale environments are assessed with empirical stability metrics.

RP-ReAct is a multi-agent architecture designed for complex enterprise task automation that disentangles high-level reasoning from low-level tool execution, enabling reliable coordination of multiple tools and effective management of limited LLM context windows. The design fundamentally addresses trajectory instability and context overflow in traditional monolithic plan-execute loops, supporting robust, generalizable autonomous agents suitable for demanding enterprise environments (Molinari et al., 3 Dec 2025).

1. System Structure and Agent Roles

The RP-ReAct framework consists of two primary agentic components:

Reasoner-Planner Agent (RPA): Responsible for all high-level planning, decomposition of the user goal $G$ into sub-steps, and ongoing strategic reasoning driven by a large reasoning model (LRM).
Proxy-Execution Agent (PEA): Executes sub-steps handed off from the RPA by interacting with external tools and APIs through an iterative ReAct loop. PEAs employ a context-saving mechanism to strictly bound LLM context consumption by offloading large tool outputs to external storage, providing only token-limited previews in LLM memory.

Interaction proceeds as a dialogue-like handshake: at each step $t$, the RPA emits a sub-question $s_t$ to the PEA, which returns a (possibly truncated) result $r_t$. This exchange repeats, allowing for replanning, correction, and context refresh, until the goal is achieved or the step budget is exhausted. The architecture supports both single- and multi-agent deployments, though the canonical instantiation focuses on a single RPA supervising one or more PEAs [(Molinari et al., 3 Dec 2025), § Methodology Fig. 1].
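The handshake can be sketched as a bounded loop. This is a minimal illustration under stated assumptions: the planner and executor are plain Python callables standing in for the LRM-driven RPA and the ReAct-based PEA, the planner returns `None` to signal that the goal is achieved, and the names `run_handshake`, `HandshakeTrace`, and `max_steps` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: the RPA maps (history, goal) to the next sub-question
# (or None when done); the PEA maps a sub-question to a result string.
Planner = Callable[[List[Tuple[str, str]], str], Optional[str]]
Executor = Callable[[str], str]


@dataclass
class HandshakeTrace:
    history: List[Tuple[str, str]] = field(default_factory=list)  # H_t = [(s_i, r_i)]
    finished: bool = False


def run_handshake(goal: str, plan: Planner, execute: Executor,
                  max_steps: int = 10) -> HandshakeTrace:
    """Alternate RPA sub-questions and PEA results until done or out of budget."""
    trace = HandshakeTrace()
    for _ in range(max_steps):
        sub_question = plan(trace.history, goal)  # s_t = pi_R(H_t, G)
        if sub_question is None:  # planner judges the goal achieved
            trace.finished = True
            break
        result = execute(sub_question)  # r_t, possibly truncated by the PEA
        trace.history.append((sub_question, result))
    return trace


# Usage: a scripted planner that issues two sub-questions, then stops.
def scripted_planner(history, goal):
    steps = ["look up order 42", "sum its line items"]
    return steps[len(history)] if len(history) < len(steps) else None


trace = run_handshake("total for order 42", scripted_planner, lambda q: f"done: {q}")
print(trace.finished, len(trace.history))  # prints: True 2
```

Because the planner sees the whole history on every call, replanning after an unsatisfactory result is simply a different choice of the next sub-question; a planner that never stops is cut off by `max_steps`.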

2. Formal Process and Control Flow

The formal operational process is as follows:

RPA Planning: At step $t$ with history $H_t = \{(s_i, r_i)\}_{i<t}$, the RPA emits $s_t = \pi_R(H_t, G)$. Upon receiving $r_t$, the RPA applies a success test $\delta(r_t) \in \{0,1\}$. If $\delta(r_t) = 0$, the RPA triggers a replanning procedure and computes the next sub-question $s_{t+1}$ by maximizing a utility function that estimates task progress.
PEA ReAct Loop: For each sub-question, the PEA runs a ReAct cycle capped at a fixed number of steps: it iteratively “thinks” (generates internal reasoning), “acts” (selects a tool or application), “executes” (runs the tool and obtains its output), applies context-saving if that output exceeds the token threshold, and updates its conversational context. The result $r_t$ returned to the RPA is either a truncated tool output or a reference to externally stored data, minimizing context window overflow [(Molinari et al., 3 Dec 2025), Algorithmic Pseudocode §3].
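A single PEA invocation can be sketched as follows. None of these details come from the paper: the LLM's think-and-act step is abstracted into a `policy` callable returning either `("call", tool, argument)` or `("finish", answer)`, tokens are approximated by whitespace splitting, the external store is an in-memory dict, and the names `pea_react`, `threshold`, `preview_tokens`, and `result_<i>` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical action format produced by the LLM's think/act step.
Action = Union[Tuple[str, str, str], Tuple[str, str]]


def count_tokens(text: str) -> int:
    # Crude whitespace tokenizer; a real PEA would use the model's tokenizer.
    return len(text.split())


def pea_react(sub_question: str,
              policy: Callable[[List[str]], Action],
              tools: Dict[str, Callable[[str], str]],
              store: Dict[str, str],
              max_steps: int = 5,
              threshold: int = 50,
              preview_tokens: int = 10) -> str:
    """Run a bounded ReAct loop for one sub-question and return the result r_t."""
    context = [f"Question: {sub_question}"]
    for _ in range(max_steps):
        action = policy(context)             # "think" + "act"
        if action[0] == "finish":
            return action[1]
        _, tool_name, argument = action
        output = tools[tool_name](argument)  # "execute"
        if count_tokens(output) > threshold:
            # Context-saving: keep the full output outside the LLM context and
            # show the model only a short preview plus a retrieval variable.
            var = f"result_{len(store)}"
            store[var] = output
            preview = " ".join(output.split()[:preview_tokens])
            observation = f"[stored as {var}] preview: {preview} ..."
        else:
            observation = output
        context.append(f"Action: {tool_name}({argument})")
        context.append(f"Observation: {observation}")
    # Step budget exhausted: return the latest context entry as r_t.
    return context[-1]


def demo_policy(context: List[str]) -> Action:
    # Call the SQL tool once, then finish with the latest observation.
    if len(context) == 1:
        return ("call", "sql", "SELECT * FROM orders")
    return ("finish", context[-1])


store: Dict[str, str] = {}
big_table = " ".join(f"row{i}" for i in range(200))  # ~200 whitespace tokens
answer = pea_react("list all orders", demo_policy, {"sql": lambda q: big_table}, store)
print(answer)
print(list(store))
```

The model only ever sees the preview and the variable name, while the full table stays retrievable from `store` for later tool calls.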

This division means RPA’s context remains concise and focused strictly on sub-questions and results, while execution complexity and noisy tool outputs are abstracted away in the PEA.

3. Context-Saving Mechanism

A critical innovation is the context-saving strategy within the PEA. When a tool output $o$ exceeds a threshold of $\tau$ tokens, the PEA stores the full output externally (database, file, or object store) and returns to the LLM context only a preview $p$ of at most $k$ tokens, along with a retrieval variable $v$. Formally, for $n$ tool calls with outputs $o_1, \dots, o_n$:

Context cost without saving: $C_{\text{full}} = \sum_{i=1}^{n} |o_i|$
With context-saving: $C_{\text{save}} = \sum_{i=1}^{n} c_i$, where $c_i = |o_i|$ if $|o_i| \le \tau$ and $c_i = |p_i| + |v_i|$ otherwise

Because every $c_i$ is bounded by a constant, context cost grows linearly in $n \cdot k$ (number of tool calls × preview tokens) rather than with the total raw output size $\sum_i |o_i|$. The savings $C_{\text{full}} - C_{\text{save}}$ are empirically substantial for large table/text outputs, ensuring operational viability for agents employing narrow-window open-weight LLMs [(Molinari et al., 3 Dec 2025), §2.3, §4.2].
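The arithmetic behind the linear-growth claim can be checked with a small cost model. This sketch assumes that every output at or below the threshold enters the context in full, while every larger output is replaced by a fixed-size preview plus a fixed-size retrieval variable; the function names, token counts, and parameter values are illustrative, not figures from the paper.

```python
from typing import Sequence


def cost_without_saving(sizes: Sequence[int]) -> int:
    # Every tool output enters the LLM context in full.
    return sum(sizes)


def cost_with_saving(sizes: Sequence[int], threshold: int, preview: int,
                     var_tokens: int) -> int:
    # Outputs above the threshold are replaced by a preview plus a retrieval variable.
    return sum(s if s <= threshold else preview + var_tokens for s in sizes)


sizes = [20, 5_000, 12_000, 300, 8_000]  # hypothetical token counts of five tool outputs
full = cost_without_saving(sizes)
saved = cost_with_saving(sizes, threshold=500, preview=100, var_tokens=5)
print(full, saved, f"{1 - saved / full:.1%}")  # prints: 25320 635 97.5%
```

Each call contributes at most `max(threshold, preview + var_tokens)` tokens, so the saved cost is bounded by the number of calls times a constant, whereas the unsaved cost scales with the raw output sizes.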

4. Dynamic Replanning and Trajectory Stability

RP-ReAct employs continuous dynamic replanning: after each execution result, the binary success function $\delta(r_t)$ lets the RPA adjust its strategy or re-issue sub-steps as needed. Metrics for empirical stability are defined over the trajectory accuracy $A_{m,a}$ of model $m$ with agent $a$, aggregated across models; they capture the dispersion of accuracy across models and are complemented by a Coverage Product Score.

Empirical studies report lower trajectory standard deviation and higher aggregate scores than baselines, particularly on hard ToolQA tasks, supporting robustness across LLM and tool variations [(Molinari et al., 3 Dec 2025), Table 3].
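Aggregating trajectory accuracy across models can be sketched with the standard library. This assumes the stability statistics are the mean and the population standard deviation of accuracy across models for each agent; the paper's exact definitions, including the Coverage Product Score, are not reproduced here, the function name `stability_summary` is hypothetical, and the accuracy values are toy numbers rather than reported results.

```python
from statistics import mean, pstdev
from typing import Dict


def stability_summary(acc: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """acc[agent][model] is trajectory accuracy; returns mean and spread per agent."""
    summary = {}
    for agent, per_model in acc.items():
        values = list(per_model.values())
        summary[agent] = {
            "mean": mean(values),   # average accuracy across models
            "std": pstdev(values),  # population std: lower means more stable
        }
    return summary


# Toy values for illustration only; they are not results from the paper.
toy = {
    "agent_a": {"model_1": 0.60, "model_2": 0.62, "model_3": 0.58},
    "agent_b": {"model_1": 0.70, "model_2": 0.40, "model_3": 0.55},
}
for agent, stats in stability_summary(toy).items():
    print(agent, round(stats["mean"], 3), round(stats["std"], 3))
```

Here `agent_a` has the smaller spread, which is the sense in which a lower trajectory standard deviation signals more stable behavior across models.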

5. Architectural Design Decisions

The RP-ReAct architecture results from several explicit decisions:

Full Decoupling: Separation of high-level planning (reasoning, intent) from low-level execution (tool interaction, API error handling). The RPA is insulated from LLM context pollution due to noisy, verbose, or malformed tool responses (cf. “context-drift”).
Budgeted Steps: RPA and PEA each use a capped number of planning/execution steps, guaranteeing upper bounds on latency and preventing unbounded loop execution [(Molinari et al., 3 Dec 2025), Experimental Setup].
External Storage Abstraction: Supports tool APIs returning arbitrarily large data objects, with only pointer variables ever carried in LLM memory.
Multi-Agent Compatibility: The architecture is readily extensible to multiple PEAs acting under a shared RPA, which suggests applicability to scenarios with high tool diversity or parallelizable subtasks.
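Routing independent sub-questions to several PEAs under one RPA can be sketched with `concurrent.futures`. The paper describes multi-PEA support only at the architectural level, so the domain-keyed `peas` mapping, the `dispatch` function, and the assumption that the dispatched sub-questions do not depend on one another are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical: each PEA is a callable specialised for one tool domain.
PEA = Callable[[str], str]


def dispatch(sub_questions: List[Tuple[str, str]], peas: Dict[str, PEA],
             max_workers: int = 4) -> List[Tuple[str, str]]:
    """Send independent (domain, sub_question) pairs to domain PEAs in parallel.

    Results are returned in input order so the RPA can append them to its
    history deterministically, whatever order the PEAs finish in.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(peas[domain], question) for domain, question in sub_questions]
        return [(question, f.result()) for (_, question), f in zip(sub_questions, futures)]


peas = {"sql": lambda q: f"sql:{q}", "python": lambda q: f"py:{q}"}
results = dispatch([("sql", "count rows"), ("python", "plot trend")], peas)
print(results)  # prints: [('count rows', 'sql:count rows'), ('plot trend', 'py:plot trend')]
```

Sub-questions that depend on earlier results would instead go through the sequential RPA/PEA handshake; an unknown domain raises `KeyError` before any work is submitted for it.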

These design elements collectively yield a system that is modular, robust to model scaling, and empirically generalizable across diverse task domains.

6. Empirical Evaluation and Application Domains

RP-ReAct was evaluated on the multi-domain ToolQA benchmark, using six open-weight reasoning models. Results demonstrate:

Superior overall and task-specific performance versus various monolithic and tool-integrated baselines.
Improved generalization to unseen domains, attributed to context isolation and replanning.
Robustness and stability across LLM scales, with lower variance in task trajectories.

The agentic paradigm is directly applicable in enterprise environments characterized by strict privacy requirements (local LLMs), heterogeneous toolchains (DB, spreadsheet, API, Python), and frequent large-output scenarios. The architecture supports deployments requiring strong modularity and compositional reasoning over complex workflows [(Molinari et al., 3 Dec 2025), § Results].

7. Summary Table of RP-ReAct Key Components

Component	Function	Notable Feature
Reasoner-Planner	Step-wise high-level planning	Maintains clean subgoal context
Proxy-Execution	Sub-step execution, ReAct loop	Context-saving, external storage
Dynamic Replanning	Trajectory correction after feedback	Improves robustness of goal achievement
Context Management	Manages tool output in LLM memory	Enables narrow-window deployment

References

"Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks" (Molinari et al., 3 Dec 2025)
