RCBSF: A Multi-Agent Framework for Automated Contract Revision via Stackelberg Game

Published 12 Apr 2026 in cs.CL | (2604.10740v1)

Abstract: Despite the widespread adoption of LLMs in Legal AI, their utility for automated contract revision remains impeded by hallucinated safety and a lack of rigorous behavioral constraints. To address these limitations, we propose the Risk-Constrained Bilevel Stackelberg Framework (RCBSF), which formulates revision as a non-cooperative Stackelberg game. RCBSF establishes a hierarchical Leader-Follower structure where a Global Prescriptive Agent (GPA) imposes risk budgets upon a follower system constituted by a Constrained Revision Agent (CRA) and a Local Verification Agent (LVA) to iteratively optimize output. We provide theoretical guarantees that this bilevel formulation converges to an equilibrium yielding strictly superior utility over unguided configurations. Empirical validation on a unified benchmark demonstrates that RCBSF achieves state-of-the-art performance, surpassing iterative baselines with an average Risk Resolution Rate (RRR) of 84.21% while enhancing token efficiency. Our code is available at https://github.com/xjiacs/RCBSF .


Authors (6)

Summary

The paper demonstrates that modeling contract revision as a Stackelberg game leads to stronger risk mitigation and higher semantic fidelity.
It presents a multi-agent leader-follower paradigm where a Global Prescriptive Agent sets risk constraints, guiding iterative contract updates.
Empirical results show state-of-the-art performance with improved risk resolution rates (84.21%) and token efficiency over traditional methods.

RCBSF: Formalizing Automated Contract Revision as a Bilevel Multi-Agent Stackelberg Game

Motivation and Problem Formulation

The limitations of vanilla LLMs in automated legal contract revision are well-documented, particularly regarding unbounded hallucinations, lack of explicit behavioral constraints, and insufficient robustness for high-stakes regulatory environments. The paper "RCBSF: A Multi-Agent Framework for Automated Contract Revision via Stackelberg Game" (2604.10740) directly addresses these deficiencies by adopting a non-cooperative game-theoretic approach. The framework leverages hierarchical bilevel optimization, operationalized through a Stackelberg game instantiation, to enforce precise, risk-aware, and resource-efficient contract generation.

The essential innovation is the imposition of a multi-agent Leader-Follower architecture. The leader, a Global Prescriptive Agent (GPA), generates a fine-grained risk budget using a structured five-dimensional taxonomy (Category, Location, Evidence, Issue, Suggestion), communicated as a constraint vector. The follower system, consisting of a Constrained Revision Agent (CRA) and a Local Verification Agent (LVA), is strictly regulated by these constraints and iteratively optimizes contract drafts under adversarial scrutiny. This paradigm stands in sharp contrast to prior iterative or cooperative multi-agent frameworks, presenting strictly stronger guarantees for risk minimization and semantic fidelity.
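To make the five-dimensional taxonomy concrete, the following is a minimal sketch of how one leader-issued constraint could be represented and serialized for the followers. The class name `RiskConstraint`, the helper `to_constraint_vector`, and the example clause text are illustrative assumptions; only the five field names come from the taxonomy described above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RiskConstraint:
    """One leader-issued instruction; fields mirror the five taxonomy dimensions."""
    category: str
    location: str
    evidence: str
    issue: str
    suggestion: str


def to_constraint_vector(constraints):
    """Serialize the leader's instructions into plain dicts for the follower agents."""
    return [asdict(c) for c in constraints]


# Hypothetical example values, not taken from the paper or its benchmark.
example = RiskConstraint(
    category="Liability",
    location="Clause 7.2",
    evidence="The Supplier shall be liable for all losses without limit.",
    issue="Unlimited liability exposure",
    suggestion="Cap liability at the total fees paid under the agreement.",
)
vector = to_constraint_vector([example])
```

Keeping each constraint as a flat record with fixed fields makes it straightforward for a verifier to check a revision against the exact location and issue the leader flagged.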

Figure 1: Comparison of legal contract generation workflows between the Baseline (Standard LLM) and the Risk-Constrained Bilevel Stackelberg Framework (RCBSF).

Figure 2: RCBSF models the revision process as a bilevel optimization game, employing a GPA as leader and Revision/Auditor agents as followers with multi-round iterative interaction for dynamic risk minimization.

Stackelberg Bilevel Optimization in RCBSF

Game Structure and Theoretical Guarantees

RCBSF reframes contract revision as a hierarchical Stackelberg game $\mathcal{G} = \langle \mathcal{N}, \mathcal{S}, \mathcal{A}, \mathcal{J} \rangle$, embedding textual semantics in a continuous high-dimensional manifold. The leader (GPA) maximizes a risk-mitigation utility subject to strict computational and token budget constraints; the follower system (CRA+LVA) maximizes the likelihood of revised text under constraints induced by the leader, penalized by divergence from target risk distributions.

Critically, the framework enforces convergence to a Stackelberg equilibrium, which strictly dominates Nash-equilibrium-based baselines when considering non-convex risk manifolds. The paper provides formal proofs: the optimal Stackelberg strategy always dominates the unguided one, and the coupled revision-audit refinement provably converges to a stationary point minimizing residual risk under the RCBSF loss landscape.
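As a rough illustration of the dominance claim, a generic bilevel Stackelberg problem can be written as below. The symbols $c$ (leader constraint vector), $x$ (follower revision), $J_L$, $J_F$, $\mathcal{C}$, and $\mathcal{X}(c)$ are notational assumptions introduced here, not the paper's own notation, and the follower's best response $x^\star(c)$ is assumed to be unique.

```latex
\begin{aligned}
c^\star &\in \arg\max_{c \in \mathcal{C}} \; J_L\bigl(c,\, x^\star(c)\bigr)
\quad \text{s.t.} \quad
x^\star(c) \in \arg\max_{x \in \mathcal{X}(c)} J_F(c, x), \\[4pt]
J_L\bigl(c^\star,\, x^\star(c^\star)\bigr) &\ge J_L\bigl(c_0,\, x^\star(c_0)\bigr)
\quad \text{for every } c_0 \in \mathcal{C}.
\end{aligned}
```

The second line holds because $c^\star$ maximizes the leader's utility over all of $\mathcal{C}$, which includes any fixed "unguided" choice $c_0$. This generic argument yields only weak dominance; the strict inequality reported for RCBSF relies on additional conditions established in the paper's proofs.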

Algorithmic Implementation

The RCBSF architecture implements the following iterative protocol:

The GPA projects the contract into a five-dimensional risk vector space and prioritizes risk instructions using Q-score-weighted softmax strategies.
The CRA generates candidate revisions to minimize joint residual risk, regularized by the GPA's risk vector and subject to hard token budgets.
The LVA evaluates all generated revisions, returning structured feedback (gradient signals) to enforce stricter convergence.
The interaction cycles until a stable equilibrium is reached.

Empirically optimized iteration depth ($K$) and softmax temperature ($\tau$) regulate exploration-exploitation dynamics to achieve optimal performance-cost trade-offs.
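The protocol above can be sketched as a short loop. This is a simplified illustration under stated assumptions, not the authors' implementation: `revise` and `verify` are caller-supplied stand-ins for the CRA and LVA, the Q-scores are given as plain numbers, and "equilibrium" is approximated by stopping when the verifier reports no open risks or after `K` rounds.

```python
import math


def softmax(scores, tau=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_rcbsf_sketch(contract, risks, q_scores, revise, verify, K=3, tau=1.0):
    """Leader ranks risks once; followers alternate revise/verify for at most K rounds."""
    if not risks:
        return contract, [], 0
    weights = softmax(q_scores, tau)
    # Order risks by descending leader weight (hypothetical prioritization rule).
    ordered = [r for _, r in sorted(zip(weights, risks), key=lambda p: -p[0])]
    draft, open_risks, rounds = contract, ordered, 0
    for rounds in range(1, K + 1):
        draft = revise(draft, open_risks)       # CRA stand-in
        open_risks = verify(draft, open_risks)  # LVA stand-in: risks still open
        if not open_risks:
            break
    return draft, open_risks, rounds


# Toy stand-ins so the sketch runs without any LLM calls.
def toy_revise(draft, open_risks):
    return draft + f" [fixed:{open_risks[0]}]"


def toy_verify(draft, open_risks):
    return [r for r in open_risks if f"[fixed:{r}]" not in draft]


final, remaining, used = run_rcbsf_sketch(
    "Base contract.", ["liability", "termination"], [2.0, 1.0], toy_revise, toy_verify
)
```

In this toy run each round resolves one risk, so the loop stops after two rounds; with real agents, the verifier's structured feedback would drive what the next revision targets.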

Empirical Evaluation

Datasets and Baselines

A unified legal contract benchmark aggregating MAUD, CUAD, ContractNLI, and PrivacyQA is used to test the generality and robustness of the framework, covering 41 categories with diverse and challenging risk profiles. Baselines include Standard LLM prompting, Chain-of-Thought (CoT) refinement, Retrieval-Augmented Generation (RAG), and Iterative Self-Refinement, without hierarchical leader constraints.

Figure 3: The unified benchmark covers a long-tail distribution of 41 contract categories, ensuring robust evaluation of domain generalization.

Quantitative Results

RCBSF delivers state-of-the-art performance across all metrics:

Risk Resolution Rate (RRR): With the Qwen2.5-7B-Chat backbone, RCBSF achieves an average RRR of 84.21%, an absolute gain of 4.65 percentage points over the best iterative baseline.
Contract Quality (CQ): RCBSF sets a new state of the art on the clarity, rigor, balance, and professionalism metrics. Heatmap analysis shows uniform improvement across all categories and models.
Token Efficiency Score (TES): The framework attains 87.29 risks resolved per 1,000 tokens, outperforming the baselines on resource efficiency and supporting the paper's utility-cost analysis.

Figure 4: Diagonal block heatmap showing RCBSF's consistent superiority across fine-grained quality metrics and datasets.

Ablation shows that removing the five-dimensional constraints causes sharp drops in RRR and Win Rate (by 11.06% and 85pp, respectively), establishing that the constraint design is not redundant. Eliminating budget penalties marginally increases RRR but sharply reduces token efficiency, confirming the role of the cost-control mechanism.
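For readers who want to reproduce the headline metrics on their own runs, here is a minimal sketch of how RRR and TES could be computed. The formulas are inferred from the metric names and units quoted above (a percentage of risks resolved, and risks resolved per 1,000 tokens); the paper's exact definitions may differ, and the function names are assumptions.

```python
def risk_resolution_rate(resolved, identified):
    """Percentage of identified risks that were resolved (assumed definition)."""
    if identified <= 0:
        raise ValueError("identified must be positive")
    return 100.0 * resolved / identified


def token_efficiency_score(resolved, tokens_used):
    """Risks resolved per 1,000 tokens consumed (assumed definition)."""
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return 1000.0 * resolved / tokens_used


# Illustrative inputs only; these are not figures from the paper.
rrr = risk_resolution_rate(resolved=3, identified=4)
tes = token_efficiency_score(resolved=5, tokens_used=2000)
```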

Sensitivity and Optimization

Optimal effectiveness is achieved at K=3 iterations, balancing diminishing returns in performance with escalating token costs.

Figure 5: Risk resolution rate and token cost as functions of $K$; optimal utility-cost at $K=3$.
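One simple way to formalize this trade-off is to score each depth by its RRR minus a linear token penalty and pick the maximizer. The sketch below assumes that form of utility; the penalty weight `cost_per_1k_tokens` and all numbers in the example are made-up placeholders, not measurements from the paper.

```python
def pick_iteration_depth(rrr_by_k, tokens_by_k, cost_per_1k_tokens):
    """Return the K that maximizes RRR minus a linear token-cost penalty."""
    def utility(k):
        return rrr_by_k[k] - cost_per_1k_tokens * tokens_by_k[k] / 1000.0

    # Iterate over sorted K so that ties resolve to the smallest depth.
    return max(sorted(rrr_by_k), key=utility)


# Placeholder curves showing diminishing RRR gains and linearly growing cost.
toy_rrr = {1: 60.0, 2: 72.0, 3: 78.0, 4: 79.0}
toy_tokens = {1: 1000, 2: 2000, 3: 3000, 4: 4000}
best_k = pick_iteration_depth(toy_rrr, toy_tokens, cost_per_1k_tokens=2.0)
```

With a higher penalty weight the selected depth shifts toward fewer rounds, which is the qualitative behavior the sensitivity analysis describes.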

Softmax temperature analysis confirms a central peak at $\tau=1.0$. Both conservative ($\tau=0.5$) and high-entropy ($\tau=2.0$) regimes underperform, indicating that moderate entropy best balances risk coverage and focus.

Figure 6: Temperature ($\tau$) analysis shows optimal performance at $\tau=1.0$, avoiding both under- and over-dispersion in constraint application.
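The effect of $\tau$ on how sharply the leader concentrates on top-ranked risks can be seen with a standard temperature-scaled softmax. The Q-scores below are hypothetical, and this sketch illustrates only the general mechanism, not the paper's exact weighting scheme.

```python
import math


def softmax(scores, tau):
    """Temperature-scaled softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means weight is spread more evenly."""
    return -sum(p * math.log(p) for p in probs if p > 0)


q_scores = [3.0, 2.0, 1.0]  # hypothetical Q-scores for three risks
for tau in (0.5, 1.0, 2.0):
    weights = softmax(q_scores, tau)
    print(tau, [round(w, 3) for w in weights], round(entropy(weights), 3))
```

Lower temperatures push almost all weight onto the top-scoring risk, while higher temperatures flatten the distribution, which matches the under- and over-dispersion regimes described above.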

Qualitative and Case-Based Analysis

The RCBSF framework handles complex multi-risk, multi-objective scenarios robustly and is resilient against spurious revisions. It explicitly targets and repairs high-stakes risks (e.g., unlimited liability, ambiguous ownership, open-ended termination) according to leader-generated instructions, and it converges iteratively under adversarial audit. In contrast, standard and even iterative baselines frequently fail to apply the suggested clause modifications or introduce irrelevant textual changes.

Theoretical and Practical Implications

The work demonstrates that adversarial, hierarchical structuring—modeling the legal revision process as a Stackelberg-structured multi-agent game—enables both higher risk mitigation rates and strong semantic/language quality, while also enforcing token efficiency via hard constraints. The RCBSF paradigm provides a template for other high-risk generation domains requiring explicit constraint induction, robust auditing, and cost control.

The findings highlight the importance of granular, multi-dimensional constraint induction and of explicit iterative adversarial interaction in legal text generation, overcoming the limitations of purely cooperative or additive multi-agent setups. Furthermore, the formal convergence and optimality guarantees, empirically validated in-domain, provide a bridge between continuous optimization and game theory on one side and practical neural text generation on the other.

Future Directions

Data/Scenario Complexity: While the current benchmarks span 41 domains, real-world contracts with highly entangled, cross-jurisdictional or deeply interdependent risks require additional attention. Extending RCBSF to such environments necessitates generalizing the risk manifold embeddings and introducing more sophisticated inter-agent communication protocols.
Jurisdictional Adaptation: Current models exhibit overfitting to US/UK legal logic; future versions should integrate jurisdiction-aware modules and legal ontology conditioning.
Scalability: Analyzing the scaling behavior as both contract/document length and risk granularity increase will aid in deploying RCBSF-class models in enterprise-grade contract platforms.
Cross-domain Transfer: The RCBSF principle is directly applicable to other critical domains (e.g., financial regulations, safety-critical engineering proposals) requiring explicit equilibrium-driven revision.

Conclusion

This work formally establishes a mathematically guaranteed, empirically validated framework—RCBSF—for automated legal contract revision, demonstrating substantial gains in risk resolution, text quality, and efficiency. By synthesizing game-theoretic Stackelberg optimality with multi-agent LLM interaction, it offers both a robust practical tool for LegalAI and a generalizable paradigm for risk- and constraint-intensive automated text generation.
