Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems

Published 30 Apr 2026 in cs.AI | (2604.27358v1)

Abstract: As LLM agents are deployed in high-stakes environments, the question of how safely to delegate subtasks to specialized sub-agents becomes critical. Existing work addresses multi-agent architecture selection at design time or provides broad empirical guidelines, but neither provides a runtime mechanism that dynamically adjusts the safety-efficiency trade-off as task context changes during execution. We propose Safe Bilevel Delegation (SBD), a formal framework for runtime delegation safety in hierarchical multi-agent systems. SBD formulates task delegation as a bilevel optimization problem: an outer meta-weight network phi learns context-dependent safety-efficiency weights lambda(s) in [0,1]; an inner loop optimizes the delegation policy pi subject to a probabilistic safety constraint P(safe) >= 1-delta. The continuous delegation degree alpha in [0, 1] controls how much decision authority is transferred to each sub-agent, interpolating smoothly between full human override (alpha=0) and fully autonomous execution (alpha=1). We establish three theoretical results: (1) Safety Monotonicity--higher outer safety weight produces a weakly safer inner policy; (2) Inner Policy Convergence--projected gradient descent on the inner problem converges linearly under standard smoothness assumptions; (3) an Accountability Propagation bound that distributes responsibility across multi-hop delegation chains with a provable per-agent ceiling. We instantiate SBD in three high-stakes domains--medical AI (MIMIC-III), financial risk control (S&P 500), and educational agent supervision (ASSISTments)--specifying datasets, safety constraint sets, baselines, and evaluation protocols. This manuscript presents the formal framework and theoretical results in full; empirical validation following the protocols described herein is planned and will be reported in a forthcoming revision.

Authors (1)

Yuan Sun

Summary

The paper introduces SBD, a formal bilevel optimization framework that dynamically adjusts delegation authority to meet context-specific safety constraints.
It proves safety monotonicity, linear convergence of the inner policy optimization, and an accountability propagation bound for multi-hop delegation chains.
The framework is instantiated for high-stakes applications in healthcare, finance, and education with specified evaluation protocols; empirical validation is planned for a forthcoming revision.

Safe Bilevel Delegation: A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems

Motivation and Contribution

The increasing deployment of LLM-based multi-agent systems in high-stakes scenarios such as healthcare, finance, and education has amplified the need for robust and adaptable delegation strategies. The critical challenge addressed in "Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems" (2604.27358) is the absence of a principled mechanism that determines, at runtime, the appropriate amount of autonomy to grant to sub-agents in the face of fluctuating task risk. Existing safety approaches in multi-agent systems predominantly target static, design-time controls and do not provide dynamic adaptation or formal safety guarantees that respond to the real-time context.

This work introduces Safe Bilevel Delegation (SBD), a runtime, bilevel optimization framework that allows continuous and context-dependent adjustment of delegation authority, with provable safety properties. SBD’s formulation generalizes prior static models by introducing learnable, state-conditioned trade-offs between safety and efficiency, formalized through a meta-weight network that modulates a continuous delegation degree $\alpha \in [0,1]$.

Formal Framework

Problem Setting

SBD considers a hierarchical multi-agent paradigm where a principal agent delegates subtasks to specialized sub-agents, each with varying degrees of authority. The delegation decision at each timestep is characterized by:

Selection of a sub-agent,
Assignment of a continuous delegation degree $\alpha$ (from full human override to full autonomy).

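A delegation is declared "safe" if the tuple $(s, a_i, \alpha)$ (state, selected sub-agent, delegation degree) lies in a domain-specific safety constraint set $C$. The delegation policy $\pi$ must satisfy the probabilistic safety constraint $\mathrm{safe}(\pi, s) \geq 1 - \delta$ for all relevant states $s$, where $\mathrm{safe}(\pi, s)$ is the probability that the delegation chosen by $\pi$ in state $s$ is safe and $\delta$ is a risk tolerance suited to practical, stochastic domains.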
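To make the probabilistic constraint concrete, the following is a minimal Python sketch that estimates the safety probability by Monte Carlo sampling and compares it with $1 - \delta$. The constraint set (a risk-dependent cap on $\alpha$), the toy stochastic policy, and the names `in_constraint_set`, `toy_policy`, and `estimate_safe` are illustrative assumptions, not definitions from the paper.

```python
import random

def in_constraint_set(state, agent, alpha):
    """Hypothetical constraint set C: riskier states allow less delegated authority."""
    return alpha <= 1.0 - state["risk"]

def toy_policy(state, rng):
    """Hypothetical stochastic policy: picks a sub-agent and a noisy delegation degree."""
    agent = rng.choice(["agent_a", "agent_b"])
    alpha = min(1.0, max(0.0, 0.5 + rng.gauss(0.0, 0.1)))
    return agent, alpha

def estimate_safe(policy, state, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the probability that (s, a_i, alpha) lies in C."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        agent, alpha = policy(state, rng)
        hits += in_constraint_set(state, agent, alpha)
    return hits / n_samples

delta = 0.05                 # risk tolerance
low_risk = {"risk": 0.2}     # cap: alpha <= 0.8, rarely exceeded by the toy policy
high_risk = {"risk": 0.6}    # cap: alpha <= 0.4, usually exceeded by the toy policy

for name, state in [("low_risk", low_risk), ("high_risk", high_risk)]:
    p_safe = estimate_safe(toy_policy, state)
    print(name, "satisfies constraint:", p_safe >= 1 - delta)
```

In this toy setting the same policy satisfies the constraint in the low-risk state and violates it in the high-risk state, which is the kind of context dependence a runtime mechanism has to handle.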

Bilevel Optimization

SBD is formulated as a strictly bilevel problem:

The outer loop meta-learns a context-conditioned safety weight $\lambda_\phi(s)$ through a neural network $\phi$, modulating the safety-efficiency objective at runtime.
The inner loop optimizes the actual delegation policy $\pi$, minimizing a weighted combination of safety and efficiency losses, subject to the aforementioned probabilistic safety constraints.

This structure enables SBD to adapt safety strictness to context, raising the safety emphasis in high-risk conditions and relaxing it in low-risk settings. The delegation degree $\alpha \in [0,1]$ acts as a soft regulator of autonomy, interpolating between human-controlled and autonomous execution.
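One plausible way to write the two levels down is sketched here. The loss symbols $\mathcal{L}_{\text{outer}}$, $\mathcal{L}_{\text{safe}}$, and $\mathcal{L}_{\text{eff}}$, and the convex-combination form of the inner objective, are notational assumptions made for illustration; the paper's exact objective may differ.

$$
\min_{\phi} \; \mathcal{L}_{\text{outer}}\big(\pi^*_{\phi}\big)
\quad \text{s.t.} \quad
\pi^*_{\phi} \in \arg\min_{\pi} \; \mathbb{E}_{s}\Big[\lambda_\phi(s)\,\mathcal{L}_{\text{safe}}(\pi, s) + \big(1 - \lambda_\phi(s)\big)\,\mathcal{L}_{\text{eff}}(\pi, s)\Big]
\quad \text{s.t.} \quad
\mathrm{safe}(\pi, s) \geq 1 - \delta \;\; \forall s .
$$

Under this reading, $\lambda_\phi(s) = 1$ makes the inner problem purely safety-driven and $\lambda_\phi(s) = 0$ purely efficiency-driven, while the probabilistic constraint holds regardless of the weight.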

Theoretical Guarantees

SBD's theoretical contributions are threefold:

Safety Monotonicity Theorem: For any two meta-weight functions $\lambda_1, \lambda_2$ with $\lambda_1(s) \geq \lambda_2(s)$ for all $s$, the safety of the resulting inner policies is monotone:

$\mathrm{safe}(\pi^*_{\lambda_1}, s) \geq \mathrm{safe}(\pi^*_{\lambda_2}, s)$

This result formalizes that contextually increasing safety weighting in the outer loop cannot worsen (and typically improves) safety in the resulting delegation decisions.

Inner Policy Convergence Theorem: Under standard assumptions (smoothness, strong convexity, convex feasible set), projected gradient descent on the inner problem converges at a linear rate, with iteration complexity governed by the condition number $\kappa = L/\mu$ (the ratio of the smoothness constant $L$ to the strong-convexity constant $\mu$).
Accountability Propagation Proposition: In multi-hop delegation chains, the per-agent accountability weight is formally defined and shown to have a provable upper bound—ensuring that, regardless of delegation depth and autonomy degrees, no single agent ever bears complete responsibility, which diffuses as chains lengthen and delegation degrees approach one.
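The first two results can be illustrated on a toy problem. The sketch below runs projected gradient descent with step size $1/L$ on a hypothetical separable quadratic inner objective over two delegation degrees, then sweeps the safety weight to check that the optimal degrees never increase. The objective, the per-agent `costs`, and the box constraint `caps` are assumptions chosen for tractability, not the paper's losses or constraint set.

```python
def project_box(x, caps):
    """Euclidean projection onto the box [0, caps[i]] (coordinate-wise clipping)."""
    return [min(c, max(0.0, xi)) for xi, c in zip(x, caps)]

def grad(x, lam, costs):
    """Gradient of f(x) = sum_i lam*costs[i]*x_i**2 + (1-lam)*(1-x_i)**2."""
    return [2.0 * lam * c * xi - 2.0 * (1.0 - lam) * (1.0 - xi)
            for xi, c in zip(x, costs)]

def projected_gd(lam, costs, caps, steps=200):
    """Projected gradient descent with step 1/L; returns all iterates."""
    # The Hessian is diagonal, so L is its largest diagonal entry.
    L = max(2.0 * (lam * c + (1.0 - lam)) for c in costs)
    x = [0.0] * len(costs)
    history = [x]
    for _ in range(steps):
        g = grad(x, lam, costs)
        x = project_box([xi - gi / L for xi, gi in zip(x, g)], caps)
        history.append(x)
    return history

lam, costs, caps = 0.5, [1.0, 10.0], [0.4, 1.0]   # here L = 11, mu = 2, kappa = 5.5
history = projected_gd(lam, costs, caps)

# The problem is separable, so the constrained minimizer is the clipped
# unconstrained minimizer of each coordinate.
x_star = [min(cap, (1 - lam) / (lam * c + 1 - lam)) for c, cap in zip(costs, caps)]
errors = [max(abs(a - b) for a, b in zip(x, x_star)) for x in history]

# Monotonicity check on the same toy problem: sweep the safety weight upward
# and record the resulting delegation degrees.
sweep = (0.1, 0.3, 0.5, 0.7, 0.9)
solutions = [projected_gd(l, costs, caps)[-1] for l in sweep]
for l, sol in zip(sweep, solutions):
    print(f"lambda={l:.1f} -> alpha*={[round(a, 4) for a in sol]}")
```

The distance to the minimizer shrinks at every iteration, and each delegation degree is non-increasing in the safety weight, which mirrors the monotonicity and convergence statements in this restricted setting.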

Algorithmic Instantiation

The SBD algorithm proceeds as follows:

The outer loop updates $\phi$ via meta-gradient steps, backpropagating through the inner loop by implicit differentiation.
The inner loop performs projected gradient updates to optimize $\pi$ for fixed safety-efficiency weights dictated by the current $\phi$.
At each step, delegation degrees are projected into the safety constraint set $C$ (often a simple clipping operation), preserving compliance with probabilistic safety requirements.
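The three steps above can be sketched end to end on a scalar toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the meta-weight "network" is a single sigmoid unit over a scalar risk score, the inner policy is just a delegation degree, the constraint set is a fixed cap, and `outer_loss` is a hypothetical validation objective. All function names are invented for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def inner_solve(lam, cap, steps=50, lr=0.5):
    """Inner loop: projected gradient descent on
    f(alpha) = lam*alpha**2 + (1-lam)*(1-alpha)**2 over [0, cap]."""
    alpha = 0.0
    for _ in range(steps):
        g = 2.0 * lam * alpha - 2.0 * (1.0 - lam) * (1.0 - alpha)
        alpha = min(cap, max(0.0, alpha - lr * g))   # projection = clipping
    return alpha

def outer_loss(alpha, risk):
    """Hypothetical outer objective: risk-weighted harm plus lost efficiency."""
    return risk * alpha ** 2 + (1.0 - risk) * (1.0 - alpha) ** 2

def meta_gradient(w, b, risk, cap):
    """Gradient of the outer loss w.r.t. (w, b) via implicit differentiation."""
    lam = sigmoid(w * risk + b)
    alpha = inner_solve(lam, cap)
    # Differentiating the inner stationarity condition df/dalpha = 0 gives
    # d alpha*/d lam = -(d2f/dalpha dlam) / (d2f/dalpha2) = -2/2 = -1
    # at an interior solution; it is 0 when the projection is active.
    dalpha_dlam = -1.0 if 0.0 < alpha < cap else 0.0
    dloss_dalpha = 2.0 * risk * alpha - 2.0 * (1.0 - risk) * (1.0 - alpha)
    dlam_dz = lam * (1.0 - lam)
    g = dloss_dalpha * dalpha_dlam * dlam_dz
    return g * risk, g

def train(epochs=2000, lr=0.5, cap=0.9, seed=0):
    """Outer loop: stochastic meta-gradient steps on the weight parameters."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        risk = rng.random()              # sample a context
        gw, gb = meta_gradient(w, b, risk, cap)
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = train()
lam_low, lam_high = sigmoid(w * 0.1 + b), sigmoid(w * 0.9 + b)
print("safety weight rises with risk:", lam_high > lam_low)
```

In this toy setup the outer objective is minimized when the safety weight tracks the risk score, so the learned weight should be larger in riskier contexts. A realistic implementation would replace the closed-form implicit derivative with an automatic-differentiation or linear-solve based one.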

This paradigm is computationally efficient and compatible with standard deep learning infrastructure, with per-iteration cost on par with modern meta-learning and hyperparameter optimization frameworks.

Planned Empirical Protocol

While this version of the manuscript reports no empirical results, it specifies an evaluation protocol for three domains:

Medical AI (MIMIC-III): Safe delegation of clinical recommendations based on patient acuity and specialist expertise.
Financial Risk Control (S&P 500 constituents): Safe portfolio allocation leveraging sub-agent strategies conditioned on volatility regimes.
Educational Agent Supervision (ASSISTments): Context-sensitive delegation of content delivery, protecting at-risk students.

Each protocol specifies domain-specific safety constraints, baselines, falsifiable hypotheses, and ablation studies. The metrics focus on Safety Rate, Task Efficiency, the Safety–Efficiency Area under the Pareto curve, and Accountability Entropy.
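The summary does not give formulas for these metrics, so the following Python sketch shows only one plausible reading of two of them: Safety Rate as the fraction of delegation decisions that fall inside the constraint set, and Accountability Entropy as the Shannon entropy (in bits) of normalized per-agent accountability weights. Both definitions and both function names are assumptions made for illustration.

```python
import math

def safety_rate(safe_flags):
    """Assumed definition: fraction of delegation decisions judged safe."""
    return sum(safe_flags) / len(safe_flags)

def accountability_entropy(weights):
    """Assumed definition: Shannon entropy in bits of normalized accountability weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]   # zero-weight agents contribute nothing
    return -sum(p * math.log2(p) for p in probs)

flags = [True, True, False, True]
print(safety_rate(flags))

uniform_chain = [1.0, 1.0, 1.0, 1.0]    # responsibility spread evenly over four agents
concentrated = [1.0, 0.0, 0.0]          # one agent bears everything
print(accountability_entropy(uniform_chain), accountability_entropy(concentrated))
```

Under this reading, higher entropy means responsibility is spread more evenly across the delegation chain, and zero entropy means a single agent carries all of it.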

Implications

Practical Implications

SBD provides a principled, closed-loop mechanism for runtime safety assurance in multi-agent systems, mitigating failure modes that may arise under static delegation schemas, especially in dynamically changing environments. The continuous delegation degree $\alpha$ serves as a flexible interface for human-in-the-loop oversight, and the accountability propagation result offers a formal tool for post-hoc audit and attribution, which matters for regulatory and governance applications.

Theoretical Implications

The primary theoretical innovation is the formal linkage between outer-loop meta-weight adaptation and inner-loop policy safety, rendering the resulting multi-agent system robust against contextually fluctuating risk with a mathematically provable monotonicity guarantee. Furthermore, the bilevel formulation opens new connections with constrained MDPs and advances safe RL by enabling context-adaptive risk tuning without repeated retraining or manual intervention.

The accountability results also furnish the theoretical foundation for scalable responsibility allocation in extended delegation chains—relevant for AI liability, auditing, and explainability in complex, agentic workflows.

Future Research Directions

SBD's formalism invites several research extensions:

Recursive, compositional application of SBD across multi-level agent hierarchies;
Integration with Skill-compilation frameworks to unify delegation and artifact optimization;
Augmentation with adversarial skill/command detection for security robustness;
Online bilevel adaptation for non-stationary environments;
Extension and empirical evaluation in settings involving unreliable safety verifiers or partial observability.

Conclusion

Safe Bilevel Delegation (SBD) contributes a formal bilevel optimization framework for runtime safety in hierarchical multi-agent systems, with provable safety monotonicity, linearly convergent inner-loop policy optimization, and a bounded accountability propagation rule for delegation chains. Its context-sensitive, continuous delegation paradigm is intended to improve the deployability and governance of LLM-driven multi-agent platforms in risk-sensitive domains and to provide a structured foundation for operationalizing AI safety, responsibility, and oversight. Empirical validation across medical, financial, and educational tasks is planned to test whether the theoretical guarantees hold in practice.