Neural Cooperative Reach-While-Avoid Certificates
- The paper introduces a neural framework that certifies coordinated multi-agent reach-while-avoid performance with decentralized, dynamic-localized vector Lyapunov and barrier functions.
- It employs deep neural networks for joint synthesis and verification of control policies, ensuring safety and scalability in systems with varying agent counts.
- Experimental results demonstrate high safety rates (up to 99.5%), efficient certification, and reliable generalization through structural reuse and probabilistic guarantees.
Neural Cooperative Reach-While-Avoid Certificates are neural network-based constructs that certify the ability of multiple interacting agents to reach designated goal regions while strictly avoiding unsafe sets, under complex decentralized dynamics and potentially large-scale coupling. These certificates generalize classical barrier and Lyapunov function approaches, enabling formal verification and synthesis of policies for multi-agent coordinated behaviors at scale. By embedding control-theoretic safety and reachability conditions into neural architectures, they address both tractability and robustness in distributed, data-driven control environments.
1. Formal Multi-Agent Reach-While-Avoid Problem
The cooperative reach-while-avoid (RWA) specification in interconnected systems entails that, for agents indexed by $i \in \{1, \dots, N\}$, each agent state trajectory $x_i(t)$ satisfies two requirements:
- Safety: For all $t \ge 0$, $x_i(t) \notin \mathcal{X}^i_{\mathrm{u}}$ (the unsafe set).
- Liveness: There exists $T \ge 0$ such that $x_i(t) \in \mathcal{X}^i_{\mathrm{g}}$ (the goal set) for all $t \ge T$.
Communication and subsystem interaction are captured by dynamic, state-dependent neighborhood sets $\mathcal{N}_i(x)$, leading to extended local state representations $\bar{x}_i = \big(x_i, \{x_j\}_{j \in \mathcal{N}_i(x)}\big)$. Policies are fully decentralized: $u_i = \pi_i(\bar{x}_i)$ with agent-wise dynamics $\dot{x}_i = f_i(x_i, u_i)$ (Zhou et al., 28 Jan 2026). Consequently, the global RWA property becomes a jointly distributed, compositional constraint over the product space $\prod_{i=1}^N \mathcal{X}_i$, demanding scalable methods for certificate construction and verification.
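To make the setup concrete, here is a minimal sketch of state-dependent neighborhoods and decentralized, agent-wise dynamics. The 2D single-integrator model, the sensing radius, and the function names (`neighbors`, `decentralized_step`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def neighbors(x, i, radius=2.0):
    """State-dependent neighborhood N_i(x): indices of agents within a
    (hypothetical) sensing radius of agent i."""
    d = np.linalg.norm(x - x[i], axis=1)
    return [j for j in range(len(x)) if j != i and d[j] <= radius]

def decentralized_step(x, policies, dt=0.1, radius=2.0):
    """One Euler step of agent-wise dynamics: each agent's input depends
    only on its own state and its neighbors' states (fully decentralized)."""
    x_next = x.copy()
    for i in range(len(x)):
        local = x[neighbors(x, i, radius)]  # extended local state \bar{x}_i
        x_next[i] = x[i] + dt * policies[i](x[i], local)
    return x_next
```

Because each policy reads only $\bar{x}_i$, the global step decomposes into per-agent computations whose cost depends on neighborhood size, not on $N$.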
2. Dynamic-Localized Vector Control Lyapunov and Barrier Functions
To enable scalable certification of cooperative behavior, neural cooperative RWA frameworks utilize dynamic-localized vector control Lyapunov functions (DL-VCLFs) and barrier functions (DL-VCBFs):
- DL-VCLF: Vector of local Lyapunov candidates $V = (V_1, \dots, V_N)$, with each $V_i(\bar{x}_i)$ depending on $x_i$ and its neighbors.
- Decentralized Lyapunov condition: For each $i$,
$$\dot{V}_i(\bar{x}_i) \le \sum_{j} \Lambda_{ij}\, V_j(\bar{x}_j),$$
where $\Lambda$ encodes interaction via a Metzler matrix (nonnegative off-diagonal entries), so that a comparison argument on the positive system $\dot{v} = \Lambda v$ certifies convergence of the whole vector (Zhou et al., 28 Jan 2026).
- DL-VCBF: Barrier candidates $B_i(\bar{x}_i)$ define safe sets $\mathcal{C}_i = \{\bar{x}_i : B_i(\bar{x}_i) \ge 0\}$ for each agent.
- Decentralized barrier condition:
$$\dot{B}_i(\bar{x}_i) \ge \sum_{j} \Gamma_{ij}\, B_j(\bar{x}_j),$$
with suitable Metzler coupling $\Gamma$, so that nonnegativity of $B$, and hence safety, is forward invariant (Zhou et al., 28 Jan 2026).
These decentralized, vectorized forms encode local certifications with coupling through neighborhood graphs. Pairwise barrier conditions and the associated invariance properties generalize classical Nagumo-type arguments to scalable, sparse graphs.
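The decentralized condition above can be checked pointwise on sampled states. The sketch below does exactly that for the Lyapunov case, assuming callables for each $V_i$ and $\dot{V}_i$; the two-agent test system ($\dot{x}_i = -x_i + 0.1\,x_j$, $V_i = x_i^2$) and the tolerance are illustrative choices, not from the paper.

```python
import numpy as np

def check_vector_lyapunov(V, dV, Lam, samples):
    """Sample-based check of the decentralized condition
    dV_i(x) <= sum_j Lam[i, j] * V_j(x) at every sampled joint state.
    Lam must be Metzler (nonnegative off-diagonal) for the comparison
    argument on the positive system v' = Lam v to apply."""
    n = len(V)
    assert all(Lam[i, j] >= 0 for i in range(n) for j in range(n) if i != j)
    for x in samples:
        v = np.array([V[i](x) for i in range(n)])
        dv = np.array([dV[i](x) for i in range(n)])
        if not np.all(dv <= Lam @ v + 1e-9):  # small numerical slack
            return False
    return True
```

A falsifying sample here would be handed back to the learner as a counterexample; a sound verifier replaces the finite sampling with exhaustive bound propagation.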
3. Neural Joint Synthesis and Verification
Certificates and policies are realized as deep neural networks (DNNs), parameterized for each agent:
- Lyapunov networks $V_{\theta_i}$, barrier networks $B_{\phi_i}$, and policy networks $\pi_{\psi_i}$.
The joint synthesis problem seeks to minimize deviation from a nominal control policy (from RL/imitation) while satisfying the Lyapunov and barrier inequalities. Soft constraints are implemented as ReLU-hinge loss terms for feasibility, e.g.:
$$\mathcal{L} = \sum_i \big\|\pi_{\psi_i} - \pi_i^{\mathrm{nom}}\big\|^2 + \lambda \sum_i \Big[\mathrm{ReLU}\Big(\dot{V}_i - \sum_j \Lambda_{ij} V_j + \varepsilon\Big) + \mathrm{ReLU}\Big(\sum_j \Gamma_{ij} B_j - \dot{B}_i + \varepsilon\Big)\Big]$$
(Zhou et al., 28 Jan 2026). Training uses stochastic gradient descent with counterexample-guided refinement from off-the-shelf verifiers (Marabou, α,β-CROWN), which generate trajectories violating the certificate conditions to improve generalization (Zhou et al., 28 Jan 2026, Rickard et al., 8 Feb 2025).
Discretization and model error are addressed by learning neural surrogates and bounding finite-grid error, guaranteeing that discrete-time inequalities (with explicit error bounds) preserve certificate correctness.
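A minimal numerical sketch of the hinge loss above, evaluated at one sampled joint state. The margin $\varepsilon$, the weight $\lambda$, and the function name `certificate_loss` are assumptions for illustration; in training these quantities would come from network forward passes and be differentiated through.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def certificate_loss(dV, V, dB, B, Lam, Gam, u, u_nom, margin=0.01, lam=10.0):
    """ReLU-hinge penalties for violated Lyapunov/barrier inequalities,
    plus squared deviation from the nominal policy, at one sampled state."""
    lyap = relu(dV - Lam @ V + margin)   # want dV_i <= (Lam V)_i - margin
    barr = relu(Gam @ B - dB + margin)   # want dB_i >= (Gam B)_i + margin
    dev = np.sum((u - u_nom) ** 2)
    return dev + lam * (np.sum(lyap) + np.sum(barr))
```

The loss is exactly zero when every inequality holds with margin and the policy matches the nominal one, which is what makes zero empirical loss a meaningful certificate-feasibility signal.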
4. Permutation- and Cardinality-Invariant Neural Architectures
Neighborhood-dependent state representations necessitate neural architectures invariant to permutation and cardinality:
- Encoding: Inspired by PointNet, neighbor states are embedded via
$$h_i = \max_{j \in \mathcal{N}_i(x)} \mathrm{ReLU}\big(W (x_j - x_i) + b\big),$$
with shared weights $(W, b)$, ReLU activation, and an elementwise max over neighbors (Qin et al., 2021).
This encoding ensures that $h_i$ is invariant under neighbor swapping and adapts to dynamically changing neighborhood sizes. Final computation of $V_i$, $B_i$, and $\pi_i$ involves an MLP over $(x_i, h_i)$, maintaining full decentralization and scalability to thousands of agents.
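A minimal sketch of such an encoder, assuming a single shared ReLU layer with hypothetical dimensions (2D states, 16 features) in place of the full shared MLP; the max-pool over the neighbor axis is what delivers permutation and cardinality invariance.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16                              # feature width (illustrative)
W = rng.normal(size=(H, 2))         # shared weights across all neighbors
b = rng.normal(size=H)

def encode(x_i, neighbors_x):
    """PointNet-style encoding: shared ReLU layer on relative neighbor
    states, then elementwise max over neighbors. The output is invariant
    to neighbor ordering and has fixed size for any neighbor count."""
    if len(neighbors_x) == 0:
        return np.zeros(H)
    feats = np.maximum(0.0, (neighbors_x - x_i) @ W.T + b)
    return feats.max(axis=0)
```

Because the max is taken per feature, reordering or duplicating neighbors leaves $h_i$ unchanged, and an empty neighborhood simply yields the zero embedding.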
5. Structural Reuse and Scalability Mechanisms
To avoid prohibitive retraining costs as network size grows, certificates and controllers are transferable between substructure-isomorphic systems:
- A subsystem isomorphic to a substructure of a larger system via an injective index mapping $\iota$ allows the larger system to reuse the certificate networks as
$$V_{\iota(i)} = V'_i, \qquad B_{\iota(i)} = B'_i, \qquad \pi_{\iota(i)} = \pi'_i$$
(Zhou et al., 28 Jan 2026). Theoretical guarantees assert that such transfer preserves formal RWA certification.
This approach enables near-constant cost for expanding the system size, validated by experiments showing that certificate reuse scales to vehicle platoons of up to 300 agents with no increase in verification time (RedVer strategy).
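The platoon case illustrates the reuse idea well. Under the assumption that a short chain decomposes into lead, interior, and tail roles and that every interior vehicle is substructure-isomorphic to the trained interior agent, certificates extend to any length by index mapping; this sketch (names `platoon_certificates`, `lead`/`mid`/`tail` are hypothetical) is not the paper's RedVer algorithm, only the structural-reuse pattern it exploits.

```python
def platoon_certificates(chain_certs, n_vehicles):
    """Extend certificates trained on a 3-vehicle chain (lead, interior,
    tail roles) to an arbitrarily long platoon by structural reuse: every
    interior vehicle shares the interior network, so the number of distinct
    networks to verify stays constant as n_vehicles grows."""
    lead, mid, tail = chain_certs
    return [lead] + [mid] * (n_vehicles - 2) + [tail]
```

Only three distinct networks ever need verification, which is why verification time stays flat as the platoon grows.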
6. Generalization Guarantees and Probabilistic Bounds
Generalization is quantified via Rademacher complexity (Qin et al., 2021) and scenario-compression methods (Rickard et al., 8 Feb 2025):
- Rademacher bound: For empirical zero loss on $m$ sampled trajectories with margin $\gamma > 0$, the violation probability for each agent admits an explicit bound in terms of the function-class Rademacher complexity and the sample size $m$, holding uniformly over all agents.
- Compression-set PAC bounds: For neural certificates trained on $m$ trajectories, the existence of a compression set of size $k \ll m$ (algorithmically constructed) yields a bound on the violation probability that depends only on the compression-set size $k$ and the sample count $m$, enhancing scalability and reducing conservatism (Rickard et al., 8 Feb 2025).
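A standard sample-compression bound of this flavor can be inverted numerically: with confidence $1-\beta$, a predictor consistent with $m$ samples and compressible to $k$ of them has violation probability at most $\varepsilon$ where $\binom{m}{k}(1-\varepsilon)^{m-k} \le \beta$. The sketch below solves this classical inequality for $\varepsilon$; it is generic compression-bound machinery, not necessarily the exact bound of Rickard et al.

```python
from math import comb

def compression_bound(m, k, beta=1e-6):
    """Smallest eps with comb(m, k) * (1 - eps)**(m - k) <= beta:
    with confidence 1 - beta, the violation probability of a certificate
    consistent with m samples and compressible to k of them is <= eps."""
    # (1 - eps)^(m - k) <= beta / C(m, k)
    # => eps >= 1 - (beta / C(m, k))**(1 / (m - k))
    return 1.0 - (beta / comb(m, k)) ** (1.0 / (m - k))
```

Note the bound tightens as $m$ grows with $k$ fixed, which is the mechanism behind "depends only on the compression-set size": more data at the same compressibility means a smaller certified violation probability.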
7. Experimental Results and Practical Performance
Key benchmarks and results include:
| Task | Agent Count | Safety Rate | Notable Findings |
|---|---|---|---|
| 2D ground robots (e.g., Predator-Prey, Navigation) | 8–1024 | 99–99.5% | Trained on 8, generalizes to 1024 agents with no loss of safety. |
| 3D quadrotor swarms | 32 | >99% | Maintains safety, outperforming model-based baselines. |
| Multi-robot formations | 4 | Full | Reaches goal formations, avoids obstacles, margin >0.3 m. |
| Vehicle platoons | up to 300 | Full | RedVer approach achieves constant verification time. |
Test-time policy refinement further increases safety by 1–2% via gradient-based adjustment of control inputs when neural outputs violate certificates (Qin et al., 2021). Empirical comparisons consistently indicate superior safety and control rewards over non-cooperative baselines, especially in scalability and adaptation to varying agent numbers.
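The test-time refinement step can be sketched as projected gradient descent on the certificate-violation hinge: when the network's proposed control violates a certificate condition, the input is nudged down the violation gradient until the condition holds. The step size, iteration cap, and function names are illustrative assumptions.

```python
import numpy as np

def refine_control(u, violation, grad_violation, step=0.1, iters=20):
    """Test-time refinement: gradient-descend the certificate-violation
    measure with respect to the control input, stopping as soon as the
    violation is eliminated (violation(u) <= 0) or the budget runs out."""
    for _ in range(iters):
        if violation(u) <= 0.0:
            break
        u = u - step * grad_violation(u)
    return u
```

Since refinement only fires when a violation is detected, it leaves already-certified controls untouched, which matches the reported behavior of a 1–2% safety gain at modest cost.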
8. Connections to Related Certificate Synthesis Paradigms
Neural cooperative RWA certificates extend and unify prior work on neural Lyapunov, barrier, and supermartingale certificates (Jin et al., 2020, Žikelić et al., 2022):
- Classical safe control policies were obtained by jointly learning barrier and Lyapunov-like neural networks satisfying sampling-based relaxations of control-theoretic guarantees (Jin et al., 2020).
- Stochastic reach-avoid problems further generalize to neural reach-avoid supermartingale (RASM) representations, with formal tail bounds and sample-based learner–verifier loops (Žikelić et al., 2022).
- The shift to cooperative, decentralized, dynamically localized certificates distinguishes current frameworks by accommodating intertwined agent objectives, sparse coupling, and structural reuse for scalability (Zhou et al., 28 Jan 2026, Qin et al., 2021).
A plausible implication is that further integration of these paradigms promises scalable, distributed safe control for large heterogeneous collectives in uncertain environments.