Control Barrier-Value Functions

Updated 29 January 2026

Control Barrier-Value Functions (CBVFs) unify value-function methods from optimal control and reinforcement learning with control barrier functions, encoding safety as forward invariance of a superlevel set of the value function.
They can be constructed using analytical methods like viscosity solutions and data-driven techniques, enabling robust control in both continuous and discrete systems.
CBVFs offer scalable safety certificates integrated within MPC and QP frameworks, supporting practical applications in high-dimensional and uncertain dynamical systems.

Control Barrier-Value Functions (CBVFs) are a class of functions that unify value-function methodologies from optimal control and reinforcement learning with the formal safety guarantees of control barrier functions (CBFs). CBVFs characterize safety–performance trade-offs for deterministic and stochastic systems by embedding forward invariance of a specified safe set directly into a value or cost-to-go function. This synthesis supports the algorithmic construction, learning, and certification of safety guarantees for complex, possibly high-dimensional dynamical systems, including model-free and model-based, continuous and discrete problems.

1. Mathematical Formulation and Key Definitions

CBVFs generalize both control barrier functions and value functions, representing safety via level sets of the value function. For a controlled system, in either discrete or continuous time, the state $x$ is constrained to a safe set, typically specified as the 0-superlevel set of a continuous function $h(x)$. The defining property of a CBVF is that its 0-superlevel set is the largest subset of the safe set that can be rendered forward invariant under admissible controls.

For control-affine systems with disturbances, the CBVF $V: \mathbb{R}^n \to \mathbb{R}$ can be defined as the viscosity solution to a Hamilton-Jacobi-Isaacs variational inequality (HJIVI):

$\min\left\{ D_x V(x) \cdot f(x) + \max_{u \in U} \min_{d \in D}\! \left[D_x V(x) \cdot g(x) u + D_x V(x) \cdot w(x)d \right],\; h(x)-V(x) \right\} = 0,$

with the boundary condition $V(x) = h(x) = 0$ on the safety boundary (Choi et al., 2021). This formulation ensures that for all $x$ in the superlevel set $\{ V(x) \ge 0 \}$, there exists a control strategy that maintains invariance despite worst-case disturbances.
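
As a quick sanity check of the variational inequality, consider a hypothetical scalar example (chosen here for illustration, not taken from the cited work): $f(x) = 0$, $g(x) = 1$, $w(x) = 1$, $U = [-1, 1]$, $D = [-\tfrac{1}{2}, \tfrac{1}{2}]$, and $h(x) = 1 - x^2$. For the candidate $V = h$ we have $D_x V(x) = -2x$, so the Hamiltonian term evaluates to

$D_x V(x) \cdot f(x) + \max_{u \in U} \min_{d \in D} \left[ D_x V(x)\, u + D_x V(x)\, d \right] = |D_x V(x)| - \tfrac{1}{2} |D_x V(x)| = |x| \ge 0,$

while the obstacle term comparing $V$ and $h$ is identically zero. The minimum of the two terms is therefore $0$ for every $x$, so $V = h$ satisfies the inequality. This matches the intuition that, when the control authority exceeds the disturbance bound, the whole set $\{ h(x) \ge 0 \}$ can be kept invariant.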

In the context of reinforcement learning (RL), a CBVF is constructed by crafting a value function $V^*$ under a safety-preserving reward, with an appropriate threshold $R$, such that the shifted value $h(x) = V^*(x) - R$ meets the discrete-time CBF conditions (Tan et al., 2023). This approach enables direct verification of RL policies for safety.
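
The shifted-value construction can be illustrated on a toy problem. The sketch below is a minimal example under assumptions that are not from the cited paper: a ten-state deterministic chain, a reward of 1 in safe states and 0 after failure, a threshold `R = 5.0`, and a discrete-time CBF condition of the form $\max_u h(x') - h(x) \ge -\alpha h(x)$ with a hypothetical rate `ALPHA`. It runs value iteration, shifts the value by `R`, and checks the condition on the resulting superlevel set.

```python
# Toy illustration: value iteration on a safety-reward chain MDP, then a
# discrete-time CBF check on the shifted value h = V* - R.
# All dynamics, rewards, and constants here are illustrative assumptions.

GAMMA = 0.9          # discount factor
ALPHA = 0.5          # CBF decay rate in (0, 1]
N_STATES = 10
UNSAFE = 9           # absorbing failure state
ACTIONS = (-1, 0, 1)


def step(x, u):
    """Deterministic transition; a drift of +2 acts on states 7 and 8."""
    if x == UNSAFE:
        return UNSAFE
    drift = 2 if x >= 7 else 0
    return max(0, min(UNSAFE, x + u + drift))


def reward(x):
    """Safety-preserving reward: 1 while safe, 0 after failure."""
    return 0.0 if x == UNSAFE else 1.0


def value_iteration(sweeps=500):
    """Synchronous Bellman backups V(x) = r(x) + GAMMA * max_u V(x')."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        V = [reward(x) + GAMMA * max(V[step(x, u)] for u in ACTIONS)
             for x in range(N_STATES)]
    return V


V_star = value_iteration()
R = 5.0  # any threshold between 1 + GAMMA (doomed states) and 1 / (1 - GAMMA)
h = [v - R for v in V_star]
safe_set = [x for x in range(N_STATES) if h[x] >= 0.0]

# Discrete-time CBF condition on the candidate safe set.
cbf_holds = all(
    max(h[step(x, u)] for u in ACTIONS) - h[x] >= -ALPHA * h[x]
    for x in safe_set
)

print("safe set:", safe_set)
print("CBF condition holds:", cbf_holds)
```

In this toy chain, states 7 and 8 are not failures themselves but cannot avoid failure, so their values fall below the threshold and they are excluded from the superlevel set of $h$.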

2. Relationships to Classic Control Barrier Functions and Value Functions

Traditional CBFs enforce a differential (or difference) inequality of the form

$\sup_{u \in U} [L_f h(x) + L_g h(x) u] \geq -\alpha(h(x))$

for a class-$\mathcal{K}$ function $\alpha$, ensuring forward invariance of $\{ h(x) \ge 0 \}$. Such functions are typically hand-designed and must be differentiable, which can make them overly conservative or limited in expressiveness.
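
A minimal sketch of how this inequality is used as a safety filter, under assumptions chosen only for illustration: a scalar single integrator $\dot{x} = u$, the barrier $h(x) = 1 - x^2$, a linear $\alpha(h) = k h$, and no input bounds. For a scalar input, the problem of minimizing $(u - u_{\text{ref}})^2$ subject to the CBF constraint has a closed-form solution, so no QP solver is needed here.

```python
# Closed-form CBF safety filter for a scalar input (illustrative assumptions:
# dynamics x' = u, barrier h(x) = 1 - x^2, alpha(h) = k * h, unbounded input).


def cbf_filter(x, u_ref, k=1.0):
    """Return the input closest to u_ref satisfying Lf_h + Lg_h * u >= -k * h."""
    h = 1.0 - x * x
    Lf_h = 0.0            # no drift term in x' = u
    Lg_h = -2.0 * x       # dh/dx * g(x) with g(x) = 1
    a = Lf_h + k * h
    if a + Lg_h * u_ref >= 0.0:
        return u_ref      # reference input already satisfies the constraint
    if Lg_h == 0.0:
        raise ValueError("CBF constraint is infeasible at this state")
    return -a / Lg_h      # project u_ref onto the constraint boundary


def simulate(x0=0.0, u_ref=1.0, dt=0.01, steps=1000):
    """Forward-Euler rollout with the filtered input."""
    x, traj = x0, [x0]
    for _ in range(steps):
        x = x + dt * cbf_filter(x, u_ref)
        traj.append(x)
    return traj


trajectory = simulate()
print("final state:", trajectory[-1])
print("max |x| along rollout:", max(abs(x) for x in trajectory))
```

The reference input pushes the state toward the boundary $x = 1$; the filter leaves it untouched while the constraint is inactive and slows the approach once it becomes active. The Euler discretization is an approximation, so the continuous-time guarantee carries over only for sufficiently small `dt`.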

CBVFs unify these theories by demonstrating that value functions associated with optimal safety-constrained or anti-discounted performance objectives can naturally fulfill the role of a barrier certificate (Hirsch et al., 11 Oct 2025). In particular, CBVFs admit non-smooth, possibly non-differentiable “viscosity” barrier certificates that are solutions to HJ-based PDEs with anti-discounting, extending classical CBFs to a broader function space.

MPC-based (predictive) control frameworks further connect CBVFs and CBFs: the MPC value function, under suitable design, becomes a Predictive Control Barrier Function (PCBF), certifying the same forward invariance without the need for explicit Lie-derivative checks (Huang et al., 12 Feb 2025).

3. Construction and Algorithmic Realizations

CBVFs can be constructed both analytically and via data-driven learning:

Value Embedding and Bellman Equations: Embed hard safety penalties (e.g., infinite cost for $h(x) < 0$ ) into the running cost, yielding a value function whose superlevel set corresponds to the safe set (Cohen et al., 2020). This value function satisfies a modified HJB equation with a singular cost at the safety boundary.
Viscosity CBVFs and PDEs: Generalize to the nonsmooth regime by characterizing CBVFs as viscosity subsolutions of a HJ-PDE with a general anti-discounting rate. The paper “Viscosity CBFs” provides the precise PDE and shows closure properties under maximization and uniform limits (Hirsch et al., 11 Oct 2025).
Finite-Difference and RL-Driven Backups: For model-free or offline RL scenarios, a finite-difference backup recursion (or Bellman backup with value clipping/expectile regression) is employed. For example, the value-guided offline CBF (V-OCBF) is computed via regression over data, using a backup:

$B(x_t) = \min \left\{ \ell(x_t), \max_{u_t \in \mathcal{U}} B(x_{t+1}) \right\}$

and expectile regression to avoid out-of-distribution actions (Tayal et al., 11 Dec 2025).

Model Predictive PCBFs: In model-predictive control, enforce constraints and costs so that the resultant receding-horizon value $J_N(x)$ acts as a PCBF, whose level sets align with the safe set and whose decrease at each step certifies recursive feasibility (Huang et al., 12 Feb 2025).
QP-Based Online Control Synthesis: Given a CBVF (learned or computed), online controllers can be synthesized via quadratic programs (CBF-QPs) enforcing the barrier constraints in real-time, with robustness to bounded disturbances (affine-in-disturbance extensions) (Choi et al., 2021, Tayal et al., 11 Dec 2025).
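
The backup $B(x_t) = \min\{ \ell(x_t), \max_{u_t} B(x_{t+1}) \}$ listed above can be sketched in tabular form. The example below is a toy under stated assumptions, not the V-OCBF algorithm itself: it uses a known deterministic model (a cart with position `p`, speed `v`, and a wall at `P_MAX`), an explicit maximum over two hypothetical actions, and a margin `5.5 - p`, whereas the offline method fits the barrier from data with expectile regression.

```python
# Tabular fixed-point iteration of B(x) = min(l(x), max_u B(f(x, u))).
# The cart model, actions, and margin are illustrative assumptions.

P_MAX = 6                      # wall position (failure)
V_MAX = 2                      # maximum speed
ACTIONS = ("brake", "coast")


def step(state, action):
    """Braking reduces speed by one; position advances by the new speed."""
    p, v = state
    v_next = max(0, v - 1) if action == "brake" else v
    p_next = min(P_MAX, p + v_next)
    return (p_next, v_next)


def margin(state):
    """Signed safety margin l(x): positive away from the wall."""
    p, _ = state
    return 5.5 - p


def safety_backup(max_sweeps=100):
    """Iterate the backup from B = l until it stops changing."""
    states = [(p, v) for p in range(P_MAX + 1) for v in range(V_MAX + 1)]
    B = {s: margin(s) for s in states}
    for _ in range(max_sweeps):
        B_new = {
            s: min(margin(s), max(B[step(s, a)] for a in ACTIONS))
            for s in states
        }
        if B_new == B:
            break
        B = B_new
    return B


B = safety_backup()
safe_states = sorted(s for s, b in B.items() if b >= 0.0)
print("margin at (5, 2):", margin((5, 2)), "barrier value:", B[(5, 2)])
print("number of safe states:", len(safe_states))
```

The state `(5, 2)` has a positive margin but a negative barrier value, because even full braking cannot stop the cart before the wall. This gap between $\ell$ and $B$ is what the backup is meant to capture.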

4. Theoretical Properties and Safety Guarantees

CBVFs furnish strong invariance guarantees:

Forward Invariance: If the initial condition is within the superlevel set (the safe set), the corresponding controller synthesized from the CBVF keeps the state within this set for all future time, even under bounded uncertainty (Choi et al., 2021, Huang et al., 12 Feb 2025).
Robustness: Viscosity CBVFs are robust to bounded disturbances by construction, through their HJIVI formulation. Their safe set matches the viability kernel computed by reachability methods, while the same function can be enforced online as a smooth constraint, promoting computational tractability (Hirsch et al., 11 Oct 2025, Choi et al., 2021).
Closure Properties: The set of viscosity CBVFs is closed under pointwise maxima and uniform limits, facilitating compositional safety analysis (Hirsch et al., 11 Oct 2025).
Tractable Certificates: CBVFs can be constructed from RL value functions, finite-horizon optimal control problems, or sampling-based methods, and their validity can be certified empirically via pointwise metrics (validity and coverage) (Tan et al., 2023).

5. Comparison with Related Methods

Method	Barrier Definition	Online Synthesis	Robustness/Maximality
Classical CBF	$C^1$ function $h(x)$, pointwise derivative bound	QP; hand-crafted	Typically conservative
HJ Reachability	Value function solves HJ/PDE	Bang-bang, conservative	Maximal safe set
Control Barrier-Value Function	Value function as viscosity solution / RL	QP; RL, MPC, learning	Robust, less conservative
Predictive Control Barrier Function (PCBF)	MPC value $J_N(x)$, receding horizon	QP/NLP, MPC	Smooth, scalable

CBVFs inherit the scalability and flexibility of both value-function-based (RL) and CBF-based safe control. They exceed classical CBFs in expressiveness and tractability, and they yield smoother closed-loop control than raw HJ-reachability policies while remaining applicable to high-dimensional learning settings (Choi et al., 2021, Tan et al., 2023, Hirsch et al., 11 Oct 2025).

6. Practical Algorithms and Empirical Results

CBVFs support diverse algorithmic instantiations:

RL-driven Barrier Learning: Offline RL methods learn value functions on safety-reward MDPs and transform them into CBVFs, subsequently used for action filtering or as barrier certificates to be enforced via QP constraints (Tan et al., 2023).
Data-driven Offline Synthesis: V-OCBF techniques process offline datasets, using expectile regression to fit barrier functions that generalize well without requiring model knowledge and avoid out-of-distribution actions (Tayal et al., 11 Dec 2025).
Model-based MPC: PCBFs are constructed as the value functions of N-step MPC schemes, certifying safety and recursive feasibility through cost design and terminal set conditions (Huang et al., 12 Feb 2025).
Robust CBF-QPs: In systems with affine disturbances, CBVFs enable the computation of robust optimal controls by QP, always guaranteeing feasibility on the safe set (Choi et al., 2021).
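
The expectile regression mentioned for offline synthesis can be illustrated in isolation. The sketch below shows the generic asymmetric squared loss and a simple reweighted-mean iteration for the $\tau$-expectile of a sample; it is not the loss or training loop of the cited work, and the function names and sample values are hypothetical. The point is that, as $\tau$ approaches 1, the expectile moves toward the largest observed value, which approximates a maximum over actions seen in the data without querying unseen ones.

```python
# Generic expectile regression on a scalar sample (illustrative sketch).


def expectile_loss(residual, tau):
    """Asymmetric squared loss: weight tau for positive residuals, 1 - tau otherwise."""
    weight = tau if residual > 0.0 else 1.0 - tau
    return weight * residual * residual


def expectile(samples, tau, iters=200):
    """Minimize the summed expectile loss via iteratively reweighted means."""
    m = sum(samples) / len(samples)
    for _ in range(iters):
        weights = [tau if y > m else 1.0 - tau for y in samples]
        m = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
    return m


# Hypothetical next-state barrier values observed for one state in a dataset.
targets = [0.0, 1.0, 2.0, 5.0]

mean_like = expectile(targets, 0.5)    # tau = 0.5 recovers the ordinary mean
optimistic = expectile(targets, 0.9)   # leans toward the best observed value
near_max = expectile(targets, 0.99)    # closer still to max(targets)

print(mean_like, optimistic, near_max)
```

Choosing $\tau$ trades off optimism against robustness to noisy targets: values close to 1 track the best in-dataset outcome, while values near 0.5 behave like ordinary least squares.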

Empirical evaluations demonstrate that CBVF-based controllers typically achieve larger safe sets, increased smoothness and regularity of control actions, and greater resilience to infeasibility than standalone CBF-QP or conservative reachability policies (Cohen et al., 2020, Huang et al., 12 Feb 2025, Tayal et al., 11 Dec 2025).

7. Extensions, Limitations, and Open Problems

CBVFs offer a unified mathematical and computational theory, but several open questions persist:

Computational Complexity: MPC and HJ-based CBVF synthesis can be computationally intensive, especially for high-dimensional systems, though offline learning mitigates some burdens (Huang et al., 12 Feb 2025, Tayal et al., 11 Dec 2025).
Continuity and Regularity: Ensuring that the value/barrier function is continuous (or possesses well-behaved gradients) is crucial for QP-based online implementations (Huang et al., 12 Feb 2025).
Model Mismatch and Uncertainty: Ongoing research targets CBVF robustness to model uncertainty, learning inaccuracies, and stochastic disturbances (Hirsch et al., 11 Oct 2025, Tayal et al., 11 Dec 2025).
Generalization Beyond Exponential Anti-discounting: The extension to arbitrary class-$\mathcal{K}$ anti-discounting rates in viscosity CBVFs expands the admissible candidate barrier class and the range of decaying/growing safety margins (Hirsch et al., 11 Oct 2025).
Compositionality and Large-scale Systems: The closure of CBVFs under maximization and limits enables compositional safety design, although tractable, scalable methods for large networks remain an open research question.

A plausible implication is that, as data-driven and learning-based CBVFs become more practical, they may serve as the foundation for verifiable safety in general-purpose agents, both in policy learning and model-based planning.

Principal Sources: (Huang et al., 12 Feb 2025, Cohen et al., 2020, Choi et al., 2021, Hirsch et al., 11 Oct 2025, Tayal et al., 11 Dec 2025, Tan et al., 2023).