Dual LQR Formulation

Updated 4 February 2026

Dual LQR formulation is a control technique that uses convex duality and Riccati equations to reformulate classical LQR and LQG problems.
It enables scalable computational strategies and global convergence through primal–dual and policy-gradient methods in constrained and multi-agent settings.
This framework extends to observer design, optimal transport, and robotics, providing deeper geometric insights and effective dynamic programming alternatives.

A dual Linear-Quadratic Regulator (dual LQR) formulation refers to the family of approaches that use convex duality, Lagrangian functionals, or dual-variable descriptions to reformulate, analyze, or solve the classical or extended LQR and LQG problems. Duality-based perspectives emerge in cost-constrained control, mean-field and multi-agent systems, conic representations, optimal transport, estimation, constrained dynamics, and in the connection to Riccati equations via reproducing kernel theory. Dual LQR formulations enable new computational strategies, global convergence guarantees, and deeper geometric interpretations, and they facilitate generalizations to structured, constrained, or multi-agent settings.

1. Classical Duality in LQR and Covariance Representations

Traditionally, the LQR problem seeks a linear state-feedback policy $u = -Kx$ minimizing a quadratic cost subject to linear dynamics. The primal problem admits a Riccati equation characterization. In the dual perspective, introduced in the context of infinite-dimensional semidefinite programs and covariance representations, the optimal control is equivalently characterized via linear-conic duality: the state–input covariance matrix is constrained to satisfy a state-propagation equation and a positive-semidefiniteness condition, while the cost functional becomes linear in these covariances. The Lagrange multipliers, or dual variables, are matrix-valued and can be interpreted as solutions to a dual Riccati ODE or as the maximal solution of a dual Linear Matrix Inequality (LMI). In the finite-horizon, continuous-time case the dual problem reads:

$\max_{P(\cdot)}\;x_0^T\,P_{xx}(0)\,x_0$

subject to

$\begin{pmatrix} Q+\dot P_{xx}+A^T P_{xx}+P_{xx}A & P_{xx}B \\ B^T P_{xx} & R \end{pmatrix} \succeq 0,\; \forall t\in[0,T],\; P_{xx}(T)=Q_f$

The pointwise maximal solution of this LMI satisfies the Riccati differential equation (the Schur complement of the $R$ block vanishes), and complementary slackness recovers the classical state-feedback law $u^* = -R^{-1}B^TP_{xx}x$ (Bamieh, 2024).
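As a rough numerical illustration of the LMI and complementary-slackness statements, the sketch below treats the stationary (infinite-horizon, time-invariant) analogue in which $\dot P_{xx}=0$; this simplification, the double-integrator data, and the helper name `solve_care` are assumptions made for the example, not part of the cited work. It solves the algebraic Riccati equation through the stable invariant subspace of the Hamiltonian matrix, then checks that the LMI block matrix is positive semidefinite and that its kernel contains the stacked vectors $(x, u^*)$.

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Stabilizing solution of A'P + PA - P B R^{-1} B' P + Q = 0 (hypothetical helper)."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    # Hamiltonian matrix of the state/costate dynamics
    H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]        # basis of the stable invariant subspace
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)                       # symmetrize against round-off

# Illustrative data: double integrator with identity weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_care(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)                  # u* = -K x with K = R^{-1} B' P

# Stationary version of the dual LMI block matrix
lmi = np.block([[Q + A.T @ P + P @ A, P @ B], [B.T @ P, R]])
min_eig = np.linalg.eigvalsh(lmi).min()

# Complementary slackness: the LMI matrix annihilates [x; u*] = [I; -K] x
slack = lmi @ np.vstack([np.eye(2), -K])

print("P =\n", P)
print("smallest LMI eigenvalue:", min_eig)
print("max |LMI @ [I; -K]|:", np.abs(slack).max())
```

For this data the Riccati solution can be checked by hand: $P = \begin{pmatrix}\sqrt3 & 1\\ 1 & \sqrt3\end{pmatrix}$ and $K = (1, \sqrt3)$.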

2. Dual LQR in Cost-Constrained, Mean-Field, and Primal–Dual Policy Learning

Duality is central in constrained LQR settings, where the primal problem minimizes a nominal LQR cost subject to quadratic cost constraints:

$\min_{K:\,\rho(A-BK)<1}\;J_0(K) \quad\text{s.t.}\quad J_i(K)\le c_i,\;i=1,\dots,N$

The corresponding Lagrangian introduces multipliers $\lambda_i\ge0$, yielding a dual function parameterized by these variables. Writing $J_\lambda = J_0 + \sum_i \lambda_i J_i$ for the aggregated cost, the policy $K^*_\lambda$ minimizing the Lagrangian for fixed multipliers is recovered via the Riccati equation with aggregated cost matrices. The dual problem is a concave maximization in $\lambda$, with strong duality provided Slater's condition holds (existence of a strictly feasible $K$):

$\max_{\lambda\ge0} D(\lambda) = J_\lambda\left(K^*_\lambda\right) - \sum_i\lambda_ic_i$

Here, the dual gradient is available in closed form, $\nabla D(\lambda) = [J_1(K^*_\lambda)-c_1,\dots,J_N(K^*_\lambda)-c_N]^T$, which enables global convergence guarantees for policy-gradient primal–dual methods even though the primal problem is nonconvex in $K$ (Zhao et al., 2024).
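The dual ascent implied by this gradient can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the cited paper: the inner minimization is done by an exact Riccati solve rather than by policy gradient, the system is a scalar toy example, each cost is taken as $J_i(K)=\operatorname{tr}(P_i\Sigma_0)$ for an initial-state covariance $\Sigma_0$, and the names `dare`, `cost`, `Sigma0`, the step size, and the budget `c1` are all hypothetical.

```python
import numpy as np

def dare(A, B, Q, R, iters=5000, tol=1e-13):
    """Fixed-point (value-iteration) solve of the discrete-time Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ G)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # u = -K x
    return P, K

def cost(A, B, K, Qi, Ri, Sigma0):
    """J_i(K) = trace(P_i Sigma0), with P_i solving a discrete Lyapunov equation."""
    Acl = A - B @ K
    n = A.shape[0]
    W = Qi + K.T @ Ri @ K
    # vec(P) = (I - Acl' kron Acl')^{-1} vec(W)
    vecP = np.linalg.solve(np.eye(n * n) - np.kron(Acl.T, Acl.T), W.reshape(-1))
    return float(np.trace(vecP.reshape(n, n) @ Sigma0))

# Illustrative scalar system x+ = 1.2 x + u (assumed data)
A = np.array([[1.2]])
B = np.array([[1.0]])
Sigma0 = np.eye(1)
Q0, R0 = np.array([[1.0]]), np.array([[0.1]])   # nominal cost J_0
Q1, R1 = np.array([[0.0]]), np.array([[1.0]])   # constrained cost J_1: control energy
c1 = 0.8                                        # budget on J_1

lam, step = 0.0, 1.0
for _ in range(500):
    # Primal step: minimize the Lagrangian via the Riccati equation with aggregated weights
    _, K = dare(A, B, Q0 + lam * Q1, R0 + lam * R1)
    # Dual step: projected gradient ascent with grad D(lam) = J_1(K*_lam) - c1
    grad = cost(A, B, K, Q1, R1, Sigma0) - c1
    lam = max(0.0, lam + step * grad)

_, K = dare(A, B, Q0 + lam * Q1, R0 + lam * R1)
J1 = cost(A, B, K, Q1, R1, Sigma0)
print("lambda:", lam, "J1:", J1, "budget:", c1)
```

With this data the unconstrained optimum violates the budget, so the multiplier settles at a strictly positive value and the constraint becomes active, as complementary slackness predicts.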

In mean-field stochastic LQR (MF-SLQR), duality is exploited by expressing the infinite-horizon problem as a static SDP involving covariance matrices, with the dual variable satisfying a generalized Lyapunov equation. The dual and primal recursions coincide with classical value and policy iteration, and a partially model-free (sample-based) dual update enables convergence to the optimal policy without persistent excitation (Jiang et al., 9 Dec 2025).

3. Dual LQR in Estimation, Observer Design, and Multi-Agent Systems

LQR–observer duality is foundational: the dual LQR problem (minimum-energy estimation) poses a deterministic state estimation problem with quadratic costs on process disturbances and measurement noise. In standard continuous-time notation (assumed here: output matrix $C$, with $W$ and $V$ the weighting matrices associated with the disturbance and the measurement noise), the dual ARE follows from the control ARE under the substitution $(A, B, Q, R)\to(A^T, C^T, W, V)$:

$A\Sigma + \Sigma A^T - \Sigma C^T V^{-1} C\,\Sigma + W = 0$

resulting in the minimum-energy observer gain $L = \Sigma C^T V^{-1}$. This dual structure generalizes to distributed/consensus observer design for networked multi-agent LTI systems, where dual LQR formulations guide scalable Riccati-based LMI relaxations for optimal distributed estimation (Vlahakis et al., 2020).
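The duality can be exercised numerically by feeding the transposed pair $(A^T, C^T)$ to an ordinary control Riccati solver. The sketch below assumes a single centralized observer in continuous time, with output matrix `C` and weights `W`, `V` chosen purely for illustration; `solve_care` is a hypothetical helper based on the Hamiltonian matrix, and nothing here reproduces the distributed design of the cited work.

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Stabilizing solution of A'P + PA - P B R^{-1} B' P + Q = 0 (hypothetical helper)."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]        # stable invariant subspace
    P = np.real(stable[n:, :] @ np.linalg.inv(stable[:n, :]))
    return 0.5 * (P + P.T)

# Illustrative data: double integrator with position measurement
A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 0.0]])
W = np.eye(2)              # disturbance weighting (assumed)
V = np.array([[1.0]])      # measurement-noise weighting (assumed)

# Dual ARE: A S + S A' - S C' V^{-1} C S + W = 0, solved as a control ARE for (A', C')
Sigma = solve_care(A.T, C.T, W, V)
L = np.linalg.solve(V, C @ Sigma).T              # observer gain L = Sigma C' V^{-1}

residual = A @ Sigma + Sigma @ A.T - Sigma @ C.T @ np.linalg.solve(V, C @ Sigma) + W
observer_poles = np.linalg.eigvals(A - L @ C)
print("L =", L.ravel())
print("observer poles:", observer_poles)
```

The error dynamics $A - LC$ are stable here, mirroring the closed-loop stability of $A - BK$ in the control problem.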

In partially-nested, decentralized LQG/LQR settings, the dual LQR viewpoint governs the separation principle: optimal control and estimation reduce to coupled forward (estimation) and backward (control) Riccati-type recursions involving dual information states—conditional means for each agent—which generalize the single-trajectory duality (Lessard et al., 2013).

4. Dual LQR in Schrödinger Bridge, Optimal Transport, and Kernel Representations

In the LQR-Schrödinger bridge, the entropy-regularized quadratic-cost Markov process interpolation problem is dualized via Lagrange multipliers enforcing boundary marginal constraints. The Kantorovich-type boundary potentials, acting as dual variables, propagate through coupled forward and backward Riccati equations (on the mean and covariance), yielding a non-homogeneous Gaussian Markov chain as the optimal interpolant:

Backward pass: discrete Riccati equation on precision matrices.
Forward pass: dual Riccati on covariance (inverse precision).

This dual LQR system encodes the Schrödinger–Kantorovich potentials and extends the geometric theory of optimal transport, providing a unified duality-driven perspective on entropic interpolation between Gaussian marginals (Lambert, 12 Jun 2025).

In reproducing kernel Hilbert space (RKHS) theory, the diagonal of the LQ reproducing kernel corresponds to the solution of the dual Riccati ODE (inverse of the value function matrix). This links dual LQR analysis to regression in kernel space, highlighting the global “control as regression” viewpoint and offering new connections to learning-theoretic approaches (Aubin-Frankowski, 2020).

5. Dual LQR in Constrained Dynamics and Robotics

Dual LQR methods are instrumental in solving constrained rigid-body dynamics, particularly for acceleration-level constraints interpreted via Gauss's principle of least constraint. The constrained dynamics problem naturally forms a KKT system, which is precisely the first-order stationarity condition of a two-stage quadratic cost (LQR) problem with auxiliary controls as constraint forces. Eliminating primal variables yields the dual Hessian $JM^{-1}J^T$ (in standard notation, with $M$ the joint-space inertia matrix and $J$ the constraint Jacobian), the inverse of which is the operational space inertia matrix (OSIM), $\Lambda = (JM^{-1}J^T)^{-1}$.

The dual Riccati (constraint-space) perspective leads to dynamic programming eliminations that enable scalable algorithms for robotic simulation, providing alternatives to classic Featherstone LTL solvers and clarifying the impact of local structure on inversion complexity (Sathya et al., 2023).
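The primal-elimination step can be illustrated on a small dense problem. The sketch below is not the recursive dynamic-programming algorithm of the cited work: it forms $JM^{-1}J^T$ explicitly for a made-up three-degree-of-freedom system, assuming Gauss's principle in the form $\min_a \tfrac12 (a-a_{\text{free}})^T M (a-a_{\text{free}})$ subject to $Ja=b$. The matrices `M`, `J`, the force vector `tau`, and the right-hand side `b` are all hypothetical values.

```python
import numpy as np

# Hypothetical data: symmetric positive-definite inertia matrix and one constraint row
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.5, 0.2],
              [0.0, 0.2, 1.0]])
J = np.array([[1.0, 1.0, 0.0]])      # acceleration-level constraint J a = b
tau = np.array([1.0, -2.0, 0.5])     # generalized forces (bias terms folded in)
b = np.array([0.0])

# Unconstrained acceleration
a_free = np.linalg.solve(M, tau)

# Dual Hessian J M^{-1} J^T and its inverse, the OSIM
dual_hessian = J @ np.linalg.solve(M, J.T)
osim = np.linalg.inv(dual_hessian)

# Dual solve for the constraint force, then recover the primal acceleration
lam = osim @ (J @ a_free - b)
a = a_free - np.linalg.solve(M, J.T @ lam)

# Cross-check against the full KKT system [M J'; J 0] [a; lam] = [tau; b]
kkt = np.block([[M, J.T], [J, np.zeros((1, 1))]])
kkt_sol = np.linalg.solve(kkt, np.concatenate([tau, b]))

print("constrained acceleration:", a)
print("constraint force multiplier:", lam)
```

The dense solve costs cubic time in the number of degrees of freedom; the recursive eliminations discussed above aim to avoid exactly this by exploiting the kinematic structure.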

6. Computational Methods, Primal–Dual Algorithms, and Interpretive Structures

A unifying theme among dual LQR formulations is the emergence of convex or conic programs (SDPs), dual variables as Riccati or Lyapunov solutions, and the possibility of primal–dual or policy-gradient algorithms:

Cost-constrained and mean-field LQR admit primal–dual iterations, where policy evaluation (dual update) and policy improvement (primal update) are carried out via Riccati recursions or Lyapunov equations (Zhao et al., 2024, Jiang et al., 9 Dec 2025).
In Q-learning for LQR, the dual variables parameterize the Q-function constraints, leading to SDP formulations where Lagrangian stationarity yields the Bellman equations and the optimal feedback $u^* = -H_{uu}^{-1}H_{ux}\,x$, written here in terms of the blocks $H_{uu}$ and $H_{ux}$ of the quadratic Q-function matrix (Lee et al., 2018).
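The evaluation/improvement split described in the two items above can be sketched for a standard discrete-time LQR problem. This is a model-based illustration under assumptions, not the sample-based or SDP methods of the cited papers: policy evaluation solves a Lyapunov equation, and policy improvement reads the gain off the blocks `H_ux`, `H_uu` of the quadratic Q-function. The system data, the initial stabilizing gain, and the helper name `lyap` are hypothetical.

```python
import numpy as np

def lyap(Acl, W):
    """Solve P = Acl' P Acl + W through the Kronecker-product linear system."""
    n = Acl.shape[0]
    vecP = np.linalg.solve(np.eye(n * n) - np.kron(Acl.T, Acl.T), W.reshape(-1))
    return vecP.reshape(n, n)

# Illustrative sampled double integrator (assumed data)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

K = np.array([[1.0, 2.0]])           # initial gain, assumed stabilizing for u = -K x
for _ in range(30):
    # Policy evaluation: value matrix of the current gain
    P = lyap(A - B @ K, Q + K.T @ R @ K)
    # Policy improvement: minimize the quadratic Q-function over u
    H_ux = B.T @ P @ A
    H_uu = R + B.T @ P @ B
    K = np.linalg.solve(H_uu, H_ux)

P = lyap(A - B @ K, Q + K.T @ R @ K)

# At convergence P satisfies the discrete-time Riccati equation
riccati_residual = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
    R + B.T @ P @ B, B.T @ P @ A) - P
spectral_radius = np.abs(np.linalg.eigvals(A - B @ K)).max()
print("K =", K, "spectral radius:", spectral_radius)
```

Each iterate stays stabilizing when the initial gain is, so the Lyapunov solve in the evaluation step remains well posed throughout.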

Dual LQR approaches also elucidate geometric structures, such as the Pareto-multiplier set being the normal cone to the constrained cost surface, and enable efficient distributed or sample-based computation even in high-dimensional or model-free settings.

In summary, the dual LQR formulation encompasses a broad class of convex-analytic, Riccati, or conic-duality-based techniques for analyzing, extending, and computing optimal policies in LQR, LQG, and their constrained or distributed generalizations, integrating methods from optimal control, optimization, estimation, and kernel theory (Zhao et al., 2024, Bamieh, 2024, Jiang et al., 9 Dec 2025, Lambert, 12 Jun 2025, Sathya et al., 2023, Aubin-Frankowski, 2020, Lee et al., 2018, Vlahakis et al., 2020, Lessard et al., 2013).