Learned Model Predictive Game (LMPG)
- LMPG is a framework that fuses model predictive control, game-theoretic reasoning, and machine learning for real-time multi-agent planning and control.
- It employs prediction-aware optimization with learned surrogates and differentiable game solvers to efficiently approximate Nash equilibria in dynamic settings.
- Empirical studies show LMPG achieves low-latency, robust performance in applications such as traffic routing, crowd navigation, and drone racing.
A Learned Model Predictive Game (LMPG) is a class of multi-agent planning, optimization, and control frameworks that fuse model predictive control (MPC) with game-theoretic reasoning and machine learning to enable strategic, real-time decision-making in dynamic, interactive environments. LMPG approaches arise in settings where multiple agents interact through coupled dynamics or costs, with each agent forecasting future states—often using data-driven predictors—and planning through forms of model-based or game-theoretic optimization that leverage these forecasts.
1. Mathematical and Algorithmic Foundations
An LMPG typically models each agent as solving a (possibly repeated or receding-horizon) coupled game. The mathematical structure often involves:
- Agent and Environment Structure: Each agent (or, in two-player settings, each side) repeatedly selects actions based on its own state and an evolving “state of nature” (context), which may capture exogenous dynamics, other agents’ actions, or environmental variables (Capitaine et al., 31 Jan 2025, Kim et al., 2024, Papuc et al., 6 Feb 2026).
- Predicted Contexts: Each agent may use a machine learning model to produce a forecast of the context from local or shared information (sometimes via a neural network predictor, an LSTM, or logistic regression) (Capitaine et al., 31 Jan 2025, Le et al., 2023).
- Game-Theoretic Optimization: At each step, agents solve a multi-agent optimal control or trajectory game problem (often a generalized Nash equilibrium, GNE, or coarse-correlated equilibrium) by optimizing their own cost functions, taking into account both predicted environments and the anticipated actions of other agents (Liu et al., 2022, Kim et al., 2024, Papuc et al., 6 Feb 2026).
- Learning and Amortization: Rather than solving the computationally expensive coupled optimization at every step online, LMPG methods amortize this process by learning neural surrogates (e.g., value functions or policies) from offline game-theoretic solutions or real rollouts, enabling real-time inference (Papuc et al., 6 Feb 2026, Kim et al., 2024).
The general schematic is as follows:
- Predict future environment state(s) or context(s) using learned models.
- Solve (exactly or approximately) a coupled strategic optimization problem using the predicted context(s), either online (with differentiable game solvers) or offline (learning a surrogate policy or value function to replace online computation).
- Execute the first (or an appropriate) control action, shift the horizon, and repeat.
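The three-step schematic above can be sketched as a receding-horizon loop. The predictor, game solver, and dynamics below are hypothetical placeholders standing in for the learned and optimization components described in the text:

```python
# Minimal receding-horizon LMPG loop (illustrative sketch only).
def lmpg_step(state, predictor, solve_game, horizon=10):
    """One receding-horizon step: predict future contexts, solve the
    coupled game, and return the first action of the planned sequence."""
    contexts = predictor(state, horizon)          # learned forecast of future contexts
    plan = solve_game(state, contexts, horizon)   # strategic optimization (exact or amortized)
    return plan[0]                                # execute only the first action

def run_lmpg(x0, dynamics, predictor, solve_game, steps=50):
    """Closed-loop execution: act, advance the true system, shift horizon."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        u = lmpg_step(x, predictor, solve_game)
        x = dynamics(x, u)
        trajectory.append(x)
    return trajectory
```

Any concrete LMPG instantiation replaces `predictor` with a trained forecaster and `solve_game` with either an online game solver or a learned surrogate policy.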
2. Prediction-Aware and Differentiable Game-Theoretic Planning
Prediction integration into game-theoretic planning is a core LMPG innovation. For instance (Capitaine et al., 31 Jan 2025), agents receive a predicted context before choosing their mixed strategies. The POWMU (Prediction-aware Optimistic Multiplicative Weight Update) algorithm instantiates a separate optimistic-hedge copy for each context. Updates exploit full payoff feedback after the true context is revealed. Performance (e.g., regret, social welfare) degrades with prediction errors, measured by a misprediction count, but remains near-optimal if predictions are sufficiently accurate.
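A minimal sketch of a prediction-aware optimistic multiplicative-weight update in the spirit of POWMU is shown below: one optimistic-hedge copy is maintained per context, strategies are computed for the predicted context, and the full-information update is applied once the true context is revealed. The class name, variable names, and the simple one-step optimistic recency term are illustrative, not the paper's exact formulation:

```python
import numpy as np

class PredictionAwareHedge:
    """One optimistic-hedge weight vector per context (illustrative sketch)."""
    def __init__(self, n_actions, n_contexts, eta=0.1):
        self.eta = eta
        # per-context log-weights and last observed payoff vector
        self.logw = np.zeros((n_contexts, n_actions))
        self.last = np.zeros((n_contexts, n_actions))

    def strategy(self, predicted_context):
        # optimistic step: act as if the last payoff vector repeats
        z = self.logw[predicted_context] + self.eta * self.last[predicted_context]
        z -= z.max()                       # numerical stabilization
        p = np.exp(z)
        return p / p.sum()

    def update(self, true_context, payoffs):
        # full-information multiplicative update after the true context is revealed
        self.logw[true_context] += self.eta * payoffs
        self.last[true_context] = payoffs
```

A misprediction simply means the strategy was drawn from the wrong context's copy for that round, which is exactly what the misprediction-count term in the regret bound charges for.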
In trajectory games with unknown opponent objectives, LMPG leverages differentiable GNE solvers based on KKT/MCP formulations. The agent uses the gradient of the trajectory game solver (via implicit differentiation) to maximize a likelihood of observed opponent behavior, thus inferring unknown agent objectives online and robustly planning Nash-style strategies (Liu et al., 2022):
- The joint solution of the associated mixed complementarity problem (MCP) is differentiated with respect to the agent’s parameter vector using the Implicit Function Theorem, enabling gradient-based maximum likelihood estimation (MLE) of opponent objectives.
- The end-to-end pipeline can be enhanced with neural network warm-starting, yielding efficient real-time control.
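The implicit-differentiation idea can be illustrated on a toy scalar problem: given an equilibrium condition F(z, θ) = 0, the Implicit Function Theorem gives dz/dθ = −(∂F/∂z)⁻¹ ∂F/∂θ, which drives gradient-based MLE of the unknown parameter. The scalar quadratic "game" below is a hypothetical stand-in for the full MCP of a trajectory game:

```python
def solve_equilibrium(theta):
    # toy stationarity condition F(z, theta) = 2*z - theta = 0
    return theta / 2.0

def implicit_gradient(z, theta):
    # Implicit Function Theorem: dz/dtheta = -(dF/dz)^{-1} * dF/dtheta
    dF_dz = 2.0
    dF_dtheta = -1.0
    return -dF_dtheta / dF_dz

def mle_step(theta, z_observed, lr=0.5):
    # gradient ascent on a Gaussian log-likelihood ~ -(z(theta) - z_obs)^2 / 2
    z = solve_equilibrium(theta)
    dL_dtheta = -(z - z_observed) * implicit_gradient(z, theta)
    return theta + lr * dL_dtheta
```

In the real pipeline the scalars become the full KKT/MCP residual of the joint trajectory game and the linear solves are batched, but the chain of reasoning is the same: differentiate through the equilibrium map, then ascend the likelihood of observed opponent behavior.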
3. Model Predictive Game versus Learned/Amortized Planners
A central theme is the trade-off between full online Model Predictive Game (MPG) solvers—solving the game explicitly at every receding horizon step—and amortized or learned approaches (LMPG):
- MPG: Solves for Nash (or GNE) equilibria of dynamic games online. This generates high-quality, interaction-aware policies but incurs substantial computational latency, limiting performance at high control frequencies or speeds (Papuc et al., 6 Feb 2026).
- LMPG (Amortized MPG): Trains parametrized policies (e.g., neural networks with a differentiable trajectory projection layer) to replicate (approximate) the solutions of the full game, enabling real-time closed-loop execution with latencies reduced by more than an order of magnitude (Papuc et al., 6 Feb 2026).
- Empirical findings in multi-agent racing show MPG wins 100% of head-to-head races against MPC in the absence of computational delays, but its win rate degrades as action computation latency grows. The LMPG approach maintains strategic behaviors and robust performance even in high-speed, asynchronous environments (Papuc et al., 6 Feb 2026).
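The amortization trade-off above can be sketched as plain supervised imitation: solve the expensive game offline on sampled states, then fit a cheap parametrized policy to the (state, equilibrium action) pairs so that online inference is a single forward pass. The linear "policy" and synthetic solver below are hypothetical stand-ins for the neural network and full MPG solver:

```python
import numpy as np

def expensive_game_solver(state):
    # placeholder for solving the full dynamic game at this state
    return 2.0 * state + 1.0

def build_dataset(n=200, seed=0):
    rng = np.random.default_rng(seed)
    states = rng.uniform(-1, 1, size=(n, 1))
    actions = expensive_game_solver(states)       # offline, latency-insensitive
    return states, actions

def fit_amortized_policy(states, actions):
    # least-squares fit of action = w*state + b (the "learned surrogate")
    X = np.hstack([states, np.ones_like(states)])
    coef, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return lambda s: coef[0, 0] * s + coef[1, 0]
```

The latency gap reported in the racing experiments comes precisely from this substitution: the surrogate's forward pass replaces the iterative equilibrium solve inside the control loop.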
4. LMPG Instantiations: Algorithms and Frameworks
Major algorithmic instances include:
| Algorithm/Framework | Domain | Key Components |
|---|---|---|
| POWMU (Capitaine et al., 31 Jan 2025) | Online multi-agent games | Prediction-aware OMWU, context-based learning, regret |
| Social-LSTM Game (Le et al., 2023) | Multi-robot crowd nav | LSTM pedestrian predictor, MPC, potential game, IBR |
| Stackelberg MBRL (Rajeswaran et al., 2020) | Model-based RL | Policy/model bi-level (PAL/MAL), trust region/data agg. |
| Differentiable GNE (Liu et al., 2022) | Trajectory games | Implicit diff. solver, MLE inference, real-time MPC |
| Value-Function MPC (Kim et al., 2024) | Cooperative-Competitive | Offline GNE dataset, learned terminal value, MPC |
| Drone Racing LMPG (Papuc et al., 6 Feb 2026) | Multi-agent racing | Amortized neural policy, trajectory projection, MPC |
Detailed instantiations may employ iterative best response (e.g., for potential games or two-player games over robots/humans) (Le et al., 2023), alternating optimization (PAL/MAL) (Rajeswaran et al., 2020), or simultaneous gradient play over neural policy weights (Papuc et al., 6 Feb 2026). In each case, learning-driven prediction and amortization (surrogate value or policy) are fused with the game-theoretic optimization layer, and training often leverages offline simulations or datasets of solved games.
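Iterative best response, the simplest of these fixed-point schemes, alternates each player's best response to the other's current plan until the joint strategy stabilizes. The two-player quadratic costs J_i(u_i, u_j) = (u_i − a_i)² + c·(u_i − u_j)² below are an illustrative toy, chosen so the best response has a closed form:

```python
def best_response(a_i, u_j, c=0.5):
    # argmin over u of (u - a_i)^2 + c*(u - u_j)^2, in closed form
    return (a_i + c * u_j) / (1.0 + c)

def ibr(a1, a2, c=0.5, iters=100):
    """Alternate best responses until (approximate) fixed point."""
    u1 = u2 = 0.0
    for _ in range(iters):
        u1 = best_response(a1, u2, c)
        u2 = best_response(a2, u1, c)
    return u1, u2
```

At the fixed point neither player can improve unilaterally, i.e., the pair is a Nash equilibrium of the stage game; in the trajectory-game instantiations each "best response" is itself an MPC solve over the planning horizon.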
5. Empirical Results and Benchmarking
Empirical validation across multiple domains demonstrates the efficacy of LMPG:
- Traffic Routing: POWMU, using prediction-aware learning, achieves sublinear regret and coarse-correlated equilibrium properties that significantly outperform context-ignorant baselines (Capitaine et al., 31 Jan 2025).
- Crowd Navigation: Multi-robot LMPG achieves >85% success rates, low collision rates (<2%), and robustness to increased crowd density (Le et al., 2023).
- Trajectory Games: Differentiable LMPG achieves real-time planning (0.02–0.1 s per control step for the tested numbers of agents); the full pipeline supports neural inference of objective parameters and equilibrium seeking (Liu et al., 2022).
- Racing and Intersection: LMPG/IGT-MPC using learned terminal value achieves >97% feasibility and 0% gridlock in cooperative scenarios and defends position against faster adversaries in competitive racing (>99% win rate in challenging settings) (Kim et al., 2024).
- Drone Racing: Amortized LMPG yields ≈3.5 ms inference (vs. ≈60 ms for game solver MPG), winning 80–90% of head-to-head races against both game-theoretic and MPCC baselines, and successfully transferring to hardware (Papuc et al., 6 Feb 2026).
6. Theoretical Performance and Guarantees
Rigorous performance bounds characterize LMPG approaches:
- Prediction-Aware Regret: Regret and social welfare bounds degrade gracefully with the total prediction error; when predictions are accurate, regret rates and price-of-anarchy bounds recover those of static games (Capitaine et al., 31 Jan 2025).
- Equilibrium Properties: Empirical flow distributions under POWMU and other prediction-aware updates converge to ε-approximate coarse-correlated or (contextual) correlated equilibria (Capitaine et al., 31 Jan 2025).
- Sample Efficiency: In model-based RL, LMPG (PAL/MAL) retains the high sample efficiency of model-based approaches, matches model-free performance in the limit, and is robust to various model and policy mis-specifications (Rajeswaran et al., 2020).
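The external-regret quantity appearing in these bounds is straightforward to compute empirically: it is the cumulative payoff of the best fixed action in hindsight minus the learner's cumulative payoff. A minimal illustration (the data below is synthetic):

```python
import numpy as np

def external_regret(payoffs, chosen):
    """payoffs: (T, n_actions) array of per-round payoffs;
    chosen: length-T sequence of the learner's action indices."""
    payoffs = np.asarray(payoffs)
    best_fixed = payoffs.sum(axis=0).max()                    # best single action in hindsight
    earned = payoffs[np.arange(len(chosen)), chosen].sum()    # learner's total payoff
    return best_fixed - earned
```

"Sublinear regret" means this quantity grows slower than T, so the per-round average regret vanishes; the prediction-aware bounds add a term charged per mispredicted context.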
7. Domains, Limitations, and Extensions
LMPG is broadly applicable, but several specificities and limitations exist:
- Computation: Full game solution for data generation is often expensive and scales poorly with agent count, though amortized approaches address this for real-time deployment (Kim et al., 2024, Papuc et al., 6 Feb 2026).
- Generalization: Current methods mainly address two-agent or small-scale multi-agent settings; extension to larger agent swarms (e.g., via graph neural networks) remains an open area (Kim et al., 2024).
- Robustness and Adaptation: LMPG’s learned components may require further robustness (e.g., to unseen scenarios) via strategic sampling, meta-learning, or distributionally-robust training (Papuc et al., 6 Feb 2026, Kim et al., 2024).
- Full-Observation Assumption: Most LMPG methods require full state knowledge for all agents; integration with perception modules and partial observability is an active direction for future research (Papuc et al., 6 Feb 2026).
LMPGs yield a unified approach to embedding learned, prediction-aware, and game-theoretic policy components into model-based optimization for multi-agent systems, advancing the state-of-the-art in strategic control for domains ranging from traffic to robotics, trajectory games, and high-speed racing (Capitaine et al., 31 Jan 2025, Le et al., 2023, Rajeswaran et al., 2020, Liu et al., 2022, Kim et al., 2024, Papuc et al., 6 Feb 2026).