- The paper presents π-MPPI, which integrates a projection filter within MPPI to enforce bounds on control magnitudes and derivatives, significantly improving smoothness.
- It employs a custom ADMM-based QP solver and a neural warm-start strategy to achieve efficient, parallelized optimization suitable for real-time UAV operation.
- Empirical validations in obstacle avoidance and terrain following benchmarks demonstrate higher success rates, minimal constraint violations, and faster convergence compared to standard MPPI.
Projection-based Model Predictive Path Integral Scheme for Fixed-Wing Aerial Vehicles
Motivation and Background
Recent advances in sampling-based model predictive control algorithms, specifically Model Predictive Path Integral (MPPI), have enabled robust trajectory optimization for nonlinear systems. However, for fixed-wing aerial vehicles (FWVs), the inherent non-smoothness of MPPI-generated control sequences can result in actuator oscillations, risking instability. Attempts to remedy this by post-hoc filtering (e.g., Savitzky–Golay) or by augmenting control penalties within the cost function have proven inadequate for guaranteeing bounded control derivatives, making cost tuning challenging and often yielding suboptimal smoothness.
Algorithmic Contributions
The paper proposes π-MPPI, a variant of MPPI augmented by a projection filter π that enforces bounds not only on control magnitudes but also on higher-order derivatives throughout the control sequence. This projection is applied directly to sampled control trajectories, solving a quadratic program (QP) to minimally adjust the samples while satisfying derivative constraints. Notably, the computational overhead is mitigated by a custom, parallelizable ADMM-based solver, with further acceleration through a warm-start neural policy trained in a self-supervised manner.
The π-MPPI algorithm modifies the MPPI pipeline by:
- Applying projection-based filtering to all control samples prior to their evaluation and averaging,
- Updating sample statistics post-projection,
- Ensuring feasibility by projecting the final averaged control trajectory,
- Enabling arbitrary smoothness orders by adjusting constraints in the projection QP.
The impact is the production of smooth, feasible control profiles for FWVs—without reliance on cost penalties for smoothness or post-hoc filtering.
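The modified pipeline above can be sketched in a few lines. Note that the projection below is a simple clip-based stand-in, not the paper's minimal-adjustment QP filter, and all function names and parameter values are illustrative:

```python
import numpy as np

def project(u, u_max, du_max, dt):
    """Stand-in for the paper's QP projection: clip control magnitudes,
    then clip rates with a forward pass. Illustrates the interface only;
    the actual filter solves a QP for the minimal adjustment."""
    u = np.clip(u, -u_max, u_max)
    for t in range(1, len(u)):
        u[t] = np.clip(u[t], u[t - 1] - du_max * dt, u[t - 1] + du_max * dt)
    return u

def pi_mppi_step(u_mean, cost_fn, n_samples=64, sigma=0.5,
                 lam=1.0, u_max=1.0, du_max=2.0, dt=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    H = len(u_mean)
    # 1. Sample perturbed control sequences around the current mean.
    samples = u_mean + sigma * rng.standard_normal((n_samples, H))
    # 2. Project every sample BEFORE evaluation (the core pi-MPPI change).
    samples = np.stack([project(s, u_max, du_max, dt) for s in samples])
    # 3. Evaluate costs and compute the standard MPPI softmax weights.
    costs = np.array([cost_fn(s) for s in samples])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # 4. Weighted average, then a final projection to guarantee feasibility.
    u_new = (w[:, None] * samples).sum(axis=0)
    return project(u_new, u_max, du_max, dt)
```

Because every sample is projected before averaging, the weighted mean inherits feasibility up to the convexity of the constraint set, and the final projection closes any remaining gap.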
Efficient Quadratic Programming and Warm-Start
The QP for projection is designed with batch processing and GPU acceleration in mind. All constraints—initial conditions and bounds on control and derivatives—are handled via slack variables and augmented Lagrangian minimization, enabling efficient parallelized ADMM iterations.
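A minimal single-sequence sketch of such an ADMM projection, handling only magnitude and first-derivative bounds (the paper's solver additionally handles initial conditions, higher-order derivatives, and batched GPU execution; variable names and the monomial structure here are assumptions):

```python
import numpy as np

def admm_project(u_raw, u_max, du_max, dt, iters=200, rho=1.0):
    """Projection QP via ADMM: min ||u - u_raw||^2 subject to
    |u| <= u_max and |du/dt| <= du_max. Splitting z = A @ u makes the
    constraint step an elementwise clip, which is what makes the
    iterations batch/GPU friendly."""
    H = len(u_raw)
    # Finite-difference operator mapping u to its (H-1) discrete derivatives.
    D = (-np.eye(H) + np.eye(H, k=1))[:-1] / dt
    A = np.vstack([np.eye(H), D])
    lo = np.concatenate([-u_max * np.ones(H), -du_max * np.ones(H - 1)])
    hi = -lo
    # The quadratic step's normal-equation factor is fixed, so invert once.
    M = np.linalg.inv(np.eye(H) + rho * A.T @ A)
    z = np.clip(A @ u_raw, lo, hi)
    lam = np.zeros_like(z)
    u = u_raw.copy()
    for _ in range(iters):
        u = M @ (u_raw + rho * A.T @ (z - lam))   # quadratic (u) update
        z = np.clip(A @ u + lam, lo, hi)          # projection (z) update
        lam = lam + A @ u - z                     # scaled dual update
    return u
```

The elementwise clip in the z-update is the only nonlinearity, so a batch of sampled trajectories can be projected with one tensorized loop.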
Warm-starting is achieved by training a multi-layer perceptron (MLP) to output initial values for the QP variables, with training gradients backpropagated through the solver steps. This approach ensures the neural policy is cognizant of its downstream effect, yielding higher convergence rates while maintaining minimal constraint violations.
Control Parametrization
To further reduce computation, the control sequence is parametrized by time-dependent polynomials, mapping mean trajectories and perturbations to low-dimensional coefficient spaces. The resulting QP operates on these coefficients, compressing the optimization problem and improving speed over waypoint parametrization, especially for long planning horizons.
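The idea can be sketched with a simple monomial basis (the paper's exact basis and degree may differ; `poly_basis` and its parameters are illustrative):

```python
import numpy as np

def poly_basis(horizon, degree, dt=0.05):
    """Build a time-polynomial basis P (horizon x (degree+1)) so that a
    control sequence is u = P @ c: sampling and projection then act on the
    (degree+1)-dimensional coefficients c instead of per-timestep waypoints."""
    t = np.linspace(0.0, horizon * dt, horizon)
    return np.vander(t, degree + 1, increasing=True)

H, d = 100, 5
P = poly_basis(H, d)          # 100 x 6 basis matrix
c = np.zeros(d + 1)
c[0] = 0.3                    # a constant control, expressed as coefficients
u = P @ c                     # expand to the full horizon
```

The projection QP then scales with the coefficient dimension (d + 1 = 6 here) rather than the horizon length H = 100, which is where the speedup over waypoint parametrization comes from.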
Empirical Validation
Benchmarks
Two complex scenarios were evaluated:
- Obstacle Avoidance: FWV encircles a static goal while avoiding randomized 3D obstacles.
- Terrain Following: FWV tracks a goal while maintaining altitude constraints above complex terrain.
Metrics included:
- Success rate (no collisions/crashes),
- Smoothness (constraint residuals on control and derivatives),
- Proximity to goal.
Quantitative Results
π-MPPI demonstrated:
- Higher success rates compared to classic MPPI variants, especially under high control covariance settings,
- Minimal constraint violations across all control components and derivatives—often several orders of magnitude lower than the MPPIwSGF and polynomial baselines,
- Reduced average distance to goal and fewer outlier events,
- Robustness against high perturbation noise, facilitating broader exploration without destabilizing the control process.
Warm-starting using the neural policy further reduced QP iterations required for convergence (as few as 2 iterations per batch), with negligible loss in constraint satisfaction and significant computational gains.
Ablative Studies
Alternative warm-start strategies (sample-based or direct solution prediction) failed to match the residual minimization achieved by the neural policy. Results are consistent across both obstacle avoidance and terrain following tasks.
Computation Time
Polynomial control parametrization consistently outperformed waypoint parametrization as batch sizes and projection iterations grew, with π-MPPI maintaining feedback rates suitable for real-time FWV operation (≥50 Hz). Neural warm-starting matched MPPIwSGF computation times with lower worst-case delays.
Relation to Prior Work
π-MPPI generalizes smoothness enforcement in MPPI without adding penalties to the primary cost function or augmenting the system dynamics, as done in [kim2022smooth]. Unlike the evolutionary-optimization or learning-based noise adaptation of [bhardwaj2022storm] and [sacks2023learning], the projection filter ensures smoothness and feasibility without covariance constraints or indirect tuning. The differentiable solver plus neural warm-start strategy aligns with theoretical advances in warm-starting fixed-point optimization [sambharya2024learning], offering practical benefits for real-time MPC pipelines.
Practical and Theoretical Implications
π-MPPI's demonstrated ability to generate smooth, bounded controls for FWVs addresses actuator reliability and mission robustness in high-speed agile flight. By tolerating larger perturbation noise, it widens the feasible exploration space, potentially benefiting safe navigation, target tracking, and collision avoidance across unmanned aerial vehicles.
On a theoretical front, projection-based enforcement of arbitrarily smooth constraints opens avenues for MPC policy design in other nonlinear systems where control smoothness is critical (e.g., vehicle guidance, manipulator arms, autonomous driving).
Conclusion
The π-MPPI scheme delivers robust, computationally efficient, smooth trajectory optimization for fixed-wing aerial vehicles by embedding a projection-based QP filter into the MPPI pipeline and leveraging neural warm-starting. Empirical validations underscore its superiority over conventional MPPI and penalty-based methods in both robustness and smoothness across challenging benchmarks. Future work includes generalization to autonomous driving, projection onto non-convex state-constraint sets, and extending the neural warm-starting paradigm to broader classes of optimization algorithms (2504.10962).