Matrix-Inverse-Free WMMSE Methods
- Matrix-inverse-free WMMSE is a family of algorithms that optimize the weighted sum-rate in multi-user MIMO while avoiding direct matrix inversion, relying instead on first-order updates.
- These methods employ gradient descent, polynomial approximations, and low-dimensional reductions to lower computational complexity and enable parallel processing.
- Empirical results indicate near-optimal performance and significant speedups, making them ideal for real-time, large-scale wireless communications.
Matrix-inverse-free WMMSE refers to a family of algorithmic designs for solving the weighted sum-rate (WSR) maximization problem in multi-user MIMO networks that avoid the computational bottleneck of direct matrix inversion, a limitation of classic weighted minimum mean-square error (WMMSE) methods. These approaches instead use first-order updates (gradient descent, projected gradient, polynomial approximation, or low-dimensional reductions) and are highly parallelizable, making them suitable for real-time and large-scale applications in wireless communications.
1. Problem Setting and Limitations of Classical WMMSE
The standard framework involves downlink MU-MIMO beamforming, where a base station with $M$ transmit antennas serves $K$ users, each potentially with multiple receive antennas. The goal is to maximize the WSR under a sum-power constraint:

$$\max_{\{V_k\}} \ \sum_{k=1}^{K} \alpha_k R_k \quad \text{s.t.} \quad \sum_{k=1}^{K} \operatorname{tr}\!\left(V_k V_k^H\right) \le P,$$

where $R_k$ is the rate of user $k$, $\alpha_k$ its priority weight, $V_k$ its precoder, and $P$ the total power budget.
This non-convex problem is transformable into an equivalent WMMSE problem by introducing auxiliary receive filters $U_k$ and weight matrices $W_k$ and jointly updating $U$, $W$, and $V$ via block coordinate descent. However, the $V$-update in classic WMMSE requires inverting an $M \times M$ matrix per iteration, resulting in $O(M^3)$ complexity that becomes prohibitive for large $M$ or latency-sensitive applications (Gao et al., 23 Oct 2025, Pellaco et al., 2022, Pellaco et al., 2020).
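To make the bottleneck concrete, the classical $V$-update for single-antenna users can be sketched as follows. This is a minimal NumPy illustration with toy dimensions and unit filters/weights, not code from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 8, 4                      # illustrative sizes: 8 BS antennas, 4 single-antenna users
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))  # toy channel

def classical_v_update(H, u, w, alpha, mu):
    """Classical WMMSE precoder step for MISO users:

        v_k = alpha_k w_k u_k^* * A^{-1} h_k,  with
        A   = sum_j alpha_j w_j |u_j|^2 h_j h_j^H + mu * I.

    The explicit M x M system solve is the O(M^3) bottleneck."""
    K, M = H.shape
    A = mu * np.eye(M, dtype=complex)
    for j in range(K):
        A += alpha[j] * w[j] * abs(u[j]) ** 2 * np.outer(H[j].conj(), H[j])
    # One M x M solve per user -- this is what inverse-free methods remove.
    V = np.stack([alpha[k] * w[k] * np.conj(u[k]) * np.linalg.solve(A, H[k].conj())
                  for k in range(K)], axis=1)   # M x K precoder matrix
    return V

u = np.ones(K, dtype=complex)    # toy receive filters
w = np.ones(K)                   # toy MMSE weights
alpha = np.ones(K)               # equal user priorities
V = classical_v_update(H, u, w, alpha, mu=1.0)
print(V.shape)                   # (8, 4)
```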
2. Core Principles of Matrix-Inverse-Free WMMSE
Matrix-inverse-free WMMSE methods remove all *explicit* matrix inversions from the iterative update pipeline. This is achieved by:
- First-order updates: Using gradient descent or projected gradient descent (PGD) for the $V$ (precoder) step instead of direct inversion.
- Polynomial approximations: Approximating matrix inverses via truncated series expansions, such as in model-driven deep learning approaches.
- Low-dimensional reduction: Transforming the problem into a reduced subspace where only small-dimensional inversions are needed.
- Recursion and iterative refinements: Approximating inverses iteratively, for example via the Newton-Schulz recursion.
The result is an iterative structure that relies only on matrix-matrix multiplications, additions, projections onto convex sets, and possibly scalar operations—operations that are inherently parallel and suitable for GPU/FPGA acceleration (Gao et al., 23 Oct 2025, Pellaco et al., 2022).
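The projection onto the sum-power constraint is a good example of how cheap these primitives are; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def project_power(V, P):
    """Project a stacked precoder V (M x K) onto the Frobenius-norm ball
    ||V||_F^2 <= P by rescaling when the budget is exceeded.
    Only a norm and a scalar multiply -- no inversion, fully parallel."""
    power = np.linalg.norm(V) ** 2
    if power > P:
        V = V * np.sqrt(P / power)
    return V

V = np.ones((4, 2))                  # toy precoder with ||V||_F^2 = 8
V_proj = project_power(V, P=2.0)
print(np.linalg.norm(V_proj) ** 2)   # 2.0 (up to floating point)
```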
3. Representative Algorithms and Methodologies
3.1 Block Coordinate Gradient Descent (BCGD) and Projected PGD
- A-MMMSE (Gao et al., 23 Oct 2025): Replaces the closed-form $V$-update with a projected BCGD step. Each precoder $V_k$ is updated as
$$V_k \leftarrow \Pi_{\mathcal{V}}\!\left(V_k - \eta\,\nabla_{V_k} f(V)\right),$$
where $\eta$ is the step size and $\Pi_{\mathcal{V}}$ projects onto the power-feasible set. Projection onto the Frobenius-norm ball is implemented as rescaling if the power constraint is exceeded.
- PGD WMMSE (Pellaco et al., 2020): For the MU-MISO setting (single-antenna receivers), runs a fixed number of projected-gradient steps within each outer loop, fully avoiding inversions and eigen-decompositions.
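The projected-gradient structure can be sketched on a simple quadratic surrogate in place of the full WMMSE cost; the surrogate `f(V) = ||H V - T||_F^2`, the target `T`, and all dimensions are illustrative assumptions, but each step has the same shape as in the papers: one gradient (matrix products only), then one cheap projection.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, P = 8, 4, 1.0
H = rng.standard_normal((K, M))
T = np.eye(K)                        # hypothetical target matrix for the surrogate

def project_power(V, P):
    """Rescale onto the Frobenius ball ||V||_F^2 <= P."""
    n2 = np.linalg.norm(V) ** 2
    return V * np.sqrt(P / n2) if n2 > P else V

# Projected gradient descent on the surrogate f(V) = ||H V - T||_F^2.
V = project_power(rng.standard_normal((M, K)), P)
f0 = np.linalg.norm(H @ V - T) ** 2
eta = 1.0 / (2 * np.linalg.norm(H, 2) ** 2)   # step 1/L for this quadratic
for _ in range(200):
    grad = 2 * H.T @ (H @ V - T)     # matrix-matrix products only
    V = project_power(V - eta * grad, P)

f_final = np.linalg.norm(H @ V - T) ** 2
print(f_final <= f0)                 # True: PGD decreases the surrogate
```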
3.2 Polynomial Expansion and Deep Unfolding
- Learned Truncated Polynomial Expansion (TPE) (Izadinasab et al., 2024): Approximates the matrix inverse via a truncated polynomial,
$$A^{-1} \approx \sum_{\ell=0}^{L} w_\ell A^{\ell},$$
where $A$ is the (regularized) channel Gram matrix. The coefficients $w_\ell$ are learned offline to best match the linear MMSE or WMMSE mapping over typical channels.
- Deep-unfolded WMMSE (Pellaco et al., 2020, Pellaco et al., 2022): Each forward layer in the unfolded network mimics a WMMSE iteration but replaces all inversion steps with differentiable, learned module blocks.
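As a stand-in for learned coefficients, a truncated Neumann series gives one concrete choice of polynomial weights. The sketch below applies $A^{-1}b$ approximately using only matrix-vector products (Richardson form); in the TPE approach the weights would instead be learned offline.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 6
G = rng.standard_normal((K, K))
A = G @ G.T + K * np.eye(K)          # well-conditioned SPD Gram-like matrix
b = rng.standard_normal(K)

def tpe_solve(A, b, L, alpha):
    """Approximate A^{-1} b by the truncated polynomial
    alpha * sum_{l=0}^{L} (I - alpha*A)^l b, evaluated in Richardson
    form with matrix-vector products only -- no inversion."""
    x = np.zeros_like(b)
    for _ in range(L + 1):
        x = x + alpha * (b - A @ x)
    return x

alpha = 1.0 / np.linalg.norm(A, 2)   # step ensuring the series converges
x = tpe_solve(A, b, L=100, alpha=alpha)
rel_err = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(rel_err)                       # small for this well-conditioned A
```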
3.3 Low-Dimensional and Recursion-Based Reductions
- Reduced WMMSE (R-WMMSE) (Zhao et al., 2022): For MU-MIMO under sum-power constraints, exploits the fact that all stationary-point precoders lie in the range of the Hermitian-transposed channel $H^H$, so the problem is reduced to optimizing over a $d$-dimensional subspace, where $d$ is the sum of user stream counts; only $d \times d$ inversions are required (with $d \ll M$).
- PAPC-WMMSE (Zhao et al., 2022): For per-antenna power constraints, recasts the precoder update as a sequence of small norm-ball projections, avoiding large matrix solves entirely.
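The dimensionality-reduction principle can be illustrated with the push-through identity, which shows an MMSE-style precoder computable with a $K \times K$ rather than $M \times M$ inverse. This is a simplified illustration of the range-space idea, not the full R-WMMSE update:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 64, 4                    # many BS antennas, few users
H = rng.standard_normal((K, M))
mu = 0.5

# Naive MMSE-style precoder: solves an M x M system (O(M^3)).
V_big = np.linalg.solve(H.T @ H + mu * np.eye(M), H.T)

# Push-through identity: (H^T H + mu I)^{-1} H^T = H^T (H H^T + mu I)^{-1},
# so the precoder lies in range(H^T) and only a K x K solve is needed.
V_small = H.T @ np.linalg.solve(H @ H.T + mu * np.eye(K), np.eye(K))

print(np.allclose(V_big, V_small))   # True
```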
3.4 Gradient and Iterative Approximation in General MU-MIMO
- MIF-WMMSE (Pellaco et al., 2022): Uses gradient descent and the Newton-Schulz recursion for the weight-matrix updates, bringing all update complexity down to matrix multiplies.
- Finite-horizon optimization with Chebyshev steps (Feng et al., 14 Mar 2025): Applies a fractional programming reformulation and then runs a fixed, optimally scheduled sequence of gradient steps with Chebyshev-optimal step-sizes to minimize the subproblem residual without inversion.
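The Newton-Schulz recursion mentioned above can be sketched in a few lines; with the standard safe initialization it converges quadratically to the inverse using only matrix products:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)      # SPD matrix to invert (toy example)

def newton_schulz(A, iters=20):
    """Newton-Schulz recursion X_{t+1} = X_t (2I - A X_t), which converges
    quadratically to A^{-1} when the spectral radius of (I - A X_0) is < 1.
    Each step costs two matrix products -- no inversion."""
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))  # safe init
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

X = newton_schulz(A)
print(np.allclose(X @ A, np.eye(n)))   # True
```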
4. Convergence Theory and Optimality
Matrix-inverse-free WMMSE approaches are instances of inexact block coordinate descent over composite (often nonconvex) objectives. Convergence proofs rely on:
- Block-wise convexity: Each subproblem is convex in its own block (e.g., in $U$, $W$, or $V$ individually).
- Lipschitz continuity: Ensures sufficient decrease of auxiliary cost for small enough step sizes.
- Projection and bounding: The use of power constraint projections ensures iterates remain feasible and in a compact set.
- Global convergence: Every accumulation point is a stationary (KKT) point of the original WSR maximization (Gao et al., 23 Oct 2025, Zhao et al., 2022, Pellaco et al., 2022).
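The sufficient-decrease step rests on the standard descent lemma: if $\nabla f$ is $L$-Lipschitz and $V^{+} = V - \eta\,\nabla f(V)$ with $0 < \eta \le 1/L$, then

$$f(V^{+}) \;\le\; f(V) - \frac{\eta}{2}\,\big\|\nabla f(V)\big\|_F^2,$$

which guarantees monotone decrease of the auxiliary cost for sufficiently small step sizes.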
Furthermore, finite-layer deep-unfolded versions achieve nearly all the performance gains of classic WMMSE when the number of iterations/layers and PGD steps per layer are chosen appropriately (Pellaco et al., 2020, Pellaco et al., 2022).
5. Computational Complexity and Parallel Implementation
A principal benefit of all matrix-inverse-free WMMSE algorithms is replacing the cubic-cost inversion bottleneck with matrix multiply-adds per iteration. This unlocks:
- Scalable acceleration: Matrix-matrix operations are highly parallel and map directly to GPU/FPGA hardware (cuBLAS, PyTorch, etc.) (Gao et al., 23 Oct 2025, Pellaco et al., 2022).
- Reduced latency: Substantial wall-clock time reductions reported on both CPU and GPU in high-dimensional simulations.
- Suitability for large-scale MIMO: Enables real-time adaptation in massive MIMO with very large antenna arrays (Feng et al., 14 Mar 2025).
| WMMSE Algorithm | Per-iteration Cost | Inversion Needed? |
|---|---|---|
| Classical WMMSE | Cubic in $M$ (matrix inversion) | Yes ($M \times M$) |
| A-MMMSE / BCGD | Matrix multiplies only | No |
| R-WMMSE | Small-matrix inversion plus multiplies | Only $d \times d$ ($d \ll M$) |
| PGD-Unfolded | Matrix multiplies only | No |
| TPE-Deep Learning | Polynomial of matrix multiplies (detection) | No |
6. Performance Profile and Empirical Results
Simulation studies across various platforms and problem sizes consistently show:
- Empirical optimality: For a fixed number of iterations or computation budget, matrix-inverse-free and unfolded WMMSE variants reach essentially all of the classical WMMSE WSR, and frequently outperform truncated or ill-budgeted classic WMMSE (Pellaco et al., 2020, Feng et al., 14 Mar 2025, Gao et al., 23 Oct 2025).
- Acceleration via warm starts: Staged initialization (e.g., unweighted MSE minimization followed by full WMMSE) can further cut convergence time by 20% or more (Gao et al., 23 Oct 2025).
- Quantitative speedup: In large MU-MIMO, matrix-inverse-free BCGD and finite-horizon Chebyshev-optimized GD are substantially faster per problem solved, on both CPU and GPU (Gao et al., 23 Oct 2025, Feng et al., 14 Mar 2025).
- Robustness to SNR and scaling: These methods maintain near-optimal sum-rate across low to high SNRs and for antenna counts up to several thousand.
7. Implementation Considerations and Practical Guidelines
- Initialization: Appropriate scaling (e.g., matched-filter output rescaled to feasible power) is essential for stable convergence (Pellaco et al., 2020).
- Step-size scheduling: Learning or choosing Chebyshev-optimal, adaptive, or progressively shrinking step sizes accelerates convergence and avoids overshooting (Feng et al., 14 Mar 2025).
- Batching and vectorization: All core updates are amenable to batch execution over multiple users or antennas, allowing end-to-end integration with model-driven or data-driven acceleration frameworks (Gao et al., 23 Oct 2025, Izadinasab et al., 2024).
- Hardware mapping: For on-device or real-time deployment, matrix-inverse-free architectures minimize dependency on serial or non-parallelizable operations.
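The matched-filter initialization with power rescaling suggested above can be sketched as follows (dimensions and power budget are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, P = 16, 4, 1.0
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))

# Matched-filter initialization: V0 proportional to H^H, then rescaled
# so the sum-power constraint ||V0||_F^2 = P holds with equality.
V0 = H.conj().T
V0 *= np.sqrt(P) / np.linalg.norm(V0)

print(round(np.linalg.norm(V0) ** 2, 6))   # 1.0
```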
References
- "An Accelerated Mixed Weighted-Unweighted MMSE Approach for MU-MIMO Beamforming" (Gao et al., 23 Oct 2025)
- "A matrix-inverse-free implementation of the MU-MIMO WMMSE beamforming algorithm" (Pellaco et al., 2022)
- "Finite Horizon Optimization for Large-Scale MIMO" (Feng et al., 14 Mar 2025)
- "Deep unfolding of the weighted MMSE beamforming algorithm" (Pellaco et al., 2020)
- "Rethinking WMMSE: Can Its Complexity Scale Linearly With the Number of BS Antennas?" (Zhao et al., 2022)
- "Truncated Polynomial Expansion-Based Detection in Massive MIMO: A Model-Driven Deep Learning Approach" (Izadinasab et al., 2024)
- "Highly Accelerated Weighted MMSE Algorithms for Designing Precoders in FDD Systems with Incomplete CSI" (Amor et al., 2023)