Matrix-Inverse-Free WMMSE Methods
- Matrix-inverse-free WMMSE is a family of algorithms that optimize the weighted sum-rate in multi-user MIMO while avoiding direct matrix inversion, relying instead on first-order updates.
- These methods employ gradient descent, polynomial approximations, and low-dimensional reductions to lower computational complexity and enable parallel processing.
- Empirical results indicate near-optimal performance and significant speedups, making them ideal for real-time, large-scale wireless communications.
Matrix-inverse-free WMMSE refers to a family of algorithmic designs for solving the weighted sum-rate (WSR) maximization problem in multi-user MIMO networks that avoid the computational bottleneck of direct matrix inversion, a limitation of classic weighted minimum mean-square error (WMMSE) methods. These approaches instead use first-order updates (gradient descent, projected gradient, polynomial approximation, or low-dimensional reductions) and are highly parallelizable, making them suitable for real-time and large-scale applications in wireless communications.
1. Problem Setting and Limitations of Classical WMMSE
The standard framework involves downlink MU-MIMO beamforming, where a base station with $M$ transmit antennas serves $K$ users, each potentially with multiple receive antennas. The goal is to maximize the WSR under a sum-power constraint:

$$\max_{\{V_k\}} \ \sum_{k=1}^{K} \alpha_k R_k \quad \text{s.t.} \quad \sum_{k=1}^{K} \operatorname{tr}\!\left(V_k V_k^H\right) \le P,$$

where $R_k$ is the rate of user $k$, $\alpha_k$ its priority weight, $V_k$ its precoder, and $P$ the total power budget.
This non-convex problem is transformable into an equivalent WMMSE problem by introducing auxiliary receive filters $U_k$ and weight matrices $W_k$ and jointly updating $U$, $W$, and $V$ via block coordinate descent. However, the $V$-update in classic WMMSE requires inverting an $M \times M$ matrix per iteration, resulting in $O(M^3)$ complexity that becomes prohibitive for large $M$ or latency-sensitive applications (Gao et al., 23 Oct 2025, Pellaco et al., 2022, Pellaco et al., 2020).
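To make the bottleneck concrete, the classical $V$-update for single-antenna users can be sketched as follows. This is a minimal NumPy illustration with toy dimensions and unit filters/weights, not code from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 8, 4                      # illustrative sizes: 8 BS antennas, 4 single-antenna users
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))  # toy channel

def classical_v_update(H, u, w, alpha, mu):
    """Classical WMMSE precoder step for MISO users:

        v_k = alpha_k w_k u_k^* * A^{-1} h_k,  with
        A   = sum_j alpha_j w_j |u_j|^2 h_j h_j^H + mu * I.

    The explicit M x M system solve is the O(M^3) bottleneck."""
    K, M = H.shape
    A = mu * np.eye(M, dtype=complex)
    for j in range(K):
        A += alpha[j] * w[j] * abs(u[j]) ** 2 * np.outer(H[j].conj(), H[j])
    # One M x M solve per user -- this is what inverse-free methods remove.
    V = np.stack([alpha[k] * w[k] * np.conj(u[k]) * np.linalg.solve(A, H[k].conj())
                  for k in range(K)], axis=1)   # M x K precoder matrix
    return V

u = np.ones(K, dtype=complex)    # toy receive filters
w = np.ones(K)                   # toy MMSE weights
alpha = np.ones(K)               # equal user priorities
V = classical_v_update(H, u, w, alpha, mu=1.0)
print(V.shape)                   # (8, 4)
```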
2. Core Principles of Matrix-Inverse-Free WMMSE
Matrix-inverse-free WMMSE methods remove all *explicit* matrix inversions from the iterative update pipeline. This is achieved by:
- First-order updates: Using gradient descent or projected gradient descent (PGD) for the $V$ (precoder) step instead of direct inversion.
- Polynomial approximations: Approximating matrix inverses via truncated series expansions, such as in model-driven deep learning approaches.
- Low-dimensional reduction: Transforming the problem into a reduced subspace where only small-dimensional inversions are needed.
- Recursion and iterative refinements: Approximating inverses iteratively, for example via the Newton-Schulz recursion.
The result is an iterative structure that relies only on matrix-matrix multiplications, additions, projections onto convex sets, and possibly scalar operations—operations that are inherently parallel and suitable for GPU/FPGA acceleration (Gao et al., 23 Oct 2025, Pellaco et al., 2022).
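The projection onto the sum-power constraint is a good example of how cheap these primitives are; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def project_power(V, P):
    """Project a stacked precoder V (M x K) onto the Frobenius-norm ball
    ||V||_F^2 <= P by rescaling when the budget is exceeded.
    Only a norm and a scalar multiply -- no inversion, fully parallel."""
    power = np.linalg.norm(V) ** 2
    if power > P:
        V = V * np.sqrt(P / power)
    return V

V = np.ones((4, 2))                  # toy precoder with ||V||_F^2 = 8
V_proj = project_power(V, P=2.0)
print(np.linalg.norm(V_proj) ** 2)   # 2.0 (up to floating point)
```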
3. Representative Algorithms and Methodologies
3.1 Block Coordinate Gradient Descent (BCGD) and Projected PGD
- A-MMMSE (Gao et al., 23 Oct 2025): Replaces the closed-form $V$-update with a projected BCGD step. Each precoder $V_k$ is updated as
$$V_k \leftarrow \Pi_{\mathcal{V}}\!\left(V_k - \eta\,\nabla_{V_k} f(V)\right),$$
where $\eta$ is the step size and $\Pi_{\mathcal{V}}$ projects onto the power-feasible set. Projection onto the Frobenius-norm ball is implemented as rescaling if the power constraint is exceeded.
- PGD WMMSE (Pellaco et al., 2020): For the MU-MISO setting (single-antenna receivers), runs a fixed number of projected-gradient steps within each outer loop, fully avoiding inversions and eigen-decompositions.
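The projected-gradient structure can be sketched on a simple quadratic surrogate in place of the full WMMSE cost; the surrogate `f(V) = ||H V - T||_F^2`, the target `T`, and all dimensions are illustrative assumptions, but each step has the same shape as in the papers: one gradient (matrix products only), then one cheap projection.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, P = 8, 4, 1.0
H = rng.standard_normal((K, M))
T = np.eye(K)                        # hypothetical target matrix for the surrogate

def project_power(V, P):
    """Rescale onto the Frobenius ball ||V||_F^2 <= P."""
    n2 = np.linalg.norm(V) ** 2
    return V * np.sqrt(P / n2) if n2 > P else V

# Projected gradient descent on the surrogate f(V) = ||H V - T||_F^2.
V = project_power(rng.standard_normal((M, K)), P)
f0 = np.linalg.norm(H @ V - T) ** 2
eta = 1.0 / (2 * np.linalg.norm(H, 2) ** 2)   # step 1/L for this quadratic
for _ in range(200):
    grad = 2 * H.T @ (H @ V - T)     # matrix-matrix products only
    V = project_power(V - eta * grad, P)

f_final = np.linalg.norm(H @ V - T) ** 2
print(f_final <= f0)                 # True: PGD decreases the surrogate
```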
3.2 Polynomial Expansion and Deep Unfolding
- Learned Truncated Polynomial Expansion (TPE) (Izadinasab et al., 2024): Approximates the matrix inverse via a truncated polynomial,
$$A^{-1} \approx \sum_{\ell=0}^{L} w_\ell A^{\ell},$$
where $A$ is the (regularized) channel Gram matrix. The coefficients $w_\ell$ are learned offline to best match the linear MMSE or WMMSE mapping over typical channels.
- Deep-unfolded WMMSE (Pellaco et al., 2020, Pellaco et al., 2022): Each forward layer in the unfolded network mimics a WMMSE iteration but replaces all inversion steps with differentiable, learned module blocks.
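As a stand-in for learned coefficients, a truncated Neumann series gives one concrete choice of polynomial weights. The sketch below applies $A^{-1}b$ approximately using only matrix-vector products (Richardson form); in the TPE approach the weights would instead be learned offline.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 6
G = rng.standard_normal((K, K))
A = G @ G.T + K * np.eye(K)          # well-conditioned SPD Gram-like matrix
b = rng.standard_normal(K)

def tpe_solve(A, b, L, alpha):
    """Approximate A^{-1} b by the truncated polynomial
    alpha * sum_{l=0}^{L} (I - alpha*A)^l b, evaluated in Richardson
    form with matrix-vector products only -- no inversion."""
    x = np.zeros_like(b)
    for _ in range(L + 1):
        x = x + alpha * (b - A @ x)
    return x

alpha = 1.0 / np.linalg.norm(A, 2)   # step ensuring the series converges
x = tpe_solve(A, b, L=100, alpha=alpha)
rel_err = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(rel_err)                       # small for this well-conditioned A
```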
3.3 Low-Dimensional and Recursion-Based Reductions
- Reduced WMMSE (R-WMMSE) (Zhao et al., 2022): For MU-MIMO under sum-power constraints, exploits the fact that all stationary-point precoders lie in the range of the Hermitian-transposed channel $H^H$, so the problem is reduced to optimizing over a $d$-dimensional subspace, where $d$ is the sum of user stream counts; only $d \times d$ inversions are required (with $d \ll M$).
- PAPC-WMMSE (Zhao et al., 2022): For per-antenna power constraints, recasts the precoder update as a sequence of small norm-ball projections, avoiding large matrix solves entirely.
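The dimensionality-reduction principle can be illustrated with the push-through identity, which shows an MMSE-style precoder computable with a $K \times K$ rather than $M \times M$ inverse. This is a simplified illustration of the range-space idea, not the full R-WMMSE update:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 64, 4                    # many BS antennas, few users
H = rng.standard_normal((K, M))
mu = 0.5

# Naive MMSE-style precoder: solves an M x M system (O(M^3)).
V_big = np.linalg.solve(H.T @ H + mu * np.eye(M), H.T)

# Push-through identity: (H^T H + mu I)^{-1} H^T = H^T (H H^T + mu I)^{-1},
# so the precoder lies in range(H^T) and only a K x K solve is needed.
V_small = H.T @ np.linalg.solve(H @ H.T + mu * np.eye(K), np.eye(K))

print(np.allclose(V_big, V_small))   # True
```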
3.4 Gradient and Iterative Approximation in General MU-MIMO
- MIF-WMMSE (Pellaco et al., 2022): Uses gradient descent and the Newton-Schulz recursion for the weight-matrix updates, bringing all update complexity down to matrix multiplies.
- Finite-horizon optimization with Chebyshev steps (Feng et al., 14 Mar 2025): Applies a fractional programming reformulation and then runs a fixed, optimally scheduled sequence of gradient steps with Chebyshev-optimal step-sizes to minimize the subproblem residual without inversion.
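The Newton-Schulz recursion mentioned above can be sketched in a few lines; with the standard safe initialization it converges quadratically to the inverse using only matrix products:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)      # SPD matrix to invert (toy example)

def newton_schulz(A, iters=20):
    """Newton-Schulz recursion X_{t+1} = X_t (2I - A X_t), which converges
    quadratically to A^{-1} when the spectral radius of (I - A X_0) is < 1.
    Each step costs two matrix products -- no inversion."""
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))  # safe init
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = X @ (2 * I - A @ X)
    return X

X = newton_schulz(A)
print(np.allclose(X @ A, np.eye(n)))   # True
```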
4. Convergence Theory and Optimality
Matrix-inverse-free WMMSE approaches are instances of inexact block coordinate descent over composite (often nonconvex) objectives. Convergence proofs rely on:
- Block-wise convexity: Each subproblem is convex in its own block (e.g., in $U$, $W$, or $V$ individually).
- Lipschitz continuity: Ensures sufficient decrease of auxiliary cost for small enough step sizes.
- Projection and bounding: The use of power constraint projections ensures iterates remain feasible and in a compact set.
- Global convergence: Every accumulation point is a stationary (KKT) point of the original WSR maximization (Gao et al., 23 Oct 2025, Zhao et al., 2022, Pellaco et al., 2022).
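The sufficient-decrease step rests on the standard descent lemma: if $\nabla f$ is $L$-Lipschitz and $V^{+} = V - \eta\,\nabla f(V)$ with $0 < \eta \le 1/L$, then

$$f(V^{+}) \;\le\; f(V) - \frac{\eta}{2}\,\big\|\nabla f(V)\big\|_F^2,$$

which guarantees monotone decrease of the auxiliary cost for sufficiently small step sizes.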
Furthermore, finite-layer deep-unfolded versions achieve nearly all the performance gains of classic WMMSE when the number of iterations/layers and PGD steps per layer are chosen appropriately (Pellaco et al., 2020, Pellaco et al., 2022).
5. Computational Complexity and Parallel Implementation
A principal benefit of all matrix-inverse-free WMMSE algorithms is replacing the cubic-cost inversion bottleneck with matrix multiply-adds per iteration. This unlocks:
- Scalable acceleration: Matrix-matrix operations are highly parallel and map directly to GPU/FPGA hardware (cuBLAS, PyTorch, etc.) (Gao et al., 23 Oct 2025, Pellaco et al., 2022).
- Reduced latency: Substantial wall-clock time reductions reported on both CPU and GPU in high-dimensional simulations.
- Suitability for large-scale MIMO: Enables real-time adaptation in massive MIMO with very large antenna arrays (Feng et al., 14 Mar 2025).
| WMMSE Algorithm | Per-iteration Cost | Inversion Needed? |
|---|---|---|
| Classical WMMSE | Cubic in $M$ (matrix inversion) | Yes ($M \times M$) |
| A-MMMSE / BCGD | Matrix multiplies only | No |
| R-WMMSE | Small-matrix inversion plus multiplies | Only $d \times d$ ($d \ll M$) |
| PGD-Unfolded | Matrix multiplies only | No |
| TPE-Deep Learning | Polynomial of matrix multiplies (detection) | No |
6. Performance Profile and Empirical Results
Simulation studies across various platforms and problem sizes consistently show:
- Empirical optimality: For a fixed number of iterations or computation budget, matrix-inverse-free and unfolded WMMSE variants reach essentially all of the classical WMMSE WSR, and frequently outperform truncated or ill-budgeted classic WMMSE (Pellaco et al., 2020, Feng et al., 14 Mar 2025, Gao et al., 23 Oct 2025).
- Acceleration via warm starts: Staged initialization (e.g., unweighted MSE minimization followed by full WMMSE) can further cut convergence time by 20% or more (Gao et al., 23 Oct 2025).
- Quantitative speedup: In large MU-MIMO, matrix-inverse-free BCGD and finite-horizon Chebyshev-optimized GD are substantially faster per problem solved, on both CPU and GPU (Gao et al., 23 Oct 2025, Feng et al., 14 Mar 2025).
- Robustness to SNR and scaling: These methods maintain near-optimal sum-rate across low to high SNRs and for antenna counts up to several thousand.
7. Implementation Considerations and Practical Guidelines
- Initialization: Appropriate scaling (e.g., matched-filter output rescaled to feasible power) is essential for stable convergence (Pellaco et al., 2020).
- Step-size scheduling: Learning or choosing Chebyshev-optimal, adaptive, or progressively shrinking step sizes accelerates convergence and avoids overshooting (Feng et al., 14 Mar 2025).
- Batching and vectorization: All core updates are amenable to batch execution over multiple users or antennas, allowing end-to-end integration with model-driven or data-driven acceleration frameworks (Gao et al., 23 Oct 2025, Izadinasab et al., 2024).
- Hardware mapping: For on-device or real-time deployment, matrix-inverse-free architectures minimize dependency on serial or non-parallelizable operations.
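The matched-filter initialization with power rescaling suggested above can be sketched as follows (dimensions and power budget are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, P = 16, 4, 1.0
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))

# Matched-filter initialization: V0 proportional to H^H, then rescaled
# so the sum-power constraint ||V0||_F^2 = P holds with equality.
V0 = H.conj().T
V0 *= np.sqrt(P) / np.linalg.norm(V0)

print(round(np.linalg.norm(V0) ** 2, 6))   # 1.0
```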
References
- "An Accelerated Mixed Weighted-Unweighted MMSE Approach for MU-MIMO Beamforming" (Gao et al., 23 Oct 2025)
- "A matrix-inverse-free implementation of the MU-MIMO WMMSE beamforming algorithm" (Pellaco et al., 2022)
- "Finite Horizon Optimization for Large-Scale MIMO" (Feng et al., 14 Mar 2025)
- "Deep unfolding of the weighted MMSE beamforming algorithm" (Pellaco et al., 2020)
- "Rethinking WMMSE: Can Its Complexity Scale Linearly With the Number of BS Antennas?" (Zhao et al., 2022)
- "Truncated Polynomial Expansion-Based Detection in Massive MIMO: A Model-Driven Deep Learning Approach" (Izadinasab et al., 2024)
- "Highly Accelerated Weighted MMSE Algorithms for Designing Precoders in FDD Systems with Incomplete CSI" (Amor et al., 2023)