
Blockwise Minorization–Maximization (Blockwise-MM)

Updated 26 January 2026
  • Blockwise-MM is a class of iterative optimization algorithms that partitions variables into blocks and updates them with tailored surrogate functions, handling nonconvex and constrained problems.
  • It cyclically refines each block using surrogates that satisfy majorization/minorization and geometric requirements, supporting both Euclidean and manifold constraints.
  • Its convergence guarantees and complexity analyses make it effective for large-scale applications like matrix/tensor factorization, heteroscedastic regression, and manifold-constrained subspace estimation.

Blockwise Minorization-Maximization (blockwise-MM) is a flexible class of iterative optimization algorithms for nonconvex, constrained, and high-dimensional problems. Blockwise-MM generalizes the classic Majorization-Minimization and Minorization-Maximization (MM) principles by partitioning the variables into blocks and cyclically updating them with block-specific surrogate functions. This structure allows tailored surrogates per block, geometric flexibility (including Euclidean and manifold constraints), and efficient per-block updates, which is often vital for large-scale or structured data models.

1. Mathematical Formulation and Blockwise Partitioning

Blockwise-MM approaches partition the parameter vector $x = (x_1, \ldots, x_B)$ into $B$ blocks, with each block $x_i$ constrained to a (possibly non-Euclidean) set $M_i$. The central objective is

$$\min_x \; f(x) \quad \text{s.t.} \quad x \in M_1 \times \cdots \times M_B,$$

where $f$ may be nonsmooth and nonconvex, and each $M_i$ can be a closed subset of a Riemannian manifold, a convex set, or a similar structure (Li et al., 2023, Lopez et al., 2024).

At each iteration and for each block, a surrogate function $g_i(\cdot \mid x^{(k)})$ (a majorant or minorant, depending on whether the problem is a minimization or maximization) approximates the objective restricted to that block with the others fixed. The essential properties of each surrogate $g_i$ are:

  • Majorization (or minorization): $g_i(x_i \mid x^{(k)}) \geq f(x_1^{(k)}, \ldots, x_{i-1}^{(k)}, x_i, x_{i+1}^{(k)}, \ldots, x_B^{(k)})$, with equality at $x_i = x_i^{(k)}$.
  • Blockwise convexity or strong convexity (for minimization), or geodesic quasi-convexity for manifold blocks (Lopez et al., 2024).
  • Smoothness properties used for rate analysis (Lyu et al., 2020).
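As a concrete illustration of these properties, the standard Lipschitz-gradient quadratic upper bound satisfies the majorization and tightness requirements; here is a minimal numerical sketch (the objective $\log(1+x^2)$ and the constant $L=2$ are illustrative choices, not taken from the cited works):

```python
import numpy as np

def f(x):
    # Smooth nonconvex test objective with a Lipschitz-continuous gradient
    return np.log(1.0 + x ** 2)

def grad_f(x):
    return 2.0 * x / (1.0 + x ** 2)

L = 2.0  # |f''(x)| <= 2 for this f, so grad_f is 2-Lipschitz

def majorizer(x, xk):
    # Quadratic surrogate: tight at xk, and >= f(x) for all x
    return f(xk) + grad_f(xk) * (x - xk) + 0.5 * L * (x - xk) ** 2

xk = 1.3
xs = np.linspace(-3.0, 3.0, 201)
assert np.all(majorizer(xs, xk) >= f(xs) - 1e-12)  # majorization
assert np.isclose(majorizer(xk, xk), f(xk))        # tightness at xk
```

Minimizing this surrogate in place of $f$ yields the classical gradient-descent step with stepsize $1/L$, the simplest member of the MM family.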

2. Algorithmic Structure and Blockwise Cyclic Updating

Blockwise-MM algorithms proceed iteratively as follows:

  1. For the current iterate $x^{(k)} = (x_1^{(k)}, \ldots, x_B^{(k)})$, sequentially (or in a chosen order) construct a surrogate $g_i(x_i \mid x^{(k)})$ for each block $i$.
  2. Update each block by

$$x_i^{(k+1)} = \operatorname*{arg\,min}_{x_i \in M_i} g_i(x_i \mid x^{(k)}).$$

  3. The cycle continues until convergence to a stationary point, or to an $\epsilon$-stationary point as measured by a blockwise first-order optimality criterion.

For blockwise-MM on manifolds, updates utilize manifold geometry (e.g., geodesic convexity), and surrogates often include proximal regularization or penalized distance terms (Li et al., 2023, Lopez et al., 2024).
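The cyclic scheme above can be sketched on a toy two-block problem, rank-1 least-squares factorization, where the restriction of $f$ to each block is a convex quadratic and therefore serves as its own (exact) surrogate with a closed-form block minimizer. This is an illustrative setup, not an example drawn from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Two blocks: the left factor u and the right factor v.
# With the other block fixed, f(u, v) = ||A - u v^T||_F^2 is a convex
# quadratic in the free block, so the exact restriction is a valid
# surrogate and the block update has a closed form.
u = rng.standard_normal(6)
v = rng.standard_normal(4)

def f(u, v):
    return np.linalg.norm(A - np.outer(u, v)) ** 2

losses = [f(u, v)]
for _ in range(50):
    u = A @ v / (v @ v)    # argmin over block u, block v fixed
    v = A.T @ u / (u @ u)  # argmin over block v, block u fixed
    losses.append(f(u, v))

# Blockwise-MM guarantees monotone descent of f.
assert all(b <= a + 1e-10 for a, b in zip(losses, losses[1:]))
```

Each cycle here is one pass of step 1 and step 2 over both blocks; the assertion checks the monotonicity property discussed in Section 4.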

3. Surrogate Function Design and Geometric Properties

Surrogate functions are block-specific and may exploit structure:

  • Euclidean blocks: quadratic or prox-linear surrogates, strong convexity, or smoothness constraints (Lyu et al., 2020).
  • Riemannian/Manifold blocks: geodesically quasi-convex or geodesically smooth surrogates, ensuring feasibility on nonconvex sets such as Stiefel or Grassmann manifolds (Lopez et al., 2024, Li et al., 2023).

Essential surrogate properties per block:

  • Tightness at the current iterate.
  • Majorization over the block's feasible set.
  • Directional derivative matching in the block's tangent space or Euclidean subspace.
  • (Geo-)quasi-convexity: the surrogate is convex (or quasi-convex) along geodesics for manifold blocks.
  • Continuity in all arguments (Lopez et al., 2024).

Surrogates with additional regularization (e.g., proximal terms or trust-region radii) can be used to improve convergence properties and rate bounds (Lyu et al., 2020, Lyu, 2022).
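A proximal term can be attached to any valid majorizer: the added quadratic is nonnegative and vanishes at the expansion point, so majorization and tightness are preserved while the surrogate gains strong convexity. A small numerical check, with $f(x) = \cos x$ and $\rho = 0.5$ as illustrative choices:

```python
import numpy as np

def f(x):
    return np.cos(x)  # smooth nonconvex, |f''(x)| <= 1

def majorizer(x, xk):
    # Lipschitz-gradient quadratic upper bound with L = 1
    return f(xk) - np.sin(xk) * (x - xk) + 0.5 * (x - xk) ** 2

def prox_majorizer(x, xk, rho=0.5):
    # Adding (rho/2)(x - xk)^2 keeps majorization and tightness,
    # and makes the surrogate (1 + rho)-strongly convex.
    return majorizer(x, xk) + 0.5 * rho * (x - xk) ** 2

xk = 0.7
xs = np.linspace(-4.0, 4.0, 401)
assert np.all(prox_majorizer(xs, xk) >= f(xs) - 1e-12)  # still a majorizer
assert np.isclose(prox_majorizer(xk, xk), f(xk))        # still tight at xk
```

The extra curvature damps the block step toward the current iterate, which is the mechanism behind the improved rate bounds cited above.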

4. Convergence Theory and Complexity Analysis

Blockwise-MM guarantees monotonic descent of ff in the minimization case, and—under appropriate regularity—a sequence of iterates converging asymptotically to stationary points of ff (Lopez et al., 2024, Li et al., 2023). Key results include:

  • Global stationarity: Provided the surrogates satisfy the five axioms (tightness, majorization, first-order matching, continuity, geo-quasi-convexity), the block-wise minimizers are unique, and the sublevel sets are compact, every limit point of the iterate sequence is stationary (Lopez et al., 2024).
  • Complexity rates: For surrogates with strong convexity or trust-region radii $r_k$, blockwise-MM produces an $\epsilon$-stationary point within $O(\epsilon^{-2})$ iterations (modulo logarithmic factors), matching optimal rates for first-order methods in nonconvex optimization (Li et al., 2023, Lyu et al., 2020). For diminishing radii or weakly convex/block-convex surrogates, explicit rates such as $O((\log n)^{1+\varepsilon}/n^{1/2})$ for the empirical loss or $O((\log n)^{1+\varepsilon}/n^{1/4})$ for the expected loss can be achieved in the stochastic blockwise-MM setting (Lyu, 2022).

Proofs rely on surrogate optimality, summable progress gaps, and stationarity measures based on blockwise gradients and feasible directions.
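In the Euclidean case, one common blockwise first-order stationarity measure is the largest per-block gradient norm; a minimal sketch (the toy objective and the tolerance are illustrative):

```python
import numpy as np

def blockwise_stationarity(grads):
    """Largest per-block gradient norm at the current iterate.

    grads: list of per-block gradient vectors; the iterate is
    epsilon-stationary when this measure is below epsilon.
    """
    return max(np.linalg.norm(g) for g in grads)

# Toy objective f(x1, x2) = ||x1||^2 + (x2 - 1)^2 with two blocks.
x1 = np.array([1e-4, -2e-4])
x2 = np.array([1.0 + 1e-4])
grads = [2 * x1, 2 * (x2 - 1.0)]
assert blockwise_stationarity(grads) < 1e-3  # eps-stationary, eps = 1e-3
```

On manifold blocks the same measure is formed from Riemannian gradients in each block's tangent space.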

5. Practical Implementations and Computational Aspects

Blockwise-MM frameworks offer computational advantages when each block subproblem has an efficient or even closed-form solution:

  • Heteroscedastic regression: Each block update is reduced to weighted least-squares for regression coefficients, and a minorized quadratic in the dispersion parameters, with closed-form updates, dramatically reducing per-iteration cost compared to Newton's method (Nguyen et al., 2016).
  • Distributed/parallel architectures: Separable surrogates permit deployment across clusters or workers, with communications limited to reduction of aggregated statistics for each block (Nguyen et al., 2016).
  • High-dimensional/structured models: The blockwise structure allows updates within blocks (e.g., alternating updates of matrix factors in NMF or tensor decomposition), where joint optimization is intractable (Lyu et al., 2020, Lyu, 2022).
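The heteroscedastic-regression updates described above might be sketched as follows. This is a simplified illustration under the assumed model $y_i \sim N(x_i^\top \beta, e^{z_i^\top \gamma})$, not the exact algorithm of Nguyen et al. (2016): the $\beta$ block is the closed-form weighted least-squares solve, while the dispersion block, which the paper handles via a minorized quadratic, is replaced here by a plain gradient step for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 3, 2
X = rng.standard_normal((n, p))
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([1.0, -0.5, 2.0])
gamma_true = np.array([0.1, 0.3])
# y_i = x_i' beta + sigma_i eps_i with sigma_i^2 = exp(z_i' gamma)
y = X @ beta_true + np.exp(0.5 * Z @ gamma_true) * rng.standard_normal(n)

beta, gamma = np.zeros(p), np.zeros(q)
for _ in range(100):
    w = np.exp(-Z @ gamma)  # weights 1 / sigma_i^2
    # Block 1: closed-form weighted least squares for beta
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    # Block 2: gradient ascent on the log-likelihood in gamma
    # (stand-in for the paper's minorized-quadratic update)
    r2 = (y - X @ beta) ** 2
    gamma += 0.5 * Z.T @ (r2 * w - 1.0) / n
```

Each cycle thus costs one linear solve plus a few matrix-vector products, in contrast to a full Newton step on the joint parameter vector.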

Surrogate selection, step sizes (fixed or diminishing), and proximal regularization are tuned to balance per-iteration efficiency and global convergence speed.

6. Extensions to Manifold-Constrained and Stochastic Settings

Recent work extends blockwise-MM:

  • Manifold constraints: Algorithms operate on product manifolds (e.g., Grassmannian, Stiefel, or Hadamard), with surrogates incorporating geodesic properties and first-order stationarity defined via Riemannian gradients (Lopez et al., 2024, Li et al., 2023). Applications include robust PCA, subspace tracking, and CP-dictionary learning.
  • Stochastic blockwise-MM: Empirical surrogates are dynamically averaged and block-minimized within diminishing radii, permitting weakly convex or multi-convex surrogates. Convergence rates under non-i.i.d. data and dependent processes are established, relevant for online matrix/tensor factorization and online empirical risk minimization (Lyu, 2022).
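A stripped-down caricature of the stochastic setting, assuming scalar data and quadratic per-sample surrogates (not the algorithm of Lyu (2022)): the averaged surrogate's minimizer is the running mean of the data, and each block step is confined to a diminishing radius $r_t = t^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stream scalar observations y_t ~ N(3, 1); each sample contributes the
# quadratic surrogate g_t(x) = (x - y_t)^2, so the minimizer of the
# running average of surrogates is the running mean of the y_t.
x = 0.0
mean_y, count = 0.0, 0
for t in range(1, 2001):
    y_t = 3.0 + rng.standard_normal()
    count += 1
    mean_y += (y_t - mean_y) / count      # averaged-surrogate minimizer
    r_t = 1.0 / np.sqrt(t)                # diminishing trust radius
    x += np.clip(mean_y - x, -r_t, r_t)   # radius-constrained block step

assert abs(x - 3.0) < 0.2  # iterate has tracked the population minimizer
```

The diminishing radius is what permits weakly convex surrogates and dependent data: early steps move freely while late steps are increasingly conservative.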

7. Applications and Algorithmic Variants

Blockwise-MM generalizes and subsumes several canonical frameworks:

  • Block coordinate descent (BCD): recovered as a special case when the surrogates are linearizations plus a strongly convex term (Lyu et al., 2020).
  • EM and blockwise EM: When surrogates emerge from expectations over latent distributions (Lyu et al., 2020).
  • Proximal algorithms: Surrogates with proximal regularization enable extensions to composite nonsmooth problems and block-projected gradient schemes (Li et al., 2023, Lyu, 2022).
  • Heteroscedastic regression: The blockwise MM algorithm provides a scalable alternative to Newton-type methods with guaranteed monotonicity and stationarity (Nguyen et al., 2016).

Typical application domains include matrix/tensor factorizations, structured sparsity models, manifold-constrained subspace estimation, and large-scale regression.


Blockwise Minorization-Maximization algorithms furnish a versatile, theoretically grounded set of methods for block-structured nonconvex optimization, accommodating both classical Euclidean and modern manifold constraints, with rigorous monotonicity, convergence, and complexity guarantees (Lopez et al., 2024, Li et al., 2023, Lyu et al., 2020, Lyu, 2022, Nguyen et al., 2016). Practical deployments leverage blockwise separability, surrogate flexibility, and structural efficiency for scalable optimization in high-dimensional and distributed statistical problems.
