
Blockwise-MM Methods for Optimization

Updated 18 February 2026
  • Blockwise-MM methods are iterative optimization algorithms that partition variables into blocks and minimize majorizing surrogates for structured nonconvex problems.
  • They simplify subproblem structures, enable distributed computation, and effectively handle complex constraints, including manifold restrictions.
  • Acceleration techniques such as extrapolation and adaptive momentum enhance convergence speed and practical performance in blockwise-MM implementations.

Blockwise majorization-minimization (blockwise-MM) methods are a family of iterative optimization algorithms for structured nonconvex or nonsmooth problems, characterized by decomposing the variable space into blocks and, at each iteration, minimizing a majorizing surrogate function over one block while the others are held fixed. This blockwise strategy simplifies subproblem structure, allows for distributed computation, and enables effective treatment of complex constraints including manifold restrictions. The framework subsumes many well-known routines such as block coordinate descent, block proximal-point algorithms, block mirror descent, multiplicative updates, and the expectation-maximization algorithm.

1. Problem Classes and Formalisms

Blockwise-MM targets problems of the form

$$\min_{x = (x_1, \dots, x_m) \in \mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_m} F(x) := f(x) + \sum_{i=1}^m g_i(x_i)$$

where $f$ may be differentiable but nonconvex and the $g_i$ are typically proper, lower semicontinuous, possibly nonsmooth, and block-separable; the feasible sets $\mathcal{X}_i$ are closed and convex or, more generally, closed subsets of Riemannian manifolds. A blockwise-MM algorithm cyclically constructs and minimizes, for each block $i$, a majorizing convex surrogate $M_i(\cdot; x)$ for the partial objective with the other blocks fixed (Hien et al., 2021, Hien et al., 2024, Lyu et al., 2020, Li et al., 2023).

Generalization to maximization/minorization is immediate by substituting a minorizing surrogate. Extensions exist to accommodate multiconvex, constrained, nonsmooth, or manifold-valued block variables (Lopez et al., 2024, Li et al., 2023, Lyu et al., 2020).
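
As a concrete illustration of this problem class, the following sketch builds a hypothetical two-block instance with a smooth coupling term $f$ and nonsmooth, block-separable $\ell_1$ penalties as the $g_i$; all names (`A1`, `A2`, `b`, `lam`) are illustrative, not from any cited paper.

```python
import numpy as np

# A hypothetical two-block instance of the formulation above:
#   F(x1, x2) = 1/2 ||A1 x1 + A2 x2 - b||^2 + lam (||x1||_1 + ||x2||_1),
# with a smooth coupling term f and nonsmooth, block-separable g_i.

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((20, 5)), rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1

def F(x1, x2):
    f = 0.5 * np.sum((A1 @ x1 + A2 @ x2 - b) ** 2)   # coupling term f(x)
    g = lam * (np.abs(x1).sum() + np.abs(x2).sum())  # separable g_i(x_i)
    return f + g
```

Because the $g_i$ do not couple the blocks, fixing $x_2$ leaves a subproblem in $x_1$ alone whose structure (smooth plus $\ell_1$) is far simpler than the joint problem.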

2. Construction of Blockwise Surrogates

For each block $x_i$ at iteration $k$, the blockwise-MM scheme constructs a surrogate $M_i(y_i; x)$ satisfying:

  • Tangency: $M_i(x_i; x) = f(x)$,
  • Majorization: $M_i(y_i; x) \ge f(x_{<i}, y_i, x_{>i})$ for all $y_i \in \mathcal{X}_i$,
  • Convexity: $M_i(\cdot; x)$ is convex (or geodesically convex) on $\mathcal{X}_i$.

Surrogates are typically based on blockwise Taylor expansions (with or without Bregman distances), quadratic upper bounds, or Jensen-type inequalities (as in NMF), and are sometimes further regularized with strongly convex or Bregman terms to ensure sufficient descent and facilitate convergence analysis (Hien et al., 2021, Lyu et al., 2020).
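
A minimal sketch of the quadratic upper-bound surrogate mentioned above, for a single 1-D block with $f = \cos$ (a toy choice, not from the cited works): when the block gradient is $L$-smooth, $M(y; x) = f(x) + \nabla f(x)(y - x) + \tfrac{L}{2}(y - x)^2$ majorizes $f$ with equality at $y = x$.

```python
import numpy as np

# Quadratic Lipschitz surrogate for one block:
#   M(y; x) = f(x) + f'(x)(y - x) + (L/2)(y - x)^2  >=  f(y),
# with equality at y = x (tangency).  Toy example with f = cos.

f = np.cos
grad_f = lambda x: -np.sin(x)
L = 1.0                               # |cos''| <= 1, so L = 1 is valid

def surrogate(y, x):
    return f(x) + grad_f(x) * (y - x) + 0.5 * L * (y - x) ** 2

x = 0.3                               # current iterate / tangency point
y_star = x - grad_f(x) / L            # minimizer of M(., x): a gradient step
```

Minimizing the surrogate instead of $f$ itself yields the familiar gradient step with step size $1/L$, and the majorization property guarantees descent: $f(y^\star) \le M(y^\star; x) \le M(x; x) = f(x)$.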

When blocks are constrained to manifolds, surrogate construction often involves Taylor approximation along geodesics and appropriate retraction maps (Lopez et al., 2024, Li et al., 2023).

3. Blockwise Update and Algorithmic Structure

The standard blockwise-MM update for block $i$ at iteration $k$ is

$$x_i^{k+1} \in \arg\min_{y_i \in \mathcal{X}_i} M_i(y_i; x^k),$$

where $x^k$ denotes the current iterate.

A full sweep updates the blocks in a cyclic or other fixed order, setting $x^{k+1} = (x_1^{k+1}, \ldots, x_m^{k+1})$. The process admits parallelization across data blocks and can integrate inexact or approximate solutions as long as the resulting optimality gaps are summable (Lyu et al., 2020).

For surrogates equipped with trust-region constraints or diminishing radii, the update is restricted to a local neighborhood:

$$x_i^{k+1} \in \arg\min_{y_i \in \mathcal{X}_i,\; \|y_i - x_i^k\| \le r_k} M_i(y_i; x^k),$$

where $\{r_k\}$ is a sequence of radii with $\sum_k r_k = \infty$ and $\sum_k r_k^2 < \infty$ (Lyu et al., 2020).
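
For an isotropic quadratic surrogate, the trust-region subproblem has a closed form: the unconstrained gradient step, with its length clipped to the radius. A sketch under that assumption (`tr_block_step` and the radius schedule are illustrative names):

```python
import numpy as np

# Trust-region-constrained block step, assuming the isotropic quadratic
# surrogate M(y; x) = <g, y - x> + (L/2)||y - x||^2 with g the current
# block gradient.  Its minimizer over the ball ||y - x|| <= r is the
# gradient step -g/L with its length clipped to r.

def tr_block_step(x, g, L, r):
    step_len = np.linalg.norm(g) / L            # unconstrained step length
    scale = min(1.0, r / max(step_len, 1e-16))  # clip to the radius
    return x - scale * g / L

# Radii r_k = (k+1)^(-3/4) satisfy sum r_k = inf and sum r_k^2 < inf.
radii = [(k + 1) ** -0.75 for k in range(5)]
```

The schedule $r_k = (k+1)^{-3/4}$ is one simple choice meeting the divergent-sum / summable-squares condition above.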

A prototypical pseudocode for blockwise-MM is:

Initialize x^(0)
for k = 0, 1, 2, ...
    for i = 1, ..., m
        Compute surrogate M_i(y_i; x^k)
        x_i^{k+1} ← argmin_{y_i ∈ X_i} M_i(y_i; x^k)
    end
end
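
The loop above can be instantiated on a simple two-block problem. Below is a minimal runnable sketch (all names illustrative), assuming quadratic Lipschitz surrogates for each block of a least-squares coupling term, so that each blockwise argmin reduces to a gradient step with step size $1/L_i$:

```python
import numpy as np

# Blockwise-MM on f(x1, x2) = 1/2 ||A1 x1 + A2 x2 - b||^2 with quadratic
# Lipschitz surrogates per block; L_i = ||A_i||_2^2 is the Lipschitz
# constant of the block-i partial gradient, so each argmin is exact.

rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((30, 4)), rng.standard_normal((30, 4))
b = rng.standard_normal(30)
L1 = np.linalg.norm(A1, 2) ** 2
L2 = np.linalg.norm(A2, 2) ** 2

def f(x1, x2):
    return 0.5 * np.sum((A1 @ x1 + A2 @ x2 - b) ** 2)

x1, x2 = np.zeros(4), np.zeros(4)
history = [f(x1, x2)]
for k in range(200):
    r = A1 @ x1 + A2 @ x2 - b
    x1 = x1 - (A1.T @ r) / L1      # argmin of M_1(.; x^k)
    r = A1 @ x1 + A2 @ x2 - b      # refresh residual: x1 just changed
    x2 = x2 - (A2.T @ r) / L2      # argmin of M_2(.; x1^{k+1}, x2^k)
    history.append(f(x1, x2))
```

Refreshing the residual between block updates is the Gauss–Seidel ordering of the pseudocode; the majorization property makes `history` monotonically non-increasing.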

4. Acceleration: Extrapolation, Mirror Descent, and Adaptive Momentum

Acceleration of blockwise-MM is realized via Nesterov-type extrapolation, which involves a momentum term computed using previous block iterates. For block $i$:

$$y_i^k = x_i^k + \beta_i^k (x_i^k - x_i^{k-1}),$$

where $\beta_i^k$ is adaptively chosen (often with a Nesterov recurrence and backtracking) to ensure Bregman divergence control and convergence (Hien et al., 2021, Hien et al., 2024). The extrapolated point $y_i^k$ is then used as the linearization point or initial value in the blockwise minimization.
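
The Nesterov recurrence mentioned above can be sketched as follows; this generates only the coefficient schedule, while the adaptive/backtracking safeguards of the cited methods are omitted:

```python
import numpy as np

# Standard Nesterov momentum schedule:
#   t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,   beta_k = (t_k - 1) / t_{k+1}.
# The extrapolated point is y_i^k = x_i^k + beta_k * (x_i^k - x_i^{k-1}).

def nesterov_betas(n):
    t, betas = 1.0, []
    for _ in range(n):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        betas.append((t - 1.0) / t_next)
        t = t_next
    return betas

betas = nesterov_betas(50)   # starts at 0 and increases toward 1
```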

Blockwise-MM with Bregman surrogates and extrapolation—e.g., the BMME or BMMe algorithms—can be interpreted as a multi-block mirror descent, with the update for each block given by:

$$x_i^{k+1} = \arg\min_{x_i \in \mathcal{X}_i} \big\langle \nabla_i f(y_i^k, x_{-i}^k),\, x_i - y_i^k \big\rangle + D_{h_i}(x_i, y_i^k),$$

where $h_i$ is a strongly convex kernel generating a Bregman divergence and the step size is implicitly absorbed (Hien et al., 2024).
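
As a worked special case (a sketch, assuming the entropy kernel $h(x) = \sum_j x_j \log x_j$ on the positive orthant), the mirror step above has a closed multiplicative form, which is how Bregman surrogates recover multiplicative NMF-style rules:

```python
import numpy as np

# Mirror-descent block update with the entropy kernel:
#   argmin_x <g, x - y> + D_h(x, y)  has first-order condition
#   grad h(x) = grad h(y) - g, i.e. log x = log y - g, so x = y * exp(-g).

def entropic_mirror_step(y, g):
    return y * np.exp(-g)    # multiplicative update on the positive orthant

y = np.array([0.5, 1.0, 2.0])        # current (positive) block iterate
g = np.array([0.1, -0.2, 0.0])       # current block gradient
x = entropic_mirror_step(y, g)       # components with g_j < 0 grow
```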

Practical schemes use adaptive or scheduled decay of extrapolation coefficients to ensure stability and theoretical guarantees (Hien et al., 2024, Hien et al., 2021).

5. Convergence Theory and Complexity

Convergence analyses of blockwise-MM methods rely on the properties of the surrogates:

  1. Monotonicity: Each update guarantees $F(x^{k+1}) \le F(x^k)$ (non-decreasing, for maximization).
  2. Stationarity: Every limit point is a stationary (first-order critical) point of $F$.
  3. Complexity: For surrogates that are $\rho$-strongly convex and $L_g$-smooth, blockwise-MM attains an $\epsilon$-stationary point in $\widetilde{O}((1 + L_g + \rho^{-1})\epsilon^{-2})$ iterations; with trust-region or diminishing-radius strategies, the complexity is $\widetilde{O}((1 + L_g)\epsilon^{-2})$ (Lyu et al., 2020, Li et al., 2023).

Blockwise-MM with Riemannian or Euclidean prox-surrogates achieves the same $\mathcal{O}(\epsilon^{-2})$ rate up to logarithmic factors (Li et al., 2023), provided geodesically smooth surrogates and bounded sublevel sets. When block variables lie on Grassmann or Stiefel manifolds, convergence holds under geodesic convexity and majorant surplus conditions (Lopez et al., 2024, Li et al., 2023).

For nonconvex FF, convergence to critical points requires either the Kurdyka–Łojasiewicz property or sufficiently regular surrogate design (Hien et al., 2021). Distributed and parallel blockwise-MM implementations maintain these guarantees when the majorization and optimality gap properties are preserved (Nguyen et al., 2016).

The next table summarizes key convergence conditions:

| Condition | Guarantee | Reference |
| --- | --- | --- |
| Strongly convex, smooth surrogates | $\mathcal{O}(\epsilon^{-2})$ iterations | (Lyu et al., 2020) |
| Diminishing-radius trust region | $\mathcal{O}(\epsilon^{-2})$ iterations | (Lyu et al., 2020) |
| Geodesically smooth/convex surrogates | Stationary-point convergence | (Li et al., 2023) |
| KL property, bounded iterates | Global convergence | (Hien et al., 2021) |

6. Representative Algorithms and Applications

Numerous classical and modern algorithms are concrete realizations or special cases of blockwise-MM, including block coordinate descent, block proximal-point and block mirror descent methods, multiplicative updates for nonnegative matrix factorization, and the expectation-maximization algorithm.

Empirical benchmarks confirm that blockwise-MM scales well in distributed regimes, especially for large-scale regression and matrix/tensor factorization problems, with acceleration from extrapolation and trust-region strategies (Nguyen et al., 2016, Hien et al., 2021).

7. Extensions and Advanced Topics

Blockwise-MM admits several advanced adaptations:

  • Bregman Surrogates and Relative Smoothness: Allows non-Euclidean geometry and more general smoothness structures; crucial for efficient updates in penalized nonnegative matrix factorization and related problems (Hien et al., 2021).
  • Adaptive Momentum and SQUAREM: SQUAREM accelerates MM fixed-point maps via a quasi-Newton-type extrapolation, often cutting convergence time dramatically (Schifano et al., 2010).
  • Manifold Constraints: Surrogates leveraging geodesically convex models and retractions permit application to constrained subspace learning, dictionary learning, and information geometry (Li et al., 2023, Lopez et al., 2024).
  • Trust-Region Methods: Diminishing radius strategies yield strong complexity bounds even when surrogates are not uniformly strongly convex (Lyu et al., 2020).
  • Mirror-Descent Interpretation: Connects blockwise-MM with coordinate mirror descent, providing theoretical unity and justifying the utility of Bregman divergences (Hien et al., 2024).
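
The SQUAREM idea above can be sketched for a scalar fixed-point map; this follows one common "squared" steplength scheme, and variants differ in the steplength choice and safeguards, so treat it as illustrative rather than the canonical algorithm:

```python
import math

# SQUAREM-style acceleration of a fixed-point (MM) map F:
#   r = F(x) - x,  v = F(F(x)) - 2 F(x) + x,
#   alpha = -||r|| / ||v||,  x_new = x - 2*alpha*r + alpha^2 * v,
# falling back to plain F(F(x)) when the curvature term v vanishes.

def squarem_step(F, x):
    Fx, FFx = F(x), F(F(x))
    r, v = Fx - x, FFx - 2.0 * Fx + x
    if abs(v) < 1e-15:
        return FFx
    alpha = -abs(r) / abs(v)
    return x - 2.0 * alpha * r + alpha * alpha * v

F = math.cos          # toy MM map; its fixed point is ~0.739085
x = 0.5
for _ in range(6):
    x = squarem_step(F, x)
```

Plain iteration of `math.cos` contracts only linearly (rate ≈ 0.67); the extrapolated step cancels the leading error term and converges far faster.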

A plausible implication is that further intersection with adaptive optimization and Riemannian geometry will yield even broader applicability and sharper complexity guarantees.
