
Blockwise-MM Methods for Optimization

Updated 18 February 2026
  • Blockwise-MM methods are iterative optimization algorithms that partition variables into blocks and minimize majorizing surrogates for structured nonconvex problems.
  • They simplify subproblem structures, enable distributed computation, and effectively handle complex constraints, including manifold restrictions.
  • Acceleration techniques such as extrapolation and adaptive momentum enhance convergence speed and practical performance in blockwise-MM implementations.

Blockwise majorization-minimization (blockwise-MM) methods are a family of iterative optimization algorithms for structured nonconvex or nonsmooth problems, characterized by decomposing the variable space into blocks and, at each iteration, minimizing a majorizing surrogate function over one block while the others are held fixed. This blockwise strategy simplifies subproblem structure, allows for distributed computation, and enables effective treatment of complex constraints including manifold restrictions. The framework subsumes many well-known routines such as block coordinate descent, block proximal-point algorithms, block mirror descent, multiplicative updates, and the expectation-maximization algorithm.

1. Problem Classes and Formalisms

Blockwise-MM targets problems of the form

$$\min_{x = (x_1, \dots, x_m) \in \mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_m} F(x) := f(x) + \sum_{i=1}^m g_i(x_i)$$

where $f$ may be differentiable but nonconvex and the $g_i$ are typically proper, lower semicontinuous, possibly nonsmooth, and block-separable; the feasible sets $\mathcal{X}_i$ are closed and convex or, more generally, closed subsets of Riemannian manifolds. A blockwise-MM algorithm cyclically constructs and minimizes, for each block $i$, a majorizing convex surrogate $M_i(\cdot; x)$ for the partial objective with the other blocks fixed (Hien et al., 2021, Hien et al., 2024, Lyu et al., 2020, Li et al., 2023).

Generalization to maximization/minorization is immediate by substituting a minorizing surrogate. Extensions exist to accommodate multiconvex, constrained, nonsmooth, or manifold-valued block variables (Lopez et al., 2024, Li et al., 2023, Lyu et al., 2020).
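
As a concrete illustration of this problem class, the following sketch builds a hypothetical two-block instance with a smooth coupling term $f$ and nonsmooth, block-separable $\ell_1$ penalties as the $g_i$; all names (`A1`, `A2`, `b`, `lam`) are illustrative, not from any cited paper.

```python
import numpy as np

# A hypothetical two-block instance of the formulation above:
#   F(x1, x2) = 1/2 ||A1 x1 + A2 x2 - b||^2 + lam (||x1||_1 + ||x2||_1),
# with a smooth coupling term f and nonsmooth, block-separable g_i.

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((20, 5)), rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1

def F(x1, x2):
    f = 0.5 * np.sum((A1 @ x1 + A2 @ x2 - b) ** 2)   # coupling term f(x)
    g = lam * (np.abs(x1).sum() + np.abs(x2).sum())  # separable g_i(x_i)
    return f + g
```

Because the $g_i$ do not couple the blocks, fixing $x_2$ leaves a subproblem in $x_1$ alone whose structure (smooth plus $\ell_1$) is far simpler than the joint problem.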

2. Construction of Blockwise Surrogates

For each block $x_i$ at iteration $k$, the blockwise-MM scheme constructs a surrogate $M_i(y_i; x)$ satisfying:

  • Tangency: $M_i(x_i; x) = f(x)$,
  • Majorization: $M_i(y_i; x) \ge f(x_{<i}, y_i, x_{>i})$ for all $y_i \in \mathcal{X}_i$,
  • Convexity: $M_i(\cdot; x)$ is convex (or geodesically convex) on $\mathcal{X}_i$.

Surrogates are typically based on blockwise Taylor expansions (with or without Bregman distances), quadratic upper bounds, or Jensen-type inequalities (as in NMF), and are sometimes further regularized with strongly convex or Bregman terms to ensure sufficient descent and facilitate convergence analysis (Hien et al., 2021, Lyu et al., 2020).
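
A minimal sketch of the quadratic upper-bound surrogate mentioned above, for a single 1-D block with $f = \cos$ (a toy choice, not from the cited works): when the block gradient is $L$-smooth, $M(y; x) = f(x) + \nabla f(x)(y - x) + \tfrac{L}{2}(y - x)^2$ majorizes $f$ with equality at $y = x$.

```python
import numpy as np

# Quadratic Lipschitz surrogate for one block:
#   M(y; x) = f(x) + f'(x)(y - x) + (L/2)(y - x)^2  >=  f(y),
# with equality at y = x (tangency).  Toy example with f = cos.

f = np.cos
grad_f = lambda x: -np.sin(x)
L = 1.0                               # |cos''| <= 1, so L = 1 is valid

def surrogate(y, x):
    return f(x) + grad_f(x) * (y - x) + 0.5 * L * (y - x) ** 2

x = 0.3                               # current iterate / tangency point
y_star = x - grad_f(x) / L            # minimizer of M(., x): a gradient step
```

Minimizing the surrogate instead of $f$ itself yields the familiar gradient step with step size $1/L$, and the majorization property guarantees descent: $f(y^\star) \le M(y^\star; x) \le M(x; x) = f(x)$.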

When blocks are constrained to manifolds, surrogate construction often involves Taylor approximation along geodesics and appropriate retraction maps (Lopez et al., 2024, Li et al., 2023).

3. Blockwise Update and Algorithmic Structure

The standard blockwise-MM update for block $i$ at iteration $k$ is

$$x_i^{k+1} \in \arg\min_{y_i \in \mathcal{X}_i} M_i(y_i; x^k),$$

where $x^k$ denotes the current iterate.

A full sweep updates the blocks in a cyclic or other fixed order, setting $x^{k+1} = (x_1^{k+1}, \ldots, x_m^{k+1})$. The process admits parallelization across data blocks and can integrate inexact or approximate solutions as long as the resulting optimality gaps are summable (Lyu et al., 2020).

For surrogates equipped with trust-region constraints or diminishing radii, the update is restricted to a local neighborhood:

$$x_i^{k+1} \in \arg\min_{y_i \in \mathcal{X}_i,\; \|y_i - x_i^k\| \le r_k} M_i(y_i; x^k),$$

where $\{r_k\}$ is a sequence of radii with $\sum_k r_k = \infty$ and $\sum_k r_k^2 < \infty$ (Lyu et al., 2020).
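
For an isotropic quadratic surrogate, the trust-region subproblem has a closed form: the unconstrained gradient step, with its length clipped to the radius. A sketch under that assumption (`tr_block_step` and the radius schedule are illustrative names):

```python
import numpy as np

# Trust-region-constrained block step, assuming the isotropic quadratic
# surrogate M(y; x) = <g, y - x> + (L/2)||y - x||^2 with g the current
# block gradient.  Its minimizer over the ball ||y - x|| <= r is the
# gradient step -g/L with its length clipped to r.

def tr_block_step(x, g, L, r):
    step_len = np.linalg.norm(g) / L            # unconstrained step length
    scale = min(1.0, r / max(step_len, 1e-16))  # clip to the radius
    return x - scale * g / L

# Radii r_k = (k+1)^(-3/4) satisfy sum r_k = inf and sum r_k^2 < inf.
radii = [(k + 1) ** -0.75 for k in range(5)]
```

The schedule $r_k = (k+1)^{-3/4}$ is one simple choice meeting the divergent-sum / summable-squares condition above.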

A prototypical pseudocode for blockwise-MM is:

Initialize x^(0)
for k = 0, 1, 2, ...
    for i = 1, ..., m
        Compute surrogate M_i(y_i; x^k)
        x_i^{k+1} ← argmin_{y_i ∈ X_i} M_i(y_i; x^k)
    end
end
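
The loop above can be instantiated on a simple two-block problem. Below is a minimal runnable sketch (all names illustrative), assuming quadratic Lipschitz surrogates for each block of a least-squares coupling term, so that each blockwise argmin reduces to a gradient step with step size $1/L_i$:

```python
import numpy as np

# Blockwise-MM on f(x1, x2) = 1/2 ||A1 x1 + A2 x2 - b||^2 with quadratic
# Lipschitz surrogates per block; L_i = ||A_i||_2^2 is the Lipschitz
# constant of the block-i partial gradient, so each argmin is exact.

rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((30, 4)), rng.standard_normal((30, 4))
b = rng.standard_normal(30)
L1 = np.linalg.norm(A1, 2) ** 2
L2 = np.linalg.norm(A2, 2) ** 2

def f(x1, x2):
    return 0.5 * np.sum((A1 @ x1 + A2 @ x2 - b) ** 2)

x1, x2 = np.zeros(4), np.zeros(4)
history = [f(x1, x2)]
for k in range(200):
    r = A1 @ x1 + A2 @ x2 - b
    x1 = x1 - (A1.T @ r) / L1      # argmin of M_1(.; x^k)
    r = A1 @ x1 + A2 @ x2 - b      # refresh residual: x1 just changed
    x2 = x2 - (A2.T @ r) / L2      # argmin of M_2(.; x1^{k+1}, x2^k)
    history.append(f(x1, x2))
```

Refreshing the residual between block updates is the Gauss–Seidel ordering of the pseudocode; the majorization property makes `history` monotonically non-increasing.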

4. Acceleration: Extrapolation, Mirror Descent, and Adaptive Momentum

Acceleration of blockwise-MM is realized via Nesterov-type extrapolation, which involves a momentum term computed using previous block iterates. For block $i$:

$$y_i^k = x_i^k + \beta_i^k (x_i^k - x_i^{k-1}),$$

where $\beta_i^k$ is adaptively chosen (often with a Nesterov recurrence and backtracking) to ensure Bregman divergence control and convergence (Hien et al., 2021, Hien et al., 2024). The extrapolated point $y_i^k$ is then used as the linearization point or initial value in the blockwise minimization.
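
The Nesterov recurrence mentioned above can be sketched as follows; this generates only the coefficient schedule, while the adaptive/backtracking safeguards of the cited methods are omitted:

```python
import numpy as np

# Standard Nesterov momentum schedule:
#   t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,   beta_k = (t_k - 1) / t_{k+1}.
# The extrapolated point is y_i^k = x_i^k + beta_k * (x_i^k - x_i^{k-1}).

def nesterov_betas(n):
    t, betas = 1.0, []
    for _ in range(n):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        betas.append((t - 1.0) / t_next)
        t = t_next
    return betas

betas = nesterov_betas(50)   # starts at 0 and increases toward 1
```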

Blockwise-MM with Bregman surrogates and extrapolation—e.g., the BMME or BMMe algorithms—can be interpreted as a multi-block mirror descent, with the update for each block given by:

$$x_i^{k+1} = \arg\min_{x_i \in \mathcal{X}_i} \big\langle \nabla_i f(y_i^k, x_{-i}^k),\, x_i - y_i^k \big\rangle + D_{h_i}(x_i, y_i^k),$$

where $h_i$ is a strongly convex kernel generating a Bregman divergence and the step size is implicitly absorbed (Hien et al., 2024).
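
As a worked special case (a sketch, assuming the entropy kernel $h(x) = \sum_j x_j \log x_j$ on the positive orthant), the mirror step above has a closed multiplicative form, which is how Bregman surrogates recover multiplicative NMF-style rules:

```python
import numpy as np

# Mirror-descent block update with the entropy kernel:
#   argmin_x <g, x - y> + D_h(x, y)  has first-order condition
#   grad h(x) = grad h(y) - g, i.e. log x = log y - g, so x = y * exp(-g).

def entropic_mirror_step(y, g):
    return y * np.exp(-g)    # multiplicative update on the positive orthant

y = np.array([0.5, 1.0, 2.0])        # current (positive) block iterate
g = np.array([0.1, -0.2, 0.0])       # current block gradient
x = entropic_mirror_step(y, g)       # components with g_j < 0 grow
```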

Practical schemes use adaptive or scheduled decay of extrapolation coefficients to ensure stability and theoretical guarantees (Hien et al., 2024, Hien et al., 2021).

5. Convergence Theory and Complexity

Convergence analyses of blockwise-MM methods rely on the properties of the surrogates:

  1. Monotonicity: Each update guarantees $F(x^{k+1}) \le F(x^k)$ (non-decreasing, for maximization).
  2. Stationarity: Every limit point is a stationary (first-order critical) point of $F$.
  3. Complexity: For surrogates that are $\rho$-strongly convex and $L_g$-smooth, blockwise-MM attains an $\epsilon$-stationary point in $\widetilde{O}((1 + L_g + \rho^{-1})\epsilon^{-2})$ iterations; with trust-region or diminishing-radius strategies, the complexity is $\widetilde{O}((1 + L_g)\epsilon^{-2})$ (Lyu et al., 2020, Li et al., 2023).

Blockwise-MM with Riemannian or Euclidean prox-surrogates achieves the same $\mathcal{O}(\epsilon^{-2})$ rate up to logarithmic factors (Li et al., 2023), provided geodesically smooth surrogates and bounded sublevel sets. When block variables lie on Grassmann or Stiefel manifolds, convergence holds under geodesic convexity and majorant surplus conditions (Lopez et al., 2024, Li et al., 2023).

For nonconvex FF, convergence to critical points requires either the Kurdyka–Łojasiewicz property or sufficiently regular surrogate design (Hien et al., 2021). Distributed and parallel blockwise-MM implementations maintain these guarantees when the majorization and optimality gap properties are preserved (Nguyen et al., 2016).

The next table summarizes key convergence conditions:

| Condition | Guarantee | Reference |
| --- | --- | --- |
| Strongly convex, smooth surrogates | $\mathcal{O}(\epsilon^{-2})$ iterations | (Lyu et al., 2020) |
| Diminishing-radius trust region | $\mathcal{O}(\epsilon^{-2})$ iterations | (Lyu et al., 2020) |
| Geodesically smooth/convex surrogates | Stationary-point convergence | (Li et al., 2023) |
| KL property, bounded iterates | Global convergence | (Hien et al., 2021) |

6. Representative Algorithms and Applications

Numerous classical and modern algorithms are concrete realizations or special cases of blockwise-MM, including block coordinate descent, block proximal-point and block mirror descent methods, multiplicative updates for nonnegative matrix factorization, and the expectation-maximization algorithm.

Empirical benchmarks confirm that blockwise-MM scales well in distributed regimes, especially for large-scale regression and matrix/tensor factorization problems, with acceleration from extrapolation and trust-region strategies (Nguyen et al., 2016, Hien et al., 2021).

7. Extensions and Advanced Topics

Blockwise-MM admits several advanced adaptations:

  • Bregman Surrogates and Relative Smoothness: Allows non-Euclidean geometry and more general smoothness structures; crucial for efficient updates in penalized nonnegative matrix factorization and related problems (Hien et al., 2021).
  • Adaptive Momentum and SQUAREM: SQUAREM accelerates MM fixed-point maps via a quasi-Newton-type extrapolation, often cutting convergence time dramatically (Schifano et al., 2010).
  • Manifold Constraints: Surrogates leveraging geodesically convex models and retractions permit application to constrained subspace learning, dictionary learning, and information geometry (Li et al., 2023, Lopez et al., 2024).
  • Trust-Region Methods: Diminishing radius strategies yield strong complexity bounds even when surrogates are not uniformly strongly convex (Lyu et al., 2020).
  • Mirror-Descent Interpretation: Connects blockwise-MM with coordinate mirror descent, providing theoretical unity and justifying the utility of Bregman divergences (Hien et al., 2024).
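
The SQUAREM idea above can be sketched for a scalar fixed-point map; this follows one common "squared" steplength scheme, and variants differ in the steplength choice and safeguards, so treat it as illustrative rather than the canonical algorithm:

```python
import math

# SQUAREM-style acceleration of a fixed-point (MM) map F:
#   r = F(x) - x,  v = F(F(x)) - 2 F(x) + x,
#   alpha = -||r|| / ||v||,  x_new = x - 2*alpha*r + alpha^2 * v,
# falling back to plain F(F(x)) when the curvature term v vanishes.

def squarem_step(F, x):
    Fx, FFx = F(x), F(F(x))
    r, v = Fx - x, FFx - 2.0 * Fx + x
    if abs(v) < 1e-15:
        return FFx
    alpha = -abs(r) / abs(v)
    return x - 2.0 * alpha * r + alpha * alpha * v

F = math.cos          # toy MM map; its fixed point is ~0.739085
x = 0.5
for _ in range(6):
    x = squarem_step(F, x)
```

Plain iteration of `math.cos` contracts only linearly (rate ≈ 0.67); the extrapolated step cancels the leading error term and converges far faster.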

A plausible implication is that further intersection with adaptive optimization and Riemannian geometry will yield even broader applicability and sharper complexity guarantees.
