Expanded AO: Enhanced Nonconvex Optimization
- Expanded AO is a nonconvex optimization method that augments classical alternating optimization with targeted subspace escapes to avoid saddle points and poor local minima.
- It integrates scaling subspaces and restricted joint search strategies, using gradient and Hessian information to improve solution quality in matrix factorization and penalized regression.
- The technique ensures monotonic descent of the objective and allows for practical, problem-specific customizations, leading to faster convergence and enhanced performance.
Expanded Alternating Optimization (Expanded AO) is a technique for enhancing the performance of classical alternating optimization (AO)—especially for nonconvex problems—by supplementing conventional blockwise minimization with targeted subspace escapes. By judiciously choosing expanded subspaces for search at each iteration, Expanded AO addresses the well-known issue of coordinate descent becoming trapped in saddle points or poor-quality local minima, leading to improved objective values and faster convergence in problems such as matrix factorization and penalized regression (Murdoch et al., 2014).
1. Fundamentals of Alternating Optimization
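Classical alternating optimization (AO), also known as blockwise coordinate descent, iteratively minimizes a multivariate nonconvex function $f(x_1, \dots, x_p)$ over one block variable at a time, keeping the others fixed. The standard AO update at iteration $t$ for block $i$ is:
$$x_i^{(t+1)} = \arg\min_{x_i} f\big(x_1^{(t+1)}, \dots, x_{i-1}^{(t+1)}, x_i, x_{i+1}^{(t)}, \dots, x_p^{(t)}\big)$$
This process cycles through all blocks until convergence. However, for nonconvex $f$, AO can become stuck at stationary points that are not local minima. For example, when the origin is a saddle point of $f$ and every blockwise minimization started there returns the origin itself, AO cannot escape once stuck.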
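The stuck-at-a-saddle behavior can be reproduced on a small toy problem. The sketch below uses an objective chosen purely for illustration (an assumption, not the example from the source): $f(x, y) = (1 - xy)^2 + \lambda(x^2 + y^2)$ with $0 < \lambda < 1$. Its Hessian at the origin has eigenvalues $2\lambda \pm 2$, so the origin is a saddle, yet each coordinate minimization started there returns 0.

```python
LAM = 0.1  # ridge weight; any 0 < LAM < 1 makes the origin a saddle

def f(x, y):
    """Toy objective: rank-1 'factorization' of the scalar 1 with a ridge penalty."""
    return (1.0 - x * y) ** 2 + LAM * (x * x + y * y)

def argmin_block(other):
    """Exact minimizer over one coordinate with the other held fixed.

    Setting df/dx = 0 gives x = y / (y^2 + LAM); f is symmetric in x and y.
    """
    return other / (other * other + LAM)

def run_ao(x, y, sweeps=50):
    """Classical AO: alternate exact coordinate minimizations."""
    for _ in range(sweeps):
        x = argmin_block(y)
        y = argmin_block(x)
    return x, y

stuck = run_ao(0.0, 0.0)   # starts at the saddle and never leaves it
moved = run_ao(0.0, 0.5)   # a different start reaches a local minimum
print("from origin:  ", stuck, f(*stuck))
print("from (0, 0.5):", moved, f(*moved))
```

Started at the origin, AO reports a fixed point even though a descent direction exists along the diagonal $x = y$; this is the situation the expanded searches are meant to address.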
2. Core Principles of Expanded AO
The central idea in Expanded AO is to augment the search directions considered by AO, temporarily optimizing over multi-dimensional subspaces that contain the original coordinate update direction but introduce additional, problem-informed directions. Two broad strategies are used:
- Scaling (Perspective Variable) Subspace: At each AO step for variable $x_i$, introduce a scalar parameter $s$ and jointly optimize:
$$\min_{x_i,\, s} f(s x_1, \dots, s x_{i-1}, x_i, s x_{i+1}, \dots, s x_p)$$
This yields a two-dimensional subspace, extending the classical coordinate line to allow for simultaneous scaling of other variables.
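- Restricted Joint Search: Select a subset of blocks and define search directions $d_1, \dots, d_p$. In a single step, solve for the optimal coefficients $\alpha_1, \dots, \alpha_p$ in:
$$\min_{\alpha_1, \dots, \alpha_p} f(x_1 + \delta_1 \alpha_1 d_1, \dots, x_p + \delta_p \alpha_p d_p)$$
where $\delta_j \in \{0, 1\}$ are binary selectors indicating which blocks participate.
Choosing the search subspace can be random or, more effectively, greedy, using problem-specific directions derived from local gradients or Hessians. The greedy choice gives faster convergence and lower objective values.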
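As a minimal sketch of a restricted joint search with a Hessian-derived direction, the code below revisits a toy objective $f(x, y) = (1 - xy)^2 + \lambda(x^2 + y^2)$, which is an illustrative assumption rather than an example from the source. Two simplifications are also assumptions: a single shared coefficient `alpha` is used for all selected blocks, and it is found by grid search rather than exact minimization. All function names are hypothetical.

```python
import math

LAM = 0.1  # ridge weight of the toy objective (illustrative assumption)

def f(z):
    x, y = z
    return (1.0 - x * y) ** 2 + LAM * (x * x + y * y)

def hessian(z):
    """Analytic Hessian of the toy objective, as ((a, b), (b, c))."""
    x, y = z
    off = 4.0 * x * y - 2.0
    return ((2.0 * y * y + 2.0 * LAM, off), (off, 2.0 * x * x + 2.0 * LAM))

def most_negative_curvature(h):
    """Smallest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    (a, b), (_, c) = h
    mu = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    if b == 0.0:
        v = (1.0, 0.0) if a <= c else (0.0, 1.0)
    else:
        v = (b, mu - a)  # satisfies (a - mu) * v0 + b * v1 = 0
    norm = math.hypot(v[0], v[1])
    return mu, (v[0] / norm, v[1] / norm)

def restricted_joint_search(z, direction, selectors, grid):
    """Grid search over z + alpha * (selector_j * direction_j).

    The current point is the initial incumbent, so the result is never worse.
    """
    best_z, best_val = tuple(z), f(z)
    for alpha in grid:
        cand = tuple(zj + alpha * sj * dj
                     for zj, sj, dj in zip(z, selectors, direction))
        val = f(cand)
        if val < best_val:
            best_z, best_val = cand, val
    return best_z, best_val

origin = (0.0, 0.0)                      # saddle where plain AO is stuck
grid = [k / 100.0 for k in range(-200, 201)]
mu, d = most_negative_curvature(hessian(origin))

# Only block x participates: equivalent to a coordinate line, no progress.
single_z, single_val = restricted_joint_search(origin, d, (1, 0), grid)
# Both blocks participate: the search follows the negative-curvature direction.
joint_z, joint_val = restricted_joint_search(origin, d, (1, 1), grid)
print(mu, single_val, joint_val)
```

With one block selected the search reduces to a coordinate line and cannot leave the saddle; selecting both blocks lets the step follow the negative-curvature direction and lower the objective.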
3. Generic Expanded AO Algorithmic Workflow
The overall workflow for Expanded AO consists of alternating standard AO cycles with expanded subspace “escape” cycles:
- Run standard AO updates to a fixed point.
- For each block, perform a scaling subspace search or a joint restricted search as above.
- If the updated point achieves an objective decrease exceeding a tolerance $\epsilon$, repeat; else, declare convergence.
Pseudocode is given in (Murdoch et al., 2014); its EscapeSteps routine cycles through scaling and/or joint searches over selected subspaces.
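The three workflow steps above can be sketched in Python as follows. This is not the pseudocode from the source; it is a minimal illustration under stated assumptions: a toy objective $f(x, y) = (1 - xy)^2 + \lambda(x^2 + y^2)$, a hypothetical `escape_steps` function that grid-searches along a fixed list of candidate directions, and a tolerance named `eps`.

```python
LAM = 0.1  # ridge weight for the toy objective (illustrative assumption)

def f(z):
    x, y = z
    return (1.0 - x * y) ** 2 + LAM * (x * x + y * y)

def ao_to_fixed_point(z, tol=1e-12, max_sweeps=1000):
    """Standard AO: exact coordinate minimizations until the objective stalls."""
    x, y = z
    prev = f((x, y))
    for _ in range(max_sweeps):
        x = y / (y * y + LAM)   # argmin over x with y fixed
        y = x / (x * x + LAM)   # argmin over y with x fixed
        cur = f((x, y))
        if prev - cur <= tol:
            break
        prev = cur
    return (x, y)

def escape_steps(z, directions, grid):
    """Hypothetical escape cycle: try every direction, keep the best point found."""
    best, best_val = z, f(z)
    for d in directions:
        for alpha in grid:
            cand = (z[0] + alpha * d[0], z[1] + alpha * d[1])
            val = f(cand)
            if val < best_val:
                best, best_val = cand, val
    return best

def expanded_ao(z0, directions, grid, eps=1e-8, max_rounds=20):
    z = ao_to_fixed_point(z0)                     # step 1: AO to a fixed point
    for _ in range(max_rounds):
        cand = escape_steps(z, directions, grid)  # step 2: expanded searches
        if f(z) - f(cand) <= eps:                 # step 3: no real decrease, stop
            return z
        z = ao_to_fixed_point(cand)               # otherwise resume AO
    return z

directions = [(1.0, 1.0), (1.0, -1.0)]            # assumed candidate directions
grid = [k / 100.0 for k in range(-200, 201)]
plain = ao_to_fixed_point((0.0, 0.0))
expanded = expanded_ao((0.0, 0.0), directions, grid)
print(f(plain), f(expanded))
```

Because the escape cycle keeps the current point as its incumbent, the objective never increases between rounds, and the loop stops as soon as an escape cycle fails to improve by more than `eps`.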
4. Applications to Matrix Factorization and Penalized Regression
Expanded AO has been concretely validated on two nonconvex optimization domains:
Matrix Factorization (MF)
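Given observed ratings $r_{ui}$ for user-item pairs $(u, i) \in \Omega$, standard AO factors the ratings matrix into user factors $P$ and item factors $Q$ with ridge regularization:
$$\min_{P, Q} \sum_{(u,i) \in \Omega} (r_{ui} - p_u^\top q_i)^2 + \lambda \big(\|P\|_F^2 + \|Q\|_F^2\big)$$
Expanded AO applies scaling steps (jointly optimizing $P$ or $Q$ and a global scalar) or greedy restricted joint updates on small subsets of user/item vectors, yielding faster convergence and lower mean absolute error (MAE).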
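A scaling step for matrix factorization can be illustrated on a tiny rank-1 problem. Everything below is an assumption made for illustration: the ratings are made up, each user and item has a single scalar factor, and the global scalar is chosen from a grid that contains 1.0 so that the ordinary AO update is always one of the candidates.

```python
# Hypothetical observed ratings: {(user, item): rating}
RATINGS = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 1.0}
LAM = 0.5  # ridge weight (assumed value)

def objective(p, q):
    """Squared error on observed entries plus ridge penalty on both factors."""
    fit = sum((r - p[u] * q[i]) ** 2 for (u, i), r in RATINGS.items())
    return fit + LAM * (sum(v * v for v in p) + sum(v * v for v in q))

def update_users(q, n_users):
    """Standard AO block update: ridge solution for every user factor."""
    num = [0.0] * n_users
    den = [LAM] * n_users
    for (u, i), r in RATINGS.items():
        num[u] += r * q[i]
        den[u] += q[i] * q[i]
    return [n / d for n, d in zip(num, den)]

def scaling_step(q, n_users, scales):
    """Jointly choose the user factors and a global scalar s applied to q."""
    best = None
    for s in scales:
        q_s = [s * v for v in q]
        p_s = update_users(q_s, n_users)   # best users for this scaled q
        val = objective(p_s, q_s)
        if best is None or val < best[0]:
            best = (val, p_s, q_s, s)
    return best

q0 = [1.0, 1.0, 1.0]
p_ao = update_users(q0, 3)
ao_val = objective(p_ao, q0)               # plain AO update (s fixed at 1)

scales = [0.25 * k for k in range(1, 13)]  # 0.25 ... 3.0, includes s = 1.0
exp_val, p_exp, q_exp, s_best = scaling_step(q0, 3, scales)
print(ao_val, exp_val, s_best)
```

Since `s = 1.0` is among the candidates, the scaling step can never do worse than the plain user update; on this toy data it selects a larger scale and reaches a lower objective.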
Penalized Regression (MC+)
In coordinate descent for MC+ regression (Minimax Concave Penalty), Expanded AO introduces joint scaling over subsets of coefficients, or selectively scales variables based on correlation thresholds. This leads to improved objective values and variable-selection accuracy, especially at grid points where coordinate descent is highly suboptimal.
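A joint scaling step over a subset of coefficients might look like the sketch below. The MC+ penalty is written in its standard closed form, $\lambda|\beta| - \beta^2/(2\gamma)$ for $|\beta| \le \gamma\lambda$ and $\gamma\lambda^2/2$ otherwise. The data, the starting coefficients, the chosen subset, and the grid of scales are all hypothetical, and the subset is supplied by hand rather than selected by a correlation threshold.

```python
def mcp_penalty(b, lam, gamma):
    """MC+ (minimax concave) penalty for a single coefficient."""
    a = abs(b)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam

def objective(beta, X, y, lam, gamma):
    """Least-squares loss (1 / 2n) * RSS plus the MC+ penalty."""
    n = len(y)
    rss = sum((yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
              for row, yi in zip(X, y))
    return rss / (2.0 * n) + sum(mcp_penalty(b, lam, gamma) for b in beta)

def scale_subset(beta, subset, X, y, lam, gamma, scales):
    """Multiply the coefficients in `subset` by a common scalar s (grid search).

    The current beta is the initial incumbent, so the objective never increases.
    """
    best_beta = list(beta)
    best_val = objective(beta, X, y, lam, gamma)
    for s in scales:
        cand = [s * b if j in subset else b for j, b in enumerate(beta)]
        val = objective(cand, X, y, lam, gamma)
        if val < best_val:
            best_beta, best_val = cand, val
    return best_beta, best_val

# Hypothetical data generated from true coefficients (2, 2, 0), no noise.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
y = [2.0, 2.0, 0.0, 4.0]
lam, gamma = 0.1, 3.0

beta0 = [1.0, 1.0, 0.0]                     # an over-shrunk starting point
start_val = objective(beta0, X, y, lam, gamma)
scales = [0.5 * k for k in range(0, 7)]     # 0.0 ... 3.0, includes 1.0
beta1, end_val = scale_subset(beta0, {0, 1}, X, y, lam, gamma, scales)
print(start_val, beta1, end_val)
```

Scaling the two active coefficients together moves both toward their true values in one step, which a single-coordinate update cannot do.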
5. Theoretical Properties and Computational Aspects
Each expanded subspace contains the current iterate (for example, a scaling factor of 1 in the scaling search), so an exactly solved expanded step cannot increase the objective, and a step is accepted only when it decreases it. This ensures monotonic descent and convergence to a stationary point. The method does not guarantee global optimality (no such guarantee exists for generic nonconvex $f$), but empirical results show significant improvements over baseline AO in both speed and final objective quality.
Computational overhead per iteration scales with the size and complexity of the chosen escape subspaces. Greedy or problem-adaptive subspaces offer a favorable trade-off between compute cost and optimization progress.
6. Empirical Performance and Observed Benefits
On the Amazon matrix factorization task, greedy restricted subspace AO reduced test MAE by up to 0.12 versus baseline and converged in a quarter of the iterations needed by random subspace updates. In MC+ simulations, selective scaling steps reduced objective values by 5% and variable-selection error by 2% for a substantial fraction of hyperparameter grid points (Murdoch et al., 2014).
7. Practical Considerations and Customization
Expanded AO is a generic method, but its greatest gains are realized when escape subspaces are customized using problem structure or data-driven heuristics. While random subspaces yield some improvement, greedy direction selection amplifies the algorithm’s advantage in both rate and final objective. The method is compatible with classical AO frameworks and is readily implementable in large-scale scientific computing and machine learning contexts.