
Nonlinear Matrix Decompositions (NMD)

Updated 22 December 2025
  • Nonlinear Matrix Decompositions are methods that approximate a data matrix by applying nonlinear functions to low-rank factors, enabling enhanced modeling for sparse and nonnegative data.
  • The approach employs various algorithmic paradigms such as block coordinate descent, ADMM, and gradient-based methods, which improve convergence and computational efficiency.
  • Empirical studies show that NMD techniques yield lower reconstruction errors and better memory efficiency than traditional linear SVD methods in diverse applications.

Nonlinear Matrix Decompositions (NMD) generalize classical low-rank matrix approximations by introducing nonlinearity between the latent factors and the data reconstruction. In the canonical setting, given a target rank $r$ and a data matrix $X \in \mathbb{R}^{m \times n}$, NMD seeks factors $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. This abstraction subsumes models with diverse application domains, including data compression, manifold learning, matrix completion, nonnegative and structured data modeling, and dynamical system analysis. Recent years have seen focused methodological development, particularly for nonlinearities related to the ReLU (rectified linear unit), radial basis functions, and broader activation functions, as well as algorithmic advances facilitating large-scale and robust computation.

1. Mathematical Frameworks for NMD

The defining property of NMD is the incorporation of a nonlinear function $f$ in the reconstruction: $\min_{W, H} \; L\big(X, f(WH)\big)$, where $L$ is a loss, usually $\ell_2$ (Frobenius), $\ell_1$, or a divergence (e.g., KL) (Awari et al., 19 Dec 2025, Seraghiti et al., 2023). Choices for $f$ commonly found in the literature include:

  • ReLU: $f(x) = \max(0, x)$, suited for nonnegative and sparse matrices.
  • Elementwise square: $f(x) = x^2$, applicable to probabilistic and circuit representations.
  • Bounded transform (MinMax): $f(x) = \min(b, \max(a, x))$, as in recommender systems.
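As a concrete check of the canonical model, the relative reconstruction error of a ReLU factorization can be evaluated directly from the factors. A minimal NumPy sketch with arbitrary illustrative sizes (all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, r = 50, 40, 5

# A sparse, nonnegative target generated by the ReLU model itself.
W_true = rng.normal(size=(m, r))
H_true = rng.normal(size=(r, n))
X = np.maximum(0, W_true @ H_true)   # f(WH) with f = ReLU

def relu_nmd_error(X, W, H):
    """Relative Frobenius error ||X - max(0, WH)||_F / ||X||_F."""
    return np.linalg.norm(X - np.maximum(0, W @ H)) / np.linalg.norm(X)

# The generating factors reconstruct X exactly.
print(relu_nmd_error(X, W_true, H_true))  # → 0.0
```

Note that roughly half the entries of such an $X$ are exact zeros, which is precisely the sparse nonnegative regime where ReLU-NMD is reported to outperform linear low-rank models.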

An alternative family represents each matrix component with a parameterized nonlinear kernel, e.g., radial basis function (RBF) kernels,

$[\Phi_k]_{ij} = \varphi(a_{k,i}, b_{k,j})$

and

$X \approx \sum_{k=1}^{r} \Phi_k,$

where $\varphi$ can be Gaussian, multiquadric, etc. (Rebrova et al., 2021).
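For instance, with a Gaussian kernel $\varphi(a, b) = \exp(-(a-b)^2 / (2\sigma^2))$, each component $\Phi_k$ is built from one parameter vector per matrix dimension. A hedged sketch (sizes and $\sigma$ are arbitrary choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, sigma = 60, 50, 3, 1.0

# One parameter vector a_k (length m) and b_k (length n) per component.
A = rng.normal(size=(r, m))
B = rng.normal(size=(r, n))

def gaussian_component(a, b, sigma):
    """[Phi]_{ij} = exp(-(a_i - b_j)^2 / (2 sigma^2))."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * sigma**2))

# X ≈ sum_k Phi_k: the full approximation from r(m + n) parameters.
X_hat = sum(gaussian_component(A[k], B[k], sigma) for k in range(r))
```

Each $\Phi_k$ is dense but described by only $m + n$ numbers, which is the source of the memory savings reported for RBF-NMD.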

Formulations may be further modified to address explicit rank constraints ($\operatorname{rank}(WH) = r$), regularization (nuclear norm, Tikhonov), or robustness to missing or outlying data (Seraghiti et al., 2023, Wang et al., 2024).

2. Principal Algorithmic Paradigms

A variety of iterative optimization methods are proposed for NMD, each tailored to the structure induced by the chosen nonlinearity.

Block Coordinate and Alternating Minimization

Block coordinate descent (BCD) schemes exploit the separable structure of auxiliary variables:

  • Two-block alternation: update the nonlinear latent variable $Z$ and the (possibly low-rank) matrix $\Theta$ in turn, e.g., for ReLU models.
  • Three-block factorization: parameterize $\Theta = WH$ and alternately update $Z$, $W$, and $H$ through projection and least squares (Seraghiti et al., 2023, Gillis et al., 31 Mar 2025).
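A minimal sketch of the three-block scheme for the ReLU model, assuming the latent-variable formulation $\min \|Z - WH\|_F^2$ with $Z$ fixed to $X$ on its support and nonpositive elsewhere (iteration counts and initialization are illustrative, not tuned):

```python
import numpy as np

def three_block_relu_nmd(X, r, iters=300, seed=0):
    """Sketch of three-block BCD for ReLU-NMD:
    min ||Z - WH||_F^2  s.t.  Z = X where X > 0,  Z <= 0 where X == 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.normal(size=(m, r))
    H = rng.normal(size=(r, n))
    pos = X > 0
    for _ in range(iters):
        # Z-update: keep X on its support, project WH to <= 0 elsewhere.
        Z = np.where(pos, X, np.minimum(0.0, W @ H))
        # W- and H-updates: plain least-squares solves.
        W = np.linalg.lstsq(H.T, Z.T, rcond=None)[0].T
        H = np.linalg.lstsq(W, Z, rcond=None)[0]
    return W, H

# On data exactly generated by the ReLU model, the scheme recovers X well.
rng = np.random.default_rng(1)
X = np.maximum(0, rng.normal(size=(40, 4)) @ rng.normal(size=(4, 30)))
W, H = three_block_relu_nmd(X, r=4)
rel_err = np.linalg.norm(X - np.maximum(0, W @ H)) / np.linalg.norm(X)
```

Each sweep costs $O(mnr)$, dominated by the matrix products and the two least-squares solves against $r \times r$ systems.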

Extrapolation and Momentum

Adaptive extrapolation (e.g., Nesterov-type acceleration, blockwise positive/negative momentum) is incorporated to speed up convergence of the block updates (Wang et al., 2024).
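The core extrapolation step is the same across these variants: take the new iterate, then step further along the direction of the last change. A one-line sketch (the momentum parameter β and its adaptation rule differ per method):

```python
import numpy as np

def extrapolate(W, W_prev, beta):
    """Momentum/extrapolation step applied between block updates.
    beta > 0 gives positive (accelerating) momentum; some schemes use
    beta < 0 (negative momentum) on selected blocks for stability."""
    return W + beta * (W - W_prev)

W_prev = np.zeros((3, 2))
W = np.ones((3, 2))
print(extrapolate(W, W_prev, beta=0.5))  # every entry is 1.5
```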

Proximal and ADMM Approaches

Alternating Direction Method of Multipliers (ADMM) approaches handle a wide array of nonlinearities and losses:

  • Introduce an auxiliary variable $Z = f(WH)$.
  • Minimize the augmented Lagrangian with respect to $Z$, $W$, $H$, and the dual variable $Y$ in an alternating scheme (Awari et al., 19 Dec 2025).
  • Proximal updates for $Z$ can be computed in closed form for typical choices of $f$ (ReLU, square, MinMax) and $L$ (Frobenius, $\ell_1$, KL divergence).
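As an illustration of such a closed-form update, consider the scalar subproblem for the ReLU nonlinearity under a Frobenius loss, $\operatorname{argmin}_z\, (x - \max(0,z))^2 + (\rho/2)(z - v)^2$: the objective is a convex quadratic on each sign branch, so the prox compares the two branch minimizers. This sketches the general mechanism; the exact subproblem scaling in a given ADMM splitting may differ:

```python
import numpy as np

def prox_relu_frobenius(x, v, rho):
    """argmin_z (x - max(0, z))^2 + (rho / 2) * (z - v)^2,
    solved by comparing the minimizer on z >= 0 with the one on z <= 0."""
    z_pos = np.maximum(0.0, (2 * x + rho * v) / (2 + rho))  # branch z >= 0
    z_neg = np.minimum(0.0, v)                              # branch z <= 0
    obj = lambda z: (x - np.maximum(0.0, z)) ** 2 + 0.5 * rho * (z - v) ** 2
    return np.where(obj(z_pos) <= obj(z_neg), z_pos, z_neg)

# Vectorized over whole matrices, so the Z-update stays O(mn).
print(prox_relu_frobenius(1.0, -1.0, 1.5))  # picks the negative branch: -1.0
```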

Gradient-based RBF Decomposition

Gradient descent and stochastic variants (e.g., Adam) are employed for RBF-based NMD:

  • Each parameter vector in the RBF kernel is updated via backpropagated gradients of the Frobenius loss (Rebrova et al., 2021).
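For a single Gaussian component, these gradients can be written out by hand. A self-contained sketch with hand-derived gradients (learning rate, sizes, and σ are arbitrary; published implementations use Adam-style optimizers rather than plain gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma, lr = 30, 25, 1.0, 0.002

# Target generated by a Gaussian RBF component with hidden parameters.
a_true, b_true = rng.normal(size=m), rng.normal(size=n)
model = lambda a, b: np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * sigma**2))
X = model(a_true, b_true)

# Gradient descent on ||X - Phi(a, b)||_F^2 from a random start.
a, b = rng.normal(size=m), rng.normal(size=n)
loss0 = np.linalg.norm(X - model(a, b)) ** 2
for _ in range(1000):
    Phi = model(a, b)
    R = Phi - X                      # residual
    D = a[:, None] - b[None, :]      # pairwise differences a_i - b_j
    G = R * Phi * D / sigma**2       # shared chain-rule factor
    a -= lr * (-2 * G.sum(axis=1))   # dL/da_i = -2 sum_j G_ij
    b -= lr * (2 * G.sum(axis=0))    # dL/db_j = +2 sum_i G_ij
loss = np.linalg.norm(X - model(a, b)) ** 2
```

Because the loss landscape is nonconvex, a run like this is typically repeated from several random starts, keeping the best result.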

3. Theoretical Properties and Well-Posedness

NMD problems are generally nonconvex and, for many nonlinearities (notably ReLU), nonsmooth. The following properties are established:

  • For ReLU-NMD, the direct formulation and the latent-variable formulation (with $Z$) can yield different $\Theta$; the latent formulation may be ill-posed even when the original is not, as shown via explicit matrix examples (Gillis et al., 31 Mar 2025).
  • BCD and ADMM schemes for factorized three-block models have convergence guarantees under the boundedness of iterates and mild regularity assumptions (Kurdyka–Łojasiewicz property, subanalyticity) (Wang et al., 2024, Gillis et al., 31 Mar 2025, Awari et al., 19 Dec 2025).
  • Momentum-accelerated algorithms can be globally convergent to critical points, and blockwise use of positive or negative momentum enhances stability and acceleration (Wang et al., 2024).
  • For RBF decompositions, the non-convex loss landscape typically results in multiple critical points—many random restarts are often used in practice to find a near-optimal solution (Rebrova et al., 2021).

4. Initialization Strategies and Computational Considerations

Initialization is critical due to the nonconvexity of NMD.

  • Nuclear-norm minimization under affine constraints is used for ReLU-based problems: solve $\min_\Theta \|\Theta\|_*$ subject to $\Theta_{ij} = X_{ij}$ for $X_{ij} > 0$ and $\Theta_{ij} \le 0$ for $X_{ij} = 0$, followed by rank-$r$ SVD truncation (Seraghiti et al., 2023).
  • For RBF-NMD, small random initializations of parameter vectors are employed, with parallel random restarts (Rebrova et al., 2021).
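The nuclear-norm initializer for ReLU-based problems calls for a convex solver; a lightweight heuristic in the same spirit alternates between rank-$r$ truncation and the affine sign constraints. This is an illustrative approximation, not the exact initializer from the paper:

```python
import numpy as np

def relu_nmd_init(X, r, iters=50):
    """Alternating-projection heuristic: push Theta toward rank r while
    enforcing Theta = X on the support of X and Theta <= 0 off it,
    then split the result into rank-r factors W0, H0."""
    pos = X > 0
    Theta = X.copy()
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
        Theta = (U[:, :r] * s[:r]) @ Vt[:r]             # rank-r truncation
        Theta = np.where(pos, X, np.minimum(0, Theta))  # sign constraints
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    root = np.sqrt(s[:r])
    return U[:, :r] * root, root[:, None] * Vt[:r]      # W0, H0

rng = np.random.default_rng(3)
X = np.maximum(0, rng.normal(size=(30, 6)) @ rng.normal(size=(6, 20)))
W0, H0 = relu_nmd_init(X, r=6)
```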

Per-iteration complexities:

  • For three-block NMD (with the factorized model), each iteration scales as $O(mnr)$ (Seraghiti et al., 2023).
  • For ADMM, the total per-iteration cost is $O(mnr + r^3 + mn)$, dominated by matrix multiplications and small matrix inversions (Awari et al., 19 Dec 2025).
  • For RBF decompositions, parameter storage is $O(r(m+n))$, with a potential 2–6× memory reduction versus a linear SVD at matched error (Rebrova et al., 2021).
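The storage claim is easy to quantify: the factors cost $r(m+n)$ numbers against $mn$ for the dense matrix. With hypothetical sizes:

```python
# Hypothetical sizes; any m, n >> r gives a similar picture.
m, n, r = 10_000, 5_000, 32

dense = m * n              # entries in the full matrix
factored = r * (m + n)     # entries in W and H (or RBF parameter vectors)
print(f"compression ratio: {dense / factored:.1f}x")  # → 104.2x
```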

5. Empirical Performance and Comparative Studies

Extensive experiments benchmark NMD schemes across synthetic, image, text, and kernel data:

  • ReLU-NMD (including accelerated, three-block, and momentum schemes) consistently outperforms linear SVD and TSVD in reconstructing sparse nonnegative data, with reductions in error and computational time (Seraghiti et al., 2023, Gillis et al., 31 Mar 2025, Wang et al., 2024).
  • In image data (e.g., MNIST, CBCL faces), three-block NMD and momentum-accelerated NMD provide the best memory–accuracy trade-offs; NMF basis compression via NMD achieves lower error than TSVD (Seraghiti et al., 2023, Wang et al., 2024).
  • RBF-based NMD yields a 2–6× reduction in memory relative to SVD at fixed error across Gaussian, graph, and kernel matrices, and outperforms SVD visually and quantitatively for edge preservation in images (Rebrova et al., 2021).
  • ADMM-based NMD unifies diverse loss/nonlinearity combinations, achieves lower error and greater robustness to outliers and Poisson noise than classical weighted low-rank approximation, and runs up to eight times faster than existing coordinate-descent alternatives on benchmark datasets (Awari et al., 19 Dec 2025).

Sample Table: Comparative Performance of ReLU-NMD Algorithms on MNIST ($r = 40$) (Wang et al., 2024):

Method            | Relative Error (Tol) | Time (s)
EM-NMD            | 0.120                | 20.0
A-EM              | 0.105                | 20.0
3B-NMD            | 0.085                | 15.2
A-NMD             | 0.078                | 18.1
NMD-TM (momentum) | 0.080                | 12.5

6. Model Variants and Extensions

NMD models are not restricted to the Frobenius objective or standard nonlinearities:

  • Loss variants: $\ell_1$ and KL divergence adapt NMD to robust and probabilistic settings (Awari et al., 19 Dec 2025).
  • Nonlinearities: the ADMM-based approach can accommodate arbitrary entrywise functions (e.g., softplus, modulus, Huber, logistic sigmoid) as long as scalar proximal updates are available (Awari et al., 19 Dec 2025).
  • Tensor decompositions: CP and Tucker analogues under ReLU and other nonlinearities are suggested as future work (Wang et al., 2024).
  • RBF-NMD offers a proximity-based intrinsic geometry, beneficial for manifold learning and unsupervised structure discovery (Rebrova et al., 2021).

7. Application Domains

NMD methods are deployed across a range of scientific and engineering domains:

  • Data compression: compact encoding of sparse or nonnegative matrices, often used for large-scale vision or text datasets (Seraghiti et al., 2023, Rebrova et al., 2021).
  • Matrix completion: robust handling of missing-not-at-random (MNAR) entries, with ReLU sampling offering superior recovery in challenging regimes (Gillis et al., 31 Mar 2025).
  • Recommender systems: bounded nonlinearities (MinMax) provide a natural fit for preference matrices with explicit upper/lower limits (Awari et al., 19 Dec 2025).
  • Robust PCA and circuit representations: square nonlinearity supports modeling of physical systems with quadratic activations (Awari et al., 19 Dec 2025).
  • Power systems: the acronym NMD also appears as Nonlinear Modal Decoupling, a technique for dynamical stability analysis via coordinate transformation and Lyapunov-theoretic analysis; it is unrelated to matrix factorization and instead concerns modal transformation of ODE systems (Wang et al., 2018).

In summary, Nonlinear Matrix Decomposition encompasses a family of models and algorithms with demonstrated superiority over linear counterparts for structured data modeling, robust compression, and recovery in the presence of nonlinearity or sparsity. The field remains active, especially in adapting flexible optimization schemes (e.g., ADMM) for broader classes of nonlinearities and losses, integrating acceleration strategies, and extending theory on global and local optimality.
