Elastic Tensor Factorization
- Elastic tensor factorization is a method that integrates CP/PARAFAC decomposition with elastic net regularization to enforce both sparsity and low-rank constraints.
- It employs block coordinate descent and stochastic optimization to efficiently balance data fidelity with regularization, enabling simultaneous model selection.
- Empirical evaluations demonstrate high recovery scores and improved clustering accuracy in synthetic, imaging, and clinical datasets compared to standard approaches.
Elastic tensor factorization refers to a class of tensor factorization methods that incorporate both sparsity and low-rank structure via elastic net–type regularization, enabling simultaneous model selection on both the rank and factor sparsity of tensor decompositions. This approach is notably instantiated in the context of the CANDECOMP/PARAFAC (CP) model, where the combination of $\ell_1$ and $\ell_2$ regularization on the factor matrices results in robust recovery even under noise and missing data, as well as controllable trade-offs between sparsity, rank, and reconstruction error (Shi et al., 2017).
1. The Elastic Net Regularized PARAFAC Model
The canonical model underlying elastic tensor factorization is the CP/PARAFAC decomposition of an $N$-way tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ as

$$\mathcal{X} \approx \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)},$$

where $A^{(n)} = [a_1^{(n)}, \ldots, a_R^{(n)}] \in \mathbb{R}^{I_n \times R}$ and $a_r^{(n)}$ is the $r$-th column of the mode-$n$ factor matrix. The elastic tensor factorization framework extends this model by assuming access to (possibly incomplete and noisy) observations $\mathcal{Y}$ masked by a binary tensor $\mathcal{W}$, and estimating the factor matrices by minimizing a penalized loss with elastic net regularization:

$$\min_{\{A^{(n)}\}} \; \frac{1}{2}\Big\| \mathcal{W} \ast \Big( \mathcal{Y} - \sum_{r=1}^{R} a_r^{(1)} \circ \cdots \circ a_r^{(N)} \Big) \Big\|_F^2 + \mu \sum_{n=1}^{N} \Big( \alpha \|A^{(n)}\|_1 + \frac{1-\alpha}{2} \|A^{(n)}\|_F^2 \Big),$$

where $\mu > 0$ trades off regularization and data fidelity, and $\alpha \in [0, 1]$ tunes between sparsity ($\ell_1$) and ridge ($\ell_2$) penalties (Shi et al., 2017).
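As a concrete reading of the penalized objective, the following NumPy sketch evaluates the masked elastic net loss for a 3-way CP model; the function names and the restriction to three modes are illustrative assumptions, not part of the source.

```python
import numpy as np

def cp_reconstruct(factors):
    # Rebuild a 3-way tensor from CP factor matrices (each of shape I_n x R).
    A1, A2, A3 = factors
    return np.einsum('ir,jr,kr->ijk', A1, A2, A3)

def elastic_cp_loss(Y, W, factors, mu, alpha):
    # Masked data-fit term plus elastic net penalty on every factor matrix,
    # mirroring the penalized CP objective.
    resid = W * (Y - cp_reconstruct(factors))
    fit = 0.5 * np.sum(resid ** 2)
    penalty = sum(alpha * np.abs(A).sum() + 0.5 * (1.0 - alpha) * np.sum(A ** 2)
                  for A in factors)
    return fit + mu * penalty
```

With $\mu = 0$ the loss reduces to the masked least-squares fit; with $\alpha = 1$ the penalty is the total $\ell_1$ norm of the factors.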
2. Bayesian Formulation and Model Selection
The elastic tensor factorization objective arises as the MAP estimator under a Gaussian likelihood for the observed entries and a factor-column prior given by a product of zero-mean Gaussian and Laplace distributions. The trace constraint is imposed across modes for identifiability, and the elastic net controls the effective tensor rank and the sparsity pattern within and across factors. As the regularization parameter $\mu$ increases, the estimated tensor rank decreases and the factor matrices become sparser, providing a path for simultaneous model selection via regularization.
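The MAP correspondence can be sketched as follows; the noise variance $\sigma^2$ and the exact folding of constants into $\mu$ and $\alpha$ are assumptions for illustration, not taken from the source.

```latex
% Gaussian likelihood on observed entries, Gaussian x Laplace prior per factor:
%   p(Y | {A})  ∝ exp( -||W * (Y - X({A}))||_F^2 / (2 σ^2) )
%   p(A^{(n)})  ∝ exp( -λ_1 ||A^{(n)}||_1 - (λ_2 / 2) ||A^{(n)}||_F^2 )
% Negative log-posterior (up to additive constants):
-\log p\big(\{A^{(n)}\} \mid \mathcal{Y}\big)
  = \frac{1}{2\sigma^2}\Big\|\mathcal{W} \ast \big(\mathcal{Y} - \mathcal{X}(\{A^{(n)}\})\big)\Big\|_F^2
  + \sum_{n} \Big( \lambda_1 \|A^{(n)}\|_1 + \frac{\lambda_2}{2} \|A^{(n)}\|_F^2 \Big)
  + \mathrm{const},
% which matches the elastic net objective under
%   μ α = σ^2 λ_1   and   μ (1 - α) = σ^2 λ_2 .
```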
A key feature is the solution-path strategy: by computing solutions for a grid of $\mu$ values, one tracks the stabilization of both rank and sparsity patterns. When a plausible configuration emerges, a final sparsity-constrained optimization with small regularization is performed to minimize the least-squares (LS) error (Shi et al., 2017).
3. Algorithms: Block Coordinate and Stochastic Optimization
The principal algorithmic approaches are block coordinate descent (BCD) and its stochastic variant. For each rank-$r$ component and each mode-$n$ factor, the BCD update involves solving

$$a_r^{(n)} \leftarrow \arg\min_{a} \; \frac{1}{2} a^{\top} Q a - b^{\top} a + \mu\alpha \|a\|_1 + \frac{\mu(1-\alpha)}{2} \|a\|_2^2,$$

where $Q$ and $b$ are derived from the current residual, mask, and Khatri–Rao products of the other factors. The stochastic BCD incorporates mini-batch sampling and Adamax-style adaptive steps to scale to large tensors, and convergence to a critical point is established for the full BCD under standard assumptions. The stochastic variant is heuristic but effective in practice for large-scale data.
The per-iteration complexity scales with the number of observed tensor entries and the rank $R$, making stochastic sampling advantageous for high-dimensional tensors (Shi et al., 2017).
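A minimal sketch of the per-column subproblem solver is given below, assuming the quadratic term $Q$ (PSD with strictly positive diagonal) and linear term $b$ have already been assembled from the residual, mask, and Khatri–Rao products; the function names and the cyclic coordinate-descent inner loop are illustrative choices, not the paper's exact routine.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1 (elementwise soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def elastic_net_bcd_column(Q, b, a0, mu, alpha, sweeps=100):
    # Cyclic coordinate descent for
    #   min_a 0.5 a'Qa - b'a + mu*alpha*||a||_1 + 0.5*mu*(1-alpha)*||a||^2,
    # assuming Q is PSD with a strictly positive diagonal.
    a = a0.astype(float).copy()
    for _ in range(sweeps):
        for i in range(a.size):
            # Partial residual with coordinate i removed.
            r = b[i] - Q[i] @ a + Q[i, i] * a[i]
            a[i] = soft_threshold(r, mu * alpha) / (Q[i, i] + mu * (1.0 - alpha))
    return a
```

Each coordinate update is the exact minimizer of the one-dimensional elastic-net subproblem, so the iteration decreases the objective monotonically.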
4. Empirical Evaluation and Applications
Elastic tensor factorization demonstrates robust recovery and model selection in synthetic and real data contexts. Key experimental findings include:
- On synthetic low-rank tensors at an SNR of 20 dB, the elastic net solution achieves high average recovery scores with negligible excess error versus standard CP–ALS and LRTI.
- In the presence of missing data, the elastic net method recovers the true rank $30/30$ times versus $10/30$ for LRTI.
- On COIL-20 images, clustering accuracy using the third-mode factors is in the $75\%$ range (elastic net) versus the $60\%$ range (CP–ALS).
- For COPDGene clinical data analysis, the best solution yielded rank $328$, relative error $0.137$, and substantial factor sparsity, successfully identifying salient clinical features (Shi et al., 2017).
5. Practical Implementation Aspects
Selection of $\alpha \in [0, 1]$ allows practitioners to trade off rank and sparsity. The regularization parameter $\mu$ is typically explored on a logarithmic scale. Initialization strategies include random Gaussian factors and the leading singular vectors of each mode unfolding, though the latter may not favor sparsity. Warm starts along the regularization path improve efficiency.
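The log-scale grid and warm-start strategy can be sketched with a stand-in single-block solver; `solve_elastic` (proximal gradient on a least-squares surrogate), its step size, and the fixed iteration count are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def solve_elastic(X, y, mu, alpha, w0, steps=500, lr=0.01):
    # Stand-in inner solver: proximal gradient for one elastic-net subproblem.
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) + mu * (1.0 - alpha) * w
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * mu * alpha, 0.0)
    return w

def regularization_path(X, y, mus, alpha):
    # Sweep mu from largest to smallest, warm-starting each solve from the
    # previous solution and tracking the active-set size (sparsity pattern).
    w = np.zeros(X.shape[1])
    path = []
    for mu in sorted(mus, reverse=True):
        w = solve_elastic(X, y, mu, alpha, w0=w)
        path.append((mu, int(np.count_nonzero(w))))
    return path
```

Tracking where the active-set size stabilizes along the path mirrors the model-selection heuristic described above.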
The mode-specific covariance matrices can, if unknown, be estimated from the data via Gram matrices of the corresponding mode unfoldings.
Stopping criteria include the relative error on held-out entries or stability of the active set (Shi et al., 2017).
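The held-out stopping criterion can be computed as below; the mask convention (ones marking validation entries that were excluded from fitting) is an assumption for illustration.

```python
import numpy as np

def heldout_relative_error(Y, Y_hat, holdout_mask):
    # Relative Frobenius error restricted to held-out (validation) entries.
    diff = holdout_mask * (Y - Y_hat)
    return np.linalg.norm(diff) / np.linalg.norm(holdout_mask * Y)
```

Fitting would stop once this quantity plateaus or begins to rise across iterations of the path.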
6. Significance, Limitations, and Research Context
The elastic net strategy provides a convexified model selection mechanism on both rank and factor sparsity within the CP/PARAFAC framework, improving robustness to noise and missing data. Convergence guarantees are available for the BCD algorithm under standard conditions. The stochastic optimization approach offers scalability, with the practical caveat that convergence guarantees do not extend to the stochastic (Adamax) variant.
A plausible implication is that the solution path approach enables identification of both low-rank and sparse structure without extensive tuning or prior knowledge of the true model complexity. However, optimal recovery is not guaranteed in every scenario, and computational cost is higher than unregularized ALS due to the non-separability of the elastic net penalty and the need for path exploration. Further, the approach is defined concretely only for PARAFAC/CP factorization and does not address other tensor network formats.
7. Relation to Other Forms of Tensor Factorization
Elastic tensor factorization is distinct from harmonic factorization of elasticity tensors, which is concerned with equivariant decompositions of fourth-order harmonic tensors into lower-order covariants for the reconstruction and analysis of the elasticity tensor in continuum mechanics (Olive et al., 2016). The elastic net–based approach is independent of tensor symmetry classes and instead operates directly on arbitrary $N$-way tensors through regularized CP decomposition (Shi et al., 2017).