Canonical Polyadic Decomposition (CPD)
- Canonical Polyadic Decomposition (CPD) is a tensor factorization method that expresses an N-way tensor as a sum of rank-one outer products.
- Under mild conditions its decomposition is essentially unique, enabling robust latent variable modeling and interpretable multiway feature extraction.
- CPD finds applications in signal processing, neuroscience, and machine learning, with algorithms such as alternating least squares (ALS) and damped Gauss–Newton (dGN) providing scalability and accuracy.
The Canonical Polyadic Decomposition (CPD), also known as CANDECOMP/PARAFAC, is a fundamental multilinear algebraic tool for expressing an $N$-way tensor as a sum of rank-one outer products. CPD preserves multi-way data structure, provides unique identifiability properties under mild conditions, and offers a direct means for compact feature extraction, latent variable modeling, and efficient representation of interaction structures. Its theoretical foundations, algorithmic developments, and applications span statistical machine learning, signal processing, neuroscience, chemometrics, scientific computing, and combinatorial optimization.
1. Formal Definition and Core Model
Given an $N$-way tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, a rank-$R$ CPD writes $\mathcal{X}$ as

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r,$$

where $\mathbf{a}^{(n)}_r \in \mathbb{R}^{I_n}$ and "$\circ$" is the vector outer product. The minimal such $R$ is called the tensor rank. Let $\mathbf{A}^{(n)} = [\mathbf{a}^{(n)}_1, \ldots, \mathbf{a}^{(n)}_R] \in \mathbb{R}^{I_n \times R}$ denote the factor matrices; the shorthand $\mathcal{X} = [\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!]$ is commonly used (Li et al., 2014).
Elementwise, for any entry $(i_1, i_2, \ldots, i_N)$,

$$x_{i_1 i_2 \cdots i_N} = \sum_{r=1}^{R} \prod_{n=1}^{N} a^{(n)}_{i_n r}.$$

CPD preserves the original multidimensional structure, naturally encoding mode interactions without flattening or vectorization. In the unsupervised setting, the standard fitting loss is least squares in the Frobenius norm: $\min_{\{\mathbf{A}^{(n)}\}} \big\|\mathcal{X} - [\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!]\big\|_F^2$ (Li et al., 2014, Haliassos et al., 2020).
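The definition, the elementwise formula, and the Frobenius-norm fitting loss can be made concrete in a few lines of NumPy (a minimal sketch; the dimensions and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3  # a 4 x 5 x 6 tensor of rank at most 3

# Factor matrices A^(1), A^(2), A^(3), each with R columns
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# X = sum_r a_r ∘ b_r ∘ c_r, a sum of R rank-one outer products
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Elementwise formula: x_ijk = sum_r A[i,r] * B[j,r] * C[k,r]
print(np.isclose(X[1, 2, 3], np.sum(A[1] * B[2] * C[3])))  # True

# The Frobenius-norm fitting loss vanishes for the exact decomposition
loss = np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C))**2
print(loss)  # 0.0
```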
2. Uniqueness, Identifiability, and Structural Guarantees
Essential uniqueness is a central feature of the CPD: under mild conditions, any two decompositions of minimal rank differ only by permutation and scaling of the components. For an $I \times J \times K$ tensor with CPD $[\![\mathbf{A}, \mathbf{B}, \mathbf{C}]\!]$ of rank $R$, Kruskal's condition asserts that if $k_{\mathbf{A}} + k_{\mathbf{B}} + k_{\mathbf{C}} \geq 2R + 2$ (where $k_{\mathbf{M}}$ is the k-rank of $\mathbf{M}$: the largest $k$ such that every set of $k$ columns of $\mathbf{M}$ is linearly independent), then the CPD is unique (Domanov et al., 2013, Domanov et al., 2013, Domanov et al., 2015).
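The k-rank and Kruskal's bound can be checked directly for small matrices; below is a brute-force sketch (`k_rank` is a hypothetical helper written for illustration, not a library function):

```python
import numpy as np
from itertools import combinations

def k_rank(M, tol=1e-10):
    """Largest k such that EVERY set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    for k in range(n_cols, 0, -1):
        if all(np.linalg.matrix_rank(M[:, list(c)], tol=tol) == k
               for c in combinations(range(n_cols), k)):
            return k
    return 0

rng = np.random.default_rng(1)
R = 3
A, B, C = (rng.standard_normal((d, R)) for d in (4, 5, 6))

# Generic random factors have k-rank equal to min(rows, R) = R here
kA, kB, kC = k_rank(A), k_rank(B), k_rank(C)
print(kA + kB + kC >= 2 * R + 2)  # True: 3 + 3 + 3 = 9 >= 8, so CPD is unique
```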
Recent advances have established deterministic conditions for uniqueness based on Khatri–Rao products of compound matrices, substantially generalizing classical Kruskal-type results. Specifically, conditions of the form $\mathcal{C}_m(\mathbf{A}) \odot \mathcal{C}_m(\mathbf{B})$ having full column rank (where $\mathcal{C}_m(\cdot)$ denotes the $m$-th compound matrix, for appropriate $m$) guarantee uniqueness of one or all factor matrices even if none of them has full column rank. These criteria form a constructive hierarchy, ranging from guarantees on a single factor to overall uniqueness of the decomposition (Domanov et al., 2013, Domanov et al., 2013). In higher orders $N > 3$, similar criteria propagate via mode reductions and Kruskal ranks (Phan et al., 2012).
In practical modeling, uniqueness ensures interpretability and identifiability of latent structure, crucial in blind source separation, multivariate statistics, and neuroscience.
3. Algorithms and Computational Methods
3.1 Alternating Least Squares (ALS) and Variants
The canonical approach for unconstrained CPD is alternating minimization, or ALS. Each mode-$n$ update solves a least squares subproblem on the mode-$n$ unfolding: $\min_{\mathbf{A}^{(n)}} \|\mathbf{X}_{(n)} - \mathbf{A}^{(n)} \mathbf{Z}^{(n)\,T}\|_F^2$, where $\mathbf{Z}^{(n)}$ is the Khatri–Rao product of all other factor matrices. Nonnegativity constraints are imposed as needed via projected or parametric updates. Enhanced stability is obtained with QR-based ALS (CP-ALS-QR), where each Khatri–Rao product is QR-factorized, avoiding explicit Gram matrix inversion and yielding improved numerical robustness (Xie et al., 24 Mar 2025).
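A minimal ALS sketch for a third-order tensor follows. It uses the normal-equation form of each subproblem (an MTTKRP times the pseudoinverse of a Hadamard product of Gram matrices) rather than explicit unfoldings, and omits QR stabilization, line search, and stopping rules:

```python
import numpy as np

def cp_als(X, R, n_iter=1000, seed=0):
    """Plain ALS for a third-order CPD (a minimal sketch). Each update is the
    normal-equation solution of the mode-n least-squares subproblem."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
    for _ in range(n_iter):
        # MTTKRP via einsum, then pseudoinverse of the Hadamard-product Gram
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Recover an exact rank-3 tensor from a random initialization
rng = np.random.default_rng(42)
At, Bt, Ct = (rng.standard_normal((d, 3)) for d in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', At, Bt, Ct)
A, B, C = cp_als(X, R=3)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))  # relative fitting error
```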
3.2 Nonlinear Least Squares (NLS) and Damped Gauss-Newton
High-precision CPD exploits NLS solved via (damped) Gauss–Newton (dGN) iterations. For large-scale tensors, computation and storage of the full Hessian become a bottleneck. Compression using a truncated multilinear SVD (MLSVD) allows dGN optimization on the reduced core, cutting the per-iteration cost from one governed by the full tensor dimensions to one governed by the much smaller core dimensions (Diniz, 2019). The dGN step solves the damped normal equations $(\mathbf{J}^T \mathbf{J} + \mu \mathbf{I})\,\Delta\boldsymbol{\theta} = -\mathbf{J}^T \mathbf{r}$, with structured block Jacobians $\mathbf{J}$, residual $\mathbf{r}$, and Tikhonov damping parameter $\mu$.
ALS is dominant at moderate scale; dGN is favored for its local quadratic convergence when high fit accuracy is needed.
3.3 Random Projection and Tensor Compression
Random projection-type methods compress each mode via random sketching or SVD, followed by CPD of the resulting core tensor and back-propagation to original space. Coupled random projections (CoRAP) perform several sketches, jointly decomposed (C-CPD), increasing robustness and accuracy in large-scale, high-rank, or noisy settings (Wang et al., 2021).
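A sketch of a single random projection (the CoRAP coupling of several sketches is omitted); it shows that the compressed core inherits a CPD whose factors are simply the projected originals, so a CPD of the core can be back-projected to the full tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 30, 40, 50, 4
A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Random sketching matrices, one per mode (small oversampling beyond the rank)
s = R + 2
S1, S2, S3 = (rng.standard_normal((d, s)) / np.sqrt(d) for d in (I, J, K))

# Compressed core: Y = X x1 S1^T x2 S2^T x3 S3^T
Y = np.einsum('ijk,ia,jb,kc->abc', X, S1, S2, S3)

# The core inherits the CPD structure with projected factors S_n^T A^(n)
Y_cpd = np.einsum('ar,br,cr->abc', S1.T @ A, S2.T @ B, S3.T @ C)
print(np.allclose(Y, Y_cpd))  # True
```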
Tensor network formulations (e.g., tensor train) compress higher-order CPD to a network of low-order (typically order 3) core tensors. Exact and approximate schemes relate CPD factors to TT cores, providing scalable algorithms for very high order tensors (Phan et al., 2018).
3.4 Fast High-Order and Mode-Reduction Strategies
Mode reduction accelerates high-order CPD by reshaping the $N$-way tensor into an order-3 tensor via structured unfolding (FCP (Phan et al., 2012) or MRCPD (Zhou et al., 2012)), then decomposing with efficient order-3 algorithms and back-projecting via Khatri–Rao product projection. These methods dramatically decrease iteration cost and memory footprint, allowing practical decomposition of tensors with large $N$.
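Mode reduction can be illustrated by reshaping a 5-way rank-2 tensor into an order-3 tensor whose CPD factors are Khatri–Rao products of the grouped original factors (a sketch assuming C-order reshaping; mode grouping is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
dims, R = (3, 4, 5, 6, 7), 2
factors = [rng.standard_normal((d, R)) for d in dims]

# A 5-way rank-2 tensor
X = np.einsum('ar,br,cr,dr,er->abcde', *factors)

# Mode reduction: group modes {1,2}, {3}, {4,5} into a single order-3 tensor
Y = X.reshape(3 * 4, 5, 6 * 7)

def khatri_rao(A, B):
    # Columnwise Kronecker product, shape (I*J, R), consistent with C-order reshape
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

# The reshaped tensor has a rank-2 CPD whose grouped-mode factors are
# Khatri-Rao products of the original factors
F1 = khatri_rao(factors[0], factors[1])   # (12, 2)
F3 = khatri_rao(factors[3], factors[4])   # (42, 2)
Y_cpd = np.einsum('pr,jr,qr->pjq', F1, factors[2], F3)
print(np.allclose(Y, Y_cpd))  # True
```

Decomposing the small order-3 tensor and then "un-grouping" the Khatri–Rao structured factors recovers the original five factor matrices.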
4. Supervised, Constrained, and Structured Extensions
4.1 Supervised CPD with Auxiliary Information
Although classical CPD is unsupervised, supervised variants embed categorical or auxiliary labels directly as a new tensor mode, fixing the corresponding factor matrix to a one-hot or identity matrix. This results in a single-step decomposition learning both the features and classifier map, eliminating the need for a post-hoc classifier (e.g., SVM). The supervised loss is minimized over all “free” factors, keeping the label mode fixed. The assignment for a new sample is made by projecting onto this class mode using the Moore–Penrose pseudoinverse (Li et al., 2014).
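A simplified sketch of the prediction step, assuming (for illustration only) one component per class with the class-mode factor fixed to the identity; `B`, `C`, and `classify` are hypothetical names, not the paper's notation:

```python
import numpy as np

def khatri_rao(A, B):
    # Columnwise Kronecker product, shape (I*J, R)
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

rng = np.random.default_rng(0)
channels, times, n_classes = 8, 12, 3

# Hypothetical "trained" spatial and temporal factors, one component per class;
# the class-mode factor is held fixed to the identity during training
B = rng.standard_normal((channels, n_classes))
C = rng.standard_normal((times, n_classes))

def classify(X_new):
    # Component loadings via the Moore-Penrose pseudoinverse of the
    # Khatri-Rao product of the free factor matrices
    lam = np.linalg.pinv(khatri_rao(B, C)) @ X_new.reshape(-1)
    return int(np.argmax(lam))

# A noiseless sample generated by the class-2 component is assigned class 2
X_new = np.outer(B[:, 2], C[:, 2])
print(classify(X_new))  # 2
```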
Supervised CPD exhibits superior robustness to noise and weak features, better stability across initializations, and improved (or at least comparable) classification accuracy compared to two-step “CPD + classifier” workflows, particularly on physiological signal data (EEG/MEG).
4.2 Regularized CPD for Supervised Learning and Statistical Modeling
In supervised learning for non-sequential data, weight tensors representing feature interactions can be compressed efficiently by CPD. The model takes the form $f(\mathbf{x}) = \langle \mathcal{W}, \Phi(\mathbf{x}) \rangle$, where $\Phi(\mathbf{x})$ is a (tensor-product) feature embedding of the input and the weight tensor $\mathcal{W}$ is kept in CPD form. The loss is regularized with Frobenius norms and explicit penalties on higher-order interactions. CPD outperforms other tensor-network models (e.g., TT, TR) on permutation-invariant, sparse categorical, and dense regression tasks. Feature mapping into high-dimensional normalized spaces further enhances performance (Haliassos et al., 2020).
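The key computational point, evaluating the predictor without materializing the weight tensor, can be sketched as follows (the monomial feature map `embed` is an illustrative assumption, not the cited work's choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d_embed, R = 6, 4, 3

# One CPD factor matrix per input feature; the full weight tensor W
# (d_embed**n_features entries) is never formed during prediction
factors = [rng.standard_normal((d_embed, R)) for _ in range(n_features)]

def embed(x_i):
    # A hypothetical local feature map phi: R -> R^4 (monomial features)
    return np.array([1.0, x_i, x_i**2, x_i**3])

def predict(x):
    # f(x) = <W, phi(x_1) o ... o phi(x_N)> = sum_r prod_n <a_r^(n), phi(x_n)>,
    # costing O(N * d_embed * R): linear in the number of features
    scores = np.ones(R)
    for A, x_i in zip(factors, x):
        scores *= A.T @ embed(x_i)
    return float(scores.sum())

# Sanity check against the explicit weight tensor (feasible only at toy size)
x = rng.standard_normal(n_features)
W = np.einsum('ar,br,cr,dr,er,fr->abcdef', *factors)
full = W
for x_i in x:
    full = np.tensordot(full, embed(x_i), axes=([0], [0]))
print(bool(np.isclose(full, predict(x))))  # True
```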
4.3 Nonnegative, Parametric, and Coupled CPD
Domain-specific constraints, such as nonnegativity, are incorporated via parametric squaring of latent variables (e.g., representing a nonnegative entry as $b = a^2$ for a latent variable $a$), enabling trust-region Gauss–Newton optimization that respects the constraint by construction. Coupled CPD models allow joint decomposition of multisource tensors (e.g., HSI+MSI in hyperspectral super-resolution), enforcing shared factors with parametric nonnegativity constraints and nonlinear least squares (Liu et al., 25 Jan 2025). Such integrated frameworks yield higher accuracy and stability than projection-based or ADMM approaches.
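The squaring parametrization can be illustrated on a plain nonnegative matrix factorization fitted by gradient descent (a toy sketch; the cited work instead uses trust-region Gauss–Newton on the full coupled tensor model):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, R = 10, 8, 3

# Nonnegative ground truth (matrix case for brevity; in a CPD the same
# squaring trick is applied to every factor matrix)
X = rng.random((I, R)) @ rng.random((R, J))

# Latent parameters G, H are unconstrained; the model factors are their
# elementwise squares, so nonnegativity holds by construction
G = 0.3 * rng.standard_normal((I, R))
H = 0.3 * rng.standard_normal((J, R))

lr = 1e-2
for _ in range(10000):
    A, B = G**2, H**2
    Resid = A @ B.T - X          # residual of the current fit
    # Chain rule through the squaring: dL/dG = (dL/dA) * 2G, similarly for H
    G = G - lr * (2 * Resid @ B) * (2 * G)
    H = H - lr * (2 * Resid.T @ A) * (2 * H)

rel = np.linalg.norm(G**2 @ (H**2).T - X) / np.linalg.norm(X)
print(rel)  # relative reconstruction error; the factors stay nonnegative throughout
```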
5. Algebraic, Deterministic, and Finite Field Algorithms
5.1 Algebraic CPD via GEVD/QZ
For tensors of modest size and in exact arithmetic, algebraic CPD algorithms are available. Classical methods reduce CPD to a generalized eigendecomposition (GEVD) of two frontal slices, extracting factor matrices up to scale and permutation. QZ-based simplifications remove the need to compute eigenvectors directly, enhancing numerical stability and speed (Evert et al., 2022). Generalizations to cases lacking full column-rank factors invoke compound or permanental matrix pencils and nullspaces to guarantee recoverability under Kruskal-type or compound-matrix conditions (Domanov et al., 2013, Domanov et al., 2015).
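The GEVD idea can be sketched for an $R \times R \times K$ tensor with full-rank first two factors: the eigenvectors of a ratio of two random slice combinations recover the first factor, after which the remaining factors follow by linear solves (a simplified sketch of the classical reduction, not a production algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)
R, K = 4, 6  # an R x R x K tensor, so the slice pencil is square and invertible
A = rng.standard_normal((R, R))
B = rng.standard_normal((R, R))
C = rng.standard_normal((K, R))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Two generic linear combinations of the frontal slices:
# T_w = A diag(C^T w) B^T for any weight vector w on the third mode
w1, w2 = rng.standard_normal(K), rng.standard_normal(K)
T1 = np.einsum('ijk,k->ij', X, w1)
T2 = np.einsum('ijk,k->ij', X, w2)

# T1 T2^{-1} = A diag(d1/d2) A^{-1}: its eigenvectors recover A
# up to column scaling and permutation
_, A_est = np.linalg.eig(T1 @ np.linalg.inv(T2))
A_est = np.real(A_est)

# Diagonalize the slices with A_est^{-1} to read off the other two factors
W = np.linalg.inv(A_est)
B_est = W @ T1                   # rows are scaled copies of the b_r^T
C_est = np.array([[(W @ X[:, :, k])[r] @ B_est[r] / (B_est[r] @ B_est[r])
                   for r in range(R)]
                  for k in range(K)])

X_hat = np.einsum('ir,jr,kr->ijk', A_est, B_est.T, C_est)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # close to zero for exact-rank data
```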
Recursive generalized eigenspace decomposition (GESD) implements robust algebraic CPD by recursively extracting and splitting joint eigenspaces, greatly improving numerical stability in noisy or near-degenerate regimes (Evert et al., 2021).
5.2 CPD over Finite Fields and Border Rank
CPD theory extends over finite fields in the context of algebraic complexity and fast matrix multiplication. Exact search algorithms, based on systematic mode reductions, row-reduced basis enumeration, and polynomial-time pruning strategies, enable verification (or non-existence proof) of low-rank decompositions for tensors of modest size (Yang, 14 May 2025). These techniques feed directly into the search for new fast matrix multiplication algorithms and the study of maximum tensor rank.
6. Applications and Experimental Findings
CPD’s impact is broad:
- Multivariate signal processing and neuroimaging: Preserves the structure in physiological data, enabling interpretable, multiway feature extraction and classification with improved noise robustness and efficiency over vector-based or two-step methods. Empirically, supervised CPD achieves state-of-the-art classification accuracy in EEG and MEG tasks and outperforms baselines on low-information bands (Li et al., 2014).
- Supervised machine learning: CPD-based predictors provide linear scaling in dimensionality and are invariant to feature permutation, outperforming other tensor networks on sparse and dense real-world datasets (Haliassos et al., 2020).
- Fast high-order computation: Mode-reduction CPD via unfolding and Khatri–Rao projections enables order-of-magnitude acceleration and memory reduction for high-order tensors, maintaining accuracy close to direct algorithms with high global convergence rates (Phan et al., 2012, Zhou et al., 2012).
- Hyperspectral super-resolution and multisensor data fusion: Coupled nonnegative CPD with parametric constraints delivers stable, interpretable solutions with superior data-fidelity, robust to noise and model rank mis-specification (Liu et al., 25 Jan 2025).
- Tensor representation learning: CPD-SVR provides a direct path to efficient low-rank representations for high-dimensional functions (e.g., potential energy surfaces) with automatic rank compression and efficient warm-start training (Miao et al., 2024).
- Algebraic complexity: CPD’s role in fast matrix multiplication and related combinatorial problems connects it to foundational questions in algebra, combinatorics, and theoretical computer science (Yang, 14 May 2025, Tichavsky, 2021).
7. Outlook, Limitations, and Extensions
CPD is underpinned by unique mathematical identifiability, scalable algorithmic advances, and multidisciplinary applications. Limitations remain in scaling to extremely large tensors or rapidly growing rank, in the sensitivity of algebraic routines to noise, and in the need for further theory under noise, constraints, and highly collinear modes.
Active areas of research include improved theoretical