Partial Least Squares (PLS) Analysis
- Partial Least Squares (PLS) is a supervised dimension reduction technique that extracts latent variables by maximizing covariance between predictors and responses.
- PLS handles multicollinearity and high-dimensional data by constructing a low-dimensional latent space optimized for prediction, which also mitigates overfitting.
- Extensions such as sparse, kernel, Bayesian, and robust PLS enhance its applicability in fields like chemometrics, genomics, and macroeconomic forecasting.
Partial Least Squares (PLS) is a supervised dimension reduction technique designed to handle multicollinearity and high-dimensional regression problems by extracting latent variables (scores) from the predictor and response spaces that maximize their covariance. Unlike ordinary least squares (OLS), which estimates conditional means and is sensitive to collinearity, PLS constructs a low-dimensional latent space that is optimal for prediction, maintaining robustness even when the number of predictors exceeds the number of observations. The method has a rich algebraic, statistical, and algorithmic structure and has given rise to numerous variants and extensions for handling robustness, sparsity, functional data, kernel methods, and more.
1. Mathematical Formulation and Algorithmic Framework
PLS decomposes the predictor matrix $X \in \mathbb{R}^{n \times p}$ and response matrix $Y \in \mathbb{R}^{n \times q}$ into latent score matrices $T$ and $U$, with associated loading matrices $P$ and $Q$, via

$$X = T P^\top + E, \qquad Y = U Q^\top + F,$$

where $E$ and $F$ are residual matrices. The key property is that each extracted PLS component (column of $T$) is an orthogonal linear combination of the columns of $X$ that maximizes covariance with a corresponding direction in $Y$.
The classical NIPALS-PLS algorithm (for $Y$ possibly multivariate) operates iteratively:
- At step $k$, compute the cross-covariance $S_k = X_k^\top Y_k$.
- Extract the weight vector $w_k$ as the leading eigenvector of $S_k S_k^\top$ (subject to $\|w_k\| = 1$).
- Compute the $X$-score $t_k = X_k w_k$.
- Form loadings $p_k = X_k^\top t_k / (t_k^\top t_k)$ and $q_k = Y_k^\top t_k / (t_k^\top t_k)$.
- Deflate: $X_{k+1} = X_k - t_k p_k^\top$, $Y_{k+1} = Y_k - t_k q_k^\top$.
Upon extracting $K$ components, $Y$ is regressed onto the score matrix $T = [t_1, \dots, t_K]$ (typically via OLS), and the solution is back-projected to $X$-space: $\widehat{B}_{\mathrm{PLS}} = W (P^\top W)^{-1} C^\top$, where $W$ collects the weight vectors $w_k$ and $C$ collects the regression coefficients of $Y$ on $T$ (Civieta et al., 2021).
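For a univariate response, these steps reduce to a short loop. The NumPy sketch below is a minimal illustration (variable names are ours, not a production implementation); it also exercises the classical fact that extracting as many components as predictors recovers the OLS fit:

```python
import numpy as np

def nipals_pls(X, y, n_components):
    """Minimal univariate-response NIPALS-PLS; returns coefficients in X-space."""
    X = X - X.mean(axis=0)                    # center predictors
    y = y - y.mean()                          # center response
    p = X.shape[1]
    W = np.zeros((p, n_components))           # weight vectors w_k
    P = np.zeros((p, n_components))           # X-loadings p_k
    q = np.zeros(n_components)                # y-loadings q_k
    Xd, yd = X.copy(), y.copy()
    for k in range(n_components):
        w = Xd.T @ yd                         # cross-covariance direction
        w /= np.linalg.norm(w)                # normalize: ||w|| = 1
        t = Xd @ w                            # X-score t_k
        tt = t @ t
        P[:, k] = Xd.T @ t / tt               # loading p_k
        q[k] = yd @ t / tt                    # regression of y on the score
        W[:, k] = w
        Xd = Xd - np.outer(t, P[:, k])        # deflate X
        yd = yd - t * q[k]                    # deflate y
    return W @ np.linalg.solve(P.T @ W, q)    # back-project: B = W (P'W)^{-1} q

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)

beta_pls = nipals_pls(X, y, n_components=5)   # full extraction: K = p
Xc, yc = X - X.mean(axis=0), y - y.mean()
beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
print(np.allclose(beta_pls, beta_ols))        # K = p components reproduce OLS
```

Truncating `n_components` below 5 yields the regularized intermediate fits discussed in Section 2.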
2. Algebraic and Statistical Properties
Dimensionality and Collinearity: PLS avoids direct inversion of $X^\top X$ by operating in the orthogonal, low-dimensional $T$-space. This allows PLS to provide stable solutions when $p > n$, as only $K \ll p$ components are extracted and overfitting is mitigated.
Bias-Variance Trade-off: Truncating the expansion at $K$ components acts as regularization, shrinking unstable directions of $X^\top X$ towards zero. As $K$ grows, the estimate interpolates between a highly regularized solution and OLS.
Supervised Extraction: Unlike principal component regression, PLS components embed the $Y$-dependence at each extraction by maximizing cross-covariance, enhancing predictive efficiency especially when the signal is not strongly aligned with the top variance directions of $X$ (Civieta et al., 2021, Assunção et al., 2024, Blazère et al., 2014).
Krylov Subspace Equivalence: The $K$-step PLS solution is the LS fit restricted to the Krylov subspace $\mathcal{K}_K(X^\top X,\, X^\top y)$; equivalently, it is the coefficient vector closest to $\widehat{\beta}_{\mathrm{OLS}}$ in the Mahalanobis metric induced by $X^\top X$, subject to this subspace constraint (Val et al., 2023).
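Both characterizations are easy to check numerically, since the restricted LS fit and the Mahalanobis-nearest point to the OLS solution satisfy the same normal equations over the Krylov subspace. A small NumPy sketch (illustrative only; names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=60)
Xc, yc = X - X.mean(axis=0), y - y.mean()

A, s = Xc.T @ Xc, Xc.T @ yc                  # A = X'X, s = X'y
k = 3
# Orthonormal basis of K_k(A, s) = span{s, A s, ..., A^{k-1} s}
K = np.column_stack([np.linalg.matrix_power(A, j) @ s for j in range(k)])
Q, _ = np.linalg.qr(K)

# (1) Least-squares fit restricted to the Krylov subspace
c1 = np.linalg.lstsq(Xc @ Q, yc, rcond=None)[0]
beta_restricted = Q @ c1

# (2) Closest point to beta_OLS in the X'X-Mahalanobis metric:
#     minimize (Qc - b)' A (Qc - b)  =>  (Q'AQ) c = Q' A b = Q' s
beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
c2 = np.linalg.solve(Q.T @ A @ Q, Q.T @ A @ beta_ols)
beta_nearest = Q @ c2

print(np.allclose(beta_restricted, beta_nearest))
```

The agreement follows because $A \widehat{\beta}_{\mathrm{OLS}} = X^\top y$, so both problems reduce to $(Q^\top A Q)c = Q^\top X^\top y$.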
Polynomial Framework: All PLS estimators and residuals can be expressed via discrete orthogonal “residual” polynomials defined through the spectrum of $X^\top X$ and the projections of the data and noise. This framework fully characterizes the shrinkage/expansion properties of PLS filter factors, the empirical risk, and the monotonic convergence of PLS to OLS as $K$ increases (Blazère et al., 2014, Blazère et al., 2014).
3. Extensions and Robust Variants
Partial Quantile Regression (fPQR): PLS can be robustified by replacing mean-based covariance with quantile covariance, enabling direct modeling of conditional quantiles of $Y$. The fPQR algorithm replaces the covariance maximization step with $\max_{\|w\| = 1} \operatorname{qcov}_\tau(Xw, y)$, where $\operatorname{qcov}_\tau$ denotes the quantile covariance for level $\tau$, leading to robust dimension reduction and quantile regression on the extracted scores. This provides resilience to outliers and heavy-tailed distributions, and by varying $\tau$ one can estimate tails or medians (Civieta et al., 2021).
Sparse and Group-Sparse PLS: Variable selection is achieved by incorporating $\ell_1$ or group penalties into the weight extraction step (e.g., Regularized PLS, Sparse PLS, Group PLS, Jointly Sparse Global SIMPLS). These variants encourage sparsity either per component or jointly across components, leading to more interpretable models and improved generalization in high-dimensional, noisy regimes. Augmented Lagrangian methods and block coordinate descent are effective strategies for solving these penalized formulations (Allen et al., 2012, Liu et al., 2014, Micheaux et al., 2017).
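The essential modification is in the weight-extraction step: soft-threshold the cross-covariance vector, then renormalize. The sketch below is a minimal illustration in the spirit of sparse PLS (the function names and the particular threshold choice are ours):

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft thresholding: the prox operator of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_pls_weight(X, y, lam):
    """First sparse-PLS weight: threshold the cross-covariance, renormalize."""
    s = X.T @ y                        # cross-covariance direction (dense)
    w = soft_threshold(s, lam)         # zero out weakly covarying predictors
    nrm = np.linalg.norm(w)
    return w / nrm if nrm > 0 else w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)   # 2 informative vars

w_dense = sparse_pls_weight(X, y, lam=0.0)                 # classical PLS weight
w_sparse = sparse_pls_weight(X, y, lam=0.5 * np.abs(X.T @ y).max())
print(np.count_nonzero(w_dense), np.count_nonzero(w_sparse))
```

With `lam=0.0` this reduces to the usual NIPALS weight; a positive threshold zeroes the noise predictors while keeping the component a unit vector, which is the source of the interpretability gains noted above.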
Functional, Kernel, and Riemannian PLS: Functional PLS generalizes the framework to predictors and responses taking values in Hilbert spaces, with directions extracted via covariance operators. Kernel PLS introduces nonlinearity by mapping $X$ into an RKHS and performing PLS in feature space, with recent advances allowing hyperparameter optimization via Kernel Flows (KF-PLS). Extensions to Riemannian manifolds (R-PLS) allow covariance maximization for manifold-valued predictors and responses, such as symmetric positive-definite matrix data (Delaigle et al., 2012, Duma et al., 2023, Ryan et al., 2023).
Ordinal and Dependent Data: PLS algorithms have been adapted to consistently handle ordinal predictors (OPLS, leveraging polychoric correlations) and to remain consistent under temporal or spatial dependence by explicit pre-whitening with a covariance model (covariance-corrected PLS) (Cantaluppi, 2012, Singer et al., 2015).
Bayesian and Probabilistic PLS: Bayesian formulations of PLS (BPLS, PPLS) introduce explicit latent-variable models with shrinkage priors, enabling inference on parameters, uncertainty quantification, and automatic determination of the number of latent components. These frameworks extend PLS's application to situations requiring coherent predictive intervals and formal statistical inference (Bouhaddani et al., 2017, Urbas et al., 2023).
4. Computational Considerations and Scalability
PLS is intrinsically scalable due to its iterative componentwise nature and its ability to operate in subspaces of dimension at most $\min(n, p)$. For extremely high-dimensional or large-sample data, several approaches enhance scalability:
- Incremental PLS algorithms (CIPLS) update projections on a per-sample basis, supporting streaming and massive datasets without forming dense covariance matrices (Jordao et al., 2019).
- Grammar-compressed PLS (cPLS) enables storage and computation in compressed representations, using efficient row and column access on the fly (Tabei et al., 2016).
- Parallel and distributed algorithms harness chunked or split-merge SVD updates, incremental cross-covariance calculations, and batched processing to handle datasets with billions of entries (Micheaux et al., 2017).
- Cross-validation and early stopping offer practical tools for selecting the number of components, balancing bias and variance, and ensuring regularization (Assunção et al., 2024).
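As a concrete illustration of the last point, the number of components can be chosen by K-fold cross-validation. The sketch below uses the Krylov-subspace characterization of univariate PLS (Section 2) for compactness; it is illustrative code with our own names, not a reference implementation:

```python
import numpy as np

def pls_fit_predict(Xtr, ytr, Xte, k):
    """k-component univariate PLS via its Krylov-subspace characterization."""
    mx, my = Xtr.mean(axis=0), ytr.mean()
    Xc, yc = Xtr - mx, ytr - my
    A, s = Xc.T @ Xc, Xc.T @ yc
    K = np.column_stack([np.linalg.matrix_power(A, j) @ s for j in range(k)])
    Q, _ = np.linalg.qr(K)                         # stabilized Krylov basis
    c = np.linalg.lstsq(Xc @ Q, yc, rcond=None)[0]
    return (Xte - mx) @ (Q @ c) + my

def cv_select_components(X, y, ks, n_folds=5, seed=0):
    """Return the k in ks with the lowest cross-validated squared error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    errs = []
    for k in ks:
        sse = 0.0
        for f in folds:
            tr = np.setdiff1d(idx, f)              # train on the other folds
            pred = pls_fit_predict(X[tr], y[tr], X[f], k)
            sse += np.sum((y[f] - pred) ** 2)
        errs.append(sse)
    return ks[int(np.argmin(errs))]

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 10))
y = X @ np.concatenate([np.ones(3), np.zeros(7)]) + 0.5 * rng.normal(size=80)
best_k = cv_select_components(X, y, ks=[1, 2, 3, 4, 5])
print("selected components:", best_k)
```

Plotting the per-`k` CV errors (rather than only taking the argmin) is the usual way to see the bias-variance trade-off directly.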
5. Theoretical Analysis, Consistency, and Prediction Bounds
PLS has been analyzed through several theoretical lenses:
- Non-asymptotic analysis provides high-probability bounds on prediction error in high-dimensional regimes, quantifying excess risk in terms of signal-to-noise, eigenvalue “inertia,” and the alignment of the true signal with the leading PLS subspace. Sparse PLS rates match the lasso up to restricted eigenvalue constants (Castelli et al., 2023).
- Orthogonal polynomial theory yields exact expressions for residual norms, bias, variance, and convergence rates, with exponential decay of empirical and predictive risk as a function of the condition number and the number of components (Blazère et al., 2014, Blazère et al., 2014).
- Spectral bounds demonstrate that the rate at which PLS approaches OLS and its predictive accuracy are governed by the clustering of the eigenvalues of $X^\top X$; fewer clusters imply PLS matches OLS with fewer components (Val et al., 2023).
- Functional data theory ensures consistency and rates for functional PLS estimators under regularity and eigenvalue spacing assumptions, with practical improvements over functional principal component regression when the regression signal does not align with high-variance components (Delaigle et al., 2012).
6. Empirical Performance and Application Domains
PLS has demonstrated strong performance relative to OLS, ridge, and lasso, particularly in scenarios with limited sample size, high predictor dimension, or severe collinearity:
- In macroeconomic forecasting (e.g., quarterly GDP), single-component PLS (marginal regression) substantially outperforms OLS, ridge, and lasso in periods of structural shocks or high volatility, while retaining similar accuracy in stable conditions (Assunção et al., 2024).
- In chemometrics, genomics, proteomics, and sensor analytics, sparse and group-sparse PLS variants achieve superior predictive performance and interpretability, often using orders of magnitude fewer variables than classical PLS or unpenalized regression (Liu et al., 2014, Allen et al., 2012).
- Robust and quantile PLS methods (fPQR) supply conditional quantile estimates and resilience to outliers, outperforming mean-based PLS in heavy-tailed noise scenarios (Civieta et al., 2021).
- PLS has been generalized for high-dimensional image and connectomic data via manifold, kernel, and deep-learning extensions, providing both algorithmic scale and flexibility in capturing complex relationships (Polson et al., 2021, Duma et al., 2023, Ryan et al., 2023).
7. Contemporary Directions and Open Challenges
- Automated Regularization and Model Selection: Bayesian and information-theoretic approaches for choosing the number of components and controlling overfitting are active topics, especially in small-sample or multivariate settings (Urbas et al., 2023).
- Extension to Nonlinear, Structured, and Non-Euclidean Data: Deep learning PLS (DL-PLS), kernel PLS, and Riemannian PLS frameworks extend the capacity of PLS to nonlinear and geometric domains, integrating traditional diagnostic tools (scree plots, biplots) with novel statistical guarantees (Polson et al., 2021, Duma et al., 2023).
- Integration with Dependent and Ordinal Data: Continued refinement of methods for structured response types, ordinal regression, and dependent samples enhances PLS's range of empirical validity (Singer et al., 2015, Cantaluppi, 2012).
- Unified Theoretical Understanding: The orthogonal polynomial and spectral perspectives provide avenues for unifying classical, sparse, and robust variants, suggesting new opportunities for sharper bounds, bias–variance analysis, and efficient algorithms (Blazère et al., 2014, Blazère et al., 2014).
PLS remains a foundational tool for supervised dimension reduction, with a mature theoretical apparatus underpinning its robustness, regularization, and extensibility to modern data modalities. The method's continued evolution reflects its broad applicability and the persistent need for interpretable, scalable, and effective supervised learning in high-dimensional regimes.