
High-Dimensional Statistical Process Control via Manifold Fitting and Learning

Published 24 Sep 2025 in stat.ML, cs.LG, and stat.AP | (2509.19820v1)

Abstract: We address the Statistical Process Control (SPC) of high-dimensional, dynamic industrial processes from two complementary perspectives: manifold fitting and manifold learning, both of which assume the data lie on an underlying nonlinear, lower-dimensional space. We propose two distinct monitoring frameworks for online, or 'Phase II', SPC. The first method leverages state-of-the-art techniques in manifold fitting to accurately approximate the manifold on which the data reside within the ambient high-dimensional space. It then monitors deviations from this manifold using a novel scalar distribution-free control chart. In contrast, the second method adopts a more traditional approach, akin to those used in linear dimensionality-reduction SPC techniques, by first embedding the data into a lower-dimensional space before monitoring the embedded observations. We prove that both methods provide a controllable Type I error probability, after which they are contrasted in terms of their fault detection ability. Extensive numerical experiments on a synthetic process and on a replicated Tennessee Eastman Process show that the conceptually simpler manifold-fitting approach achieves performance competitive with, and sometimes superior to, the more classical lower-dimensional manifold monitoring methods. In addition, we demonstrate the practical applicability of the proposed manifold-fitting approach by successfully detecting surface anomalies in a real image dataset of electrical commutators.

Summary

  • The paper presents a novel manifold fitting approach that directly estimates a data manifold to robustly monitor deviations in high-dimensional settings.
  • It develops a complementary manifold learning framework that embeds observations into a lower-dimensional space for effective online control.
  • Empirical evaluations on synthetic and real-world datasets demonstrate competitive, and sometimes superior, fault detection power and practical applicability in industrial settings.


Introduction and Motivation

This paper addresses the challenge of Statistical Process Control (SPC) in high-dimensional, dynamic industrial processes, where the data are assumed to reside on a nonlinear, lower-dimensional manifold embedded in a high-dimensional ambient space. Traditional multivariate SPC methods, such as Hotelling’s $T^2$ chart, suffer from diminished fault detection power as dimensionality increases, and are often inapplicable when the number of Phase I observations is less than the data dimension. The authors propose two complementary frameworks for online (Phase II) SPC: (1) a manifold fitting approach that directly estimates the data manifold in the ambient space and monitors deviations from it, and (2) a manifold learning approach that embeds the data into a lower-dimensional space and monitors the embedded observations. Both frameworks are designed to provide controllable Type I error rates and are evaluated on synthetic and real-world datasets.

Figure 1: Manifold embedding and fitting, adapted from Yao et al. (2023), illustrating the distinction between direct manifold fitting in ambient space and embedding via manifold learning.

Manifold Fitting-Based SPC Framework

Theoretical Foundations

The manifold fitting approach is predicated on the assumption that, under statistical control and small noise, process observations $Y_t$ lie near a compact, twice-differentiable manifold $\mathcal{M} \subset \mathbb{R}^D$ of intrinsic dimension $d < D$. The key monitoring statistic is the Euclidean distance from each observation to the manifold, $\operatorname{dist}(Y_t, \mathcal{M}) = \inf_{x \in \mathcal{M}} \|Y_t - x\|_2$. This scalar statistic enables the use of a univariate control chart, circumventing the curse of dimensionality and the need for explicit distributional assumptions.
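As a concrete illustration, the monitoring statistic can be approximated by the distance to a dense sample of fitted points standing in for the manifold; the function name and the circle toy example below are illustrative, not part of the paper:

```python
import numpy as np

def manifold_deviation(y, manifold_points):
    """Monitoring statistic: Euclidean distance from observation y to the
    manifold, here approximated by its nearest point in a dense fitted
    sample, i.e. dist(y, M) = inf_{x in M} ||y - x||_2."""
    return np.min(np.linalg.norm(manifold_points - y, axis=1))

# Toy example: a unit circle in R^2 plays the role of the manifold.
theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

d_in = manifold_deviation(np.array([1.0, 0.0]), circle)   # on the manifold
d_out = manifold_deviation(np.array([2.0, 0.0]), circle)  # off the manifold
print(round(d_in, 3), round(d_out, 3))  # prints: 0.0 1.0
```

An in-control observation yields a deviation near zero, while any move away from the manifold shows up directly in this single scalar.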

Manifold Fitting Algorithm

The manifold fitting procedure is based on the method of Yao et al. (2023), which achieves a Hausdorff estimation error of $O(\sigma^2 \log(1/\sigma))$ with $m = O(\sigma^{-(d+3)})$ samples, without requiring prior knowledge of $d$. The algorithm consists of:

  1. Contraction Direction Estimation: For each point $z$, construct a Euclidean ball $\mathcal{B}_D(z, r_0)$ and compute a weighted average of points within the ball to estimate the direction toward the manifold.
  2. Local Contraction: Construct a hyper-cylinder aligned with the estimated direction and compute a weighted average of points within the cylinder to estimate the projection $\hat{\pi}(z)$ onto the manifold.
  3. Noise Estimation: Iteratively estimate the noise level $\sigma$ from the residuals of the projections.

These steps yield a smooth submanifold $\hat{\mathcal{M}}$ approximating $\mathcal{M}$, with theoretical guarantees on the estimation error.
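A heavily simplified sketch of the contraction idea is below: a plain Gaussian-weighted ball average in place of the paper's ball-plus-cylinder construction, with all radii and bandwidths chosen for illustration only:

```python
import numpy as np

def estimate_projection(z, sample, r0=0.3, bandwidth=0.15):
    """One contraction step: average sample points inside the ball
    B_D(z, r0) with smooth distance-based weights, pulling z toward
    the manifold (a simplified stand-in for pi-hat(z))."""
    dists = np.linalg.norm(sample - z, axis=1)
    in_ball = dists < r0
    if not in_ball.any():
        return z  # no neighbors: leave z unchanged
    w = np.exp(-((dists[in_ball] / bandwidth) ** 2))
    return (w[:, None] * sample[in_ball]).sum(axis=0) / w.sum()

# Noisy sample around the unit circle; contract a point toward it.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 2000)
noisy = (np.column_stack([np.cos(angles), np.sin(angles)])
         + 0.05 * rng.normal(size=(2000, 2)))
z = np.array([1.15, 0.0])
pi_z = estimate_projection(z, noisy)
# ||pi_z|| lies much closer to 1 (the circle's radius) than ||z|| = 1.15
```

Local averaging of this kind is what keeps the method feasible even when the ambient dimension far exceeds the sample size.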

Figure 2: Trajectory of $X_t \in \mathcal{M}$, $Y_t \in \mathbb{R}^3$, and projections $\hat{\pi}(Y_t)$ for a synthetic process on a 2-sphere.

Distribution-Free Control Chart

The deviations $\|Y_t - \hat{\pi}(Y_t)\|_2$ are monitored using a novel distribution-free EWMA control chart (UDFM), which employs a rolling window and a rank-based test statistic. The control limits are set via permutation tests to maintain a desired false alarm rate $\alpha$, and the run length under the null hypothesis follows a geometric distribution. Temporal dependencies in the deviations are addressed by prewhitening via AR modeling.
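A minimal sketch of a rolling-window, rank-based chart with a permutation-calibrated limit, in the spirit of (but much simpler than) the UDFM chart: the statistic here is a standardized Wilcoxon rank sum, and all function names are illustrative:

```python
import numpy as np

def rank_sum_stat(reference, window):
    """Standardized rank sum of the recent window within the pooled
    sample: large values mean the window's deviations are unusually
    big relative to the in-control reference."""
    pooled = np.concatenate([reference, window])
    ranks = pooled.argsort().argsort() + 1          # ranks 1..m+k
    w = ranks[len(reference):].sum()
    m, k = len(reference), len(window)
    mu = k * (m + k + 1) / 2.0
    sd = np.sqrt(m * k * (m + k + 1) / 12.0)
    return (w - mu) / sd

def permutation_limit(reference, k, alpha=0.01, n_perm=2000, seed=0):
    """Control limit as the (1 - alpha) quantile of the statistic over
    random in-control splits, so the false-alarm rate does not depend
    on the deviations' distribution."""
    rng = np.random.default_rng(seed)
    stats = [rank_sum_stat(p[:-k], p[-k:])
             for p in (rng.permutation(reference) for _ in range(n_perm))]
    return np.quantile(stats, 1 - alpha)

rng = np.random.default_rng(1)
ref = rng.standard_normal(200)                 # in-control Phase I deviations
limit = permutation_limit(ref, k=10)
stat = rank_sum_stat(ref, rng.standard_normal(10) + 3.0)  # shifted window
# stat exceeds limit, so the chart signals on the out-of-control window
```

Because the limit is a quantile of the statistic's own permutation distribution, the in-control false-alarm probability is controlled without assuming any parametric form for the deviations.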

Figure 3: $\|Y_t - \pi(Y_t)\|_2$ and $\|Y_t - \hat{\pi}(Y_t)\|_2$ for synthetic data, illustrating the behavior of the true and estimated deviations.

Manifold Learning-Based SPC Framework

Embedding and Monitoring

The manifold learning approach leverages Laplacian-based dimensionality reduction methods (LPP, NPE) to approximate an embedding function $\hat{f}: \mathcal{M} \to \mathbb{R}^d$. The embedded observations are monitored in the lower-dimensional space, with temporal dependencies filtered via univariate AR models. The DFEWMA multivariate control chart is used for monitoring, with control limits set to achieve a specified in-control ARL.
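A compact LPP-style sketch, assuming a k-NN heat-kernel affinity and a dense generalized eigensolve; this simplifies standard LPP considerably and is not the authors' exact pipeline:

```python
import numpy as np

def lpp_directions(X, d, k=8, t=1.0):
    """LPP-style sketch: heat-kernel affinity over k-NN pairs, then the
    generalized eigenproblem X^T L X a = lam X^T D X a, keeping the d
    directions with smallest eigenvalues (dense linear algebra only)."""
    m, D_amb = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((m, m))
    for i in range(m):
        nn = np.argsort(dist[i])[1:k + 1]           # k nearest, excluding i
        W[i, nn] = np.exp(-dist[i, nn] ** 2 / t)
    W = np.maximum(W, W.T)                          # symmetric affinity
    Dg = np.diag(W.sum(axis=1))
    L = Dg - W                                      # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ Dg @ X + 1e-8 * np.eye(D_amb)         # ridge keeps B invertible
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)
    return vecs[:, order[:d]].real                  # (D_amb, d) projection

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 5))                    # Phase I sample, m > D
proj = lpp_directions(X, d=2)
y_new = rng.standard_normal(5)
embedded = y_new @ proj                             # online 2-D embedding
```

Because the embedding is a fixed linear map, each new observation is reduced by a single matrix product, which is precisely what makes online, out-of-sample monitoring possible for LPP and NPE.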

Out-of-Sample Extension and Limitations

Unlike nonlinear manifold learning methods lacking explicit out-of-sample mappings, LPP and NPE provide linear approximations to the Laplace-Beltrami eigenfunctions, enabling online monitoring. However, these methods require $m > D$ for the generalized eigenvalue problem to be well-posed, limiting their applicability in extreme high-dimensional settings.

Figure 4: PCA elbow plot for the Tennessee Eastman process, indicating intrinsic dimensionality.

Performance Analysis

Synthetic Process on a 2-Sphere

Simulations on a synthetic process evolving on a 2-sphere in $\mathbb{R}^6$ demonstrate that the manifold fitting approach (MF) detects mean shifts in all ambient dimensions, whereas manifold learning methods (LPP, NPE, PCA) fail to detect shifts outside the embedded subspace. MF achieves lower out-of-control ARL, indicating superior detection power.
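The geometric point can be reproduced in a few lines: a mean shift orthogonal to a 2-sphere embedded in six dimensions leaves the intrinsic coordinates unchanged but is picked up immediately by the distance-to-manifold statistic (the dense-sample stand-in for the fitted manifold below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere_points(n):
    """Uniform points on the unit 2-sphere, embedded in R^6 by
    zero-padding the last three coordinates (an illustrative stand-in
    for the paper's synthetic process)."""
    g = rng.standard_normal((n, 3))
    s = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.hstack([s, np.zeros((n, 3))])

manifold = sphere_points(5000)   # dense sample playing the role of M-hat

def dev(y):
    return np.min(np.linalg.norm(manifold - y, axis=1))

y0 = sphere_points(1)[0]                          # in-control observation
y_shift = y0 + np.array([0, 0, 0, 0, 0, 0.5])     # mean shift in dimension 6
# dev(y0) is near 0, while dev(y_shift) is about 0.5: the shift is
# orthogonal to the sphere, so a 3-D embedding cannot see it, but the
# distance-to-manifold statistic flags it immediately.
```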

Tennessee Eastman Process

Experiments on a replicated Tennessee Eastman process ($D = 300$) show that MF maintains a controllable ARL even when $m < D$, where manifold learning methods are ill-posed. For moderate $m > D$, MF outperforms LPP and NPE in detecting large shifts, while NPE is more sensitive to small shifts. PCA fails to detect faults due to the nonlinear nature of the process manifold.

Figure 5: Tennessee Eastman Process schematic, illustrating the complexity and feedback loops.

Kolektor Surface-Defect Dataset

Application to the Kolektor surface-defect image dataset ($D = 720896$) demonstrates the practical utility of MF in anomaly detection. The UDFM control chart signals an alarm after only five defective images, with estimated deviations for defective surfaces significantly higher than for non-defective ones.

Figure 6: Example images from the Kolektor surface-defect dataset, showing non-defective and defective surfaces.

Figure 7: UDFM control chart for Kolektor dataset, with alarm triggered at the fifth defective image.

Implementation Considerations

  • Computational Requirements: Manifold fitting scales with the number of Phase I samples and the ambient dimension, but remains feasible for $D \gg m$ thanks to local averaging and efficient data structures.
  • Parameter Selection: Theoretical guarantees require careful tuning of the ball and cylinder radii, which depend on the estimated noise level $\sigma$ and the manifold reach.
  • Prewhitening: AR modeling of deviations is essential for maintaining i.i.d. assumptions in control charting.
  • Out-of-Sample Extension: MF is applicable for $m < D$, unlike manifold learning methods, which require $m > D$.
  • Deployment: Both frameworks are suitable for real-time monitoring, with MF offering a simpler univariate chart and broader applicability.
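The prewhitening step listed above can be sketched as an ordinary-least-squares AR fit whose residuals feed the control chart; the OLS estimator and lag order are illustrative choices, not necessarily the paper's:

```python
import numpy as np

def ar_prewhiten(x, p=2):
    """Fit an AR(p) model to the deviation series by ordinary least
    squares and return the one-step-ahead residuals, which are
    approximately i.i.d. and suitable as control-chart inputs."""
    X = np.column_stack([x[p - j - 1:len(x) - j - 1] for j in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Strongly autocorrelated AR(1) deviations; residuals are nearly white.
rng = np.random.default_rng(4)
e = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.8 * x[t - 1] + e[t]
r = ar_prewhiten(x)
raw_rho = np.corrcoef(x[:-1], x[1:])[0, 1]     # about 0.8
res_rho = np.corrcoef(r[:-1], r[1:])[0, 1]     # near 0
```

Charting the residuals rather than the raw deviations restores the (approximate) independence the run-length guarantees rely on.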

Implications and Future Directions

The manifold fitting approach provides a robust, distribution-free method for SPC in high-dimensional, nonlinear settings, with strong theoretical guarantees and practical performance. Its ability to operate in extreme high-dimensional regimes and detect faults in directions orthogonal to the learned manifold is a significant advantage over traditional and manifold learning-based SPC methods. The framework is extensible to other types of process changes, such as covariance shifts or manifold shape changes, and can be adapted for variable attribution via analysis of projected points.

Future research may focus on:

  • Extending MF to detect changes in manifold geometry (e.g., curvature, topology).
  • Integrating deep learning-based manifold fitting for complex data types.
  • Developing scalable algorithms for real-time, streaming data.
  • Investigating theoretical properties under non-Gaussian noise and non-stationary processes.

Conclusion

This work presents two complementary frameworks for high-dimensional SPC under manifold assumptions: a direct manifold fitting approach with a univariate, distribution-free control chart, and a manifold learning approach with multivariate monitoring. Extensive experiments demonstrate that manifold fitting achieves competitive or superior fault detection, especially in extreme high-dimensional and nonlinear scenarios. The practical utility is further validated on real industrial and image datasets, establishing manifold fitting as a powerful tool for modern SPC applications.
