Rank-Based Control Charts
- Rank-based control charts are nonparametric statistical tools that use the ordering of data rather than actual values to detect shifts in process location or scale.
- They employ methodologies like the Wilcoxon signed-rank statistic and sequential rank schemes (SSR-CUSUM/SSR-SR) for distribution-free monitoring and sensitive shift detection.
- Recent extensions integrate deep learning to accurately estimate control limits under complex tie scenarios, enhancing robustness and reducing false alarms.
Rank-based control charts are nonparametric statistical process control tools that rely on the ranks or order statistics of process data rather than their measured values. These charts provide distribution-free methodologies for detecting location and/or scale changes and offer robustness to non-normality, outliers, and tied observations. Rank-based approaches encompass both Shewhart-type and sequential schemes and have been algorithmically extended to handle tied data and estimation complexity via deep learning techniques.
1. Fundamentals and Motivation
Traditional Shewhart control charts assume observations are normally distributed, with location and scale estimated parametrically to set control limits (typically $\pm 3\sigma$). However, when the process data violate normality (because of heavy tails, skewness, or an unknown distribution) or contain ties caused by measurement rounding, parametric designs can exhibit inflated false alarm rates or diminished sensitivity to process shifts. Rank-based (nonparametric) control charts, including those based on Wilcoxon signed-rank, sign, and runs statistics, circumvent distributional assumptions by basing decision statistics on the relative ordering of the observations and their signs relative to a target or center value (Mortezanejad et al., 26 Mar 2025).
Key properties include:
- Distribution-free performance for in-control data, valid regardless of the underlying continuous distribution (given symmetry for certain designs).
- Robustness to outliers and non-normality, due to reliance on ranks rather than values.
- Adaptability to tied data via explicit modeling of zero differences and randomization techniques.
2. Core Methodologies: Signed-Rank and Sequential Rank Procedures
Wilcoxon Signed-Rank Statistic
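For a sample $X_{i1}, \dots, X_{in}$ in subgroup $i$ with reference value $\theta_0$, one computes the differences $D_{ij} = X_{ij} - \theta_0$, then forms the ranks $R_{ij}^{+}$ of the absolute differences $|D_{ij}|$ and the signs $\operatorname{sign}(D_{ij})$. The Wilcoxon signed-rank statistic is $SR_i = \sum_{j=1}^{n} \operatorname{sign}(D_{ij})\, R_{ij}^{+}$. Under no shift and no ties (Untied Observations, UO), the statistic has $E[SR_i] = 0$ and $\operatorname{Var}(SR_i) = n(n+1)(2n+1)/6$ (Mortezanejad et al., 26 Mar 2025). For data with ties (Tied Observations, TO), zero differences are assigned sign $0$, and the statistic and its moments are recomputed after excluding zero differences, which requires a binomial reduction of the effective sample size.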
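As a minimal sketch of the untied case, the statistic can be computed as the sum of signed ranks of the absolute differences from the reference value. The function names `signed_rank_statistic` and `null_variance`, and the example subgroup, are illustrative assumptions rather than anything taken from the cited paper; tied absolute differences are given midranks here, which is one common convention.

```python
from typing import Sequence


def signed_rank_statistic(sample: Sequence[float], theta0: float) -> float:
    """Sum of sign(D_j) * rank(|D_j|) for D_j = x_j - theta0.

    Assumes no zero differences (untied case); equal |D_j| get midranks.
    """
    diffs = [x - theta0 for x in sample]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences need the tied-observation treatment")
    ordered = sorted(abs(d) for d in diffs)

    def midrank(value: float) -> float:
        # Average of the 1-based positions that `value` occupies in `ordered`.
        first = ordered.index(value) + 1
        return first + (ordered.count(value) - 1) / 2.0

    return float(sum((1 if d > 0 else -1) * midrank(abs(d)) for d in diffs))


def null_variance(n: int) -> float:
    """Variance n(n+1)(2n+1)/6 of the statistic under symmetry and no ties."""
    return n * (n + 1) * (2 * n + 1) / 6.0


# Hypothetical subgroup of five readings with reference value 100.
subgroup = [103, 96, 109, 101, 92]
print(signed_rank_statistic(subgroup, theta0=100))  # 1.0
print(null_variance(len(subgroup)))                 # 55.0
```

In the example the differences are 3, -4, 9, 1, -8, so the signed ranks are +2, -3, +5, +1, -4 and the statistic equals 1.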
Sequential Rank Schemes and CUSUMs
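Signed Sequential Rank (SSR) CUSUMs (Lombard et al., 2017) and Shiryaev–Roberts (SSR-SR) schemes (Zyl et al., 2019) extend rank-based ideas to sequential monitoring. Given incoming data $X_1, X_2, \dots$, each residual $X_i - \theta_0$ (centered at the known median $\theta_0$) is assigned a sign $s_i = \operatorname{sign}(X_i - \theta_0)$ and a sequential rank $r_i^{+}$, the rank of $|X_i - \theta_0|$ among the first $i$ absolute residuals. The statistic is transformed using a chosen odd, square-integrable score function $J$ on $(-1, 1)$ (e.g., Wilcoxon: $J(u) = u$; Van der Waerden: $J(u) = \Phi^{-1}\left(\tfrac{1+u}{2}\right)$). For each observation, one computes a normalized score
$$\xi_i = \frac{J\left(s_i\, r_i^{+}/(i+1)\right)}{\nu_i},$$
where $\nu_i^2 = \frac{1}{i}\sum_{j=1}^{i} J^2\left(\frac{j}{i+1}\right)$. The CUSUM is then updated recursively, e.g.,
$$C_i^{+} = \max\left\{0,\; C_{i-1}^{+} + \xi_i - \zeta\right\}, \qquad C_0^{+} = 0,$$
with reference constant $\zeta$ and threshold $h$ (the chart signals when $C_i^{+} \ge h$) chosen to yield the desired in-control average run length (ARL). These procedures are both self-starting (no pre-estimate of variance needed) and distribution-free under symmetry (Lombard et al., 2017, Zyl et al., 2019).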
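The following is a rough sketch of an upper one-sided signed sequential rank CUSUM with the Wilcoxon score $J(u) = u$, for which the normalizing constant has the closed form $\nu_i^2 = (2i+1)/(6(i+1))$. The function names, the data, and the values of the reference constant and threshold are hypothetical; in practice the threshold would be calibrated to a target in-control ARL, which this sketch does not attempt.

```python
import math
from typing import Iterable, List, Optional


def ssr_scores(data: Iterable[float], theta0: float) -> List[float]:
    """Normalized Wilcoxon signed sequential rank scores, one per observation."""
    abs_seen: List[float] = []
    scores: List[float] = []
    for i, x in enumerate(data, start=1):
        d = x - theta0
        sign = (d > 0) - (d < 0)
        # Sequential rank of |d| among the first i absolute residuals.
        rank = 1 + sum(1 for a in abs_seen if a < abs(d))
        abs_seen.append(abs(d))
        # nu_i^2 = (1/i) * sum_j (j/(i+1))^2 = (2i+1) / (6(i+1)) when J(u) = u.
        nu = math.sqrt((2 * i + 1) / (6.0 * (i + 1)))
        scores.append(sign * (rank / (i + 1)) / nu)
    return scores


def upper_cusum_alarm(scores: Iterable[float], zeta: float, h: float) -> Optional[int]:
    """Return the 1-based index of the first alarm, or None if no alarm occurs."""
    c = 0.0
    for i, xi in enumerate(scores, start=1):
        c = max(0.0, c + xi - zeta)  # CUSUM recursion with reference constant zeta
        if c >= h:
            return i
    return None


# Hypothetical stream drifting above the known median 0; zeta and h are
# illustrative values, not calibrated to any in-control ARL.
stream = [1.0, 2.0, 3.0]
print(upper_cusum_alarm(ssr_scores(stream, theta0=0.0), zeta=0.25, h=2.0))  # 3
```

Because only signs and sequential ranks enter the scores, rescaling the residuals by any positive constant leaves the chart unchanged, which is what makes the scheme self-starting with respect to scale.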
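SSR-SR schemes similarly employ the recursion
$$R_i = (1 + R_{i-1})\exp\left\{\zeta\,\xi_i - \psi_i(\zeta)\right\},$$
with $\xi_i$ the normalized signed sequential rank score, $\zeta$ a tuning constant, and $\psi_i(\zeta) = \log E\left[\exp(\zeta\,\xi_i)\right]$ being the in-control log moment-generating function, and signal when $R_i \ge h$ for a threshold $h$ (Zyl et al., 2019).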
3. Distributional Approximation and Tied Data Handling
For untied data, the null distribution of signed-rank statistics is known exactly or can be approximated by a normal distribution. When rounding or limited device resolution induces ties, signed-rank procedures must explicitly accommodate the increased occurrence of zero differences ($D_{ij} = X_{ij} - \theta_0 = 0$, where $\theta_0$ is the reference value). For the Shewhart Signed-Rank Control Chart (SS-RCC), the tied case is handled as follows (Mortezanejad et al., 26 Mar 2025):
- Define the effective nonzero sample size $n^{*}$, distributed as Binomial$(n, 1 - p_0)$, where $p_0$ is the probability of a zero difference.
- Remove zero-difference points and recompute ranks among the remaining data.
- Derive the moments of the tied statistic by conditional expectation on $n^{*}$.
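The steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the cited authors' implementation: each difference is taken to be zero independently with probability `p0`, and, given the effective size, the remaining differences are assumed symmetric with no further ties, so the conditional variance is the untied value $k(k+1)(2k+1)/6$. The function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def tied_signed_rank(sample: Sequence[float], theta0: float) -> Tuple[float, int]:
    """Drop zero differences, re-rank the rest, and return (statistic, n_eff)."""
    nonzero = [x - theta0 for x in sample if x != theta0]
    ordered = sorted(abs(d) for d in nonzero)

    def midrank(value: float) -> float:
        # Average of the 1-based positions that `value` occupies in `ordered`.
        first = ordered.index(value) + 1
        return first + (ordered.count(value) - 1) / 2.0

    sr = float(sum((1 if d > 0 else -1) * midrank(abs(d)) for d in nonzero))
    return sr, len(nonzero)


def tied_null_variance(n: int, p0: float) -> float:
    """Null variance obtained by averaging over n_eff ~ Binomial(n, 1 - p0).

    The conditional mean is zero for every n_eff, so the unconditional
    variance is the expected conditional variance.
    """
    total = 0.0
    for k in range(n + 1):
        weight = comb(n, k) * (1 - p0) ** k * p0 ** (n - k)
        total += weight * k * (k + 1) * (2 * k + 1) / 6.0
    return total


# Hypothetical subgroup in which two readings equal the reference value 100.
print(tied_signed_rank([103, 100, 96, 100, 109], theta0=100))  # (2.0, 3)
print(tied_null_variance(10, 0.0))                             # 385.0
```

With `p0 = 0` the variance reduces to the untied value, and it shrinks as the tie probability grows because fewer observations carry rank information.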
To approximate the null distribution of the tied signed-rank statistic, a Scaled-Normal Distribution (SND) is constructed whose parameters are set to match the mean, variance, skewness, and kurtosis of that statistic. Because direct moment-matching is computationally intensive for arbitrary tie patterns and data distributions, recent work applies deep learning to estimate the SND parameters rapidly for new configuration settings (Mortezanejad et al., 26 Mar 2025). An 11-feature MLP, trained on simulated tied-data scenarios across Johnson family distributions, yields sub-percent relative errors on test data for SND parameter estimation. Applicability remains contingent on the sample sizes and tie structures seen during training.
4. Control Limit Design and Average Run Length Analysis
Control limits are set to assure a pre-specified type-I error rate $\alpha$, typically leading to an in-control ARL of 370 for $3\sigma$-like limits ($\alpha = 0.0027$). Formally,
$$\mathrm{LCL} = F^{-1}(\alpha/2), \qquad \mathrm{UCL} = F^{-1}(1 - \alpha/2),$$
where $F$ is the (scaled) normal cdf (untied case) or the SND-based cdf (tied case) (Mortezanejad et al., 26 Mar 2025). The ARL is $\mathrm{ARL}_0 = 1/\alpha$ in-control, and $\mathrm{ARL}_1 = 1/(1-\beta)$ out-of-control, where $\beta$ is the probability that the charting statistic stays within $[\mathrm{LCL}, \mathrm{UCL}]$ under a shifted process. For sequential charts, ARL calculations depend on the recursion law and can be obtained via Monte Carlo or analytical normal approximations (Lombard et al., 2017, Zyl et al., 2019). The SSR-SR and SSR-CUSUM charts' run length distributions are distribution-free for continuous underlying distributions and symmetric score functions.
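As a worked illustration of the untied case, the sketch below places symmetric limits on the signed-rank statistic using a normal approximation with mean 0 and variance $n(n+1)(2n+1)/6$, and converts a per-subgroup signal probability into an ARL via the geometric run-length relation for independent subgroups. The function names are hypothetical, and the normal approximation is only rough for small subgroups, where the statistic is markedly discrete.

```python
from statistics import NormalDist
from typing import Tuple


def signed_rank_limits(n: int, alpha: float = 0.0027) -> Tuple[float, float]:
    """Normal-approximation control limits for the untied signed-rank statistic."""
    sd = (n * (n + 1) * (2 * n + 1) / 6.0) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 3 when alpha = 0.0027
    return -z * sd, z * sd


def average_run_length(signal_prob: float) -> float:
    """ARL of a Shewhart-type chart with independent subgroups: 1 / P(signal)."""
    if not 0.0 < signal_prob <= 1.0:
        raise ValueError("signal probability must lie in (0, 1]")
    return 1.0 / signal_prob


lcl, ucl = signed_rank_limits(n=10)
print(round(lcl, 2), round(ucl, 2))
print(round(average_run_length(0.0027), 1))  # in-control ARL of about 370.4
# Out-of-control: if beta is the probability of staying inside the limits,
# the ARL is 1 / (1 - beta); beta = 0.9 is a made-up value for illustration.
print(round(average_run_length(1 - 0.9), 1))
```

For the tied case the same recipe would apply with the normal quantiles replaced by quantiles of the fitted approximating distribution.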
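Empirical studies across Normal and heavy-tailed Johnson distributions show that, for moderate tie rates up to $0.2$, the SS-RCC preserves its in-control ARL (reported values range up to 623) and exhibits greater sensitivity to small shifts under ties: for a given small shift and tie rate, the out-of-control ARL of the tied-observation (TO) chart is shorter than that of the untied (UO) chart (Mortezanejad et al., 26 Mar 2025).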
5. Comparative Performance and Practical Recommendations
Rank-based charts demonstrate several practical advantages:
- Superior robustness to non-normality and outlier contamination over parametric Shewhart, CUSUM, and EWMA charts, with guaranteed nominal ARL under minimal assumptions.
- Enhanced detection performance in the presence of ties due to explicit statistical modeling.
- For SSR-CUSUM and SSR-SR schemes, nearly equal or better out-of-control ARL compared to normal-theory charts for small to moderate shifts; the SSR-SR scheme is particularly efficient for small persistent shifts (Lombard et al., 2017, Zyl et al., 2019).
- Simpler implementation and self-starting properties obviating the need for variance or higher-moment estimation.
Limitations arise where the data properties diverge sharply from the conditions seen when training the deep learning parameterizations, or where ties are extremely prevalent. Deep models need to be retrained or revalidated when the sampling scope or process conditions shift substantially (Mortezanejad et al., 26 Mar 2025).
Key practitioner recommendations include:
- Prefer SS-RCC or SSR-based charts whenever normality is questionable or measurement rounding is non-negligible.
- Validate the deep learning-based distributional approximations via simulation prior to routine control monitoring.
- Periodically reassess device resolution and retrain machine-learning models as process or instrumentation evolves.
- For very small persistent shifts, supplement with CUSUM or composite detection diagnostics (Mortezanejad et al., 26 Mar 2025).
6. Extensions and Related Sampling-Based Charts
Beyond classical signed-rank and sequential rank charts, ranked set sampling (RSS) and its extensions offer alternative means to incorporate order information when direct measurement is expensive or error-prone. Neoteric ranked set sampling (NRSS) control charts select $n$ observations per subgroup from $n^2$ candidates according to ranks derived from auxiliary variables (Silva et al., 2017). These sampling-based charts achieve improved ARL and signal detection over standard simple random sampling (SRS) and RSS charts, particularly under perfect or high-quality ranking. The variance structure is determined by theoretical or simulation-based order statistic moments, and imperfect ranking scenarios are explicitly addressed via simulation or Mallows-type error models.
Empirical evaluations on concrete-strength datasets, where the correlation between the auxiliary variable and the variable of interest is moderate, show substantial ARL improvement of NRSS charts over classic approaches (Silva et al., 2017). A plausible implication is that, in settings where partial or proxy ordering is available, sampling-based rank methods can complement or outperform value-based and rank-based charts.
7. References and Key Literature
- "Signed Rank Chart For Tied Observations: An Application of Deep Learning Models" (Mortezanejad et al., 26 Mar 2025)
- "Signed Sequential Rank CUSUMs" (Lombard et al., 2017)
- "Signed Sequential Rank Shiryaev-Roberts Schemes" (Zyl et al., 2019)
- "An improved quality control chart to monitor the mean based on ranked sets" (Silva et al., 2017)