- The paper introduces SCOPE-PD, an explainable ML framework that integrates subjective and objective clinical measurements for early PD diagnosis.
- It employs a Random Forest model with SHAP explainability, achieving over 98% accuracy in validation on the PPMI baseline cohort.
- The study emphasizes actionable, patient-specific insights to support precision clinical decision-making and future research.
SCOPE-PD: Explainable ML for Precision Parkinson’s Disease Diagnosis
Introduction
The paper “SCOPE-PD: Explainable AI on Subjective and Clinical Objective Measurements of Parkinson's Disease for Precision Decision-Making” (2601.22516) presents a machine learning workflow integrating both subjective (patient-reported) and objective (expert-assessed) clinical assessment data to enhance early Parkinson's disease (PD) identification. The framework emphasizes individualized, explainable predictions with direct relevance to clinical decision-making. Recognizing that most prior work has either relied on a single data modality or sacrificed interpretability to black-box models, this study tightly couples high predictive performance with XAI-enabled interpretability, using the SHAP methodology to foster clinical trust, translational transparency, and actionable insights at both the population and individual level.
Dataset and Preprocessing
The foundation is the baseline dataset from the PPMI (Parkinson’s Progression Markers Initiative), leveraging 148 features (79 subjective, 69 objective) from 1,786 subjects. Subjective measures derive from instruments such as MDS-UPDRS I–II, GDS, SCOPA-AUT, and others. Objective measures include MDS-UPDRS III, MoCA, BJLO, HVLT, and additional established neuropsychological tests. Rigorous preprocessing included elimination of data with excessive missingness, imputation-free feature set construction, alignment of scoring directionality, and normalization by min-max scaling to [0,1] per feature.
Three datasets were constructed: subjective only, objective only, and combined, enabling robust model comparison along clinically meaningful data type axes.
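The preprocessing and dataset-construction steps above can be sketched as follows. This is a minimal illustration on a synthetic table: the column names, feature counts, and data values are hypothetical stand-ins for the real PPMI variables (e.g., MDS-UPDRS item codes), not the paper's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the PPMI baseline table; real feature
# names and value ranges differ.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "np2_tremor": rng.integers(0, 5, 100).astype(float),    # subjective item
    "np3_brady": rng.integers(0, 5, 100).astype(float),     # objective item
    "moca_total": rng.integers(10, 31, 100).astype(float),  # objective item
})
df.iloc[0, 0] = np.nan  # simulate a record with missing data

subjective_cols = ["np2_tremor"]
objective_cols = ["np3_brady", "moca_total"]

# 1) Drop records with any missing value (imputation-free, per the paper).
df = df.dropna(axis=0)

# 2) Min-max scale each feature independently to [0, 1].
scaled = (df - df.min()) / (df.max() - df.min())

# 3) Build the three analysis datasets for model comparison.
datasets = {
    "subjective": scaled[subjective_cols],
    "objective": scaled[objective_cols],
    "combined": scaled[subjective_cols + objective_cols],
}
```

Keeping the three feature subsets as views of one scaled table guarantees that the subjective-only, objective-only, and combined models see identically preprocessed records.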
Learning and Model Selection
Five supervised algorithms were benchmarked: logistic regression, SVM (RBF), KNN, Random Forest (RF), and XGBoost. Class imbalance was handled via class-wise weighting or, for XGBoost, scale_pos_weight. Model selection and hyperparameter tuning used nested 5-fold stratified cross-validation with an 80/20 train/test split, optimizing for F1 score. All records with missing data were excluded to maintain integrity and comparability.
The highest results were achieved with the Random Forest model on the combined feature set:
- Accuracy: 98.66% (±0.91%)
- F1-score: 0.9917 (±0.0055)
- ROC-AUC: 0.9992 (±0.0006)
- PR-AUC: 0.9998 (±0.0001)
When using only subjective or only objective features, the RF model still achieved nearly equivalent performance (96.55–97.85% accuracy). These metrics were generated using strict cross-validation without synthetic data augmentation, reducing the risk of optimistic bias.
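The four reported metrics map directly onto standard scikit-learn functions; the labels and scores below are toy values for one hypothetical fold, not results from the paper. Note that thresholded metrics (accuracy, F1) take hard predictions, while the AUC metrics rank the predicted probabilities, and PR-AUC is computed here as average precision.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)

# Toy labels and predicted probabilities for one hypothetical fold.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.7, 0.2, 0.6, 0.95])
y_pred = (y_prob >= 0.5).astype(int)  # hard labels at a 0.5 threshold

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("ROC-AUC:", roc_auc_score(y_true, y_prob))           # ranking quality
print("PR-AUC:", average_precision_score(y_true, y_prob))  # average precision
```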
Explainability and Feature Attribution
Interpretability was delivered through SHAP TreeExplainer for tree ensemble models. Both local (individual-level) and global (cohort-level) feature attributions were calculated, quantifying the additive effect of each feature on PD prediction probability. The SHAP paradigm yields precise, cohort- and subject-specific statements such as “self-reported tremor increases PD probability by +0.05”, allowing nuanced clinician-facing decision support.
Globally, the top discriminative features were:
- Combined dataset: self-reported tremor (NP2TRMR), observed bradykinesia (NP3BRADY), and observed facial expression (NP3FACXP) dominated feature importance.
- Subjective dataset: Nine of ten top features were from MDS-UPDRS II (activities of daily living); the most significant was self-reported tremor.
- Objective dataset: MDS-UPDRS III features were exclusively top-ranked, with emphasis on bradykinesia-related markers.
SHAP analysis strongly confirmed that established PD biomarkers, especially those relating to tremor and bradykinetic impairment, retain overwhelming predictive value even alongside complex multimodal data. The framework allows quantification of local feature contributions, thus facilitating individualized risk stratification.
Results in the Context of Prior Work
The reported accuracy (98.66% RF, combined) surpasses previously published results that employed single-modality datasets (e.g., sensor or voice only [81.7–92.6%]) and matches or exceeds prior explainable multimodal approaches (DenseNet + clinical data: 96.6–96.8% accuracy [see Dentamaro et al., Priyadharshini et al.]). The strong performance is attributed not only to multimodal integration, but to careful curation and preprocessing of high-fidelity, harmonized baseline assessments, and, crucially, robust post hoc XAI evaluation.
Implications and Future Directions
Clinical and Research Implications
- Precision Clinical Support: The methodology enables patient-specific explanations supporting risk communication, intervention targeting, and differential diagnostics at early disease stages, all within the bounds of interpretable AI.
- Feature-Level Insights: Quantitative feature attributions aid in dissecting the contributions of subjective and objective components, fostering understanding of “what drives” diagnosis in each patient, supporting further subclassification and treatment stratification.
- Validation and Generalizability: The study recognizes the need for external validation on independent multi-site cohorts, as models tuned exclusively on PPMI risk overfitting to site-specific or population-level idiosyncrasies.
Theoretical Advances and Future AI Development
- XAI-Driven Translational ML: This work pushes beyond simple model performance metrics, requiring that high-accuracy AI be accompanied by transparent, quantitative feature attribution frameworks. It demonstrates a practical pipeline for integrating tabular multimodal clinical data with XAI in precision medicine.
- Multimodal Expansion: The authors propose subsequent extensions to include genetic data (e.g., GBA, SNCA), neuroimaging (MRI/fMRI), and longitudinal/progressional modeling. This trajectory will support granular endophenotype discovery and improved prognostication.
- Benchmarking for Trustworthy AI: The approach highlights the criticality of nested validation, robust handling of imbalance, and feature alignment in medical ML, setting a methodological precedent for future disease-oriented decision support systems.
Conclusion
This paper develops a unified, explainable AI framework—SCOPE-PD—for early, accurate PD diagnosis from curated subjective and objective clinical data. By employing both high-performing ensemble models and rigorous SHAP-based interpretability analyses, the study delivers accuracies above 98% with actionable, patient-specific explanations, demonstrating that multimodal, explainable ML models can make PD risk assessment both precise and clinically interpretable. The results call for continued research on generalizability, broader phenotype integration, and real-world clinical deployment, providing a blueprint for XAI in neurodegenerative disease management (2601.22516).