Radiomics-specific AutoML (Automated Machine Learning) frameworks are computational systems designed to automate the complex, multi-stage processes required for robust radiomics analysis in medical imaging. Unlike generic AutoML solutions, these frameworks natively support domain-specific requirements—such as radiomic feature extraction, image-space harmonization, segmentation-radiomics integration, and pipeline reproducibility—within a unified, high-level interface. This article surveys the development, architectures, workflow components, optimization strategies, evaluation metrics, and research directions of radiomics-specific AutoML, with an emphasis on evidence from published frameworks.
1. Evolution and Taxonomy of Radiomics-Specific AutoML
Radiomics-specific AutoML frameworks arose to address the limitations of both hand-crafted radiomics and general-purpose AutoML tools in the context of medical image analysis. Classical radiomics pipelines rely on manual feature engineering—extracting intensity, shape, and texture features (e.g., GLCM, GLSZM, wavelets) followed by heuristic feature selection and conventional machine learning classifiers. This process is not only labor-intensive but also suboptimal for model generalizability, especially across diverse imaging and clinical settings (Shafiee et al., 2015, Starmans et al., 2021).
Radiomics-specific AutoML frameworks can be grouped as follows:
- Pipeline auto-construction systems: These offer modular optimization of entire radiomics analysis chains, selecting and tuning among algorithms for each stage (e.g., feature extraction, selection, classification) (Starmans et al., 2021).
- Discovery radiomics frameworks: These automate feature discovery by learning data-driven representations (deep "radiomic sequencers") using random graph-based or evolutionary search methods, replacing hand-crafted extraction (Shafiee et al., 2015, Shafiee et al., 2017).
- No-code/GUI platforms: These systems enable users to assemble and optimize radiomics pipelines with limited programming requirements, providing graphical interfaces and embedded AutoML for radiomics-specific data (Chang et al., 2020, Lozano-Montoya et al., 13 Jan 2026).
- Agentic and integrative orchestration frameworks: Recent systems are architected as multi-agent platforms orchestrating EDA, feature extraction, segmentation, AutoML, and reporting, often through natural language interfaces (Tzanis et al., 30 Apr 2025).
2. Core Workflow Components and Search Spaces
All radiomics-specific AutoML frameworks operationalize a multistage pipeline, typically comprising:
- Image/ROI Preprocessing: Standardization of image intensities, resampling, and artifact mitigation. Frameworks such as WORC implement fingerprinting to determine workflow pathways, e.g. 2D versus 3D processing based on voxel anisotropy (Starmans et al., 2021).
- Segmentation: Automated or user-driven delineation of regions of interest (ROIs), with integration of tools such as nnU-Net or TotalSegmentator in agentic systems (Tzanis et al., 30 Apr 2025).
- Radiomics Feature Extraction: Extraction of first-order, second-order (GLCM, GLRLM, GLSZM, GLDM, NGTDM), and higher-order features using domain libraries (PyRadiomics, internal toolkits) with standardized parameter logging to ensure reproducibility (Chang et al., 2020, Tzanis et al., 30 Apr 2025).
- Feature Preprocessing and Selection: Procedures include correlation filtering, stability selection (e.g., SULOV), RFE, variance thresholding, normalization, and principal component analysis (Starmans et al., 2021, Lozano-Montoya et al., 13 Jan 2026).
- Classifier/Regressor Selection and Tuning: Inclusion of a wide array of machine learning models—SVM, random forest, logistic regression, XGBoost, MLP, Gaussian processes, with options for stacking and ensembling (Starmans et al., 2021, Lozano-Montoya et al., 13 Jan 2026, Tzanis et al., 30 Apr 2025).
- Hyperparameter and Pipeline Search: Automated inference of module selections and internal hyperparameters via random search, Bayesian optimization (e.g., SMAC, TPE), evolutionary methods, or grid/random search specific to the module (Starmans et al., 2021, Chang et al., 2020, Tzanis et al., 30 Apr 2025).
- Performance Evaluation and Reporting: Multi-metric reporting (AUC, F1, accuracy, sensitivity, specificity, cross-validation statistics, model interpretability via SHAP/LIME) (Lozano-Montoya et al., 13 Jan 2026, Tzanis et al., 30 Apr 2025).
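The feature preprocessing step above (variance thresholding followed by correlation filtering) can be sketched as follows; the thresholds and the toy feature matrix are illustrative, not any framework's defaults:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

def select_features(X, var_thresh=1e-3, corr_thresh=0.95):
    """Drop near-constant features, then prune highly correlated pairs."""
    # Stage 1: remove features with negligible variance.
    vt = VarianceThreshold(threshold=var_thresh)
    X_v = vt.fit_transform(X)
    kept = np.flatnonzero(vt.get_support())
    # Stage 2: greedily drop one feature from each highly correlated pair.
    corr = np.abs(np.corrcoef(X_v, rowvar=False))
    drop = set()
    for i in range(corr.shape[0]):
        if i in drop:
            continue
        for j in range(i + 1, corr.shape[0]):
            if j not in drop and corr[i, j] > corr_thresh:
                drop.add(j)
    keep = [k for k in range(corr.shape[0]) if k not in drop]
    return kept[keep]  # indices into the original feature matrix

# Toy matrix: one duplicated feature pair and one constant column.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([base, base + 1e-4 * rng.normal(size=(50, 1)),
               rng.normal(size=(50, 3)),
               np.full((50, 1), 2.0)])
idx = select_features(X)  # constant and duplicated columns are pruned
```

Real pipelines typically run these filters inside the cross-validation loop (per training fold) to avoid selection bias.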
The total search space encompasses algorithmic choices and hyperparameters at each pipeline position, resulting in large, structured CASH (Combined Algorithm Selection and Hyperparameter Optimization) problems (Starmans et al., 2021).
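A minimal sketch of this CASH formulation with scikit-learn: the classifier itself is treated as one more searchable parameter of the pipeline, alongside each candidate's internal hyperparameters. The dataset and search space here are illustrative stand-ins, not any surveyed framework's configuration:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for a radiomics feature matrix: 120 cases, 30 features.
X, y = make_classification(n_samples=120, n_features=30, n_informative=8,
                           random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# One structured search space: the "clf" step is itself a categorical
# choice, with conditional hyperparameters per algorithm (CASH).
space = [
    {"clf": [LogisticRegression(max_iter=1000)],
     "clf__C": loguniform(1e-3, 1e3)},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [50, 100, 200],
     "clf__max_depth": [3, 5, None]},
]

search = RandomizedSearchCV(pipe, space, n_iter=20, cv=5,
                            scoring="roc_auc", random_state=0)
search.fit(X, y)
best_auc = search.best_score_
```

Frameworks such as WORC extend this idea to every pipeline stage (preprocessing, selection, resampling), which is what makes the joint search space so large.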
3. Optimization and Model Selection Strategies
Radiomics AutoML frameworks deploy several model selection and hyperparameter optimization strategies:
- Randomized search: Empirically shown to perform on par with Bayesian optimization for moderate pipeline complexities when coupled with Top-N ensembling (Starmans et al., 2021).
- Bayesian optimization (SMAC, TPE): Applied for surrogate-based sampling over pipeline configurations, though not consistently superior on test sets for radiomics (Starmans et al., 2021, Tzanis et al., 30 Apr 2025).
- Ensembling approaches: Top-N model averaging, FitNumber, and forward selection are used to improve generalization and reduce variance, with Top-100 exhibiting a favorable performance-stability-compute trade-off (Starmans et al., 2021).
- Evolutionary search (in deep discovery radiomics): Architecture search is driven by probabilistic “DNA” inheritance and survival constraints to evolve compact, high-performing deep radiomic sequencers with explicit model sparsification (Shafiee et al., 2017).
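The Top-N ensembling strategy above can be sketched as follows: rank candidate configurations by cross-validated score, then average the predicted probabilities of the N best refits. The candidates and data are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate configurations (here: one model family, varying C).
candidates = [LogisticRegression(C=c, max_iter=1000)
              for c in (0.01, 0.1, 1, 10, 100)]

# Rank by cross-validated AUC on the training split.
scores = [cross_val_score(m, X_tr, y_tr, cv=5, scoring="roc_auc").mean()
          for m in candidates]
top_n = 3
ranked = sorted(zip(scores, candidates), key=lambda t: -t[0])[:top_n]

# Average the predicted probabilities of the top-N refitted models.
probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
                 for _, m in ranked], axis=0)
acc = ((probs >= 0.5).astype(int) == y_te).mean()
```

Averaging over the top N (rather than keeping only the single best configuration) is what reduces the variance of the final predictor.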
Table: Optimization Methods in Representative Frameworks
| Framework | Search Strategy | Ensemble Method |
|---|---|---|
| WORC | Random/Bayesian (SMAC) | Top-N, FitNumber |
| mAIstro | Grid/Bayesian/Evolution | PyCaret blend/stacking |
| StochasticNet | None (manual HPs) | Implicit (stochastic) |
| Simplatab | Grid search | Simple stacking |
| DARWIN | Grid/Random/Hyperband | None (modular outputs) |
4. Architectures and User Interfaces
Radiomics-specific AutoML frameworks span a spectrum of interfaces and architectural paradigms:
- Code/config-driven toolkits: WORC enforces modularity and reproducibility via config files and scripting, exposing all key parameters and modules for expert users (Starmans et al., 2021).
- No-code GUI systems: Simplatab and DARWIN provide graphical, drag-and-drop pipeline designers, embedding radiomics-aware preprocessing, selection, and transparent reporting, lowering technical barriers (Lozano-Montoya et al., 13 Jan 2026, Chang et al., 2020).
- Natural language agentic orchestration: mAIstro introduces Master Agent-subagent architectures, leveraging LLMs to orchestrate modular “Tools” for EDA, segmentation, radiomics extraction, and AutoML, offering plug-and-play extensibility and detailed workflow audit trails (Tzanis et al., 30 Apr 2025).
- End-to-end deep radiomic sequencer frameworks: StochasticNet and EDRS automate radiomics by replacing feature engineering with learnable, random/compact CNN architectures that discover discriminative representations per phenotype directly from images, thus embedding feature engineering and model selection into network optimization (Shafiee et al., 2015, Shafiee et al., 2017).
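As a toy illustration of the agentic orchestration pattern described above: a master agent routes a free-text request to a registry of named tools. This is not any framework's actual API; the tool names, keyword routing, and return values are all hypothetical (real systems such as mAIstro delegate routing to an LLM):

```python
from typing import Callable, Dict

# Hypothetical tool registry: each "tool" is a named callable the
# master agent can dispatch to.
TOOLS: Dict[str, Callable[[str], str]] = {
    "eda": lambda req: f"EDA report for: {req}",
    "segment": lambda req: f"segmentation mask for: {req}",
    "radiomics": lambda req: f"feature table for: {req}",
    "automl": lambda req: f"trained model for: {req}",
}

def master_agent(request: str) -> str:
    """Route a free-text request to the first tool whose name matches."""
    for name, tool in TOOLS.items():
        if name in request.lower():
            return tool(request)
    return "no tool matched"

out = master_agent("run radiomics extraction on liver CT cohort")
```

The plug-and-play extensibility noted in the text corresponds to adding entries to the registry without touching the dispatch logic.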
5. Empirical Performance and Validation
Performance of radiomics-specific AutoML frameworks has been systematically evaluated across diverse clinical tasks and datasets:
- WORC: Validated on 12 clinical applications (N ≈ 930, CT/MRI), delivered AUCs from 0.45 (melanoma) to 0.87 (Alzheimer's disease), consistently outperforming radiomics baselines and matching or exceeding clinical experts in 6/7 tasks with available visual reads (Starmans et al., 2021).
- Simplatab: Achieved mean AUC 81.81% ± 4.4 on 10 public/private datasets, statistically superior to general-purpose AutoML, with robust performance-efficiency tradeoffs (~1 hour per 5-fold CV) (Lozano-Montoya et al., 13 Jan 2026).
- StochasticNet/EDRS: On LIDC-IDRI lung CT, StochasticNet achieved 91.07% sensitivity, 75.98% specificity, 84.49% accuracy, while EDRS achieved 93.42% sensitivity, 82.39% specificity, 88.78% accuracy, with substantial model compaction and inference speedup (Shafiee et al., 2015, Shafiee et al., 2017).
- DARWIN: Provided radiomics AUC = 0.97 for left/right lung ROI classification and 90.24% test accuracy in pGGN invasiveness prediction, with minimal runtime and high flexibility (Chang et al., 2020).
- mAIstro: Demonstrated high test-set and cross-validation metrics (e.g., Breast Cancer AUC 0.998), and robust segmentation (Dice scores > 0.95 on BraTS2021/Whole Tumor), unifying radiomics and deep learning in an agentic AutoML workflow (Tzanis et al., 30 Apr 2025).
6. Limitations, Open Challenges, and Future Directions
Despite their progress, contemporary radiomics-specific AutoML frameworks exhibit several gaps:
- Survival analysis: Survival modeling modules (e.g., Cox regression, random survival forests) are either absent from GUI-accessible pipelines or computationally infeasible for high-dimensional radiomics. Only AutoPrognosis offers survival modules, and these are impractical in typical radiomics settings (Lozano-Montoya et al., 13 Jan 2026).
- Feature reproducibility and harmonization: No surveyed framework performs automated test–retest stability filtering, IBSI-compliance audits, or ComBat-style harmonization for batch effect correction, leaving pipelines vulnerable to scanner/institution variability (Lozano-Montoya et al., 13 Jan 2026).
- End-to-end integration: Pipelines usually lack full integration of preprocessing, segmentation QC, feature extraction, harmonization, and modeling in a single automated module with transparent audit trails.
- Scalability and federated validation: Few frameworks support federated learning, cross-center harmonization, or domain adaptation workflows, hindering generalizability.
- User transparency for advanced settings: Exposing harmonization/meta-parameters in user-friendly ways remains an open usability challenge.
Suggested directions include: embedding efficient survival algorithms, automated robustness/harmonization modules, transparent parameterization of pre-modeling steps, and native support for batch-effect correction and multi-institutional/federated validation (Lozano-Montoya et al., 13 Jan 2026, Tzanis et al., 30 Apr 2025).
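As a minimal sketch of the batch-effect correction mentioned above: the location-scale core of ComBat-style harmonization aligns each scanner's per-feature mean and variance to a pooled reference. This deliberately omits the empirical-Bayes shrinkage of batch estimates that full ComBat adds, so it is a simplified illustration, not the published method:

```python
import numpy as np

def harmonize(X, batch):
    """Align each batch's per-feature mean/variance to the pooled data."""
    X = np.asarray(X, dtype=float)
    grand_mean = X.mean(axis=0)
    grand_std = X.std(axis=0)
    out = np.empty_like(X)
    for b in np.unique(batch):
        idx = np.asarray(batch) == b
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant features within a batch
        out[idx] = (X[idx] - mu) / sd * grand_std + grand_mean
    return out

# Two "scanners" with a systematic offset in every feature.
rng = np.random.default_rng(0)
site_a = rng.normal(0.0, 1.0, size=(40, 3))
site_b = rng.normal(2.0, 1.0, size=(40, 3))
X = np.vstack([site_a, site_b])
batch = np.array([0] * 40 + [1] * 40)
Xh = harmonize(X, batch)
gap = abs(Xh[:40].mean() - Xh[40:].mean())  # scanner offset removed
```

Full ComBat additionally shrinks the per-batch location/scale estimates toward a prior, which matters when batches are small; embedding such a step natively in AutoML pipelines is precisely the gap the surveyed work identifies.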
7. Reproducibility, Extensibility, and Best Practices
Radiomics-specific AutoML frameworks have prioritized reproducibility and extensibility:
- Parameter audit and open-source dissemination: Leading pipelines log all hyperparameter choices (e.g., bin widths, feature extraction settings), provide code and datasets under permissive licenses, and encourage end-to-end reproducibility audits (Starmans et al., 2021, Tzanis et al., 30 Apr 2025).
- Public benchmarks and configuration management: WORC, mAIstro, and others release datasets, configurations, and comprehensive documentation/tutorials (Starmans et al., 2021, Tzanis et al., 30 Apr 2025).
- Modular/pluggable architecture: Agentic design patterns and modular node graphs facilitate extension to new imaging modalities, outcome types, or analytic components, including future deep radiomics/explainability modules (Tzanis et al., 30 Apr 2025, Chang et al., 2020).
- Recommended usage: Balanced random search and ensemble sizes (e.g., WORC defaults N_RS=1000, N_ens=100) are found to provide computationally efficient, robust results while minimizing overfitting and instability (Starmans et al., 2021).
Radiomics-specific AutoML thus provides a reproducible, extensible, and domain-tailored foundation for prognostic and diagnostic modeling in imaging-based clinical research. Continuing integration of harmonization, survival modeling, robustness validation, and federated learning represents the next wave of development in the field.