Radiomics AutoML Frameworks

Updated 20 January 2026

Radiomics-specific AutoML frameworks are specialized systems that automate the construction and optimization of machine learning pipelines tailored to radiomics data, reducing the need for manual feature engineering.
They employ modular architectures integrating preprocessing, feature extraction, selection, and advanced search strategies like Bayesian and evolutionary optimization to enhance model performance.
By embedding domain-specific imaging knowledge, these frameworks address reproducibility and heterogeneity challenges, enabling robust, clinically applicable model development.

Radiomics-specific automated machine learning (AutoML) frameworks constitute a class of software and algorithmic systems that automate the construction, selection, and optimization of machine learning pipelines tailored to radiomics data. In contrast to general-purpose AutoML platforms, these frameworks integrate medical imaging domain knowledge (such as feature extraction, normalization, reproducibility constraints, harmonization, and segmentation) and provide specialized workflow modules that support radiomics-centric research, from image and region-of-interest (ROI) processing through feature extraction and selection to model training, validation, and interpretability (Shafiee et al., 2015, Starmans et al., 2021, Lozano-Montoya et al., 13 Jan 2026, Tzanis et al., 30 Apr 2025). They address the unique methodological, computational, and reproducibility challenges of medical imaging analysis, particularly in heterogeneous, high-dimensional, and often small-sample-size settings.

1. Paradigm Shift: From Manual Feature Engineering to Automated Workflow Construction

Traditional radiomics studies rely on hand-crafted feature sets (intensity histograms, GLCM textures, wavelets, morphological/shape descriptors) and heuristic pipeline construction, requiring extensive manual design and domain expertise. Radiomics-specific AutoML frameworks automate this process by embedding feature engineering, algorithm selection, and hyperparameter optimization inside tunable workflow modules, formalizing the full model-building process as a combined algorithm selection and hyperparameter (CASH) optimization problem (Starmans et al., 2021).

Recent advances include deep discovery radiomics, where neural architectures (such as randomized CNNs or evolved sequencers) replace explicit, pre-defined feature families, learning data-driven latent representations from images without hand-specification of texture, shape, or intensity operators (Shafiee et al., 2015, Shafiee et al., 2017). This shift enables direct, end-to-end optimization and supports scalable, adaptive workflows across imaging modalities and clinical indications.

2. Modular Architectures and Formal Workflow Optimization

State-of-the-art radiomics-specific AutoML frameworks adopt a modular design. Each pipeline stage is defined as an independent, hyperparameterized module—commonly including preprocessing, feature extraction, feature selection, sample balancing, classifier selection, and ensembling. The optimal configuration is found by searching over both algorithm choices and their internal hyperparameters (Starmans et al., 2021).

Example: WORC Framework Modules

Pipeline Component	Algorithmic Options / HPs	Selection Mechanism
Image/ROI Preprocessing	Intensity normalization, anisotropy mode	Categorical/activator HP
Feature Extraction	564 features: shape, texture, wavelets	Fixed + optional families
Feature Preprocessing	Group-wise drop, imputation (mean, KNN), scaling	Activator + selector HP
Feature Selection	RELIEF, LASSO, RF, PCA, univariate tests	Activator + selector HP
Resampling	SMOTE, ADASYN, under/oversampling	Activator + selector HP
Classification	SVM, RF, LR, LDA, QDA, AdaBoost, XGBoost	Selector HP

The pipeline search space is combinatorially large, and the workflow search is cast as:

$\lambda^*_C = \arg\min_{\lambda_C\in\Delta_C} \frac{1}{k_{\rm train}} \sum_{i=1}^{k_{\rm train}} \mathcal{L}(\text{train}=D^{(i)}_{\rm train}(\lambda_C), \text{valid}=D^{(i)}_{\rm valid}(\lambda_C))$

where $\lambda_C$ encodes algorithm and hyperparameter configurations (Starmans et al., 2021).

Optimization is performed via random search, Bayesian optimization (SMAC-derived), or evolutionary strategies. Pipeline ensembling (e.g., Top-N, forward-selection) further stabilizes predictions (Starmans et al., 2021).
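The objective above can be illustrated with a minimal random-search sketch. It is not WORC's implementation: the search space, the module and method names, and the `toy_loss` placeholder (which stands in for fitting a pipeline on the training split and scoring it on the validation split) are all assumptions made for illustration. The sketch samples activator and selector hyperparameters, averages a loss over the folds, and keeps the best configurations as candidates for a Top-N ensemble.

```python
import random
from statistics import mean

# Hypothetical, heavily reduced search space; module and method names are illustrative.
SEARCH_SPACE = {
    "feature_selection": {"use": [True, False], "method": ["relief", "lasso", "pca"]},
    "resampling": {"use": [True, False], "method": ["smote", "adasyn"]},
    "classifier": {"method": ["svm", "rf", "lr"]},  # selector only, always active
}


def sample_configuration(rng):
    """Draw one configuration: switch each module on or off, then pick a method."""
    config = {}
    for module, space in SEARCH_SPACE.items():
        active = rng.choice(space.get("use", [True]))
        config[module] = rng.choice(space["method"]) if active else None
    return config


def random_search(loss_fn, folds, n_iter, rng):
    """Score n_iter random configurations by their mean validation loss over the folds."""
    scored = []
    for _ in range(n_iter):
        config = sample_configuration(rng)
        score = mean(loss_fn(config, train, valid) for train, valid in folds)
        scored.append((score, config))
    scored.sort(key=lambda item: item[0])  # lowest mean loss first
    return scored


def toy_loss(config, train, valid):
    """Placeholder for 'fit the pipeline on train, evaluate on valid'; no model is trained."""
    loss = 0.30
    loss -= 0.10 if config["classifier"] == "rf" else 0.0
    loss -= 0.05 if config["feature_selection"] else 0.0
    return loss


# Three toy train/validation index splits standing in for the k_train folds.
folds = [([0, 1, 2, 3], [4, 5]), ([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3])]
ranked = random_search(toy_loss, folds, n_iter=50, rng=random.Random(0))
top_n = [config for _, config in ranked[:5]]  # candidates for a Top-N ensemble
```

In a real workflow, `toy_loss` would be replaced by actual pipeline fitting and scoring, and the predictions of the `top_n` pipelines would be combined, for example by averaging.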

3. Integration of Radiomics-Specific Preprocessing and Feature Extraction

Radiomics-specific frameworks go beyond standard tabular AutoML by integrating domain-driven feature extraction and preprocessing steps such as:

PyRadiomics-backed extraction of first-order, shape, and high-dimensional texture features (GLCM, GLRLM, GLSZM, GLDM, NGTDM, wavelet, LoG, LBP, vesselness, monogenic phases) (Chang et al., 2020, Tzanis et al., 30 Apr 2025, Starmans et al., 2021).
Automated image resampling, normalization, and discretization tailored to ROI masks and modality-specific constraints (Tzanis et al., 30 Apr 2025).
Filtering and harmonization modules to address intensity variation, batch effects (ComBat), and inter-center heterogeneity, although most frameworks do not yet offer fully automated harmonization (Lozano-Montoya et al., 13 Jan 2026).
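To make the discretization and texture steps listed above concrete, the following sketch computes a fixed-bin-width discretization and one gray-level co-occurrence matrix (GLCM) feature on a tiny 2D image. It is a simplified illustration in plain Python, not the PyRadiomics API: the function names are hypothetical, no ROI mask is applied, only one pixel offset is used, and the binning rule is a simplified variant that may differ in detail from what a given toolkit implements.

```python
from collections import Counter


def discretize(values, bin_width):
    """Fixed-bin-width discretization: map intensities to integer gray levels starting at 1."""
    lowest = min(values)
    return [int((v - lowest) // bin_width) + 1 for v in values]


def glcm(image, offset=(0, 1)):
    """Symmetric, normalized co-occurrence probabilities for one pixel offset."""
    dr, dc = offset
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                i, j = image[r][c], image[rr][cc]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions (symmetric GLCM)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_contrast(p):
    """Contrast = sum over (i, j) of (i - j)^2 * p(i, j)."""
    return sum((i - j) ** 2 * prob for (i, j), prob in p.items())


levels = discretize([0, 10, 24, 25, 49], bin_width=25)  # [1, 1, 1, 2, 2]
image = [[1, 1, 2],
         [1, 2, 2]]
contrast = glcm_contrast(glcm(image))  # 0.5 for this image and offset
```

For the example image, the horizontal neighbor pairs are (1,1), (1,2), (1,2), and (2,2), so half of the symmetric co-occurrence mass lies off the diagonal with a squared gray-level difference of 1, giving a contrast of 0.5.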

Deep-discovery radiomics frameworks, such as the StochasticNet and Evolutionary Deep Radiomic Sequencer (EDRS), use random-graph CNNs or evolutionary architecture search to learn sparse, compact feature compositions, circumventing hand-picking of feature types (Shafiee et al., 2015, Shafiee et al., 2017). For instance, StochasticNet radiomic sequencers generate receptive field masks via Bernoulli random sampling and learn over an ensemble of subnetworks sampled from the Gilbert random graph model, eliminating manual feature engineering (Shafiee et al., 2015).
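The Bernoulli sampling idea can be sketched in a few lines. This is only a toy illustration of stochastically formed connectivity, not the StochasticNet implementation: the function names, the single 2D filter, and the `keep_prob` parameter are assumptions introduced here.

```python
import random


def bernoulli_mask(height, width, keep_prob, rng):
    """Sample a binary receptive-field mask; each entry is 1 with probability keep_prob."""
    return [[1 if rng.random() < keep_prob else 0 for _ in range(width)]
            for _ in range(height)]


def masked_response(patch, weights, mask):
    """Weighted sum over the patch, using only the connections kept by the mask."""
    return sum(p * w * m
               for prow, wrow, mrow in zip(patch, weights, mask)
               for p, w, m in zip(prow, wrow, mrow))


rng = random.Random(0)
mask = bernoulli_mask(5, 5, keep_prob=0.5, rng=rng)  # roughly half the connections survive

# With every connection kept, the response reduces to an ordinary filter response.
patch = [[1, 2], [3, 4]]
ones = [[1, 1], [1, 1]]
full = masked_response(patch, ones, bernoulli_mask(2, 2, keep_prob=1.0, rng=rng))  # 10
```

Sampling several such masks yields different sparse subnetworks over the same input, which is the sense in which an ensemble of randomly connected filters replaces a hand-picked feature family.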

4. AutoML Search Strategies and Model Selection

Workflow and hyperparameter optimization strategies include:

Exhaustive Grid Search: Enumeration of all parameter combinations (practical for low-dimensional spaces). Found in frameworks such as Simplatab and DARWIN (Lozano-Montoya et al., 13 Jan 2026, Chang et al., 2020).
Random Search: Sampling pipeline and hyperparameter configurations; default in WORC (Starmans et al., 2021). Effective at moderate compute.
Bayesian Optimization: SMAC or TPE-based surrogate modeling with expected-improvement acquisition. Enables efficient exploration of large spaces but can overfit the validation data when the computational budget is high (Starmans et al., 2021, Tzanis et al., 30 Apr 2025).
Evolutionary Search: Population-based optimization, as in EDRS (Shafiee et al., 2017), or evolutionary pipeline composition; WORC supports genetic selection for feature selection and pipeline evolution (Lozano-Montoya et al., 13 Jan 2026).
Agentic and Orchestration-based Search: Multi-agent coordination (as in mAIstro) where agents invoke underlying toolkits (PyCaret, PyRadiomics, nnU-Net) and pass outputs through an end-to-end orchestrator (Tzanis et al., 30 Apr 2025).
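As a small illustration of why exhaustive grid search is only practical for low-dimensional spaces, the sketch below enumerates a hypothetical grid with the standard library. The parameter names and values are invented for the example and do not correspond to any framework's actual options.

```python
from itertools import product

# Hypothetical grid; the parameter names and values are illustrative only.
grid = {
    "n_features": [10, 20, 50],
    "classifier": ["svm", "rf", "lr"],
    "scaling": ["z-score", "min-max"],
}


def enumerate_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


configs = enumerate_grid(grid)
print(len(configs))  # 3 * 3 * 2 = 18 configurations
```

The number of configurations is the product of the option counts, so adding one more hyperparameter with five values would raise the total from 18 to 90. Random and Bayesian search avoid this growth by evaluating only a sampled subset.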

Model selection and validation are typically performed via nested or repeated cross-validation (e.g., stratified 5-fold or 10-fold CV), with internal and external test protocols (Starmans et al., 2021, Tzanis et al., 30 Apr 2025). Ensemble construction (Top-N, bagging, forward selection) consistently improves out-of-sample metrics (Starmans et al., 2021).
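The splitting logic behind stratified and nested cross-validation can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the procedure of any specific framework: the function names are hypothetical, and in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, rng):
    """Yield (train_idx, test_idx) pairs that keep class proportions roughly equal per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal samples round-robin within each class
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


def nested_cv_splits(labels, outer_k, inner_k, rng):
    """For each outer split, also build inner splits drawn from the outer training set only."""
    for outer_train, outer_test in stratified_kfold(labels, outer_k, rng):
        inner_labels = [labels[i] for i in outer_train]
        inner = [([outer_train[t] for t in tr], [outer_train[v] for v in va])
                 for tr, va in stratified_kfold(inner_labels, inner_k, rng)]
        yield outer_train, outer_test, inner


labels = [0] * 10 + [1] * 5
outer = list(stratified_kfold(labels, k=5, rng=random.Random(0)))
nested = list(nested_cv_splits(labels, outer_k=5, inner_k=4, rng=random.Random(0)))
```

The inner splits would drive pipeline and hyperparameter selection, while each outer test fold is held back for performance estimation, so no sample used to choose a configuration is also used to evaluate it.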

5. Notable Frameworks and Comparative Evaluation

Several radiomics-specific AutoML frameworks have been proposed and evaluated:

WORC: Modular workflow optimizer employing random/Bayesian search over pipeline components, validated on twelve clinical applications with AUC and F1 competitive with or superior to baselines and clinical experts. Code and datasets are publicly available (Starmans et al., 2021).
Simplatab: No-code GUI system supporting stability selection, recursive feature elimination, and basic bias/vulnerability analysis. Achieved a statistically superior mean AUC (81.81%) compared with general-purpose AutoML in an independent benchmark on ten datasets (Lozano-Montoya et al., 13 Jan 2026).
mAIstro: Multi-agent system integrating image segmentation (nnU-Net, TotalSegmentator), radiomics (PyRadiomics), and AutoML (PyCaret), coordinated via a natural language interface. Validated across 16 datasets; unique in agentic, NL-driven orchestration (Tzanis et al., 30 Apr 2025).
DARWIN: Web-based GUI allowing custom pipeline assembly and AutoML search in both classic radiomics (PyRadiomics) and deep learning stacks. Uses grid, random, or Hyperband search and supports modular, graph-based definition of experimental workflows (Chang et al., 2020).
AutoRadiomics, Auto-ML for Radiomics, AutoPrognosis: Older frameworks, some now obsolete, that offer limited or no radiomics-specific modules or are infeasible to run on contemporary high-dimensional datasets (Lozano-Montoya et al., 13 Jan 2026).

A selection of quantitative results from Simplatab (Lozano-Montoya et al., 13 Jan 2026):

Dataset	AUC (%) ± SD	Runtime (5-fold CV)
Desmoid	95.0 ± 3.8	~1 h
Liver	96.4 ± 2.3	~1 h
Lipo	87.7 ± 5.5	~1 h
GIST	82.3 ± 4.4	~1 h
Prostate	73.3 ± 5.8	~1 h
Mean (ten-dataset benchmark)	81.81 ± 4.4	~1 h

6. Challenges, Limitations, and Future Directions

Radiomics-specific AutoML frameworks remain an active research area with several outstanding challenges (Shafiee et al., 2015, Lozano-Montoya et al., 13 Jan 2026, Starmans et al., 2021, Tzanis et al., 30 Apr 2025):

Survival Analysis: Only a minority of frameworks (e.g., AutoPrognosis) support time-to-event modeling, and such modules are often computationally prohibitive when combined with high-dimensional radiomics.
Feature Reproducibility and Harmonization: Automated assessment of test–retest reliability, IBSI compliance, and harmonization (e.g., ComBat, domain adaptation) remains largely absent, which limits generalizability to multisite/multicenter data (Lozano-Montoya et al., 13 Jan 2026).
End-to-End Integration: Most solutions do not couple image preprocessing, segmentation, harmonization, feature extraction, and modeling in a single pipeline. Modular, federated learning-compatible designs are needed for robust multicenter studies.
Interpretability: While some frameworks employ SHAP or LIME for feature importance, deeper clinical explainability and meta-parameter transparency are limited (Lozano-Montoya et al., 13 Jan 2026, Chang et al., 2020, Tzanis et al., 30 Apr 2025).
Scalability: Deep-discovery radiomics and agentic orchestration systems (e.g., mAIstro) incur substantial compute costs for large 3D imaging or comprehensive AutoML searches (Tzanis et al., 30 Apr 2025).

Proposed future directions include embedding survival-analysis algorithms with efficient search, harmonization/reproducibility filtering, GUI-exposed meta-parameters for advanced users, and native support for federated learning (Lozano-Montoya et al., 13 Jan 2026, Shafiee et al., 2015, Tzanis et al., 30 Apr 2025).

7. Clinical Validation and Deployment Implications

Clinical validation across a spectrum of anatomical sites, imaging modalities, and endpoints has demonstrated the potential for radiomics-specific AutoML to outperform human experts and classic radiomics pipelines in several tasks (e.g., EDRS reporting 93.42% sensitivity and 88.78% accuracy for lung cancer detection, exceeding prior methods) (Shafiee et al., 2017, Starmans et al., 2021).

Frameworks such as EDRS are specifically designed for privacy-preserving, on-site deployment: all computation runs at the institution level with compact, efficient models, which eliminates the need to transfer PHI to third-party servers (Shafiee et al., 2017). Modular open-source frameworks (WORC, mAIstro, DARWIN) promote reproducibility via audit trails, configuration logging, and public datasets/code repositories (Starmans et al., 2021, Chang et al., 2020, Tzanis et al., 30 Apr 2025).

A plausible implication is that as reproducible workflow optimization, harmonization, and explainability modules mature within these frameworks, radiomics-specific AutoML will become the standard for robust, scalable, and interpretable quantitative imaging biomarker research in multi-center translational and clinical contexts.