Dense Random Survival Forests
- The paper introduces a dense ensemble framework that grows thousands of survival trees across diverse hyperparameter settings to capture treatment–covariate interactions.
- It employs a novel splitting rule optimized for detecting heterogeneity in treatment effects within censored time-to-event data, ensuring high sensitivity and controlled Type I error.
- Spectral clustering on the aggregated proximity matrix facilitates unsupervised discovery of clinically meaningful patient subgroups in both simulated and clinical trial datasets.
Dense Random Survival Forests (DRSF) are a methodological framework for unsupervised identification of patient subgroups with heterogeneous treatment response in time-to-event data, characterized by an exceptionally large ensemble of survival trees (up to 100,000), each grown under diverse hyperparameter settings. DRSF is distinguished by its targeted splitting criterion that directly optimizes for treatment–covariate interaction in the presence of censored data, and by the explicit fusion of proximity information across hyperparameter sweeps, culminating in a robust similarity-driven clustering of patients without requiring pre-defined subgroup labels. The technique achieves high sensitivity and stringent Type I error control for heterogeneity detection, with demonstrated interpretability and calibration on both simulated and clinical trial datasets (Li et al., 4 Jan 2026).
1. Algorithmic Structure and Dense Ensemble Construction
DRSF departs from conventional random survival forests by constructing a dense ensemble: rather than training a single forest, it trains forests across a wide grid of hyperparameter configurations including mtry (number of covariates sampled at each split), node size, tree depth, split-search budget (nsplit), and per-variable weighting (xvar.wt). In simulation experiments, each configuration yields 1,500 trees; in clinical datasets, 500 trees per configuration are typical. Combining 60–200 such settings produces a total ensemble containing approximately 50,000–100,000 trees.
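The sweep itself is just a Cartesian product over configurations. A minimal sketch (the parameter values below are illustrative placeholders, not the paper's exact grids):

```python
from itertools import product

# Illustrative hyperparameter grid in the spirit of the DRSF sweep;
# keys mirror the parameters named above (mtry, node size, depth, nsplit).
grid = {
    "mtry": [2, 4, 6],
    "nodesize": [15, 30, 60],
    "max_depth": [4, 6, 8],
    "nsplit": [10, 30],
}
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]

trees_per_config = 1500  # simulation setting; 500 is typical for clinical data
total_trees = len(configs) * trees_per_config
```

With the 54 configurations above and 1,500 trees each, the ensemble already reaches 81,000 trees, consistent with the 50,000–100,000 range quoted in the paper.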
All trees are grown using bootstrap samples in the Breiman paradigm but employ a novel splitting rule focused on uncovering treatment–covariate interactions. Once all $E$ trees are constructed, a proximity matrix is obtained by aggregating the co-occurrence of patients $i$ and $j$ in terminal nodes across all trees:

$$P_{ij} = \frac{1}{E} \sum_{e=1}^{E} \mathbf{1}\{\text{patients } i \text{ and } j \text{ share a terminal node in tree } e\}$$

Fusing these proximities across all hyperparameter configurations produces an $n \times n$ similarity matrix used for downstream clustering.
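The proximity aggregation can be sketched directly from the ensemble's terminal-node assignments (function name and array layout are illustrative):

```python
import numpy as np

def proximity_matrix(leaf_ids):
    """Co-occurrence proximity from an ensemble's terminal-node assignments.

    leaf_ids: integer array of shape (E, n); leaf_ids[e, i] is the terminal
    node that patient i falls into in tree e.  Returns the (n, n) matrix
    whose (i, j) entry is the fraction of trees in which i and j share
    a terminal node.
    """
    E, n = leaf_ids.shape
    P = np.zeros((n, n))
    for e in range(E):
        # boolean co-occurrence indicator for tree e, accumulated as 0/1
        P += (leaf_ids[e][:, None] == leaf_ids[e][None, :])
    return P / E
```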
2. Unsupervised Subgroup Discovery via Spectral Clustering
Although each individual tree models survival outcomes with respect to treatment and covariate interactions, no patient subgroup labels are provided to the learner. The methodology is unsupervised in that the subgroups are discovered post hoc from the geometry of the proximity matrix.
Spectral clustering is employed to partition the patients into $k$ subgroups, where $k$ ranges from 2 to 7. The proximity matrix is treated as a weighted adjacency matrix for spectral graph partitioning. This approach allows for identifiably distinct, interpretable patient clusters associated with distinct treatment effects, without reliance on arbitrary or a priori subgroup definitions.
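A minimal numpy-only sketch of this step: symmetric normalized Laplacian, the $k$ smallest eigenvectors, and a tiny Lloyd k-means with deterministic farthest-point seeding (a stand-in for a full spectral clustering implementation):

```python
import numpy as np

def spectral_partition(P, k):
    """Cluster patients from proximity matrix P (treated as a weighted
    adjacency matrix) into k groups.  Illustrative sketch only."""
    d = P.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # symmetric normalized Laplacian L = I - D^{-1/2} P D^{-1/2}
    L = np.eye(len(P)) - d_is[:, None] * P * d_is[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues ascending
    U = vecs[:, :k]                          # spectral embedding
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    centers = [U[0]]                         # farthest-point seeding
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(100):                     # Lloyd iterations
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

On a proximity matrix with clear block structure, the embedding separates the blocks and the k-means step recovers them as clusters.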
3. Treatment-Heterogeneity–Targeted Splitting Rule
Unlike traditional random survival forests, which use log-rank or prognostic-score statistics for splitting, DRSF introduces a criterion that directly targets treatment-by-covariate interaction heterogeneity. At each node, given observations $(T_i, \delta_i, Z_i, X_i)$, where $Z_i \in \{0, 1\}$ encodes treatment, a candidate one-dimensional split defines child assignments $S_i \in \{0, 1\}$. The following Cox model is fit:

$$\lambda(t \mid Z_i, S_i) = \lambda_0(t)\,\exp(\beta_1 Z_i + \beta_2 S_i + \beta_3 Z_i S_i)$$
Split scoring is based on a weighted sum:

$$\mathrm{score} = (1 - \alpha)\, C + \alpha\, \frac{|Z_{\beta_3}|}{c}$$

where $C$ is the C-index of the model, $Z_{\beta_3}$ is the Z-score for testing $\beta_3 = 0$, $\alpha$ weights the trade-off between prognostic purity and treatment interaction, and $c$ scales the Z-score. Varying $\alpha$ interpolates between standard split rules ($\alpha = 0$) and pure interaction focus ($\alpha = 1$). Standard survival forests cannot discover subgroups based on response heterogeneity because they lack any explicit term for $\beta_3$.
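Assuming the model above is a standard Cox proportional-hazards fit, the split score can be sketched as follows. `cox_fit` is a bare-bones Newton-Raphson on the Breslow partial likelihood (no tie correction), `split_score` combines the C-index with the interaction Z-score, and the default $c = 2$ is an arbitrary illustrative scaling, not the paper's value:

```python
import numpy as np

def cox_fit(T, delta, X, n_iter=25):
    """Newton-Raphson on the Cox partial likelihood (Breslow, no ties).
    Returns coefficient estimates and their standard errors."""
    order = np.argsort(T)                    # process risk sets in time order
    T, delta, X = T[order], delta[order], X[order]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())          # stabilized risk weights
        # reversed cumsums give sums over the risk set {j : T_j >= T_i}
        S0 = np.cumsum(w[::-1])[::-1]
        S1 = np.cumsum((w[:, None] * X)[::-1], 0)[::-1]
        S2 = np.cumsum((w[:, None, None] * X[:, :, None] * X[:, None, :])[::-1], 0)[::-1]
        xbar = S1 / S0[:, None]
        grad = ((X - xbar) * delta[:, None]).sum(0)
        info = np.einsum('i,ijk->jk', delta,
                         S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])
        beta = beta + np.linalg.solve(info, grad)   # info = -Hessian
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def split_score(T, delta, Z, S, alpha=0.5, c=2.0):
    """Score candidate split S against treatment Z via the interaction model
    lambda_0(t) exp(b1*Z + b2*S + b3*Z*S); returns (1-alpha)*C + alpha*|z3|/c."""
    X = np.column_stack([Z, S, Z * S])
    beta, se = cox_fit(T, delta, X)
    z3 = beta[2] / se[2]                     # interaction Z-score
    risk = X @ beta
    conc = tot = 0                           # Harrell's C-index over event pairs
    for i in range(len(T)):
        if delta[i]:
            for j in range(len(T)):
                if T[j] > T[i]:
                    tot += 1
                    conc += (risk[i] > risk[j]) + 0.5 * (risk[i] == risk[j])
    C = conc / max(tot, 1)
    return (1 - alpha) * C + alpha * abs(z3) / c
```

With `alpha=1.0` the score reduces to the pure interaction focus; a split aligned with a true treatment–covariate interaction yields a large $|Z_{\beta_3}|$ and hence a high score.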
4. Computational and Optimization Strategies
Efficiency and scalability in DRSF are achieved through several mechanisms:
- At each split, only mtry covariates are randomly sampled for consideration.
- For each covariate, at most nsplit uniformly random cut-points are evaluated, drastically reducing the search space relative to considering all possible splits.
- Boundaries on tree depth and leaf size (minimum samples per terminal node) are imposed.
- Each parameter configuration is trained independently, supporting massive parallelism both across and within configurations.
- The cost of growing a single tree is $O(\text{mtry} \cdot \text{nsplit} \cdot n \cdot d_{\max})$, with $d_{\max}$ as the maximum depth; the cost for the entire ensemble of $E$ trees is $O(E \cdot \text{mtry} \cdot \text{nsplit} \cdot n \cdot d_{\max})$.
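The complexity bounds above translate into simple back-of-envelope arithmetic (purely illustrative helper functions):

```python
def tree_cost(mtry, nsplit, n, d_max):
    """Split evaluations for one tree: at each of d_max levels, every one of
    the n samples is touched while scoring mtry covariates x nsplit cut-points."""
    return mtry * nsplit * n * d_max

def ensemble_cost(E, mtry, nsplit, n, d_max):
    """Total split evaluations across an ensemble of E trees."""
    return E * tree_cost(mtry, nsplit, n, d_max)
```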
High-performance C++ implementations make ensembles at this scale numerically and computationally feasible.
5. Evaluation Metrics and Statistical Calibration
Detection of meaningful treatment heterogeneity is assessed by fitting two Cox models post-clustering:
- Baseline: $\lambda(t \mid Z) = \lambda_0(t)\,\exp(\beta Z)$
- Subgroup-specific: $\lambda(t \mid Z, G = g) = \lambda_0(t)\,\exp(\gamma_g + \beta_g Z)$

The log-likelihood ratio statistic

$$\Lambda = 2\left(\ell_{\text{subgroup}} - \ell_{\text{baseline}}\right)$$

tests the null hypothesis of no subgroup-by-treatment interaction ($\beta_1 = \cdots = \beta_k$), with $\Lambda$ providing an empirical measure of heterogeneity.
To control Type I error at 1%, the significance threshold is empirically calibrated as the upper 1st percentile (i.e., the 99th percentile) of $\Lambda$ values across 1,000 homogeneous (null) simulated datasets, ensuring $P(\Lambda > \Lambda_{\text{crit}}) \leq 0.01$ when no heterogeneity exists.
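The calibration step amounts to taking an empirical upper quantile of the null distribution. A sketch, where `null_lambdas` would come from rerunning the full pipeline on the 1,000 simulated homogeneous datasets (function names are illustrative):

```python
import numpy as np

def calibrate_threshold(null_lambdas, level=0.01):
    """Critical value: the upper `level` quantile of the LRT statistic
    observed on null (no-heterogeneity) simulations."""
    return np.quantile(null_lambdas, 1.0 - level)

def heterogeneity_detected(lam_obs, null_lambdas, level=0.01):
    """Declare heterogeneity iff the observed statistic clears the
    empirically calibrated threshold."""
    return lam_obs > calibrate_threshold(null_lambdas, level)
```

By construction, at most ~1% of null datasets exceed the threshold, giving the stated Type I error control.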
Power to detect heterogeneity exceeded 90% in heterogeneous scenarios, with near-exact Type I error under global null scenarios.
6. Application to Randomized Clinical Trial Data
DRSF was validated on Phase III randomized trials of Panitumumab (studies 263 and 309), analyzing endpoints including progression-free survival (PFS) and overall survival (OS). Spectral clustering of the DRSF-derived similarity matrix revealed two or three clinically interpretable subgroups, distinguished chiefly by KRAS mutation status and ECOG performance status.
For example, in study 263 (PFS endpoint), the subgroups were:
- KRAS WT (n=455): HR = 0.80
- KRAS mutant (n=599): HR ≈ 1.0
Further subdivision by ECOG status in OS analysis refined identification of patients with maximal benefit. These results replicated known oncology findings (KRAS wild-type response to EGFR inhibitors; improved outcomes with ECOG=0) and satisfied the calibrated error control.
7. Hyperparameter Sweeping and Stability via Fusion
Key forest hyperparameters included mtry (with separate grids for the case studies and the simulations), node size, maximum depth, nsplit, the treatment-split weight $\alpha$, variable weighting schemes (xvar.wt), and 500 or $1,500$ trees per configuration. Instead of selecting a single "best" configuration, all are included and fused at the proximity matrix level, stabilizing the similarity measure and rendering subgroup identification robust to hyperparameter mis-specification. Deeper trees or larger ensembles enhance heterogeneity detection at the cost of computation, but the fusion approach mitigates oversensitivity to any one setting (Li et al., 4 Jan 2026).
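The fusion step itself reduces to averaging the per-configuration proximity matrices into one similarity matrix (an unweighted mean in this sketch; any configuration weighting would be an extension):

```python
import numpy as np

def fuse_proximities(prox_list):
    """Fuse the proximity matrices from every hyperparameter configuration
    into a single n x n similarity matrix by elementwise averaging."""
    return np.mean(np.stack(prox_list), axis=0)
```

Averaging across many configurations damps the idiosyncrasies of any single hyperparameter setting, which is the stability argument made above.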