Random Survival Forests for Survival Analysis

Updated 16 February 2026
  • Random Survival Forests (RSF) are nonparametric ensemble methods that extend classical random forests to analyze right-censored survival data.
  • RSF builds multiple survival trees using bootstrap samples and log-rank splitting to estimate cumulative hazards and survival probabilities.
  • Its flexible design handles high-dimensional, diverse covariate types and missing data, making RSF applicable in oncology, maintenance, and genomics.

Random Survival Forests (RSF) are nonparametric ensemble learning methods developed for right-censored time-to-event data, extending the classical random forest framework to survival analysis. RSF robustly models nonlinear effects and interactions, accommodates various covariate types, and does not impose the proportional hazards constraint. It supports high-dimensional input spaces and missing data, and is adaptable to diverse domains, including oncology, maintenance, critical care, genomics, and longitudinal studies.

1. Theoretical Foundations and Algorithmic Structure

RSF is constructed by growing an ensemble of survival trees, each built on a bootstrap sample from the original dataset. For each subject $i$, the data consist of the observed time $T_i^* = \min(T_i, C_i)$, the event indicator $\delta_i = \mathbf{1}_{\{T_i \leq C_i\}}$, and covariates $X_i \in \mathbb{R}^p$.

At each tree node, a randomly selected subset of $mtry$ predictors is used to form candidate splits. The split is selected to maximize a node-specific survival difference, quantified via survival-specific tests. The canonical splitting criterion is the two-sample log-rank statistic, which compares survival between the left and right daughter nodes.
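As a concrete illustration, the standardized two-sample log-rank statistic used to score one candidate split can be sketched in NumPy. This is a minimal reference implementation, not code from any RSF library; names are illustrative.

```python
import numpy as np

def logrank_statistic(times, events, group):
    """Standardized two-sample log-rank statistic for a candidate split.

    times:  observed times T_i* (event or censoring)
    events: 1 if event observed, 0 if censored
    group:  0/1 membership in the right daughter node
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    group = np.asarray(group, dtype=int)
    num, var = 0.0, 0.0
    for t in np.unique(times[events == 1]):      # distinct event times
        at_risk = times >= t
        r = at_risk.sum()                        # total at risk just before t
        r1 = (at_risk & (group == 1)).sum()      # at risk in group 1
        d = ((times == t) & (events == 1)).sum() # total events at t
        d1 = ((times == t) & (events == 1) & (group == 1)).sum()
        num += d1 - d * r1 / r                   # observed minus expected
        if r > 1:
            var += d * (r1 / r) * (1 - r1 / r) * (r - d) / (r - 1)
    return num / np.sqrt(var) if var > 0 else 0.0
```

Splitting then simply keeps the candidate variable and cut-point with the largest absolute statistic.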

Each terminal node contains a set of subjects. The within-node cumulative hazard function (CHF) $\hat{H}_s(t)$ is estimated via the Nelson–Aalen estimator: $\hat{H}_s(t) = \sum_{t_{ls} \le t} \frac{d_{ls}}{r_{ls}}$, where $d_{ls}$ is the number of events at time $t_{ls}$ and $r_{ls}$ is the number at risk. Censored observations contribute to the risk set up to their censoring time but not to $d_{ls}$.
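The within-node Nelson–Aalen estimate can be computed directly from the node's (time, event) pairs. A minimal sketch (function name illustrative):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct event time.

    times:  observed times T_i* within the node
    events: 1 if event, 0 if censored
    Returns (event_times, chf) with chf[k] = sum_{t_ls <= t_k} d_ls / r_ls.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times, chf, H = [], [], 0.0
    for t in np.unique(times[events == 1]):
        d = np.sum((times == t) & (events == 1))  # events at t
        r = np.sum(times >= t)                    # at risk just before t
        H += d / r
        event_times.append(t)
        chf.append(H)
    return np.array(event_times), np.array(chf)
```

Note how a subject censored at time 2 still appears in the risk set for the event at time 2 but adds nothing to the event count.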

Ensemble prediction for a new subject $x$ is then the averaged terminal-node CHF across all $B$ trees traversed by $x$: $\hat{H}(t \mid x) = \frac{1}{B} \sum_{b=1}^B \hat{H}_b(t \mid x)$, and the survival function is computed as $\hat{S}(t \mid x) = \exp[-\hat{H}(t \mid x)]$ (Nair et al., 29 Sep 2025).
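Given per-tree CHF estimates evaluated on a common time grid, the ensemble CHF and survival curve follow by simple averaging. The arrays below are made-up values for illustration only:

```python
import numpy as np

# Hypothetical CHFs for one subject from B = 3 trees, on a shared time grid.
tree_chfs = np.array([
    [0.10, 0.30, 0.60],
    [0.20, 0.40, 0.70],
    [0.00, 0.20, 0.50],
])

H_ens = tree_chfs.mean(axis=0)   # ensemble cumulative hazard H(t | x)
S_ens = np.exp(-H_ens)           # survival curve S(t | x) = exp(-H(t | x))
```

Because each per-tree CHF is nondecreasing in time, the ensemble survival curve is automatically nonincreasing.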

2. Statistical Properties, Non-Linearity, and Model Complexity

RSF leverages the flexibility of tree-based modeling to automatically capture nonadditive, non-linear, and interaction effects without explicit functional form specifications. The model is not restricted by the proportional hazards assumption that constrains Cox models, enabling it to capture time-dependent effects and non-proportional hazards structures observed in real-world data (e.g., oncology or critical care) (Nair et al., 29 Sep 2025, Korepanova et al., 2019).

Split selection with the log-rank statistic maximizes discrimination between child nodes with respect to event occurrence. However, log-rank splitting is optimal only under proportional hazards; alternative split statistics (e.g., transformation forests using multivariate likelihood scores) extend RSF power to non-proportional hazards regimes (Korepanova et al., 2019).

The algorithm is robust to mixed covariate types (continuous, categorical), missing data, and irregular measurement times with the appropriate preprocessing or extension (e.g., functional data RSF for longitudinal processes) (Romano et al., 2024, Devaux et al., 2022).

3. Split Rules and Recent Extensions

The standard RSF utilizes the log-rank test; however, several extensions have emerged:

  • C-index-based Splitting: Harrell's C-index as the split statistic reduces end-cut preference and is advantageous with moderate $n$, high censoring, and informative continuous predictors (Schmid et al., 2015).
  • Maximally Selected Rank Statistics: Adjust maximum log-rank scores for multiple testing and category count, yielding unbiased variable selection, especially when variables with many split points coexist (Wright et al., 2016).
  • Gradient-based Brier Score Splitting: Targets calibration by minimizing the Brier loss; more robust to informative censoring (Graf et al., 5 Feb 2025).
  • ExtraTrees/Randomized Splits: Randomized candidate variable and cut-point selection, increasing computational speed at potential cost to stability (Graf et al., 5 Feb 2025).
  • Oblique Splitting: Uses linear combinations of variables for splitting, increasing expressive capacity, especially in high-dimensional interactions (Jaeger et al., 2022).
  • Functional and Longitudinal RSF: Adapt FPCA and node-local mixed models, allowing splitting on features representing longitudinal trajectories or endogenous repeated measures (Romano et al., 2024, Devaux et al., 2022).
  • Weighted RSF: Learns optimal tree weights via quadratic programming to maximize Harrell's C, improving predictive performance over equal-weight averaging (Utkin et al., 2019).
  • Kernel-Induced RSF: Utilizes kernel-induced features to project data into a high-dimensional space, thereby capturing nonlinear effects missed by standard axis-aligned splits (Yang et al., 2010).

4. Model Fitting, Hyperparameter Tuning, and Performance Evaluation

Key hyperparameters include the number of trees ($B$, ntree), the number of candidate variables per split ($mtry$), the minimum terminal node size (nodesize), and the choice of split rule. Optimal values are data- and task-dependent. Empirical results indicate that ntree and $mtry$ have the highest influence on discrimination (C-index), while nodesize most strongly influences calibration (Brier score) (Yardımcı et al., 20 Apr 2025).

Grid-search or cross-validated optimization using out-of-bag (OOB) C-index and/or integrated Brier score is a recommended approach for hyperparameter selection (Nair et al., 29 Sep 2025, Yardımcı et al., 20 Apr 2025). In empirical benchmarking, RSF can reach discrimination nearly matching or exceeding Cox models in settings with nonlinear and/or interaction effects, but may underperform in calibration under severe non-proportional hazards or high censoring (Graf et al., 5 Feb 2025, Nair et al., 29 Sep 2025).
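The tuning loop itself is generic; a sketch of exhaustive grid search over the hyperparameters above, where `score_fn` stands in for a function that fits an RSF and returns its OOB C-index. The scorer shown here is a dummy placeholder so the example is self-contained, not a real model fit:

```python
import itertools

def grid_search(score_fn, grid):
    """Return the hyperparameter combination maximizing score_fn."""
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Dummy stand-in for "fit RSF, return OOB C-index" -- illustration only.
def dummy_oob_cindex(ntree, mtry, nodesize):
    return 0.70 + ntree / 10000 - abs(mtry - 4) / 100 - nodesize / 1000

grid = {"ntree": [250, 500, 1000], "mtry": [2, 4, 8], "nodesize": [5, 15]}
best, score = grid_search(dummy_oob_cindex, grid)
```

In practice `score_fn` would wrap a library fit (e.g. scikit-survival's RandomSurvivalForest or randomForestSRC) and evaluate the OOB C-index or integrated Brier score.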

Performance is typically evaluated using:

  • Concordance index (C-index): Probability that the predicted risk ordering agrees with the observed event-time ordering.
  • Brier score: Time-dependent squared error of predicted survival probabilities (calibration).
  • Integrated Brier Score (IBS): Overall calibration aggregated across time.
  • OOB error: Internal cross-validation mechanism using out-of-bag (bootstrap out-of-sample) cases.
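For reference, Harrell's C-index over comparable pairs can be written in a few lines (names illustrative; libraries such as scikit-survival provide tested implementations that also handle tied event times):

```python
def harrell_cindex(times, events, risk):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    times:  observed times; events: 1 = event, 0 = censored
    risk:   predicted risk scores (higher risk should mean earlier event)
    Pairs with tied observed times are skipped for brevity.
    """
    concordant, tied, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # comparable only if i's event time is observed
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable
```

A value of 1.0 means every comparable pair is ranked correctly; 0.5 corresponds to random ordering.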

Simultaneous and pointwise prediction intervals for cumulative hazard or survival curves can be constructed via U-statistics theory, delivering honest uncertainty bands for RSF predictions (Formentini et al., 2022).

5. Variable Importance and Interpretation

Variable importance in RSF is most commonly quantified by permutation VIMP: the increase in prediction error (OOB C-index or Brier score) when a variable's values are permuted across samples. Minimal depth analysis provides an alternative, measuring the depth at which a variable first splits within each tree (Nair et al., 29 Sep 2025, Ehrlinger, 2016). Oblique RSF and kernel-based RSF require adapted measures, such as negation importance for oblique models (Jaeger et al., 2022).
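Permutation VIMP is model-agnostic and can be sketched generically. Here `error_fn` stands in for the forest's prediction error (in practice, e.g. one minus the OOB C-index of the fitted RSF); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_vimp(X, y_time, y_event, error_fn, n_repeats=10):
    """VIMP for each column of X: mean increase in error after permuting it.

    error_fn(X, y_time, y_event) -> scalar prediction error of the model.
    """
    base = error_fn(X, y_time, y_event)
    vimp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break column j's link to y
            deltas.append(error_fn(Xp, y_time, y_event) - base)
        vimp[j] = np.mean(deltas)
    return vimp
```

A variable whose permutation barely changes the error receives VIMP near zero, while an informative variable receives a large positive value.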

In functional RSF or with node-local model–based transformations, VIMP is computed over derived features (e.g., FPCA components or mixed model BLUPs) (Romano et al., 2024, Devaux et al., 2022).

6. Robustness, Bias, and Data-Adaptive Extensions

RSF can suffer from selection bias toward continuous variables due to a larger number of candidate splits; maximally selected rank corrections resolve this in mixed-type or high-cardinality settings (Wright et al., 2016). Highly imbalanced censor/event distributions bias leaf hazard estimates; synthetic minority oversampling (BRSF) corrects the hazard estimation bias and reduces prediction error under severe class imbalance (Afrin et al., 2018). Weighted and kernel-random forests further leverage risk-based weighting and feature space transformation to improve performance in complex settings (Utkin et al., 2019, Yang et al., 2010).

Computational efficiency has improved with constant-time log-rank split update algorithms, reducing training time especially for large-scale, high-resolution survival data (Sverdrup et al., 4 Oct 2025). For confidence estimation, U-statistics–based variance estimation methods provide practically computable, honest simultaneous bands for predicted hazards or survival (Formentini et al., 2022).

7. Applications and Clinical/Industrial Translation

RSF is widely adopted for survival prediction in biomedical, engineering, and reliability domains:

  • Clinical Oncology: RSF models integrating baseline and post-treatment (e.g., progression-free interval, residual tumor status) predictors provide discrimination competitive with Cox PH (C-index up to 0.86), enabling dynamic, patient-specific prognostics and risk stratification (Nair et al., 29 Sep 2025).
  • Predictive Maintenance: In run-to-failure scenarios, careful tuning of RSF enables improved C-index and Brier performance over default settings, supporting maintenance scheduling and lifetime forecasting (Yardımcı et al., 20 Apr 2025).
  • Critical Care and Epidemiology: Extensions for functional and longitudinal predictors enable modeling of time-varying, endogenous exposures (e.g., SOFA scores, dementia predictors) (Romano et al., 2024, Devaux et al., 2022).
  • High-Dimensional Omics: Kernel, weighted, and unbiased RSF variants handle high-$p$, low-$n$ settings and reduce variable selection bias (Yang et al., 2010, Wright et al., 2016).
  • Decision Support: RSF–derived survival curves enable risk grouping, clinical counseling, and are embedded in clinical web apps for real-time prognosis (Nair et al., 29 Sep 2025).

Recent research emphasizes that RSF discrimination should not be the sole metric of model suitability, and calibration, variable importance, and biological interpretability should also be critically evaluated (Graf et al., 5 Feb 2025).


RSF constitutes a foundational methodology for modern survival analysis, offering an arsenal of methodological extensions for varying data modalities, distributional regimes, and application requirements. Its continued evolution addresses known biases, computational limitations, and domain-specific challenges, supporting its deployment across biomedical and engineering fields.
