Unity Forests (UFOs)

Updated 18 January 2026

Unity Forests (UFOs) are an enhanced random forest variant that detects interaction-driven effects using a two-stage tree-building process.
They utilize joint root optimization and Unity VIM to capture both marginal and purely interaction-based signals, improving variable selection and model interpretability.
Empirical studies indicate that UFOs achieve significant gains in AUC and accuracy on benchmark datasets compared to conventional random forests.

Unity Forests (UFOs) are a variant of random forests designed to enhance the modeling of interactions, especially those involving covariates with purely interaction-driven effects, and to facilitate interpretability through specialized variable importance and visualization tools. The motivation, methodology, and empirical validation of UFOs address limitations of conventional random forests in detecting and explaining complex dependencies among covariates (Hornung et al., 11 Jan 2026).

1. Background and Motivation

Random forests (RFs) are ensembles of decision trees, each constructed on a bootstrap sample or random subsample of the data with random covariate subsets considered at each split; the trees’ predictions are averaged for regression or combined by majority vote for classification. The conventional approach to tree construction uses greedy recursive splitting: at each node, the covariate and cutpoint leading to the maximal reduction in an impurity measure (e.g., Gini for classification, variance for regression) are selected.

A critical limitation of standard RFs arises when covariates interact without exhibiting main (marginal) effects. For instance, if the effect of $X_2$ on $Y$ is conditional on $X_1$ but neither covariate individually reduces impurity, greedy splitting has no reason to prefer $X_1$ or $X_2$ over noise covariates near the tree root, so the interaction is rarely modeled. This mechanism limits the ensemble’s ability to detect purely interaction-based signals, resulting in suboptimal prediction and variable selection.
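The limitation can be illustrated with a small numerical sketch. The XOR-style data, the helper names `gini` and `split_gain`, and the sample size are illustrative assumptions, not taken from the source: each covariate alone yields almost no impurity reduction, while splitting on both jointly removes nearly all impurity.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def split_gain(y, mask):
    """Impurity reduction from splitting y into y[mask] and y[~mask]."""
    n = len(y)
    left, right = y[mask], y[~mask]
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

rng = np.random.default_rng(0)
n = 4000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
y = x1 ^ x2  # pure interaction: neither x1 nor x2 has a marginal effect

# Greedy view: the gain of a single split on either covariate
gain_x1 = split_gain(y, x1 == 1)
gain_x2 = split_gain(y, x2 == 1)

# Joint view: a depth-2 root that splits on x1 and then on x2 (four leaves)
leaf = 2 * x1 + x2
gain_joint = gini(y) - sum((leaf == k).mean() * gini(y[leaf == k]) for k in range(4))

print(f"x1 alone: {gain_x1:.4f}, x2 alone: {gain_x2:.4f}, joint: {gain_joint:.4f}")
```

A greedy splitter sees only the first two numbers, which are close to zero, so it has little reason to pick either covariate at the root.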

2. Unity Forests Construction

Unity Forests address these deficiencies via a two-stage tree-building approach. Each tree $T_b$ is constructed as follows:

Stage A: Joint Tree Root Optimization

Subsample without replacement a fraction $\text{fract}_n$ (default 0.7) of the $n$ observations.
Randomly select a subset of covariates of cardinality determined by $\text{prop}_\text{var}$ (default $\sqrt{p}/p$ if $p \leq 100$, else 0.1).
Generate $n_\text{cand\_trees}$ (default 500) candidate "tree roots" of maximal depth $\text{max\_depth\_root}$ (default 3), with randomly selected split covariates and cutpoints within the subset.
For each root $R_k$, compute the criterion

$C(R_k) = I(D) - \sum_{\ell \in \text{leaves}(R_k)} \frac{|D_\ell|}{|D|} I(D_\ell)$

where $D$ is the subsample and $D_\ell$ indexes leaf data.

Select the root $R^*$ maximizing $C(R_k)$.
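The Stage A steps can be sketched as follows for a binary outcome with Gini impurity. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `prop_var` override argument, the rounding of the subset size, and the rule for drawing a random cutpoint (midpoint between two adjacent observed values) are all hypothetical choices made for the example.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def default_prop_var(p):
    """Default covariate proportion as stated in the text."""
    return np.sqrt(p) / p if p <= 100 else 0.1

def random_root(X, idx, cols, depth, rng):
    """Randomly partition the rows in idx; return a list of leaf index arrays."""
    if depth == 0 or len(idx) < 2:
        return [idx]
    j = rng.choice(cols)                      # random split covariate
    values = np.unique(X[idx, j])
    if len(values) < 2:                       # nothing left to split on
        return [idx]
    k = rng.integers(0, len(values) - 1)      # assumed cutpoint rule
    cut = 0.5 * (values[k] + values[k + 1])
    left, right = idx[X[idx, j] <= cut], idx[X[idx, j] > cut]
    return (random_root(X, left, cols, depth - 1, rng)
            + random_root(X, right, cols, depth - 1, rng))

def criterion(y, idx, leaves):
    """C(R_k) = I(D) - sum_l |D_l|/|D| * I(D_l)."""
    n = len(idx)
    return gini(y[idx]) - sum(len(l) / n * gini(y[l]) for l in leaves)

def select_root(X, y, n_cand_trees=500, max_depth_root=3, fract_n=0.7,
                prop_var=None, rng=None):
    """Draw candidate roots on one subsample and keep the best one."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    idx = rng.choice(n, size=int(round(fract_n * n)), replace=False)
    prop = default_prop_var(p) if prop_var is None else prop_var
    cols = rng.choice(p, size=max(1, int(round(prop * p))), replace=False)
    best_score, best_leaves = -np.inf, None
    for _ in range(n_cand_trees):
        leaves = random_root(X, idx, cols, max_depth_root, rng)
        score = criterion(y, idx, leaves)
        if score > best_score:
            best_score, best_leaves = score, leaves
    return best_score, best_leaves, cols
```

Because each candidate is scored as a whole, a root that splits on $X_1$ and then on $X_2$ can win even when neither split would be chosen greedily on its own.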

Stage B: Conventional Expansion

Expand the selected root $R^*$ into a full decision tree by standard CART-style splitting over all $p$ covariates, using random subsets of size $\text{mtry}$ (default $\lfloor \sqrt{p} \rfloor$) at each node.

The root’s joint, stochastic construction enables the ensemble to select combinations of splits that facilitate the detection of interaction terms, regardless of their marginal effects. Diversity is maintained by randomization in both covariate selection and root candidate generation.

3. Unity Variable Importance Measure (Unity VIM)

The Unity Variable Importance Measure (Unity VIM) is designed to quantify the discriminative power of covariates, emphasizing their strongest effects as captured in tree roots. Its computation proceeds as follows:

In-bag Split Scoring: For each internal node $l$ in all roots, compute $\mathrm{SC}_l = N_l \Delta I_l$, where $N_l$ is node size and $\Delta I_l$ is impurity reduction.
Selection of Top Splits per Covariate: For each covariate $j$, select the top $\text{prop\_best\_splits}$ (default 1%) of $j$’s splits, ensuring a minimum of five. Designate the indices as $\mathcal{B}_j$.
OOB Evaluation: For each $j$,

$\mathrm{UnityVIM}_j = \sum_{l \in \mathcal{B}_j} N_l (\mathrm{OOB\_SC}_l - \mathrm{OOB\_SC\_PERM}_l)$

where $\mathrm{OOB\_SC}_l$ is the impurity reduction for out-of-bag observations at node $l$, and $\mathrm{OOB\_SC\_PERM}_l$ is analogous after permuting $X_j$.
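The selection and aggregation steps can be sketched for a single covariate $j$ as below. The sketch assumes the per-split quantities (in-bag scores, node sizes, and OOB impurity reductions with and without permutation) have already been computed and stored in arrays; the function names and the use of `ceil` to turn the proportion into a count are illustrative assumptions.

```python
import math
import numpy as np

def top_split_indices(inbag_scores, prop_best_splits=0.01, min_splits=5):
    """Indices B_j of the best in-bag splits of one covariate."""
    n_splits = len(inbag_scores)
    k = max(min_splits, math.ceil(prop_best_splits * n_splits))  # assumed rounding
    k = min(k, n_splits)                     # cannot select more splits than exist
    order = np.argsort(inbag_scores)[::-1]   # highest SC_l first
    return order[:k]

def unity_vim(inbag_scores, node_sizes, oob_sc, oob_sc_perm,
              prop_best_splits=0.01, min_splits=5):
    """Sum over B_j of N_l * (OOB_SC_l - OOB_SC_PERM_l)."""
    best = top_split_indices(inbag_scores, prop_best_splits, min_splits)
    return float(np.sum(node_sizes[best] * (oob_sc[best] - oob_sc_perm[best])))

# Toy values for seven splits on one covariate (made up for illustration)
inbag = np.array([10.0, 1.0, 9.0, 2.0, 8.0, 3.0, 7.0])
sizes = np.full(7, 4.0)
oob = np.array([0.5, 0.9, 0.4, 0.9, 0.3, 0.2, 0.1])
oob_perm = np.array([0.1, 0.0, 0.1, 0.0, 0.1, 0.1, 0.1])

vim_j = unity_vim(inbag, sizes, oob, oob_perm)
```

With only seven splits, the minimum of five applies, so the two splits with the lowest in-bag scores are ignored even though their OOB reductions are large.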

In contrast, permutation VIM permutes $X_j$ globally and measures the change in overall OOB error, so every split involving $X_j$ is affected at once, while Gini (or MDI) importance sums impurity reductions across all nodes using $X_j$, which tends to dilute purely interaction-based effects. Unity VIM instead scores only each covariate’s strongest splits and evaluates them on OOB data, which is intended to avoid the optimism of in-bag scoring and to capture interactions that standard RF variable importance misses.

4. Covariate-Representative Tree Roots (CRTRs)

Covariate-Representative Tree Roots (CRTRs) provide an interpretable summary of the role of each covariate, visualizing whether its strongest effect is marginal or interaction-based.

The selection of a CRTR for covariate $j$ proceeds as follows:

OOB Split Scoring: As in Unity VIM, but using OOB data.
Best Roots Identification: Collect tree roots containing at least one of $j$’s top splits $\mathcal{B}_j$.
Representative Selection: Among these, apply the Laabs et al. (2024) approach: compute pairwise weighted tree distances that emphasize upper splits, and select the tree with the minimal average distance.
Covariate-Score Annotation: At each internal node splitting on $k$ , assign

$s_{k} = \frac{\text{freq\_best}_k}{\text{freq\_best}_k + \text{freq\_all}_k}$

where $\text{freq\_best}_k$ and $\text{freq\_all}_k$ denote $k$’s frequencies in best and all roots, respectively.
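The last two steps reduce to simple computations once the tree distances are available. The sketch below assumes a precomputed symmetric distance matrix between the candidate roots (the weighted tree distance itself is not implemented here) and uses hypothetical function names.

```python
import numpy as np

def representative_index(dist):
    """Index of the root with the minimal average distance to the others."""
    dist = np.asarray(dist, dtype=float)
    return int(np.argmin(dist.mean(axis=1)))

def covariate_score(freq_best, freq_all):
    """s_k = freq_best_k / (freq_best_k + freq_all_k)."""
    total = freq_best + freq_all
    return freq_best / total if total > 0 else 0.0

# Toy distance matrix for three candidate roots (made up for illustration)
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]
rep = representative_index(dist)
s_k = covariate_score(freq_best=3, freq_all=1)
```

Including the zero diagonal in the row means does not change the selected root, since every row is divided by the same count.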

If a covariate’s effect is marginal, its CRTR will split on $j$ at the root; for interaction-driven effects, the first split is on the interacting covariate, followed by $j$ in a child node. Dashed lines highlight top-scoring splits; line thickness visualizes split frequency.

5. Theoretical and Computational Properties

Unity Forests retain the statistical validity of random forests: each tree depends on an i.i.d. draw of a random vector $\Theta_b$ . The joint root optimization emulates properties of optimal tree induction approaches (e.g., Bertsimas et al. 2017), but is computationally tractable due to randomized candidate evaluation.

The principal computational costs are:

Root Selection: $O(n_\text{cand\_trees} \times d \times N_\text{root})$ per tree, for candidate root evaluation.
Full Tree Expansion: Matching the complexity of standard RFs post-rooting.
Unity VIM/CRTR: Linear in the number of root nodes.

No formal consistency or risk bounds are currently provided, yet empirical results demonstrate robustness and stability.

6. Empirical Performance and Validation

A. Real-Data Benchmark

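A large-scale benchmark on 168 binary-outcome datasets (a subset of those used by Couronne et al., 2018) compared UFOs (default settings) to conventional random forests (ranger) using five-fold cross-validation repeated five times. Metrics were Brier score, AUC, and accuracy (ACC); the table reports the number of datasets on which UFOs performed better than, equal to, or worse than random forests: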

Metric	UFO better	UFO equal	UFO worse
Brier	78 (46.4%)	0 (0%)	90 (53.6%)
AUC	114 (67.9%)	9 (5.4%)	45 (26.8%)
ACC	101 (60.1%)	10 (6.0%)	57 (33.9%)

AUC and accuracy improvements were statistically significant ($p < 0.001$); Brier scores did not differ significantly ($p = 0.396$).
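The table's percentages can be recomputed from the counts, and the win/loss counts can be put through a simple significance check. The source does not name the test behind the reported $p$-values; the exact two-sided sign test below (ties excluded) is an assumption chosen for illustration, and the function name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(wins, losses):
    """Exact two-sided sign test p-value under H0: P(win) = 0.5, ties excluded."""
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# (UFO better, UFO equal, UFO worse) per metric, copied from the table above
counts = {"Brier": (78, 0, 90), "AUC": (114, 9, 45), "ACC": (101, 10, 57)}

shares = {m: round(100 * better / sum(c), 1) for m, c in counts.items()
          for better in [c[0]]}
p_values = {m: sign_test_p(better, worse)
            for m, (better, _, worse) in counts.items()}

for metric in counts:
    print(f"{metric}: UFO better on {shares[metric]}% of datasets, "
          f"sign-test p = {p_values[metric]:.3g}")
```

The resulting values can be compared with the reported $p$-values, keeping in mind that the original analysis may have used a different test.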

B. Simulation Studies

UFOs and the Unity VIM were assessed on synthetic data with both marginal and interaction-driven effects, across a range of sample sizes ($n = 100, 300, 500, 1000$):

Qualitative interactions without marginal effects: Unity VIM reliably detected the informative covariates from $n \geq 300$ when the signal was strong and at $n = 1000$ even in weaker settings; permutation VIM required much larger $n$ and stronger signal, while Gini importance was ineffective (AUC $\approx 0.5$).
Quantitative interactions with marginals: All VIMs performed well for strong effects; Unity VIM required a larger $n$ for weaker effects than permutation/Gini.
Pure marginal effects: All VIMs performed comparably, but Unity VIM was more variable at small $n$ or weak effects.

C. CRTR Reproduction

When applied to simulated data ($n = 500$), CRTRs for the top Unity VIM covariates correctly revealed the true effect types (marginal or interaction). A real-data analysis (e.g., the Wine dataset, $n = 178$, $p = 13$) showed a high correlation ($\rho = 0.85$) between Unity and permutation VIM rankings and highlighted marginal, conditional, and non-linear effect structures via CRTR visualization.

7. Interpretation, Applications, and Extensions

CRTRs have practical interpretive value: in real data, they reveal nuanced roles for covariates, distinguishing marginal main effects, conditional dependencies, and qualitative reversals (e.g., a different split order yields reversed associations).

Standard settings require minimal tuning, though very small ($n < 100$) or extremely high-dimensional datasets may require customization. In exploratory analyses, Unity VIM can be compared to permutation VIM, and CRTR graphics can be used to interrogate the mechanisms underlying covariate importance.

Potential research directions include extensions of the UFO root methodology to survival outcomes (requiring appropriate partition criteria), integration into gradient boosting with shallow trees, identification of multiple heterogeneous CRTRs per covariate, and formal statistical analysis of consistency and risk properties.

Unity Forests extend random forest modeling by enabling the detection and interpretation of interaction-based effects that elude standard greedy splitting. Their gains in AUC and accuracy on the benchmark, combined with interpretative tools such as Unity VIM and CRTRs, expand the utility and transparency of ensemble tree models (Hornung et al., 11 Jan 2026).

References

Hornung et al. (2026). Unity Forests: Improving Interaction Modelling and Interpretability in Random Forests.