
Unity Forests (UFOs)

Updated 18 January 2026
  • Unity Forests (UFOs) are an enhanced random forest variant that detects interaction-driven effects using a two-stage tree-building process.
  • They utilize joint root optimization and Unity VIM to capture both marginal and purely interaction-based signals, improving variable selection and model interpretability.
  • Empirical studies indicate that UFOs achieve significant gains in AUC and accuracy on benchmark datasets compared to conventional random forests.

Unity Forests (UFOs) are a variant of random forests designed to enhance the modeling of interactions, especially those involving covariates with purely interaction-driven effects, and to facilitate interpretability through specialized variable importance and visualization tools. The motivation, methodology, and empirical validation of UFOs address limitations of conventional random forests in detecting and explaining complex dependencies among covariates (Hornung et al., 11 Jan 2026).

1. Background and Motivation

Random forests (RFs) are ensembles of decision trees, each built on a bootstrap sample or random subsample of the observations, with a random subset of covariates considered at each split. Tree predictions are averaged for regression or combined by majority vote for classification. The conventional approach to tree construction uses greedy recursive splitting: at each node, the covariate and cutpoint that yield the largest reduction in an impurity measure (e.g., Gini for classification, variance for regression) are selected.

A critical limitation in standard RFs arises when interactions exist between covariates that do not exhibit main (marginal) effects. For instance, if the effect of $X_2$ on $Y$ is conditional on $X_1$ but neither covariate individually reduces impurity, greedy splitting omits both $X_1$ and $X_2$ from the tree root, precluding accurate modeling of these interactions. This mechanism limits the ensemble’s ability to detect purely interaction-based signals, resulting in suboptimal prediction and variable selection.
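The limitation can be illustrated with a minimal sketch on a toy XOR design. The data, the helper names `gini` and `impurity_reduction`, and the binary 0/1 coding are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary (0/1) label vector."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def impurity_reduction(y, groups):
    """Impurity of y minus the size-weighted impurity of the groups."""
    n = len(y)
    return gini(y) - sum(len(g) / n * gini(g) for g in groups)

# Balanced XOR design (toy data): Y = 1 exactly when X1 and X2 differ.
x1 = np.array([0, 0, 1, 1] * 50)
x2 = np.array([0, 1, 0, 1] * 50)
y = (x1 != x2).astype(int)

# A single split on either covariate leaves both children at a 50/50 mix.
gain_x1 = impurity_reduction(y, [y[x1 == 0], y[x1 == 1]])
gain_x2 = impurity_reduction(y, [y[x2 == 0], y[x2 == 1]])

# Splitting on X1 and then on X2 in both children separates the classes.
leaves = [y[(x1 == a) & (x2 == b)] for a in (0, 1) for b in (0, 1)]
gain_joint = impurity_reduction(y, leaves)

print(gain_x1, gain_x2, gain_joint)
```

In this toy design each single split yields zero impurity reduction, so a greedy learner has no reason to prefer $X_1$ or $X_2$ over noise covariates, while the depth-two partition removes all impurity (a reduction of 0.5).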

2. Unity Forests Construction

Unity Forests address these deficiencies via a two-stage tree-building approach. Each tree $T_b$ is constructed as follows:

Stage A: Joint Tree Root Optimization

  1. Subsample without replacement a fraction $\text{fract}_n$ (default 0.7) of the $n$ observations.
  2. Randomly select a subset of covariates of cardinality determined by $\text{prop}_\text{var}$ (default $\sqrt{p}/p$ if $p \leq 100$, else 0.1).
  3. Generate $n_\text{cand\_trees}$ (default 500) candidate "tree roots" of maximal depth $\text{max\_depth\_root}$ (default 3), with randomly selected split covariates and cutpoints within the subset.
  4. For each root $R_k$, compute the criterion

$$C(R_k) = I(D) - \sum_{\ell \in \text{leaves}(R_k)} \frac{|D_\ell|}{|D|} I(D_\ell)$$

where $D$ is the subsample, $D_\ell$ is the data falling in leaf $\ell$, and $I(\cdot)$ is the impurity measure.

  5. Select the root maximizing $C(R_k)$.
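The five steps of Stage A can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes binary 0/1 outcomes with Gini impurity, draws cutpoints as midpoints between adjacent observed values (the source does not specify how cutpoints are drawn), rounds the covariate subset size to the nearest integer, and returns only the leaf partition of the winning root rather than its split structure. All function names are hypothetical.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary (0/1) label vector."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def random_root_leaves(X, idx, covs, depth, rng):
    """Partition idx with random splits; return a list of leaf index arrays."""
    if depth == 0 or len(idx) < 2:
        return [idx]
    j = rng.choice(covs)                      # random split covariate
    values = np.unique(X[idx, j])
    if len(values) < 2:                       # nothing to split on
        return [idx]
    k = rng.integers(len(values) - 1)         # assumed cutpoint rule: midpoint
    cut = 0.5 * (values[k] + values[k + 1])
    left = idx[X[idx, j] <= cut]
    right = idx[X[idx, j] > cut]
    return (random_root_leaves(X, left, covs, depth - 1, rng)
            + random_root_leaves(X, right, covs, depth - 1, rng))

def criterion(y, idx, leaves):
    """C(R_k): impurity of D minus size-weighted impurity of the leaves."""
    n = len(idx)
    return gini(y[idx]) - sum(len(l) / n * gini(y[l]) for l in leaves)

def best_root(X, y, n_cand=500, max_depth=3, fract_n=0.7, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step 1: subsample observations without replacement.
    idx = rng.choice(n, size=int(fract_n * n), replace=False)
    # Step 2: random covariate subset (rounding is an assumption).
    prop_var = np.sqrt(p) / p if p <= 100 else 0.1
    n_covs = max(1, int(round(prop_var * p)))
    covs = rng.choice(p, size=n_covs, replace=False)
    # Steps 3-5: generate candidate roots, score them, keep the best.
    best_c, best_leaves = -np.inf, None
    for _ in range(n_cand):
        leaves = random_root_leaves(X, idx, covs, max_depth, rng)
        c = criterion(y, idx, leaves)
        if c > best_c:
            best_c, best_leaves = c, leaves
    return best_c, best_leaves, covs

# Usage on synthetic data with a pure interaction between columns 0 and 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = ((X[:, 0] > 0) != (X[:, 1] > 0)).astype(int)
c, leaves, covs = best_root(X, y, n_cand=50)
```

Because each candidate is scored as a whole, a root that splits on one interacting covariate and then on the other can win even though its first split alone reduces impurity very little.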

Stage B: Conventional Expansion

Expand the selected root $R^*$ into a full decision tree by standard CART-style splitting over all $p$ covariates, using random subsets of size $\text{mtry}$ (default $\lfloor \sqrt{p} \rfloor$) at each node.

The root’s joint, stochastic construction enables the ensemble to select combinations of splits that facilitate the detection of interaction terms, regardless of their marginal effects. Diversity is maintained by randomization in both covariate selection and root candidate generation.

3. Unity Variable Importance Measure (Unity VIM)

The Unity Variable Importance Measure (Unity VIM) is designed to quantify the discriminative power of covariates, emphasizing their strongest effects as captured in tree roots. Its computation proceeds as follows:

  1. In-bag Split Scoring: For each internal node $l$ in all roots, compute $\mathrm{SC}_l = N_l \Delta I_l$, where $N_l$ is node size and $\Delta I_l$ is impurity reduction.
  2. Selection of Top Splits per Covariate: For each covariate $j$, select the top $\text{prop\_best\_splits}$ (default 1%) of $j$’s splits, ensuring a minimum of five. Designate the indices as $\mathcal{B}_j$.
  3. OOB Evaluation: For each $j$,

$$\mathrm{UnityVIM}_j = \sum_{l \in \mathcal{B}_j} N_l \left(\mathrm{OOB\_SC}_l - \mathrm{OOB\_SC\_PERM}_l\right)$$

where $\mathrm{OOB\_SC}_l$ is the impurity reduction for out-of-bag observations at node $l$, and $\mathrm{OOB\_SC\_PERM}_l$ is analogous after permuting $X_j$.
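Steps 2 and 3 can be sketched for a single covariate as below, assuming the per-split quantities have already been extracted from the fitted roots. The function names, the use of `ceil` to turn the 1% proportion into a count, and the cap at the number of available splits are assumptions; the source does not specify these details.

```python
import math
import numpy as np

def top_split_indices(in_bag_scores, prop_best_splits=0.01, min_splits=5):
    """Indices of one covariate's best splits, ranked by in-bag score SC_l."""
    scores = np.asarray(in_bag_scores, dtype=float)
    k = max(min_splits, math.ceil(prop_best_splits * len(scores)))
    k = min(k, len(scores))          # assumed: cannot select more than exist
    return np.argsort(scores)[::-1][:k]

def unity_vim(node_sizes, oob_sc, oob_sc_perm, best):
    """Sum over the best splits of N_l * (OOB_SC_l - OOB_SC_PERM_l)."""
    n = np.asarray(node_sizes, dtype=float)[best]
    sc = np.asarray(oob_sc, dtype=float)[best]
    sc_perm = np.asarray(oob_sc_perm, dtype=float)[best]
    return float(np.sum(n * (sc - sc_perm)))

# Toy values for seven splits on one covariate (illustrative only).
in_bag = [10, 1, 8, 3, 9, 2, 7]
sizes = [100, 50, 80, 40, 90, 30, 70]
oob = [0.2, 0.0, 0.1, 0.05, 0.15, 0.0, 0.1]
oob_perm = [0.0] * 7

best = top_split_indices(in_bag)
vim = unity_vim(sizes, oob, oob_perm, best)
```

With only seven splits the minimum of five applies, so the five highest-scoring splits enter the sum and the two weakest are ignored.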

In contrast, permutation VIM globally permutes $X_j$ and measures the change in overall OOB error, impacting all interactions, while Gini (or MDI) importance sums impurity reductions across all nodes using $X_j$, tending to dilute purely interaction-based effects. Unity VIM’s selective, OOB-focused scoring achieves unbiased measurement, particularly for interactions undetectable by standard RF variable importance.

4. Covariate-Representative Tree Roots (CRTRs)

Covariate-Representative Tree Roots (CRTRs) provide an interpretable summary of the role of each covariate, visualizing whether its strongest effect is marginal or interaction-based.

The selection of a CRTR for covariate $j$ proceeds as follows:

  1. OOB Split Scoring: As in Unity VIM, but using OOB data.
  2. Best Roots Identification: Collect tree roots containing at least one of $j$’s top splits $\mathcal{B}_j$.
  3. Representative Selection: Among these, apply the Laabs et al. (2024) approach: compute pairwise weighted tree distances that emphasize upper splits, and select the tree with the minimal average distance.
  4. Covariate-Score Annotation: At each internal node splitting on $k$, assign

$$s_k = \frac{\text{freq\_best}_k}{\text{freq\_best}_k + \text{freq\_all}_k}$$

where $\text{freq\_best}_k$ and $\text{freq\_all}_k$ denote $k$’s frequencies in best and all roots, respectively.
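Steps 3 and 4 reduce to two small computations once the inputs are available, sketched here under stated assumptions. The pairwise distance matrix is taken as given (the weighted tree distance of Laabs et al. is not implemented), the two frequencies are treated as plain numbers on a common scale, and the function names are hypothetical.

```python
import numpy as np

def representative_index(dist):
    """Index of the root with minimal average distance to the other roots."""
    d = np.asarray(dist, dtype=float)
    m = d.shape[0]
    if m == 1:
        return 0
    avg = d.sum(axis=1) / (m - 1)    # diagonal entries are zero
    return int(np.argmin(avg))

def covariate_score(freq_best, freq_all):
    """s_k = freq_best_k / (freq_best_k + freq_all_k)."""
    return freq_best / (freq_best + freq_all)

# Toy symmetric distance matrix for three candidate roots (illustrative).
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]
rep = representative_index(dist)
score = covariate_score(30, 10)
```

In the toy matrix the second root (index 1) is closest on average to the others, and a covariate that appears three times as often in the best roots as in all roots receives a score of 0.75.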

If a covariate’s effect is marginal, its CRTR will split on $j$ at the root; for interaction-driven effects, the first split is on the interacting covariate, followed by $j$ in a child node. Dashed lines highlight top-scoring splits; line thickness visualizes split frequency.

5. Theoretical and Computational Properties

Unity Forests retain the statistical validity of random forests: each tree depends on an i.i.d. draw of a random vector $\Theta_b$. The joint root optimization emulates properties of optimal tree induction approaches (e.g., Bertsimas et al. 2017), but is computationally tractable due to randomized candidate evaluation.

The principal computational costs are:

  • Root Selection: $O(n_\text{cand\_trees} \times d \times N_\text{root})$ per tree, for candidate root evaluation.
  • Full Tree Expansion: Matching the complexity of standard RFs post-rooting.
  • Unity VIM/CRTR: Linear in the number of root nodes.

No formal consistency or risk bounds are currently provided, yet empirical results demonstrate robustness and stability.

6. Empirical Performance and Validation

A. Real-Data Benchmark

A large-scale benchmark on 168 binary-outcome datasets (Couronne et al., 2018 subset) compared UFOs (default settings) to conventional random forests (ranger) using five-fold cross-validation repeated five times. Metrics included Brier score, AUC, and accuracy:

| Metric | UFO better  | UFO equal | UFO worse  |
|--------|-------------|-----------|------------|
| Brier  | 78 (46.4%)  | 0 (0%)    | 90 (53.6%) |
| AUC    | 114 (67.9%) | 9 (5.4%)  | 45 (26.8%) |
| ACC    | 101 (60.1%) | 10 (6.0%) | 57 (33.9%) |

  • AUC and accuracy improvements were statistically significant ($p < 0.001$); Brier scores did not differ significantly ($p = 0.396$).
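As a quick arithmetic check, the percentages in the table can be recomputed from the reported counts. The sketch below uses only the counts stated above; the variable and function names are illustrative.

```python
# Counts per metric: (UFO better, UFO equal, UFO worse), as reported above.
counts = {
    "Brier": (78, 0, 90),
    "AUC": (114, 9, 45),
    "ACC": (101, 10, 57),
}

def shares(row):
    """Percentage of datasets in each outcome category, rounded to one decimal."""
    total = sum(row)
    return tuple(round(100 * c / total, 1) for c in row)

table = {metric: shares(row) for metric, row in counts.items()}
for metric, row in table.items():
    print(metric, row)
```

Each row sums to 168 datasets; rounded shares may add up to slightly more or less than 100% (for example 100.1% for AUC).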

B. Simulation Studies

UFOs and the Unity VIM were assessed on synthetic data with both marginal and interaction-driven effects, across a range of sample sizes ($n = 100, 300, 500, 1000$):

  • Qualitative interactions without marginal effects: Unity VIM reliably detected the informative covariates for $n \geq 300$ when the signal was strong, and at $n = 1000$ even in weaker-signal settings; permutation VIM required much larger $n$ and stronger signal, while Gini importance was ineffective (AUC $\approx 0.5$).
  • Quantitative interactions with marginals: All VIMs performed well for strong effects; Unity VIM required a larger $n$ for weaker effects than permutation/Gini.
  • Pure marginal effects: All VIMs performed comparably, but Unity VIM was more variable at small $n$ or weak effects.

C. CRTR Reproduction

When applied to simulated data ($n = 500$), CRTRs for the top Unity VIM covariates correctly revealed true effect types (marginal/interaction). Real-data analysis (e.g., the Wine dataset, $n = 178$, $p = 13$) showed a high correlation ($\rho = 0.85$) between Unity and permutation VIM rankings, and highlighted marginal, conditional, and non-linear effect structures via CRTR visualization.

7. Interpretation, Applications, and Extensions

CRTRs have practical interpretive value: in real data, they reveal nuanced roles for covariates—distinguishing marginal main effects, conditional dependencies, and qualitative reversals (e.g., different split order yields reversed associations).

Practical usage requires minimal tuning for standard settings, though very small ($n < 100$) or extremely high-dimensional data may require customization. For exploratory data applications, Unity VIM can be compared to permutation VIM, and CRTR graphics used to interrogate mechanisms underlying covariate importance.

Potential research directions include extensions of the UFO root methodology to survival outcomes (requiring appropriate partition criteria), integration into gradient boosting with shallow trees, identification of multiple heterogeneous CRTRs per covariate, and formal statistical analysis of consistency and risk properties.


Unity Forests extend random forest modeling to detect and interpret interaction-based effects that elude standard greedy splitting. Their benchmark gains in AUC and accuracy, with Brier scores not significantly different from conventional random forests, combined with interpretive tools such as Unity VIM and CRTRs, expand the utility and transparency of ensemble tree models (Hornung et al., 11 Jan 2026).
