
Unity VIM: Interaction-Based Variable Importance

Updated 18 January 2026
  • Unity VIM is a specialized variable importance metric for Unity Forests that detects pure interaction effects even when marginal signals are weak.
  • It computes importance by focusing on top out-of-bag impurity-reducing splits at jointly optimized tree roots, using local permutation tests.
  • Unity VIM overcomes limitations of traditional RF measures by reliably ranking covariates that only show predictive power in interaction-dependent contexts.

The Unity Variable Importance Measure (unity VIM) is a variable importance metric tailored to Unity Forests (UFOs), a random forest variant that improves detection and quantification of variables whose effects manifest mainly through interactions—especially those with little or no marginal predictive signal. By focusing on the top out-of-bag impurity-reducing splits in the shallow, jointly optimized roots of UFO trees, unity VIM quantifies each covariate’s importance according to its maximal local impact. Unlike classical random forest importance metrics, unity VIM is specifically crafted to identify and quantify “pure interactors,” i.e., covariates whose discriminative value is only revealed conditionally upon earlier splits on their interacting partners, rather than through main effects (Hornung et al., 11 Jan 2026).

1. Formal Definition and Mathematical Foundations

Let $\mathcal{F} = \{T_1, \ldots, T_T\}$ denote a Unity Forest composed of $T$ trees. Each tree $T_t$ is constructed in two phases:

  • A joint root of depth at most $D$ (default $D=3$) is selected from $n_{\text{cand\_roots}}$ random candidate roots grown on a random subset of $\text{prop\_var} \cdot p$ covariates.
  • The remainder of each tree is grown using standard CART.

Denote by $\ell$ the internal (non-leaf) nodes in the root of each tree and by $j \in \{1, \ldots, p\}$ the index for each covariate. For each $\ell$, let $S_\ell^{\mathrm{in}}$ (size $N_\ell^{\mathrm{in}}$) and $S_\ell^{\mathrm{oob}}$ represent the in-bag and out-of-bag samples passing through $\ell$, respectively. Define $C(\cdot)$ as the impurity criterion: Gini for classification, variance for regression.

The in-bag split score at node $\ell$ (split on covariate $j$) is

$$s_\ell = N_\ell^{\mathrm{in}} \left[ C(\text{parent}(\ell), \mathrm{in}) - C(\text{left}(\ell), \mathrm{in}) - C(\text{right}(\ell), \mathrm{in}) \right].$$
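As a rough illustration of the split score for classification, the sketch below computes a Gini-based score for one node. The function names `gini` and `split_score` are hypothetical, and the sketch assumes that each child's impurity is weighted by its share of the node's samples (the usual CART convention), which the formula above leaves implicit.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_score(parent, left, right, impurity=gini):
    """N * [C(parent) - w_left * C(left) - w_right * C(right)].

    ASSUMPTION: children are weighted by their share of the node's samples;
    this weighting is not spelled out in the text.
    """
    n = len(parent)
    if n == 0:
        return 0.0
    w_left, w_right = len(left) / n, len(right) / n
    drop = impurity(parent) - w_left * impurity(left) - w_right * impurity(right)
    return n * drop

# Toy node: 8 in-bag labels split into two children of 4 samples each.
parent = [0, 0, 0, 1, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [1, 1, 1, 1]
score = split_score(parent, left, right)
print(score)
```

For regression, the same function could be called with a variance function passed as `impurity`.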

For each covariate $j$, collect all such splits across all trees’ roots (total count $m_j$ for $j$). Compute $k_j = \max(\lceil \text{prop\_best\_splits}\cdot m_j \rceil, 5)$ and let $B_j$ be the set of $k_j$ splits for $j$ with highest $s_\ell$.
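The selection of $B_j$ can be sketched as follows. The helper name `select_top_splits` is hypothetical, and capping $k_j$ at $m_j$ when a covariate has fewer than five root splits is an assumption of this sketch, not something stated above.

```python
import math

def select_top_splits(split_scores, prop_best_splits=0.01, min_splits=5):
    """Return the indices of the k_j highest in-bag scores for one covariate."""
    m_j = len(split_scores)
    k_j = max(math.ceil(prop_best_splits * m_j), min_splits)
    k_j = min(k_j, m_j)  # ASSUMPTION: cap at m_j when fewer than 5 splits exist
    order = sorted(range(m_j), key=lambda i: split_scores[i], reverse=True)
    return order[:k_j]

# 2000 root splits on one covariate: 1% of them (20) are kept.
scores = [float(i) for i in range(2000)]
top = select_top_splits(scores)

# Only 3 splits available: the floor of 5 cannot be met, so all 3 are kept.
few = select_top_splits([3.0, 1.0, 2.0])
```

With the default of 0.01, the floor of five splits is the binding constraint whenever a covariate appears in 500 or fewer root splits.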

For all $\ell \in B_j$, compute the out-of-bag (OOB) impurity drop and its local permutation variant:

  • $SC_\ell = N_\ell^{\mathrm{in}} \left[ C(\text{parent}(\ell), \mathrm{oob}) - C(\text{left}(\ell), \mathrm{oob}) - C(\text{right}(\ell), \mathrm{oob}) \right]$
  • $SC_\ell^{\mathrm{perm}(j)}$: the same quantity, computed after permuting covariate $j$ among $S_\ell^{\mathrm{oob}}$

The unity VIM for $j$ is then:

$$\text{VIM}_j = \sum_{\ell \in B_j} N_\ell^{\mathrm{in}} \left( SC_\ell - SC_\ell^{\mathrm{perm}(j)} \right)$$

This definition targets the “power effect” of $j$ under the local context in which $j$ is most discriminative, either marginally or after interacting splits (Hornung et al., 11 Jan 2026).
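To make the local permutation step concrete, the following sketch evaluates one node's contribution for a classification problem. The names `oob_drop` and `local_delta` are hypothetical, the split is assumed to have the form "covariate value <= threshold", and child impurities are assumed to be weighted by child share.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def oob_drop(x_j, y, threshold, n_in):
    """N_in times the OOB impurity drop of the split x_j <= threshold.

    ASSUMPTION: child impurities are weighted by child share.
    """
    n = len(y)
    if n == 0:
        return 0.0
    left = [yi for xi, yi in zip(x_j, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x_j, y) if xi > threshold]
    drop = gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)
    return n_in * drop

def local_delta(x_j, y, threshold, n_in, rng):
    """Return (N_in * (SC - SC_perm), SC, SC_perm) for one root node."""
    sc = oob_drop(x_j, y, threshold, n_in)
    x_perm = list(x_j)
    rng.shuffle(x_perm)  # permute covariate j only among this node's OOB samples
    sc_perm = oob_drop(x_perm, y, threshold, n_in)
    return n_in * (sc - sc_perm), sc, sc_perm

# Toy node: 40 OOB samples in which the covariate separates the classes perfectly.
x_node = [0] * 20 + [1] * 20
y_node = [0] * 20 + [1] * 20
delta, sc, sc_perm = local_delta(x_node, y_node, threshold=0, n_in=60,
                                 rng=random.Random(0))
```

Because the shuffle only touches samples that reach the node, the class balance at the node is unchanged and only the association between the covariate and the outcome inside that local context is broken.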

2. Algorithmic Workflow and Implementation

The core UFO and unity VIM workflow proceeds in two stages, outlined precisely as follows.

UNITY FOREST Algorithm (model fitting):

  1. For each of $T$ trees:
     a. Subsample observations (fraction $\text{fract\_n}$) as in-bag; the remainder is OOB.
     b. Randomly select $\text{prop\_var}\cdot p$ covariates for the root.
     c. Generate $n_{\text{cand\_roots}}$ candidate roots of depth $\leq D$; in each, splits are made at random among the chosen covariates.
     d. For each candidate root $r$, compute the total impurity drop $Q(r)$.
     e. Select the candidate root maximizing $Q(r)$ as the tree root.
     f. Grow the remainder of the tree using standard CART.
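Steps c to e can be sketched for a deliberately simplified setting: binary covariates, roots of depth exactly two, and a covariate subset that is passed in explicitly rather than sampled. The names `root_drop` and `select_root` are hypothetical, and this is only an illustration of the "generate random candidates, keep the best" idea, not the reference implementation.

```python
import random
from collections import Counter
from itertools import product

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def root_drop(root, X, y):
    """Total impurity drop Q(r) of a depth-2 root (top, left_child, right_child).

    Summing the sample-weighted drops over the root's nodes equals
    n * (Gini at the top node - weighted Gini of the root's leaves).
    """
    top, left_var, right_var = root
    leaves = {}
    for row, label in zip(X, y):
        child_var = left_var if row[top] == 0 else right_var
        leaves.setdefault((row[top], row[child_var]), []).append(label)
    n = len(y)
    leaf_impurity = sum(len(v) / n * gini(v) for v in leaves.values())
    return n * (gini(y) - leaf_impurity)

def select_root(X, y, covariates, n_cand_roots, rng):
    """Draw random candidate roots and keep the one with the largest Q(r)."""
    candidates = [tuple(rng.choice(covariates) for _ in range(3))
                  for _ in range(n_cand_roots)]
    return max(candidates, key=lambda r: root_drop(r, X, y))

# Toy data: y = x0 XOR x1, with x2 as a pure noise covariate (80 rows).
X = [list(bits) for bits in product([0, 1], repeat=3)] * 10
y = [row[0] ^ row[1] for row in X]
best_root = select_root(X, y, covariates=[0, 1, 2], n_cand_roots=300,
                        rng=random.Random(0))
best_q = root_drop(best_root, X, y)
```

On this toy data, the only candidates that produce pure leaves split on one XOR partner at the top and on the other partner in both children, so the selection step favors exactly the interaction structure that greedy top-down splitting would not find.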

UNITY VIM Algorithm (importance computation):

  1. For each covariate $j$, collect all root-level splits involving $j$, storing $(\ell, s_\ell, N_\ell^{\mathrm{in}})$.
  2. For each $j$:
     a. Sort splits descending by $s_\ell$.
     b. Take the top $k_j$ splits as $B_j$.
     c. For each $\ell \in B_j$:
        i. Compute $SC_\ell$ and $SC_\ell^{\mathrm{perm}(j)}$ using OOB data.
        ii. Accumulate $\Delta_\ell = N_\ell^{\mathrm{in}} (SC_\ell - SC_\ell^{\mathrm{perm}(j)})$.
     d. Set $\text{VIM}_j = \sum_{\ell\in B_j} \Delta_\ell$.
  3. Return $\{\text{VIM}_j\}_{j=1}^p$.
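The aggregation in these steps can be sketched as below. For brevity, the sketch assumes each split record already carries its OOB quantities as a tuple `(j, s, n_in, sc, sc_perm)`, whereas the workflow above computes them only for the selected splits. The function name `unity_vim` and the record layout are hypothetical.

```python
import math
from collections import defaultdict

def unity_vim(records, p, prop_best_splits=0.01, min_splits=5):
    """Aggregate per-split records (j, s, n_in, sc, sc_perm) into VIM_1..VIM_p."""
    by_covariate = defaultdict(list)
    for j, s, n_in, sc, sc_perm in records:
        by_covariate[j].append((s, n_in, sc, sc_perm))

    vim = [0.0] * p  # covariates with no root splits keep a score of 0 (assumption)
    for j, splits in by_covariate.items():
        k_j = max(math.ceil(prop_best_splits * len(splits)), min_splits)
        # Sort by in-bag score and keep the k_j best splits (the set B_j).
        best = sorted(splits, key=lambda r: r[0], reverse=True)[:k_j]
        vim[j] = sum(n_in * (sc - sc_perm) for _, n_in, sc, sc_perm in best)
    return vim

# Covariate 0: six splits; the weakest in-bag split (s = 1) is filtered out
# even though its OOB drop is large. Covariate 1: permutation changes nothing.
records = [(0, float(s), 10, 2.0, 0.5) for s in (6, 5, 4, 3, 2)]
records.append((0, 1.0, 10, 100.0, 0.0))
records += [(1, 2.0, 10, 1.0, 1.0), (1, 1.0, 10, 1.0, 1.0)]
vim = unity_vim(records, p=3)
print(vim)
```

The example shows that ranking uses the in-bag score while the contribution itself uses OOB quantities, which keeps split selection and split evaluation on separate samples.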

Recommended defaults are $D=3$, $\text{prop\_var} = \sqrt{p}/p$ (when $p \leq 100$, else $0.1$), $n_{\text{cand\_roots}} = 500$, $\text{prop\_best\_splits}=0.01$, and $T$ between 1000 and 20000 (Hornung et al., 11 Jan 2026).
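The piecewise default for $\text{prop\_var}$ can be written as a small helper. The function name `default_prop_var` and the dictionary keys are hypothetical and do not refer to any particular software interface.

```python
import math

def default_prop_var(p):
    """sqrt(p)/p for p <= 100, otherwise 0.1, as stated in the text."""
    return math.sqrt(p) / p if p <= 100 else 0.1

# Remaining defaults quoted in the text, collected in one place.
defaults = {"D": 3, "n_cand_roots": 500, "prop_best_splits": 0.01}

for p in (25, 100, 400):
    print(p, default_prop_var(p))
```

Under this rule the root draws about $\sqrt{p}$ covariates for $p \leq 100$, and the two branches agree at $p = 100$, where both give 0.1.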

3. Comparison with Classical Random Forest Importances

Standard random forest variable importance measures include:

  • Permutation Importance: Measures the increase in OOB error when a covariate $j$ is permuted globally across OOB samples. Sensitive only to global or marginal effects; interactions without main effects are typically missed.
  • Mean Decrease Impurity (MDI, Gini decrease): Sums impurity drops at all splits on $j$; biased toward variables with many split points and fails to recognize pure interactors.

These approaches are inherently limited in detecting covariates that only show discriminative power in the presence of certain ancestor splits. Standard forests’ top-level splits are chosen solely by maximizing local impurity reduction (marginal gain), so “pure” interactions—where no individual variable shows a main effect—are excluded from early splits and thus from major importance assignment. Both global permutation and MDI VIMs assign near-zero importance to those features.

Unity VIM directly overcomes these limitations by:

  • Forcing pure interactors into tree roots via random covariate subsampling and joint optimization,
  • Scoring variables based on their strongest “local context” splits rather than aggregating marginal effects or global permutations,
  • Concentrating attention on those subregions of feature space where interactions manifest.

This ensures that, for patterns such as the XOR, variables that would otherwise be missed are assigned high importance values (Hornung et al., 11 Jan 2026).

4. Identification of Pure Interactions: Illustrative Mechanism

Consider the canonical XOR problem with binary features $X_1, X_2$ and $y = X_1 \text{ XOR } X_2$. Neither $X_1$ nor $X_2$ reduces Gini impurity at the root in standard trees: such splits are locally non-discriminative and thus rarely selected. However, with a joint root of depth two (in UFO), candidate structures such as an $X_1$ split followed by $X_2$, or vice versa, are eligible. Across many random candidate roots, some match the correct interaction structure; for these, the impurity reduction is maximized and the leaves are pure.

Unity VIM for $X_1$ or $X_2$ is then determined by just those nodes where the split on $X_2$ (conditional on $X_1$) achieves maximal separation. When $X_2$ is permuted only in the OOB samples reaching such nodes, its local discriminative capacity is destroyed, collapsing the impurity improvement. This “local” permutation directly reflects the real value of the interaction, unlike global permutations or cumulative impurity reductions (Hornung et al., 11 Jan 2026).
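The XOR mechanism can be checked numerically on a small balanced dataset. The sketch below uses a hypothetical helper `drop` (the weighted Gini decrease of a binary split) and treats all samples as if they were OOB samples, which is a simplification for illustration only.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def drop(x, y):
    """Weighted Gini decrease from splitting on a binary feature x (0 vs 1)."""
    n = len(y)
    left = [yi for xi, yi in zip(x, y) if xi == 0]
    right = [yi for xi, yi in zip(x, y) if xi == 1]
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Balanced XOR data: each (x1, x2) combination appears 25 times.
rows = [(a, b) for a in (0, 1) for b in (0, 1)] * 25
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

# Marginal splits at the top node carry no signal.
marginal_x1 = drop(x1, y)
marginal_x2 = drop(x2, y)

# Conditional on X1 = 0, the split on X2 separates the classes perfectly.
x2_node = [b for a, b in rows if a == 0]
y_node = [a ^ b for a, b in rows if a == 0]
conditional_x2 = drop(x2_node, y_node)

# Permuting X2 only among the samples in this node removes most of that gain.
x2_perm = list(x2_node)
random.Random(1).shuffle(x2_perm)
permuted_x2 = drop(x2_perm, y_node)

print(marginal_x1, marginal_x2, conditional_x2, permuted_x2)
```

The marginal decreases are zero while the conditional decrease equals the full node impurity of 0.5, and the gap between `conditional_x2` and `permuted_x2` is the kind of local difference that the unity VIM accumulates.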

5. Computational Considerations and Hyperparameter Selection

Unity VIM is efficient with recommended hyperparameters for typical datasets up to a few hundred features and observations. The design leverages:

  • Parallelization at the node or covariate level (e.g., via multithreading in C++),
  • Storage of all tree roots’ split information before variable-based filtering and score aggregation,
  • Focus only on top 1% (default) of splits for each variable, minimizing memory and compute costs.

Key parameters include the number of trees ($T$), subsampling fractions, covariate sampling for root growth ($\text{prop\_var}$), the number of root candidates ($n_{\text{cand\_roots}}$), and the filter fraction ($\text{prop\_best\_splits}$). Increasing root depth captures higher-order interactions but at the cost of more candidate generation. Smaller datasets ($n \lesssim 200$) may exhibit high variance; stability can be checked by comparing to permutation VIM.

Additionally, covariate-representative tree roots (CRTRs) can be used after VIM computation to visualize whether a covariate exerts its strongest effects marginally or interactively (Hornung et al., 11 Jan 2026).

6. Practical Implications, Context, and Significance

Unity VIM targets the quantification of variable importance in settings where high-order interactions dominate and marginal effects are weak or absent. Prior measures either miss or systematically down-weight “purely interactive” covariates. By focusing measurement on maximally local, context-specific discriminatory power, unity VIM enables:

  • Reliable detection of variables participating only in interactions,
  • Proper ranking of covariates when effect types differ across features,
  • Enhanced interpretability when paired with CRTRs.

In synthetic and real data experiments, unity VIM consistently identified truly interacting variables and outperformed conventional RF-based methods in discrimination and predictive performance (Hornung et al., 11 Jan 2026).

The broader significance encompasses variable importance needs in genomics, epidemiology, and high-dimensional feature selection, where interaction effects are common. Unity VIM occupies a distinct niche among recent VIM frameworks (e.g., targeted learning, conditional permutation, plug-in CATE-based metrics (Wang et al., 2024, Khan et al., 2022)) by adapting the tree-building process itself to optimize for detection of interaction effects at the core of importance assignment.

Table: Comparison of Unity VIM with Conventional RF Importances

| Measure | Splitting Mechanism | Captures Pure Interactions | Focus of Importance Scoring |
|---|---|---|---|
| Permutation VIM | Global (OOB) Permutation | No | Overall OOB prediction error |
| MDI (Gini decrease) | All splits (local, marginal) | No | Sum of all impurity reductions |
| Unity VIM | Jointly optimized tree roots (UFO) | Yes | Top out-of-bag impurity drops post-filter |

The mechanism of unity VIM provides context-sensitive variable scoring, thereby rectifying the failure of standard methods to reflect interaction-driven predictivity (Hornung et al., 11 Jan 2026).
