Boosting-Based Sequential Meta-Tree Ensemble Construction for Improved Decision Trees

Published 9 Feb 2024 in stat.ML and cs.LG | (2402.06386v1)

Abstract: The decision tree is one of the most popular approaches in machine learning. However, it suffers from overfitting caused by overly deep trees. The recently proposed meta-tree addresses this problem and, moreover, guarantees statistical optimality based on Bayes decision theory. The meta-tree is therefore expected to perform better than the decision tree. Ensembles of decision trees, which are typically constructed by boosting algorithms, are known to improve predictive performance over a single decision tree. Ensembles of meta-trees are thus expected to outperform a single meta-tree, yet no previous study has constructed multiple meta-trees by boosting. In this study, we therefore propose a method to construct multiple meta-trees using a boosting approach. Through experiments with synthetic and benchmark datasets, we compare the performance of the proposed methods with that of conventional methods using ensembles of decision trees. Furthermore, while ensembles of decision trees can overfit just as a single decision tree can, the experiments confirmed that ensembles of meta-trees can prevent overfitting due to tree depth.


Summary

The paper presents a novel boosting-based method for sequentially constructing meta-tree ensembles to reduce overfitting in decision trees.
The methodology leverages CART to build representative trees and converts them into meta-trees including all subtrees, optimizing the mean squared error.
Experimental results on synthetic and benchmark datasets show improved accuracy and smaller Bayes risk compared to GBDT and LightGBM.

Boosting-Based Sequential Meta-Tree Ensemble Construction

The paper "Boosting-Based Sequential Meta-Tree Ensemble Construction for Improved Decision Trees" (2402.06386) introduces a method for constructing ensembles of meta-trees using a boosting approach, addressing the overfitting commonly associated with overly deep decision trees. The proposed method leverages the statistical optimality of meta-trees, which are sets of subtrees derived from a representative tree, to enhance predictive performance while mitigating overfitting.

Background and Motivation

Decision trees are widely used in machine learning due to their interpretability and expressivity. However, they are prone to overfitting, especially when the tree depth is excessive. While pruning and penalty terms can help control tree depth, meta-trees offer an alternative approach. A meta-tree comprises all subtrees of a representative tree, allowing for a combination of predictions from both shallow and deep trees. Ensembles of decision trees, constructed via bagging or boosting, are known to improve predictive performance, motivating the exploration of meta-tree ensembles. This paper addresses the gap in existing research by proposing a boosting-based method for sequentially constructing meta-tree ensembles.

Methodology: Sequential Meta-Tree Construction

The proposed method constructs an ensemble of $B$ meta-trees sequentially, in the manner of boosting algorithms. The process minimizes an evaluation function, specifically the mean squared error (MSE), by iteratively adding meta-trees fitted to the residuals of the current prediction; for MSE, these residuals correspond to the negative gradient of the error.
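To make the link between residuals and gradients concrete, here is a short derivation. It assumes the evaluation function is written as half the sum of squared errors over $n$ training points; the factor $1/2$ and the symbol $n$ are conventions adopted here for illustration, not taken from the paper.

$$
L(F) = \frac{1}{2} \sum_{i=1}^{n} \bigl(y_i - F(\bm{x}_i)\bigr)^2,
\qquad
-\frac{\partial L}{\partial F(\bm{x}_i)} = y_i - F(\bm{x}_i).
$$

Evaluated at the current prediction $F_{b-1}$, the right-hand side is the residual for data point $i$, so fitting the next meta-tree to the residuals amounts to a step along the negative gradient.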

1. Initialization: Start with an initial prediction $F_0$ and an empty meta-tree set $\mathcal{M}$.
2. Residual Calculation: Compute the residual $r_i$ between the true value $y_i$ and the current prediction $F_{b-1}(\bm{x}_i)$ for each data point.
3. Tree Building: Use CART to construct a representative tree $(T, \bm{k}_b)$ by splitting the data to minimize the sample variance of the residuals.
4. Meta-Tree Construction: Convert the representative tree to a meta-tree $\mathrm{M}_{T,\bm{k}_b}$, which includes all its subtrees.
5. Ensemble Update: Add the new meta-tree to the ensemble and update the prediction function $F_b$.
6. Iteration: Repeat steps 2-5 for $B$ iterations.

The algorithm incorporates a learning rate $\gamma$ to scale the predicted values, which helps prevent overfitting by slowing down the learning process.
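The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the standard gradient-boosting update $F_b = F_{b-1} + \gamma h_b$ and takes the mean of the targets as $F_0$. It also substitutes a one-split regression stump on a single feature for the CART-plus-meta-tree step. The names `fit_stump` and `boost` are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Stand-in weak learner: a one-split regression stump on a 1-D feature.

    The paper fits a meta-tree at this point; the stump is only a placeholder.
    """
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: predict the mean residual everywhere
        m = mean(residuals)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_rounds=50, gamma=0.1):
    """Sequentially fit weak learners to residuals, scaled by learning rate gamma."""
    f0 = mean(ys)            # assumed choice of the initial prediction F_0
    learners = []            # plays the role of the meta-tree set
    preds = [f0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # residual calculation
        h = fit_stump(xs, residuals)                     # placeholder for tree + meta-tree
        learners.append(h)                               # ensemble update
        preds = [p + gamma * h(x) for x, p in zip(xs, preds)]
    return lambda x: f0 + gamma * sum(h(x) for h in learners)
```

With a smaller `gamma`, each round removes a smaller fraction of the remaining residual, so more rounds are needed to fit the training data. This is the slowing effect the learning rate is meant to provide.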

Figure 1: The notations for the binary model tree (left) and an example of the model tree (right). The subscript of $\bm{x}$ (red) represents the feature $k_s$, and if $x_{k_s}$ is a continuous variable, it is divided by a threshold value $t_{k_s}$. If $x_{k_s}$ is a binary variable, it is divided by a binary value of 0 or 1. If $\bm{x}$ is assigned to the root node of the model tree (right), following the red path leads to the leaf node $s_{01}$. The output is generated from $p(y|\theta_{s_{01}})$.
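The difference between following a single path to a leaf and combining all subtrees of a representative tree can be illustrated with a small sketch. It rests on several assumptions that this summary does not confirm. Each node stores a point prediction `value`. Inputs go left when `x[feature] <= threshold`. Subtrees are weighted through a single fixed split weight `g`, a hypothetical parameter; a Bayesian treatment would instead update such a weight per node from the data. The names `Node`, `tree_predict`, and `meta_tree_predict` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: float                      # prediction if the tree were cut at this node
    feature: Optional[int] = None     # index k_s of the split feature (None at a leaf)
    threshold: float = 0.0            # threshold t_{k_s}; 0.5 works for a 0/1 feature
    left: Optional["Node"] = None     # child for x[feature] <= threshold (assumed convention)
    right: Optional["Node"] = None    # child for x[feature] > threshold


def tree_predict(node, x):
    """Ordinary model-tree prediction: follow the path of x down to a leaf."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def meta_tree_predict(node, x, g=0.5):
    """Blend the predictions of every pruned subtree along the path of x.

    g is a hypothetical fixed weight for "this node is split further".
    g = 1 recovers the full-depth tree; g = 0 uses the root alone.
    """
    if node.feature is None:
        return node.value
    child = node.left if x[node.feature] <= node.threshold else node.right
    return (1.0 - g) * node.value + g * meta_tree_predict(child, x, g)
```

Because shallow nodes always contribute with weight `1 - g`, a noisy deep leaf cannot dominate the output, which is the intuition behind a meta-tree's resistance to depth-induced overfitting.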

Prediction Models and Weighting Schemes

The paper explores different prediction models and weighting schemes for combining the meta-trees in the ensemble:

GBDT-based Model: This model, similar to GBDT, learns the output of the $b$-th meta-tree as the residual between $F_{b-1}$ and $y$, using learning and predicting weights of 1.
Probability Distribution-based Models: These models treat the weights as probabilities.
- Uniform Distribution: Analogous to Random Forest, this approach assigns uniform probabilities as weights.
- Posterior Distribution of $\bm{k}$: This method uses the posterior distribution of the explanatory variable features $\bm{k}$ as weights. The posterior distribution is calculated based on the assumption that the prior distribution of $\bm{k}$ is uniform.
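The two probability-based weighting schemes can be sketched as follows. Under a uniform prior, Bayes' rule makes the posterior proportional to the likelihood, so the posterior weights reduce to a normalized exponential of the log-likelihoods. How the paper computes the likelihood for each $\bm{k}_b$ is not given in this summary, so `log_likelihoods` is a hypothetical input and the function names are illustrative.

```python
import math


def uniform_weights(n):
    """Equal weight for each of the n meta-trees."""
    return [1.0 / n] * n


def posterior_weights(log_likelihoods):
    """Posterior over candidates under a uniform prior.

    With a uniform prior the posterior is proportional to the likelihood,
    so we normalize exp(log-likelihood), subtracting the max for stability.
    """
    m = max(log_likelihoods)
    unnorm = [math.exp(ll - m) for ll in log_likelihoods]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def combine(predictions, weights):
    """Weighted combination of per-meta-tree predictions."""
    return sum(w * p for w, p in zip(weights, predictions))
```

For example, `combine(preds, uniform_weights(len(preds)))` averages the predictions, whereas passing `posterior_weights(...)` shifts the weight toward meta-trees whose feature assignment explains the data better.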

Experimental Results

The authors conducted experiments using synthetic and benchmark datasets to compare the performance of the proposed methods with conventional methods, including GBDT and LightGBM.

Experiment 1: Bayes Optimality

This experiment used synthetic data generated from a true model tree to assess the Bayes optimality of the proposed methods. The results showed that the proposed methods had smaller Bayes risk compared to GBDT and LightGBM. The method using the posterior probabilities of $\bm{k}$ (MT_pos-pos) achieved the smallest Bayes risk, indicating that the constructed meta-tree ensembles closely resembled the true model tree.

Figure 2: The result of Experiment 1

Experiment 2: Influence of Tree Depth

This experiment investigated the influence of the depth of the true model tree and the depth of the meta-trees. The results indicated that when the depth of the meta-trees increased, the methods using posterior probabilities of $\bm{k}$ exhibited smaller Bayes risk. However, when the meta-tree depth was shallower than the true model tree depth, the predictive performance of these methods was worse. In such cases, MT_gbdt and MT_uni-uni performed better.

Figure 3: The result of Experiment 2. The maximum depth of the true model tree is 3, 5, and 7 in (a), (b), and (c), respectively; each panel shows the result of increasing the depth of the meta-trees through 3, 4, 5, and 6.

Experiment 3: Benchmark Datasets

The proposed methods were tested on multiple benchmark datasets for regression. The results showed that the proposed methods were generally more accurate than GBDT and LightGBM. MT_gbdt and MT_uni-uni performed better than MT_pos-pos, suggesting that the true structure underlying these datasets may be difficult to represent as a model tree. Furthermore, the proposed methods prevented overfitting when the tree depth was increased.

Conclusion

The paper demonstrates a boosting-based approach for constructing ensembles of meta-trees, which effectively prevents overfitting even with deep trees. The experimental results confirm the properties of the proposed method on both synthetic and benchmark datasets, highlighting its potential for improving decision tree performance. The use of meta-trees and boosting provides a robust framework for handling complex datasets while maintaining statistical optimality.
