Ranking Perspective for Tree-based Methods with Applications to Symbolic Feature Selection
Abstract: Tree-based methods are powerful nonparametric techniques in statistics and machine learning. However, their effectiveness, particularly in finite-sample settings, is not fully understood. Recent applications have revealed a surprising ability to distinguish feature transformations (a task we call symbolic feature selection) that current theory does not explain. This work provides a finite-sample analysis of tree-based methods from a ranking perspective. We link oracle partitions in tree methods to response rankings at local splits, offering new insights into their finite-sample behavior in regression and feature selection tasks. Building on this local ranking perspective, we extend the analysis in two directions: (i) we examine the global ranking performance of individual trees and ensembles, including Classification and Regression Trees (CART) and Bayesian Additive Regression Trees (BART), providing finite-sample oracle bounds, ranking consistency, and posterior contraction results; (ii) inspired by the ranking perspective, we propose concordant divergence statistics $\mathcal{T}_0$ for evaluating symbolic feature mappings and establish their properties. Numerical experiments demonstrate that these statistics are competitive with existing methods on symbolic feature selection tasks.
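The abstract does not define the concordant divergence statistics $\mathcal{T}_0$, so the following is only a caricature of the underlying idea under stated assumptions: score each candidate symbolic mapping by a global rank-concordance measure (here Kendall's tau) between the transformed feature and the response, and select the mapping with the highest score. The helper `kendall_tau`, the candidate set, and the shifted-square ground truth are illustrative choices, not the paper's construction.

```python
import math
import random

def kendall_tau(u, v):
    """Kendall rank-concordance: (concordant - discordant) pairs / total pairs."""
    n = len(u)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (u[i] - u[j]) * (v[i] - v[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical candidate symbolic feature mappings (illustrative only).
candidates = {
    "identity": lambda x: x,
    "log": lambda x: math.log(x),
    "shifted_square": lambda x: (x - 2.5) ** 2,
}

random.seed(0)
x = [random.uniform(0.1, 5.0) for _ in range(200)]
# Non-monotone ground truth: y depends on x through (x - 2.5)^2 plus noise.
y = [(xi - 2.5) ** 2 + random.gauss(0.0, 0.1) for xi in x]

# Score each mapping by concordance with the response; pick the best.
scores = {name: kendall_tau([g(xi) for xi in x], y)
          for name, g in candidates.items()}
best = max(scores, key=scores.get)
```

Because Kendall's tau is invariant under strictly monotone transformations, `identity` and `log` receive essentially the same (near-zero) score here, while `shifted_square` scores near one. A purely global concordance statistic therefore cannot separate monotone mappings from one another; presumably the paper's local-split ranking perspective is what moves beyond this limitation.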