
Monotonicity-Constrained GB Surrogate

Updated 12 January 2026
  • The paper introduces a monotonicity-constrained gradient boosting surrogate that enforces hard linear inequality constraints to guarantee theoretically dictated monotonic effects.
  • It leverages popular boosting libraries like XGBoost, LightGBM, and CatBoost to rigorously implement monotonic splits and constrained leaf values for improved model fidelity.
  • Empirical evaluations show negligible predictive loss on large datasets and highlight practical tuning strategies to balance interpretability with calibration and discrimination.

A monotonicity-constrained gradient boosting surrogate is a tree-based ensemble model trained with hard linear inequality constraints that enforce monotonic relationships between specified features and the predicted outcome. This framework is particularly salient in domains where theory, regulation, or economic intuition require predictor variables to exert monotonic effects. Core examples include interpretable surrogates for functional ANOVA decompositions with monotonicity (“Mono-GAMI-Tree” models), monotone-regularized GAMs, and credit scoring models with mandated monotonic trends. Modern implementations adapt boosting libraries such as XGBoost, LightGBM, and CatBoost to achieve hard monotonicity guarantees while retaining competitive predictive accuracy and interpretability (Hu et al., 2023, Hofner et al., 2014, Koklev, 14 Dec 2025).

1. Functional Model Structure and Monotonicity Constraints

Monotonicity-constrained surrogates target a functional form comprising additive main effects and selected bivariate interactions:

$$f(x) = \sum_{j=1}^p f_j(x_j) + \sum_{1 \leq j < k \leq p} f_{jk}(x_j, x_k)$$

where $f_j$ captures the univariate response of feature $j$, and $f_{jk}$ encodes the second-order interaction for $(x_j, x_k)$. L₂-identifiability (centering or orthogonality) ensures a unique decomposition. Monotonicity is imposed on designated features via the constraint

$$\frac{\partial f}{\partial x_j} \geq 0 \quad \text{(or } \leq 0\text{)}$$

Feature monotonicity can be contextualized for economic variables (e.g., credit risk factors, dose-response in epidemiology), regulatory compliance, or model interpretability (Hu et al., 2023, Hofner et al., 2014).
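The additive structure above can be sketched directly in code. A minimal illustration, where the specific component functions are hypothetical placeholders rather than anything from the cited papers:

```python
# Minimal sketch of a low-order functional ANOVA model:
# f(x) = sum_j f_j(x_j) + sum_{j<k} f_jk(x_j, x_k).
# The component functions below are illustrative placeholders.

main_effects = {
    0: lambda x: 2.0 * x,    # f_0: monotone increasing in x_0
    1: lambda x: -0.5 * x,   # f_1: monotone decreasing in x_1
}
interactions = {
    (0, 1): lambda a, b: 0.1 * a * b,  # f_01: second-order interaction
}

def f(x):
    """Evaluate the additive decomposition at feature vector x."""
    total = sum(fj(x[j]) for j, fj in main_effects.items())
    total += sum(fjk(x[j], x[k]) for (j, k), fjk in interactions.items())
    return total
```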

2. Algorithmic Enforcement in Gradient Boosting Frameworks

Tree-based boosting models enforce monotonicity at both the split-finding and leaf-weight assignment stages. In XGBoost, for instance, when splitting on a constrained feature $j$, the child leaf values $w_\mathrm{left}$ and $w_\mathrm{right}$ must satisfy

$$w_\mathrm{right} \geq w_\mathrm{left}$$

for "increasing" monotonicity. Monotonicity across the ensemble is ensured by propagating such split constraints within each tree. Leaf weights $w_{t,l}$ for tree $t$, leaf $l$, are subject to global bounds post-fitting. This extends to piecewise-constant fits and interaction-aware models via the interaction_constraints and monotone_constraints options (Hu et al., 2023, Koklev, 14 Dec 2025).
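The split-level rule can be illustrated with a small stand-alone sketch. This is not XGBoost's actual implementation, only a simplified check using leaf means as leaf values: for an "increasing" feature a candidate split is admitted only if the right child's value is at least the left child's.

```python
def mean(vals):
    return sum(vals) / len(vals)

def admissible_split(left_targets, right_targets, direction=+1):
    """Check the monotone split condition w_right >= w_left (direction=+1)
    or w_right <= w_left (direction=-1), using child means as leaf values.
    Simplified stand-in for constrained split finding in boosting libraries."""
    w_left, w_right = mean(left_targets), mean(right_targets)
    return direction * (w_right - w_left) >= 0
```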

CatBoost employs ordered boosting and monotone-specific shrinkage within symmetric trees, maintaining global monotonicity via constrained leaf-weight updates. LightGBM uses similar split-based pruning and bounded leaf-weight assignment.

The general constraint system can be formalized as

$$C w \leq 0$$

where $w$ stacks all leaf weights and $C$ encodes all pairwise monotonicity conditions, resulting in a linearly constrained optimization within each boosting iteration (Koklev, 14 Dec 2025).
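A toy construction of such a constraint matrix for a chain of ordered leaves, as an illustrative sketch not tied to any particular library:

```python
def chain_constraints(n_leaves):
    """Build C such that C @ w <= 0 encodes w_1 <= w_2 <= ... <= w_n:
    row i reads w_i - w_{i+1} <= 0."""
    C = []
    for i in range(n_leaves - 1):
        row = [0.0] * n_leaves
        row[i], row[i + 1] = 1.0, -1.0
        C.append(row)
    return C

def satisfies(C, w, tol=1e-12):
    """Check C w <= 0 elementwise."""
    return all(sum(c * x for c, x in zip(row, w)) <= tol for row in C)
```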

3. Mono-GAMI-Tree Pipeline and Surrogate Extraction

“Mono-GAMI-Tree” [Editor's term] refers to the monotone tree-based surrogate architecture for fitting low-order functional ANOVA (GAMI) models:

  1. Interaction Filtering: Fit a depth-1 monotone XGBoost (or unconstrained GAM) to estimate main effects, calculate residuals, and select the top $K$ interactions explaining maximal residual reduction.
  2. Monotone XGBoost Training: Fit an ensemble of shallow trees with specified interaction and monotonicity constraints, enforcing global non-decreasing or non-increasing behavior for selected features.
  3. Parsing and Purification: Decompose the ensemble into univariate and bivariate terms via tree parsing; apply hierarchical orthogonalization (“purification”) to ensure interaction terms are orthogonal to marginals:
    • Extract the raw $f_{jk}$,
    • Fit $v_i \sim g_j(x_{ij}) + g_k(x_{ik})$ for each pair $(j,k)$,
    • Update $f_j(x)$, $f_k(x)$, and $f_{jk}(x_j, x_k)$ accordingly.

This yields interpretable, piecewise-constant univariate/bivariate fits with monotonicity guaranteed for target features (Hu et al., 2023).
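For piecewise-constant fits evaluated on a grid, the purification step reduces to subtracting row and column means from the interaction table and absorbing them into the marginals. A minimal sketch under that assumption (the function name and return layout are illustrative):

```python
def purify(f_jk):
    """Center a bivariate interaction table so every row mean and column
    mean is zero; the removed means are returned as marginal corrections
    to be added to f_j and f_k (piecewise-constant purification sketch)."""
    n_rows, n_cols = len(f_jk), len(f_jk[0])
    grand = sum(sum(row) for row in f_jk) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols - grand for row in f_jk]
    col_means = [sum(f_jk[i][j] for i in range(n_rows)) / n_rows - grand
                 for j in range(n_cols)]
    pure = [[f_jk[i][j] - row_means[i] - col_means[j] - grand
             for j in range(n_cols)] for i in range(n_rows)]
    return pure, row_means, col_means, grand
```

After purification, every row and column of the interaction table averages to zero, which is exactly the orthogonality-to-marginals condition for piecewise-constant terms.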

4. Empirical Evaluation: Predictive Performance and Interpretability

Simulation and benchmark experiments reveal that monotonicity constraints typically incur negligible predictive loss on large datasets (AUC "Price of Monotonicity" (PoM) below 0.2%), with PoM increasing for smaller datasets or high-coverage constraint scenarios (up to 2–3% AUC; calibration losses up to 13% Brier for roughly 64% feature coverage). Comparative studies indicate:

  • Mono-GAMI-Tree and EBM achieve near-identical RMSE/AUC for monotone first-order models, but only Mono-GAMI-Tree guarantees hard monotonicity.
  • In second-order models with active interactions, Mono-GAMI-Tree demonstrates smoother marginals and less overfitting at region boundaries than EBM, which can exhibit non-monotonic artifacts.
  • Calibration and discrimination trade-offs are non-uniform and require monitoring; monotone constraints can improve interpretability with minimal impact on classification power in large credit portfolios (Hu et al., 2023, Koklev, 14 Dec 2025).
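One plausible formalization of the Price of Monotonicity referenced above is the relative AUC drop of the constrained model against its unconstrained counterpart; the metric name is the source's, but this particular function is illustrative:

```python
def price_of_monotonicity(auc_unconstrained, auc_constrained):
    """Relative loss in discrimination from imposing monotonicity,
    expressed as a fraction of the unconstrained AUC (illustrative)."""
    return (auc_unconstrained - auc_constrained) / auc_unconstrained
```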

5. Practical Implementation: Tuning, Feature Selection, and Constraint Specification

Best practices include constraining only features with strong monotonic economic or scientific priors (e.g., risk ratios, payment delays), validating constraints using partial dependence or ICE plots, and omitting ambiguous predictors with known non-monotonic effects (such as U-shaped age trends).

Hyperparameter tuning to minimize PoM favors shallow trees (max_depth $\approx$ 2–6), modest learning rates ($\eta \approx$ 0.01–0.05), and heightened leaf-weight regularization ($\lambda \approx$ 1–5). Identical training grids for constrained and unconstrained models enable unbiased PoM estimation. Library selection may be guided by calibration, discrimination, and computational properties; CatBoost may offer slight calibration improvements under monotonic constraints (Koklev, 14 Dec 2025).
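A minimal sketch of the identical-grid practice: the same hyperparameter grid is expanded once and reused for both the constrained and unconstrained runs, so any performance gap is attributable to the constraints alone. The grid values follow the ranges quoted above and the parameter names match XGBoost's; the `expand` helper is illustrative.

```python
from itertools import product

# Shared grid for both the constrained and unconstrained models,
# using the ranges recommended above.
grid = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.01, 0.05],
    "reg_lambda": [1.0, 5.0],
}

def expand(grid):
    """Expand a dict-of-lists grid into a list of parameter dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]

configs = expand(grid)
# The same `configs` list is reused for both runs; the constrained run
# only adds a monotone_constraints setting on top of each config.
```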

6. Alternative Approaches: Spline-Based Boosting and Constrained Regression

Monotone boosting can also be formulated in a basis-expansion context:

$$f_j(x) = B_j(x)^\top \beta_j$$

where monotonicity is enforced through adjacent-coefficient differences:

$$D^{(j)} \beta_j \geq 0$$

with $D^{(j)}$ the first-difference matrix. Fitting proceeds via component-wise boosting and repeated solution of linearly constrained quadratic programs:

$$\beta_j^{(m)} = \arg\min_{\beta} \sum_{i=1}^n \left(u_i^{[m]} - B_j\left(x_i^{(j)}\right)^\top \beta\right)^2 + \lambda_j \beta^\top P_j \beta \quad \text{subject to } D^{(j)} \beta \geq 0,$$

where $P_j$ is the smoothness penalty and the $u_i^{[m]}$ are negative gradients. Variable selection and shrinkage are controlled by the step length $\nu$ and iteration count $M$ (Hofner et al., 2014).
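In the special case of a piecewise-constant (identity) basis with no smoothness penalty, each constrained least-squares step above reduces to isotonic regression, which the pool-adjacent-violators algorithm (PAVA) solves exactly. A minimal sketch of that special case, not the general spline-based solver of Hofner et al.:

```python
def pava(y):
    """Pool-adjacent-violators: least-squares fit of a non-decreasing
    sequence to y, i.e. min ||y - b||^2 s.t. b_1 <= ... <= b_n.
    This is the D beta >= 0 constraint with an identity basis, no penalty."""
    # Each block stores [mean, weight]; violating adjacent blocks merge.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for m, w in blocks:
        out.extend([m] * w)
    return out
```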

Case studies (e.g., São Paulo mortality vs. SO₂ exposure) confirm that monotone-boosted surrogates can match traditional constrained GAMs in predictive performance and interpretability while supporting intrinsic variable selection and broad loss function compatibility (Hofner et al., 2014).

7. Guidelines, Limitations, and Decision Frameworks

The application of monotonicity-constrained gradient boosting surrogates is subject to trade-offs between interpretability and predictive performance. In large datasets, a low Price of Monotonicity means monotonicity comes essentially "for free", enabling robust, interpretable surrogates in highly regulated domains. For moderate or small sample sizes, extensive constraint coverage can elevate PoM, requiring diagnostic evaluation and selective constraint specification.

Empirical guidelines include:

  • Constrain only features with well-justified monotonic relationships.
  • Use paired-bootstrap PoM metrics for accuracy monitoring.
  • Select libraries and regularization parameters to balance calibration, discrimination, and computational efficiency.
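The paired-bootstrap check above can be sketched as follows: resample the evaluation set with replacement and score both models on the same resample, so sampling noise cancels in the difference. The function names, metric interface, and seed are illustrative assumptions.

```python
import random

def paired_bootstrap(metric_a, metric_b, data, n_boot=200, seed=0):
    """Distribution of metric_a(sample) - metric_b(sample) over shared
    bootstrap resamples of `data`; metric_* map a sample to a score.
    Sharing the resample between models is what makes the test 'paired'."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(len(data))] for _ in range(len(data))]
        diffs.append(metric_a(sample) - metric_b(sample))
    return diffs
```

The resulting `diffs` can be summarized by percentiles to give a confidence interval for the PoM.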

A plausible implication is that monotonicity-constrained surrogates offer a practical synthesis of compliance-driven interpretability and ensemble predictive power within modern tree-based machine learning frameworks (Koklev, 14 Dec 2025, Hu et al., 2023, Hofner et al., 2014).
