
Feature-wise Kernel Ridge Regression

Updated 28 January 2026
  • Feature-wise kernel ridge regression is a method that integrates kernel methods with ridge regularization applied feature by feature to yield stable predictions.
  • It leverages ensemble averaging and controlled shrinkage to achieve convergence to a kernel ridge regression form with provable asymptotic normality and uncertainty quantification.
  • The approach extends to image processing by isolating filamentary structures via BV–G regularization, demonstrating its versatility across statistical learning and geometric feature extraction.

Boulevard regularization refers to two distinct but thematically linked frameworks in the contemporary mathematical and machine learning literature: (1) the Boulevard scheme for stochastic gradient boosted trees (GBT) in regression, which achieves statistical regularization via subsampling and averaging, and (2) BV–G regularization in image processing, designed to separate long, thin structural features, specifically "boulevards" or road networks, from background textures. Both use targeted regularization to isolate specific geometric or probabilistic structure, but they operate in different domains: tree ensembles in statistical learning, and variational methods in functional analysis. The following account provides a comprehensive overview of both paradigms, emphasizing technical formulation, theoretical guarantees, and algorithmic mechanisms (Zhou et al., 2018; Gilles et al., 2024).

1. Boulevard Regularization in Stochastic Gradient Boosted Trees

Boulevard, as introduced in stochastic gradient boosting, regularizes tree ensembles by combining two principal mechanisms: randomized subsampling and a modified shrinkage (tree-averaging) schedule. At each iteration $b$, a random subsample $w \subseteq \{1, \ldots, n\}$ of size $\lfloor\theta n\rfloor$ (with $\theta \in (0,1]$) is selected, and a tree $t_b$ is fitted to the residuals of the observations in $w$ only. This reduces inter-tree correlation and mitigates overfitting, analogously to stochastic GBT.

The primary distinguishing feature is an averaging scheme governed by the update

$$f_b(x) = \frac{b-1}{b} f_{b-1}(x) + \frac{\lambda}{b} t_b(x), \quad \lambda \in (0,1].$$

By telescoping, this induces an ensemble predictor

$$f_b(x) = \frac{\lambda}{b} \sum_{i=1}^b t_i(x),$$

endowing each tree with a diminishing weight $\lambda/b$. A final rescaling

$$\hat{f}_B(x) \leftarrow \frac{1+\lambda}{\lambda} f_B(x)$$

removes the intentional shrinkage toward zero, yielding the prediction function (Zhou et al., 2018).
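The subsampling, decaying-weight update, and final rescaling can be sketched as a short training loop. This is an illustrative sketch, not the authors' reference implementation: the tree depth, subsampling rate, ensemble size, and the use of scikit-learn's `DecisionTreeRegressor` as the base learner are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def boulevard_fit(X, y, B=300, lam=0.8, theta=0.5, max_depth=4):
    """Boulevard-style boosting: subsampled trees with decaying-weight averaging."""
    n = len(y)
    trees, f = [], np.zeros(n)          # f holds f_b on the training set
    for b in range(1, B + 1):
        w = rng.choice(n, size=int(theta * n), replace=False)  # random subsample
        t = DecisionTreeRegressor(max_depth=max_depth, random_state=b)
        t.fit(X[w], y[w] - f[w])        # fit t_b to subsample residuals
        trees.append(t)
        # f_b = ((b-1)/b) f_{b-1} + (lam/b) t_b
        f = ((b - 1) / b) * f + (lam / b) * t.predict(X)
    return trees

def boulevard_predict(trees, X, lam=0.8):
    f_B = (lam / len(trees)) * sum(t.predict(X) for t in trees)
    return (1 + lam) / lam * f_B        # final rescaling removes the shrinkage

X = rng.uniform(-2, 2, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
pred = boulevard_predict(boulevard_fit(X, y), X)
mse = float(np.mean((pred - y) ** 2))   # in-sample mean squared error
```

Note that, unlike standard boosting, no learning-rate tuning against an early-stopping criterion appears: each tree's weight $\lambda/b$ decays automatically.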

2. Convergence Theory and Limiting Distribution

Boulevard regularization achieves a well-defined limit as the number of trees $B \to \infty$. Provided the tree-building procedure satisfies structure–value isolation (tree structure is independent of leaf values) and non-adaptivity (the distribution of structures does not evolve with $b$), the algorithm's ensemble converges to a fixed-point solution with kernel ridge regression form. Let $Y = (y_1, \ldots, y_n)^T$, and let $S_b$ denote the $n \times n$ "structure matrix" of tree $b$, with $K_n = \mathbb{E}[S_b]$. The ensemble prediction vector $\hat{Y}_b = (\hat{f}_b(x_1), \ldots, \hat{f}_b(x_n))^T$ satisfies

$$\hat{Y}_b \to Y^* = \left(\frac{1}{\lambda} I + K_n \right)^{-1} K_n Y$$

almost surely as $b \to \infty$. For a new input $x$ with mean structure vector $k_n(x)=\mathbb{E}[s_n(x)]$, the limiting prediction

$$\hat{f}_b(x)\xrightarrow{\text{a.s.}} k_n(x)^T\left(\frac{1}{\lambda}I + K_n \right)^{-1}Y$$

mirrors classical reproducing kernel regression, substantiating the contraction and averaging effect of the Boulevard update (Zhou et al., 2018).
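The fixed point can be checked numerically. In the sketch below, $K_n$ is taken to be a simple block-averaging matrix (an assumption standing in for the expectation over random tree partitions); the closed form is verified against the fixed-point relation $Y^* = \lambda K_n (Y - Y^*)$, and the averaged iteration is run in matrix form to confirm it approaches the same limit.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 12, 0.8

# Toy structure matrix: averaging within three "leaves" of 4 points each.
# Any symmetric PSD matrix with eigenvalues in [0, 1] behaves the same way.
K = np.zeros((n, n))
for i in range(3):
    K[4*i:4*(i+1), 4*i:4*(i+1)] = 0.25

Y = rng.normal(size=n)

# Closed form: Y* = ((1/lam) I + K)^{-1} K Y
Y_star = np.linalg.solve(np.eye(n) / lam + K, K @ Y)

# It satisfies the boosting fixed-point relation Y* = lam * K (Y - Y*).
fixed_point_gap = float(np.max(np.abs(Y_star - lam * K @ (Y - Y_star))))

# The averaged Boulevard iteration converges to the same limit at rate O(1/b).
f = np.zeros(n)
for b in range(1, 2001):
    f = ((b - 1) / b) * f + (lam / b) * K @ (Y - f)
iteration_gap = float(np.max(np.abs(f - Y_star)))
```

The algebra behind the check: $Y^* = \lambda K_n (Y - Y^*)$ rearranges to $(I + \lambda K_n) Y^* = \lambda K_n Y$, which is exactly the closed form above.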

For large $n$, under standard shrinking-leaf and sample-size conditions, the limiting prediction is asymptotically normal:

$$\frac{\hat{f}_n(x) - \frac{\lambda}{1+\lambda}f(x)}{r_n} \xrightarrow{d} N(0, \sigma_\epsilon^2),$$

with $r_n^T = k_n^T \left(\frac{1}{\lambda} I + K_n\right)^{-1}$ and variance term $r_n^2\sigma_\epsilon^2$. The multiplicative shrinkage $\lambda/(1+\lambda)$ of $f(x)$ is removed by the final rescaling step.

3. Uncertainty Quantification and Theoretical Guarantees

Boulevard regularization supports “reproduction intervals” for prediction uncertainty:

$$\hat{f}_n(x) \pm z_{1-\alpha/2}\, \hat{\sigma}_\epsilon\, r_n.$$

Here, $r_n$ (computed from the ensemble's fitted structure) and the residual standard deviation estimate $\hat{\sigma}_\epsilon$ provide a direct measure of prediction variability under the assumed data-generating process. These intervals are justified by the limiting normality theorem and, empirically, display accurate nominal coverage even for moderate $n$ (Zhou et al., 2018).
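Given the structure quantities above, a reproduction interval is a short computation. The sketch below uses a toy row-normalized kernel in place of $K_n$ and $k_n(x)$, and a plug-in residual estimate; these, and the reading of the scalar $r_n$ as the Euclidean norm of the vector $r_n^T = k_n^T(\frac{1}{\lambda}I + K_n)^{-1}$ (consistent with the variance term $r_n^2\sigma_\epsilon^2$), are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 50, 0.8

# Toy kernel standing in for K_n = E[S_b]; rows normalized so each row
# averages its neighbors, as rows of a tree structure matrix do.
Z = rng.normal(size=(n, 5))
K = np.exp(-0.5 * np.sum((Z[:, None, :] - Z[None, :, :])**2, axis=-1))
K /= K.sum(axis=1, keepdims=True)
k_x = K[0]                                  # structure vector of a query point

Y = rng.normal(size=n)
sigma_hat = 1.0                             # plug-in residual std (assumed)
z975 = 1.959964                             # 97.5% standard normal quantile

# r_n^T = k_n(x)^T ((1/lam) I + K_n)^{-1}, solved without forming the inverse.
r_vec = np.linalg.solve((np.eye(n) / lam + K).T, k_x)
r_n = float(np.linalg.norm(r_vec))

f_hat = float(r_vec @ Y)                    # limiting prediction k_n^T (I/lam + K)^{-1} Y
interval = (f_hat - z975 * sigma_hat * r_n, f_hat + z975 * sigma_hat * r_n)
```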

4. Empirical Performance and Applications

Simulations and applied tests (e.g., Boston housing, power-plant, protein structure, airfoil noise data) substantiate that Boulevard achieves mean squared error comparable to Random Forests and conventional GBTs. Unlike standard boosting, Boulevard obviates the need for early stopping: the averaging structure inherently provides stability and prevents overfitting. The predictive distribution of $\hat{f}_n(x)$ approximates Gaussianity with variance matching theoretical predictions, and reproduction intervals delivered by this approach maintain close-to-nominal frequentist coverage (Zhou et al., 2018).

5. Boulevard Regularization in BV–G Structures and Image Decomposition

Independently, in the variational image processing context, "Boulevard regularization" refers to the detection and enhancement of long, thin objects (such as boulevards or roads) via the BV–G (Bounded Variation–Meyer G-space) decomposition model. The task is to decompose an image $f:\Omega\subset \mathbb{R}^2 \to \mathbb{R}$ into

$$f = u + v + w,$$

where $u$ (structure) lies in $BV(\Omega)$, modeling piecewise smooth regions with edges; $v$ captures small-scale "unstructured" residuals in $L^2(\Omega)$; and $w$ (texture) lies in the Meyer G-space, tailored to oscillatory and sparse objects, specifically those that can be written as divergences of bounded vector fields (Gilles et al., 2024).

The minimization energy is

$$E(u, v, w) = \|u\|_{BV} + \lambda \|v\|_{L^2}^2 + \mu \|w\|_G \qquad \text{subject to} \quad u + v + w = f,$$

where $\lambda, \mu > 0$ balance the contribution of each component. The key technical result establishes parameter regimes where thin, long objects, such as ribbons corresponding to roads, are optimally captured in $w$ (the "texture" component), not in $u$, due to the relative penalization of BV (which scales with perimeter) and the G-norm (which scales with thickness) (Gilles et al., 2024).
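A heuristic scaling argument (orders of magnitude only, not the paper's precise statement) illustrates the mechanism. For the indicator $\chi_R$ of a ribbon $R$ of length $L$ and width $\epsilon \ll L$:

```latex
% BV cost of keeping the ribbon in u scales with its perimeter;
% G cost of assigning it to w scales with its thickness.
\|\chi_R\|_{BV} = \operatorname{Per}(R) \approx 2(L+\epsilon) \approx 2L,
\qquad
\|\chi_R\|_{G} \sim \epsilon .
```

Comparing the two penalties, assigning the ribbon to the texture component is cheaper whenever $\mu\,\epsilon \lesssim 2L$, which holds for any sufficiently thin, long feature; the model's explicit parameter thresholds make this trade-off precise.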

6. Algorithms for Boulevard-Driven Structure Extraction

To compute $(u, v, w)$, a split-dual projected algorithm (alternating Chambolle projectors) is employed. Once the $w$ component is extracted, a road- or boulevard-detection pipeline consists of:

  1. A-contrario line detection on $w$ to identify candidate segments.
  2. Segment fusion and extension, merging collinear and neighboring segments to recover longer structures.
  3. Active contour refinement, initializing polygonal snakes along segments and evolving them with a probabilistically formulated energy to capture precise road edges.

The BV–G model prescribes explicit parameter thresholds (functions of the desired minimum boulevard width $\epsilon$ and global regularization constants) to guarantee that features of a given geometry are enhanced in $w$. Empirical results on satellite imagery confirm that roads and boulevards appear with sharp contrast in $w$, and the resulting extraction methods yield road networks with improved precision and recall compared to classical edge-detection approaches, particularly for low-contrast or narrow features (Gilles et al., 2024).
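The full alternating-projector BV–G solver is beyond a short sketch, but its core building block, Chambolle's dual projection for a single total-variation subproblem, can be shown compactly. The code below is a generic implementation of that projection (ROF denoising), offered here as an illustration of what each alternating step looks like; it is not the paper's BV–G solver, and the regularization weight, step size, and iteration count are conventional choices, not values from the source.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary conditions."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :], dx[1:-1, :], dx[-1, :] = px[0, :], px[1:-1, :] - px[:-2, :], -px[-2, :]
    dy[:, 0], dy[:, 1:-1], dy[:, -1] = py[:, 0], py[:, 1:-1] - py[:, :-2], -py[:, -2]
    return dx + dy

def tv(u):
    """Discrete total variation (isotropic)."""
    gx, gy = grad(u)
    return float(np.sum(np.hypot(gx, gy)))

def chambolle_tv(f, lam=0.2, n_iter=200, tau=0.125):
    """Chambolle's dual projection: minimizes TV(u) + ||u - f||^2 / (2*lam)."""
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        denom = 1.0 + tau * np.hypot(gx, gy)       # keeps |p| <= 1 pointwise
        px, py = (px + tau * gx) / denom, (py + tau * gy) / denom
    return f - lam * div(px, py)

# Noisy test image containing a thin bright "boulevard".
rng = np.random.default_rng(3)
f = rng.normal(0.0, 0.2, size=(64, 64))
f[30:33, :] += 1.0                                  # 3-pixel-wide horizontal ribbon
u = chambolle_tv(f, lam=0.2)
```

The step size $\tau = 1/8$ is the standard stability bound for this discretization; the minimizer necessarily has total variation no larger than that of the input.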

Despite their unrelated origins—statistical learning and variational image analysis—both Boulevard regularizations exploit structural/ensemble averaging coupled with specialized penalties to isolate either statistical or geometric “boulevards”: uncorrelated tree contributions in ensembles, or long, thin filamentary regions in images. In both cases, regularization parameters and the form of the penalty/fusion schedule are analytically linked to desired isolation properties, with provable guarantees about prediction limits or geometric representation.

For boosted trees, the innovation is the asymptotic kernel-ridge form and attendant uncertainty quantification through the explicit averaging and shrinkage pathway (Zhou et al., 2018). In the image domain, the G-norm’s scale sensitivity favors boulevard-like features by making their BV penalty prohibitive and G-norm penalty minimal, with threshold conditions guiding practical algorithm design (Gilles et al., 2024).

Boulevard regularization in both domains thus exemplifies how domain-specific, mathematically motivated penalties yield interpretable, stable, and theoretically tractable extraction of objects of structural interest.
