
Overlapping Group LASSO: Structured Feature Selection

Updated 30 January 2026
  • Overlapping Group LASSO is a regularization framework that enforces structured sparsity by decomposing coefficients into latent group-specific effects.
  • It utilizes optimization methods such as block coordinate descent, FISTA, and ADMM to efficiently handle the non-separable nature of overlapping groups.
  • Applications in biomedical, neuroimaging, and signal processing demonstrate its ability to improve feature selection, interpretability, and predictive accuracy.

Overlapping Group LASSO is a structured sparsity-inducing regularization framework for regression and classification that enforces selection at the level of predefined groups of features, allowing groups to share features arbitrarily. Such models are essential in domains where feature structure is inherently multi-faceted (e.g., biological pathways, spatial patches, hierarchical covariates), but the non-separability introduced by overlapping groups poses distinct statistical and computational challenges. The core technical mechanism is a latent-variable decomposition: for each group, a latent coefficient vector specific to that group is introduced, and the total effect for each feature is the sum of its contributions across all groups to which it belongs. Penalization is then performed over the groupwise latent vectors via an $\ell_2$ norm. This article covers the mathematical formulation, convexity and dual properties, optimization strategies, screening and acceleration methods, statistical guarantees, and real-world applications in biomedical and signal processing contexts.

1. Mathematical Formulation and Latent Variable Structure

The canonical form of the overlapping group LASSO problem is

$$\min_{\beta_0, \{u_g\}}\ L(\beta_0, \beta_{-0}) + \lambda\sum_{g\in G} w_g \|u_g\|_2, \quad \text{subject to}\quad \beta_{-0} = \sum_{g\in G} A_g u_g,$$

where:

  • $L(\beta_0, \beta_{-0})$ is a convex loss (e.g., the negative logistic log-likelihood $-\frac{1}{n}\sum_{i=1}^n \bigl[y_i x_i^\top\beta - \log(1 + e^{x_i^\top \beta})\bigr]$),
  • $G = \{g_1,\ldots,g_J\}$ is a collection of groups (possibly overlapping subsets of $\{1,\dots,p\}$),
  • $u_g \in \mathbb{R}^{|g|}$ are latent coefficients for group $g$,
  • $A_g \in \{0,1\}^{p\times |g|}$ maps $u_g$ into the full coefficient vector $\beta_{-0}$,
  • $w_g = \sqrt{|g|}$ (or other positive weights) scales the penalty per group,
  • $\lambda > 0$ is the global sparsity tuning parameter.

By eliminating $\beta_{-0}$ via the constraint, the problem reduces to a convex minimization in the expanded (latent) space:

$$\min_{\{u_g\}}\ L\Bigl(\beta_0,\ \sum_g A_g u_g\Bigr) + \lambda\sum_g w_g \|u_g\|_2,$$

which is isomorphic to non-overlapping group LASSO on a design matrix with duplicated columns matching the group overlap structure (Zeng et al., 2015).
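The column-duplication reduction can be sketched in a few lines of NumPy. This is a minimal illustration only; production solvers such as grpregOverlap build the expanded design virtually rather than materializing it.

```python
import numpy as np

def expand_design(X, groups):
    """Replicate columns of X so that overlapping groups become disjoint
    blocks of an expanded design matrix (the latent-space reduction).

    groups: list of index arrays (possibly overlapping subsets of columns).
    Returns the expanded matrix and, for each expanded column, the
    original feature index it maps back to.
    """
    cols = np.concatenate(groups)   # duplicated column indices
    X_tilde = X[:, cols]            # shape: n x (sum of group sizes)
    return X_tilde, cols

# Two groups sharing feature 1: {0, 1} and {1, 2}.
X = np.arange(12, dtype=float).reshape(4, 3)
groups = [np.array([0, 1]), np.array([1, 2])]
X_tilde, col_map = expand_design(X, groups)
# Column 1 of X now appears twice in the expanded design.
print(X_tilde.shape)   # (4, 4)
```

The total coefficient vector is then recovered as $\beta_{-0} = \sum_g A_g u_g$, i.e., by summing latent contributions back onto their original feature indices.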

The latent group norm is formalized as

$$\Omega(\beta) = \inf_{\{v^g\}:\ \mathrm{supp}(v^g) \subseteq g,\ \sum_g v^g = \beta}\ \sum_g d_g \|v^g\|_2,$$

and this norm is convex, positively homogeneous, and ensures that the support of the solution is a union of selected groups (Obozinski et al., 2011).

2. Optimization Algorithms and Computational Enhancements

Block coordinate descent is standard for overlapping group LASSO, exploiting the group-separable nature of the latent variable representation. The group-wise update involves weighted soft-thresholding (a proximal operator) for each group:

$$u_g^{(t)} \leftarrow \frac{1}{H_g}\Bigl(1 - \frac{n\lambda w_g}{\|g_g\|_2}\Bigr)_+ g_g,$$

where $g_g$ is the partial gradient for group $g$ and $H_g$ is a group-wise Hessian or Lipschitz constant (Zeng et al., 2015, Rao et al., 2014).
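The shrinkage step above is the standard group soft-thresholding operator, which either zeroes a group entirely or scales it toward the origin. A minimal sketch (the caller folds in the per-group step size $1/H_g$ and threshold $n\lambda w_g$ from the update formula):

```python
import numpy as np

def group_soft_threshold(z, threshold):
    """Proximal operator of threshold * ||.||_2: shrink the whole
    vector toward zero, killing it entirely if its norm is below
    the threshold."""
    norm = np.linalg.norm(z)
    if norm <= threshold:
        return np.zeros_like(z)
    return (1.0 - threshold / norm) * z

# A vector with norm 5 and threshold 2 is scaled by (1 - 2/5) = 0.6.
z = np.array([3.0, 4.0, 0.0])
print(group_soft_threshold(z, 2.0))   # [1.8 2.4 0. ]
```

In the coordinate-descent update, `group_soft_threshold(g_g, n * lam * w_g) / H_g` reproduces the formula above.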

Accelerated proximal-gradient methods (e.g., FISTA) can be applied, but the non-separable proximal operator is computed by projecting onto the intersection of norm balls defined by the overlapping groups. For $\ell_2$ penalties, dual-based projected Newton methods solve the key subproblem efficiently by exploiting active-set strategies (Villa et al., 2012, Liu et al., 2010).

ADMM frameworks are especially suited when the groups are large or highly overlapping. Each ADMM step alternates between updating the full coefficient vector and independent group-wise latent variables, with closed-form soft-thresholding for the $\ell_2$ group norm (Zhao, 2023).
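A minimal ADMM sketch for the squared-error loss, using per-group copies $z_g$ of the coefficient block $\beta_g$ with constraints $z_g = \beta_g$: the $\beta$-update solves a linear system, the $z$-updates are closed-form group soft-thresholdings, and the scaled duals track constraint violations. Function and variable names are illustrative; real implementations cache factorizations and monitor primal/dual residuals for convergence.

```python
import numpy as np

def soft_group(z, t):
    """Group soft-thresholding: proximal operator of t * ||.||_2."""
    norm = np.linalg.norm(z)
    return np.zeros_like(z) if norm <= t else (1.0 - t / norm) * z

def admm_oglasso(X, y, groups, lam, rho=1.0, n_iter=200):
    """ADMM for (1/2)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2
    with overlapping groups, via per-group copies z_g = b_g."""
    n, p = X.shape
    counts = np.zeros(p)            # how many groups cover each feature
    for g in groups:
        counts[g] += 1.0
    A = X.T @ X + rho * np.diag(counts)   # fixed b-update system matrix
    Xty = X.T @ y
    z = [np.zeros(len(g)) for g in groups]
    u = [np.zeros(len(g)) for g in groups]   # scaled dual variables
    b = np.zeros(p)
    for _ in range(n_iter):
        # b-update: quadratic minimization given the group copies.
        rhs = Xty.copy()
        for g, zg, ug in zip(groups, z, u):
            rhs[g] += rho * (zg - ug)
        b = np.linalg.solve(A, rhs)
        # z-updates: independent group soft-thresholdings; dual ascent.
        for i, g in enumerate(groups):
            z[i] = soft_group(b[g] + u[i], lam * np.sqrt(len(g)) / rho)
            u[i] += b[g] - z[i]
    return b
```

Because the $z$-updates decouple across groups, they parallelize trivially, which is why ADMM scales well in highly overlapping regimes.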

Dimension reduction and screening techniques are crucial for scalability. Recent adaptive schemes compute dual certificates for group support:

  • If $\|\beta^*_{G_t}\|_2 < w_t$ (LASSO certificate) or $\|u^\dagger_{J_t}\|_2 < 1$ (OGN certificate), group $G_t$ is safely screened as inactive (Bai et al., 28 Jan 2026, Lee et al., 2014). These strategies dramatically reduce the active group set and accelerate convergence.
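A simplified, hypothetical version of such a group-level test discards any group whose gradient block falls strictly below its penalty threshold at the current dual point. The certificates in the cited work are more refined; this sketch only conveys the shape of the check.

```python
import numpy as np

def screen_groups(X, residual, groups, lam):
    """Hypothetical group-level screening: a group whose gradient
    block norm is strictly below its penalty threshold is discarded
    as inactive (a simplified stand-in for the cited certificates)."""
    n = X.shape[0]
    keep = []
    for i, g in enumerate(groups):
        grad_g = X[:, g].T @ residual / n
        if np.linalg.norm(grad_g) >= lam * np.sqrt(len(g)):
            keep.append(i)
    return keep

# Toy example: the residual is orthogonal to the second group's column,
# so that group is screened out for any positive lam.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
residual = np.array([1.0, 0.0, 0.0, 0.0])
groups = [np.array([0]), np.array([1])]
print(screen_groups(X, residual, groups, lam=0.1))   # [0]
```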

3. Statistical Theory, Support Recovery, and Oracle Bounds

Overlapping group LASSO achieves structured sparsity: the support of the estimated vector is a union of groups. Model selection error bounds, estimation rates, and prediction guarantees depend on the group sparsity structure, overlap degree, and mutual incoherence properties:

  • The basic estimation error scales with the number and size of active groups (Rao et al., 2014):

$$n \geq C\,\sigma_f^2\,\epsilon^{-2}\,k\,\min\bigl\{(1+\mu)^2[\log K + L],\ ((1+\mu)/\mu)^2\, l \log p\bigr\}$$

for $(k, l)$ group- and within-group sparsity, where $\mu = \lambda_1/\lambda_2$ is the trade-off parameter.

  • Oracle bounds (RE-type) extend to the overlapping case, with finite-sample rates deteriorating as the number of groups and the degree of overlap grow (Percival, 2011):

$$\frac{1}{n}\|X(\hat\beta-\beta^0)\|_2^2 \leq \frac{64\sigma^2}{\kappa(s)^2\, n}\Bigl[\max_g |g| + A\sqrt{\max_g |g|}\,\log M\Bigr]$$

  • Exact asymptotic selection is guaranteed under additional non-nesting group conditions, with adaptive weights (e.g., $\lambda_g = 1/\|v_g^{\mathrm{OLS}}\|_2^\gamma$) playing a key role in support recovery (Percival, 2011).
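Computing such adaptive weights from a pilot OLS fit is straightforward. A sketch follows; the small `eps` guards against division by zero and is an implementation convenience, not part of the cited estimator.

```python
import numpy as np

def adaptive_group_weights(X, y, groups, gamma=1.0, eps=1e-12):
    """Adaptive weights lam_g = 1 / ||v_g^OLS||_2^gamma from a pilot
    OLS fit: groups whose pilot estimate is small are penalized more."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    return [1.0 / (np.linalg.norm(beta_ols[g]) ** gamma + eps)
            for g in groups]

# Identity design: the OLS pilot equals y, so weights are 1/||y_g||.
X = np.eye(4)
y = np.array([2.0, 0.0, 1.0, 0.0])
print(adaptive_group_weights(X, y, [np.array([0, 1]), np.array([2, 3])]))
```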

For sparse overlapping group LASSO variants (SOGlasso, SOSlasso), theoretical error bounds interpolate between those for $\ell_1$ and pure group LASSO, achieving minimax rates under sub-Gaussian designs (Rao et al., 2014, Rao et al., 2013).

Statistical equivalence between overlapping group LASSO and its tightest non-overlapping separable relaxation has been established; the relaxation attains the same error and support recovery guarantees while scaling to very large problems (Qi et al., 2022).

4. Variants: Hierarchical and Sparse Overlapping Group LASSO

Several generalizations extend the core framework:

  • Sparse Overlapping Group LASSO (SOGlasso) mixes $\ell_1$ sparsity across and within overlapping groups, with the estimator

$$\min_{w}\ L(w; X, y) + \lambda_1\|w\|_1 + \lambda_2\sum_{G\in\mathcal{G}}\|w_G\|_2$$

and equivalent latent-variable decompositions (Rao et al., 2014, Rao et al., 2013).

  • Hierarchical Overlapping Group LASSO (HOGL) structures groups and their overlaps along hierarchies or chains, enforcing additional logical constraints (e.g., variable selection implies degree selection in polynomial models) (Ohishi et al., 22 Oct 2025, Yan et al., 2015).
  • Latent Overlapping Group LASSO (LOG) is particularly effective for hierarchical sparse modeling (HSM), where the latent variable acts as a group-specific carrier allowing precise enforcement of context- or graph-based selection rules. LOG ensures uniform shrinkage across group nesting depth, in contrast to classical group LASSO, which over-penalizes deeply nested parameters (Yan et al., 2015, Wang et al., 2022).
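For the SOGlasso penalty on a single group block, the proximal operator composes elementwise and groupwise soft-thresholding, the standard result for the sparse group LASSO penalty. A minimal sketch:

```python
import numpy as np

def prox_sparse_group(w, lam1, lam2):
    """Proximal operator of lam1*||.||_1 + lam2*||.||_2 on one group:
    elementwise soft-thresholding followed by group soft-thresholding
    (the standard composition for the sparse group LASSO penalty)."""
    s = np.sign(w) * np.maximum(np.abs(w) - lam1, 0.0)   # l1 shrinkage
    norm = np.linalg.norm(s)
    if norm <= lam2:
        return np.zeros_like(s)
    return (1.0 - lam2 / norm) * s                        # group shrinkage

# Small entries are zeroed by the l1 step; survivors shrink as a group.
w = np.array([3.0, -0.5, 4.0])
print(prox_sparse_group(w, lam1=0.5, lam2=1.0))
```

This composition gives within-group sparsity (from $\lambda_1$) on top of whole-group selection (from $\lambda_2$), matching the interpolation behavior described above.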

5. Algorithmic Implementation and Practical Tools

Implementations like grpregOverlap (R) build the expanded design matrix only virtually, index efficiently, and delegate the sequence of group coordinate descent problems to proven engines (Zeng et al., 2015). Warm starts, strong rules for active set screening, and group decomposition strategies are explicitly supported, with performance scaling to tens of thousands of effective variables.

Divide-and-conquer methods (DC-ogLasso) split large datasets, solve overlapping group LASSO subproblems locally, aggregate support via majority voting, and refit active features, yielding near-linear scalability and model selection consistency with minimal communication overhead (Chen et al., 2016).
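The majority-voting aggregation step can be sketched as follows; the exact voting threshold used by DC-ogLasso may differ, so this is illustrative only.

```python
def majority_vote_support(supports, n_shards):
    """Keep a feature iff it is selected on more than half of the
    shards (the voting step described for DC-ogLasso; the method's
    actual threshold may differ)."""
    votes = {}
    for s in supports:
        for j in s:
            votes[j] = votes.get(j, 0) + 1
    return {j for j, v in votes.items() if v > n_shards / 2}

# Three shards: feature 0 wins on all, feature 1 on two, feature 2 on one.
print(majority_vote_support([{0, 1}, {0, 2}, {0, 1}], n_shards=3))   # {0, 1}
```

The surviving support is then refit (e.g., by unpenalized regression on the selected features), which removes the shrinkage bias of the local solutions.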

Dimension reduction frameworks (AdaDROPS) utilize adaptive certificate-based support expansion, permitting primal-dual splitting, ADMM, and overparameterized variable projection methods. In high overlap regimes, AdaDROPS delivers order-of-magnitude speedups relative to baseline solvers (Bai et al., 28 Jan 2026).

Screening rules employing dual polytope projection (DPP) and safe group-level tests efficiently discard inactive groups prior to solving, crucial for large-scale statistics and machine learning applications (Lee et al., 2014).

6. Applications and Empirical Performance

Overlapping group LASSO and its variants have demonstrated significant empirical benefit in high-dimensional biomedical data, neuroimaging, genomics, and signal processing:

  • In gene pathway selection, group-structured regularization accounts for biological overlap, yielding improved classifier accuracy and more parsimonious, interpretable pathway selection compared to marginal hypothesis tests (e.g., GSEA) or plain LASSO (Zeng et al., 2015).
  • In fMRI and MEG studies, overlapping spatial groups facilitate joint modeling across subjects or brain regions, with SOGlasso and SOSlasso outperforming non-overlapping methods in both predictive risk and anatomical interpretability (Rao et al., 2014, Rao et al., 2013).
  • In growth curve models (GMANOVA) and spline-based spectrum cartography, hierarchical and overlapping group LASSO penalties recover variable and degree structures reliably, outperforming non-hierarchical methods, especially at higher complexity (Ohishi et al., 22 Oct 2025, Bazerque et al., 2010).

Simulations consistently show lower coefficient RMSE, improved support recovery frequency, and classification error reductions on real and synthetic datasets. For instance, OGLasso yielded up to 10% lower misclassification and selected distinct gene pathways not identified by GSEA or ordinary lasso (Zeng et al., 2015). Separable approximations match statistical rates of classical overlapping LASSO but afford 5–20× faster computation (Qi et al., 2022).

7. Design Considerations, Limitations, and Future Directions

The choice of group structure and weights is critical for identifiability, support recovery, and finite-sample efficiency. Excessive overlap or poorly calibrated weights degrade statistical rates and selection consistency (Percival, 2011, Obozinski et al., 2011). The LOG/latent approach provides a principled route to encode complex selection rules, including hierarchical, hereditary, and mutually exclusive relationships among features, with guaranteed feasibility via group design (Wang et al., 2022).

Safe screening, adaptive weighting, dimension reduction, and acceleration techniques have evolved to address the computational bottlenecks of overlapping sparsity. Future research directions include scaling algorithms for ultra-high-dimensional settings, refining support recovery under complex combinatorial selection rules, and integrating domain-specific hierarchical regularization in emerging fields (e.g., multi-modal signal analysis, clinical coherence modeling).

Overlapping group LASSO presents a mathematically coherent, statistically solid, and practically scalable framework for structured feature selection—particularly when available prior knowledge is rich, groups are complexly interconnected, and model interpretability is central.
