
Bregman Restricted Strong Convexity

  • Bregman Restricted Strong Convexity is a framework that extends classical strong convexity by using Bregman divergence to measure curvature in sparse optimization settings.
  • It ensures local strong convexity along sparse directions, which yields uniqueness of sparse critical points and recovery guarantees for ℓ₀-Bregman relaxations.
  • The approach adapts to non-quadratic losses, such as those in Poisson regression, by employing task-specific reference functions like smoothed Burg entropy.

Bregman Restricted Strong Convexity (B-RSC) is a generalization of the classical Restricted Strong Convexity (RSC) condition that plays a central role in modern nonconvex and sparse optimization, particularly in the analysis of $\ell_0$-Bregman relaxations and variational methods involving non-Euclidean geometries. B-RSC quantifies the curvature of a function not in the standard $\ell_2$ metric but with respect to an adapted Bregman divergence, enabling effective analysis of problems where quadratic geometry is inadequate, such as generalized linear models or Kullback–Leibler data terms.

1. Formal Definition and Distinction from Classical RSC

Let $F, \Phi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be proper, convex, and differentiable on $\mathrm{int}(\mathrm{dom}\,F) \cap \mathrm{int}(\mathrm{dom}\,\Phi)$. The Bregman divergence associated with such a function $h$ is

$$D_h(x,x') = h(x) - h(x') - \langle \nabla h(x'), x - x' \rangle,$$

and its symmetric version is

$$D_h^{\mathrm{symm}}(x,x') = D_h(x,x') + D_h(x',x) = \langle \nabla h(x) - \nabla h(x'), x - x' \rangle.$$

Definition (Bregman Restricted Strong Convexity, B-RSC): $F$ satisfies the Bregman restricted strong convexity property with respect to $\Phi$ over $X$ at sparsity level $K$ if there exists $C_K > 0$ such that for all $x \in X$ and all $x' \in \mathrm{int}(\mathrm{dom}\,F) \cap \mathrm{int}(\mathrm{dom}\,\Phi)$ with $x \ne x'$ and $\|x - x'\|_0 \le K$,

$$D_F^{\mathrm{symm}}(x, x') \ge C_K\, D_\Phi^{\mathrm{symm}}(x, x').$$

If $\Phi(x) = \frac{1}{2}\|x\|_2^2$, then $D_\Phi^{\mathrm{symm}}(x, x') = \|x - x'\|_2^2$ and B-RSC reduces to classical RSC:

$$D_F^{\mathrm{symm}}(x, x') \ge \mu_K \|x - x'\|_2^2.$$

In practice, B-RSC allows for non-quadratic, task-adapted geometries (e.g., via the Burg entropy for KL divergence) (Chirinos-Rodriguez et al., 15 Nov 2025).
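
As a concrete illustration, the following Python sketch (my own, not from the cited paper) computes Bregman divergences and their symmetric versions for two reference functions: the quadratic reference, for which the symmetric divergence collapses to the squared Euclidean distance, and a smoothed Burg-type reference with an assumed offset $\eta > 0$.

```python
# Minimal sketch: Bregman divergences for a differentiable reference h.
import numpy as np

def bregman(h, grad_h, x, xp):
    """D_h(x, x') = h(x) - h(x') - <grad h(x'), x - x'>."""
    return h(x) - h(xp) - grad_h(xp) @ (x - xp)

def bregman_symm(h, grad_h, x, xp):
    """Symmetric version: D_h(x, x') + D_h(x', x)."""
    return bregman(h, grad_h, x, xp) + bregman(h, grad_h, xp, x)

# Quadratic reference Phi(x) = 0.5 ||x||^2: D_Phi^symm equals ||x - x'||^2.
phi, grad_phi = (lambda x: 0.5 * x @ x), (lambda x: x)
rng = np.random.default_rng(0)
x, xp = rng.normal(size=5), rng.normal(size=5)
assert np.isclose(bregman_symm(phi, grad_phi, x, xp), np.sum((x - xp) ** 2))

# Smoothed Burg-type reference (eta > 0 is an assumed smoothing offset).
eta = 1.0
burg = lambda v: -np.sum(np.log(v + eta))
grad_burg = lambda v: -1.0 / (v + eta)
x, xp = rng.uniform(0.1, 2.0, size=5), rng.uniform(0.1, 2.0, size=5)
print(bregman_symm(burg, grad_burg, x, xp))  # strictly positive when x != x'
```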

2. Properties and Structural Consequences

Two key properties of B-RSC are monotonicity and restriction to sparse subspaces:

  • Monotonicity: If $F$ satisfies B-RSC$(\Phi, X, K)$ with constant $C_K$, then for any $X' \subset X$ and $K' \le K$, B-RSC$(\Phi, X', K')$ also holds with $C_{K'} \ge C_K$.
  • Subspace Restriction: For any support $\omega$ with $|\omega| \le K$, define $F_\omega(u) = F(Z_\omega u)$, where $Z_\omega$ zero-pads $u$ onto the support $\omega$. Then:
    • $h(u) = F_\omega(u) - C_K \Phi_\omega(u)$ is convex on $X_\omega$,
    • $F_\omega(u) \ge F_\omega(u') + \langle \nabla F_\omega(u'), u - u' \rangle + C_K D_{\Phi_\omega}(u, u')$ (Chirinos-Rodriguez et al., 15 Nov 2025).

B-RSC thus ensures local strong convexity along $K$-sparse directions relative to the geometry defined by $\Phi$, generalizing the classical link between RSC and the lower Restricted Isometry Property (RIP) for quadratic data terms; a small numerical sketch of this link follows.
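
Under the additional assumption of a least-squares fidelity $F(x) = \frac{1}{2}\|Ax - y\|_2^2$ and quadratic reference $\Phi = \frac{1}{2}\|\cdot\|_2^2$ (a special case, not the paper's general setting), the convexity of $F_\omega - C_K \Phi_\omega$ on every support of size at most $K$ is exactly a lower bound on the restricted eigenvalues of $A$, which this sketch computes by brute force:

```python
# Hedged sketch: the largest valid C_K for F(x) = 0.5||Ax - y||^2 and
# Phi = 0.5||.||^2 is the smallest eigenvalue of A_omega^T A_omega over
# all supports omega with |omega| <= K (a lower-RIP-type constant).
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 20, 8, 3
A = rng.normal(size=(m, n)) / np.sqrt(m)

C_K = min(
    np.linalg.eigvalsh(A[:, list(omega)].T @ A[:, list(omega)]).min()
    for k in range(1, K + 1)
    for omega in itertools.combinations(range(n), k)
)
print(f"B-RSC constant at sparsity {K}: C_K = {C_K:.4f}")
```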

3. Motivation: Where Classical RSC Fails and B-RSC Succeeds

Classical RSC may be vacuous for non-quadratic losses. For instance, in Poisson regression with $G_y(w) = \sum_j g_{y_j}^{\mathrm{KL}}(w_j)$, where $g_y^{\mathrm{KL}}(z) = z + y \log(y/z) - y$, the global RSC constant is $\mu_K = 0$, since $g_y^{\mathrm{KL}}$ is not uniformly strongly convex as $z \to \infty$: its second derivative $(g_y^{\mathrm{KL}})''(z) = y/z^2$ vanishes. In particular, $G_y(A\cdot)$ fails RSC with $\Phi = \frac{1}{2}\|\cdot\|_2^2$.

B-RSC resolves this by matching the geometry: choosing $\Phi(x) = -\sum_i \log(x_i + \eta_i)$ (smoothed Burg entropy) renders the symmetric Bregman divergence nontrivial, and positivity is restored under suitable lower-RIP conditions on $A$, yielding $C_K > 0$. This allows effective landscape and recovery analysis even for Poisson/KL losses where classical RSC provides no information (Chirinos-Rodriguez et al., 15 Nov 2025).
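
The following one-dimensional sketch (illustrative, with assumed values of $y$ and $\eta$) makes this concrete: the quadratic curvature $(g_y^{\mathrm{KL}})''(z) = y/z^2$ vanishes as $z \to \infty$, while the ratio of symmetric Bregman divergences against the smoothed Burg reference stays bounded away from zero.

```python
# Sketch: the KL fidelity loses quadratic curvature at infinity, but keeps
# curvature relative to the smoothed Burg entropy phi(z) = -log(z + eta).
import numpy as np

y, eta = 2.0, 1.0
g_prime = lambda z: 1.0 - y / z          # (g_y^KL)'(z)
phi_prime = lambda z: -1.0 / (z + eta)   # phi'(z)

for z, zp in [(1.0, 2.0), (10.0, 20.0), (1e3, 2e3), (1e6, 2e6)]:
    d_f = (g_prime(z) - g_prime(zp)) * (z - zp)        # D_F^symm in 1-D
    d_phi = (phi_prime(z) - phi_prime(zp)) * (z - zp)  # D_Phi^symm in 1-D
    print(f"z={z:>9.1f}: g''(z) = {y / z**2:.2e}, "
          f"D_F^symm / D_Phi^symm = {d_f / d_phi:.3f}")
```

The printed ratio tends to $y$, so a positive B-RSC constant survives exactly where the classical RSC constant degenerates to zero.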

4. Impact on Sparse Critical Points and Oracle Recovery

Within the $\ell_0$-Bregman relaxation paradigm, B-RSC allows for rigorous statements regarding the isolation and uniqueness of critical points.

  • Isolation Theorem: Assume $F = G_y(A\cdot)$ satisfies B-RSC$(\Psi, X, K)$ with $C_K > 0$. If $x \in X$ is a critical point of $J_\Psi$ such that $|z_i|$ avoids specific bands tied to the subdifferential structure for all $i$, then any other critical point $x' \neq x$ has $\|x' - x\|_0 > K$. In particular, if $\|x\|_0 \le K/2$, then $x$ is the unique sparsest critical point.
  • Uniqueness of Global Minimizer: Under the same B-RSC assumptions, if the sparsity penalty parameter $\lambda_0$ exceeds $F(x) / (1 + K - 2\|x\|_0)$, then $x$ is the unique global minimizer of $J_\Psi$ (and also of the original $\ell_0$-penalized objective $J_0$), and all other critical points lie at support distance greater than $K$ (Chirinos-Rodriguez et al., 15 Nov 2025); a small arithmetic example of this threshold appears below.

B-RSC provides a mechanism that links a form of localized strong convexity to sparse recovery guarantees, encompassing both uniqueness and stability.
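
As a hedged arithmetic example (the numbers are invented for illustration), the uniqueness threshold of the theorem above is straightforward to evaluate:

```python
# Illustrative numbers only: sparsity level K, a critical point x with
# ||x||_0 nonzero entries, and data-fit value F(x). The theorem requires
# lambda_0 > F(x) / (1 + K - 2 * ||x||_0) for x to be the unique minimizer.
K, sparsity_x, F_x = 6, 2, 0.9
lambda_0_threshold = F_x / (1 + K - 2 * sparsity_x)   # = 0.9 / 3 = 0.3
print(f"x is the unique global minimizer whenever lambda_0 > {lambda_0_threshold:.3f}")
```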

5. Oracle Solution Guarantees and “Safe Region” Design

With B-RSC$(\Psi, X, K)$ for $K \ge 2k^*$, where $k^*$ is the sparsity of the true underlying vector $x^*$, one can guarantee that the oracle solution (the restricted minimizer over the true support) is the unique global minimizer if:

  • There exists a “safe region” $S \subset \mathbb{R}^{k^*}$ containing $x^*$ and the oracle minimizer, which avoids a “forbidden band” set $\Omega$ in the coordinate directions;
  • For all $i \notin S^*$ (the indices outside the true support), certain subgradient bounds involving the measurement-matrix columns and the Bregman reference function hold;
  • The parameter $\lambda_0$ satisfies $\lambda_0 > F(x^*) / (1 + K - 2k^*)$.

In practice, $S$ is formed so that all candidate minimizers $u$ stay strictly away from $\Omega$, with the size of $S$ determined by B-RSC: $\sum_{j=1}^{k^*} d_{\psi_{i_j}}(u_j^*, u_j) \le F(x^*) / C_K$, where $d_{\psi_{i_j}}$ denotes the Bregman divergence along each active coordinate (Chirinos-Rodriguez et al., 15 Nov 2025).
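
A minimal membership test for this condition might look as follows (my sketch; the coordinate-wise divergence $d_\psi$ and all constants are assumptions, shown here with a quadratic reference):

```python
# Sketch: a candidate u belongs to the safe region S when the summed
# coordinate-wise Bregman divergences from the oracle point u* are at
# most F(x*) / C_K.
import numpy as np

def in_safe_region(u, u_star, d_psi, F_xstar, C_K):
    """d_psi(a, b): 1-D Bregman divergence along one active coordinate."""
    return sum(d_psi(a, b) for a, b in zip(u_star, u)) <= F_xstar / C_K

# Quadratic reference psi(t) = 0.5 t^2 gives d_psi(a, b) = 0.5 (a - b)^2.
d_quad = lambda a, b: 0.5 * (a - b) ** 2
u_star = np.array([1.0, -2.0, 0.5])
print(in_safe_region(u_star + 0.1, u_star, d_quad, F_xstar=0.5, C_K=0.8))
```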

6. Specialization: Quadratic and Poisson Data Terms

Least-Squares (LS) Setting

  • Fidelity: $G_y(w) = \frac{1}{2}\|w - y\|_2^2$, which has a $1$-Lipschitz gradient ($L = 1$) and is $1$-strongly convex.
  • Reference Function: $\Psi(x) = (\gamma/2)\|x\|_2^2$, typically with $\gamma = 1$.
  • B-RSC: For $\Phi = \Psi$, B-RSC reduces to classical RSC with $C_K = \mu_K = (1 - \delta_K^-)/\gamma$, where $\delta_K^-$ denotes a lower RIP constant of $A$.
  • Safe Region: $S = \{u : \|u - u^*\|_2 \le \sqrt{2F(x^*)/(\gamma C_K)}\}$.
  • Oracle Recovery: One obtains explicit intervals for $\lambda_0$ that often improve prior bounds (Chirinos-Rodriguez et al., 15 Nov 2025); a numeric sketch follows this list.
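
A numeric sketch of these quantities, under assumed values for the lower RIP constant and the data-fit at the oracle point:

```python
# Hedged LS example: gamma = 1, an assumed lower-RIP constant delta_K^-,
# and an assumed oracle data-fit F(x*); the safe region is then a
# Euclidean ball of explicit radius around the oracle minimizer.
import numpy as np

gamma, delta_K_minus, F_xstar = 1.0, 0.3, 0.05
C_K = (1.0 - delta_K_minus) / gamma
radius = np.sqrt(2.0 * F_xstar / (gamma * C_K))
print(f"C_K = {C_K:.3f}, safe-region radius = {radius:.4f}")
```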

Poisson (KL) Data Setting

  • Fidelity: $G_y(w) = \sum_j g^{\mathrm{KL}}_{y_j}(w_j)$ for $w_j > 0$.
  • Reference Function: $\psi_i(x) = \gamma_i\, g^{\mathrm{KL}}_\xi(c_i x + \xi)$, with tunable parameters $\gamma_i$, $c_i$, and $\xi$.
  • B-RSC: By Theorem 2.4 of (Chirinos-Rodriguez et al., 15 Nov 2025), a positive $C_K$ is achieved whenever $A$ satisfies a lower RIP and $X$ is compact.
  • Safe Region: $S = \{u : \sum_j \gamma_{i_j}\, g^{\mathrm{KL}}_1\big((c_{i_j} u_j^* + \xi)/(c_{i_j} u_j + \xi)\big) \le F(x^*)/(\xi C_K)\}$.
  • Oracle Recovery: Tight lower and upper bounds on $\lambda_0$ are derived, which are feasible in the high-SNR regime, yielding nontrivial recovery guarantees unavailable in the classical RSC framework; a sketch of the safe-region test follows this list.
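
A sketch of the safe-region test in the KL setting, with all parameter values ($\gamma_i$, $c_i$, $\xi$, $F(x^*)$, $C_K$) chosen purely for illustration:

```python
# Hedged sketch of the Poisson/KL safe-region membership test.
import numpy as np

def g_kl(z, y):
    """g_y^KL(z) = z + y log(y/z) - y, per-coordinate KL fidelity."""
    return z + y * np.log(y / z) - y

def in_safe_region_kl(u, u_star, gammas, cs, xi, F_xstar, C_K):
    ratios = (cs * u_star + xi) / (cs * u + xi)
    return np.sum(gammas * g_kl(ratios, 1.0)) <= F_xstar / (xi * C_K)

u_star = np.array([0.8, 1.5])
gammas, cs, xi = np.ones(2), np.ones(2), 0.1
print(in_safe_region_kl(1.05 * u_star, u_star, gammas, cs, xi,
                        F_xstar=0.02, C_K=0.5))
```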

7. Relationship to Relative Bregman Strong Convexity and Algorithmic Implications

Related notions of strong convexity relative to Bregman divergences arise in the analysis of optimization algorithms. In the context of variance-reduced stochastic primal-dual splitting methods, strong convexity of the form

$$f(x) \ge f(y) + \langle p, x - y \rangle + a\, D_\phi(x, y)$$

for a Legendre reference function $\phi$ and a subgradient $p \in \partial f(y)$ (and similarly for $g^*$) ensures linear contraction of the primal-dual gap. The contraction rate depends explicitly on the relative strong convexity constant $a$ (Dung et al., 2021).

This formalism matches B-RSC in spirit when the restriction to sparse directions is taken, and when $\phi$ and $\psi$ are chosen to match the problem geometry. Notably, Bregman-based strong convexity is critical for linear convergence guarantees even in non-Euclidean settings, as shown by contraction constants and telescoping descent arguments in accelerated algorithms.
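
A small numerical check of relative strong convexity (my own example, not from the cited papers): $f(x) = \sum_i x_i \log x_i + \frac{1}{2}\|x\|_2^2$ is $1$-strongly convex relative to the Legendre reference $\phi(x) = \sum_i x_i \log x_i$ on the positive orthant, since $f - \phi$ is convex.

```python
# Verify f(x) >= f(y) + <grad f(y), x - y> + a * D_phi(x, y) with a = 1.
import numpy as np

phi = lambda x: np.sum(x * np.log(x))
grad_phi = lambda x: np.log(x) + 1.0
f = lambda x: phi(x) + 0.5 * x @ x
grad_f = lambda x: grad_phi(x) + x

def d_phi(x, y):
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

rng = np.random.default_rng(2)
for _ in range(5):
    x, y = rng.uniform(0.1, 3.0, size=4), rng.uniform(0.1, 3.0, size=4)
    lhs = f(x)
    rhs = f(y) + grad_f(y) @ (x - y) + 1.0 * d_phi(x, y)
    assert lhs >= rhs - 1e-12  # slack is exactly 0.5 * ||x - y||^2
print("relative strong convexity verified at random test points")
```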


In summary, Bregman Restricted Strong Convexity extends strong convexity notions to non-Euclidean geometries and restricted (sparse) directions, enabling advanced landscape analysis and recovery theorems for nonconvex sparse optimization, particularly $\ell_0$-Bregman relaxations under both quadratic and generalized linear models. It provides a unified framework for local convexity, isolation of minimizers, and explicit parameter regimes for exact oracle recovery beyond classical RSC applicability (Chirinos-Rodriguez et al., 15 Nov 2025; Dung et al., 2021).
