
Bregman Restricted Strong Convexity

  • Bregman Restricted Strong Convexity is a framework that extends classical strong convexity by using Bregman divergence to measure curvature in sparse optimization settings.
  • It ensures local strong convexity along sparse directions, which yields uniqueness of sparse critical points and recovery guarantees for ℓ₀-Bregman relaxations.
  • The approach adapts to non-quadratic losses, such as those in Poisson regression, by employing task-specific reference functions like smoothed Burg entropy.

Bregman Restricted Strong Convexity (B-RSC) is a generalization of the classical Restricted Strong Convexity (RSC) condition that plays a central role in modern nonconvex and sparse optimization, particularly in the analysis of $\ell_0$-Bregman relaxations and variational methods involving non-Euclidean geometries. B-RSC quantifies the curvature of a function not in the standard $\ell_2$ metric but with respect to an adapted Bregman divergence, enabling effective analysis of problems where quadratic geometry is inadequate, such as generalized linear models or Kullback–Leibler data terms.

1. Formal Definition and Distinction from Classical RSC

Let $F, \Phi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be proper, convex, and differentiable on $\mathrm{int}(\mathrm{dom}\,F) \cap \mathrm{int}(\mathrm{dom}\,\Phi)$. The Bregman divergence associated with such a function $h$ is

$$D_h(x,x') = h(x) - h(x') - \langle \nabla h(x'), x - x' \rangle,$$

and its symmetric version is

$$D_h^{\mathrm{symm}}(x,x') = D_h(x,x') + D_h(x',x) = \langle \nabla h(x) - \nabla h(x'), x - x' \rangle.$$

Definition (Bregman Restricted Strong Convexity, B-RSC): $F$ satisfies the Bregman restricted strong convexity property with respect to $\Phi$ over $X$ at sparsity level $K$ if there exists $C_K > 0$ such that for all $x \in X$ and all $x' \in \mathrm{int}(\mathrm{dom}\,F) \cap \mathrm{int}(\mathrm{dom}\,\Phi)$ with $x \ne x'$ and $\|x - x'\|_0 \le K$,

$$D_F^{\mathrm{symm}}(x, x') \ge C_K\, D_\Phi^{\mathrm{symm}}(x, x').$$

If $\Phi(x) = \frac{1}{2}\|x\|_2^2$, then $D_\Phi^{\mathrm{symm}}(x, x') = \|x - x'\|_2^2$ and B-RSC reduces to classical RSC:

$$D_F^{\mathrm{symm}}(x, x') \ge \mu_K \|x - x'\|_2^2.$$

In practice, B-RSC allows for non-quadratic, task-adapted geometries (e.g., via the Burg entropy for KL divergence) (Chirinos-Rodriguez et al., 15 Nov 2025).
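
As a concrete illustration, the following Python sketch (my own, not from the cited paper) computes Bregman divergences and their symmetric versions for two reference functions: the quadratic reference, for which the symmetric divergence collapses to the squared Euclidean distance, and a smoothed Burg-type reference with an assumed offset $\eta > 0$.

```python
# Minimal sketch: Bregman divergences for a differentiable reference h.
import numpy as np

def bregman(h, grad_h, x, xp):
    """D_h(x, x') = h(x) - h(x') - <grad h(x'), x - x'>."""
    return h(x) - h(xp) - grad_h(xp) @ (x - xp)

def bregman_symm(h, grad_h, x, xp):
    """Symmetric version: D_h(x, x') + D_h(x', x)."""
    return bregman(h, grad_h, x, xp) + bregman(h, grad_h, xp, x)

# Quadratic reference Phi(x) = 0.5 ||x||^2: D_Phi^symm equals ||x - x'||^2.
phi, grad_phi = (lambda x: 0.5 * x @ x), (lambda x: x)
rng = np.random.default_rng(0)
x, xp = rng.normal(size=5), rng.normal(size=5)
assert np.isclose(bregman_symm(phi, grad_phi, x, xp), np.sum((x - xp) ** 2))

# Smoothed Burg-type reference (eta > 0 is an assumed smoothing offset).
eta = 1.0
burg = lambda v: -np.sum(np.log(v + eta))
grad_burg = lambda v: -1.0 / (v + eta)
x, xp = rng.uniform(0.1, 2.0, size=5), rng.uniform(0.1, 2.0, size=5)
print(bregman_symm(burg, grad_burg, x, xp))  # strictly positive when x != x'
```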

2. Properties and Structural Consequences

Two key properties of B-RSC are monotonicity and restriction to sparse subspaces:

  • Monotonicity: If $F$ satisfies B-RSC$(\Phi, X, K)$ with constant $C_K$, then for any $X' \subset X$ and $K' \le K$, B-RSC$(\Phi, X', K')$ also holds with $C_{K'} \ge C_K$.
  • Subspace Restriction: For any support $\omega$ with $|\omega| \le K$, define $F_\omega(u) = F(Z_\omega u)$, where $Z_\omega$ zero-pads $u$ onto the support $\omega$. Then:
    • $h(u) = F_\omega(u) - C_K \Phi_\omega(u)$ is convex on $X_\omega$,
    • $F_\omega(u) \ge F_\omega(u') + \langle \nabla F_\omega(u'), u - u' \rangle + C_K D_{\Phi_\omega}(u, u')$ (Chirinos-Rodriguez et al., 15 Nov 2025).

B-RSC thus ensures local strong convexity along $K$-sparse directions relative to the geometry defined by $\Phi$, generalizing the classical link between RSC and the lower Restricted Isometry Property (RIP) for quadratic data terms; a small numerical sketch of this link follows.
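
Under the additional assumption of a least-squares fidelity $F(x) = \frac{1}{2}\|Ax - y\|_2^2$ and quadratic reference $\Phi = \frac{1}{2}\|\cdot\|_2^2$ (a special case, not the paper's general setting), the convexity of $F_\omega - C_K \Phi_\omega$ on every support of size at most $K$ is exactly a lower bound on the restricted eigenvalues of $A$, which this sketch computes by brute force:

```python
# Hedged sketch: the largest valid C_K for F(x) = 0.5||Ax - y||^2 and
# Phi = 0.5||.||^2 is the smallest eigenvalue of A_omega^T A_omega over
# all supports omega with |omega| <= K (a lower-RIP-type constant).
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 20, 8, 3
A = rng.normal(size=(m, n)) / np.sqrt(m)

C_K = min(
    np.linalg.eigvalsh(A[:, list(omega)].T @ A[:, list(omega)]).min()
    for k in range(1, K + 1)
    for omega in itertools.combinations(range(n), k)
)
print(f"B-RSC constant at sparsity {K}: C_K = {C_K:.4f}")
```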

3. Motivation: Where Classical RSC Fails and B-RSC Succeeds

Classical RSC may be vacuous for non-quadratic losses. For instance, in Poisson regression with $G_y(w) = \sum_j g_{y_j}^{\mathrm{KL}}(w_j)$, where $g_y^{\mathrm{KL}}(z) = z + y \log(y/z) - y$, the global RSC constant is $\mu_K = 0$, since $g_y^{\mathrm{KL}}$ is not uniformly strongly convex as $z \to \infty$: its second derivative $(g_y^{\mathrm{KL}})''(z) = y/z^2$ vanishes. In particular, $G_y(A\cdot)$ fails RSC with $\Phi = \frac{1}{2}\|\cdot\|_2^2$.

B-RSC resolves this by matching the geometry: choosing $\Phi(x) = -\sum_i \log(x_i + \eta_i)$ (smoothed Burg entropy) renders the symmetric Bregman divergence nontrivial, and positivity is restored under suitable lower-RIP conditions on $A$, yielding $C_K > 0$. This allows effective landscape and recovery analysis even for Poisson/KL losses where classical RSC provides no information (Chirinos-Rodriguez et al., 15 Nov 2025).
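
The following one-dimensional sketch (illustrative, with assumed values of $y$ and $\eta$) makes this concrete: the quadratic curvature $(g_y^{\mathrm{KL}})''(z) = y/z^2$ vanishes as $z \to \infty$, while the ratio of symmetric Bregman divergences against the smoothed Burg reference stays bounded away from zero.

```python
# Sketch: the KL fidelity loses quadratic curvature at infinity, but keeps
# curvature relative to the smoothed Burg entropy phi(z) = -log(z + eta).
import numpy as np

y, eta = 2.0, 1.0
g_prime = lambda z: 1.0 - y / z          # (g_y^KL)'(z)
phi_prime = lambda z: -1.0 / (z + eta)   # phi'(z)

for z, zp in [(1.0, 2.0), (10.0, 20.0), (1e3, 2e3), (1e6, 2e6)]:
    d_f = (g_prime(z) - g_prime(zp)) * (z - zp)        # D_F^symm in 1-D
    d_phi = (phi_prime(z) - phi_prime(zp)) * (z - zp)  # D_Phi^symm in 1-D
    print(f"z={z:>9.1f}: g''(z) = {y / z**2:.2e}, "
          f"D_F^symm / D_Phi^symm = {d_f / d_phi:.3f}")
```

The printed ratio tends to $y$, so a positive B-RSC constant survives exactly where the classical RSC constant degenerates to zero.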

4. Impact on Sparse Critical Points and Oracle Recovery

Within the $\ell_0$-Bregman relaxation paradigm, B-RSC allows for rigorous statements regarding the isolation and uniqueness of critical points.

  • Isolation Theorem: Assume $F = G_y(A\cdot)$ satisfies B-RSC$(\Psi, X, K)$ with $C_K > 0$. If $x \in X$ is a critical point of $J_\Psi$ such that $|z_i|$ avoids specific bands tied to the subdifferential structure for all $i$, then any other critical point $x' \neq x$ has $\|x' - x\|_0 > K$. In particular, if $\|x\|_0 \le K/2$, then $x$ is the unique sparsest critical point.
  • Uniqueness of Global Minimizer: Under the same B-RSC assumptions, if the sparsity penalty parameter $\lambda_0$ exceeds $F(x) / (1 + K - 2\|x\|_0)$, then $x$ is the unique global minimizer of $J_\Psi$ (and also of the original $\ell_0$-penalized objective $J_0$), and all other critical points lie at support distance greater than $K$ (Chirinos-Rodriguez et al., 15 Nov 2025); a small arithmetic example of this threshold appears below.

B-RSC provides a mechanism that links a form of localized strong convexity to sparse recovery guarantees, encompassing both uniqueness and stability.
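
As a hedged arithmetic example (the numbers are invented for illustration), the uniqueness threshold of the theorem above is straightforward to evaluate:

```python
# Illustrative numbers only: sparsity level K, a critical point x with
# ||x||_0 nonzero entries, and data-fit value F(x). The theorem requires
# lambda_0 > F(x) / (1 + K - 2 * ||x||_0) for x to be the unique minimizer.
K, sparsity_x, F_x = 6, 2, 0.9
lambda_0_threshold = F_x / (1 + K - 2 * sparsity_x)   # = 0.9 / 3 = 0.3
print(f"x is the unique global minimizer whenever lambda_0 > {lambda_0_threshold:.3f}")
```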

5. Oracle Solution Guarantees and “Safe Region” Design

With B-RSC$(\Psi, X, K)$ for $K \ge 2k^*$, where $k^*$ is the sparsity of the true underlying vector $x^*$, one can guarantee that the oracle solution (the restricted minimizer over the true support) is the unique global minimizer if:

  • There exists a “safe region” $S \subset \mathbb{R}^{k^*}$ containing $x^*$ and the oracle minimizer, which avoids a “forbidden band” set $\Omega$ in the coordinate directions;
  • For all $i \notin S^*$ (the indices outside the true support), certain subgradient bounds involving the measurement-matrix columns and the Bregman reference function hold;
  • The parameter $\lambda_0$ satisfies $\lambda_0 > F(x^*) / (1 + K - 2k^*)$.

In practice, $S$ is formed so that all candidate minimizers $u$ stay strictly away from $\Omega$, with the size of $S$ determined by B-RSC: $\sum_{j=1}^{k^*} d_{\psi_{i_j}}(u_j^*, u_j) \le F(x^*) / C_K$, where $d_{\psi_{i_j}}$ denotes the Bregman divergence along each active coordinate (Chirinos-Rodriguez et al., 15 Nov 2025).
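
A minimal membership test for this condition might look as follows (my sketch; the coordinate-wise divergence $d_\psi$ and all constants are assumptions, shown here with a quadratic reference):

```python
# Sketch: a candidate u belongs to the safe region S when the summed
# coordinate-wise Bregman divergences from the oracle point u* are at
# most F(x*) / C_K.
import numpy as np

def in_safe_region(u, u_star, d_psi, F_xstar, C_K):
    """d_psi(a, b): 1-D Bregman divergence along one active coordinate."""
    return sum(d_psi(a, b) for a, b in zip(u_star, u)) <= F_xstar / C_K

# Quadratic reference psi(t) = 0.5 t^2 gives d_psi(a, b) = 0.5 (a - b)^2.
d_quad = lambda a, b: 0.5 * (a - b) ** 2
u_star = np.array([1.0, -2.0, 0.5])
print(in_safe_region(u_star + 0.1, u_star, d_quad, F_xstar=0.5, C_K=0.8))
```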

6. Specialization: Quadratic and Poisson Data Terms

Least-Squares (LS) Setting

  • Fidelity: $G_y(w) = \frac{1}{2}\|w - y\|_2^2$, which has a $1$-Lipschitz gradient ($L = 1$) and is $1$-strongly convex.
  • Reference Function: $\Psi(x) = (\gamma/2)\|x\|_2^2$, typically with $\gamma = 1$.
  • B-RSC: For $\Phi = \Psi$, B-RSC reduces to classical RSC with $C_K = \mu_K = (1 - \delta_K^-)/\gamma$, where $\delta_K^-$ denotes a lower RIP constant of $A$.
  • Safe Region: $S = \{u : \|u - u^*\|_2 \le \sqrt{2F(x^*)/(\gamma C_K)}\}$.
  • Oracle Recovery: One obtains explicit intervals for $\lambda_0$ that often improve prior bounds (Chirinos-Rodriguez et al., 15 Nov 2025); a numeric sketch follows this list.
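
A numeric sketch of these quantities, under assumed values for the lower RIP constant and the data-fit at the oracle point:

```python
# Hedged LS example: gamma = 1, an assumed lower-RIP constant delta_K^-,
# and an assumed oracle data-fit F(x*); the safe region is then a
# Euclidean ball of explicit radius around the oracle minimizer.
import numpy as np

gamma, delta_K_minus, F_xstar = 1.0, 0.3, 0.05
C_K = (1.0 - delta_K_minus) / gamma
radius = np.sqrt(2.0 * F_xstar / (gamma * C_K))
print(f"C_K = {C_K:.3f}, safe-region radius = {radius:.4f}")
```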

Poisson (KL) Data Setting

  • Fidelity: $G_y(w) = \sum_j g^{\mathrm{KL}}_{y_j}(w_j)$ for $w_j > 0$.
  • Reference Function: $\psi_i(x) = \gamma_i\, g^{\mathrm{KL}}_\xi(c_i x + \xi)$, with tunable parameters $\gamma_i$, $c_i$, and $\xi$.
  • B-RSC: By Theorem 2.4 of (Chirinos-Rodriguez et al., 15 Nov 2025), a positive $C_K$ is achieved whenever $A$ satisfies a lower RIP and $X$ is compact.
  • Safe Region: $S = \{u : \sum_j \gamma_{i_j}\, g^{\mathrm{KL}}_1\big((c_{i_j} u_j^* + \xi)/(c_{i_j} u_j + \xi)\big) \le F(x^*)/(\xi C_K)\}$.
  • Oracle Recovery: Tight lower and upper bounds on $\lambda_0$ are derived, which are feasible in the high-SNR regime, yielding nontrivial recovery guarantees unavailable in the classical RSC framework; a sketch of the safe-region test follows this list.
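
A sketch of the safe-region test in the KL setting, with all parameter values ($\gamma_i$, $c_i$, $\xi$, $F(x^*)$, $C_K$) chosen purely for illustration:

```python
# Hedged sketch of the Poisson/KL safe-region membership test.
import numpy as np

def g_kl(z, y):
    """g_y^KL(z) = z + y log(y/z) - y, per-coordinate KL fidelity."""
    return z + y * np.log(y / z) - y

def in_safe_region_kl(u, u_star, gammas, cs, xi, F_xstar, C_K):
    ratios = (cs * u_star + xi) / (cs * u + xi)
    return np.sum(gammas * g_kl(ratios, 1.0)) <= F_xstar / (xi * C_K)

u_star = np.array([0.8, 1.5])
gammas, cs, xi = np.ones(2), np.ones(2), 0.1
print(in_safe_region_kl(1.05 * u_star, u_star, gammas, cs, xi,
                        F_xstar=0.02, C_K=0.5))
```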

7. Relationship to Relative Bregman Strong Convexity and Algorithmic Implications

Related notions of strong convexity relative to Bregman divergences arise in the analysis of optimization algorithms. In the context of variance-reduced stochastic primal-dual splitting methods, strong convexity of the form

$$f(x) \ge f(y) + \langle p, x - y \rangle + a\, D_\phi(x, y)$$

for a Legendre reference function $\phi$ and a subgradient $p \in \partial f(y)$ (and similarly for $g^*$) ensures linear contraction of the primal-dual gap. The contraction rate depends explicitly on the relative strong convexity constant $a$ (Dung et al., 2021).

This formalism matches B-RSC in spirit when the restriction to sparse directions is taken, and when $\phi$ and $\psi$ are chosen to match the problem geometry. Notably, Bregman-based strong convexity is critical for linear convergence guarantees even in non-Euclidean settings, as shown by contraction constants and telescoping descent arguments in accelerated algorithms.
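
A small numerical check of relative strong convexity (my own example, not from the cited papers): $f(x) = \sum_i x_i \log x_i + \frac{1}{2}\|x\|_2^2$ is $1$-strongly convex relative to the Legendre reference $\phi(x) = \sum_i x_i \log x_i$ on the positive orthant, since $f - \phi$ is convex.

```python
# Verify f(x) >= f(y) + <grad f(y), x - y> + a * D_phi(x, y) with a = 1.
import numpy as np

phi = lambda x: np.sum(x * np.log(x))
grad_phi = lambda x: np.log(x) + 1.0
f = lambda x: phi(x) + 0.5 * x @ x
grad_f = lambda x: grad_phi(x) + x

def d_phi(x, y):
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

rng = np.random.default_rng(2)
for _ in range(5):
    x, y = rng.uniform(0.1, 3.0, size=4), rng.uniform(0.1, 3.0, size=4)
    lhs = f(x)
    rhs = f(y) + grad_f(y) @ (x - y) + 1.0 * d_phi(x, y)
    assert lhs >= rhs - 1e-12  # slack is exactly 0.5 * ||x - y||^2
print("relative strong convexity verified at random test points")
```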


In summary, Bregman Restricted Strong Convexity extends strong convexity notions to non-Euclidean geometries and restricted (sparse) directions, enabling advanced landscape analysis and recovery theorems for nonconvex sparse optimization, particularly $\ell_0$-Bregman relaxations under both quadratic and generalized linear models. It provides a unified framework for local convexity, isolation of minimizers, and explicit parameter regimes for exact oracle recovery beyond classical RSC applicability (Chirinos-Rodriguez et al., 15 Nov 2025; Dung et al., 2021).
