
Sharpness-Aware Black-Box Optimization (SABBO)

Updated 25 December 2025
  • SABBO is a family of methods that applies sharpness-aware minimization to black-box optimization, ensuring solutions reside in flat neighborhoods that generalize well.
  • The approach approximates sharpness by maximizing loss within a local perturbation ball using Monte Carlo estimators and adaptive search distributions.
  • Empirical studies demonstrate SABBO’s efficiency in hyperparameter tuning and prompt optimization, outperforming traditional methods in robustness and sample efficiency.

Sharpness-Aware Black-Box Optimization (SABBO) refers to a family of methods that merge sharpness-aware minimization (SAM) principles—originating in continuous, gradient-based learning—with black-box optimization paradigms where gradient information is unavailable. The canonical aim is to identify parameters, configurations, or discrete artifacts that not only minimize a black-box objective but do so robustly: optimal decisions are those located in “flat” neighborhoods with respect to the objective landscape, avoiding fragile, high-sharpness minima that generalize poorly or are unstable under small perturbations. SABBO instantiations span continuous, discrete, and semantic domains, and have been empirically validated in settings such as hyperparameter tuning for topic models, prompt optimization for LLMs, and prompt fine-tuning in API-restricted machine learning environments (Akramov et al., 18 Dec 2025, Ye et al., 2024, Wan et al., 28 Sep 2025).

1. Mathematical Foundations of SABBO

The SABBO methodology extends SAM to settings where only function values (not gradients) are accessible, typically through a black-box oracle. The abstract goal is to solve

$$\min_{x\in\mathcal{X}}~f(x)$$

where $f$ is only observable pointwise and may be noisy or stochastic. The solution sought is robust with respect to sharpness: specifically, SABBO algorithms minimize a local max-loss rather than simply the observed loss, ensuring solutions generalize beyond narrow basins of attraction.

Formally, the sharpness-aware objective takes the form

$$\Phi(x) = \max_{\|\delta\|\leq\rho} f(x+\delta)$$

where $\rho > 0$ is a user-specified radius controlling the sharpness neighborhood, and the update aims directly at $\arg\min_x \Phi(x)$ rather than $\arg\min_x f(x)$. In black-box settings, this is approximated using a parameterized search distribution (typically Gaussian) $p_\theta(x) = \mathcal{N}(x\mid\mu,\Sigma)$, yielding an objective

$$\min_{\theta} \max_{\|\delta\|\leq\rho} \mathbb{E}_{x \sim p_{\theta+\delta}}[f(x)]$$

which admits efficient Monte Carlo estimators for practical implementation (Akramov et al., 18 Dec 2025, Ye et al., 2024).
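The nested min–max objective above admits a direct nested Monte Carlo estimate. The following 1-D sketch (toy objective and illustrative parameter values, not code from the cited papers) scans mean shifts $\delta \in [-\rho, \rho]$ and returns the worst expected loss, showing that a flat basin scores better than a sharp one:

```python
import math
import random

def sharpness_aware_objective(f, mu, sigma, rho, n_dirs=8, n_mc=32, seed=0):
    """Monte Carlo estimate of max_{|delta|<=rho} E_{x ~ N(mu+delta, sigma^2)}[f(x)].

    The inner maximization over the perturbation ball is approximated by
    scanning n_dirs candidate shifts delta in [-rho, rho] (1-D case); each
    expectation is estimated with n_mc Gaussian samples.
    """
    rng = random.Random(seed)
    deltas = [-rho + 2 * rho * i / (n_dirs - 1) for i in range(n_dirs)]
    worst = -math.inf
    for delta in deltas:
        est = sum(f(mu + delta + sigma * rng.gauss(0, 1)) for _ in range(n_mc)) / n_mc
        worst = max(worst, est)
    return worst

# Toy 1-D landscape: a sharp minimum at x=0, a flat minimum at x=3.
f = lambda x: min(50.0 * x ** 2, 0.5 * (x - 3.0) ** 2)
flat = sharpness_aware_objective(f, mu=3.0, sigma=0.1, rho=0.3)
sharp = sharpness_aware_objective(f, mu=0.0, sigma=0.1, rho=0.3)
print(flat < sharp)  # True: the flat basin wins under the sharpness-aware objective
```

Both minima have the same pointwise value, but only the flat one survives the inner maximization, which is precisely the behavior the objective is designed to reward.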

2. Algorithmic Implementation and Core Workflow

A canonical SABBO algorithmic step proceeds as follows (Akramov et al., 18 Dec 2025):

  1. Maintain a search distribution $p_\theta(x) = \mathcal{N}(x\mid\mu_t, \Sigma_t)$; initialize with broad coverage of the feasible set.
  2. At each iteration:
    • Sampling: Draw $K$ candidates $\{x_{t,k}\}$ from the current distribution.
    • Evaluation: Query $f(x_{t,k})$ for each sample.
    • Sharpness Estimation: Identify the sample $\tilde{x}_t$ within a radius $\rho$ of $\mu_t$ maximizing $f(x_{t,k})$.
    • Gradient Approximation: Estimate $\nabla_\mu f(\tilde{x}_t)$ via stochastic, black-box-compatible estimators (finite differences or pathwise).
    • Parameter Update: Update $\mu_{t+1} \leftarrow \mu_t - \eta\nabla_\mu f(\tilde{x}_t)$. A similar update may be made for $\Sigma_t$.
  3. Output the best candidate found or the final mean $\mu_{G+1}$ after $G$ iterations.

For instantiations with discrete output oracles (e.g., evaluating integer topic numbers $T$ for LDA), candidates are rounded and clamped to the integer domain before evaluation (Akramov et al., 18 Dec 2025). In discrete semantic settings (e.g., text prompts), the notion of neighborhood and perturbation is operationalized via semantic distances, with adversarial sampling and selection replacing gradient-based steps (Wan et al., 28 Sep 2025).
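The numbered workflow above, together with the round-and-clamp rule for integer domains, can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the toy oracle and all hyperparameter values are assumptions:

```python
import random

def sabbo_discrete(f, lo, hi, iters=30, K=10, rho=2.0, eta0=1.0, seed=0):
    """Illustrative SABBO loop over an integer domain [lo, hi].

    Per iteration: sample K candidates from N(mu, sigma^2), round and clamp
    to integers, query the oracle, pick the worst (max-f) sample within
    radius rho of mu, estimate its slope by central finite differences,
    and descend the mean with an eta_t ~ 1/t schedule.
    """
    rng = random.Random(seed)
    mu = (lo + hi) / 2.0              # midpoint initialization
    sigma = (hi - lo) / 4.0           # broad initial coverage
    best_x, best_f = None, float("inf")
    for t in range(1, iters + 1):
        cands = [min(hi, max(lo, round(rng.gauss(mu, sigma)))) for _ in range(K)]
        vals = [f(x) for x in cands]
        for x, v in zip(cands, vals):  # track the incumbent solution
            if v < best_f:
                best_x, best_f = x, v
        # Sharpness estimation: worst sample near the current mean (fall back
        # to the whole batch if nothing landed inside the ball).
        in_ball = [(v, x) for x, v in zip(cands, vals) if abs(x - mu) <= rho]
        _, x_tilde = max(in_ball or list(zip(vals, cands)))
        grad = (f(min(hi, x_tilde + 1)) - f(max(lo, x_tilde - 1))) / 2.0
        mu = min(hi, max(lo, mu - (eta0 / t) * grad))
        sigma = max(0.5, sigma * 0.95)  # slowly shrink exploration
    return best_x

# Toy oracle: quadratic in T with evaluation noise, optimum near T=40.
noise = random.Random(1)
oracle = lambda T: (T - 40) ** 2 / 10.0 + noise.gauss(0, 0.5)
print(sabbo_discrete(oracle, lo=2, hi=100))
```

Descending from the worst in-ball point, rather than from the mean itself, is what distinguishes this loop from a plain evolution-strategy update.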

3. Theoretical Properties and Guarantees

SABBO and its variants possess convergence and generalization guarantees paralleling their continuous-domain progenitors. Under standard regularity assumptions (convexity, smoothness of the reparameterized expected objective, boundedness of the sampling process), SABBO converges in expectation to a neighborhood of a sharpness-aware stationary point at rates

$$O\!\left(\frac{1}{\sqrt{G}}\right)$$

with $G$ the number of iterations/queries (zeroth-order rate) (Akramov et al., 18 Dec 2025).

For settings using KL-divergence neighborhoods and full-batch queries, the convergence rate strengthens to

$$O\left(\frac{\log T}{T}\right)$$

for $T$ iterations (Ye et al., 2024). A PAC-Bayes analysis shows that the sharpness-aware expected empirical loss upper-bounds the true population loss, reflecting robustness to overfitting and improved generalization (Ye et al., 2024).
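For orientation, the PAC-Bayes mechanism behind this claim follows the standard template: with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all posteriors $Q$ over solutions (schematic form; the precise constants and neighborhood terms in Ye et al., 2024 differ),

$$\mathbb{E}_{\theta\sim Q}\big[L(\theta)\big] \;\leq\; \mathbb{E}_{\theta\sim Q}\big[\hat{L}(\theta)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}$$

so a search distribution concentrated on a flat region — small expected empirical loss under perturbation, moderate KL divergence to the prior $P$ — certifies low population loss.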

4. Applications in Hyperparameter Search and Discrete Optimization

SABBO has been applied to discrete black-box optimization problems, notably in the automated selection of the number of topics $T$ in Latent Dirichlet Allocation (LDA), where $f(T)$ is held-out validation perplexity. The SABBO procedure samples candidate $T$ values (from a Gaussian, rounded to the nearest integer), evaluates their perplexity after running LDA, and targets robust, flat regions via the maximum-in-ball update scheme.
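A minimal sketch of such a black-box oracle, using scikit-learn's `LatentDirichletAllocation` on a synthetic corpus (the corpus, split, and `max_iter` are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def make_perplexity_oracle(X_train, X_val, seed=0):
    """Black-box oracle f(T): fit LDA with T topics, return held-out perplexity."""
    def f(T):
        lda = LatentDirichletAllocation(n_components=int(T), max_iter=10,
                                        random_state=seed)
        lda.fit(X_train)
        return lda.perplexity(X_val)  # lower is better
    return f

# Tiny synthetic corpus: 60 "documents" over a 25-word vocabulary.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(60, 25))
f = make_perplexity_oracle(X[:45], X[45:])
print(f(5))  # one black-box query: held-out perplexity at T=5
```

From the optimizer's perspective, each call to `f` is a single expensive query: no gradient with respect to $T$ exists, which is exactly the regime SABBO targets.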

Empirical studies on real-world corpora (20NEWS, AGNEWS, YELP, VAL_OUT) show that SABBO identifies near-optimal topic numbers after essentially a single evaluation, far outpacing evolutionary baselines (GA, ES) and previous neural black-box optimizers in both sample efficiency and time to convergence (Akramov et al., 18 Dec 2025). The sharpness-aware penalty ensures selected $T$ values are robust against evaluation noise and avoid sharp, overfit minima, reflected in lower variance and improved generalization (see table below; held-out perplexity, mean ± std, lower is better):

Dataset    GA           ES           PABBO        SABBO
20NEWS     1776 ± 186   2057 ± 173   1810 ± 104   1679 ± 25
AGNEWS     2155 ± 24    3800 ± 360   2185 ± 40    2151 ± 22
VAL_OUT    1653 ± 197   2449 ± 201   1566 ± 27    1558 ± 31
YELP       1379 ± 105   1823 ± 55    1357 ± 32    1351 ± 24

In discrete text and prompt spaces, SABBO variants (e.g., TARE, ATARE) define sharpness with respect to semantic neighborhoods and employ a two-stage process: inner adversarial sampling over paraphrases, followed by robust outer candidate selection. This design substantially improves prompt robustness to paraphrasing and semantic drift, outperforming accuracy-only optimization (Wan et al., 28 Sep 2025).

5. Relationship to Broader Sharpness-Aware and Black-Box Optimization Methods

SABBO generalizes the concept of Sharpness-Aware Minimization (SAM), which was originally devised for differentiable objectives and weights in deep networks, to black-box domains where only function queries are available. By maintaining a search distribution and constructing sharpness penalties either in Euclidean space (continuous) or over semantic neighborhoods (discrete, text), SABBO bridges evolution strategies, black-box search, and robust optimization. In contrast to pointwise optimizers, SABBO explicitly controls for loss sensitivity to local perturbations, aligning solutions with broader, flatter optima and improved generalization.

Empirical ablations confirm that omitting the sharpness-aware step (e.g., using INGO or vanilla ES) systematically worsens downstream generalization and robustness (Ye et al., 2024). Methods such as TARE/ATARE further extend SABBO’s blueprint into non-differentiable, discrete, and semantically-structured spaces, incorporating anisotropy and adaptive neighborhood scaling for greater robustness (Wan et al., 28 Sep 2025).

6. Hyperparameter Choices and Practical Considerations

Critical SABBO hyperparameters include:

  • $K$ (samples per iteration): Balances gradient estimator variance and function evaluation cost. Typical choices are $K=10$ for scalar/discrete search, $K=50$–$100$ for high dimensions (Akramov et al., 18 Dec 2025, Ye et al., 2024).
  • $\eta$ (learning rate): Generally decreased as $\eta_t \propto 1/t$.
  • $\rho$ (sharpness radius): Must be chosen small enough ($\approx 0.05$) to probe local curvature, but not so small as to negate the sharpness penalty.
  • Initialization: The mean $\mu_1$ is set at the midpoint of the feasible domain; the covariance is initialized large to ensure exploration.
  • Gradient Estimator: Either score-function or pathwise (reparameterization) estimators, coupled with Monte Carlo estimation, are used as appropriate.
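The estimator choice in the last bullet can be illustrated in one dimension. This is a toy sketch: the quadratic test function and sample counts are assumptions, and a central finite difference stands in for the general deterministic option:

```python
import random

def finite_diff_grad(f, x, h=1e-3):
    """Central finite-difference estimate of f'(x) from two oracle queries."""
    return (f(x + h) - f(x - h)) / (2 * h)

def score_function_grad(f, mu, sigma, K=2000, seed=0):
    """Score-function estimate of d/dmu E_{x ~ N(mu, sigma^2)}[f(x)],
    i.e. E[f(x) * (x - mu) / sigma^2], averaged over K oracle queries."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(K)]
    return sum(f(x) * (x - mu) / sigma ** 2 for x in xs) / K

quad = lambda x: (x - 2.0) ** 2   # true derivative at x=0 is -4
print(finite_diff_grad(quad, 0.0))          # ≈ -4.0 (two queries)
print(score_function_grad(quad, 0.0, 0.5))  # ≈ -4.0 (many queries, noisier)
```

The trade-off mirrors the $K$ bullet above: the score-function estimator needs no extra structure on $f$ but pays in variance, which is why $K$ must grow with dimension.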

Ablation studies indicate that the value of $\rho$, step-size $\eta$, and population size $K$ all impact performance, but the inclusion of the sharpness-aware inner maximization is the principal determinant of generalization and sample efficiency (Ye et al., 2024).

7. Extensions: Textual and Semantic Sharpness-Aware Optimization

In domains where the optimization variables are discrete and structured, such as prompt engineering for LLMs, the sharpness-aware criterion is instantiated via semantic neighborhoods. In TARE/ATARE (Wan et al., 28 Sep 2025), the inner maximization samples paraphrases within a semantic distance $\rho_{\text{text}}$, and the outer optimization selects candidates that minimize the worst-case loss among these paraphrases. ATARE extends this by learning anisotropic per-component weights that determine the sensitivity of prompt components, yielding ellipsoidal rather than isotropic neighborhoods, and by adaptively adjusting the sampling radius.
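The inner-adversarial / outer-robust structure can be sketched with stand-in callables. The `loss` and `paraphrase` functions below are toy hypotheticals, not the TARE implementation:

```python
import random

def robust_prompt_search(loss, paraphrase, candidates, n_inner=8, seed=0):
    """TARE-style two-stage selection over discrete prompts.

    Inner step: sample paraphrases of each candidate (its semantic
    neighborhood) and record the worst-case loss. Outer step: keep the
    candidate whose worst paraphrase loss is smallest.
    """
    rng = random.Random(seed)
    best, best_worst = None, float("inf")
    for prompt in candidates:
        neighborhood = [prompt] + [paraphrase(prompt, rng) for _ in range(n_inner)]
        worst = max(loss(p) for p in neighborhood)  # inner adversarial max
        if worst < best_worst:                      # outer robust min
            best, best_worst = prompt, worst
    return best

# Toy stand-ins: loss counts missing keywords; paraphrasing drops a random word.
loss = lambda p: sum(w not in p.split() for w in ("classify", "sentiment"))
def paraphrase(p, rng):
    words = p.split()
    if len(words) > 1:
        words.pop(rng.randrange(len(words)))
    return " ".join(words)

cands = ["classify sentiment",
         "please classify classify sentiment sentiment"]
print(robust_prompt_search(loss, paraphrase, cands))
# → 'please classify classify sentiment sentiment'
```

The shorter prompt has the same pointwise loss but a sharp semantic neighborhood (one dropped word breaks it); the redundant prompt survives every single-word edit, so the worst-case criterion prefers it.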

Experiments demonstrate that minimizing the textual sharpness gap produces prompts whose performance is stable under paraphrasing, robust to adversarial edits, and consistently superior to vanilla prompt search strategies. TARE and ATARE attain average accuracy improvements of 2–7% over pointwise baselines, with further gains (1–3%) from anisotropy and adaptivity (Wan et al., 28 Sep 2025).


SABBO thus constitutes a principled, general blueprint for robust black-box optimization, incorporating both theoretical generalization bounds and concrete empirical gains across continuous, discrete, and semantic settings (Akramov et al., 18 Dec 2025, Ye et al., 2024, Wan et al., 28 Sep 2025).
