Sharpness-Aware Black-Box Optimization (SABBO)
- SABBO is a family of methods that applies sharpness-aware minimization to black-box optimization, ensuring solutions reside in flat neighborhoods that generalize well.
- The approach approximates sharpness by maximizing loss within a local perturbation ball using Monte Carlo estimators and adaptive search distributions.
- Empirical studies demonstrate SABBO’s efficiency in hyperparameter tuning and prompt optimization, outperforming traditional methods in robustness and sample efficiency.
Sharpness-Aware Black-Box Optimization (SABBO) refers to a family of methods that merge sharpness-aware minimization (SAM) principles—originating in continuous, gradient-based learning—with black-box optimization paradigms where gradient information is unavailable. The canonical aim is to identify parameters, configurations, or discrete artifacts that not only minimize a black-box objective but do so robustly: optimal decisions are those located in “flat” neighborhoods with respect to the objective landscape, avoiding fragile, high-sharpness minima that generalize poorly or are unstable under small perturbations. SABBO instantiations span continuous, discrete, and semantic domains, and have been empirically validated in settings such as hyperparameter tuning for topic models, prompt optimization for LLMs, and prompt fine-tuning in API-restricted machine learning environments (Akramov et al., 18 Dec 2025, Ye et al., 2024, Wan et al., 28 Sep 2025).
1. Mathematical Foundations of SABBO
The SABBO methodology extends SAM to settings where only function values (not gradients) are accessible, typically through a black-box oracle. The abstract goal is to solve

$$\min_{x \in \mathcal{X}} f(x),$$

where $f$ is only observable pointwise and may be noisy or stochastic. The solution sought is robust with respect to sharpness: specifically, SABBO algorithms minimize a local max-loss rather than simply the observed loss, ensuring solutions generalize beyond narrow basins of attraction.
Formally, the sharpness-aware objective takes the form

$$\min_{x} \; \max_{\|\epsilon\| \le \rho} f(x + \epsilon),$$

where $\rho$ is a user-specified radius controlling the sharpness neighborhood, and the update aims directly at $\max_{\|\epsilon\| \le \rho} f(x+\epsilon)$ rather than $f(x)$. In black-box settings, this is approximated using a parameterized search distribution (typically Gaussian) $\pi_\theta = \mathcal{N}(\mu, \Sigma)$, yielding an objective

$$J(\theta) = \mathbb{E}_{x \sim \pi_\theta}\Big[\max_{\|\epsilon\| \le \rho} f(x + \epsilon)\Big],$$

which admits efficient Monte Carlo estimators for practical implementation (Akramov et al., 18 Dec 2025, Ye et al., 2024).
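As a concrete illustration, the Gaussian-smoothed sharpness-aware objective can be estimated by nested sampling: an outer Monte Carlo average over candidates drawn from the search distribution, and an inner maximum over random perturbations in the radius-$\rho$ ball. The sketch below is a minimal 1-D version; the sample sizes and the uniform inner-perturbation scheme are illustrative assumptions, not the exact estimators used in the cited papers.

```python
import numpy as np

def sharpness_aware_objective(f, mu, sigma, rho, n_samples=16, n_inner=8, rng=None):
    """Monte Carlo estimate of E_{x~N(mu, sigma^2)}[ max_{|eps|<=rho} f(x + eps) ].

    The inner maximization over the rho-ball is itself approximated by
    drawing random perturbations inside the ball and taking the worst value.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_samples):
        x = rng.normal(mu, sigma)                    # candidate from the search distribution
        eps = rng.uniform(-rho, rho, size=n_inner)   # perturbations within the radius
        total += max(f(x + e) for e in eps)          # sharpness: worst case in the neighborhood
    return total / n_samples
```

For a sharp minimum, the inner maximum inflates the estimate relative to the plain smoothed loss, which is precisely the penalty the outer minimization avoids.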
2. Algorithmic Implementation and Core Workflow
A canonical SABBO algorithmic step proceeds as follows (see Akramov et al., 18 Dec 2025):
- Maintain a search distribution $\pi_\theta = \mathcal{N}(\mu, \Sigma)$; initialize with broad coverage of the feasible set.
- At each iteration:
  - Sampling: Draw $N$ candidates from the current distribution.
  - Evaluation: Query the black-box oracle $f$ for each sample.
  - Sharpness Estimation: Identify the point within a radius $\rho$ of each sample that maximizes $f$.
  - Gradient Approximation: Estimate the gradient of the sharpness-aware objective with respect to $(\mu, \Sigma)$ via stochastic, black-box-compatible estimators (finite differences or pathwise).
  - Parameter Update: Update the mean $\mu$ by a gradient step. A similar update may be made for the covariance $\Sigma$.
- Output the best candidate found or the final mean $\mu_T$ after $T$ iterations.
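The loop above can be sketched in a few lines of Python. This is a simplified 1-D illustration under stated assumptions: a three-point inner maximization over $\{x-\rho, x, x+\rho\}$, a score-function gradient estimate with a mean baseline, and a $1/\sqrt{t}$ step-size decay, none of which is claimed to match the published algorithm exactly.

```python
import numpy as np

def sabbo_minimize(f, lo, hi, rho=0.5, lr=0.1, n_samples=8, n_iters=50, seed=0):
    """Minimal SABBO-style loop for a 1-D black-box objective f on [lo, hi]."""
    rng = np.random.default_rng(seed)
    mu, sigma = (lo + hi) / 2.0, (hi - lo) / 4.0       # broad initial coverage
    best_x, best_f = mu, f(mu)
    for t in range(1, n_iters + 1):
        xs = np.clip(rng.normal(mu, sigma, n_samples), lo, hi)
        # sharpness estimation: worst value in the rho-ball around each sample
        worst = [max(f(x - rho), f(x), f(x + rho)) for x in xs]
        # track the best plain objective value seen so far
        for x in xs:
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
        # score-function gradient estimate of the sharpness-aware objective
        adv = np.array(worst) - np.mean(worst)         # mean baseline reduces variance
        grad_mu = np.mean(adv * (xs - mu) / sigma**2)
        mu = np.clip(mu - (lr / np.sqrt(t)) * grad_mu, lo, hi)  # decaying step size
    return best_x
```

Keeping the covariance fixed, as here, is a simplification; the full method also adapts $\Sigma$.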
For instantiations with discrete output oracles (e.g., evaluating integer topic numbers for LDA), candidates are rounded and clamped to the integer domain before evaluation (Akramov et al., 18 Dec 2025). In discrete semantic settings (e.g., text prompts), the notion of neighborhood and perturbation is operationalized via semantic distances, with adversarial sampling and selection replacing gradient-based steps (Wan et al., 28 Sep 2025).
3. Theoretical Properties and Guarantees
SABBO and its variants possess convergence and generalization guarantees paralleling their continuous-domain progenitors. Under standard regularity assumptions (convexity, smoothness of the reparameterized expected objective, boundedness of the sampling process), SABBO converges in expectation to a neighborhood of a sharpness-aware stationary point at a rate of $O(1/\sqrt{T})$, with $T$ the number of iterations/queries (zeroth-order rate) (Akramov et al., 18 Dec 2025).
For settings using KL-divergence neighborhoods and full-batch queries, the convergence rate strengthens to $O(1/T)$ over $T$ iterations (Ye et al., 2024). A PAC-Bayes analysis shows that the sharpness-aware expected empirical loss upper-bounds the true population loss, reflecting robustness to overfitting and improved generalization (Ye et al., 2024).
4. Applications in Hyperparameter Search and Discrete Optimization
SABBO has been applied to discrete black-box optimization problems, notably the automated selection of the number of topics $K$ in Latent Dirichlet Allocation (LDA), where $f(K)$ is held-out validation perplexity. The SABBO procedure samples candidate values of $K$ (from a Gaussian, rounded to the nearest integer), evaluates their perplexity after running LDA, and targets robust, flat regions via the maximum-in-ball update scheme.
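The rounding, clamping, and maximum-in-ball scoring can be sketched as follows. The `perplexity` argument is a hypothetical stand-in for the held-out LDA perplexity oracle (in the tests below it is replaced by a synthetic bowl-shaped function, not real LDA output), and the mean/covariance update rule is a deliberately simplified heuristic rather than the paper's gradient step.

```python
import numpy as np

def sabbo_topic_search(perplexity, k_min=2, k_max=100, n_iters=30, n_samples=8,
                       rho=2, seed=0):
    """Sketch of SABBO over an integer topic count K for LDA.

    Gaussian samples are rounded and clamped to the integer domain, and each
    candidate is scored by its worst perplexity within +/- rho topics.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = (k_min + k_max) / 2.0, (k_max - k_min) / 4.0
    best_k, best_p = None, float("inf")
    for _ in range(n_iters):
        ks = np.clip(np.rint(rng.normal(mu, sigma, n_samples)), k_min, k_max).astype(int)
        for k in ks:
            # sharpness-aware score: worst perplexity in the integer rho-ball
            neigh = range(max(k_min, k - rho), min(k_max, k + rho) + 1)
            worst = max(perplexity(j) for j in neigh)
            if worst < best_p:
                best_k, best_p = k, worst
        mu = 0.75 * mu + 0.25 * best_k   # pull the mean toward the robust incumbent
        sigma = max(1.0, 0.9 * sigma)    # anneal exploration, keeping a floor
    return best_k
```

Because candidates are ranked by their worst neighbor, a narrow dip in perplexity cannot win against a flat basin of comparable depth.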
Empirical studies on real-world corpora (20NEWS, AGNEWS, YELP, VAL_OUT) show that SABBO identifies near-optimal topic numbers after essentially a single evaluation, far outpacing evolutionary baselines (GA, ES) and previous neural black-box optimizers in both sample efficiency and time to convergence (Akramov et al., 18 Dec 2025). The sharpness-aware penalty ensures selected values are robust against evaluation noise and avoid sharp, overfit minima, reflected in lower variance and improved generalization. The table below reports held-out perplexity (mean ± standard deviation; lower is better):
| Dataset | GA | ES | PABBO | SABBO |
|---|---|---|---|---|
| 20NEWS | 1776 ± 186 | 2057 ± 173 | 1810 ± 104 | 1679 ± 25 |
| AGNEWS | 2155 ± 24 | 3800 ± 360 | 2185 ± 40 | 2151 ± 22 |
| VAL_OUT | 1653 ± 197 | 2449 ± 201 | 1566 ± 27 | 1558 ± 31 |
| YELP | 1379 ± 105 | 1823 ± 55 | 1357 ± 32 | 1351 ± 24 |
In discrete text and prompt spaces, SABBO variants (e.g., TARE, ATARE) define sharpness with respect to semantic neighborhoods and employ a two-stage process: inner adversarial sampling over paraphrases, followed by robust outer candidate selection. This design substantially improves prompt robustness to paraphrasing and semantic drift, outperforming accuracy-only optimization (Wan et al., 28 Sep 2025).
5. Relationship to Broader Sharpness-Aware and Black-Box Optimization Methods
SABBO generalizes the concept of Sharpness-Aware Minimization (SAM), which was originally devised for differentiable objectives and weights in deep networks, to black-box domains where only function queries are available. By maintaining a search distribution and constructing sharpness penalties either in Euclidean space (continuous) or over semantic neighborhoods (discrete, text), SABBO bridges evolution strategies, black-box search, and robust optimization. In contrast to pointwise optimizers, SABBO explicitly controls for loss sensitivity to local perturbations, aligning solutions with broader, flatter optima and improved generalization.
Empirical ablations confirm that omitting the sharpness-aware step (e.g., using INGO or vanilla ES) systematically worsens downstream generalization and robustness (Ye et al., 2024). Methods such as TARE/ATARE further extend SABBO’s blueprint into non-differentiable, discrete, and semantically-structured spaces, incorporating anisotropy and adaptive neighborhood scaling for greater robustness (Wan et al., 28 Sep 2025).
6. Hyperparameter Choices and Practical Considerations
Critical SABBO hyperparameters include:
- $N$ (samples per iteration): Balances gradient-estimator variance against function-evaluation cost. Typical choices are small for scalar/discrete search and up to 100 for high dimensions (Akramov et al., 18 Dec 2025, Ye et al., 2024).
- $\eta$ (learning rate): Generally decreased as optimization proceeds.
- $\rho$ (sharpness radius): Must be chosen small enough to probe local curvature, but not so small as to negate the sharpness penalty.
- Initialization: The mean $\mu$ is set at the midpoint of the feasible domain; the covariance $\Sigma$ is initialized large to ensure exploration.
- Gradient Estimator: Either score-function, finite-difference, or pathwise estimators, coupled with Monte Carlo estimation, are used as appropriate.
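The two estimator families mentioned above can be contrasted in a few lines. This is a generic sketch of the standard constructions, not code from the cited papers: the score-function (likelihood-ratio) estimator differentiates the Gaussian search distribution and needs only function values at random samples, while the central finite-difference estimator probes the objective at two nearby points and presumes local smoothness.

```python
import numpy as np

def score_function_grad(f, mu, sigma, n=4000, rng=None):
    """Score-function estimate of d/dmu E_{x~N(mu, sigma^2)}[f(x)]."""
    rng = np.random.default_rng(rng)
    xs = rng.normal(mu, sigma, n)
    fs = f(xs)
    # subtracting the mean acts as a baseline and reduces estimator variance
    return np.mean((fs - fs.mean()) * (xs - mu) / sigma**2)

def finite_difference_grad(f, mu, h=1e-4):
    """Central finite-difference estimate of f'(mu); requires local smoothness."""
    return (f(mu + h) - f(mu - h)) / (2 * h)
```

For $f(x) = x^2$ and $\mu = 2$, both approximate the true derivative $\partial_\mu \mathbb{E}[f] = 2\mu = 4$, with the score-function estimate noisier at a given budget.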
Ablation studies indicate that the radius $\rho$, the step-size $\eta$, and the population size $N$ all impact performance, but the inclusion of the sharpness-aware inner maximization is the principal determinant of generalization and sample efficiency (Ye et al., 2024).
7. Extensions: Textual and Semantic Sharpness-Aware Optimization
In domains where the optimization variables are discrete and structured, such as prompt engineering for LLMs, the sharpness-aware criterion is instantiated via semantic neighborhoods. In TARE/ATARE (Wan et al., 28 Sep 2025), the inner maximization samples paraphrases within a semantic distance $\rho$, and the outer optimization selects candidates that minimize the worst-case loss among these paraphrases. ATARE extends this by learning anisotropic per-component weights capturing the sensitivity of individual prompt components, yielding ellipsoidal instead of isotropic neighborhoods, and by adaptively adjusting the sampling radius.
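The two-stage structure (inner adversarial paraphrasing, outer robust selection) reduces to a compact min-max loop. In this sketch, `paraphrase` and `loss` are hypothetical stand-ins for a paraphrasing model and a task-loss oracle; the real systems also weight semantic distances and adapt the radius, which is omitted here.

```python
def robust_prompt_select(candidates, paraphrase, loss, n_paraphrases=4):
    """TARE-style sketch: pick the prompt with the best worst-case loss.

    Inner stage: sample paraphrases of each candidate and take the worst loss.
    Outer stage: keep the candidate whose worst case is smallest.
    """
    best, best_worst = None, float("inf")
    for prompt in candidates:
        neighbors = [prompt] + [paraphrase(prompt, i) for i in range(n_paraphrases)]
        worst = max(loss(p) for p in neighbors)   # inner max over the semantic ball
        if worst < best_worst:                    # outer min over candidates
            best, best_worst = prompt, worst
    return best
```

A prompt that scores well itself but has a fragile paraphrase is rejected in favor of one that is uniformly adequate across its semantic neighborhood.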
Experiments demonstrate that minimizing the textual sharpness gap produces prompts whose performance is invariant to paraphrasing, robust to adversarial edits, and consistently superior to vanilla prompt-search strategies. TARE and ATARE attain average accuracy improvements of 2–7% over pointwise baselines, with further gains (1–3%) from anisotropy and adaptivity (Wan et al., 28 Sep 2025).
SABBO thus constitutes a principled, general blueprint for robust black-box optimization, incorporating both theoretical generalization bounds and concrete empirical gains across continuous, discrete, and semantic settings (Akramov et al., 18 Dec 2025, Ye et al., 2024, Wan et al., 28 Sep 2025).