Rank-Based Zeroth-Order Optimization
- Rank-based zeroth-order optimization is a black-box method that relies solely on ordinal ranking information, bypassing the need for explicit function values or gradients.
- It adapts classical zeroth-order methods by replacing value-based queries with comparison feedback, achieving provable non-asymptotic query complexity and convergence guarantees.
- Practical applications include evolutionary strategies, preference-based reinforcement learning, and human-in-the-loop model refinement, demonstrating its versatility and robustness.
Rank-based zeroth-order optimization refers to a broad family of black-box optimization algorithms that rely solely on ordinal information—such as the ranking or pairwise comparisons of function values across a set of query points—rather than explicit objective function values or gradients. This paradigm has emerged as central in settings with human feedback, preference learning, robust evolutionary heuristics, and any scenario where function values are unavailable or unreliable but comparative judgments can be acquired efficiently. Theoretical understanding and practical methodology have matured rapidly in recent years, culminating in explicit, non-asymptotic query complexity bounds and principled frameworks for ranking-based adaptation of classical zeroth-order methods.
1. Definition, Scope, and Fundamental Distinctions
Rank-based zeroth-order optimization (ZOO) is defined by its exclusive use of information from a ranking oracle. In the classical ZOO setting, the optimizer observes only function values f(x) at queried points x and cannot access gradients. In the fully rank-based regime, sometimes called comparison-based optimization (CBO), f(x) is never revealed; only the relative ordering of f over pairs, batches, or subsets of query points is accessible via a comparison or ranking oracle. A canonical example is the pairwise comparison oracle, or an (m, k)-ranking oracle returning the indices of the k lowest function values among m candidates (Slavin et al., 2022, Tang et al., 2023).
This strict restriction to ordinal information is significant:
- Only rankings are available: neither function values nor their differences are ever observed.
- Any monotonic transformation of the objective f leaves the optimization problem invariant (monotone invariance).
- These methods generalize evolutionary techniques such as CMA-ES and Natural Evolution Strategies and are directly applicable to preference-based reinforcement learning and human-in-the-loop scenarios.
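The monotone-invariance property above is easy to verify numerically. The following sketch (illustrative code, not from the cited papers) checks that a ranking oracle returns identical orderings for an objective f and for a strictly increasing transform of f, so any rank-based optimizer behaves identically on both:

```python
import numpy as np

def ranking(f, points):
    """Return the indices of `points` sorted by ascending f-value."""
    return np.argsort([f(x) for x in points])

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))

f = lambda x: float(np.sum(x ** 2))     # original objective
g = lambda x: np.exp(3 * f(x)) - 7      # strictly increasing transform of f

# The ranking oracle cannot distinguish f from g.
assert np.array_equal(ranking(f, pts), ranking(g, pts))
```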
2. Core Algorithms and Mechanisms
Rank-based ZOO algorithms generally proceed by querying batches of randomized perturbations of the current iterate, receiving rank/ordering feedback, and constructing descent directions or update rules in a manner agnostic to actual function values. A generic structure is as follows (Ye, 22 Dec 2025, Ye, 18 Dec 2025, Tang et al., 2023):
- Sample m perturbation directions u_1, …, u_m (typically drawn from a standard Gaussian N(0, I_d)).
- Query the (possibly stochastic) rank-oracle on the perturbed points x_t + δu_i to obtain a permutation or ranking.
- Select the top-k and/or bottom-k directions according to the ranking. Assign positive weights to descent directions (e.g., the lowest quartile) and negative weights to ascent directions (e.g., the highest quartile).
- Form a surrogate descent direction g_t from the weighted sum and update the iterate, e.g., x_{t+1} = x_t + η g_t.
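The generic structure above can be sketched in a few lines. This is a minimal illustration of the scheme, not any one of the cited algorithms; function names, parameter values, and the simple top-k/bottom-k weighting are assumptions for the sketch:

```python
import numpy as np

def rank_zo_step(oracle_rank, x, m=20, k=5, delta=0.1, eta=0.05, rng=None):
    """One generic rank-based zeroth-order iteration.

    `oracle_rank(points)` returns the indices of `points` sorted by
    ascending objective value; function values are never observed.
    """
    rng = rng or np.random.default_rng()
    u = rng.standard_normal((m, x.shape[0]))          # Gaussian probes
    order = oracle_rank([x + delta * ui for ui in u])
    # Positive weights on the k best (descent) probes, negative weights
    # on the k worst (ascent) probes; g estimates a descent direction.
    g = u[order[:k]].mean(axis=0) - u[order[-k:]].mean(axis=0)
    return x + eta * g

# Toy run on a quadratic: only the ordering of f-values is exposed.
f = lambda z: float(np.sum(z ** 2))
oracle = lambda pts: np.argsort([f(p) for p in pts])
x, rng = np.ones(10), np.random.default_rng(0)
for _ in range(200):
    x = rank_zo_step(oracle, x, rng=rng)
```

On this toy quadratic the iterate drifts to a small neighborhood of the origin even though the update never sees a single function value.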
Precise constructions—e.g., pairwise differences, quartile-based weighting, or full order statistics—vary, but all remain within the comparison-only regime (Ye, 18 Dec 2025, Ye, 22 Dec 2025). The ZO-RankSGD algorithm (Tang et al., 2023) deploys a general (m, k)-ranking oracle together with a smoothed gradient estimator built from the directed acyclic graph of pairwise preferences induced by the ranking, which reduces to a simple pairwise-difference estimator in the smallest batch setting.
A useful table summarizes typical rank-based ZOO iteration schemes:
| Step | Typical Operation | Oracle Calls per Iteration |
|---|---|---|
| Probe sampling | Draw m random directions | 0 |
| Ranking | Query the (m, k)-oracle | O(m log m) comparisons (full sort) |
| Direction selection | Top-k / bottom-k weighting | 0 |
| Update | Weighted sum of probes | — |
The query cost of the ranking step is O(m log m) comparisons if a full sort is required; with naive pairwise comparisons the cost is O(m²) unless a more efficient comparison-based sorting algorithm is used.
3. Theoretical Guarantees: Query Complexity and Convergence
Recent advances have provided non-asymptotic and explicit query complexity bounds for rank-based ZOO under standard smoothness and convexity assumptions. The results are:
- Strongly convex, L-smooth f: to obtain an ε-accurate solution with high probability, O(d log(1/ε)) total queries suffice, matching classical value-based ZOO up to logarithmic factors (Ye, 18 Dec 2025).
- Smooth, nonconvex f: to reach an ε-stationary point (‖∇f(x)‖ ≤ ε), O(d ε⁻²) queries suffice (Ye, 18 Dec 2025). In stochastic settings with bounded gradient second moments, the cost becomes O(d ε⁻⁴) (Ye, 22 Dec 2025).
These results establish that ordinal/rank feedback is information-theoretically as powerful as function value access for zeroth-order optimization in these convex and nonconvex settings. The core technical breakthroughs involve the exploitation of order-statistics concentration, Chernoff-type bounds, and new descent lemmas that do not rely on classical drift or information-geometric arguments. In all scenarios, the complexity is linear in the dimension d and matches value-based ZOO up to logarithmic factors.
4. Adaptation of Classical ZOO to Rank-based Regime
A major methodological advance is the explicit framework for converting standard ZOO algorithms to comparison/rank-based forms, contingent upon a "comparison-only" property (Slavin et al., 2022). If the original ZOO algorithm:
- Samples a finite batch of probes per iteration, and
- Uses only the argmin or the sorted order of the probed function values for decision making, then every such minimization or sort can be replaced by batch queries to a comparison/rank oracle.
Two universal primitives suffice:
- CompMin: Identifies the minimal element via pairwise comparisons.
- CompSort: Implements any comparison-based sorting algorithm (bubble sort, quicksort, tournament tree) via comparison oracles, costing O(m²) or O(m log m) comparisons, respectively.
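A minimal sketch of the two primitives, assuming only a Boolean pairwise oracle `less(i, j)` that answers whether f(x_i) < f(x_j); the Python names and implementation details are illustrative, not taken from Slavin et al. (2022):

```python
import functools

def comp_min(less, indices):
    """CompMin: index of the minimal element using len(indices) - 1
    pairwise comparisons (a linear tournament)."""
    best = indices[0]
    for i in indices[1:]:
        if less(i, best):
            best = i
    return best

def comp_sort(less, indices):
    """CompSort: full sort via a comparison-based algorithm; Python's
    built-in sort keeps the oracle cost at O(m log m) comparisons."""
    key = functools.cmp_to_key(lambda i, j: -1 if less(i, j) else 1)
    return sorted(indices, key=key)

# Example: the oracle compares hidden values without revealing them.
hidden = [3.2, -1.0, 0.5, 7.1]
less = lambda i, j: hidden[i] < hidden[j]
assert comp_min(less, [0, 1, 2, 3]) == 1
assert comp_sort(less, [0, 1, 2, 3]) == [1, 2, 0, 3]
```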
Converted algorithms (e.g., Stochastic Three-Point, CMA-ES variants) retain per-iteration theoretical rates up to the overhead of the requisite number of oracle queries for sorting/minimization. Hyperparameter regimes (step sizes, probe radii) are typically reused from the ZOO counterparts, with minor practical retuning for increased noise sensitivity in strictly ordinal environments (Slavin et al., 2022).
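For concreteness, here is a hedged sketch of such a conversion for the Stochastic Three-Point method: the only value-based operation, picking the best of three candidates, is replaced by pairwise oracle calls. Function names and parameter values are illustrative, not the tuned settings from the paper:

```python
import numpy as np

def cbo_stp(less, d, x0, alpha=0.1, iters=500, rng=None):
    """Comparison-based Stochastic Three-Point (STP) sketch: each step
    keeps whichever of {x, x + alpha*s, x - alpha*s} the pairwise
    oracle ranks lowest, using two oracle calls per iteration."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        s = rng.standard_normal(d)
        s /= np.linalg.norm(s)
        best = x
        for c in (x + alpha * s, x - alpha * s):
            if less(c, best):        # oracle: f(c) < f(best)?
                best = c
        x = best
    return x

# The oracle wraps f but only ever answers comparisons.
f = lambda z: float(np.sum(z ** 2))
less = lambda a, b: f(a) < f(b)
x = cbo_stp(less, d=5, x0=np.ones(5), rng=np.random.default_rng(0))
```

Because the incumbent x is always among the candidates, the (hidden) objective value is non-increasing along the iterates, mirroring the value-based STP guarantee.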
5. Empirical Performance and Applications
Extensive benchmarking has validated rank-based ZOO on standard unconstrained optimization problems (e.g., CUTEst suite) and in advanced human-in-the-loop scenarios:
- Function optimization: On synthetic and real test suites, converted rank-based algorithms (including CBO-STP, CBO-GLD, CBO-CMA-ES) show performance parity with or improvement over native comparison-based methods (SignOPT, SCOBO). For instance, under a tight function-reduction criterion on the CUTEst suite, approximately 65% of problems are solved by GLD, STP, and SignOPT within the allotted query budget, with GLD displaying the strongest robustness under a gradient-norm criterion (Slavin et al., 2022).
- Policy optimization in RL: ZO-RankSGD achieves convergence rates competitive with, and empirically outperforms, CMA-ES when only episode rankings are available. This is shown in MuJoCo benchmarks such as Reacher, Swimmer, and HalfCheetah (Tang et al., 2023).
- Human-guided model refinement: In diffusion model image generation, optimizing the latent embedding via iterative human ranking of image quality with a variant of ZO-RankSGD yields clear gains in visual detail and in A/B-test preference rates (often exceeding 70%) within a modest number of feedback rounds (Tang et al., 2023).
6. Advantages, Limitations, and Extensions
Advantages:
- Provable convergence and near-optimal query complexity under standard smoothness assumptions, in both strongly convex and nonconvex settings (Ye, 22 Dec 2025, Ye, 18 Dec 2025).
- Invariance to monotonic reparameterizations of the objective f; high robustness to noise and subjective distortions in feedback (Ye, 22 Dec 2025).
- Simple, modular adaptation of a wide array of classical ZOO algorithms to the strict rank-based information model (Slavin et al., 2022).
- Applicability in RLHF, human-in-the-loop, and preference learning scenarios where quantitative rewards are unavailable.
Limitations:
- Oracle query overhead, especially for sorting-based schemes: O(m²) comparisons per iteration unless more efficient algorithms such as quicksort or tournament trees are used (Slavin et al., 2022).
- Methods relying on finite-difference gradients or function interpolation (e.g., NEWUOA, Nesterov-Spokoiny) are not immediately convertible to the pure comparison-based regime (Slavin et al., 2022).
- Algorithm performance degrades under high comparison noise, especially in greedy update regimes (Slavin et al., 2022, Tang et al., 2023).
- In human-in-the-loop optimization, practical batch sizes are limited by annotator cognitive load (Tang et al., 2023).
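A common way to model the comparison-noise limitation above is a pairwise oracle whose answer flips with fixed probability. The sketch below uses that assumed noise model (it is not one specified in the cited papers) and shows the standard majority-vote mitigation:

```python
import numpy as np

def noisy_less(f, x, y, noise=0.2, rng=None):
    """Pairwise oracle whose Boolean answer is flipped with probability
    `noise` -- a simple model of unreliable human comparisons."""
    rng = rng or np.random.default_rng()
    truth = f(x) < f(y)
    return bool(truth) ^ bool(rng.random() < noise)

# Repeating the query and taking a majority vote restores reliability.
rng = np.random.default_rng(0)
f = lambda z: z ** 2
votes = [noisy_less(f, 1.0, 2.0, rng=rng) for _ in range(25)]
majority = sum(votes) > len(votes) / 2
assert majority  # f(1) < f(2) recovered despite a 20% flip probability
```

Greedy schemes that trust a single flipped comparison can be driven uphill, which is why repeated queries (at extra oracle cost) or smoothed estimators are used in practice.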
Potential Extensions:
- Efficient (partial-)ranking and selection schemes to further reduce query burden.
- Hybrid frameworks that occasionally request true function values for local model building.
- Adaptive modeling of human or noisy oracle bias and variance.
- Integration with low-rank Hessian estimators and second-order ZOO, especially in high-dimensional settings with structure (see (Liu et al., 2024)).
7. Connections, Generalizations, and Open Directions
Rank-based ZOO unifies and generalizes numerous optimization heuristics employed in evolutionary computation (CMA-ES, NES, genetic algorithms) by providing explicit theoretical guarantees in the strictly ordinal setting. The newfound non-asymptotic rates underpin the empirical effectiveness of these longstanding algorithms and clarify the critical role of order-statistics and statistical concentration in steering optimization.
Open avenues include:
- Extending rank-based frameworks to non-Gaussian perturbations and reinforcement learning with complex, partially observed environments (Ye, 22 Dec 2025, Tang et al., 2023).
- Sharpening sample complexity and robustness bounds under heavy-tailed noise distributions.
- Practical design of human-in-the-loop interfaces where annotator reliability, fatigue, and subjective bias are adaptively modeled and mitigated.
- Synergies with matrix-recovery based ZO second-order methods for problems exhibiting low-rank curvature structure (Liu et al., 2024).
Papers by Slavin, Niles-Weed, Wu, Cai, Wang, and Xie (Slavin et al., 2022, Tang et al., 2023, Ye, 22 Dec 2025, Ye, 18 Dec 2025) have collectively established rank-based zeroth-order optimization as a theoretically principled and practically valuable approach for learning from strictly ordinal feedback.