Quantized Evolution Strategies (QES)
- Quantized Evolution Strategies (QES) are derivative‐free optimization methods designed for discrete or quantized parameter spaces using error accumulation and adaptive multilevel search.
- They enable applications such as direct fine-tuning of quantized large language models and quantum combinatorial optimization with reduced memory and computational overhead.
- Empirical results show QES can achieve 2–3× accuracy improvements in LLM tuning and near-optimal yields in quantum control while maintaining practical runtime performance.
Quantized Evolution Strategies (QES) comprise a family of derivative-free optimization methods specifically designed for discrete or quantized parameter spaces, where standard gradient-based methods are either undefined or practically ineffective. QES unifies several innovations to enable efficient, large-scale black-box optimization in high-dimensional discrete domains. They are critical for modern applications such as direct fine-tuning of post-training quantized LLMs, multi-resolution control in quantum systems, and combinatorial optimization with quantum-inspired natural gradient methods. QES is characterized by: (1) operating directly on quantized parameters, (2) maintaining effective search dynamics on discrete grids via error-accumulation mechanisms or multilevel resolution, and (3) offering practical runtime and memory characteristics suitable for resource-constrained settings (Shir et al., 2020, Xu et al., 3 Feb 2026, Zhao et al., 2020).
1. Motivation and Core Challenges
Optimization on quantized or discrete spaces arises when either parameters are physically constrained (e.g., integer weights in LLMs, pixelated control fields for quantum experiments), or design choices (e.g., post-training quantization) preclude the use of continuous relaxations. In these regimes, backpropagation and gradient-based fine-tuning are often infeasible due to:
- Non-differentiability: The quantization operator is piecewise constant, yielding gradients that are zero almost everywhere. The straight-through estimator (STE) is only an approximation and fails to capture true discrete dynamics.
- Memory Limitations: Quantized hardware prohibits storage of high-precision activations and optimizer state. Full-precision optimizers are incompatible with on-device training or fine-tuning of large models.
- Stagnation and Discretization Error: Small zeroth-order (ZO) steps tend to round to zero, leading to stagnation, and rounding noise can overwhelm gradient signals over time (Xu et al., 3 Feb 2026).
QES methods address these issues through discrete-aware gradient estimation, error accumulation, and level-wise adaptive search.
2. Algorithmic Frameworks
Quantized Evolution Strategies can be instantiated in several domains, each tailored to the geometry of the quantized parameter space and the target application.
2.1 Multi-Level Quantized ES for Black-Box Control
For problems defined on uniform quantization grids, the multi-level mechanism introduces a hierarchy of resolution levels $\ell = 1, \dots, L$, refining the quantization step $\Delta_\ell$ at each level. Candidate solutions are quantized at each level via a coordinate-wise operator, e.g.
$$[Q_{\Delta_\ell}(x)]_j = \Delta_\ell \,\mathrm{round}(x_j / \Delta_\ell).$$
A stack of ES runs at successive levels, with promotion from level $\ell$ to $\ell+1$ triggered by exhaustion of the per-level evaluation budget, attainment of a target fitness, or detection of stagnation of the best fitness over a trailing window of evaluations.
The algorithm adapts strategy parameters (center, step-size, covariance or its diagonal) across levels, interpolates and rescales them as resolution increases, and uses sphere-model theory to adjust for new dimensions. Detailed update routines track CMA-ES dynamics or single-step “elitist” updates depending on instantiation (Shir et al., 2020).
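The level-wise mechanism can be sketched in code. This is a minimal illustration only: the quantization operator, the simple (1+1)-ES inner loop with a 1/5th-rule-style step adaptation, and promotion by budget exhaustion stand in for the full CMA-ES-based instantiation; all names and constants are assumptions.

```python
import numpy as np

def quantize(x, delta):
    """Coordinate-wise quantization onto a uniform grid with step delta."""
    return delta * np.round(x / delta)

def multilevel_es(f, x0, deltas, budget_per_level=2000, sigma0=1.0, rng=None):
    """Minimize f with a (1+1)-ES run at successively finer grid steps.

    deltas is a decreasing sequence of quantization steps; the search
    center and step size carry over when the level is promoted.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x, sigma = np.asarray(x0, dtype=float), sigma0
    for delta in deltas:                       # one ES run per resolution level
        x = quantize(x, delta)                 # re-quantize center on the new grid
        fx = f(x)
        for _ in range(budget_per_level):
            cand = quantize(x + sigma * rng.standard_normal(x.shape), delta)
            fc = f(cand)
            if fc < fx:                        # success: accept and grow the step
                x, fx, sigma = cand, fc, sigma * 1.1
            else:                              # failure: shrink the step
                sigma *= 0.98
        sigma = max(sigma, delta)              # keep steps resolvable on the finer grid
    return x, fx
```

On a simple sphere objective the coarse level finds the basin cheaply and the finer levels only refine, which is the intended division of labor.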
2.2 QES for Direct Fine-tuning of Quantized LLMs
Given integer-quantized weights $W_t$, QES estimates ZO gradients from $N$ forward evaluations:
$$\hat{g}_t = \frac{1}{N\sigma} \sum_{i=1}^{N} F_i\, \delta_i,$$
where the $\delta_i$ are discrete Gaussian perturbations produced by stochastic (Bernoulli) rounding, followed by integer gating to ensure legal quantized weights, and $F_i$ is the scalar fitness of the $i$-th perturbed model. To overcome stagnation, QES maintains a high-precision error buffer updated via
\begin{align*}
u_t &= \alpha \hat{g}_t + \gamma e_{t-1}, \\
\Delta W_t &= \mathrm{Round}(u_t), \\
e_t &= u_t - \Delta W_t, \\
W_{t+1} &= W_t + \Delta W_t,
\end{align*}
with learning rate $\alpha$ and decay $\gamma$. High-precision updates accumulate until they are large enough to trigger a quantized step, mimicking $\Sigma\Delta$-modulation and recovering high-precision convergence behavior in a discrete system (Xu et al., 3 Feb 2026).
2.3 Quantum Natural Evolution Strategies
In quantum combinatorial optimization, a parameterized quantum state $|\psi(\theta)\rangle$ defines a search distribution $p_\theta(x) = |\psi(x;\theta)|^2$. The natural gradient is computed in the Fubini–Study metric, involving the quantum Fisher matrix $F$ with entries $F_{ij} = \langle O_i O_j \rangle - \langle O_i \rangle \langle O_j \rangle$, where $O_i = \partial_{\theta_i} \log \psi(x;\theta)$. The parameter update is
$$\theta \leftarrow \theta - \eta\, (F + \lambda I)^{-1} g,$$
driven by samples $x^{(s)} \sim p_\theta$ and cost observables $E^{(s)} = f[x^{(s)}]$, where $g_i = \langle O_i E \rangle - \langle O_i \rangle \langle E \rangle$ (Zhao et al., 2020). This generalizes classical NES to the quantum regime and is effectively a quantized ES in expressive, non-commutative domains.
3. Mathematical Formulation and Pseudocode
The specific mathematical schemes governing QES vary by domain. Representative forms include:
Zeroth-Order ES Gradient for Discrete Spaces
For quantized forward perturbations:
\begin{align*}
\delta_i &= \lfloor \sigma \epsilon_i \rfloor + b_i, \quad b_{ij} \sim \mathrm{Bernoulli}(\{\sigma \epsilon_{ij}\}), \\
\hat{g}_t &= \frac{1}{N\sigma} \sum_{i=1}^{N} F_i \cdot \delta_i,
\end{align*}
where $F_i$ is the reward of the $i$-th perturbed model, $\epsilon_i$ is standard Gaussian noise, and $\{\cdot\}$ denotes the fractional part.
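A minimal sketch of this estimator follows. The stochastic rounding keeps the perturbation integer-valued while remaining unbiased (its expectation equals $\sigma\epsilon$); the plain sampling loop and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def discrete_perturbation(shape, sigma, rng):
    """Stochastically round Gaussian noise to integers: take the floor and
    add a Bernoulli draw on the fractional part, so E[delta] = sigma * eps."""
    z = sigma * rng.standard_normal(shape)
    base, frac = np.floor(z), z - np.floor(z)
    return (base + (rng.random(shape) < frac)).astype(np.int64)

def zo_gradient(fitness, w, sigma=1.0, n_samples=8, rng=None):
    """ZO estimate g = (1/(N*sigma)) * sum_i F_i * delta_i from N forward passes."""
    if rng is None:
        rng = np.random.default_rng(0)
    g = np.zeros(w.shape)
    for _ in range(n_samples):
        delta = discrete_perturbation(w.shape, sigma, rng)
        g += fitness(w + delta) * delta       # one forward evaluation per sample
    return g / (n_samples * sigma)
```

With enough samples the estimate aligns with the true gradient direction, e.g. it is positive in coordinates where increasing the weight increases a quadratic loss.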
Error-Feedback Accumulation Update (QES for LLMs)
\begin{align*}
u_t &= \alpha \hat{g}_t + \gamma e_{t-1}, \\
\Delta W_t &= \mathrm{Round}(u_t), \\
e_t &= u_t - \Delta W_t, \\
W_{t+1} &= W_t + \Delta W_t.
\end{align*}
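The recurrence is straightforward to implement; this sketch assumes NumPy's `round` (ties to even) as the Round operator and treats the weights as a plain integer array.

```python
import numpy as np

def error_feedback_step(w, g_hat, e_prev, alpha=0.1, gamma=0.9):
    """One QES update: accumulate a high-precision buffer, round it onto the
    integer grid, apply the rounded step, and carry the residual forward."""
    u = alpha * g_hat + gamma * e_prev        # u_t = alpha * g_t + gamma * e_{t-1}
    dw = np.round(u)                          # Delta W_t = Round(u_t)
    e = u - dw                                # e_t = u_t - Delta W_t, |e_t| <= 1/2
    return w + dw.astype(w.dtype), e          # W_{t+1} = W_t + Delta W_t
```

Repeated sub-unit updates accumulate in the buffer until a full quantized step fires, so a constant gradient of effective magnitude 0.3 produces roughly three unit steps over ten iterations rather than stagnating at zero.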
Multi-level QES Promotion (Black-box control)
Upon promotion from resolution level $\ell$ to $\ell+1$, the search center is re-quantized on the finer grid and the step-size and covariance are rescaled according to the ratio of quantization steps, where the number of quantized bins at level $\ell$ determines the grid step $\Delta_\ell$.
Pseudocode
For QES in quantum combinatorial optimization (Zhao et al., 2020):
```
initialize θ = θ₀
for t = 1 ... T:
    # 1. Sampling
    Draw M samples {x^(s)} from p_θ(x) = |ψ(x;θ)|²
    # 2. Local statistics
    For each x^(s): compute E_loc^(s) = f[x^(s)]
        for i: O_i^(s) = ∂_{θ_i} log ψ(x^(s);θ)
    # 3. Estimate moments
    Compute averages ⟨E⟩, ⟨O_i⟩, ⟨O_i E⟩, ⟨O_i O_j⟩
    # 4. Build gradient and Fisher matrix
    g_i = ⟨O_i E⟩ - ⟨O_i⟩·⟨E⟩
    F_ij = ⟨O_i O_j⟩ - ⟨O_i⟩·⟨O_j⟩
    # 5. Regularization
    Solve (F + λ·I) δθ = g
    # 6. Update
    θ ← θ - η · δθ
```
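The pseudocode translates to a compact classical analogue. As a hedged illustration, the quantum ansatz is replaced here by a product-Bernoulli distribution whose log-derivatives $O_i = x_i - \mathrm{sigmoid}(\theta_i)$ are available in closed form; the function name, the ansatz choice, and the default constants are assumptions, not from (Zhao et al., 2020).

```python
import numpy as np

def nes_natural_step(theta, f, n_samples=500, eta=0.1, lam=1e-3, rng=None):
    """One natural-ES step for p_theta(x) = prod_i sigmoid(theta_i)^x_i (1-sigmoid(theta_i))^(1-x_i)."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = 1.0 / (1.0 + np.exp(-theta))
    x = (rng.random((n_samples, theta.size)) < p).astype(float)   # 1. sampling
    O = x - p                                  # 2. log-derivatives O_i^(s)
    E = np.array([f(xi) for xi in x])          # 2. local costs E^(s) = f[x^(s)]
    g = O.T @ (E - E.mean()) / n_samples       # 4. g_i = <O_i E> - <O_i><E>
    Obar = O.mean(axis=0)
    F = O.T @ O / n_samples - np.outer(Obar, Obar)  # 4. F_ij = <O_i O_j> - <O_i><O_j>
    dtheta = np.linalg.solve(F + lam * np.eye(theta.size), g)     # 5. (F + λI) δθ = g
    return theta - eta * dtheta                # 6. update
```

Iterating this step on the cost f(x) = sum(x) drives all Bernoulli probabilities toward zero, as the natural gradient normalizes each coordinate's raw gradient by its Fisher information.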
4. Memory, Computational Complexity, and Runtime
QES techniques are engineered to minimize memory and computational overhead, especially in large-scale or device-constrained deployments.
- Memory: For LLM fine-tuning in INT4, QES stores only the quantized weights themselves plus a compact history of RNG seeds and scalar fitness values bounded by the replay horizon, compared to the FP32 weights, optimizer state, and activations required by QAT or LoRA (Xu et al., 3 Feb 2026).
- Computation: QES requires one forward pass per population member per step. Seed replay adds recomputation of past perturbation gradients per update; the total overhead is typically 20–50% of generation cost but is parallelizable.
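The seed-replay idea behind this memory footprint can be illustrated as follows: because seeded RNG draws are deterministic, the high-precision error buffer never needs to be stored and can be rebuilt from a log of (seed, fitness) records. This is a hypothetical sketch; it uses plain deterministic rounding in place of the full method's stochastic rounding, and all names are assumptions.

```python
import numpy as np

def replay_buffer(history, shape, alpha=0.1, gamma=0.9, sigma=1.0):
    """Reconstruct the error buffer from (seeds, fitnesses) records instead
    of storing the high-precision buffer itself."""
    e = np.zeros(shape)
    for seeds, fits in history:                # one record per past update
        g = np.zeros(shape)
        for seed, F in zip(seeds, fits):
            rng = np.random.default_rng(seed)  # replay the exact perturbation
            delta = np.round(sigma * rng.standard_normal(shape))
            g += F * delta
        g /= len(seeds) * sigma
        u = alpha * g + gamma * e              # same recurrence as the live update
        e = u - np.round(u)
    return e
```

Replaying the same log always reproduces the same buffer, which is what makes the stateless formulation equivalent to storing it.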
- Complexity in Quantum/NES: Each iteration requires drawing the sample batch, accumulating the moment statistics, and solving the regularized Fisher system, which is cubic in the parameter count when solved directly (GPU-accelerated or via conjugate gradient for large problems) (Zhao et al., 2020).
- Multi-Level QES: By starting at coarse discretization, achieving rapid convergence on coarse, low-dimensional grids, and only refining resolution when warranted, the multilevel mechanism yields a substantial empirical reduction in function calls to reach target yield versus direct high-resolution search (Shir et al., 2020).
5. Empirical Performance and Evaluation
Quantized Evolution Strategies have demonstrated clear empirical advantages in multiple domains:
LLM Fine-tuning
On TinyZero arithmetic reasoning, QES achieves 2–3× the accuracy of state-of-the-art ZO fine-tuning (QuZO). For the 1.5B-parameter INT4 model, QES increases accuracy from 5.25% (QuZO) to 16.00%, and for the 3B-parameter INT8 model, from 15.85% (QuZO) to 37.40%. The method closely approaches the "oracle" that stores the full error buffer, demonstrating the effectiveness of the stateless seed-replay mechanism (Xu et al., 3 Feb 2026).
Quantum Control
In quantum control (maximizing two-photon absorption, rotational transfer, or molecular alignment), multi-level QES achieves yields up to the hardware resolution limits while minimizing the number of function calls (Shir et al., 2020). Starting from coarse control discretizations, the method efficiently scales to fine resolutions, transferring adapted strategy parameters between levels for rapid high-resolution optimization.
Quantum–Combinatorial Optimization
Quantum natural QES, with a complex-valued RBM variational ansatz, outperforms classical heuristics on Max-Cut, attaining a large fraction of the SDP upper-bound value, with larger sample batches conferring a higher approximation ratio at increased computational cost (Zhao et al., 2020).
6. Practical Considerations, Limitations, and Future Directions
QES provides the first systematic route to full-parameter, backpropagation-free training and fine-tuning on quantized, high-dimensional systems at inference-level memory usage (Xu et al., 3 Feb 2026). However:
- Supported Quantization: The present QES implementations assume linear symmetric integer (INT4/INT8) or W8A8 quantization; future extensions to non-uniform, mixed-precision, binary, or floating-point (e.g., FP4) formats remain open (Xu et al., 3 Feb 2026).
- Parameter Sensitivity: Performance depends critically on the replay horizon and the decay factor $\gamma$; adaptive schemes might better manage compute–memory–convergence trade-offs.
- Solution Trajectory Fidelity: The error-accumulation paradigm ensures discrete weights deviate by at most $1/2$ quantization unit from an idealized high-precision path.
- Algorithmic Extensions: QES can leverage batch-size ablation, natural gradient/Fisher-aware updates, and multilevel strategies for faster or more robust convergence.
- Computational Burden: In quantum-inspired combinatorial optimization, QES can require significantly longer runtime than SDP or other heuristics for large , although solution quality can be superior (Zhao et al., 2020).
QES methods continue to develop as an essential tool for high-dimensional discrete optimization, scalable quantized LLM deployment, fine-resolution scientific control, and quantum-inspired variational algorithms. Empirical and theoretical foundations support further expansion to broader architectures, quantization schemes, and hardware contexts.