Quantized Evolution Strategies (QES)
- Quantized Evolution Strategies (QES) are derivative‐free optimization methods designed for discrete or quantized parameter spaces using error accumulation and adaptive multilevel search.
- They enable applications such as direct fine-tuning of quantized large language models and quantum combinatorial optimization with reduced memory and computational overhead.
- Empirical results show QES can achieve 2–3× accuracy improvements in LLM tuning and near-optimal yields in quantum control while maintaining practical runtime performance.
Quantized Evolution Strategies (QES) comprise a family of derivative-free optimization methods specifically designed for discrete or quantized parameter spaces, where standard gradient-based methods are either undefined or practically ineffective. QES unifies several innovations to enable efficient, large-scale black-box optimization in high-dimensional discrete domains. They are critical for modern applications such as direct fine-tuning of post-training quantized LLMs, multi-resolution control in quantum systems, and combinatorial optimization with quantum-inspired natural gradient methods. QES is characterized by: (1) operating directly on quantized parameters, (2) maintaining effective search dynamics on discrete grids via error-accumulation mechanisms or multilevel resolution, and (3) offering practical runtime and memory characteristics suitable for resource-constrained settings (Shir et al., 2020, Xu et al., 3 Feb 2026, Zhao et al., 2020).
1. Motivation and Core Challenges
Optimization on quantized or discrete spaces arises when either parameters are physically constrained (e.g., integer weights in LLMs, pixelated control fields for quantum experiments), or design choices (e.g., post-training quantization) preclude the use of continuous relaxations. In these regimes, backpropagation and gradient-based fine-tuning are often infeasible due to:
- Non-differentiability: The quantization operator is piecewise constant, yielding gradients that are zero almost everywhere. The straight-through estimator (STE) is only an approximation and fails to capture true discrete dynamics.
- Memory Limitations: Quantized hardware prohibits storage of high-precision activations and optimizer state. Full-precision optimizers are incompatible with on-device training or fine-tuning of large models.
- Stagnation and Discretization Error: Small zeroth-order (ZO) steps tend to round to zero, leading to stagnation, and rounding noise can overwhelm gradient signals over time (Xu et al., 3 Feb 2026).
QES methods address these issues through discrete-aware gradient estimation, error accumulation, and level-wise adaptive search.
2. Algorithmic Frameworks
Quantized Evolution Strategies can be instantiated in several domains, each tailored to the geometry of the quantized parameter space and the target application.
2.1 Multi-Level Quantized ES for Black-Box Control
For problems defined on uniform quantization grids, the multi-level mechanism introduces a hierarchy of resolution levels $\ell = 1, \dots, L$, refining the quantization step $\Delta_\ell$ at each level. Candidate solutions are quantized at each level via a coordinate-wise operator, e.g.
$$[Q_{\Delta_\ell}(x)]_j = \Delta_\ell \,\mathrm{round}(x_j / \Delta_\ell).$$
A stack of ES runs at successive levels, with promotion from level $\ell$ to $\ell+1$ triggered by exhaustion of the per-level evaluation budget, attainment of a target fitness, or detection of stagnation of the best fitness over a trailing window of evaluations.
The algorithm adapts strategy parameters (center, step-size, covariance or its diagonal) across levels, interpolates and rescales them as resolution increases, and uses sphere-model theory to adjust for new dimensions. Detailed update routines track CMA-ES dynamics or single-step “elitist” updates depending on instantiation (Shir et al., 2020).
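The level-wise mechanism can be sketched in code. This is a minimal illustration only: the quantization operator, the simple (1+1)-ES inner loop with a 1/5th-rule-style step adaptation, and promotion by budget exhaustion stand in for the full CMA-ES-based instantiation; all names and constants are assumptions.

```python
import numpy as np

def quantize(x, delta):
    """Coordinate-wise quantization onto a uniform grid with step delta."""
    return delta * np.round(x / delta)

def multilevel_es(f, x0, deltas, budget_per_level=2000, sigma0=1.0, rng=None):
    """Minimize f with a (1+1)-ES run at successively finer grid steps.

    deltas is a decreasing sequence of quantization steps; the search
    center and step size carry over when the level is promoted.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x, sigma = np.asarray(x0, dtype=float), sigma0
    for delta in deltas:                       # one ES run per resolution level
        x = quantize(x, delta)                 # re-quantize center on the new grid
        fx = f(x)
        for _ in range(budget_per_level):
            cand = quantize(x + sigma * rng.standard_normal(x.shape), delta)
            fc = f(cand)
            if fc < fx:                        # success: accept and grow the step
                x, fx, sigma = cand, fc, sigma * 1.1
            else:                              # failure: shrink the step
                sigma *= 0.98
        sigma = max(sigma, delta)              # keep steps resolvable on the finer grid
    return x, fx
```

On a simple sphere objective the coarse level finds the basin cheaply and the finer levels only refine, which is the intended division of labor.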
2.2 QES for Direct Fine-tuning of Quantized LLMs
Given integer-quantized weights $W_t$, QES estimates ZO gradients from $N$ forward evaluations:
$$\hat{g}_t = \frac{1}{N\sigma} \sum_{i=1}^{N} F_i\, \delta_i,$$
where the $\delta_i$ are discrete Gaussian perturbations produced by stochastic (Bernoulli) rounding, followed by integer gating to ensure legal quantized weights, and $F_i$ is the scalar fitness of the $i$-th perturbed model. To overcome stagnation, QES maintains a high-precision error buffer updated via
\begin{align*}
u_t &= \alpha \hat{g}_t + \gamma e_{t-1}, \\
\Delta W_t &= \mathrm{Round}(u_t), \\
e_t &= u_t - \Delta W_t, \\
W_{t+1} &= W_t + \Delta W_t,
\end{align*}
with learning rate $\alpha$ and decay $\gamma$. High-precision updates accumulate until they are large enough to trigger a quantized step, mimicking $\Sigma\Delta$-modulation and recovering high-precision convergence behavior in a discrete system (Xu et al., 3 Feb 2026).
2.3 Quantum Natural Evolution Strategies
In quantum combinatorial optimization, a parameterized quantum state $|\psi(\theta)\rangle$ defines a search distribution $p_\theta(x) = |\psi(x;\theta)|^2$. The natural gradient is computed in the Fubini–Study metric, involving the quantum Fisher matrix $F$ with entries $F_{ij} = \langle O_i O_j \rangle - \langle O_i \rangle \langle O_j \rangle$, where $O_i = \partial_{\theta_i} \log \psi(x;\theta)$. The parameter update is
$$\theta \leftarrow \theta - \eta\, (F + \lambda I)^{-1} g,$$
driven by samples $x^{(s)} \sim p_\theta$ and cost observables $E^{(s)} = f[x^{(s)}]$, where $g_i = \langle O_i E \rangle - \langle O_i \rangle \langle E \rangle$ (Zhao et al., 2020). This generalizes classical NES to the quantum regime and is effectively a quantized ES in expressive, non-commutative domains.
3. Mathematical Formulation and Pseudocode
The specific mathematical schemes governing QES vary by domain. Representative forms include:
Zeroth-Order ES Gradient for Discrete Spaces
For quantized forward perturbations:
\begin{align*}
\delta_i &= \lfloor \sigma \epsilon_i \rfloor + b_i, \quad b_{ij} \sim \mathrm{Bernoulli}(\{\sigma \epsilon_{ij}\}), \\
\hat{g}_t &= \frac{1}{N\sigma} \sum_{i=1}^{N} F_i \cdot \delta_i,
\end{align*}
where $F_i$ is the reward of the $i$-th perturbed model, $\epsilon_i$ is standard Gaussian noise, and $\{\cdot\}$ denotes the fractional part.
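A minimal sketch of this estimator follows. The stochastic rounding keeps the perturbation integer-valued while remaining unbiased (its expectation equals $\sigma\epsilon$); the plain sampling loop and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def discrete_perturbation(shape, sigma, rng):
    """Stochastically round Gaussian noise to integers: take the floor and
    add a Bernoulli draw on the fractional part, so E[delta] = sigma * eps."""
    z = sigma * rng.standard_normal(shape)
    base, frac = np.floor(z), z - np.floor(z)
    return (base + (rng.random(shape) < frac)).astype(np.int64)

def zo_gradient(fitness, w, sigma=1.0, n_samples=8, rng=None):
    """ZO estimate g = (1/(N*sigma)) * sum_i F_i * delta_i from N forward passes."""
    if rng is None:
        rng = np.random.default_rng(0)
    g = np.zeros(w.shape)
    for _ in range(n_samples):
        delta = discrete_perturbation(w.shape, sigma, rng)
        g += fitness(w + delta) * delta       # one forward evaluation per sample
    return g / (n_samples * sigma)
```

With enough samples the estimate aligns with the true gradient direction, e.g. it is positive in coordinates where increasing the weight increases a quadratic loss.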
Error-Feedback Accumulation Update (QES for LLMs)
\begin{align*}
u_t &= \alpha \hat{g}_t + \gamma e_{t-1}, \\
\Delta W_t &= \mathrm{Round}(u_t), \\
e_t &= u_t - \Delta W_t, \\
W_{t+1} &= W_t + \Delta W_t.
\end{align*}
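The recurrence is straightforward to implement; this sketch assumes NumPy's `round` (ties to even) as the Round operator and treats the weights as a plain integer array.

```python
import numpy as np

def error_feedback_step(w, g_hat, e_prev, alpha=0.1, gamma=0.9):
    """One QES update: accumulate a high-precision buffer, round it onto the
    integer grid, apply the rounded step, and carry the residual forward."""
    u = alpha * g_hat + gamma * e_prev        # u_t = alpha * g_t + gamma * e_{t-1}
    dw = np.round(u)                          # Delta W_t = Round(u_t)
    e = u - dw                                # e_t = u_t - Delta W_t, |e_t| <= 1/2
    return w + dw.astype(w.dtype), e          # W_{t+1} = W_t + Delta W_t
```

Repeated sub-unit updates accumulate in the buffer until a full quantized step fires, so a constant gradient of effective magnitude 0.3 produces roughly three unit steps over ten iterations rather than stagnating at zero.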
Multi-level QES Promotion (Black-box control)
Upon promotion from resolution level $\ell$ to $\ell+1$, the search center is re-quantized on the finer grid and the step-size and covariance are rescaled according to the ratio of quantization steps, where the number of quantized bins at level $\ell$ determines the grid step $\Delta_\ell$.
Pseudocode
For QES in quantum combinatorial optimization (Zhao et al., 2020):
```
initialize θ = θ₀
for t = 1 ... T:
    # 1. Sampling
    Draw M samples {x^(s)} from p_θ(x) = |ψ(x;θ)|²
    # 2. Local statistics
    For each x^(s): compute E_loc^(s) = f[x^(s)]
        for i: O_i^(s) = ∂_{θ_i} log ψ(x^(s);θ)
    # 3. Estimate moments
    Compute averages ⟨E⟩, ⟨O_i⟩, ⟨O_i E⟩, ⟨O_i O_j⟩
    # 4. Build gradient and Fisher matrix
    g_i = ⟨O_i E⟩ - ⟨O_i⟩·⟨E⟩
    F_ij = ⟨O_i O_j⟩ - ⟨O_i⟩·⟨O_j⟩
    # 5. Regularization
    Solve (F + λ·I) δθ = g
    # 6. Update
    θ ← θ - η · δθ
```
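The pseudocode translates to a compact classical analogue. As a hedged illustration, the quantum ansatz is replaced here by a product-Bernoulli distribution whose log-derivatives $O_i = x_i - \mathrm{sigmoid}(\theta_i)$ are available in closed form; the function name, the ansatz choice, and the default constants are assumptions, not from (Zhao et al., 2020).

```python
import numpy as np

def nes_natural_step(theta, f, n_samples=500, eta=0.1, lam=1e-3, rng=None):
    """One natural-ES step for p_theta(x) = prod_i sigmoid(theta_i)^x_i (1-sigmoid(theta_i))^(1-x_i)."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = 1.0 / (1.0 + np.exp(-theta))
    x = (rng.random((n_samples, theta.size)) < p).astype(float)   # 1. sampling
    O = x - p                                  # 2. log-derivatives O_i^(s)
    E = np.array([f(xi) for xi in x])          # 2. local costs E^(s) = f[x^(s)]
    g = O.T @ (E - E.mean()) / n_samples       # 4. g_i = <O_i E> - <O_i><E>
    Obar = O.mean(axis=0)
    F = O.T @ O / n_samples - np.outer(Obar, Obar)  # 4. F_ij = <O_i O_j> - <O_i><O_j>
    dtheta = np.linalg.solve(F + lam * np.eye(theta.size), g)     # 5. (F + λI) δθ = g
    return theta - eta * dtheta                # 6. update
```

Iterating this step on the cost f(x) = sum(x) drives all Bernoulli probabilities toward zero, as the natural gradient normalizes each coordinate's raw gradient by its Fisher information.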
4. Memory, Computational Complexity, and Runtime
QES techniques are engineered to minimize memory and computational overhead, especially in large-scale or device-constrained deployments.
- Memory: For LLM fine-tuning in INT4, QES stores only the quantized weights themselves plus a compact history of RNG seeds and scalar fitness values bounded by the replay horizon, compared to the FP32 weights, optimizer state, and activations required by QAT or LoRA (Xu et al., 3 Feb 2026).
- Computation: QES requires one forward pass per population member per step. Seed replay adds recomputation of past perturbation gradients per update; the total overhead is typically 20–50% of generation cost but is parallelizable.
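The seed-replay idea behind this memory footprint can be illustrated as follows: because seeded RNG draws are deterministic, the high-precision error buffer never needs to be stored and can be rebuilt from a log of (seed, fitness) records. This is a hypothetical sketch; it uses plain deterministic rounding in place of the full method's stochastic rounding, and all names are assumptions.

```python
import numpy as np

def replay_buffer(history, shape, alpha=0.1, gamma=0.9, sigma=1.0):
    """Reconstruct the error buffer from (seeds, fitnesses) records instead
    of storing the high-precision buffer itself."""
    e = np.zeros(shape)
    for seeds, fits in history:                # one record per past update
        g = np.zeros(shape)
        for seed, F in zip(seeds, fits):
            rng = np.random.default_rng(seed)  # replay the exact perturbation
            delta = np.round(sigma * rng.standard_normal(shape))
            g += F * delta
        g /= len(seeds) * sigma
        u = alpha * g + gamma * e              # same recurrence as the live update
        e = u - np.round(u)
    return e
```

Replaying the same log always reproduces the same buffer, which is what makes the stateless formulation equivalent to storing it.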
- Complexity in Quantum/NES: Each iteration requires drawing the sample batch, accumulating the moment statistics, and solving the regularized Fisher system, which is cubic in the parameter count when solved directly (GPU-accelerated or via conjugate gradient for large problems) (Zhao et al., 2020).
- Multi-Level QES: By starting at coarse discretization, achieving rapid convergence on coarse, low-dimensional grids, and only refining resolution when warranted, the multilevel mechanism yields a substantial empirical reduction in function calls to reach target yield versus direct high-resolution search (Shir et al., 2020).
5. Empirical Performance and Evaluation
Quantized Evolution Strategies have demonstrated clear empirical advantages in multiple domains:
LLM Fine-tuning
On TinyZero arithmetic reasoning, QES achieves 2–3× the accuracy of state-of-the-art ZO fine-tuning (QuZO). For the 1.5B-parameter INT4 model, QES increases accuracy from 5.25% (QuZO) to 16.00%, and for the 3B-parameter INT8 model, from 15.85% (QuZO) to 37.40%. The method closely approaches the "oracle" that stores the full error buffer, demonstrating the effectiveness of the stateless seed-replay mechanism (Xu et al., 3 Feb 2026).
Quantum Control
In quantum control (maximizing two-photon absorption, rotational transfer, or molecular alignment), multi-level QES achieves yields up to the hardware resolution limits while minimizing the number of function calls (Shir et al., 2020). Starting from coarse control discretizations, the method efficiently scales to fine resolutions, transferring adapted strategy parameters between levels for rapid high-resolution optimization.
Quantum–Combinatorial Optimization
Quantum natural QES, with a complex-valued RBM variational ansatz, outperforms classical heuristics on Max-Cut, attaining a large fraction of the SDP upper-bound value, with larger sample batches conferring a higher approximation ratio at increased computational cost (Zhao et al., 2020).
6. Practical Considerations, Limitations, and Future Directions
QES provides the first systematic route to full-parameter, backpropagation-free training and fine-tuning on quantized, high-dimensional systems at inference-level memory usage (Xu et al., 3 Feb 2026). However:
- Supported Quantization: The present QES implementations assume linear symmetric integer (INT4/INT8) or W8A8 quantization; future extensions to non-uniform, mixed-precision, binary, or floating-point (e.g., FP4) formats remain open (Xu et al., 3 Feb 2026).
- Parameter Sensitivity: Performance depends critically on the replay horizon and the decay factor $\gamma$; adaptive schemes might better manage compute–memory–convergence trade-offs.
- Solution Trajectory Fidelity: The error-accumulation paradigm ensures discrete weights deviate by at most $1/2$ quantization unit from an idealized high-precision path.
- Algorithmic Extensions: QES can leverage batch-size ablation, natural gradient/Fisher-aware updates, and multilevel strategies for faster or more robust convergence.
- Computational Burden: In quantum-inspired combinatorial optimization, QES can require significantly longer runtime than SDP or other heuristics for large , although solution quality can be superior (Zhao et al., 2020).
QES methods continue to develop as an essential tool for high-dimensional discrete optimization, scalable quantized LLM deployment, fine-resolution scientific control, and quantum-inspired variational algorithms. Empirical and theoretical foundations support further expansion to broader architectures, quantization schemes, and hardware contexts.