Soft QD Using Approximated Diversity (SQUAD)

Updated 7 December 2025
  • The paper introduces SQUAD—a gradient-based, continuous quality-diversity optimization method that leverages a differentiable lower bound surrogate to maximize aggregate illumination over behavior space.
  • It employs kernel-based interactions and pairwise repulsion to balance quality rewards with diversity constraints, effectively scaling to high-dimensional and large-population problems.
  • Empirical evaluations on tasks such as LP, IC, and LSI demonstrate that SQUAD outperforms several state-of-the-art QD baselines on metrics such as QVS and QD-Score.

Soft QD Using Approximated Diversity (SQUAD) is a differentiable, population-based optimization algorithm that reframes Quality-Diversity (QD) as continuous attraction-repulsion in behavior space. SQUAD circumvents the need for explicit discretization of the behavior space, scaling efficiently to high dimensions and large populations while preserving or outperforming state-of-the-art QD benchmarks. The approach formalizes QD objectives as maximization of aggregate "illumination" from a set of solutions over an abstract behavior space, using kernel-based interactions and a tractable differentiable approximation amenable to gradient-based optimization (Hedayatian et al., 30 Nov 2025).

1. Soft QD Objective: Definition and Intuition

Let $\Theta = \{\theta_1, \ldots, \theta_N\}$ denote a population of parameter vectors, where $f(\theta) \in \mathbb{R}_+$ is a differentiable quality (objective) function and $\mathrm{desc}(\theta) \in \mathcal{B} \subseteq \mathbb{R}^d$ a differentiable behavior descriptor. With $f_n = f(\theta_n)$ and $b_n = \mathrm{desc}(\theta_n)$, each solution is treated as an isotropic Gaussian "light source" in behavior space, its "brightness" $f_n$ decaying with bandwidth $\sigma > 0$.

The induced behavior-value field is

$$v_{\Theta}(b) = \max_{1 \leq n \leq N} f_n \exp\left(-\frac{\|b - b_n\|^2}{2\sigma^2}\right),$$

and the Soft QD Score is defined as the total illumination:

$$S(\Theta) = \int_{\mathbb{R}^d} v_{\Theta}(b) \, db.$$
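To make the definition concrete, here is a minimal Monte Carlo sketch (a hypothetical helper, not from the paper) that estimates $S(\Theta)$ for a small population. For two well-separated unit-quality solutions in $d=1$, each Gaussian contributes its full mass $\sqrt{2\pi}\,\sigma$, so the estimate should be close to $2\sqrt{2\pi}\,\sigma$.

```python
import numpy as np

def soft_qd_score_mc(F, B, sigma, n_samples=200_000, pad=4.0, seed=0):
    """Monte Carlo estimate of S(Theta) = integral of the behavior-value
    field v_Theta(b) = max_n f_n * exp(-||b - b_n||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=float)          # (N, d) behavior descriptors
    F = np.asarray(F, dtype=float)          # (N,) nonnegative qualities
    lo = B.min(axis=0) - pad * sigma        # integration box covering the sources
    hi = B.max(axis=0) + pad * sigma
    samples = rng.uniform(lo, hi, size=(n_samples, B.shape[1]))
    sq = ((samples[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # (S, N)
    v = (F[None, :] * np.exp(-sq / (2 * sigma**2))).max(axis=1)
    volume = np.prod(hi - lo)
    return volume * v.mean()

# Two well-separated unit-quality solutions in d=1.
sigma = 0.5
S = soft_qd_score_mc(F=[1.0, 1.0], B=[[-5.0], [5.0]], sigma=sigma)
print(S)  # close to 2 * sqrt(2*pi) * 0.5 (about 2.51)
```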

Direct optimization of $S(\Theta)$ is intractable, so SQUAD proceeds via a tractable lower bound. By applying inclusion-exclusion, truncating at pairwise terms, and bounding $\min(x, y)$ by $\sqrt{xy}$, one obtains

$$\widetilde{S}(\Theta) = \sum_{n=1}^N f_n - \sum_{1 \leq i < j \leq N} \sqrt{f_i f_j} \exp\left(-\frac{\|b_i - b_j\|^2}{\gamma^2}\right),$$

where $\gamma^2 = 8\sigma^2$.

  • The sum of $f_n$ rewards high-quality solutions.
  • The pairwise repulsion term, exponentially decaying with behavioral distance and weighted by $\sqrt{f_i f_j}$, enforces diversity.
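As an illustration, the surrogate can be evaluated in closed form. The sketch below (the function name is my own, not from the paper) shows the repulsion term charging a coincident pair the full $\sqrt{f_i f_j}$ penalty while a well-separated population recovers $\sum_n f_n$.

```python
import numpy as np

def soft_qd_lower_bound(F, B, gamma2):
    """Lower-bound surrogate: sum of qualities minus pairwise repulsion
    sqrt(f_i f_j) * exp(-||b_i - b_j||^2 / gamma^2) over i < j."""
    F = np.asarray(F, float)
    B = np.asarray(B, float)
    D2 = ((B[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # (N, N) squared distances
    R = np.sqrt(np.outer(F, F)) * np.exp(-D2 / gamma2)
    repulsion = (R.sum() - np.trace(R)) / 2.0             # sum over i < j only
    return F.sum() - repulsion

F = [1.0, 1.0, 2.0]
# Two coincident solutions plus one far away: the coincident pair pays the
# full sqrt(f_i f_j) = 1 penalty; the distant solution pays essentially 0.
print(soft_qd_lower_bound(F, [[0.0], [0.0], [100.0]], gamma2=1.0))  # ~ 4 - 1 = 3
# Well-separated solutions: repulsion vanishes and S_tilde -> sum(F) = 4.
print(soft_qd_lower_bound(F, [[0.0], [10.0], [20.0]], gamma2=1.0))  # ~ 4
```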

2. Derivation, Differentiability, and Limit Properties

The lower bound $\widetilde{S}(\Theta)$ derives from the inclusion-exclusion form:

$$\int \max_n g_n(b) \, db = \sum_i \int g_i \, db - \sum_{i<j} \int \min(g_i, g_j) \, db + \ldots$$

For $g_n(b) = f_n \exp(-\|b - b_n\|^2 / (2\sigma^2))$, the $\min$ term is tightly upper bounded by the geometric mean, which admits closed-form Gaussian integrals.

The resulting objective, dropping constant factors and using $\gamma^2 = 8\sigma^2$:

$$\widetilde{S}(\Theta) = \sum_{n=1}^N f_n - \sum_{1 \leq i < j \leq N} \sqrt{f_i f_j} \exp\left(-\frac{\|b_i - b_j\|^2}{\gamma^2}\right).$$

If $f$ and $\mathrm{desc}$ are differentiable, so is $\widetilde{S}(\Theta)$. The per-solution gradient is
$$\frac{\partial \widetilde{S}}{\partial \theta_n} = \frac{\partial f_n}{\partial \theta_n} - \sum_{j \neq n} e^{-\frac{\|b_n - b_j\|^2}{\gamma^2}} \left[ \frac{1}{2} \sqrt{\frac{f_j}{f_n}} \frac{\partial f_n}{\partial \theta_n} - \frac{2}{\gamma^2} \sqrt{f_n f_j} \, (b_n - b_j)^T \frac{\partial b_n}{\partial \theta_n} \right],$$
where the $\tfrac{1}{2}$ arises from differentiating $\sqrt{f_n f_j}$ and applies only to the quality term.
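The gradient can be sanity-checked against finite differences on a toy problem. Here $f(\theta) = 1 + \theta^2$ and an identity descriptor are illustrative choices of mine, not from the paper.

```python
import numpy as np

def s_tilde(theta, gamma2):
    f = 1.0 + theta**2            # toy differentiable quality (positive)
    b = theta                     # toy behavior descriptor: identity, d = 1
    D2 = (b[:, None] - b[None, :])**2
    R = np.sqrt(np.outer(f, f)) * np.exp(-D2 / gamma2)
    return f.sum() - (R.sum() - np.trace(R)) / 2.0

def grad_s_tilde(theta, gamma2):
    """Analytic per-solution gradient of s_tilde for the toy f and descriptor."""
    f = 1.0 + theta**2
    df = 2.0 * theta              # df/dtheta
    b, db = theta, np.ones_like(theta)
    g = df.copy()
    for n in range(len(theta)):
        for j in range(len(theta)):
            if j == n:
                continue
            e = np.exp(-(b[n] - b[j])**2 / gamma2)
            g[n] -= e * (0.5 * np.sqrt(f[j] / f[n]) * df[n]
                         - (2.0 / gamma2) * np.sqrt(f[n] * f[j]) * (b[n] - b[j]) * db[n])
    return g

theta = np.array([0.3, -0.7, 1.1])
gamma2 = 1.0
ana = grad_s_tilde(theta, gamma2)
eps = 1e-6
num = np.array([(s_tilde(theta + eps * np.eye(3)[k], gamma2)
                 - s_tilde(theta - eps * np.eye(3)[k], gamma2)) / (2 * eps)
                for k in range(3)])
print(np.max(np.abs(ana - num)))  # close to 0
```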

Appendix proofs confirm key theoretical properties:

  • $S(\Theta)$ is nondecreasing under addition of new solutions or increases in $f_n$.
  • $S(\Theta)$ is submodular; marginal gains diminish as the population grows.
  • In the limit $\sigma \to 0$ (equivalently $\gamma^2 \to 0$), $S(\Theta)/(2\pi\sigma^2)^{d/2} \to \sum_n f_n$; i.e., soft QD recovers the canonical QD-Score on a fine grid.
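The monotonicity property can be spot-checked numerically in $d=1$ with simple grid quadrature (a toy check of mine, not from the paper):

```python
import numpy as np

def soft_qd_score(F, B, sigma, grid):
    """Grid-quadrature approximation of S(Theta) in d = 1."""
    v = np.max(np.asarray(F, float)[:, None]
               * np.exp(-(grid[None, :] - np.asarray(B, float)[:, None]) ** 2
                        / (2 * sigma ** 2)), axis=0)
    return v.sum() * (grid[1] - grid[0])   # Riemann-sum approximation

grid = np.linspace(-10.0, 10.0, 20001)
sigma = 0.5
# Adding a third solution can only add illumination, so S must not decrease.
S_old = soft_qd_score([1.0, 0.5], [-2.0, 1.0], sigma, grid)
S_new = soft_qd_score([1.0, 0.5, 0.8], [-2.0, 1.0, 3.0], sigma, grid)
print(S_new >= S_old)  # True
```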

3. SQUAD Algorithm and Optimization

The SQUAD algorithm optimizes the lower bound $\widetilde{S}(\Theta)$ using mini-batch stochastic gradient ascent. The procedure is as follows:

  • Inputs: population size $N$, batch size $M$, neighbor count $K$, iterations $T$, diversity bandwidth $\gamma^2$, optimizer $O$ (e.g., Adam with learning rate $\eta$).
  • Initialization: sample $\Theta = \{\theta_1, \ldots, \theta_N\}$; compute qualities $F = (f_i)$ and behaviors $B = (b_i)$; initialize optimizer state $S$.
  • Loop: for $t = 1, \ldots, T$:
    • Choose a batch $I \subset \{1, \ldots, N\}$ of size $M$.
    • For each $i \in I$, identify its $K$ nearest neighbors $N_i$ in behavior space.
    • Compute the batch objective
      $$\widetilde{S}_I = \sum_{i \in I} f_i - \frac{1}{2} \sum_{i \in I,\, j \in N_i} \sqrt{f_i f_j} \exp\left(-\|b_i - b_j\|^2 / \gamma^2\right).$$
    • Update $\{\theta_i : i \in I\}$ using $\nabla_{\theta_I} \widetilde{S}_I$ via $O$; re-evaluate $(F_I, B_I)$.
  • Termination: return $\Theta$ at the final iteration.

Key hyperparameters: $N$, $M$, $K$, $\gamma^2$, $\eta$, and the Gaussian kernel width $\sigma$ if computing $S(\Theta)$ directly.
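The loop above can be sketched in a minimal form. The setup below assumes constant quality $f \equiv 1$ and an identity descriptor so that only the repulsion term is active; all names and the toy configuration are illustrative, not from the paper. Starting from a tight cluster, the k-NN repulsion gradient spreads the population out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: f = 1 and desc(theta) = theta, so S_tilde reduces to
# N - sum_{i<j} exp(-||theta_i - theta_j||^2 / gamma^2).
N, d, M, K, T = 64, 2, 16, 8, 300
gamma2, lr = 0.5, 0.05
theta = rng.normal(scale=0.05, size=(N, d))   # tightly clustered start

def mean_pairwise_dist(x):
    D = np.sqrt(((x[:, None] - x[None, :]) ** 2).sum(-1))
    return D.sum() / (len(x) * (len(x) - 1))

before = mean_pairwise_dist(theta)
for t in range(T):
    I = rng.choice(N, size=M, replace=False)       # mini-batch
    for i in I:
        diff = theta[i] - theta                    # (N, d)
        d2 = (diff ** 2).sum(-1)
        nbrs = np.argsort(d2)[1:K + 1]             # K nearest neighbors (skip self)
        # gradient of -sum_j exp(-||b_i - b_j||^2 / gamma^2) w.r.t. theta_i
        w = np.exp(-d2[nbrs] / gamma2)
        grad = (2.0 / gamma2) * (w[:, None] * diff[nbrs]).sum(0)
        theta[i] += lr * grad                      # plain gradient-ascent step
after = mean_pairwise_dist(theta)
print(before < after)  # True: repulsion spreads the population out
```

A full implementation would use an adaptive optimizer (e.g., Adam, as the paper suggests) and re-evaluate qualities and behaviors after each batch.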

4. Theoretical Properties and Scalability

SQUAD inherits the following properties:

  • Monotonicity: $S(\Theta)$ is nondecreasing as new solutions are added or $f_n$ is increased.
  • Submodularity: the diminishing-returns property enables approximate optimality guarantees under cardinality constraints.
  • Limiting behavior: for $\gamma \to 0$, SQUAD converges to standard QD-Score maximization over a fine grid.
  • Curse-of-dimensionality avoidance: no discretization or archive is required, and the continuous, kernel-based repulsion keeps memory requirements invariant to the behavior-space dimension $d$.
  • Approximation error: contributions from the neglected higher-order overlaps in the inclusion-exclusion expansion are bounded and decay as behavioral coverage increases.

These properties collectively underpin SQUAD’s ability to scale to high-dimensional behavior spaces and large solution populations.

5. Empirical Evaluation

SQUAD's performance was benchmarked on diverse tasks and compared to existing methods including CMA-MEGA, CMA-MAEGA, Sep-CMA-MAE, GA-ME, DNS, and DNS-G.

Tasks and Metrics

  • LP: Linear-projection Rastrigin, solution dimension $n = 1024$, behavior dimensions $d = 4, 8, 16$.
  • IC: Image composition (1024 circles, 5-d behavior space).
  • LSI: Latent-space illumination via StyleGAN2+CLIP, $d = 2$ and $d = 7$.

Metrics included QD-Score (sum of best-in-cell values over a CVT archive), coverage (number of occupied CVT cells), Vendi Score (VS; effective number of clusters), QVS (mean quality × VS), and mean and max objective.

Outcome Summary

  • LP: At $d=4$, CMA-MAEGA/CMA-MEGA slightly outperform SQUAD, but SQUAD surpasses all baselines in both QVS and QD-Score at $d=8$ and $d=16$ (for example, at $d=16$, QVS: SQUAD $\approx 6.6$, CMA-MAEGA $\approx 4.6$, CMA-MEGA $\approx 3.8$). Gradient-based methods, including SQUAD, outperform mutation-only approaches at higher dimensionalities.
  • IC: SQUAD achieves the highest mean objective ($83.4 \pm 0.02$), highest max objective ($93.6 \pm 0.10$), and best VS ($5.49 \pm 0.00$). Coverage is slightly below CMA-MAEGA (5.68 vs 5.85), but VS more accurately reflects true diversity.
  • Quality-diversity trade-off: varying $\gamma^2$ in $[10^{-3}, 50]$ shows a tunable trade-off: higher diversity (VS) for larger $\gamma^2$, at the expense of mean objective.
  • LSI: In the base task ($d=2$), SQUAD reaches QD-Score $\approx 13.4 \times 10^3$ and QVS $\approx 177$, surpassing CMA-MEGA ($8.7 \times 10^3$ / $140$) and CMA-MAEGA ($6.8 \times 10^3$ / $122$). In the hard task ($d=7$), SQUAD ($2.55 \times 10^3$ / $151$) beats the best baseline ($0.39 \times 10^3$ / $99$). Other methods often fail outright (negative mean objectives).

6. Implementation and Practical Considerations

Default Hyperparameters

  • IC/LP: $N=1024$, $M=64$, $K=16$, $\eta=0.05$, $\gamma^2$ tuned per domain (IC: $\gamma^2=1$; LP easy/medium/hard: $\gamma^2 = 0.1/0.5/1.0$).
  • LSI: $N=256$, $M=8$, $K=16$, $\eta=0.1$, $\gamma^2 = 0.01$ to $0.1$.

Algorithmic Details

  • If $\mathcal{B} = [0,1]^d$, behaviors should be mapped to $\mathbb{R}^d$ via $\operatorname{logit}(b)$; ablation indicates this is critical.
  • Each batch update computes $O(MK)$ pairwise terms and $O(M)$ quality terms, with overall iteration cost $O(NK)$.
  • Mini-batching and a limited number of nearest neighbors keep memory usage low.
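A minimal sketch of the logit mapping described above (the clipping constant `eps`, which guards against infinities at the boundary, is my assumption, not from the paper):

```python
import numpy as np

def logit_map(b, eps=1e-6):
    """Map behaviors from [0,1]^d to R^d via logit, clipping near 0 and 1."""
    b = np.clip(np.asarray(b, float), eps, 1.0 - eps)
    return np.log(b / (1.0 - b))

def sigmoid(x):
    """Inverse of the logit map."""
    return 1.0 / (1.0 + np.exp(-x))

b = np.array([0.1, 0.5, 0.9])
x = logit_map(b)                      # unbounded coordinates for optimization
print(np.allclose(sigmoid(x), b))     # True: round-trips back into [0,1]
```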

Implementation Tips

  • Automatic differentiation frameworks (JAX, PyTorch, etc.) are recommended for both objective and descriptor.
  • Precompute and cache k-NN structures in behavior space.
  • Annealing $\gamma^2$ or adapting $K$ to local density can improve performance.
  • Monitor VS, coverage, and mean objective during optimization; early stopping is often effective (e.g., ~200 iterations for IC/LSI).
  • Ensure $f(\theta)$ is nonnegative for meaningful QVS evaluation.

Computational Cost

  • Simple tasks (LP) complete in under 1 minute.
  • IC requires ~190 minutes for 1000 iterations (RTX 4090), but high performance is typically attained in fewer than 200 iterations.
  • LSI (base/hard): ~730/1300 minutes, with convergence in substantially less than the full budget.

7. Significance in Quality-Diversity Optimization

SQUAD provides an alternative to archive-based QD: it offers a smooth, differentiable objective and adaptively balances quality and diversity through a tunable, analytically tractable surrogate. This formulation permits large-scale, high-dimensional QD optimization previously infeasible with grid-based methods. Empirical evidence demonstrates competitiveness and often superiority versus established QD algorithms on standard benchmarks, with additional robustness and scalability. These features make SQUAD a theoretically well-founded and practically effective approach for large-scale, high-diversity optimization tasks (Hedayatian et al., 30 Nov 2025).
