Soft QD Using Approximated Diversity (SQUAD)
- The paper introduces SQUAD—a gradient-based, continuous quality-diversity optimization method that leverages a differentiable lower bound surrogate to maximize aggregate illumination over behavior space.
- It employs kernel-based interactions and pairwise repulsion to balance quality rewards with diversity constraints, effectively scaling to high-dimensional and large-population problems.
- Empirical evaluations on tasks such as LP, IC, and LSI demonstrate that SQUAD outperforms several state-of-the-art QD baselines, achieving superior QVS and QD-Score.
Soft QD Using Approximated Diversity (SQUAD) is a differentiable, population-based optimization algorithm that reframes Quality-Diversity (QD) as continuous attraction-repulsion in behavior space. SQUAD circumvents the need for explicit discretization of the behavior space, scaling efficiently to high dimensions and large populations while matching or outperforming state-of-the-art QD baselines. The approach formalizes QD objectives as maximization of aggregate "illumination" from a set of solutions over an abstract behavior space, using kernel-based interactions and a tractable differentiable approximation amenable to gradient-based optimization (Hedayatian et al., 30 Nov 2025).
1. Soft QD Objective: Definition and Intuition
Let $\Theta = \{\theta_1, \dots, \theta_n\} \subset \mathbb{R}^d$ denote a population of parameter vectors, where $f : \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ is a differentiable quality (objective) function and $b : \mathbb{R}^d \to \mathbb{R}^m$ a differentiable behavior descriptor. With $f_i = f(\theta_i)$ and $b_i = b(\theta_i)$, each solution is treated as an isotropic Gaussian "light source" in behavior space, its "brightness" $f_i$ decaying with bandwidth $\sigma$.
The induced behavior-value field is
$$V(z) = \max_{1 \le i \le n} f_i \exp\!\left(-\frac{\lVert z - b_i \rVert^2}{2\sigma^2}\right),$$
and the Soft QD Score is defined as the total illumination:
$$S(\Theta) = \int_{\mathbb{R}^m} V(z)\, dz.$$
Direct optimization of $S(\Theta)$ is intractable, so SQUAD proceeds via a tractable lower bound. By applying inclusion-exclusion, truncating at pairwise terms, and bounding $\min(a_i, a_j)$ by the geometric mean $\sqrt{a_i a_j}$, one obtains
$$S(\Theta) \ge (2\pi\sigma^2)^{m/2}\left[\sum_{i=1}^{n} f_i - \sum_{i<j} \sqrt{f_i f_j}\, \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right)\right],$$
where $d_{ij} = \lVert b_i - b_j \rVert$.
- The sum of qualities $\sum_i f_i$ rewards high-quality solutions.
- The pairwise repulsion term, exponentially decaying with behavioral distance $d_{ij}$ and weighted by $\sqrt{f_i f_j}$, enforces diversity.
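The truncated bound above can be evaluated directly. A minimal NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def soft_qd_lower_bound(f, b, sigma):
    """Pairwise-truncated lower bound on the Soft QD Score.

    f     : (n,) nonnegative qualities f_i
    b     : (n, m) behavior descriptors b_i
    sigma : Gaussian bandwidth of each "light source"

    Returns the bracketed term sum_i f_i - sum_{i<j} sqrt(f_i f_j) exp(-d_ij^2 / (8 sigma^2));
    multiply by (2*pi*sigma**2)**(m/2) to recover the integral's scale.
    """
    f = np.asarray(f, dtype=float)
    b = np.asarray(b, dtype=float)
    d2 = np.sum((b[:, None, :] - b[None, :, :]) ** 2, axis=-1)    # squared behavioral distances
    rep = np.sqrt(np.outer(f, f)) * np.exp(-d2 / (8 * sigma**2))  # pairwise repulsion matrix
    pairwise = np.triu(rep, k=1).sum()                            # keep only i < j terms
    return f.sum() - pairwise

# Two well-separated solutions barely repel; coincident ones repel fully.
far = soft_qd_lower_bound([1.0, 1.0], [[0.0], [10.0]], sigma=0.5)
near = soft_qd_lower_bound([1.0, 1.0], [[0.0], [0.0]], sigma=0.5)
```

Here `far` is essentially the plain quality sum (2.0), while `near` loses a full unit to repulsion, which is the intended diversity pressure.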
2. Derivation, Differentiability, and Limit Properties
The lower bound derives from the inclusion-exclusion form of the maximum,
$$\max_i a_i(z) \ge \sum_i a_i(z) - \sum_{i<j} \min\big(a_i(z), a_j(z)\big),$$
with $a_i(z) = f_i \exp\!\left(-\lVert z - b_i \rVert^2 / (2\sigma^2)\right)$. For each pair $(i, j)$, the $\min(a_i, a_j)$ term is tightly upper bounded by the geometric mean $\sqrt{a_i a_j}$, which admits a closed-form Gaussian integral:
$$\int_{\mathbb{R}^m} \sqrt{a_i(z)\, a_j(z)}\, dz = (2\pi\sigma^2)^{m/2} \sqrt{f_i f_j}\, \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right).$$
The resulting objective, dropping the constant factor $(2\pi\sigma^2)^{m/2}$ and using $d_{ij} = \lVert b_i - b_j \rVert$, is
$$J(\Theta) = \sum_{i=1}^{n} f_i - \sum_{i<j} \sqrt{f_i f_j}\, \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right).$$
If $f$ and $b$ are differentiable, so is $J$. The per-solution gradient is
$$\nabla_{\theta_i} J = \nabla_{\theta_i} f_i - \sum_{j \ne i} \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right)\left[\frac{1}{2}\sqrt{\frac{f_j}{f_i}}\, \nabla_{\theta_i} f_i - \frac{\sqrt{f_i f_j}}{4\sigma^2}\, \Big(\tfrac{\partial b_i}{\partial \theta_i}\Big)^{\!\top} (b_i - b_j)\right].$$
Appendix proofs confirm key theoretical properties:
- $S(\Theta)$ is nondecreasing under addition of new solutions or under any increase in solution qualities.
- $S(\Theta)$ is submodular; marginal gains diminish as the population grows.
- In the limit $\sigma \to 0$, $S(\Theta)$, suitably normalized, recovers the canonical QD-Score on a correspondingly fine grid; soft QD is thus a smooth relaxation of the standard objective.
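The monotonicity claim and the Gaussian normalization can be checked numerically in one behavior dimension. The quadrature helper below is my own illustration, not from the paper:

```python
import numpy as np

def soft_qd_score_1d(f, b, sigma):
    """Quadrature approximation (illustrative) of the true Soft QD Score
    S = integral of max_i f_i * exp(-(z - b_i)^2 / (2 sigma^2)) dz
    over one behavior dimension, on a dense uniform grid."""
    f = np.asarray(f, float)
    b = np.asarray(b, float)
    grid = np.linspace(b.min() - 6 * sigma, b.max() + 6 * sigma, 20001)
    field = np.max(f[:, None] * np.exp(-(grid[None, :] - b[:, None]) ** 2
                                       / (2 * sigma**2)), axis=0)
    return field.sum() * (grid[1] - grid[0])   # Riemann sum

# A single unit-quality source integrates to sqrt(2*pi) * sigma.
one = soft_qd_score_1d([1.0], [0.0], sigma=0.5)
# Monotonicity: adding a solution never decreases S.
s2 = soft_qd_score_1d([1.0, 0.8], [0.0, 1.0], sigma=0.5)
s3 = soft_qd_score_1d([1.0, 0.8, 0.6], [0.0, 1.0, 2.0], sigma=0.5)
```

The third source strictly increases the illuminated mass, consistent with the monotonicity property; shrinking `sigma` makes the bumps disjoint, so the normalized score approaches the plain quality sum, mirroring the grid-limit behavior.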
3. SQUAD Algorithm and Optimization
The SQUAD algorithm optimizes the lower bound using mini-batch stochastic gradient ascent. The procedure is as follows:
- Inputs: population size $n$, batch size $B$, neighbor count $k$, iteration count $T$, diversity bandwidth $\sigma$, optimizer (e.g., Adam with learning rate $\eta$).
- Initialization: sample $\theta_1, \dots, \theta_n$; compute qualities $f_i = f(\theta_i)$ and behaviors $b_i = b(\theta_i)$; initialize optimizer state.
- Loop: For $t = 1, \dots, T$:
- Choose a batch $\mathcal{I}_t \subseteq \{1, \dots, n\}$ of size $B$.
- For each $i \in \mathcal{I}_t$:
- Identify the $k$ nearest neighbors of $b_i$ in behavior space.
- Compute the per-solution gradient $\nabla_{\theta_i} J$, restricting the repulsion sum to those $k$ neighbors.
- Update $\theta_i$ with this gradient via the optimizer; re-evaluate $f_i$ and $b_i$.
- Termination: Return the population $\Theta$ at the final iteration.
Key hyperparameters: $n$, $B$, $k$, $T$, $\eta$, and the Gaussian kernel width $\sigma$ (also used if computing $S(\Theta)$ directly).
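The loop above can be sketched end-to-end on a toy problem. The sketch below uses plain gradient ascent (no Adam, no mini-batching), a Gaussian-bump quality, and an identity behavior descriptor so the behavior Jacobian is the identity; all of these instantiations are my own assumptions, not the paper's benchmarks:

```python
import numpy as np

def quality(thetas):                 # toy f(theta) in (0, 1]
    return np.exp(-0.5 * np.sum(thetas**2, axis=-1))

def quality_grad(thetas):            # df/dtheta for the toy quality
    return -thetas * quality(thetas)[..., None]

def lower_bound_J(thetas, sigma):
    """The pairwise-truncated objective J for behavior = identity."""
    f, b = quality(thetas), thetas
    d2 = np.sum((b[:, None] - b[None, :])**2, axis=-1)
    rep = np.sqrt(np.outer(f, f)) * np.exp(-d2 / (8 * sigma**2))
    return f.sum() - np.triu(rep, 1).sum()

def squad_step(thetas, sigma, lr, k):
    """One gradient-ascent step on J, with the repulsion sum
    restricted to each solution's k nearest behavioral neighbors."""
    f, qg, b = quality(thetas), quality_grad(thetas), thetas
    d2 = np.sum((b[:, None] - b[None, :])**2, axis=-1)
    grads = qg.copy()                                 # quality (attraction) term
    for i in range(len(thetas)):
        for j in np.argsort(d2[i])[1:k + 1]:          # k nearest neighbors, skip self
            w = np.exp(-d2[i, j] / (8 * sigma**2))
            # d/dtheta_i of sqrt(f_i f_j) * exp(-d_ij^2 / (8 sigma^2))
            grads[i] -= (0.5 * np.sqrt(f[j] / f[i]) * qg[i]
                         - np.sqrt(f[i] * f[j]) * (b[i] - b[j]) / (4 * sigma**2)) * w
    return thetas + lr * grads

rng = np.random.default_rng(0)
thetas = rng.normal(size=(32, 2))
j0 = lower_bound_J(thetas, 0.4)
for _ in range(200):
    thetas = squad_step(thetas, sigma=0.4, lr=0.05, k=5)
j1 = lower_bound_J(thetas, 0.4)
```

After the run, the population settles into a spread of reasonably high-quality solutions and $J$ has improved over its random initialization; in practice one would replace the hand-coded gradients with automatic differentiation and the full-population update with mini-batches.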
4. Theoretical Properties and Scalability
SQUAD inherits the following properties:
- Monotonicity: $S(\Theta)$ is nondecreasing as new solutions are added or solution qualities increase.
- Submodularity: Diminishing returns property enables approximate optimality under cardinality constraints.
- Limiting Behavior: For $\sigma \to 0$, SQUAD converges to standard QD-Score maximization over a fine grid.
- Curse-of-Dimensionality Avoidance: Does not require discretization or archives, and uses continuous, kernel-based repulsion, resulting in memory requirements invariant to behavior-space dimension $m$.
- Approximation Error: Contributions from neglected higher-order overlaps in the inclusion-exclusion expansion are bounded and decay as behavioral coverage increases.
These properties collectively underpin SQUAD’s ability to scale to high-dimensional behavior spaces and large solution populations.
5. Empirical Evaluation
SQUAD's performance was benchmarked on diverse tasks and compared to existing methods including CMA-MEGA, CMA-MAEGA, Sep-CMA-MAE, GA-ME, DNS, and DNS-G.
Tasks and Metrics
- LP: Linear-Projection Rastrigin, with solution and behavior dimensionalities varied across easy/medium/hard settings.
- IC: Image Composition (1024 circles, 5-d behavior).
- LSI: Latent-Space Illumination via StyleGAN2+CLIP, evaluated in base and hard variants.
Metrics included QD-Score (sum of best-in-cell quality over a centroidal Voronoi tessellation, CVT), coverage (number of occupied CVT cells), Vendi Score (VS, an effective number of clusters), QVS (mean quality × VS), and mean and max objective.
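Of these metrics, the Vendi Score is the least standard: it is the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix (Friedman & Dieng). A compact sketch, using an RBF kernel as one illustrative choice of similarity:

```python
import numpy as np

def vendi_score(X, sigma=1.0):
    """Vendi Score: effective number of distinct items in X,
    computed as exp(entropy of eigenvalues of K/n), where K is a
    PSD similarity matrix with unit self-similarity. The RBF kernel
    below is an illustrative choice, not mandated by the metric."""
    X = np.asarray(X, float)
    d2 = np.sum((X[:, None] - X[None, :])**2, axis=-1)
    K = np.exp(-d2 / (2 * sigma**2))
    lam = np.linalg.eigvalsh(K / len(X))
    lam = lam[lam > 1e-12]                       # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

# n identical points -> VS = 1; n well-separated points -> VS = n.
same = vendi_score(np.zeros((4, 2)))
spread = vendi_score(np.array([[0., 0.], [100., 0.], [0., 100.], [100., 100.]]))
```

This is why VS can disagree with CVT coverage: it counts effectively distinct behaviors rather than occupied cells, so duplicated solutions inflate coverage but not VS.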
Outcome Summary
- LP: In the lowest-dimensional setting, CMA-MAEGA/CMA-MEGA slightly outperform SQUAD, but SQUAD surpasses all baselines in both QVS and QD-Score as behavior dimensionality grows. Gradient-based methods, including SQUAD, outperform mutation-only approaches at higher dimensionalities.
- IC: SQUAD achieves the highest mean objective, highest max objective, and best VS. Coverage is slightly below CMA-MAEGA (5.68 vs 5.85), but VS more accurately reflects true diversity.
- Quality-Diversity Trade-Off: Varying the bandwidth $\sigma$ yields a tunable trade-off: larger $\sigma$ gives higher diversity (VS) at the expense of mean objective.
- LSI: In the base setting, SQUAD attains the highest QD-Score and QVS, surpassing both CMA-MEGA and CMA-MAEGA. In the hard setting, SQUAD again leads the best baseline, while other methods often fail outright (negative mean objectives).
6. Implementation and Practical Considerations
Default Hyperparameters
- IC/LP: fixed population size $n$, batch size $B$, neighbor count $k$, and learning rate $\eta$, with the bandwidth $\sigma$ tuned to the domain (one value for IC; separate values for the easy/medium/hard LP settings).
- LSI: an analogous configuration of $n$, $B$, $k$, $\eta$, and $\sigma$ for the StyleGAN2+CLIP pipeline.
Algorithmic Details
- If behaviors lie in a bounded box such as $[0,1]^m$, they should be mapped via $\operatorname{logit}(b)$ to $\mathbb{R}^m$; ablation indicates this is critical.
- Each batch update computes quality and pairwise repulsion terms for the $B$ batch solutions against their $k$ nearest neighbors, giving a per-iteration cost of $O(Bk)$ kernel and gradient evaluations plus the neighbor search.
- Mini-batching and limited nearest neighbors optimize memory efficiency.
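The logit mapping for bounded behavior boxes can be implemented in a few lines; the clipping epsilon below is a practical guard of my own, not from the paper:

```python
import numpy as np

def logit_map(b, lo=0.0, hi=1.0, eps=1e-6):
    """Map behaviors from a bounded box [lo, hi]^m to R^m via the logit,
    so Euclidean repulsion distances are not compressed near the boundary."""
    u = (np.asarray(b, float) - lo) / (hi - lo)   # normalize to (0, 1)
    u = np.clip(u, eps, 1 - eps)                  # avoid +/- infinity at the edges
    return np.log(u / (1 - u))

b = np.array([[0.5, 0.9], [0.1, 0.5]])            # behaviors in [0, 1]^2
z = logit_map(b)                                  # unbounded descriptors
```

The center of the box maps to the origin, and points near the walls are stretched outward, which is what lets the Gaussian repulsion kernel keep pushing solutions toward extreme behaviors.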
Implementation Tips
- Automatic differentiation frameworks (JAX, PyTorch, etc.) are recommended for both objective and descriptor.
- Precompute and cache k-NN structures in behavior space.
- Annealing $\sigma$ or adapting it to local density can improve performance.
- Monitor VS, coverage, and mean objective during optimization; early stopping is often effective (e.g., 200 iterations for IC/LSI).
- Ensure $f$ is nonnegative for meaningful QVS evaluation.
Computational Cost
- Simple tasks (LP) complete in under 1 minute.
- IC requires 190 minutes for 1000 iterations (RTX 4090), but high performance is typically attained in fewer than 200 iterations.
- LSI (base/hard): 730/1300 minutes, with convergence in substantially less than the full budget.
7. Significance in Quality-Diversity Optimization
SQUAD provides an alternative to archive-based QD: it offers a smooth, differentiable objective and adaptively balances quality and diversity through a tunable, analytically tractable surrogate. This formulation permits large-scale, high-dimensional QD optimization previously infeasible with grid-based methods. Empirical evidence demonstrates competitiveness and often superiority versus established QD algorithms on standard benchmarks, with additional robustness and scalability. These features make SQUAD a theoretically well-founded and practically effective approach for large-scale, high-diversity optimization tasks (Hedayatian et al., 30 Nov 2025).