Soft QD Using Approximated Diversity (SQUAD)
- The paper introduces SQUAD—a gradient-based, continuous quality-diversity optimization method that leverages a differentiable lower bound surrogate to maximize aggregate illumination over behavior space.
- It employs kernel-based interactions and pairwise repulsion to balance quality rewards with diversity constraints, effectively scaling to high-dimensional and large-population problems.
- Empirical evaluations on tasks such as LP, IC, and LSI demonstrate that SQUAD outperforms several state-of-the-art QD baselines, achieving superior QVS and QD-Score.
Soft QD Using Approximated Diversity (SQUAD) is a differentiable, population-based optimization algorithm that reframes Quality-Diversity (QD) as continuous attraction-repulsion in behavior space. SQUAD circumvents the need for explicit discretization of the behavior space, scaling efficiently to high dimensions and large populations while matching or outperforming state-of-the-art QD baselines. The approach formalizes QD objectives as maximization of aggregate "illumination" from a set of solutions over an abstract behavior space, using kernel-based interactions and a tractable differentiable approximation amenable to gradient-based optimization (Hedayatian et al., 30 Nov 2025).
1. Soft QD Objective: Definition and Intuition
Let $\Theta = \{\theta_1, \dots, \theta_n\} \subset \mathbb{R}^d$ denote a population of parameter vectors, where $f : \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ is a differentiable quality (objective) function and $b : \mathbb{R}^d \to \mathbb{R}^m$ a differentiable behavior descriptor. With $f_i = f(\theta_i)$ and $b_i = b(\theta_i)$, each solution is treated as an isotropic Gaussian "light source" in behavior space, its "brightness" $f_i$ decaying with bandwidth $\sigma$.
The induced behavior-value field is
$$V(z) = \max_{1 \le i \le n} f_i \exp\!\left(-\frac{\lVert z - b_i \rVert^2}{2\sigma^2}\right),$$
and the Soft QD Score is defined as the total illumination:
$$S(\Theta) = \int_{\mathbb{R}^m} V(z)\, dz.$$
Direct optimization of $S(\Theta)$ is intractable, so SQUAD proceeds via a tractable lower bound. By applying inclusion-exclusion, truncating at pairwise terms, and bounding $\min(a_i, a_j)$ by the geometric mean $\sqrt{a_i a_j}$, one obtains
$$S(\Theta) \ge (2\pi\sigma^2)^{m/2}\left[\sum_{i=1}^{n} f_i - \sum_{i<j} \sqrt{f_i f_j}\, \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right)\right],$$
where $d_{ij} = \lVert b_i - b_j \rVert$.
- The sum of qualities $\sum_i f_i$ rewards high-quality solutions.
- The pairwise repulsion term, exponentially decaying with behavioral distance $d_{ij}$ and weighted by $\sqrt{f_i f_j}$, enforces diversity.
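The truncated bound above can be evaluated directly. A minimal NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def soft_qd_lower_bound(f, b, sigma):
    """Pairwise-truncated lower bound on the Soft QD Score.

    f     : (n,) nonnegative qualities f_i
    b     : (n, m) behavior descriptors b_i
    sigma : Gaussian bandwidth of each "light source"

    Returns the bracketed term sum_i f_i - sum_{i<j} sqrt(f_i f_j) exp(-d_ij^2 / (8 sigma^2));
    multiply by (2*pi*sigma**2)**(m/2) to recover the integral's scale.
    """
    f = np.asarray(f, dtype=float)
    b = np.asarray(b, dtype=float)
    d2 = np.sum((b[:, None, :] - b[None, :, :]) ** 2, axis=-1)    # squared behavioral distances
    rep = np.sqrt(np.outer(f, f)) * np.exp(-d2 / (8 * sigma**2))  # pairwise repulsion matrix
    pairwise = np.triu(rep, k=1).sum()                            # keep only i < j terms
    return f.sum() - pairwise

# Two well-separated solutions barely repel; coincident ones repel fully.
far = soft_qd_lower_bound([1.0, 1.0], [[0.0], [10.0]], sigma=0.5)
near = soft_qd_lower_bound([1.0, 1.0], [[0.0], [0.0]], sigma=0.5)
```

Here `far` is essentially the plain quality sum (2.0), while `near` loses a full unit to repulsion, which is the intended diversity pressure.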
2. Derivation, Differentiability, and Limit Properties
The lower bound derives from the inclusion-exclusion form of the maximum,
$$\max_i a_i(z) \ge \sum_i a_i(z) - \sum_{i<j} \min\big(a_i(z), a_j(z)\big),$$
with $a_i(z) = f_i \exp\!\left(-\lVert z - b_i \rVert^2 / (2\sigma^2)\right)$. For each pair $(i, j)$, the $\min(a_i, a_j)$ term is tightly upper bounded by the geometric mean $\sqrt{a_i a_j}$, which admits a closed-form Gaussian integral:
$$\int_{\mathbb{R}^m} \sqrt{a_i(z)\, a_j(z)}\, dz = (2\pi\sigma^2)^{m/2} \sqrt{f_i f_j}\, \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right).$$
The resulting objective, dropping the constant factor $(2\pi\sigma^2)^{m/2}$ and using $d_{ij} = \lVert b_i - b_j \rVert$, is
$$J(\Theta) = \sum_{i=1}^{n} f_i - \sum_{i<j} \sqrt{f_i f_j}\, \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right).$$
If $f$ and $b$ are differentiable, so is $J$. The per-solution gradient is
$$\nabla_{\theta_i} J = \nabla_{\theta_i} f_i - \sum_{j \ne i} \exp\!\left(-\frac{d_{ij}^2}{8\sigma^2}\right)\left[\frac{1}{2}\sqrt{\frac{f_j}{f_i}}\, \nabla_{\theta_i} f_i - \frac{\sqrt{f_i f_j}}{4\sigma^2}\, \Big(\tfrac{\partial b_i}{\partial \theta_i}\Big)^{\!\top} (b_i - b_j)\right].$$
Appendix proofs confirm key theoretical properties:
- $S(\Theta)$ is nondecreasing under addition of new solutions or under any increase in solution qualities.
- $S(\Theta)$ is submodular; marginal gains diminish as the population grows.
- In the limit $\sigma \to 0$, $S(\Theta)$, suitably normalized, recovers the canonical QD-Score on a correspondingly fine grid; soft QD is thus a smooth relaxation of the standard objective.
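The monotonicity claim and the Gaussian normalization can be checked numerically in one behavior dimension. The quadrature helper below is my own illustration, not from the paper:

```python
import numpy as np

def soft_qd_score_1d(f, b, sigma):
    """Quadrature approximation (illustrative) of the true Soft QD Score
    S = integral of max_i f_i * exp(-(z - b_i)^2 / (2 sigma^2)) dz
    over one behavior dimension, on a dense uniform grid."""
    f = np.asarray(f, float)
    b = np.asarray(b, float)
    grid = np.linspace(b.min() - 6 * sigma, b.max() + 6 * sigma, 20001)
    field = np.max(f[:, None] * np.exp(-(grid[None, :] - b[:, None]) ** 2
                                       / (2 * sigma**2)), axis=0)
    return field.sum() * (grid[1] - grid[0])   # Riemann sum

# A single unit-quality source integrates to sqrt(2*pi) * sigma.
one = soft_qd_score_1d([1.0], [0.0], sigma=0.5)
# Monotonicity: adding a solution never decreases S.
s2 = soft_qd_score_1d([1.0, 0.8], [0.0, 1.0], sigma=0.5)
s3 = soft_qd_score_1d([1.0, 0.8, 0.6], [0.0, 1.0, 2.0], sigma=0.5)
```

The third source strictly increases the illuminated mass, consistent with the monotonicity property; shrinking `sigma` makes the bumps disjoint, so the normalized score approaches the plain quality sum, mirroring the grid-limit behavior.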
3. SQUAD Algorithm and Optimization
The SQUAD algorithm optimizes the lower bound using mini-batch stochastic gradient ascent. The procedure is as follows:
- Inputs: population size $n$, batch size $B$, neighbor count $k$, iteration count $T$, diversity bandwidth $\sigma$, optimizer (e.g., Adam with learning rate $\eta$).
- Initialization: sample $\theta_1, \dots, \theta_n$; compute qualities $f_i = f(\theta_i)$ and behaviors $b_i = b(\theta_i)$; initialize optimizer state.
- Loop: For $t = 1, \dots, T$:
- Choose a batch $\mathcal{I}_t \subseteq \{1, \dots, n\}$ of size $B$.
- For each $i \in \mathcal{I}_t$:
- Identify the $k$ nearest neighbors of $b_i$ in behavior space.
- Compute the per-solution gradient $\nabla_{\theta_i} J$, restricting the repulsion sum to those $k$ neighbors.
- Update $\theta_i$ with this gradient via the optimizer; re-evaluate $f_i$ and $b_i$.
- Termination: Return the population $\Theta$ at the final iteration.
Key hyperparameters: $n$, $B$, $k$, $T$, $\eta$, and the Gaussian kernel width $\sigma$ (also used if computing $S(\Theta)$ directly).
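The loop above can be sketched end-to-end on a toy problem. The sketch below uses plain gradient ascent (no Adam, no mini-batching), a Gaussian-bump quality, and an identity behavior descriptor so the behavior Jacobian is the identity; all of these instantiations are my own assumptions, not the paper's benchmarks:

```python
import numpy as np

def quality(thetas):                 # toy f(theta) in (0, 1]
    return np.exp(-0.5 * np.sum(thetas**2, axis=-1))

def quality_grad(thetas):            # df/dtheta for the toy quality
    return -thetas * quality(thetas)[..., None]

def lower_bound_J(thetas, sigma):
    """The pairwise-truncated objective J for behavior = identity."""
    f, b = quality(thetas), thetas
    d2 = np.sum((b[:, None] - b[None, :])**2, axis=-1)
    rep = np.sqrt(np.outer(f, f)) * np.exp(-d2 / (8 * sigma**2))
    return f.sum() - np.triu(rep, 1).sum()

def squad_step(thetas, sigma, lr, k):
    """One gradient-ascent step on J, with the repulsion sum
    restricted to each solution's k nearest behavioral neighbors."""
    f, qg, b = quality(thetas), quality_grad(thetas), thetas
    d2 = np.sum((b[:, None] - b[None, :])**2, axis=-1)
    grads = qg.copy()                                 # quality (attraction) term
    for i in range(len(thetas)):
        for j in np.argsort(d2[i])[1:k + 1]:          # k nearest neighbors, skip self
            w = np.exp(-d2[i, j] / (8 * sigma**2))
            # d/dtheta_i of sqrt(f_i f_j) * exp(-d_ij^2 / (8 sigma^2))
            grads[i] -= (0.5 * np.sqrt(f[j] / f[i]) * qg[i]
                         - np.sqrt(f[i] * f[j]) * (b[i] - b[j]) / (4 * sigma**2)) * w
    return thetas + lr * grads

rng = np.random.default_rng(0)
thetas = rng.normal(size=(32, 2))
j0 = lower_bound_J(thetas, 0.4)
for _ in range(200):
    thetas = squad_step(thetas, sigma=0.4, lr=0.05, k=5)
j1 = lower_bound_J(thetas, 0.4)
```

After the run, the population settles into a spread of reasonably high-quality solutions and $J$ has improved over its random initialization; in practice one would replace the hand-coded gradients with automatic differentiation and the full-population update with mini-batches.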
4. Theoretical Properties and Scalability
SQUAD inherits the following properties:
- Monotonicity: $S(\Theta)$ is nondecreasing as new solutions are added or solution qualities increase.
- Submodularity: Diminishing returns property enables approximate optimality under cardinality constraints.
- Limiting Behavior: For $\sigma \to 0$, SQUAD converges to standard QD-Score maximization over a fine grid.
- Curse-of-Dimensionality Avoidance: Does not require discretization or archives, and uses continuous, kernel-based repulsion, resulting in memory requirements invariant to behavior-space dimension $m$.
- Approximation Error: Contributions from neglected higher-order overlaps in the inclusion-exclusion expansion are bounded and decay as behavioral coverage increases.
These properties collectively underpin SQUAD’s ability to scale to high-dimensional behavior spaces and large solution populations.
5. Empirical Evaluation
SQUAD's performance was benchmarked on diverse tasks and compared to existing methods including CMA-MEGA, CMA-MAEGA, Sep-CMA-MAE, GA-ME, DNS, and DNS-G.
Tasks and Metrics
- LP: Linear-Projection Rastrigin, with solution and behavior dimensionalities varied across easy/medium/hard settings.
- IC: Image Composition (1024 circles, 5-d behavior).
- LSI: Latent-Space Illumination via StyleGAN2+CLIP, evaluated in base and hard variants.
Metrics included QD-Score (sum of best-in-cell quality over a centroidal Voronoi tessellation, CVT), coverage (number of occupied CVT cells), Vendi Score (VS, an effective number of clusters), QVS (mean quality × VS), and mean and max objective.
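Of these metrics, the Vendi Score is the least standard: it is the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix (Friedman & Dieng). A compact sketch, using an RBF kernel as one illustrative choice of similarity:

```python
import numpy as np

def vendi_score(X, sigma=1.0):
    """Vendi Score: effective number of distinct items in X,
    computed as exp(entropy of eigenvalues of K/n), where K is a
    PSD similarity matrix with unit self-similarity. The RBF kernel
    below is an illustrative choice, not mandated by the metric."""
    X = np.asarray(X, float)
    d2 = np.sum((X[:, None] - X[None, :])**2, axis=-1)
    K = np.exp(-d2 / (2 * sigma**2))
    lam = np.linalg.eigvalsh(K / len(X))
    lam = lam[lam > 1e-12]                       # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

# n identical points -> VS = 1; n well-separated points -> VS = n.
same = vendi_score(np.zeros((4, 2)))
spread = vendi_score(np.array([[0., 0.], [100., 0.], [0., 100.], [100., 100.]]))
```

This is why VS can disagree with CVT coverage: it counts effectively distinct behaviors rather than occupied cells, so duplicated solutions inflate coverage but not VS.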
Outcome Summary
- LP: In the lowest-dimensional setting, CMA-MAEGA/CMA-MEGA slightly outperform SQUAD, but SQUAD surpasses all baselines in both QVS and QD-Score as behavior dimensionality grows. Gradient-based methods, including SQUAD, outperform mutation-only approaches at higher dimensionalities.
- IC: SQUAD achieves the highest mean objective, highest max objective, and best VS. Coverage is slightly below CMA-MAEGA (5.68 vs 5.85), but VS more accurately reflects true diversity.
- Quality-Diversity Trade-Off: Varying the bandwidth $\sigma$ yields a tunable trade-off: larger $\sigma$ gives higher diversity (VS) at the expense of mean objective.
- LSI: In the base setting, SQUAD attains the highest QD-Score and QVS, surpassing both CMA-MEGA and CMA-MAEGA. In the hard setting, SQUAD again leads the best baseline, while other methods often fail outright (negative mean objectives).
6. Implementation and Practical Considerations
Default Hyperparameters
- IC/LP: fixed population size $n$, batch size $B$, neighbor count $k$, and learning rate $\eta$, with the bandwidth $\sigma$ tuned to the domain (one value for IC; separate values for the easy/medium/hard LP settings).
- LSI: an analogous configuration of $n$, $B$, $k$, $\eta$, and $\sigma$ for the StyleGAN2+CLIP pipeline.
Algorithmic Details
- If behaviors lie in a bounded box such as $[0,1]^m$, they should be mapped via $\operatorname{logit}(b)$ to $\mathbb{R}^m$; ablation indicates this is critical.
- Each batch update computes quality and pairwise repulsion terms for the $B$ batch solutions against their $k$ nearest neighbors, giving a per-iteration cost of $O(Bk)$ kernel and gradient evaluations plus the neighbor search.
- Mini-batching and limited nearest neighbors optimize memory efficiency.
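The logit mapping for bounded behavior boxes can be implemented in a few lines; the clipping epsilon below is a practical guard of my own, not from the paper:

```python
import numpy as np

def logit_map(b, lo=0.0, hi=1.0, eps=1e-6):
    """Map behaviors from a bounded box [lo, hi]^m to R^m via the logit,
    so Euclidean repulsion distances are not compressed near the boundary."""
    u = (np.asarray(b, float) - lo) / (hi - lo)   # normalize to (0, 1)
    u = np.clip(u, eps, 1 - eps)                  # avoid +/- infinity at the edges
    return np.log(u / (1 - u))

b = np.array([[0.5, 0.9], [0.1, 0.5]])            # behaviors in [0, 1]^2
z = logit_map(b)                                  # unbounded descriptors
```

The center of the box maps to the origin, and points near the walls are stretched outward, which is what lets the Gaussian repulsion kernel keep pushing solutions toward extreme behaviors.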
Implementation Tips
- Automatic differentiation frameworks (JAX, PyTorch, etc.) are recommended for both objective and descriptor.
- Precompute and cache k-NN structures in behavior space.
- Annealing $\sigma$ or adapting it to local density can improve performance.
- Monitor VS, coverage, and mean objective during optimization; early stopping is often effective (e.g., 200 iterations for IC/LSI).
- Ensure $f$ is nonnegative for meaningful QVS evaluation.
Computational Cost
- Simple tasks (LP) complete in under 1 minute.
- IC requires 190 minutes for 1000 iterations (RTX 4090), but high performance is typically attained in fewer than 200 iterations.
- LSI (base/hard): 730/1300 minutes, with convergence in substantially less than the full budget.
7. Significance in Quality-Diversity Optimization
SQUAD provides an alternative to archive-based QD: it offers a smooth, differentiable objective and adaptively balances quality and diversity through a tunable, analytically tractable surrogate. This formulation permits large-scale, high-dimensional QD optimization previously infeasible with grid-based methods. Empirical evidence demonstrates competitiveness and often superiority versus established QD algorithms on standard benchmarks, with additional robustness and scalability. These features make SQUAD a theoretically well-founded and practically effective approach for large-scale, high-diversity optimization tasks (Hedayatian et al., 30 Nov 2025).