Parallel Simultaneous Perturbation Optimization (PSPO)
- PSPO is a stochastic optimization algorithm that uses multiple simultaneous perturbations to approximate gradients for noisy, expensive black-box functions.
- It leverages parallel computing to reduce estimator variance, resulting in faster convergence compared to traditional SPSA.
- Its applications include complex simulations like epidemiological model calibration, where it efficiently balances computational cost and accuracy.
Parallel Simultaneous Perturbation Optimization (PSPO) is a stochastic gradient-based optimization algorithm tailored for maximizing expected values of expensive, noisy black-box functions. PSPO generalizes the classical Simultaneous Perturbation Stochastic Approximation (SPSA) method by leveraging multiple simultaneous perturbations in each iteration and exploiting parallel computing architectures. Its design addresses high-variance gradient estimates that arise when optimizing complex stochastic systems, such as those found in epidemiology or simulation-driven sciences, where function evaluations are computationally intensive and noise is significant (Alaeddini et al., 2017).
1. Formulation of the Stochastic Optimization Problem
PSPO is applied to problems of the form

$$\max_{\theta \in \mathbb{R}^p} f(\theta),$$

where $f$ is a continuously differentiable mean function and observed outputs are

$$y(\theta) = f(\theta) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \quad \operatorname{Var}(\varepsilon) = \sigma^2.$$

The function is accessed only through noisy and costly queries, for example, calls to a Monte Carlo simulator. The optimization is fully derivative-free, using only noisy function-value queries to approximate gradients and (optionally) Hessians.
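As a concrete stand-in for this setting, the sketch below defines a hypothetical noisy oracle in Python; the quadratic mean function, optimum location, and noise level are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean function: a concave quadratic with maximum at theta_star.
theta_star = np.array([1.0, -2.0])

def f(theta):
    # Smooth mean function f(theta); maximized at theta_star.
    return -np.sum((theta - theta_star) ** 2)

def y(theta, sigma=0.1):
    # Noisy oracle: the optimizer only observes f(theta) plus zero-mean noise.
    return f(theta) + sigma * rng.normal()
```

In a real application, `y` would wrap an expensive simulator run rather than an analytic formula.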
2. Review of Simultaneous Perturbation Stochastic Approximation (SPSA)
SPSA constructs a first-order estimate of the gradient at each iteration $k$ by evaluating two points, perturbed from the current parameter $\theta_k$ by a random vector $\Delta_k$ (typically with i.i.d. Bernoulli $\pm 1$ components):

$$\theta_k \pm c_k \Delta_k,$$

with $c_k > 0$ a perturbation parameter. The stochastic gradient estimate is

$$\hat g_k(\theta_k) = \frac{y(\theta_k + c_k \Delta_k) - y(\theta_k - c_k \Delta_k)}{2 c_k}\, \Delta_k^{-1},$$

where $\Delta_k^{-1}$ is the element-wise inverse. Parameter updates use a Robbins–Monro style diminishing step sequence $a_k$:

$$\theta_{k+1} = \theta_k + a_k \hat g_k(\theta_k).$$

The SPSA estimator is nearly unbiased (its bias is $O(c_k^2)$) but can suffer from high variance when function noise is substantial, resulting in many required iterations for convergence (Alaeddini et al., 2017).
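A minimal SPSA ascent step can be sketched as follows; the gain constants (`a`, `A`, `alpha`, `c`, `gamma`) and the toy objective are illustrative defaults, not the paper's tuned values.

```python
import numpy as np

def spsa_step(y, theta, k, a=0.1, A=10.0, alpha=0.602, c=0.1, gamma=0.101, rng=None):
    """One SPSA ascent step using a single Bernoulli perturbation."""
    if rng is None:
        rng = np.random.default_rng()
    a_k = a / (A + k + 1) ** alpha          # diminishing step size
    c_k = c / (k + 1) ** gamma              # diminishing perturbation size
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +/-1 directions
    dy = y(theta + c_k * delta) - y(theta - c_k * delta)
    g_hat = dy / (2.0 * c_k) * (1.0 / delta)  # element-wise inverse of delta
    return theta + a_k * g_hat               # ascent: we are maximizing f

# Usage: maximize a noisy concave quadratic, f(theta) = -||theta||^2.
rng = np.random.default_rng(0)
noisy = lambda th: -np.sum(th ** 2) + 0.01 * rng.normal()
theta = np.array([2.0, -2.0])
for k in range(500):
    theta = spsa_step(noisy, theta, k, rng=rng)
```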
3. PSPO Algorithmic Structure
PSPO trades SPSA's minimal iteration cost for reduced estimator variance via $m$-way parallelism. In each iteration, $m$ independent random Bernoulli perturbations $\Delta_k^{(1)}, \dots, \Delta_k^{(m)}$ are sampled; for each direction, two function values are computed in parallel:

$$y(\theta_k + c_k \Delta_k^{(i)}), \quad y(\theta_k - c_k \Delta_k^{(i)}), \qquad i = 1, \dots, m.$$

Collect the perturbations as the rows of $\Delta\Theta_k \in \{\pm 1\}^{m \times p}$ and let $\delta y_k$ be the vector of finite-difference ratios

$$(\delta y_k)_i = \frac{y(\theta_k + c_k \Delta_k^{(i)}) - y(\theta_k - c_k \Delta_k^{(i)})}{2 c_k}.$$

For $m \ge p$ and linearly independent perturbations, the minimum-variance unbiased (least-squares) gradient estimator is

$$\hat g_k = \left(\Delta\Theta_k^{\top} \Delta\Theta_k\right)^{-1} \Delta\Theta_k^{\top}\, \delta y_k.$$

For $m < p$, a minimum-norm solution is used:

$$\hat g_k = \Delta\Theta_k^{\top} \left(\Delta\Theta_k \Delta\Theta_k^{\top}\right)^{-1} \delta y_k.$$

This estimator is unbiased up to the $O(c_k^2)$ finite-difference bias, $\mathbb{E}[\hat g_k] \approx \nabla f(\theta_k)$; its variance decreases rapidly with $m$ and with the invertibility and conditioning of $\Delta\Theta_k^{\top} \Delta\Theta_k$.
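A sketch of this gradient estimator, assuming the least-squares/minimum-norm construction described above; NumPy's `lstsq` covers both the $m \ge p$ and $m < p$ cases, and the $2m$ evaluations are shown serially for clarity (a real implementation would dispatch them in parallel).

```python
import numpy as np

def pspo_gradient(y, theta, c, m, rng):
    """PSPO gradient estimate from m simultaneous perturbation pairs.

    Rows of D are Bernoulli +/-1 directions; dy[i] is the i-th central
    finite-difference ratio.  lstsq returns the least-squares solution
    for m >= p and the minimum-norm solution for m < p.
    """
    p = theta.size
    D = rng.choice([-1.0, 1.0], size=(m, p))
    dy = np.array([(y(theta + c * d) - y(theta - c * d)) / (2.0 * c) for d in D])
    g_hat, *_ = np.linalg.lstsq(D, dy, rcond=None)
    return g_hat

# Usage: for a noiseless quadratic the central difference is exact,
# so the estimator recovers the gradient of f(theta) = -||theta||^2.
rng = np.random.default_rng(1)
g = pspo_gradient(lambda th: -np.sum(th ** 2), np.array([1.0, 1.0]), 0.1, 20, rng)
```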
PSPO incorporates a nonlinear conjugate-gradient (NCG) update as its outer loop. The residual and search direction are updated by

$$r_k = \hat g_k(\theta_k), \qquad d_k = r_k + \beta_k d_{k-1},$$

with $\beta_k$ a conjugacy coefficient (e.g., Polak–Ribière, $\beta_k = r_k^{\top}(r_k - r_{k-1}) / r_{k-1}^{\top} r_{k-1}$). A reduced-Hessian estimate in direction $d_k$,

$$h_k \approx d_k^{\top} H(\theta_k)\, d_k,$$

is constructed for the line search from finite differences of gradient estimates along $d_k$. The step size is

$$\alpha_k = -\frac{r_k^{\top} d_k}{h_k},$$

yielding the updated parameters

$$\theta_{k+1} = \theta_k + \alpha_k d_k$$

(Alaeddini et al., 2017).
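The outer loop can be sketched as below, assuming a Polak–Ribière coefficient with restart and an exact directional-Hessian oracle for a toy quadratic; the paper's precise conjugacy rule and finite-difference Hessian estimate may differ.

```python
import numpy as np

def ncg_step(theta, d_prev, r_prev, g_hat, hess_dir):
    """One nonlinear-conjugate-gradient ascent step (sketch).

    g_hat    : current stochastic gradient estimate (the residual r_k)
    hess_dir : callable d -> estimate of d^T H d along direction d
               (negative near a maximum, so the step size is positive)
    """
    r = g_hat
    if d_prev is None:
        d = r                                   # first iteration: steepest ascent
    else:
        # Polak-Ribiere coefficient, clipped at 0 (automatic restart).
        beta = max(0.0, r @ (r - r_prev) / (r_prev @ r_prev))
        d = r + beta * d_prev
    h = hess_dir(d)                             # reduced Hessian d^T H d
    alpha = -(r @ d) / h                        # Newton-like step along d
    return theta + alpha * d, d, r

# Usage: for f(theta) = -||theta||^2 (H = -2I), one step reaches the maximum.
hess_dir = lambda d: -2.0 * (d @ d)
theta0 = np.array([3.0, -1.0])
theta1, d0, r0 = ncg_step(theta0, None, None, -2.0 * theta0, hess_dir)
```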
4. Error Tolerance, Step Size, and Theoretical Guarantees
PSPO allows explicit control of the stochastic gradient error per iteration. To enforce $P\!\left(\|\hat g_k - \nabla f(\theta_k)\| \ge \epsilon\right) \le \delta$ via a Chebyshev bound, the required number of parallel perturbations is

$$m \ \ge\ \frac{p\,\sigma^2}{2\,c_k^2\,\epsilon^2\,\delta}.$$

Step-size and perturbation sequences diminish as

$$a_k = \frac{a}{(A + k + 1)^{\alpha}}, \qquad c_k = \frac{c}{(k + 1)^{\gamma}},$$

with typical exponents $\alpha = 0.602$ and $\gamma = 0.101$; the offset $A$ stabilizes initial steps. Under these conditions, PSPO converges almost surely to a local maximum of $f$ (Alaeddini et al., 2017).
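A sketch of this sizing rule, assuming the estimator variance $p\sigma^2/(2mc^2)$ so that Chebyshev's inequality yields $m \ge p\sigma^2/(2c^2\epsilon^2\delta)$; the constant factors here are reconstructed and should be checked against the paper.

```python
import math

def min_perturbations(p, sigma, c, eps, delta):
    """Chebyshev-style lower bound on the number of parallel perturbations m
    so that P(||g_hat - grad f|| >= eps) <= delta, assuming estimator
    variance p * sigma**2 / (2 * m * c**2).  (Reconstructed sizing rule.)"""
    return math.ceil(p * sigma ** 2 / (2.0 * c ** 2 * eps ** 2 * delta))
```

Noisier observations (larger `sigma`), tighter tolerances (smaller `eps`, `delta`), or smaller perturbation sizes all increase the required parallelism.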
The variance of the PSPO gradient estimator is

$$\operatorname{Var}(\hat g_k) = \frac{\sigma^2}{2 c_k^2}\, \operatorname{tr}\!\left[\left(\Delta\Theta_k^{\top} \Delta\Theta_k\right)^{-1}\right].$$

With the recommended construction of $\Delta\Theta_k$, $\operatorname{tr}[(\Delta\Theta_k^{\top} \Delta\Theta_k)^{-1}] = p/m$ for $m \ge p$, implying the variance scales as $1/m$ (Alaeddini et al., 2017).
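The $1/m$ scaling can be checked empirically; the sketch below compares the total sample variance of the estimator for $m = 5$ versus $m = 20$ on a noisy quadratic (an illustrative setup, not the paper's benchmark).

```python
import numpy as np

def pspo_grad(theta, c, m, sigma, rng):
    # PSPO least-squares gradient of f(theta) = -||theta||^2 under Gaussian noise.
    y = lambda th: -np.sum(th ** 2) + sigma * rng.normal()
    D = rng.choice([-1.0, 1.0], size=(m, theta.size))
    dy = np.array([(y(theta + c * d) - y(theta - c * d)) / (2.0 * c) for d in D])
    g, *_ = np.linalg.lstsq(D, dy, rcond=None)
    return g

rng = np.random.default_rng(0)
theta = np.ones(4)

def total_var(m, reps=400):
    # Sum of per-component sample variances over repeated estimates.
    samples = np.array([pspo_grad(theta, 0.1, m, 0.5, rng) for _ in range(reps)])
    return samples.var(axis=0).sum()

# Quadrupling m should cut the total estimator variance by roughly 4x.
```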
5. Computational Complexity and Parallel Scalability
Each PSPO iteration incurs $2m$ function evaluations but, when sufficient parallel compute resources are available, the wall-clock time per iteration matches that of SPSA's two serial function calls. The reduced variance of the PSPO estimator yields fewer required outer iterations, providing a speedup proportional to $m$ up to the regime where gradient noise no longer limits convergence. Communication and data aggregation overheads are negligible for moderate $m$ and $p$. As $m$ grows and the aggregation of $\delta y_k$ becomes significant, additional efficiency analysis is necessary (Alaeddini et al., 2017).
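The parallel evaluation pattern can be sketched with a thread pool (a simplification; real deployments would dispatch simulator runs to separate processes or machines):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_pairs(y, theta, c, D, max_workers=8):
    """Evaluate all 2m perturbed points concurrently and return the
    vector of central finite-difference ratios delta_y.

    With enough workers, the wall-clock cost of the 2m evaluations
    approaches that of SPSA's two serial calls."""
    points = [theta + c * d for d in D] + [theta - c * d for d in D]
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        vals = list(ex.map(y, points))   # map preserves input order
    m = len(D)
    plus, minus = np.array(vals[:m]), np.array(vals[m:])
    return (plus - minus) / (2.0 * c)

# Usage: exact ratios for the noiseless quadratic f(theta) = -||theta||^2.
D = np.array([[1.0, 1.0], [1.0, -1.0]])
dy = parallel_pairs(lambda th: -np.sum(th ** 2), np.ones(2), 0.1, D, max_workers=4)
```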
6. Applications and Benchmarks
PSPO's performance has been demonstrated on both synthetic and applied benchmarks:
- Quadratic Toy Problem: On a noisy concave quadratic objective, PSPO reduced the number of required optimization iterations by approximately 50% compared to SPSA, with near-linear wall-clock speedup for $m$ up to 8 (Alaeddini et al., 2017).
- Stochastic Epidemiological Model Calibration: PSPO was used to fit a two-parameter SIR model to the 1861 Hagelloch measles outbreak data (188 cases). PSPO achieved convergence in about 10 iterations, compared to SPSA's 24, corresponding to a substantial reduction in iteration count and near-linear speedup in wall-clock time given 8 parallel cores. The final parameter estimates and uncertainty quantification (via a Monte Carlo averaged Hessian) matched those of SPSA, with transmission- and recovery-rate estimates (per day) whose 95% confidence intervals were [0.43, 0.47] and [0.14, 0.16], respectively (Alaeddini et al., 2017).
| Algorithm | Per-iteration evals | Wall-clock cost per iteration (with parallel workers) | Convergence (iterations) |
|---|---|---|---|
| SPSA | 2 | 2 serial evaluations | 24–26 |
| PSPO | 5 | comparable to 2 serial evaluations | 10–12 |
7. Practical Considerations and Extensions
PSPO's effectiveness depends on balancing $m$, $c_k$, and the observed noise level $\sigma^2$. The Chebyshev-based lower bound on $m$ ensures that the gradient estimator achieves a prescribed error tolerance without excessive cost. Step-size and perturbation schedule parameters ($a$, $A$, $\alpha$, $c$, $\gamma$) are best tuned via grid search on a coarse surrogate model.
PSPO lends itself to second-order variants by estimating the Hessian matrix efficiently in parallel, and by projecting the estimate onto the negative-definite cone to guarantee robust Newton-like steps. It can deliver natural Fisher-matrix-based uncertainty quantification with little additional computational overhead (Alaeddini et al., 2017).
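The projection onto the negative-definite cone mentioned above can be sketched by clipping the eigenvalues of a symmetrized Hessian estimate (the clipping floor is an illustrative choice):

```python
import numpy as np

def project_negative_definite(H, floor=-1e-6):
    """Project a symmetric Hessian estimate onto the negative-definite cone
    by clipping eigenvalues at a small negative floor, so that Newton-like
    ascent steps remain well defined near a maximum (sketch)."""
    w, V = np.linalg.eigh((H + H.T) / 2.0)   # symmetrize, then eigendecompose
    w = np.minimum(w, floor)                  # clip any non-negative curvature
    return (V * w) @ V.T

# Usage: an indefinite estimate becomes safely negative definite.
H = np.array([[1.0, 0.0], [0.0, -2.0]])
P = project_negative_definite(H)
```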
The approach scales nearly linearly in $m$ when the number of parallel workers matches the $2m$ evaluations per iteration and the parameter dimension $p$ is modest, further supported by advances in high-performance cloud computing infrastructures.
References
- "Parallel Simultaneous Perturbation Optimization" (Alaeddini et al., 2017)
- "Application of a Second-order Stochastic Optimization Algorithm for Fitting Stochastic Epidemiological Models" (Alaeddini et al., 2017)