Grey Wolf Optimization (GWO) Overview

Updated 10 February 2026
  • Grey Wolf Optimization (GWO) is a swarm intelligence algorithm that mimics the hierarchical and cooperative hunting behavior of grey wolves for global optimization.
  • It uses a social hierarchy with alpha, beta, and delta roles to update candidate solutions, balancing exploration with exploitation of the search space.
  • GWO is enhanced through adaptive parameterization and hybrid approaches, enabling effective application in engineering, machine learning, and complex system optimization.

Grey Wolf Optimization (GWO) is a stochastic, population-based metaheuristic algorithm that models the social hierarchy and cooperative hunting strategy of grey wolf packs in nature. Introduced by Mirjalili et al. (2014), GWO has become a foundational algorithm within swarm intelligence, frequently serving as a competitive baseline and as a building block for hybrid and domain-specific optimizers. The algorithm's structure, theoretical properties, practical parameterization, and extensive empirical evaluations make it a central topic in global optimization, especially for high-dimensional, nonlinear, and multimodal landscapes.

1. Algorithmic Foundations and Mathematical Model

The canonical GWO algorithm maintains a population of candidate solutions termed "wolves," stratified into a social hierarchy: α (leader, best solution), β (second best), δ (third best), and ω (remaining agents). Optimization proceeds by simulating three stages of wolf hunting for prey (the optima): encircling, hunting, and attacking.

Position Update Mechanism:

At iteration $t$, each wolf $\mathbf{X}_i(t) \in \mathbb{R}^d$ updates its position under the influence of the three leaders:

$$\mathbf{A}_k = 2\,a(t)\,\mathbf{r}_{1,k} - a(t), \qquad \mathbf{C}_k = 2\,\mathbf{r}_{2,k} \qquad (k = 1, 2, 3)$$

where $\mathbf{r}_{1,k}, \mathbf{r}_{2,k}$ are random vectors in $[0,1]^d$ and $a(t)$ decreases linearly from $2$ to $0$:

$$a(t) = 2\left(1 - \frac{t}{T_{\max}}\right)$$

The distance to each leader and the corresponding leader-guided candidate position are:

$$\mathbf{D}_k = \left|\mathbf{C}_k \odot \mathbf{X}_k - \mathbf{X}_i\right|, \qquad \mathbf{X}_k' = \mathbf{X}_k - \mathbf{A}_k \odot \mathbf{D}_k$$

The final update for each agent is the centroid of the three candidates:

$$\mathbf{X}_i(t+1) = \frac{1}{3}\left(\mathbf{X}_1' + \mathbf{X}_2' + \mathbf{X}_3'\right)$$

where $\odot$ denotes elementwise multiplication and $\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3$ are the α, β, and δ positions. This construct balances exploration ($|\mathbf{A}_k| > 1$) against exploitation ($|\mathbf{A}_k| < 1$) of the search space (Wang et al., 2022, Wang et al., 2022, Prasad et al., 21 May 2025).
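The position-update mechanism above can be sketched as a compact implementation; the function name, parameter defaults, and test objective below are illustrative choices, not taken from the cited papers:

```python
import numpy as np

def gwo_minimize(f, dim, n_wolves=30, t_max=200, lb=-10.0, ub=10.0, seed=0):
    """Minimal canonical-GWO sketch: alpha/beta/delta leaders guide the pack."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))         # initial wolf positions
    for t in range(t_max):
        fitness = np.array([f(x) for x in X])
        leaders = X[np.argsort(fitness)[:3]].copy()  # alpha, beta, delta
        a = 2.0 * (1 - t / t_max)                    # linear decay from 2 to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for X_k in leaders:                      # one pull per leader
                A = 2 * a * rng.random(dim) - a      # |A|>1: explore, |A|<1: exploit
                C = 2 * rng.random(dim)
                D = np.abs(C * X_k - X[i])           # distance to leader
                new_pos += (X_k - A * D) / 3.0       # contribution to the centroid
            X[i] = np.clip(new_pos, lb, ub)
    fitness = np.array([f(x) for x in X])
    return X[np.argmin(fitness)], float(fitness.min())

best_x, best_f = gwo_minimize(lambda x: float(np.sum(x**2)), dim=5)
print(best_f)  # approaches the sphere optimum at 0
```

Note that this sketch re-elects the leaders from the current population each iteration; many implementations instead keep the best-so-far α, β, δ in an archive.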

2. Theoretical Analysis: Sampling, Stability, and Global Convergence

Rigorous mathematical analysis of GWO has led to several key results, particularly under the stagnation assumption (elite positions fixed for analysis tractability) (Wang et al., 2022, Wang et al., 2022):

  • Sampling Distribution: The update law induces a symmetric, single-peaked, heavy-tailed distribution around the mean of the three leaders. The variance of the update decays with $a(t)$, focusing the search as convergence proceeds.
  • Stability: For any dimension $j$, the sequence of means $E[x_{ij}(t)]$ stabilizes at the centroid of the three leaders (order-1 stability), while the variance $\mathrm{Var}[x_{ij}(t)]$ decays to zero as $t \to \infty$ (order-2 stability).
  • Global Convergence: Under regularity conditions and the stagnation abstraction, with probability one, GWO agents eventually sample any box-shaped neighborhood containing the global optimum. When the leader positions are permitted to improve, convergence is even more robust because the moving centroid is attracted toward the global optimum.

These findings underpin GWO's empirical resilience against local minima and inform schedule design for $a(t)$ and the randomization mechanisms.
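These stability claims are easy to probe numerically under the stagnation abstraction; the sketch below (fixed, hypothetical 1-D leaders at 0.9, 1.0, 1.1) shows the population mean settling at the leader centroid while the variance collapses:

```python
import numpy as np

rng = np.random.default_rng(1)
leaders = [0.9, 1.0, 1.1]             # fixed elites (stagnation assumption)
x = rng.uniform(-10, 10, 5000)        # 5000 independent 1-D wolves
t_max = 100
for t in range(t_max):
    a = 2.0 * (1 - t / t_max)         # standard linear decay
    new_x = np.zeros_like(x)
    for x_k in leaders:
        A = 2 * a * rng.random(x.shape) - a
        C = 2 * rng.random(x.shape)
        D = np.abs(C * x_k - x)
        new_x += (x_k - A * D) / 3.0  # centroid of leader-guided moves
    x = new_x
print(x.mean(), x.var())  # mean near the centroid 1.0, variance near 0
```

Because the random coefficient $A$ is zero-mean and independent of the distance term, the expected position equals the leader centroid after a single step (order-1 stability), and the shrinking $a(t)$ drives the variance to zero (order-2 stability).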

3. Parameterization, Interpretation, and Practical Guidelines

Key parameters:

  • Population size $N$: Governs diversity and coverage. Typical choices: $20$–$70$ agents, with $N \gtrsim d/2$ (half the problem dimension) for robustness (Taghizadeh et al., 2024, Niu et al., 2024).
  • Maximum iterations $T$: Set by the expected convergence rate and problem complexity (commonly $T = 50$–$1000$).
  • Control parameter $a(t)$: The linear decay of $a(t)$ from $2$ to $0$ is standard, as it ensures a smooth shift from exploration to exploitation.

Empirical sensitivity studies consistently show that moderate population sizes and iteration budgets $T$ provide convergence competitive with or superior to parameter-heavy alternatives. Proper scaling of $a(t)$ is crucial: too rapid a decay induces premature exploitation; too slow a decay causes excessive global search and slower refinement (Taghizadeh et al., 2024, Niu et al., 2024, Wang et al., 2022).
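The exploration/exploitation split induced by the linear schedule can be quantified directly: exploration steps ($|A| > 1$) are only possible while $a(t) > 1$, i.e. during the first half of the run. A small sketch with illustrative sample sizes:

```python
import numpy as np

def exploration_fraction(t_max=500, samples=10_000, seed=0):
    """Empirical P(|A| > 1) per iteration under a(t) = 2(1 - t/t_max)."""
    rng = np.random.default_rng(seed)
    frac = np.empty(t_max)
    for t in range(t_max):
        a = 2.0 * (1 - t / t_max)
        A = 2 * a * rng.random(samples) - a   # A is uniform on [-a, a]
        frac[t] = np.mean(np.abs(A) > 1)
    return frac

frac = exploration_fraction()
print(frac[:250].mean())   # roughly 1 - ln 2 ≈ 0.31 over the first half
print(frac[250:].max())    # exactly 0.0: a(t) <= 1 forbids |A| > 1
```

This makes the sensitivity concrete: any change to the decay law directly reshapes how long exploration remains possible at all.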

4. Hybrid Algorithms and Enhancements

Although basic GWO offers remarkable versatility, a substantial literature focuses on addressing its intrinsic limitations, such as premature convergence, lack of memory, or insufficient local refinement. Key lines include:

  • Elite Inheritance and Memory: Maintaining an elite archive across generations increases solution quality and guards against loss of high-fitness regions. The enhanced EBGWO variant uses a combined elite archive and a stochastic switch ("search tendency") between exploitation and exploration, yielding statistically significant improvements across benchmark functions and engineering problems (Jiang et al., 2024).
  • Clustering or Partitioning: Integration of clustering (e.g., K-means clustering in KMGWO) segments the population, enabling both intensified local search and broad exploration, especially effective on multimodal/complex landscapes (Mohammed et al., 2021).
  • Adaptive Control Schedules: Adaptive or non-linear decay laws (e.g. sigmoid/inverse-S in ACGWO) improve the transition between global search and local exploitation, maintaining diversity and preventing stagnation (Niu et al., 2024).
  • Chaotic and Levy-driven Randomness: Substituting conventional uniform random vectors with chaotic sequences or Levy flights diversifies exploration and aids in escaping deep local minima (CGWO, GMPA) (Mehrotra et al., 2018, Dehkordi et al., 18 May 2025).
  • Algorithmic Hybridization: GWO is frequently hybridized at the algorithmic layer. Integration with Differential Evolution (GWO-DE) (Bougas et al., 2 Jul 2025), Particle Swarm Optimization (PSO-GWO) (Prasad et al., 21 May 2025), Whale Optimization Algorithm (WOAGWO) (Mohammed et al., 2020), and Marine Predators Algorithm (GMPA) (Dehkordi et al., 18 May 2025) systematically alternates search strategies or uses detection of stagnation to trigger refinement with alternative operators. These hybrids demonstrate compelling results in large-scale, high-dimensional, and domain-specific problems.
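To make the adaptive-schedule idea concrete, the sketch below contrasts the standard linear decay with a hypothetical sigmoid (inverse-S) law; this shows the general shape only, not the exact ACGWO schedule:

```python
import numpy as np

def linear_a(t, t_max):
    """Standard GWO schedule: a falls linearly from 2 to 0."""
    return 2.0 * (1.0 - t / t_max)

def sigmoid_a(t, t_max, k=10.0):
    """Hypothetical inverse-S schedule: holds a near 2 early in the run,
    then drops smoothly toward 0 (steepness controlled by k)."""
    return 2.0 / (1.0 + np.exp(k * (t / t_max - 0.5)))

t = np.arange(100)
strong_sig = int((sigmoid_a(t, 100) > 1.5).sum())
strong_lin = int((linear_a(t, 100) > 1.5).sum())
# The sigmoid keeps a in the strongly exploratory regime (a > 1.5) longer
print(strong_sig, strong_lin)
```

Holding $a$ high for longer preserves diversity early in the run, then concentrates refinement late, which is the stated motivation for non-linear schedules.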

5. Applications Across Problem Domains

GWO and its variants have achieved widespread adoption in numerous domains:

  • Engineering Optimization: Standard benchmark suites (e.g., CEC14, CEC17, CEC19) and constrained design problems such as pressure vessel, gear-train, and welded beam design highlight the proficiency of both canonical GWO and its enhanced variants relative to other metaheuristics (Mohammed et al., 2021, Jiang et al., 2024, Mohammed et al., 2020).
  • Cloud and Fog Scheduling: In vehicular fog computing, GWO has been used for multi-objective scheduling, encoding allocation matrices and using a weighted normalized objective to jointly minimize cost and makespan. A greedy post-processing step refines GWO's allocation while respecting static/dynamic server priorities (Taghizadeh et al., 2024). Hybrid GWO–PSO schemes further support adaptive scheduling in modern, NP-hard cloud environments (Prasad et al., 21 May 2025).
  • Machine Learning: GWO-driven neural network training provides a derivative-free global search for weights and biases, outperforming traditional gradient-based methods in regression and classification settings, such as solar irradiance prediction and clinical risk modeling (Claywell et al., 2020, Niu et al., 2024).
  • Time Series and Model Fitting: GWO is employed to tune grey models in time series forecasting, such as continuous conformable fractional nonlinear grey Bernoulli models, yielding order-of-magnitude reductions in out-of-sample prediction error relative to classical statistical models (Xie et al., 2020).
  • Complex System Optimization: Interplanetary trajectory design utilizes GWO and marine-predator hybridizations to efficiently locate globally optimal paths subject to nonlinear, high-dimensional constraints (Dehkordi et al., 18 May 2025).
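A weighted normalized objective of the kind described for fog scheduling can be sketched as follows; the function, weights, bounds, and candidate numbers are all hypothetical illustrations:

```python
def weighted_fitness(cost, makespan, cost_bounds, mk_bounds, w=0.5):
    """Min-max normalize each objective to [0, 1], then blend with weight w
    (w trades off cost against makespan; lower fitness is better)."""
    c = (cost - cost_bounds[0]) / (cost_bounds[1] - cost_bounds[0])
    m = (makespan - mk_bounds[0]) / (mk_bounds[1] - mk_bounds[0])
    return w * c + (1.0 - w) * m

# Two candidate allocations: cheaper-but-slower vs. faster-but-pricier
f1 = weighted_fitness(cost=120.0, makespan=40.0,
                      cost_bounds=(100.0, 200.0), mk_bounds=(20.0, 60.0))
f2 = weighted_fitness(cost=180.0, makespan=25.0,
                      cost_bounds=(100.0, 200.0), mk_bounds=(20.0, 60.0))
print(f1, f2)  # with equal weights the cheaper schedule scores lower (better)
```

Scalarizing this way lets a single-objective optimizer like GWO rank allocation matrices directly; the weight $w$ and the normalization bounds become tuning choices.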

6. Quantitative Performance and Comparative Evaluation

Comparative studies across extensive benchmarks show that standard and enhanced GWO variants consistently outperform reference algorithms in solution quality, convergence speed, and robustness. Table-based comparisons (see below for selected empirical benchmarks) confirm statistically significant improvements of advanced GWO derivatives over established alternatives:

| Problem / Function | GWO Best/Mean | Enhanced Variant | Variant Best/Mean | Reference |
|---|---|---|---|---|
| CEC19 F5 (30D) | 2.3949 | KMGWO | 2.2715 | (Mohammed et al., 2021) |
| Pressure Vessel Design | 6303.0 | EBGWO | 6059.9 | (Jiang et al., 2024) |
| Sphere Function (30D) | 7.84E-06 | ACGWO | 6.01E-219 | (Niu et al., 2024) |
| Cassini-1 ΔV (km/s) | 12.5 | GMPA | 9.62 | (Dehkordi et al., 18 May 2025) |

Across these and further domains, ablation studies demonstrate that the addition of memory, adaptive schedules, hybridized mutation/crossover, and clustering mechanisms systematically reduce error, increase reliability, and accelerate convergence.

7. Open Problems, Limitations, and Future Directions

Despite its empirical and theoretical strengths, challenges remain:

  • Premature Convergence: In high-dimensional or highly multimodal problems, standard GWO may still concentrate too early. Hence, maintenance of diversity, parameter adaptation, and judicious algorithmic switching remain important open questions (Wang et al., 2022, Jiang et al., 2024).
  • Parameter Tuning at Large Scale: Large-scale optimization may demand adaptive or problem-specific tuning of $N$, $a(t)$, and hybridization thresholds.
  • Memory and Archive Scalability: Elite archives and balance mechanisms introduce nontrivial memory and computational costs as dimensions scale beyond several thousands.
  • Theoretical Extension Beyond Stagnation Assumption: Current proofs of global convergence hold primarily under the assumption of fixed elites; rigorous convergence rates and scaling laws for the real population-dynamics regime are an active domain.
  • Hybridization Complexity: While algorithmic hybrids outperform vanilla GWO in practice, they introduce code complexity, more hyperparameters, and potential inefficiencies if hybrid-switching is not properly calibrated (Bougas et al., 2 Jul 2025).

Prospective research directions include adaptive archive management, problem-specific local search hybridization, real-world multi-objective applications, and rigorous scaling studies in ultra-high dimensional settings.


References (arXiv IDs):

(Taghizadeh et al., 2024) (Vehicular fog scheduling), (Claywell et al., 2020) (MLP-GWO for solar prediction), (Wang et al., 2022) (GWO theory I: PDF and stability), (Wang et al., 2022) (GWO theory II: convergence proof), (Xie et al., 2020) (Grey model tuning), (Mehrotra et al., 2018) (Chaotic GWO), (Dehkordi et al., 18 May 2025) (GWO–MPA interplanetary trajectory), (Mohammed et al., 2021) (K-means GWO), (Niu et al., 2024) (Adaptive-curve GWO), (Mohammed et al., 2020) (WOA–GWO hybrid), (Jiang et al., 2024) (Elite-inheritance GWO), (Prasad et al., 21 May 2025) (PSO-GWO hybrid, cloud scheduling), (Bougas et al., 2 Jul 2025) (GWO–DE hybrid).
