Glicko-2 Rating System
- Glicko-2 is an advanced probabilistic model that estimates performance using dynamic rating deviation and a volatility parameter.
- It updates player ratings after head-to-head encounters by applying Bayesian methods and functions like g(ϕ) to weigh match outcomes.
- The system is applied in competitive gaming, machine learning benchmarks, and sports analytics to ensure fair, precise performance assessments.
The Glicko-2 rating system is a probabilistic model for estimating and updating the latent performance of agents (players in games or classifiers in algorithmic competitions) through repeated pairwise encounters. It generalizes the Elo system by adding a dynamic confidence measure (rating deviation, RD) and a dedicated volatility parameter (σ), so that it estimates not only comparative skill but also how reliable that estimate is and how consistent the agent's performance remains across changing contexts. It is used in domains ranging from competitive gaming to machine learning benchmarking and team sports ranking, with notable deployments in classifier tournaments (Cardoso et al., 2021, Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025), football club analytics (Shelopugin et al., 2023), and esports matchmaking (Bober-Irizar et al., 2024).
1. Parameterization and Internal Scaling
Every agent (player, classifier, team) is represented at time t by three primary values:
- $R$ (rating): a point estimate of current ability, typically initialized to 1500.
- $RD$ (rating deviation): a standard deviation describing uncertainty in $R$, initialized to 350.
- $\sigma$ (volatility): quantifies temporal instability of skill, initialized to 0.06.
Glicko-2 operates on a rescaled, centered space:
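- $\mu = (R - 1500)/173.7178$,
- $\phi = RD/173.7178$,
with the scale factor $173.7178 \approx 400/\ln 10$ ensuring interpretability and comparability to Elo-derived systems. The volatility-change parameter $\tau$ controls the responsiveness of volatility updates (typical range: $0.3$–$1.2$), and $q = \ln(10)/400$ serves as the logistic base constant; since $q$ is the reciprocal of the scale factor, it is absorbed by the rescaling (Cardoso et al., 2021, Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025).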
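As a minimal sketch of the rescaling above, the conversion and its inverse can be written as two small helpers. The function names `to_internal` and `to_display` are illustrative and do not come from the cited papers.

```python
import math

SCALE = 400 / math.log(10)  # approximately 173.7178

def to_internal(rating, rd):
    """Map Elo-style (R, RD) to the internal Glicko-2 scale (mu, phi)."""
    return (rating - 1500) / SCALE, rd / SCALE

def to_display(mu, phi):
    """Inverse map from (mu, phi) back to (R, RD)."""
    return 1500 + SCALE * mu, SCALE * phi

# Default starting values: R = 1500, RD = 350
mu, phi = to_internal(1500, 350)
print(round(mu, 4), round(phi, 4))  # 0.0 2.0148
```

A new agent therefore starts at the origin of the internal scale with a deviation of roughly two internal units.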
2. Per-Period Update Mechanism
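Agent ratings are updated at the end of discrete "periods," such as a set of games, a dataset-based tournament, or a seasonal batch. Within a period, each agent plays $m$ matches against opponents $j = 1, \dots, m$; results $S_j \in \{0, 0.5, 1\}$ indicate loss, draw, or win.
The update flow for an agent with internal-scale values $(\mu, \phi, \sigma)$ proceeds:
- Impact Function ($g$):
$$g(\phi_j) = \frac{1}{\sqrt{1 + 3\phi_j^2/\pi^2}}$$
Opponents with high RD are down-weighted in the update.
- Expected Score:
$$E_j = \frac{1}{1 + \exp\left(-g(\phi_j)\,(\mu - \mu_j)\right)}$$
- Variance and Delta:
$$v = \left[\sum_{j=1}^{m} g(\phi_j)^2\, E_j\,(1 - E_j)\right]^{-1}, \qquad \Delta = v \sum_{j=1}^{m} g(\phi_j)\,(S_j - E_j)$$
- Volatility Update ($\sigma'$):
Find $x^*$ such that $f(x^*) = 0$, where
$$f(x) = \frac{e^{x}\left(\Delta^2 - \phi^2 - v - e^{x}\right)}{2\left(\phi^2 + v + e^{x}\right)^2} - \frac{x - a}{\tau^2}, \qquad a = \ln \sigma^2,$$
and set $\sigma' = \exp(x^*/2)$. Typically solved via Brent's or Illinois method.
- Preliminary Deviation:
$$\phi^* = \sqrt{\phi^2 + \sigma'^2}$$
- New RD and Rating:
$$\phi' = \frac{1}{\sqrt{1/\phi^{*2} + 1/v}}, \qquad \mu' = \mu + \phi'^2 \sum_{j=1}^{m} g(\phi_j)\,(S_j - E_j)$$
- Conversion Back:
$$R' = 1500 + 173.7178\,\mu', \qquad RD' = 173.7178\,\phi'$$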
These equations rigorously propagate observed outcomes and inferred uncertainty through the system, supporting both head-to-head and round-robin competitive structures (Cardoso et al., 2021, Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025).
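The first three steps of the update (impact function, expected score, variance and delta) can be sketched in isolation as follows. This is an illustrative fragment, not the cited authors' code; the helper names and the three-opponent period are assumptions chosen only to exercise the formulas.

```python
import math

SCALE = 173.7178

def g(phi):
    """Impact function: shrinks toward 0 as the opponent's deviation grows."""
    return 1 / math.sqrt(1 + 3 * phi**2 / math.pi**2)

def expected(mu, mu_j, phi_j):
    """Expected score of the agent against opponent j."""
    return 1 / (1 + math.exp(-g(phi_j) * (mu - mu_j)))

def variance_and_delta(mu, games):
    """games: list of (mu_j, phi_j, S_j) on the internal scale."""
    info = sum(g(p)**2 * expected(mu, m, p) * (1 - expected(mu, m, p))
               for m, p, _ in games)
    v = 1 / info
    delta = v * sum(g(p) * (s - expected(mu, m, p)) for m, p, s in games)
    return v, delta

# Hypothetical period: a 1500-rated agent beats a 1400 (RD 30),
# then loses to a 1550 (RD 100) and a 1700 (RD 300).
raw = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
games = [((r - 1500) / SCALE, rd / SCALE, s) for r, rd, s in raw]
v, delta = variance_and_delta(0.0, games)
print(round(v, 2), round(delta, 2))  # roughly 1.78 and -0.48
```

The negative `delta` reflects that the agent scored less than expected over the period, so its rating would move down.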
3. Algorithmic and Statistical Rationale
Glicko-2's statistical underpinnings reflect Bayesian updating where the prior for each agent is a normal with variance $\phi^2$, and actual match outcomes provide a sequence of (possibly noisy) observations. The function $g(\phi_j)$ is a reliability dampener, ensuring that outcomes against poorly measured opponents do not induce outsized rating changes. The term $\Delta$ aggregates "surprise" (the deviation between observed and expected scores), scaled by the information quality $v$. The volatility update is driven by the likelihood of the observed $\Delta$ under the prior, ensuring compatibility with rating shocks (unexpected performance swings) (Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025).
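To make the link between surprise and volatility concrete, the sketch below solves the volatility equation with the Illinois variant of regula falsi. The function name `solve_volatility`, the tolerance, and the sample inputs are assumptions for illustration only.

```python
import math

def solve_volatility(phi, v, delta, sigma, tau, eps=1e-6):
    """Return the updated volatility sigma' on the internal scale."""
    a = math.log(sigma**2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta**2 - phi**2 - v - ex) / (2 * (phi**2 + v + ex)**2)
                - (x - a) / tau**2)

    # Bracket the root between A and B
    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)

    # Illinois iterations: halve fA whenever the same endpoint is kept
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2
        B, fB = C, fC
    return math.exp(A / 2)

calm = solve_volatility(phi=1.1513, v=1.7785, delta=-0.4834, sigma=0.06, tau=0.5)
shock = solve_volatility(phi=1.1513, v=1.7785, delta=5.0, sigma=0.06, tau=0.5)
print(calm < 0.06 < shock)  # True
```

With these inputs, an unsurprising period nudges the volatility slightly below its prior value, while a large `delta` pushes it above; smaller `tau` values restrict how far it can move.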
4. Application Domains and Adaptations
Glicko-2 is implemented as follows across representative domains:
- Machine Learning Classifier Benchmarking:
Each dataset in the benchmark is a rating period. Item Response Theory (IRT) estimates classifier ability per instance difficulty; classifiers "compete" via head-to-head matches ($S = 1$, $0$, or $0.5$ for win, loss, or draw) based on true-score comparisons. The Glicko-2 update sequence refines $(R, RD, \sigma)$ per classifier after each dataset benchmark (Cardoso et al., 2021, Cardoso et al., 13 Apr 2025).

Table: Mapping of Glicko-2 Entities in Classifier Benchmarking

| Glicko-2 Term | Competition Context | Update Basis              |
|---------------|---------------------|---------------------------|
| Player        | Classifier          | Dataset-based tournaments |
| Match         | Head-to-head eval   | IRT true-score comparison |
| Period        | Dataset             | One round-robin session   |
- Football League Analytics:
Each match is a rating period. Modifications include explicit draw probabilities via a Poisson/LightGBM model, home-field advantage via an added rating shift, league transitions via preseason parameter resets, and rating inflation control via league-average renormalization. The expected-score formula is adapted to model (win, draw, loss) outcome triplets (Shelopugin et al., 2023).
- Esports (CS:GO):
Glicko-2 outperformed both Elo and TrueSkill at predicting non-draw professional match outcomes, with systematic gains in accuracy across training horizons and consistent parameterization using canonical defaults ($R = 1500$, $RD = 350$, $\sigma = 0.06$, and a fixed $\tau$). No domain-dependent tuning of these parameters is required to attain robust skill separation (Bober-Irizar et al., 2024).
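For the classifier-benchmarking case above, one rating period amounts to a round-robin over per-dataset scores. The sketch below shows one way to turn such scores into match outcomes $S$; the helper `round_robin_outcomes`, the tie tolerance, and the score values are hypothetical and are not taken from the cited papers.

```python
from itertools import combinations

def round_robin_outcomes(scores, tol=1e-9):
    """Return {(a, b): S_a} for every unordered pair, with S_a in {1, 0.5, 0}."""
    results = {}
    for a, b in combinations(sorted(scores), 2):
        diff = scores[a] - scores[b]
        if abs(diff) <= tol:
            s = 0.5                      # equal scores count as a draw
        else:
            s = 1.0 if diff > 0 else 0.0
        results[(a, b)] = s
    return results

# Made-up true-scores for three classifiers on one dataset
scores = {"knn": 41.0, "svm": 44.5, "tree": 41.0}
matches = round_robin_outcomes(scores)
print(matches)
# {('knn', 'svm'): 0.0, ('knn', 'tree'): 0.5, ('svm', 'tree'): 1.0}
```

Each entry gives the outcome from the first classifier's point of view; the opponent's outcome is one minus that value.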
5. Pseudocode and Update Example
A canonical Glicko-2 update cycle (as used in classifier benchmarking):
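```python
from math import exp, log, pi, sqrt

SCALE = 173.7178  # 400 / ln(10)

def Glicko2_Update(R, RD, sigma, tau, opponents, eps=1e-6):
    """opponents: iterable of (R_j, RD_j, S_j) with S_j in {0, 0.5, 1}."""
    mu, phi = (R - 1500) / SCALE, RD / SCALE
    terms = []  # (g_j, E_j, S_j) per opponent
    for R_j, RD_j, S_j in opponents:
        mu_j, phi_j = (R_j - 1500) / SCALE, RD_j / SCALE
        g_j = 1 / sqrt(1 + 3 * phi_j**2 / pi**2)  # q is absorbed by the internal scale
        E_j = 1 / (1 + exp(-g_j * (mu - mu_j)))
        terms.append((g_j, E_j, S_j))
    v = 1 / sum(g**2 * E * (1 - E) for g, E, _ in terms)
    surprise = sum(g * (S - E) for g, E, S in terms)
    Delta = v * surprise
    a = log(sigma**2)
    def f(x):  # volatility equation from the update flow
        ex = exp(x)
        return ex * (Delta**2 - phi**2 - v - ex) / (2 * (phi**2 + v + ex)**2) - (x - a) / tau**2
    # Root-find f(x) = 0 with the Illinois method, starting from a bracket [A, B]
    A = a
    if Delta**2 > phi**2 + v:
        B = log(Delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2
        B, fB = C, fC
    sigma_prime = exp(A / 2)
    phi_star = sqrt(phi**2 + sigma_prime**2)
    phi_prime = 1 / sqrt(1 / phi_star**2 + 1 / v)
    mu_prime = mu + phi_prime**2 * surprise
    return 1500 + SCALE * mu_prime, SCALE * phi_prime, sigma_prime
```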
A worked numerical example is provided in (Cardoso et al., 13 Apr 2025): two agents both begin with identical $R$, $RD$, and $\sigma$ under a shared $\tau$; one wins a head-to-head match, which increases its $R$ and decreases its $RD$ significantly, while its volatility $\sigma$ shows only a minute adjustment.
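As an illustrative calculation (not the figures quoted in the cited example), assume both agents start at the defaults stated earlier, $R = 1500$, $RD = 350$, $\sigma = 0.06$, and that the volatility stays at $\sigma' \approx 0.06$. For the winner ($S = 1$), the update equations then give approximately:

$$
\begin{aligned}
\mu &= 0, \qquad \phi = 350/173.7178 \approx 2.0148, \\
g(\phi_j) &= \left(1 + 3\,(2.0148)^2/\pi^2\right)^{-1/2} \approx 0.6691, \qquad E = 0.5, \\
v &= \left[g(\phi_j)^2\, E\,(1 - E)\right]^{-1} \approx 8.935, \qquad \Delta = v\, g(\phi_j)\,(1 - 0.5) \approx 2.989, \\
\phi^* &= \sqrt{\phi^2 + \sigma'^2} \approx 2.0157, \qquad \phi' = \left(1/\phi^{*2} + 1/v\right)^{-1/2} \approx 1.6712, \\
\mu' &= \mu + \phi'^2\, g(\phi_j)\,(1 - 0.5) \approx 0.9343, \\
R' &\approx 1500 + 173.7178 \times 0.9343 \approx 1662.3, \qquad RD' \approx 173.7178 \times 1.6712 \approx 290.3.
\end{aligned}
$$

Under these assumptions $\Delta^2 \approx 8.94$ is smaller than $\phi^2 + v \approx 12.99$, so the result is unsurprising relative to the prior uncertainty, which is consistent with the volatility barely moving.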
6. Interpretability and Domain-Specific Metrics
The triplet $(R, RD, \sigma)$ supports a nuanced interpretation:
- $R$: best estimate of ability integrated over observed performance.
- $RD$: reflects rating confidence; smaller $RD$ corresponds to higher certainty. A 95% interval may be reported as approximately $R \pm 2\,RD$.
- $\sigma$: captures inconsistency across periods; low $\sigma$ implies stable performance.
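A small sketch of how the rating and its deviation combine into an interval, assuming a normal approximation with multiplier `z = 2` (1.96 is the exact two-sided 95% normal quantile). The helper names are illustrative.

```python
def rating_interval(rating, rd, z=2.0):
    """Approximate 95% interval for the true ability, given R and RD."""
    return rating - z * rd, rating + z * rd

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

newcomer = rating_interval(1500, 350)   # wide: little is known yet
veteran = rating_interval(1700, 50)     # narrow: many observed matches
print(newcomer, veteran)                # (800.0, 2200.0) (1600.0, 1800.0)
print(intervals_overlap(newcomer, veteran))  # True
```

Overlapping intervals suggest that a difference in point ratings alone is not enough to separate the two agents.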
Glicko-2 provides statistically grounded, compact summary metrics for ability, reliability, and volatility, thereby enabling fine-grained decisions in both algorithmic benchmarking and competitive analytics (Cardoso et al., 2021, Bober-Irizar et al., 2024, Shelopugin et al., 2023, Cardoso et al., 13 Apr 2025).
7. Extensions, Modifications, and Practical Considerations
In football analytics, enhancements include probabilistic draws, explicit home/away modeling, league transitions, and log-loss-based hyperparameter optimization for improved predictive performance (Shelopugin et al., 2023; github.com/andreyshelopugin/GlickoSoccer). In machine learning benchmarking, Glicko-2 facilitates fairer classifier comparison by integrating statistical difficulty (via IRT) and standardizing head-to-head ability assessment (Cardoso et al., 2021, Cardoso et al., 13 Apr 2025).
Parameter selection is generally robust; typical defaults yield effective results, and only targeted domains (football, multi-player esports) warrant extensive hyperparameter tuning. Insufficiently informative matches (low $g(\phi_j)$, that is, opponents whose own ratings are uncertain) do little to reduce uncertainty, which is reflected in a persistently high $RD$. Systematic sensitivity analyses have validated the stability of Glicko-2 under standard settings compared to alternative rating models (Bober-Irizar et al., 2024).
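The effect of opponent uncertainty on the updated deviation can be illustrated with a single match between equally rated agents. This is a simplified sketch: it assumes the volatility stays at its prior value, and the function name `rd_after_one_match` is hypothetical.

```python
import math

SCALE = 173.7178

def rd_after_one_match(rd, opp_rd, sigma=0.06):
    """New RD after one match between equally rated agents (so E = 0.5)."""
    phi, phi_j = rd / SCALE, opp_rd / SCALE
    g = 1 / math.sqrt(1 + 3 * phi_j**2 / math.pi**2)
    v = 1 / (g**2 * 0.25)            # E * (1 - E) = 0.25 when ratings are equal
    phi_star_sq = phi**2 + sigma**2  # assumption: sigma' is taken equal to sigma
    return SCALE * (1 / math.sqrt(1 / phi_star_sq + 1 / v))

well_measured = rd_after_one_match(350, opp_rd=30)
poorly_measured = rd_after_one_match(350, opp_rd=350)
print(well_measured < poorly_measured < 350)  # True
```

A match against a well-measured opponent shrinks the deviation more than a match against an uncertain one, and the new deviation does not depend on who won.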
For full algorithmic details, practical deployment strategies, and curated numerical examples, see (Cardoso et al., 2021, Bober-Irizar et al., 2024, Shelopugin et al., 2023, Cardoso et al., 13 Apr 2025).