Glicko-2 Rating System
- Glicko-2 is a probabilistic rating system that quantifies skill, uncertainty, and volatility for dynamic performance evaluation.
- It employs iterative scaling and opponent-based functions to compute win probabilities and update ratings after each match.
- Extensions for draws, home advantage, and league transitions improve its prediction accuracy in eSports, chess, and team sports.
The Glicko-2 rating system is a probabilistic model for inferring latent skill and outcome likelihoods in competitive two-player (or team-versus-team) environments. It extends the original Glicko system by integrating dynamic measures of rating uncertainty and skill volatility, enabling adaptive inference in the presence of noisy or time-varying performances. Glicko-2 is widely used in electronic sports, chess federations, and, with recent extensions, as a foundation for comparative analytics in team sports and multi-league scenarios (Bober-Irizar et al., 2024, Shelopugin et al., 2023).
1. Formal Structure and Mathematical Machinery
Each competitor in Glicko-2 is parameterized by three real-valued quantities: a skill rating $r$ (on an open, arbitrary scale), a rating deviation $RD$ (interpreted as credibility/confidence in the current $r$), and a volatility parameter $\sigma$ (interpreted as expected temporal variation in latent skill).
The key latent quantities are updated after each rating period or match as follows (Bober-Irizar et al., 2024, Shelopugin et al., 2023):
- Scaling transformation: Convert user-facing ratings and deviations to the canonical Glicko-2 scale, $\mu = (r - 1500)/173.7178$ and $\phi = RD/173.7178$.
- Opponent scaling function: $g(\phi) = 1/\sqrt{1 + 3\phi^2/\pi^2}$.
- Win probability: $E(\mu, \mu_j, \phi_j) = 1/\left(1 + \exp(-g(\phi_j)(\mu - \mu_j))\right)$.
- Variance of rating estimate: $v = \left[\sum_j g(\phi_j)^2\, E_j(1 - E_j)\right]^{-1}$, with $E_j = E(\mu, \mu_j, \phi_j)$.
- Rating improvement ("Delta"): $\Delta = v \sum_j g(\phi_j)(s_j - E_j)$, where $s_j$ is the observed score against opponent $j$.
- Volatility update: The updated volatility $\sigma'$ is found via iterative solution to $f(x) = 0$, where $f(x) = \frac{e^x(\Delta^2 - \phi^2 - v - e^x)}{2(\phi^2 + v + e^x)^2} - \frac{x - a}{\tau^2}$, with $a = \ln(\sigma^2)$ and fixed system constant $\tau$; at the root $x^*$, $\sigma' = e^{x^*/2}$.
- Posterior RD and rating: $\phi^* = \sqrt{\phi^2 + \sigma'^2}$; $\phi' = 1/\sqrt{1/\phi^{*2} + 1/v}$; $\mu' = \mu + \phi'^2 \sum_j g(\phi_j)(s_j - E_j)$.
- Return to user scale: $r' = 173.7178\,\mu' + 1500$ and $RD' = 173.7178\,\phi'$.
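The update pipeline above can be condensed into a single rating-period function. The sketch below follows Glickman's published update equations (scaling, $g$, $E$, $v$, $\Delta$, the Illinois-method volatility solve, and the posterior step); function and variable names are this sketch's own choices, not from the cited papers.

```python
import math

SCALE = 173.7178  # conversion between the user-facing scale and the Glicko-2 scale

def g(phi):
    # Opponent scaling function g(phi)
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def expected_score(mu, mu_j, phi_j):
    # Win probability E(mu, mu_j, phi_j)
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))

def glicko2_update(r, rd, sigma, results, tau=0.5, eps=1e-6):
    """One rating-period update. results: list of (r_j, rd_j, s_j), s_j in {0, 0.5, 1}."""
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE

    # Variance v of the rating estimate and the improvement Delta
    v_inv, delta_sum = 0.0, 0.0
    for r_j, rd_j, s_j in results:
        mu_j, phi_j = (r_j - 1500.0) / SCALE, rd_j / SCALE
        e = expected_score(mu, mu_j, phi_j)
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        delta_sum += g(phi_j) * (s_j - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Volatility update: solve f(x) = 0 with the Illinois variant of regula falsi
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0.0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0.0:
            A, fA = B, fB
        else:
            fA /= 2.0  # Illinois step: halve the retained endpoint's value
        B, fB = C, fC
    sigma_new = math.exp(A / 2.0)

    # Posterior deviation and rating, then conversion back to the user scale
    phi_star = math.sqrt(phi ** 2 + sigma_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * delta_sum
    return 1500.0 + SCALE * mu_new, SCALE * phi_new, sigma_new
```

On Glickman's worked example (a 1500/200/0.06 player who beats a 1400/30 opponent and loses to 1550/100 and 1700/300 opponents), this yields approximately $r' \approx 1464.06$ and $RD' \approx 151.52$.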
Standard Glicko-2 only natively handles binary outcomes (win/loss). Several enhanced models introduced in later literature extend this framework to multi-way outcomes (including draws), additive context effects (home advantage), and dynamic rating transitions across organizational boundaries (league changes) (Shelopugin et al., 2023).
2. Parameter Choices and Initialization
Empirical best practices, as recommended by Glickman and validated in recent studies, are encoded in default settings: initial rating $r_0 = 1500$, initial deviation $RD_0 = 350$, initial volatility $\sigma_0 = 0.06$, and system constant $\tau = 0.5$ (Bober-Irizar et al., 2024, Shelopugin et al., 2023). These settings deliver high initial uncertainty (rapid learning), gradual volatility adaptation, and stability for established players or teams.
The volatility system constant $\tau$ controls the rate at which $\sigma$ can drift, with $\tau = 0.5$ representing a generic compromise between reactivity and over-smoothing. $RD_0$ and $\sigma_0$ are typically initialized to reflect maximum plausible uncertainty and minor expected skill fluctuations, respectively.
Most systems maintain fixed hyperparameters, but validation-driven sweeps over $\tau$ (and, where relevant, $RD_0$ and $\sigma_0$) within bounded ranges are recommended for use-cases with a priori unknown skill volatility or where competition formats significantly deviate from canonical 1v1 encounters (Bober-Irizar et al., 2024).
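A validation sweep of this kind reduces to a grid search minimizing held-out predictive loss. The skeleton below shows the structure only: `validation_log_loss` is a hypothetical stand-in (in practice it would replay the match history under the candidate hyperparameters and score predictions on a held-out split), and the grid values are illustrative.

```python
import itertools

def validation_log_loss(tau, rd0, sigma0):
    # Hypothetical stand-in scorer: a real implementation would refit the
    # Glicko-2 system with these hyperparameters and return the mean
    # predictive log-loss on held-out matches. This quadratic surrogate is
    # minimized at the canonical defaults purely for illustration.
    return ((tau - 0.5) ** 2
            + (rd0 - 350.0) ** 2 / 1e6
            + (sigma0 - 0.06) ** 2)

# Bounded grids over the three hyperparameters recommended for sweeping
grid = itertools.product(
    [0.3, 0.5, 0.8, 1.2],   # tau: volatility system constant
    [250.0, 350.0],         # RD_0: initial rating deviation
    [0.04, 0.06, 0.09],     # sigma_0: initial volatility
)
best = min(grid, key=lambda params: validation_log_loss(*params))
```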
3. Model Extensions: Draws, Home Advantage, and League Transition
Advanced models for practical deployment in team sports or tournaments frequently extend Glicko-2 to accommodate multi-outcome (win/draw/loss) settings and contextual structural changes (Shelopugin et al., 2023).
- Draw extension: The outcome model is reparameterized via a three-way softmax over win/draw/loss, in which the draw logit $\nu$ is the sum of a global draw-adjustment term and match-specific log-odds from a Poisson-Skellam model.
- Home-field advantage: A location-dependent offset $h$ is added to the home team's latent skill $\mu$ before computing the expected score $E$.
- League transition: Per-league intercepts and transition penalties/boosts are applied to ratings and deviations to encode performance shifts due to promotions, relegations, or between-league comparisons, with normalization to prevent inflation of the league average.
- Seasonal adjustment: At the start of each new season, the rating deviation $\phi$ is increased by a fixed increment to encode uncertainty due to team roster or organizational changes.
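The draw extension above can be sketched as a Davidson-style three-way softmax; the exact parameterization in Shelopugin et al. (2023) may differ, and `nu` here stands for the draw logit described in the text (global draw-adjustment plus Poisson-Skellam log-odds).

```python
import math

def g(phi):
    # Opponent scaling function from the Glicko-2 core
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def outcome_probs(mu_i, mu_j, phi_j, nu):
    # Three-way softmax over (win, draw, loss) logits. The win/loss logits
    # are the signed, deviation-scaled skill gap; nu is the draw logit.
    d = g(phi_j) * (mu_i - mu_j)
    logits = [d, nu, -d]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]
```

With equal skills and `nu = 0`, the model assigns probability 1/3 to each outcome; raising `nu` shifts mass toward draws.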
All hyperparameters for contextual effects are estimated by minimizing negative log-likelihoods for match outcomes.
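This likelihood-based fitting can be illustrated for the home-advantage offset alone: a coarse grid search over $h$ minimizing the negative log-likelihood of binary outcomes. The data here are synthetic and the function names are this sketch's own; real deployments would fit all contextual parameters jointly.

```python
import math

def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def home_win_prob(mu_home, mu_away, phi_away, h):
    # Offset h is added to the home team's latent skill before computing E
    return 1.0 / (1.0 + math.exp(-g(phi_away) * (mu_home + h - mu_away)))

def neg_log_likelihood(matches, h):
    # matches: list of (mu_home, mu_away, phi_away, outcome) with outcome in {0, 1}
    total = 0.0
    for mu_h, mu_a, phi_a, s in matches:
        p = home_win_prob(mu_h, mu_a, phi_a, h)
        total -= s * math.log(p) + (1 - s) * math.log(1 - p)
    return total

# Synthetic matches in which home sides win more often than skills alone imply
matches = [(0.0, 0.0, 1.0, 1), (0.0, 0.0, 1.0, 1), (0.0, 0.0, 1.0, 0),
           (0.2, 0.0, 0.8, 1), (-0.1, 0.1, 1.2, 1), (0.0, 0.3, 1.0, 0)]

# Coarse grid search over the offset on the Glicko-2 scale
best_h = min((h / 100.0 for h in range(0, 51)),
             key=lambda h: neg_log_likelihood(matches, h))
```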
4. Empirical Results and Comparative Analyses
In large-scale empirical evaluations, such as those performed on ≈10,000 professional Counter-Strike: Global Offensive matches, Glicko-2 demonstrates superior or at least comparable predictive accuracy to both vanilla Elo and canonical TrueSkill (without per-player decomposition) (Bober-Irizar et al., 2024).
Results show, for example, that after 2,000 training matches with an optimized acquisition function, Glicko-2 attains 63.1% accuracy in head-to-head non-draw match prediction, outperforming Elo (62.8%) and matching or slightly exceeding standard team-level TrueSkill (62.9%). Only models that fully exploit team composition granularity (e.g., “TrueSkillPlayers”) attain higher accuracy in this regime (≈64.1%).
On European and South American soccer leagues, Glicko-2 with draw and home/league extensions achieves lower log-loss on outcome prediction than Poisson regression-based methods (0.5832 vs. 0.5896–0.5949 across various baselines and settings) (Shelopugin et al., 2023).
5. Practical Usage, Calibration, and Data-Driven Recommendations
Glicko-2 is architected for rapid adaptation when uncertainty is high and stabilization when sufficient match history accrues. The rating deviation mechanism functions as a data-efficient early estimator, while the volatility parameter allows the system to accommodate abrupt skill changes (“unexpected” match results) without labor-intensive manual tuning. In practical terms, it surpasses Elo in both data efficiency and the principled handling of "outlier" outcomes (Bober-Irizar et al., 2024).
However, in its unextended form, Glicko-2 is not suited for free-for-all or multi-team competitions requiring higher-order factor graphs (as in TrueSkill). Studies recommend supplementing with context-driven extensions and/or decomposing team ratings to finer granularity (e.g., per-player) if inference on team composition, multi-way fixtures, or cross-domain transitions is required (Bober-Irizar et al., 2024, Shelopugin et al., 2023).
Recommended practical steps include:
- Lightweight hyperparameter sweeps over $\tau$, $RD_0$, and $\sigma_0$;
- Application of draw and home/league adjustment offsets in settings where data supports substantial outcome heterogeneity;
- Utilization of acquisition functions to drive the scheduling of “most informative” matches, thereby optimizing learning efficiency;
- Decomposition of team ratings by assigning ratings to individual participants, summing or otherwise aggregating for team-based prediction.
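One simple aggregation, sketched below under the assumption that player estimates are independent, averages per-player skills into a team mean and combines deviations as the variance of that mean; the cited papers describe this decomposition idea generically, so the exact aggregation rule here is an illustrative choice.

```python
import math

SCALE = 173.7178

def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def team_rating(players):
    """Aggregate per-player (rating, RD) pairs into a team-level (mu, phi).

    Skills are averaged on the Glicko-2 scale; assuming independent player
    estimates, the deviation of the mean is sqrt(sum(phi_i^2)) / n.
    """
    n = len(players)
    mus = [(r - 1500.0) / SCALE for r, rd in players]
    phis = [rd / SCALE for r, rd in players]
    mu = sum(mus) / n
    phi = math.sqrt(sum(p ** 2 for p in phis)) / n
    return mu, phi

def team_win_prob(team_a, team_b):
    # Win probability for team A, treating each team as a single Glicko-2 entity
    mu_a, _ = team_rating(team_a)
    mu_b, phi_b = team_rating(team_b)
    return 1.0 / (1.0 + math.exp(-g(phi_b) * (mu_a - mu_b)))

# Example: a uniformly stronger roster versus a baseline roster
strong = [(1600.0, 50.0)] * 5
baseline = [(1500.0, 50.0)] * 5
```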
6. Interpretability, Limitations, and Generalization
Glicko-2, especially with contextual augmentations, offers interpretable skill estimates that can be mapped or normalized across heterogeneous competitive domains (Shelopugin et al., 2023). Key limitations include its restriction to binary outcomes without extension, lack of native support for non-dyadic (multi-team) matchups, and the potential need for parameter retuning in high-volatility or structurally evolving environments.
Augmentation patterns established for draws, venue effects, and cross-league transitions are broadly generalizable, preserving the Bayesian core of Glicko-2 while expanding its applicability to other professional sports, electronic games, and even hybrid domains where performance modeling is dynamic and context-dependent.
References:
- [Skill Issues: An Analysis of CS:GO Skill Rating Systems, (Bober-Irizar et al., 2024)]
- [Ratings of European and South American Football Leagues Based on Glicko-2 with Modifications, (Shelopugin et al., 2023)]