Glicko-2 Rating System

Updated 25 January 2026
  • Glicko-2 is a probabilistic rating system that quantifies skill, uncertainty, and volatility for dynamic performance evaluation.
  • It employs iterative scaling and opponent-based functions to compute win probabilities and update ratings after each match.
  • Extensions for draws, home advantage, and league transitions improve its prediction accuracy in eSports, chess, and team sports.

The Glicko-2 rating system is a probabilistic model for inferring latent skill and outcome likelihoods in competitive two-player (or team-versus-team) environments. It extends the original Glicko system by integrating dynamic measures of rating uncertainty and skill volatility, enabling adaptive inference in the presence of noisy or time-varying performances. Glicko-2 is widely used in electronic sports, chess federations, and, with recent extensions, as a foundation for comparative analytics in team sports and multi-league scenarios (Bober-Irizar et al., 2024, Shelopugin et al., 2023).

1. Formal Structure and Mathematical Machinery

Each competitor in Glicko-2 is parameterized by three real-valued quantities: a skill rating r (on an open, arbitrary scale), a rating deviation ϕ (interpreted as credibility/confidence in the current r), and a volatility parameter σ (interpreted as expected temporal variation in latent skill).

The key latent quantities are updated after each rating period or match as follows (Bober-Irizar et al., 2024, Shelopugin et al., 2023):

  • Scaling transformation: convert user-facing ratings and deviations to the canonical Glicko-2 scale: μ = (r − 1500)/173.7178, ϕ = RD/173.7178.
  • Opponent scaling function: g(ϕ_j) = (1 + 3ϕ_j²/π²)^(−1/2).
  • Win probability: E_ij = [1 + exp(−g(ϕ_j)(μ_i − μ_j))]^(−1).
  • Variance of rating estimate: v⁻¹ = Σ_j g(ϕ_j)² E_ij (1 − E_ij).
  • Rating improvement ("Delta"): Δ = v Σ_j g(ϕ_j)(s_j − E_ij).
  • Volatility update: the updated volatility σ_i′ is found by iteratively solving f(x) = 0, where f(x) = e^x(Δ² − ϕ_i² − v − e^x) / [2(ϕ_i² + v + e^x)²] − (x − ln σ_i²)/τ², with x* = ln(σ_i′²) and fixed system constant τ.
  • Posterior RD and rating: ϕ_i* = √(ϕ_i² + σ_i′²); ϕ_i′ = 1/√(1/ϕ_i*² + 1/v); μ_i′ = μ_i + ϕ_i′² Σ_j g(ϕ_j)(s_j − E_ij).
  • Return to user scale: r_i′ = 1500 + 173.7178 μ_i′, RD_i′ = 173.7178 ϕ_i′.
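The update steps above can be sketched as a single Python function following Glickman's published procedure (the Illinois variant of regula falsi for the volatility root-find); variable names are illustrative:

```python
import math

SCALE = 173.7178  # conversion between the user scale and the Glicko-2 scale

def glicko2_update(r, rd, sigma, opponents, tau=0.5, eps=1e-6):
    """One rating-period update for a single player.

    opponents: list of (r_j, rd_j, s_j), where s_j is the score against
    opponent j (1 = win, 0.5 = draw, 0 = loss).
    Returns the updated (rating, RD, volatility) on the user scale.
    """
    # Convert to the canonical Glicko-2 scale.
    mu = (r - 1500.0) / SCALE
    phi = rd / SCALE

    def g(phi_j):
        return 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)

    # Estimated variance v and rating improvement Delta.
    v_inv, delta_sum = 0.0, 0.0
    for r_j, rd_j, s_j in opponents:
        mu_j = (r_j - 1500.0) / SCALE
        phi_j = rd_j / SCALE
        e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        delta_sum += g(phi_j) * (s_j - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve f(x) = 0 for the new volatility (Illinois algorithm).
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2.0 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB < 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    sigma_new = math.exp(A / 2.0)

    # Posterior deviation and rating, then back to the user scale.
    phi_star = math.sqrt(phi ** 2 + sigma_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * delta_sum
    return 1500.0 + SCALE * mu_new, SCALE * phi_new, sigma_new
```

On Glickman's worked example (a 1500-rated player with RD 200 facing opponents rated 1400/30, 1550/100, 1700/300 with results win, loss, loss), this reproduces the published posterior of roughly r′ ≈ 1464.06 and RD′ ≈ 151.52.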

Standard Glicko-2 natively models only binary outcomes; draws, when present, are scored as s = 0.5 rather than predicted as a distinct outcome class. Several enhanced models introduced in later literature extend this framework to multi-way outcomes (including explicit draw probabilities), additive context effects (home advantage), and dynamic rating transitions across organizational boundaries (league changes) (Shelopugin et al., 2023).

2. Parameter Choices and Initialization

Empirical best practices, as recommended by Glickman and validated in recent studies, are encoded in the default settings: initial r_0 = 1500, ϕ_0 = 350, σ_0 = 0.06, and τ = 0.5 (Bober-Irizar et al., 2024, Shelopugin et al., 2023). These settings deliver high initial uncertainty (rapid early learning), gradual volatility adaptation, and stability for established players or teams.

The volatility system constant τ controls the rate at which σ can drift; τ = 0.5 represents a generic compromise between reactivity and over-smoothing. ϕ and σ are typically initialized to reflect maximum plausible uncertainty and minor expected skill fluctuation, respectively.

Most systems maintain fixed hyperparameters, but validation-driven sweeps over {τ, ϕ_0, σ_0} within bounded ranges are recommended for use-cases with a priori unknown skill volatility or where competition formats significantly deviate from canonical 1v1 encounters (Bober-Irizar et al., 2024).
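Such a bounded sweep can be organized as a plain grid search. In the sketch below, `evaluate_log_loss` is a hypothetical stand-in (a synthetic surface with its minimum at the defaults) for a real backtest that replays matches under each hyperparameter combination and scores held-out predictive log-loss:

```python
import itertools

def evaluate_log_loss(tau, phi0, sigma0):
    # Hypothetical stand-in: in practice, run Glicko-2 over a training
    # window with these hyperparameters and score a validation window.
    # A synthetic bowl-shaped surface keeps the sweep runnable end to end.
    return (tau - 0.5) ** 2 + (phi0 - 350) ** 2 / 1e6 + (sigma0 - 0.06) ** 2

# Candidate grids within the bounded ranges discussed in the text.
taus = [0.1, 0.3, 0.5, 0.8, 1.2, 1.5]
phi0s = [150, 250, 350, 450, 600]
sigma0s = [0.03, 0.06, 0.10, 0.15]

best = min(itertools.product(taus, phi0s, sigma0s),
           key=lambda params: evaluate_log_loss(*params))
```

With the synthetic surface, the sweep recovers the default configuration (τ = 0.5, ϕ_0 = 350, σ_0 = 0.06) as the minimizer.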

3. Model Extensions: Draws, Home Advantage, and League Transition

Advanced models for practical deployment in team sports or tournaments frequently extend Glicko-2 to accommodate multi-outcome (win/draw/loss) settings and contextual structural changes (Shelopugin et al., 2023).

  • Draw extension: the outcome model is reparameterized via a three-way softmax:

    P(win_ij)  = exp(g(ϕ_j)Δμ) / [1 + exp(g(ϕ_j)Δμ) + exp(ℓ)]
    P(draw_ij) = exp(ℓ) / [1 + exp(g(ϕ_j)Δμ) + exp(ℓ)]
    P(loss_ij) = 1 / [1 + exp(g(ϕ_j)Δμ) + exp(ℓ)]

    where ℓ is the sum of a global draw adjustment and match-specific log-odds from a Poisson–Skellam model.

  • Home-field advantage: a location-dependent offset h is added to the home team's latent μ before computing E_ij.
  • League transition: Per-league intercepts and transition penalties/boosts are applied to ratings and deviations to encode performance shifts due to promotions, relegations, or between-league comparisons, with normalization to prevent inflation of the league average.
  • Seasonal adjustment: at the start of each new season, ϕ is increased by ϕ_s to encode uncertainty due to team roster or organizational changes.
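The three-way outcome model with an optional home offset can be sketched as follows. Here ℓ and h are treated as given hyperparameters; the Poisson–Skellam component that would supply the match-specific part of ℓ is omitted:

```python
import math

SCALE = 173.7178  # user-scale-to-Glicko-2-scale conversion

def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def outcome_probs(r_i, r_j, rd_j, ell, h=0.0):
    """Win/draw/loss probabilities for side i against side j.

    ell: draw log-odds (global adjustment plus any match-specific term).
    h:   home-advantage offset added to side i's latent mu when i is at home.
    Both are illustrative inputs; in practice they are fitted by likelihood.
    """
    mu_i = (r_i - 1500.0) / SCALE + h
    mu_j = (r_j - 1500.0) / SCALE
    phi_j = rd_j / SCALE
    w = math.exp(g(phi_j) * (mu_i - mu_j))  # un-normalized win weight
    d = math.exp(ell)                        # un-normalized draw weight
    z = 1.0 + w + d                          # softmax normalizer
    return w / z, d / z, 1.0 / z             # P(win), P(draw), P(loss)
```

For equally rated sides with ℓ = 0 the three probabilities are each 1/3, and a positive h shifts mass from loss to win, as expected.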

All hyperparameters for contextual effects are estimated by minimizing negative log-likelihoods for match outcomes.

4. Empirical Results and Comparative Analyses

In large-scale empirical evaluations, such as those performed on ≈10,000 professional Counter-Strike: Global Offensive matches, Glicko-2 demonstrates superior or at least comparable predictive accuracy to both vanilla Elo and canonical TrueSkill (without per-player decomposition) (Bober-Irizar et al., 2024).

Results show, for example, that after 2,000 training matches with an optimized acquisition function, Glicko-2 attains 63.1% accuracy in head-to-head non-draw match prediction, outperforming Elo (62.8%) and matching or slightly exceeding standard team-level TrueSkill (62.9%). Only models that fully exploit team composition granularity (e.g., “TrueSkillPlayers”) attain higher accuracy in this regime (≈64.1%).

On European and South American soccer leagues, Glicko-2 with draw and home/league extensions achieves lower log-loss on outcome prediction than Poisson regression-based methods (0.5832 vs. 0.5896–0.5949 across various baselines and settings) (Shelopugin et al., 2023).

5. Practical Usage, Calibration, and Data-Driven Recommendations

Glicko-2 is architected for rapid adaptation when uncertainty is high and stabilization when sufficient match history accrues. The rating deviation mechanism functions as a data-efficient early estimator, while the volatility parameter allows the system to accommodate abrupt skill changes (“unexpected” match results) without labor-intensive manual tuning. In practical terms, it surpasses Elo in both data efficiency and the principled handling of "outlier" outcomes (Bober-Irizar et al., 2024).

However, in its unextended form, Glicko-2 is not suited for free-for-all or multi-team competitions requiring higher-order factor graphs (as in TrueSkill). Studies recommend supplementing with context-driven extensions and/or decomposing team ratings to finer granularity (e.g., per-player) if inference on team composition, multi-way fixtures, or cross-domain transitions is required (Bober-Irizar et al., 2024, Shelopugin et al., 2023).

Recommended practical steps include:

  • Lightweight hyperparameter sweeps over τ ∈ [0.1, 1.5], σ_0 ∈ [0.03, 0.15], and ϕ_0 ∈ [150, 600];
  • Application of draw and home/league adjustment offsets in settings where data supports substantial outcome heterogeneity;
  • Utilization of acquisition functions to drive the scheduling of “most informative” matches, thereby optimizing learning efficiency;
  • Decomposition of team ratings by assigning ratings to individual participants, summing or otherwise aggregating for team-based prediction.
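The last recommendation, per-player ratings aggregated into a team-level prediction, can be sketched as below. The averaging scheme (team μ as the mean of player μs, team ϕ as the deviation of that mean under independence) is one illustrative choice, not necessarily the aggregation used in the cited work:

```python
import math

SCALE = 173.7178

def team_mu_phi(players):
    """Aggregate per-player (rating, RD) pairs into team-level (mu, phi).

    Team mu is the mean of player mus; team phi is the standard deviation
    of that mean assuming independent player estimates. Illustrative only.
    """
    n = len(players)
    mu = sum((r - 1500.0) / SCALE for r, _ in players) / n
    phi = math.sqrt(sum((rd / SCALE) ** 2 for _, rd in players)) / n
    return mu, phi

def team_win_prob(team_a, team_b):
    """P(team_a beats team_b), mirroring the E_ij formula with the
    opponent's aggregated deviation in g(.)."""
    mu_a, _ = team_mu_phi(team_a)
    mu_b, phi_b = team_mu_phi(team_b)
    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_b ** 2 / math.pi ** 2)
    return 1.0 / (1.0 + math.exp(-g * (mu_a - mu_b)))
```

Two identically rated rosters yield a probability of exactly 0.5, and a uniformly stronger roster yields a probability above 0.5.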

6. Interpretability, Limitations, and Generalization

Glicko-2, especially with contextual augmentations, offers interpretable skill estimates that can be mapped or normalized across heterogeneous competitive domains (Shelopugin et al., 2023). Key limitations include its restriction to binary outcomes without extension, lack of native support for non-dyadic (multi-team) matchups, and the potential need for parameter retuning in high-volatility or structurally evolving environments.

Augmentation patterns established for draws, venue effects, and cross-league transitions are broadly generalizable, preserving the Bayesian core of Glicko-2 while expanding its applicability to other professional sports, electronic games, and even hybrid domains where performance modeling is dynamic and context-dependent.

