
Glicko-2 Rating System

Updated 1 February 2026
  • Glicko-2 is an advanced probabilistic model that estimates performance using dynamic rating deviation and a volatility parameter.
  • It updates player ratings after head-to-head encounters by applying Bayesian methods and functions like g(ϕ) to weigh match outcomes.
  • The system is applied in competitive gaming, machine learning benchmarks, and sports analytics to ensure fair, precise performance assessments.

The Glicko-2 rating system is a probabilistic model for estimating and updating the latent performance of agents (players in games, or classifiers in algorithmic competitions) through repeated pairwise encounters. It generalizes the Elo system by attaching to each rating a dynamic uncertainty estimate (rating deviation, RD) and a dedicated volatility parameter (σ), so that it quantifies not only comparative skill but also how reliable that estimate is and how consistent performance remains across changing contexts. It is used in competitive gaming, machine learning benchmarking, and team sports ranking, with notable deployments in classifier tournaments (Cardoso et al., 2021, Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025), football club analytics (Shelopugin et al., 2023), and esports matchmaking (Bober-Irizar et al., 2024).

1. Parameterization and Internal Scaling

Every agent (player, classifier, team) is represented at time t by three primary values:

  • $R$ (rating): a point estimate of current ability, typically initialized to 1500.
  • $RD$ (rating deviation): a standard deviation describing uncertainty in $R$, initialized to 350.
  • $\sigma$ (volatility): quantifies temporal instability of skill, initialized to 0.06.

Glicko-2 operates on a rescaled, centered space:

  • $\mu = \frac{R - 1500}{173.7178}$,
  • $\phi = \frac{RD}{173.7178}$,

where the scale factor $173.7178 = 400/\ln 10$ keeps ratings interpretable and comparable to Elo-derived systems. The volatility-change parameter $\tau > 0$ controls the responsiveness of volatility updates (typical range: $0.3$ to $1.2$), and $q = \ln(10)/400 \approx 0.0057565$, the reciprocal of the scale factor, serves as the logistic base constant (Cardoso et al., 2021, Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025).
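As a minimal sketch of the rescaling above, the conversion in both directions could be written as follows. The function names `to_glicko2_scale` and `from_glicko2_scale` are illustrative and do not come from the cited works.

```python
import math

# Scale factor 400 / ln(10), approximately 173.7178
SCALE = 400 / math.log(10)


def to_glicko2_scale(rating, rd):
    """Map (R, RD) on the display scale to (mu, phi) on the internal scale."""
    return (rating - 1500) / SCALE, rd / SCALE


def from_glicko2_scale(mu, phi):
    """Map (mu, phi) on the internal scale back to (R, RD)."""
    return SCALE * mu + 1500, SCALE * phi


# A freshly initialized agent: R = 1500, RD = 350
mu, phi = to_glicko2_scale(1500, 350)
rating, rd = from_glicko2_scale(mu, phi)
```

A new agent therefore sits at $\mu = 0$ with $\phi$ slightly above 2 on the internal scale.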

2. Per-Period Update Mechanism

Agent ratings are updated at the end of discrete "periods," such as a set of games, a dataset-based tournament, or a seasonal batch. Each agent $i$ plays $m$ matches against opponents $j = 1, \dots, m$; results $s_{ij} \in \{0, 0.5, 1\}$ indicate loss, draw, or win.

The update flow for agent $i$ proceeds:

  1. Impact Function ($g$):

$$g(\phi_j) = \frac{1}{\sqrt{1 + 3\phi_j^2/\pi^2}}$$

Opponents with high RD are down-weighted in the update.

  2. Expected Score:

$$E(\mu_i, \mu_j, \phi_j) = \frac{1}{1 + \exp[-g(\phi_j)(\mu_i - \mu_j)]}$$

  3. Variance and Delta:

$$v = \left[ \sum_j g(\phi_j)^2 \cdot E(\mu_i, \mu_j, \phi_j) \cdot (1 - E(\mu_i, \mu_j, \phi_j)) \right]^{-1}$$

$$\Delta = v \sum_j g(\phi_j)[s_{ij} - E(\mu_i, \mu_j, \phi_j)]$$

  4. Volatility Update ($\sigma'$):

With $a = \ln \sigma^2$, find $x$ such that

$$f(x) = \frac{e^{x}\,(\Delta^2 - \phi^2 - v - e^{x})}{2\,(\phi^2 + v + e^{x})^2} - \frac{x - a}{\tau^2} = 0,$$

and set $\sigma' = e^{x/2}$. The root is typically found with Brent's method or the Illinois method.

  5. Preliminary Deviation:

$$\phi^* = \sqrt{\phi^2 + {\sigma'}^2}$$

  6. New RD and Rating:

$$\phi' = \frac{1}{\sqrt{1/(\phi^*)^2 + 1/v}}$$

$$\mu' = \mu + {\phi'}^2 \sum_j g(\phi_j)[s_{ij} - E(\mu_i, \mu_j, \phi_j)]$$

  7. Conversion Back:

$$R' = 173.7178\,\mu' + 1500, \qquad RD' = 173.7178\,\phi'$$

The volatility $\sigma'$ is not rescaled and carries over unchanged.

These equations rigorously propagate observed outcomes and inferred uncertainty through the system, supporting both head-to-head and round-robin competitive structures (Cardoso et al., 2021, Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025).
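The first three steps of the update flow (impact function, expected score, variance and delta) can be sketched as below. This is a hedged illustration rather than a reference implementation: the helper names and the input values (an agent rated 1500 facing three opponents of differing RD) are chosen only for demonstration.

```python
import math

SCALE = 173.7178


def g(phi_j):
    """Impact function: shrinks toward 0 as the opponent's deviation grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi_j**2 / math.pi**2)


def expected_score(mu_i, mu_j, phi_j):
    """Expected score of agent i against opponent j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu_i - mu_j)))


def variance_and_delta(mu_i, results):
    """results: list of (mu_j, phi_j, s_ij) tuples on the internal scale."""
    information = 0.0  # sum of g^2 * E * (1 - E)
    surprise = 0.0     # sum of g * (s - E)
    for mu_j, phi_j, s in results:
        e = expected_score(mu_i, mu_j, phi_j)
        information += g(phi_j) ** 2 * e * (1.0 - e)
        surprise += g(phi_j) * (s - e)
    v = 1.0 / information
    return v, v * surprise


# Illustrative period: (R_j, RD_j, score) for a win and two losses
opponents = [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]
results = [((r - 1500) / SCALE, rd / SCALE, s) for r, rd, s in opponents]
v, delta = variance_and_delta(0.0, results)
```

Note how the third opponent, whose RD is 300, contributes the least to both sums because $g(\phi_j)$ is smallest for that opponent.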

3. Algorithmic and Statistical Rationale

Glicko-2's statistical underpinnings reflect Bayesian updating, where the prior for each agent is a normal distribution with variance $\phi^2 + \sigma^2$ and actual match outcomes provide a sequence of (possibly noisy) observations. The $g(\phi)$ function is a reliability dampener, ensuring that outcomes against poorly measured opponents do not induce outsized rating changes. The $\Delta$ term aggregates "surprise" (the deviation between observed and expected scores), scaled by the information quality $v$. The volatility update is driven by the likelihood of the observed $\Delta$ under the prior, ensuring compatibility with rating shocks (unexpected performance swings) (Bober-Irizar et al., 2024, Cardoso et al., 13 Apr 2025).
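To make the volatility step concrete, the following sketch solves for the new volatility with an Illinois-style bracketing iteration. It assumes the standard Glicko-2 objective in $x = \ln {\sigma'}^2$, matching the `a = log(sigma**2)` and `exp(x / 2)` convention of the pseudocode in Section 5. The function name `new_volatility` and the numeric inputs are illustrative.

```python
import math


def new_volatility(phi, sigma, v, delta, tau=0.5, eps=1e-6):
    """Return sigma' by finding the root of the volatility objective f(x)."""
    a = math.log(sigma**2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta**2 - phi**2 - v - ex)
        den = 2.0 * (phi**2 + v + ex) ** 2
        return num / den - (x - a) / tau**2

    # Choose two points x_a, x_b that bracket the root
    x_a = a
    if delta**2 > phi**2 + v:
        x_b = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        x_b = a - k * tau

    f_a, f_b = f(x_a), f(x_b)
    while abs(x_b - x_a) > eps:
        x_c = x_a + (x_a - x_b) * f_a / (f_b - f_a)
        f_c = f(x_c)
        if f_c * f_b <= 0:
            x_a, f_a = x_b, f_b
        else:
            f_a /= 2.0  # Illinois modification: halve the retained endpoint
        x_b, f_b = x_c, f_c
    return math.exp(x_a / 2.0)


# Unsurprising results: volatility stays essentially at its prior value
calm = new_volatility(phi=200 / 173.7178, sigma=0.06, v=1.7785, delta=-0.4834)
# A large surprise relative to phi^2 + v pushes volatility upward
shock = new_volatility(phi=0.5, sigma=0.06, v=0.5, delta=3.0)
```

The second call illustrates the "rating shock" behaviour: when $\Delta^2$ exceeds $\phi^2 + v$, the root moves above $a$ and the volatility increases.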

4. Application Domains and Adaptations

Glicko-2 is implemented as follows across representative domains:

  • Machine Learning Classifier Benchmarking:

Each dataset in the benchmark is a rating period. Item Response Theory (IRT) estimates classifier ability per instance difficulty; classifiers "compete" via head-to-head matches ($S = 1$, $0$, or $0.5$) based on true-score comparisons. The Glicko-2 update sequence refines $(R, RD, \sigma)$ per classifier after each dataset benchmark (Cardoso et al., 2021, Cardoso et al., 13 Apr 2025).

Table: Mapping of Glicko-2 Entities in Classifier Benchmarking

| Glicko-2 Term | Competition Context | Update Basis              |
|---------------|---------------------|---------------------------|
| Player        | Classifier          | Dataset-based tournaments |
| Match         | Head-to-head eval   | IRT true-score comparison |
| Period        | Dataset             | One round-robin session   |

  • Football League Analytics:

Each match is a rating period. Modifications include: explicit draw probabilities via a Poisson/LightGBM model; home-field advantage via an added shift to $\mu$; league transitions via preseason parameter resets; and rating inflation control via league-average renormalization. The expected-score formula is adapted to model $(\text{win}, \text{draw}, \text{loss})$ triplets (Shelopugin et al., 2023).

  • Esports (CS:GO):

Glicko-2 outperformed both Elo and TrueSkill at predicting non-draw professional match outcomes, with systematic gains in accuracy across training horizons and consistent parameterization using canonical defaults ($r_0 = 1500$, $RD_0 = 350$, $\sigma_0 = 0.06$, $\tau = 0.5$). No domain-dependent tuning is required for $\tau$ or $\sigma_0$ to attain robust skill separation (Bober-Irizar et al., 2024).
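The home-field adjustment listed for football above can be pictured as a shift applied to the home side's $\mu$ before the expected score is computed. The sketch below is illustrative only: the parameter `home_shift` and its value are hypothetical, and the cited football model additionally handles draw probabilities, which this snippet omits.

```python
import math


def g(phi):
    """Impact function on the internal Glicko-2 scale."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)


def expected_score_home(mu_home, mu_away, phi_away, home_shift=0.0):
    """Expected score for the home side, with an additive shift on mu.

    home_shift is a hypothetical tuning parameter, not a value from the source.
    """
    diff = (mu_home + home_shift) - mu_away
    return 1.0 / (1.0 + math.exp(-g(phi_away) * diff))


# Two equally rated sides: neutral venue vs. a hypothetical home advantage
neutral = expected_score_home(0.0, 0.0, phi_away=0.5)
at_home = expected_score_home(0.0, 0.0, phi_away=0.5, home_shift=0.2)
```

In practice such a shift would be fitted to data, for example by minimizing log-loss as the football work does for its hyperparameters.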

5. Pseudocode and Update Example

A canonical Glicko-2 update cycle (as used in classifier benchmarking):

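from math import exp, log, pi, sqrt

def glicko2_update(R, RD, sigma, tau, opponents, eps=1e-6):
    """opponents: non-empty list of (R_j, RD_j, S_j), with S_j in {0, 0.5, 1}."""
    mu, phi = (R - 1500) / 173.7178, RD / 173.7178
    terms = []  # one (g_j, E_j, S_j) triple per opponent
    for R_j, RD_j, S_j in opponents:
        mu_j, phi_j = (R_j - 1500) / 173.7178, RD_j / 173.7178
        g_j = 1 / sqrt(1 + 3 * phi_j**2 / pi**2)  # no q factor on the Glicko-2 scale
        E_j = 1 / (1 + exp(-g_j * (mu - mu_j)))
        terms.append((g_j, E_j, S_j))
    v = 1 / sum(g**2 * E * (1 - E) for g, E, _ in terms)
    score_sum = sum(g * (S - E) for g, E, S in terms)
    delta = v * score_sum
    a = log(sigma**2)
    def f(x):  # volatility objective; its root is ln(sigma_prime**2)
        ex = exp(x)
        return ex * (delta**2 - phi**2 - v - ex) / (2 * (phi**2 + v + ex)**2) - (x - a) / tau**2
    # Bracket the root of f, then iterate with the Illinois method
    A = a
    if delta**2 > phi**2 + v:
        B = log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2
        B, fB = C, fC
    sigma_prime = exp(A / 2)
    phi_star = sqrt(phi**2 + sigma_prime**2)
    phi_prime = 1 / sqrt(1 / phi_star**2 + 1 / v)
    mu_prime = mu + phi_prime**2 * score_sum
    return 1500 + 173.7178 * mu_prime, 173.7178 * phi_prime, sigma_prime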
(Cardoso et al., 13 Apr 2025)

A worked numerical example is provided in (Cardoso et al., 13 Apr 2025): two agents both begin with $R = 1500$, $RD = 350$, $\sigma = 0.06$, $\tau = 0.5$; one wins a head-to-head match, increasing its rating to $\approx 1676$ and decreasing $RD$ significantly, while the volatility $\sigma$ changes only minutely.

6. Interpretability and Domain-Specific Metrics

The triplet $(R, RD, \sigma)$ supports a nuanced interpretation:

  • $R$: best estimate of ability integrated over observed performance.
  • $RD$: reflects rating confidence; smaller $RD$ corresponds to higher certainty. A 95% interval may be reported as $[R - 2RD, R + 2RD]$.
  • $\sigma$: captures inconsistency across periods; low $\sigma$ implies stable performance.
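The interval mentioned in the list above is simple to compute. The following is a small sketch; the function name `rating_interval` and its `width` parameter are illustrative.

```python
def rating_interval(rating, rd, width=2.0):
    """Approximate 95% interval [R - 2*RD, R + 2*RD] around a rating."""
    return rating - width * rd, rating + width * rd


# A freshly initialized agent is almost unconstrained ...
low_new, high_new = rating_interval(1500, 350)
# ... while an agent with a small RD is pinned down much more tightly
low_est, high_est = rating_interval(1500, 50)
```

Two agents with the same $R$ can therefore carry very different levels of confidence, which is why $R$ should be read together with $RD$.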

Glicko-2 provides statistically grounded, compact summary metrics for ability, reliability, and volatility, thereby enabling fine-grained decisions in both algorithmic benchmarking and competitive analytics (Cardoso et al., 2021, Bober-Irizar et al., 2024, Shelopugin et al., 2023, Cardoso et al., 13 Apr 2025).

7. Extensions, Modifications, and Practical Considerations

In football analytics, enhancements include probabilistic draws, explicit home/away modeling, league transitions, and log-loss-based hyperparameter optimization for improved predictive performance (Shelopugin et al., 2023; code at github.com/andreyshelopugin/GlickoSoccer). In machine learning benchmarking, Glicko-2 facilitates fairer classifier comparison by integrating statistical difficulty (via IRT) and standardizing head-to-head ability assessment (Cardoso et al., 2021, Cardoso et al., 13 Apr 2025).

Parameter selection is generally robust; typical defaults yield effective results, and only targeted domains (football, multi-player esports) warrant extensive hyperparameter tuning. Insufficiently informative matches correspond to a large estimated variance $v$ (the information $1/v$ is small), so they do little to reduce uncertainty, which is reflected in a persistently high $RD$. Systematic sensitivity analyses have validated the stability of Glicko-2 under standard settings compared to alternative rating models (Bober-Irizar et al., 2024).
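How strongly a period shrinks the deviation depends on the size of $v$ in the formula for $\phi'$. The sketch below evaluates that formula for an informative and an uninformative period; the function name `updated_phi` and the chosen values of $v$ are illustrative.

```python
import math

SCALE = 173.7178


def updated_phi(phi, sigma_prime, v):
    """phi' = 1 / sqrt(1 / phi*^2 + 1 / v), with phi*^2 = phi^2 + sigma'^2."""
    phi_star_sq = phi**2 + sigma_prime**2
    return 1.0 / math.sqrt(1.0 / phi_star_sq + 1.0 / v)


phi = 350 / SCALE  # a newly initialized agent
phi_star = math.sqrt(phi**2 + 0.06**2)

# Small v (much information): the deviation shrinks sharply
informative = updated_phi(phi, 0.06, v=1.0)
# Very large v (little information): the deviation barely moves from phi*
uninformative = updated_phi(phi, 0.06, v=1000.0)

rd_informative = SCALE * informative
rd_uninformative = SCALE * uninformative
```

Because $1/v$ is added to the prior precision, a period with a very large $v$ leaves $\phi'$ close to $\phi^*$, and the reported $RD$ stays high.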


For full algorithmic details, practical deployment strategies, and curated numerical examples, see (Cardoso et al., 2021, Bober-Irizar et al., 2024, Shelopugin et al., 2023, Cardoso et al., 13 Apr 2025).
