- The paper introduces a hierarchical Bayesian Bradley-Terry model that overcomes MLE pitfalls in estimating team strengths.
- It employs Gaussian and Beta priors along with a log-space transformation to facilitate robust MCMC sampling.
- The approach improves early-season predictions and yields stable rankings by building hyperpriors from the previous season's data.
Hierarchical Bayesian Bradley-Terry for Applications in Major League Baseball
Introduction
This paper introduces a hierarchical Bayesian version of the Bradley-Terry model tailored to Major League Baseball (MLB). The authors propose hierarchical Bayesian inference as an improvement on traditional maximum likelihood estimation (MLE) for ranking teams and predicting their performance. The Bayesian approach is designed to avoid MLE's failure modes, such as estimates that do not exist or are pathological. Its hierarchical structure also remains interpretable and supports both inference and prediction.
Bradley-Terry Model and Bayesian Inference
The Bradley-Terry model assigns each team a strength and expresses the probability that one team defeats another in terms of those strengths. Fitting the standard model by MLE can fail. The estimates can imply win probabilities of exactly one or zero, and for some records the MLE does not exist at all (for example, when a team never loses). The Bayesian framework addresses this by placing prior distributions over the team strengths, which keeps inference well defined and avoids these pitfalls. This matters even in MLB, where the large volume of games alleviates these problems but does not entirely resolve them.
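For reference, the standard Bradley-Terry model gives team $i$ a strength $\pi_i > 0$ and sets $P(i \text{ beats } j) = \pi_i/(\pi_i + \pi_j)$. In log-strengths $\lambda_i = \ln \pi_i$, this becomes $1/(1 + e^{-(\lambda_i - \lambda_j)})$.

The minimal Python sketch below illustrates why the MLE can fail, using a hypothetical two-team record in which team A wins all 5 games. The log-likelihood keeps rising as the log-strength gap grows, so no finite maximizer exists. Adding independent $\mathcal{N}(0, \sigma^2)$ priors on the log-strengths (with $\sigma = 1$, chosen arbitrarily) produces a finite posterior mode. The record, the value of $\sigma$ and the grid search are all illustrative assumptions, not the paper's procedure.

```python
import math


def win_prob(lam_i: float, lam_j: float) -> float:
    """Bradley-Terry P(i beats j) written in log-strengths lambda = ln(pi)."""
    return 1.0 / (1.0 + math.exp(lam_j - lam_i))


# Hypothetical toy record: team A beat team B in all 5 of their games.
WINS_A, WINS_B = 5, 0


def log_likelihood(d: float) -> float:
    """Log-likelihood as a function of the log-strength gap d = lambda_A - lambda_B."""
    return WINS_A * math.log(win_prob(d, 0.0)) + WINS_B * math.log(win_prob(0.0, d))


def log_posterior(d: float, sigma: float = 1.0) -> float:
    """Profile log-posterior with independent N(0, sigma^2) priors on both
    log-strengths. For a fixed gap d the prior is largest at lambda_A = d/2,
    lambda_B = -d/2, where it contributes -d^2 / (4 sigma^2)."""
    return log_likelihood(d) - d * d / (4.0 * sigma * sigma)


# The likelihood keeps increasing with the gap, so the MLE sits at d = +infinity.
print([log_likelihood(d) for d in (1.0, 5.0, 10.0, 20.0)])

# With the prior, a coarse grid search over [0, 20] finds a finite mode.
grid = [i / 1000.0 for i in range(0, 20001)]
d_map = max(grid, key=log_posterior)
print(f"posterior mode of the gap: {d_map:.3f}")
```

The prior acts as a penalty that grows quadratically in the gap, while the likelihood gain from a larger gap shrinks toward zero. This is why the two balance at a finite value.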
Model Specification
The paper argues that priors should satisfy three desiderata: invariance under interchange of teams, under interchange of winner and loser, and under elimination of teams. It considers two main classes of priors. The first is a Gaussian distribution over log-strengths. The second is a Beta distribution over the probability of defeating an imaginary opponent of unit strength. Transforming the latter into log-space puts both priors on the same scale, so they can be compared directly. The transformation also facilitates MCMC sampling, a crucial part of Bayesian computation.
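To see what the log-space transformation does, assume for illustration that the probability $p_i = \pi_i/(1+\pi_i)$ of beating a unit-strength opponent has a symmetric $\mathrm{Beta}(a, a)$ prior. Equal shape parameters are what the winner/loser interchange desideratum suggests. Because $\lambda_i = \ln \pi_i = \operatorname{logit}(p_i)$ and $dp_i/d\lambda_i = p_i(1-p_i)$, a change of variables gives

$$p(\lambda_i) = \frac{e^{a\lambda_i}}{B(a,a)\,(1+e^{\lambda_i})^{2a}},$$

which is the type III generalized logistic form named in the caption of Figure 1. The Python sketch below evaluates this density with an arbitrary shape value `a = 3.0`. It then checks numerically that the density integrates to one and is symmetric about zero.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma to avoid overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def lambda_prior_density(lam: float, a: float) -> float:
    """Density of lambda = logit(p) when p ~ Beta(a, a), evaluated in log-space:
    a*lam - 2a*log(1 + e^lam) - log B(a, a)."""
    return math.exp(a * lam - 2.0 * a * softplus(lam) - log_beta(a, a))


A = 3.0  # arbitrary illustrative shape parameter
step = 0.01
xs = [-30.0 + step * k for k in range(6001)]  # grid over [-30, 30]
total = step * sum(lambda_prior_density(x, A) for x in xs)  # Riemann sum

print(f"integral ~ {total:.6f}")
print(lambda_prior_density(1.5, A), lambda_prior_density(-1.5, A))
```

Working with $\lambda_i$ rather than $p_i$ leaves the sampler with an unconstrained real-valued parameter. This is one reason the log-space form is convenient for MCMC.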

Figure 1: The prior $p(\lambda_i \mid I_{\beta})$ on log-strength, which takes the form of a type III generalized logistic distribution.
Hierarchical Modeling and Hyperparameters
Rather than tuning hyperparameters directly, the authors recommend a hierarchical model that embeds hyperpriors in the Bayesian framework. Specifically, data from the previous season provide the basis for a hyperprior on the spread of team strengths, σ. This hyperprior is a Gamma distribution informed by a Gaussian approximation to the posterior estimate from the earlier data.
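Moment matching is one simple way to turn a Gaussian approximation of last season's posterior for σ into a Gamma hyperprior. If the approximation has mean $m$ and standard deviation $s$, then $\mathrm{Gamma}(k, \beta)$ with shape $k = m^2/s^2$ and rate $\beta = m/s^2$ has the same mean and variance.

The paper's exact fitting procedure may differ, so treat this Python sketch as an illustration only. The numbers in it are placeholders, not values from the paper. The helper `gamma_logpdf` is the log-density that a sampler would add to the log-posterior for σ.

```python
import math


def gamma_from_moments(mean: float, sd: float) -> tuple[float, float]:
    """Return (shape, rate) of the Gamma distribution with the given mean and sd."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


def gamma_logpdf(x: float, shape: float, rate: float) -> float:
    """Log-density of Gamma(shape, rate) at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


# Placeholder summary of last season's posterior for sigma (NOT from the paper).
prev_mean, prev_sd = 0.3, 0.05
k, beta = gamma_from_moments(prev_mean, prev_sd)
print(f"shape={k:.1f}, rate={beta:.1f}")
print(gamma_logpdf(0.3, k, beta))
```

A Gamma distribution is a natural target because it lives on positive values, as σ must, whereas the Gaussian approximation does not.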
Figure 2: The hyperprior $p(\sigma)$ used to model the 2017 MLB season.
Implementation and Applications
Ranking Systems
The model supports ranking systems that go beyond basic win-loss records. It uses team strengths inferred from the whole season's results, which gives a more nuanced view of team performance. The prior regularizes the estimates, which guards against overfitting. As a result, the rankings are more stable and better reflect underlying team strength over the season.
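A ranking can be read off posterior samples of the log-strengths, for example by posterior mean. The Python sketch below runs a basic random-walk Metropolis sampler on a hypothetical four-team schedule with a fixed-scale Gaussian prior. It then ranks teams by posterior mean log-strength and estimates each team's probability of being ranked first.

Several parts of this sketch are assumptions: the schedule, σ = 1, the step size, the burn-in, and ranking by posterior mean. The paper's hierarchical model also samples σ, and its sampler may differ.

```python
import math
import random

# Hypothetical season: (winner, loser) pairs for four teams labelled 0-3.
GAMES = [(0, 1), (0, 2), (0, 3), (0, 1), (1, 2), (1, 3),
         (2, 3), (3, 2), (1, 0), (2, 1), (0, 3), (1, 3)]
N_TEAMS = 4
SIGMA = 1.0  # fixed prior scale (illustrative; the hierarchical model samples it)


def log_posterior(lam: list[float]) -> float:
    """Unnormalized log-posterior: Bradley-Terry likelihood + N(0, SIGMA^2) prior."""
    ll = 0.0
    for w, l in GAMES:
        # log P(w beats l) = -log(1 + exp(lam_l - lam_w))
        ll -= math.log1p(math.exp(lam[l] - lam[w]))
    prior = -sum(x * x for x in lam) / (2.0 * SIGMA ** 2)
    return ll + prior


def metropolis(n_samples: int, step: float = 0.5, seed: int = 0) -> list[list[float]]:
    """Random-walk Metropolis over the vector of log-strengths."""
    rng = random.Random(seed)
    current = [0.0] * N_TEAMS
    current_lp = log_posterior(current)
    samples = []
    for _ in range(n_samples):
        proposal = [x + rng.gauss(0.0, step) for x in current]
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(list(current))
    return samples


samples = metropolis(20000)[5000:]  # discard the first 5000 draws as burn-in
post_mean = [sum(s[i] for s in samples) / len(samples) for i in range(N_TEAMS)]
ranking = sorted(range(N_TEAMS), key=lambda i: post_mean[i], reverse=True)

# Fraction of posterior draws in which each team has the highest log-strength.
p_first = [0.0] * N_TEAMS
for s in samples:
    p_first[max(range(N_TEAMS), key=lambda i: s[i])] += 1.0 / len(samples)

print("ranking (best first):", ranking)
print("P(ranked first):", [round(p, 3) for p in p_first])
```

Reporting `p_first` alongside the ranking shows how confident the posterior is in the ordering. A single MLE point estimate cannot express this.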
Predictive Modeling
Predictions of future game outcomes come from the posterior predictive distribution. The model is more accurate than MLE-based predictions early in the season, when limited data hampers MLE. Later in the season the Bayesian predictions remain competitive. This supports the model's usefulness in settings where the amount of information, and hence the uncertainty, changes over time.
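The posterior predictive probability that team $i$ beats team $j$ averages the Bradley-Terry probability over posterior draws. It is $P(i \text{ beats } j \mid \text{data}) \approx \frac{1}{S}\sum_{s} \operatorname{logistic}(\lambda_i^{(s)} - \lambda_j^{(s)})$, rather than a single plug-in estimate.

The Python sketch below contrasts the two approaches on synthetic Gaussian draws. The means, spreads and independence of these draws are invented for illustration, standing in for early-season uncertainty. Real MCMC draws come jointly and may be correlated, which pairing the draws preserves. When uncertainty is large, averaging pulls the prediction toward 0.5.

```python
import math
import random


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predictive_win_prob(draws_i: list[float], draws_j: list[float]) -> float:
    """Monte Carlo posterior predictive P(i beats j), averaged over paired draws."""
    return sum(logistic(a - b) for a, b in zip(draws_i, draws_j)) / len(draws_i)


# Synthetic posterior draws (illustrative only): team i is stronger on average,
# but both log-strengths are uncertain, as early in a season.
rng = random.Random(1)
S = 50_000
draws_i = [rng.gauss(0.4, 0.8) for _ in range(S)]
draws_j = [rng.gauss(-0.4, 0.8) for _ in range(S)]

plug_in = logistic(0.4 - (-0.4))  # uses the point estimates only
bayes = predictive_win_prob(draws_i, draws_j)
print(f"plug-in: {plug_in:.3f}, posterior predictive: {bayes:.3f}")
```

The plug-in estimate ignores uncertainty and so tends to be overconfident when little data is available. The averaged prediction is more cautious in exactly that regime.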

Figure 3: Predictions based on $\mathbb{E}[\tilde{\mathbf{V}}^{\text{test}} \mid \mathbf{V}^{\text{train}}]$ show improved accuracy, particularly in data-scarce settings.
Conclusion
The proposed hierarchical Bayesian Bradley-Terry model is a statistically rigorous and computationally feasible way to rank MLB teams and predict their results. Combining hierarchical structure with Bayesian inference yields robust, interpretable outputs and avoids the limitations of traditional MLE. Future research may extend the approach to other sports and settings, testing how well it adapts to broader applications.