
Predicting the Popularity of Games on Steam

Published 6 Oct 2021 in cs.LG | (2110.02896v1)

Abstract: The video game industry has seen rapid growth over the last decade. Thousands of video games are released and played by millions of people every year, creating a large community of players. Steam is a leading gaming platform and social networking site, which allows its users to purchase and store games. A by-product of Steam is a large database of information about games, players, and gaming behavior. In this paper, we take recent video games released on Steam and aim to discover the relation between game popularity and a game's features that can be acquired through Steam. We approach this task by predicting the popularity of Steam games in the early stages after their release and we use a Bayesian approach to understand the influence of a game's price, size, supported languages, release date, and genres on its player count. We implement several models and discover that a genre-based hierarchical approach achieves the best performance. We further analyze the model and interpret its coefficients, which indicate that games released at the beginning of the month and games of certain genres correlate with game popularity.

Summary

  • The paper presents a Bayesian modeling framework that predicts Steam game popularity by analyzing release features such as price, supported languages, and past player counts.
  • It employs hierarchical and heteroscedastic models to account for genre-specific effects and feature-dependent variance, ensuring nuanced predictions.
  • Results emphasize the dominant impact of median past player counts while also highlighting the significant contributions of other game attributes for accurate forecasting.

Predicting the Popularity of Games on Steam

This essay provides an analysis of the research conducted in the paper "Predicting the Popularity of Games on Steam" (2110.02896). The paper tackles the challenge of predicting video game popularity on the Steam platform by employing a Bayesian modeling approach, utilizing various features observable at the time of a game's release.

Introduction

The researchers focus on predicting game popularity, operationalized as the median player count in the second month post-release. They use a range of game attributes, including price, release date, supported languages, and storage requirements, to model this popularity metric. The approach rests on statistical modeling and Bayesian inference, with an emphasis on uncertainty estimates, which are crucial for business decision-making.

Data Collection and Processing

The dataset is derived from Steam, SteamSpy, and SteamDB, capturing games released after 2015. The data includes information on game price, genres, supported languages, and past player counts. Preprocessing included converting prices to euros and extracting usable values from system requirements, which are specified as unstructured text.
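Extracting a number from free-text system requirements can be sketched as follows. This is only an illustration: the label words, the unit table, and the function name `storage_gb` are assumptions, and the summary does not say how the authors actually parsed these strings.

```python
import re

# Hypothetical pattern for sizes such as "20 GB", "500 MB" or "1.5 GB".
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(KB|MB|GB|TB)\b", re.IGNORECASE)
# Conversion factors to gigabytes (binary units, an assumption).
_TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def storage_gb(requirements_text):
    """Return the storage requirement in GB, or None if no size is found."""
    # If a storage-like label is present, only look at the text after it,
    # so that an earlier "Memory: 8 GB RAM" entry is not picked up instead.
    label = re.search(
        r"(storage|hard\s*drive|hard\s*disk)[^0-9]*",
        requirements_text,
        re.IGNORECASE,
    )
    tail = requirements_text[label.end():] if label else requirements_text
    size = _SIZE_RE.search(tail)
    if size is None:
        return None
    value, unit = float(size.group(1)), size.group(2).upper()
    return value * _TO_GB[unit]


example = "Memory: 8 GB RAM Storage: 20 GB available space"
parsed = storage_gb(example)
```

Returning `None` for unparseable text keeps missing values explicit, so they can be handled separately rather than silently treated as zero.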

Figure 1: Support of the most popular languages. English is by far the most supported language. Note that one game can support multiple languages.

Feature Engineering and Key Insights

Key features engineered include past and predicted player counts, and a log transformation is utilized to manage skew in features like price and player counts. Genre, release day, and supported languages are prominent features in understanding a game's market reach and potential popularity.
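A log transformation of skewed features might look like the sketch below. It uses `log(1 + x)` so that free games and zero player counts stay finite; whether the paper uses this variant or a plain logarithm is not stated here, and the sample values are invented for illustration.

```python
import math


def log_transform(values):
    """Apply log(1 + x): zeros map to zero and large values are compressed."""
    return [math.log1p(v) for v in values]


# Illustrative values only, not taken from the paper's dataset.
prices_eur = [0.0, 4.99, 19.99, 59.99]
player_counts = [0, 12, 350, 48000]

log_prices = log_transform(prices_eur)
log_players = log_transform(player_counts)
```

After the transform, the gap between 350 and 48000 players shrinks from a factor of roughly 137 to a difference of about 5 on the log scale, which is what makes heavily skewed features easier to model.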

The research finds little linear correlation between most features and the target. The exception is the main predictor, the past median player count. This motivates models that go beyond a simple linear relationship.

Figure 2: Network plot of game genres. The nodes represent genres while the edges represent games that are in both genres. The opacity of an edge represents the proportion of games the connected genres share. The size of a node represents the number of games in a genre.

Methodology: Bayesian Modeling

Normal and Folded Normal Models

The study initially employs a normal distribution model, which proves unstable because the target variable, a player count, cannot be negative. This leads to the adoption of a folded normal distribution model, the distribution of the absolute value of a normally distributed variable, which is better suited to non-negative targets.
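For reference, the folded normal density is the sum of the normal density at y and at -y, for y >= 0. The sketch below writes this out in plain Python; the function names are illustrative and it is not the authors' implementation.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def folded_normal_pdf(y, mu, sigma):
    """Density of |X| where X ~ Normal(mu, sigma); zero for negative y."""
    if y < 0:
        return 0.0
    # Mass that the normal places at -y is "folded" onto +y.
    return normal_pdf(y, mu, sigma) + normal_pdf(-y, mu, sigma)


# Numerical check that the density integrates to about 1 over [0, 20].
step = 0.001
total_mass = sum(
    folded_normal_pdf((i + 0.5) * step, mu=1.0, sigma=2.0) * step
    for i in range(20000)
)
```

Because all probability mass sits on non-negative values, the model can never assign probability to a negative player count, which a plain normal likelihood would do.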

Hierarchical Model

The hierarchical folded normal model introduces genre-specific intercepts, so the baseline prediction can differ from one genre to another while the remaining coefficients are shared.
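A genre-specific intercept can be sketched as a per-genre offset added to a shared linear term, with the offsets tied together by a common prior. The sketch assumes one genre per game and a normal prior on the intercepts; both are simplifying assumptions, and all names are hypothetical.

```python
import math


def linear_predictor(features, genre, beta, genre_intercepts):
    """Location parameter: genre intercept plus shared feature effects."""
    return genre_intercepts[genre] + sum(b * x for b, x in zip(beta, features))


def intercept_log_prior(genre_intercepts, shared_mean, shared_sd):
    """Log density of intercepts under a shared Normal(shared_mean, shared_sd).

    This shared prior is what pulls sparsely populated genres toward
    the overall average (partial pooling).
    """
    total = 0.0
    for alpha in genre_intercepts.values():
        z = (alpha - shared_mean) / shared_sd
        total += -0.5 * z * z - math.log(shared_sd) - 0.5 * math.log(2.0 * math.pi)
    return total


intercepts = {"Indie": 1.0, "Action": 2.0}   # illustrative values
beta = [0.5, 0.25]                            # shared coefficients
mu_indie = linear_predictor([1.0, 2.0], "Indie", beta, intercepts)
mu_action = linear_predictor([1.0, 2.0], "Action", beta, intercepts)
```

Two games with identical features then differ in their predicted location only by the difference between their genre intercepts.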

Heteroscedastic Models

To refine the variance estimates, the authors develop heteroscedastic models in which the variance depends on the features rather than being a single constant. This lets the predictive uncertainty differ from game to game, better reflecting the variability in player counts.
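One common way to make the scale depend on features is a log-linear form, sigma = exp(gamma0 + gamma · x), which keeps sigma positive. The sketch below assumes that form; the summary does not specify which link the paper uses, and the coefficient values are made up.

```python
import math


def scale_for_game(features, gamma0, gamma):
    """Feature-dependent scale using a log link (an assumed form)."""
    return math.exp(gamma0 + sum(g * x for g, x in zip(gamma, features)))


gamma0 = 0.0
gamma = [math.log(2.0)]          # illustrative coefficient

sigma_low = scale_for_game([0.0], gamma0, gamma)   # feature value 0
sigma_high = scale_for_game([1.0], gamma0, gamma)  # feature value 1
```

With these values, a one-unit increase in the feature doubles the scale, so the model expresses wider uncertainty for that game instead of applying one variance to all games.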

Results and Model Comparison

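Performance is evaluated using PSIS-LOOIC, the leave-one-out information criterion estimated with Pareto-smoothed importance sampling, where lower values indicate better expected predictive performance. Model performance degrades gradually as the predicted month moves further from release. The heteroscedastic hierarchical model consistently outperforms the others, which the analysis attributes to its genre-level structure and feature-dependent variance.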
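The idea behind LOOIC can be sketched with plain importance sampling over posterior draws. Note that this sketch omits the Pareto smoothing step that PSIS adds to stabilize the weights, so it is a simplified stand-in and not the estimator used in the paper. The input layout `log_lik[s][i]` is an assumption.

```python
import math


def _log_mean_exp(xs):
    """Numerically stable log of the mean of exp(x) over xs."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def looic_plain_is(log_lik):
    """LOOIC from raw importance sampling (no Pareto smoothing).

    log_lik[s][i] is the log-likelihood of observation i under draw s.
    The leave-one-out density of observation i is estimated by the
    harmonic mean of its likelihood across posterior draws.
    """
    n_obs = len(log_lik[0])
    elpd = 0.0
    for i in range(n_obs):
        negated = [-draw[i] for draw in log_lik]
        elpd += -_log_mean_exp(negated)
    # Deviance scale: lower LOOIC means better expected predictions.
    return -2.0 * elpd


# Two draws, two observations, identical across draws (illustrative).
constant_draws = [[-1.0, -2.0], [-1.0, -2.0]]
looic_constant = looic_plain_is(constant_draws)
```

When every draw assigns the same log-likelihood to an observation, the estimate reduces to minus two times the summed log-likelihood, which gives a quick sanity check.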

Figure 3: LOOIC estimated for multiple predicted months. The performance of the models expectedly drops over time, with the greatest decrease between the 2nd and 3rd month.

Feature Contribution and Implications

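The analysis identifies the past median player count as the dominant feature. Additional features, such as the number of supported languages and release timing, make significant but less straightforward contributions to the predictions; the abstract notes, for example, that games released at the beginning of the month and games of certain genres correlate with popularity.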

Figure 4: Parameter contribution plot for three of the four main features. The median player count feature is omitted for easier visualization.

Conclusion

The paper establishes a methodological framework for predicting game popularity on Steam, using Bayesian models to incorporate uncertainty into predictions, which is vital for informed decision-making in game development and marketing. Future enhancements could involve integrating additional data sources or refining the feature transformations to improve predictive accuracy. The same methodology could be applied to other domains of digital content popularity prediction where reliable metrics are needed.
