- The paper introduces a unified Large Event Model that forecasts soccer match events with enhanced accuracy by leveraging sequential deep learning techniques.
- It employs a three-layer neural network with 512 neurons per layer, using ordinal encoding and WyScout data to simulate continuous event sequences.
- The model improves inference speed and accuracy, offering applications in match simulation, tactical analysis, and quantitative metrics like expected goals.
Forecasting Events in Soccer Matches Through Language
This paper proposes a novel approach for predicting and simulating soccer match events using a model inspired by large language models (LLMs). By treating soccer matches as sequences of events, analogous to sentences in natural language processing, the research introduces Large Event Models (LEMs) to capture the complex dynamics of soccer. The approach simplifies previous methods by integrating multi-faceted event data into a single predictive model, which improves both accuracy and efficiency in event forecasting.
Introduction and Background
Soccer, a global game with significant potential for data analytics, has adopted advanced AI techniques more slowly than other sports. Its continuous play and large number of interacting variables make it hard to capture with traditional data models. LEMs address this with sequential deep learning models, like those behind LLMs in natural language processing (NLP), which predict events from a continuous stream of contextual input.
Methodology
The methodology uses a single neural network to predict soccer match events. Whereas previous approaches rely on separate models for different event attributes, this approach uses one unified, LLM-inspired model that can simulate entire event sequences. The input data consists of event attributes from publicly available datasets, encoded with a simplified ordinal encoding scheme.
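Ordinal encoding maps each categorical attribute to an integer index and each coordinate to an integer bin, so every event becomes a short list of integer tokens. The following is a minimal sketch under stated assumptions: the event schema (type, accuracy flag, x, y), the `EVENT_TYPES` vocabulary, and the helper names are hypothetical, and coordinates are assumed to follow WyScout's 0 to 100 scale. It is not the paper's exact encoding.

```python
from dataclasses import dataclass

# Hypothetical vocabulary of event types; the real token set may differ.
EVENT_TYPES = ["pass", "shot", "duel", "free_kick", "clearance"]
TYPE_TO_ID = {name: idx for idx, name in enumerate(EVENT_TYPES)}


@dataclass
class Event:
    event_type: str  # categorical attribute
    accurate: bool   # binary attribute
    x: float         # coordinate, assumed to lie in [0, 100]
    y: float         # coordinate, assumed to lie in [0, 100]


def encode_event(event: Event) -> list[int]:
    """Ordinally encode one event as a list of integer tokens."""
    # Clamp coordinates to [0, 100] and round to the nearest integer bin.
    x_bin = min(max(round(event.x), 0), 100)
    y_bin = min(max(round(event.y), 0), 100)
    return [TYPE_TO_ID[event.event_type], int(event.accurate), x_bin, y_bin]


def decode_event(tokens: list[int]) -> Event:
    """Invert encode_event; coordinates come back as integer bin values."""
    type_id, accurate, x_bin, y_bin = tokens
    return Event(EVENT_TYPES[type_id], bool(accurate), float(x_bin), float(y_bin))
```

For instance, an inaccurate shot at (88.4, 47.6) encodes as `[1, 0, 88, 48]`. Integer bins keep every attribute in the same discrete token space, which is what lets a single network predict all of them.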
Figure 1: A schematic representation of our proposal. In blue are the inputs used to build the input vector, which is passed through the LEM to infer the probability of each token. To make a prediction, the probabilities go through a sampler with restrictions to avoid hallucinations, i.e., predicting unrealistic values.
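A restricted sampler can be sketched as masking: values judged unrealistic in the current context get zero weight, and a token is drawn from what remains. This is a minimal sketch, assuming the model emits one score per candidate token value and that the restriction rules arrive as a hypothetical set of allowed indices; the paper's actual rules are not specified here.

```python
import random


def constrained_sample(probs, allowed, rng=None):
    """Sample a token index from `probs`, restricted to indices in `allowed`.

    probs:   per-token scores from the model (they need not sum to 1).
    allowed: set of token indices considered realistic in the current context.
    """
    rng = rng or random.Random()
    # Zero out every value the restriction rules forbid.
    masked = [p if i in allowed else 0.0 for i, p in enumerate(probs)]
    if sum(masked) <= 0.0:
        raise ValueError("no allowed token has positive probability")
    # random.choices normalises the weights, so this draws from the
    # renormalised distribution over the allowed tokens only.
    return rng.choices(range(len(masked)), weights=masked, k=1)[0]
```

A hypothetical rule might, for example, allow a "goal" outcome token only when the event being predicted is a shot; any such rule reduces to choosing the `allowed` set before sampling.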
Implementation
The model uses a deep learning architecture of three layers with 512 neurons each, optimized with the Adam optimizer and trained with binary cross-entropy loss (BCELoss) for 50 epochs. Training uses the WyScout dataset, split into training and test sets, to predict event attributes such as the event type (e.g., pass or shot) and the event's coordinates. A key feature is the ability to simulate complete soccer matches by iteratively generating a probabilistic forecast for each event token in the sequence.
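The iterative simulation is an autoregressive loop: predict the next event one token at a time, append the finished event to the history, and repeat. This is a minimal sketch, assuming a hypothetical model signature `(history, partial_event, slot) -> probabilities` and a hypothetical four-slot token layout; `toy_model` is a uniform placeholder standing in for the trained network so that the loop runs.

```python
import random
from typing import Callable

# Hypothetical token slots per event and the number of values each slot can take.
SLOT_SIZES = {"type": 5, "accurate": 2, "x": 101, "y": 101}
SLOTS = list(SLOT_SIZES)

# Stand-in model signature: (history, partial event, slot) -> probabilities.
Model = Callable[[list[list[int]], list[int], str], list[float]]


def toy_model(history: list[list[int]], partial: list[int], slot: str) -> list[float]:
    """Placeholder for the trained LEM: uniform probabilities over the slot."""
    size = SLOT_SIZES[slot]
    return [1.0 / size] * size


def simulate(model: Model, n_events: int, seed: int = 0) -> list[list[int]]:
    """Generate n_events events autoregressively, one token at a time."""
    rng = random.Random(seed)
    history: list[list[int]] = []
    for _ in range(n_events):
        event: list[int] = []
        for slot in SLOTS:
            # Each token is conditioned on past events and the tokens already drawn.
            probs = model(history, event, slot)
            event.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
        history.append(event)  # the finished event becomes context for the next one
    return history
```

A real implementation would apply the restricted sampling described in Figure 1 at each step instead of the unconstrained draw used here.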
Results and Analysis
The results show a marked improvement in prediction accuracy and inference speed over previous models. Notably, the K=3 variant of the model, which uses information from the three previous events, provides superior predictions.
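Conditioning on K previous events can be pictured as concatenating the encodings of the last K events into one fixed-length input vector. This is a minimal sketch, assuming each event is already a list of four integer tokens and that missing history at the start of a match is left-padded with a hypothetical padding value of -1.

```python
def context_vector(history: list[list[int]], k: int, pad: int = -1,
                   tokens_per_event: int = 4) -> list[int]:
    """Concatenate the last k encoded events, oldest first, padding if needed."""
    # Guard k == 0, because history[-0:] would return the whole list.
    window = history[-k:] if k > 0 else []
    missing = k - len(window)
    padding = [[pad] * tokens_per_event for _ in range(missing)]
    # Flatten padding + window into one vector of length k * tokens_per_event.
    return [tok for event in padding + window for tok in event]
```

Under this toy layout, the K=3 input always has 3 × 4 = 12 tokens, while a K=1 variant would see only the most recent event.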
Figure 2: The probability of transitioning from the current location (x, y) to the next location (x, y). The pattern combines two behaviors: (1) a positive correlation between the current and next coordinates, since the next event performed by the same team is expected to be close to the current one, and (2) a negative correlation that arises when the next event is performed by the opposing team, since the coordinate axes then switch to the opponent's perspective.
The model learns effective transition patterns between event locations, accurately modeling the inherent structure and sequential nature of soccer events.
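The negative correlation in Figure 2 follows from coordinates being recorded relative to the team performing each event. This minimal sketch, assuming 0 to 100 coordinates and events given as hypothetical `(team, x)` pairs, estimates a binned transition table P(next x-bin | current x-bin). It optionally maps the next event back into the current team's frame when possession changes. It illustrates the effect and does not reproduce the paper's figure.

```python
from collections import Counter, defaultdict


def to_other_frame(x: float) -> float:
    """Express a 0-100 coordinate from the opposing team's perspective."""
    return 100.0 - x


def x_transition_probs(events, n_bins=5, align_frames=True):
    """Estimate P(next x-bin | current x-bin) from consecutive (team, x) events.

    With align_frames=True, the next x is converted into the current team's
    frame whenever possession changes, which removes the axis flip.
    """
    def to_bin(x):
        return min(int(x / 100.0 * n_bins), n_bins - 1)

    counts = defaultdict(Counter)
    for (team, x), (next_team, next_x) in zip(events, events[1:]):
        if align_frames and next_team != team:
            next_x = to_other_frame(next_x)
        counts[to_bin(x)][to_bin(next_x)] += 1
    # Normalise each row of counts into conditional probabilities.
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }
```

With `align_frames=False`, a turnover near the opponent's box lands near the other end of the axis, which is the second behavior described in the caption.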
Applications and Implications
LEMs offer extensive analytic capabilities, applicable to areas such as match prediction, player performance analysis, and strategic planning. By simulating entire matches, stakeholders can derive insights into various strategic scenarios.
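Match prediction from simulation comes down to Monte Carlo estimation: simulate many matches and count the outcomes. This is a minimal sketch, assuming a hypothetical `simulate_match(rng)` function that returns `(home_goals, away_goals)`. The toy version below draws goals from fixed per-attack probabilities and is a placeholder, not a soccer model.

```python
import random
from collections import Counter


def toy_simulate_match(rng: random.Random, attacks: int = 20,
                       p_home: float = 0.07, p_away: float = 0.05) -> tuple[int, int]:
    """Placeholder for an LEM-driven simulation: each attack scores with a fixed probability."""
    home = sum(rng.random() < p_home for _ in range(attacks))
    away = sum(rng.random() < p_away for _ in range(attacks))
    return home, away


def outcome_probabilities(simulate_match, n_sims: int = 10_000, seed: int = 0) -> dict[str, float]:
    """Estimate home-win, draw, and away-win probabilities by repeated simulation."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_sims):
        home, away = simulate_match(rng)
        if home > away:
            tally["home"] += 1
        elif home < away:
            tally["away"] += 1
        else:
            tally["draw"] += 1
    return {k: tally[k] / n_sims for k in ("home", "draw", "away")}
```

Replacing `toy_simulate_match` with a function that runs a full event-by-event simulation and counts goal events would turn the same tally into a match forecast.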
Figure 3: The situational expected goals (xG) maps calculated across the different models. For each case, we simulated 1,000,000 shots per input and then calculated the percentage of shots leading to a goal at each location, which is what the figures plot.
Furthermore, LEMs can generate detailed metrics such as xG (expected goals), aiding in the quantitative assessment of opportunities.
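The simulation-based xG in Figure 3 is a per-location goal rate over many simulated shots. This is a minimal sketch, assuming a hypothetical `goal_prob(x, y)` that stands in for the model's shot-outcome output, with the attacked goal assumed to sit at (100, 50) on a 0 to 100 pitch. The toy probability curve is invented for illustration only.

```python
import math
import random


def toy_goal_probability(x: float, y: float) -> float:
    """Placeholder shot model: goal chance decays with distance to (100, 50)."""
    distance = math.hypot(100.0 - x, 50.0 - y)
    return 0.5 * math.exp(-distance / 10.0)


def situational_xg(goal_prob, locations, n_shots=100_000, seed=0):
    """Monte Carlo xG: simulate n_shots per location and return the goal rate."""
    rng = random.Random(seed)
    xg = {}
    for x, y in locations:
        p = goal_prob(x, y)
        # Each simulated shot is a Bernoulli draw with the model's goal probability.
        goals = sum(rng.random() < p for _ in range(n_shots))
        xg[(x, y)] = goals / n_shots
    return xg
```

If a model exposes the goal probability directly, the estimate simply converges to it. The sampling view matters when the outcome only emerges from generating the shot event token by token, as in the simulations behind Figure 3.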
Conclusion
The introduction of LEMs marks a significant step forward in the application of AI to sports analytics, particularly in soccer. By leveraging a single neural network framework influenced by LLM techniques, LEMs effectively predict and simulate the dynamic sequences of soccer events. Future research could explore incorporating more advanced deep learning architectures to capture broader contexts and refine predictive accuracy further. The integration of LEMs into the analysis and strategy development processes promises considerable advancements in understanding and enhancing soccer performance.