Bayesian multilevel step-and-turn models for evaluating player movement in American football

Published 18 Mar 2026 in stat.AP and stat.ME | (2603.17866v1)

Abstract: In sports analytics, player tracking data have driven significant advancements in the task of player evaluation. We present a novel generative framework for evaluating the observed frame-by-frame player positioning against a distribution of hypothetical alternatives. We illustrate our approach by modeling the within-play movement of an individual ball carrier in the National Football League (NFL). Specifically, we develop Bayesian multilevel models for frame-level player movement based on two components: step length (distance between successive locations) and turn angle (change in direction between successive steps). Using the step-and-turn models, we perform posterior predictive simulation to generate hypothetical ball carrier steps at each frame during a play. This enables comparison of the observed player movement with a distribution of simulated alternatives using common valuation measures in American football. We apply our framework to tracking data from the first nine weeks of the 2022 NFL season and derive novel player performance metrics based on hypothetical evaluation.


Authors (2)

Summary

The paper introduces a Bayesian multilevel framework that simulates frame-level player movements by decomposing them into step length and turn angle components.
It leverages high-resolution tracking data using multilevel Gaussian and von Mises models to effectively capture temporal and directional uncertainties.
Posterior predictive simulation generates hypothetical movement alternatives, enabling robust evaluation of player performance through expected yards metrics.


Overview and Motivation

This work presents a generative statistical framework for player movement analysis in American football using high-resolution player tracking data. The authors develop Bayesian multilevel models for characterizing and simulating the frame-level movement of ball carriers during NFL run plays. Their approach decomposes player movement into step length and turn angle, enabling hypothetical evaluation: the observed trajectory at every frame is compared against a distribution of alternative, simulated steps. The methodology builds on recent advances in sports analytics, notably multi-agent tracking and probabilistic evaluation (ghosting), but its key contribution is to model movement uncertainty explicitly and to aggregate frame-level evaluations over entire plays rather than single time points.

Data Processing and Feature Construction

Player tracking data from the 2022 NFL season were used, comprising 5,400 run plays from the first nine weeks, with position, velocity, orientation, and event tags recorded at 10 Hz for all 22 players and the football. For each play, frames from the handoff to the terminating event (tackle, out of bounds, or touchdown) were extracted. Spatial features, including distances and angular relationships anchored on the ball carrier, were constructed for three groups: the ball carrier, the offense (excluding the ball carrier), and the defense. Model covariates included ball carrier features, nearest-defender features, and counts of players in coarse directional groupings (left/right, front/back) around the ball carrier.
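Step lengths and turn angles can be derived directly from the extracted frames. A minimal sketch, assuming each ball carrier's path is a (T, 2) array of field coordinates in yards and that headings are measured counter-clockwise from the +x axis (raw tracking direction fields may use a different convention and would need converting). The function name is illustrative, and frames where the carrier does not move would need special handling because their heading is undefined.

```python
import numpy as np


def steps_and_turns(xy):
    """Step lengths and turn angles from one ball carrier's frame-by-frame path.

    xy : (T, 2) array of (x, y) positions sampled at 10 Hz, in yards.
    Returns step lengths, shape (T-1,), and turn angles, shape (T-2,),
    in radians wrapped to [-pi, pi). Zero-length steps are not handled.
    """
    d = np.diff(np.asarray(xy, dtype=float), axis=0)  # displacement per frame
    steps = np.hypot(d[:, 0], d[:, 1])  # Euclidean step length
    heading = np.arctan2(d[:, 1], d[:, 0])  # direction of each step
    # Change in direction between successive steps, wrapped to [-pi, pi)
    turns = (np.diff(heading) + np.pi) % (2 * np.pi) - np.pi
    return steps, turns
```

Wrapping the heading difference matters: a left turn from heading $3\pi/4$ to $-3\pi/4$ is a $+\pi/2$ turn, not $-3\pi/2$.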

Generative Movement Models

Step Length Modeling

Step length (the Euclidean distance traveled in one frame) is first normalized to [0, 1] and then transformed with a scaled arcsine function; this improves fit relative to standard positive-valued distributions such as the Gamma and log-normal. A multilevel Gaussian model is then fit on the transformed scale, with random intercepts for each player and each defensive team, and with the previous frame's step length as a covariate to capture temporal dependence.

Formally:

$\tilde s_{ijt} \sim \mathcal{N}(\mu_{ijt}, \sigma^2); \qquad \mu_{ijt} = \alpha_0 + \mathbf{X}_{ijt} \boldsymbol{\beta} + u_j + v_k$

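where $\tilde s_{ijt}$ is the transformed step length at frame $t$, $u_j$ and $v_k$ are random intercepts for ball carrier $j$ and defensive team $k$, and the covariates $\mathbf{X}_{ijt}$ include the previous frame's step length. The model uses weakly informative priors and is fit by MCMC.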
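A minimal posterior predictive sketch for the step-length model, assuming the scaled arcsine transform has the form $z = (2/\pi)\arcsin(\sqrt{s/s_{\max}})$ (the exact scaling is not given here) and that posterior draws are available as NumPy arrays. The dictionary keys, the symbol $\tau_u$ for the player-effect SD, `s_max`, and the clipping of simulated values to [0, 1] before back-transforming are all hypothetical choices. A prototypical ball carrier gets a fresh player intercept drawn from $\mathcal{N}(0, \tau_u^2)$, while the defensive-team intercept $v_k$ uses that team's posterior draws.

```python
import numpy as np


def to_arcsine_scale(s, s_max):
    """Assumed transform: normalize by s_max, then (2/pi) * arcsin(sqrt(p))."""
    p = np.clip(np.asarray(s, dtype=float) / s_max, 0.0, 1.0)
    return (2.0 / np.pi) * np.arcsin(np.sqrt(p))


def from_arcsine_scale(z, s_max):
    """Inverse of to_arcsine_scale; clips z to [0, 1] first (an assumption)."""
    z = np.clip(np.asarray(z, dtype=float), 0.0, 1.0)
    return s_max * np.sin(0.5 * np.pi * z) ** 2


def simulate_step_lengths(x, draws, v_k, s_max, h=100, rng=None):
    """Posterior predictive step lengths (yards/frame) for a prototypical carrier.

    x     : covariates for the current frame, shape (p,), incl. previous step length
    draws : posterior draws with hypothetical keys 'alpha0' (D,), 'beta' (D, p),
            'sigma' (D,), 'tau_u' (D,)
    v_k   : posterior draws of the defensive team's intercept, shape (D,)
    """
    rng = np.random.default_rng() if rng is None else rng
    d = rng.integers(0, len(draws["alpha0"]), size=h)  # one posterior draw per simulation
    u = rng.normal(0.0, draws["tau_u"][d])  # new player effect u ~ N(0, tau_u^2)
    mu = draws["alpha0"][d] + draws["beta"][d] @ x + u + v_k[d]
    z = rng.normal(mu, draws["sigma"][d])  # step on the transformed scale
    return from_arcsine_scale(z, s_max)  # back to yards per frame
```

Pairing each simulation with its own posterior draw propagates parameter uncertainty into the hypothetical steps, rather than plugging in posterior means.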

Turn Angle Modeling

Turn angle (the change in direction between successive steps) is modeled with a von Mises distribution, using a tan-half link for the mean and a log link for the concentration. The model conditions on the current step length and the previous turn angle to capture coupled movement dynamics and directional persistence.

Formally:

$\varphi_{ijt} \sim \mathrm{vonMises}(\mu_{ijt}, \kappa_{ijt})$

with a player-level random effect $w_j$ in the (log-scale) concentration $\kappa_{ijt}$, reflecting individual change-of-direction variability.
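The turn-angle model can be sketched similarly, assuming brms-style links (inverse tan-half link $\mu = 2\arctan(\eta_\mu)$ and inverse log link $\kappa = \exp(\eta_\kappa + w_j)$). The function and argument names are hypothetical, the linear predictors are passed in precomputed, and which covariates enter the mean versus the concentration predictor is left to the caller as an assumption. For brevity this uses one fixed set of linear predictors rather than a fresh posterior draw per simulation; NumPy's `Generator.vonmises` does the sampling.

```python
import numpy as np


def simulate_turn_angles(eta_mu, eta_kappa, tau_w, h=100, rng=None):
    """Hypothetical turn angles (radians) for a prototypical ball carrier.

    eta_mu    : linear predictor for the mean direction (tan-half link)
    eta_kappa : linear predictor for log(kappa), excluding the player effect;
                it would carry e.g. the step-length term
    tau_w     : SD of the player effect; a new player draws w ~ N(0, tau_w^2)
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = 2.0 * np.arctan(eta_mu)  # inverse tan-half link, in (-pi, pi)
    w = rng.normal(0.0, tau_w, size=h)  # player random effect on log concentration
    kappa = np.exp(eta_kappa + w)  # inverse log link
    return rng.vonmises(mu, kappa, size=h)  # draws lie in [-pi, pi]
```

A larger $w_j$ means higher concentration, so that player's simulated turns cluster more tightly around the mean direction.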

Model Fitting and Assessments

Bayesian inference is performed using Stan and brms with weakly informative priors and extensive MCMC diagnostics. Posterior predictive checks demonstrate the superiority of the proposed transformation for step length over conventional distributions.
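One common form of posterior predictive check compares a summary statistic of the observed data with the same statistic computed on datasets replicated from the fitted model. A minimal sketch follows; the statistic, array shapes, and function name are illustrative assumptions, not the specific checks reported in the paper. Values near 0 or 1 indicate that the model rarely reproduces the observed statistic.

```python
import numpy as np


def ppc_pvalue(y_obs, y_rep, stat=np.mean):
    """Posterior predictive p-value: share of replicated datasets whose
    statistic is at least as large as the observed one.

    y_obs : observed data, shape (n,)
    y_rep : replicated datasets from the posterior predictive, shape (D, n)
    stat  : function mapping a 1-D array to a scalar
    """
    t_obs = stat(np.asarray(y_obs, dtype=float))
    t_rep = np.array([stat(row) for row in np.asarray(y_rep, dtype=float)])
    return float(np.mean(t_rep >= t_obs))
```

For step length, one could run this with several statistics (for example, the mean and upper quantiles) for each candidate response distribution and compare how often each reproduces the observed values.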

Simulation and Hypothetical Evaluation

Posterior predictive simulation generates a distribution of hypothetical steps at each frame, conditioned on the observed spatial environment and covariates. The simulation framework combines ideas from step selection analysis, common in the animal movement literature, with ghosting in sports analytics. At each time point, multiple hypothetical steps are sampled for a prototypical ball carrier, whose player random effects are drawn from their population distribution, while all other players' positions are held fixed.

These simulated steps are mapped to hypothetical future player positions, and spatial features are re-extracted.
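Mapping a simulated step to a hypothetical location amounts to rotating the previous heading by the simulated turn angle and moving the simulated distance. A minimal sketch, assuming headings in radians counter-clockwise from the +x axis; the function name is hypothetical. Spatial features would then be recomputed at each returned position with all other players held at their observed locations.

```python
import numpy as np


def hypothetical_positions(xy_now, heading_prev, steps, turns):
    """Positions reached by each simulated (step length, turn angle) pair.

    xy_now       : current ball carrier position (x, y)
    heading_prev : heading of the previous step, in radians
    steps, turns : simulated step lengths and turn angles, shape (H,)
    Returns an (H, 2) array of hypothetical next positions.
    """
    heading = heading_prev + np.asarray(turns, dtype=float)  # rotate previous heading
    steps = np.asarray(steps, dtype=float)
    offsets = np.column_stack((steps * np.cos(heading), steps * np.sin(heading)))
    return np.asarray(xy_now, dtype=float) + offsets
```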

Player performance is evaluated by comparing the observed movement at each frame with the simulated alternatives. Both are valued with a CatBoost regression model for expected yards gained, trained on the full set of spatial features, with cross-validation used to optimize predictive RMSE. Key evaluation metrics include:

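$\delta_{ijt}^{(h)} = \hat \ell_{ijt} - \hat \ell_{ijt}^{(h)}$: difference between the expected yards of the observed step ($\hat \ell_{ijt}$) and of hypothetical step $h$ ($\hat \ell_{ijt}^{(h)}$) at a frame.
$\bar{\delta}_{ijt} = \frac{1}{H}\sum_{h=1}^{H} \delta_{ijt}^{(h)}$: average difference over the $H$ hypothetical steps at a frame.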
Aggregates over frames provide per-play summaries.
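The frame-level comparison reduces to array arithmetic once expected yards have been predicted for the observed step and for each of the $H$ hypothetical steps. A sketch, assuming those predictions are already available as arrays; the function names are hypothetical, and the per-play summaries (mean and sum of $\bar\delta$ over frames) are illustrative choices, since the exact aggregation is not specified here.

```python
import numpy as np


def frame_deltas(ell_obs, ell_hyp):
    """delta^(h) = observed minus hypothetical expected yards, for every frame.

    ell_obs : (T,) expected yards of the observed step at each frame
    ell_hyp : (T, H) expected yards of the H hypothetical steps per frame
    Returns delta, shape (T, H), and delta_bar, shape (T,).
    """
    delta = np.asarray(ell_obs, dtype=float)[:, None] - np.asarray(ell_hyp, dtype=float)
    delta_bar = delta.mean(axis=1)  # average over the H hypothetical steps
    return delta, delta_bar


def play_summary(delta_bar):
    """Illustrative per-play aggregates of the frame-level averages."""
    return {"mean": float(np.mean(delta_bar)), "total": float(np.sum(delta_bar))}
```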

Results: Player Ratings and Performance Metrics

Posterior distributions of the player-specific concentration random effects reveal substantial heterogeneity in directional variability among NFL running backs. Runners such as Jonathan Taylor show among the highest concentration (that is, the least variability in turn angle), consistent with scouting reports describing linear, speed-oriented running styles. Viewing the posterior means of the step-length and turn-angle-concentration effects jointly reveals diverse movement profiles: some runners pair longer strides with low turn variability, while others pair shorter steps with greater angular flexibility.

Hypothetical evaluation at the play and season levels compares observed performance against simulated baselines. Two example metrics are yards success rate (the fraction of simulations in which the observed movement yields higher expected yards than the hypothetical alternative) and explosiveness (the probability that the observed expected yards exceed the 95th percentile of the hypothetical distribution). According to the authors, these metrics discriminate well between players and align with qualitative assessments of player style and impact; Josh Jacobs and Travis Etienne emerge as leaders on these measures, consistent with their known running profiles.
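Both metrics can be computed from the same arrays of expected yards. A sketch under the definitions above, assuming observed and hypothetical expected yards per frame are available; averaging the per-frame quantities over frames is an illustrative aggregation, not necessarily the one the authors use at the play or season level. The function name is hypothetical.

```python
import numpy as np


def success_and_explosiveness(ell_obs, ell_hyp, q=0.95):
    """Yards success rate and explosiveness, averaged over frames.

    ell_obs : (T,) expected yards of the observed step per frame
    ell_hyp : (T, H) expected yards of the hypothetical steps per frame
    q       : quantile defining an "explosive" observed step
    """
    ell_obs = np.asarray(ell_obs, dtype=float)
    ell_hyp = np.asarray(ell_hyp, dtype=float)
    # Per frame: share of hypothetical steps the observed step beats
    success = (ell_obs[:, None] > ell_hyp).mean(axis=1)
    # Per frame: does the observed step exceed the q-th quantile of the ghosts?
    cutoff = np.quantile(ell_hyp, q, axis=1)
    explosive = ell_obs > cutoff
    return float(success.mean()), float(explosive.mean())
```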

Practical and Theoretical Implications

The step-and-turn Bayesian generative framework advances movement modeling in sports analytics by:

Providing multilevel structure with explicit uncertainty propagation, crucial for robust hypothetical comparisons.
Allowing microanalysis of in-play movement and macroaggregation into performance metrics.
Enabling flexible player evaluation across positions, contexts, and time horizons.
Facilitating potential downstream integration with expected points and win probability models for more granular game valuation.

The approach is directly extensible to other sports with high-frequency tracking data (e.g., soccer, basketball, hockey) through modular substitution of value functions and movement context. The framework’s decomposability paves the way for integrating discrete action selection models, iterative trajectory forward simulation, and fully multi-agent simulation (contingent on modeling all on-field player movements and stochastic play termination events).

Limitations and Future Directions

Current limitations include a simplified fixed-effects structure for covariates, simulation of only the ball carrier (with other players' trajectories held fixed), and evaluation restricted to one step ahead rather than full-play trajectory simulation. Key challenges remain in forward simulation (requiring well-calibrated tackle models and multi-player dynamics) and in richer feature extraction to encapsulate complex spatial relationships.

Future research should focus on:

Expanding covariate modeling via flexible ML or hierarchical architectures.
Implementing full trajectory simulation with probabilistic play termination.
Incorporating multi-agent models for coordinated team movement.
Extending to action selection and decision-making contexts in other sports.

Conclusion

This paper introduces a robust Bayesian multilevel generative approach for frame-level player movement evaluation in American football, synthesizing step-and-turn modeling, posterior predictive simulation, and hypothetical baseline comparison. Empirical evidence highlights substantial player heterogeneity and strong discriminative ability of the proposed metrics. The framework provides a template for future sports analytics research, enabling granular, uncertainty-aware evaluation of individual and team performance in spatially dynamic, multi-agent environments.


Explain it Like I'm 14

What is this paper about?

This paper shows a new way to judge how well a football player moves during a play, moment by moment, using GPS-like tracking data from NFL games. The authors build computer models that learn how a running back usually moves from one tiny time frame to the next. Then, at every instant of a real play, they generate many “what-if” versions of the next step (like a set of ghost players) and compare what the player actually did to these realistic alternatives.

What questions were they trying to answer?

Can we fairly evaluate a ball carrier’s movement at every moment of a play by comparing it to many reasonable, alternative moves they could have made?
Can we model player movement in a simple, natural way (like “how far did they step” and “how much did they turn”) that works well with tracking data?
Can these models tell us useful things about different players—like who takes longer strides or who changes direction more?

How did they do it?

The data (what they looked at)

They used NFL “player tracking” data, which records everyone’s position 10 times per second. They focused on run plays from the first 9 weeks of the 2022 season—about 5,400 plays. For each play, they looked closely at the frames from the handoff to when the runner was tackled, went out of bounds, or scored.

The movement recipe: “step and turn”

To describe movement in simple terms, they broke it into two parts—think of how you walk:

Step length: how far you move between two frames (like the length of a single stride).
Turn angle: how much you change direction between one step and the next (like turning your shoulders left or right).

This “step-and-turn” idea is common in animal movement studies and works well here, too.

The models (how the computer learns)

They used a Bayesian multilevel approach. In everyday language:

Bayesian means the model doesn’t just give one answer; it also tells you how uncertain it is, which is important because sports are unpredictable.
Multilevel means it learns overall patterns (what most players usually do) and also player-specific tendencies (how a particular running back tends to move).

They built:

A step-length model that predicts how far the runner will move next, considering the runner’s speed, the nearest defenders, and where everyone is.
A turn-angle model that predicts how much the runner will turn next. It also considers the just-predicted step length because long strides and sharp turns usually don’t happen together.

Both models include “player effects,” so they can learn, for example, that one back tends to take long, straight steps while another makes shorter, quicker cuts.

Generating “ghost” steps (the what-if comparison)

At each frame in a play, they:

Used the models to simulate many plausible next steps for an average-level player in the same on-field situation.
Kept all other players fixed (only the ball carrier’s next step changes) to make a fair comparison.
Converted each simulated step into a new location on the field.

Turning movement into value (does it help the team?)

To judge if a step was good, they predicted how many yards the team was likely to gain from that location onward. They trained a machine-learning model (CatBoost) that takes positions and movements of all players and predicts expected yards gained.

For each frame, they:

Predicted yards gained from the real step.
Predicted yards gained from each ghost step.
Compared the real step to the distribution of ghost steps. If the real step’s expected yards were higher than most ghosts, that was a positive sign.

What did they find?

The step-and-turn models produced realistic, varied “ghost” steps, giving a fair baseline to compare against at every moment of a play.
They could measure player tendencies:
- Some running backs (like Jonathan Taylor) tend to take longer, straighter strides with less turning—more “power and straight ahead.”
- Others (like Christian McCaffrey) tend to take shorter steps with more directional variability—more “shifty and agile.”
In an example play with Javonte Williams, at the moment of first contact, his real step led to higher expected yards than many of the ghost steps. That means, at that instant, his choice looked better than a typical alternative.
The approach turns complicated movement data into simple numbers—like “how much better than average was that step?”—that can be tracked over frames, plays, and games.

Why does this matter?

Better player evaluation: Coaches and analysts can see not just what happened, but how good each tiny decision and movement was compared to realistic alternatives.
Player development: Strengths and habits (like “long, straight strides” vs. “quick direction changes”) can be measured and trained.
Smarter strategy: Teams can identify when certain movement styles work best against certain defenses or situations.
Clearer communication: Translating movement into expected yards makes it easier to explain value to coaches, players, and fans.
Beyond football: The same step-and-turn plus ghosting idea can be adapted to other sports like basketball or soccer, where positioning and movement shape the game.

In short, the paper introduces a simple-to-understand movement language (“step” and “turn”), uses it to generate fair, uncertainty-aware “ghost” comparisons at every moment, and turns those comparisons into a clear measure of value (expected yards). This helps everyone see not just what players did, but how good those choices were.


Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise, actionable list of the main uncertainties and unexplored areas left by the paper that future research could address.

One-step, single-agent simulations only: the framework simulates only the next ball-carrier step while holding all other players fixed, precluding multi-agent feedback and longer-horizon trajectory generation; developing stable, iterative, frame-by-frame joint simulation for all 22 players remains open.
Counterfactual realism: because defenders and blockers do not react to the simulated step, the hypothetical evaluation may be biased; methods for modeling opponent/teammate responses (e.g., coupled generative models, game-theoretic dynamics, or learned reaction policies) are needed.
Baseline choice and interpretation: simulated “average player” baselines draw random effects from N(0, τ²); alternative baselines (role-/scheme-/situation-conditioned, player-specific “self-ghosts,” opponent-conditioned) and their impact on evaluation are not explored.
Dependence structure between step length and turn angle: the models are conditionally linked only through κ(s) in the turn-angle model; joint circular-linear models (e.g., bivariate distributions, copulas, or shared latent states) could capture residual dependence and improve generative fidelity.
Temporal dynamics beyond first-order: both models include only one-lag memory (previous step/turn); longer-memory structures, switching regimes (e.g., congested vs open-field), or hidden Markov/state-space formulations are not investigated.
Feature parsimony limits: the movement models use a simplified covariate set (ball carrier, nearest defender, coarse left/right–front/back counts); richer interactions (multiple nearest defenders/teammates, blocking angles, pursuit angles, local congestion fields, flow/optical crowding metrics) are not incorporated.
Context omission: formation, run concept, gap assignments, down/distance, score/time, field surface, weather, and sideline proximity are not modeled and could confound movement behavior and evaluation.
Defensive and offensive unit effects: only a defensive-team random intercept is included for step-length mean; varying effects for offensive team/unit, play/drive/game, coaching scheme, and opponent-specific interactions are not modeled.
Random-effects structure: only random intercepts are used; random slopes (e.g., player-specific responses to congestion or speed) could capture heterogeneous decision rules but are untested.
Turn-angle distributional flexibility: the von Mises model is unimodal; mixture or asymmetric circular distributions could capture left/right turn modes and skewness not addressed by a single-component von Mises.
Kinematic/biomechanical constraints: explicit constraints linking speed, acceleration, and feasible curvature (e.g., speed–turn tradeoffs, maximum angular velocity) are not modeled; current models may permit physically implausible steps.
Field and collision constraints: simulations do not explicitly prevent steps that cross out-of-bounds or intrude through player bodies; incorporating boundary-aware and collision-avoidance constraints is unaddressed.
Step-length transformation and scaling: the scaled arcsine transform requires normalization to [0,1], but the normalization scheme (global vs context-specific) is unspecified; portability across datasets/seasons and sensitivity to scaling choices remain unclear.
Response-distribution alternatives for step length: the paper dismisses Gamma/log-normal/Weibull but does not test mixtures, zero-inflation near very small steps, or heteroskedastic error models that might fit without ad hoc transforms.
Measurement error and smoothing: tracking noise at 10 Hz and jitter in orientation/bearing are not modeled; smoothing or measurement-error models to de-noise step/turn estimates are not considered.
Truncation near boundaries: the support of feasible step/turn combinations near the sideline or end line is effectively truncated, but models do not account for truncation or censoring.
Validation of generative realism: beyond supplementary assessments, there is no thorough posterior predictive checking of step/turn distributions conditioned on context, nor validation that simulated local steps align with observed micro-trajectories.
Sensitivity to the number of simulations (H): the stability of evaluation metrics as a function of H and the Monte Carlo error propagation to player rankings are not examined.
Yards-gained model uncertainty: the CatBoost model provides point predictions only; uncertainty in yards-gained predictions is not propagated into the evaluation (δ), breaking full Bayesian accounting.
Potential covariate shift: the yards-gained model is trained on observed states but is applied to perturbed hypothetical locations; robustness to local covariate shifts is untested.
Choice of valuation target: evaluation is limited to expected yards gained; integration with expected points or win probability (and their uncertainties) is not demonstrated.
Joint modeling of movement and outcomes: the evaluation decouples movement generation from play outcomes; integrated models that jointly learn movement and value (e.g., micro/macro transition frameworks) are not explored.
Generalization beyond RB rushing: applicability to scrambles, run-pass options, after-catch movement, defensive pursuit, and other positions/sports is proposed but not empirically validated.
Player-usage confounding: interpreting w_j (turn-angle concentration) as “change-of-direction variability” may conflate skill with role/usage/context; causal disentanglement or adjustment strategies are not provided.
Cross-season and sample representativeness: models are trained on weeks 1–9 of 2022 (Big Data Bowl subsample); robustness across full seasons, multiple years, and non-BDB datasets is untested.
Computational scalability: MCMC with brms on frame-level data could strain resources at league scale; approximate inference (e.g., variational, INLA), parallelization, and runtime benchmarks are not discussed.
Priors and sensitivity: prior choices (e.g., for κ’s log-link) and sensitivity analyses are not reported; robustness to priors and hyperparameters remains unknown.
Reproducibility and deployment: code, model weights, and practical guidance for integrating the framework with team pipelines are not provided; reproducibility and operationalization questions remain open.
Ethical and interpretive risks: using simulated baselines may influence player evaluation and contracts; guidance on uncertainty communication and decision thresholds is absent.
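Regarding the sensitivity to the number of simulations $H$, a first-order check is the Monte Carlo standard error of $\bar\delta$ at a frame. A minimal sketch, assuming the $H$ hypothetical draws are independent; the function name is hypothetical.

```python
import numpy as np


def mc_standard_error(delta_draws):
    """Monte Carlo standard error of the mean of H independent draws of delta.

    delta_draws : array of delta^(h) values for one frame, shape (H,)
    """
    d = np.asarray(delta_draws, dtype=float)
    return float(d.std(ddof=1) / np.sqrt(d.size))  # sample SD / sqrt(H)
```

Because this standard error shrinks like $1/\sqrt{H}$, quadrupling $H$ roughly halves it, which gives a quick way to judge whether differences in player rankings exceed simulation noise.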


Practical Applications

Immediate Applications

Below are actionable uses you can deploy today with the paper’s models and simulation workflow, assuming access to player-tracking data and standard analytics infrastructure.

Player movement profiling for scouting and roster decisions (sector: sports/finance)
- What it delivers: Stable player “movement fingerprints” via random effects (e.g., longer/shorter typical step length; higher/lower turn-angle variability) to compare RBs across the league and over time.
- Tools/products/workflows: Scouting dashboards that rank players on step-length and turn-angle concentration; automated scouting reports; alerts for profile drift across weeks.
- Assumptions/dependencies: Availability of high-frequency tracking data and event tags; profiles are partly skill and partly scheme/context—interpretation must consider team usage; models require periodic recalibration to new seasons and contexts.
Play design review and film-room diagnostics using local hypothetical alternatives (sector: coaching/strategy)
- What it delivers: Frame-by-frame “what-if” evaluation of a ball carrier’s observed step versus a distribution of simulated next steps, summarized as expected yards gained deltas at decision points (handoff, first contact, cut attempts).
- Tools/products/workflows: Video-analysis plugins that overlay ghosted step distributions and show “value of the chosen step” vs. baseline; post-game cut-up workflows to grade pathing and hole selection by situation.
- Assumptions/dependencies: The evaluation hinges on a calibrated yards-gained model and keeps all other players fixed; applies best to post-hoc analysis rather than real time.
Opponent scouting and game planning (sector: coaching/analytics)
- What it delivers: Defensive plans keyed to ball carriers’ directional variability; identification of run fits and leverage strategies exploiting runners with low/high turn-angle variability; offensive play calls targeted to RB profiles.
- Tools/products/workflows: Scouting briefs mapping RB profiles to gap assignments, contain strategies, and preferred fronts; game-plan matrices linking player movement traits to play families.
- Assumptions/dependencies: Movement profiles are influenced by OL performance and scheme; need contextual splits (e.g., personnel, formation, motion) to avoid overgeneralization.
Broadcast and fan engagement “ghost distributions” (sector: media/broadcast)
- What it delivers: On-air or digital replays that show the observed step vs. a distribution of plausible alternatives and the corresponding yards-gained distribution; clearer storytelling around “how much was on the table.”
- Tools/products/workflows: AR overlays displaying step clouds and median/quantiles of hypothetical yards; interactive web clips for social and OTT platforms.
- Assumptions/dependencies: Rights to use tracking data; latency must be acceptable for near-live or post-play segments.
Sports betting model enhancement with micro-value features (sector: finance/gaming)
- What it delivers: Features like per-frame expected yards delta and player movement profiles for pricing same-game props and live markets; scenario testing for in-play hedging.
- Tools/products/workflows: Feature pipelines that feed micro-movement signals into existing pricing engines; post-event model diagnostics using ghost baselines.
- Assumptions/dependencies: Timely access to tracking or derived data; regulatory compliance; careful avoidance of overfitting to limited sample windows.
Player development and sports performance KPIs (sector: healthcare/performance)
- What it delivers: Training targets to modify step-length tendencies or increase/decrease directional variability based on role; drills and progress monitoring aligned to on-field movement outcomes.
- Tools/products/workflows: Practice-tracking reports using wearables to mirror step-and-turn metrics; individualized COD (change of direction) progression plans; pre/post intervention comparisons.
- Assumptions/dependencies: Training and game-tracking comparability; differences in surface, footwear, and loads can change metrics; integration with strength and conditioning context.
Academic instruction and reproducible research templates (sector: academia/education)
- What it delivers: A ready case study in Bayesian multilevel and circular models with posterior predictive simulation; an exemplar for step-selection-style analyses in human movement contexts.
- Tools/products/workflows: Course modules, notebooks, and labs teaching brms/Stan modeling and posterior predictive checks on tracking data.
- Assumptions/dependencies: Access to public samples (e.g., Big Data Bowl datasets); ethical use and data-sharing constraints.
Analytics vendor API for next-step ghosting and valuation (sector: software/services)
- What it delivers: A service that returns simulated next steps and expected-yard deltas for any frame; plug-in for team platforms and media partners.
- Tools/products/workflows: Batch endpoints, SDKs, and dashboards; automated job queues to process games overnight.
- Assumptions/dependencies: MCMC compute resources; model versioning and periodic recalibration; SLAs aligned to non-real-time use.

Long-Term Applications

These opportunities require additional modeling, data access, or engineering—e.g., multi-agent simulation, real-time inference, or broader validation.

Full multi-agent forward simulation for counterfactual replays (sector: sports analytics/software)
- What it could deliver: Iterative, frame-by-frame simulation that updates all 22 players jointly to recreate entire alternative plays; testing play designs and assignments end-to-end.
- Tools/products/workflows: Generative multi-agent simulators; coach-facing “re-script the play” tools; scenario engines for playbook QA.
- Assumptions/dependencies: Robust co-evolution models for all positions; calibration and identifiability; significant compute; validation against out-of-sample plays.
Real-time in-game decision support (sector: coaching/edge compute)
- What it could deliver: On-sideline suggestions of high-value lanes or cut angles with sub-second latency; adaptive play-calling based on opponent reaction profiles.
- Tools/products/workflows: Edge inference on dedicated hardware; low-latency data feeds; human-in-the-loop interfaces.
- Assumptions/dependencies: League rules on in-game tech; latency and reliability; ergonomic UI; minimizing distraction and information overload.
Injury risk and safety analytics from micro-movements (sector: healthcare/policy)
- What it could deliver: Links between extreme COD loads, step-length spikes, and soft-tissue injury risk; insights around first-contact mechanics and collision exposure.
- Tools/products/workflows: Integrated models combining step-and-turn metrics with workload, surface, and prior injury; medical dashboards; practice guidelines.
- Assumptions/dependencies: High-quality injury labels; causal inference beyond correlation; confounding from scheme and usage; ethics and privacy.
Cross-sport toolkits for basketball, soccer, and hockey (sector: sports/software)
- What it could deliver: Generalized step-and-turn or drive-and-turn kits for ball/possession carriers; instant evaluation of on-ball actions with sport-specific valuation functions (EPV, xT, xG).
- Tools/products/workflows: Sport-specific SDKs; plug-ins for standard tracking vendors; pre-trained models with fine-tuning.
- Assumptions/dependencies: Access to sport-specific tracking covariates; redefinition of valuation targets; variable sampling rates and coordinate systems.
Contract valuation and WAR-like metrics incorporating hypothetical evaluation (sector: sports finance/front office)
- What it could deliver: Player value added above a simulated baseline, aggregated across frames/plays and mapped to wins and dollars; robustness checks against context bias.
- Tools/products/workflows: Season-long longitudinal models; market-to-performance conversion; arbitration and extension support tools.
- Assumptions/dependencies: Stable linkage from micro-value to team wins; multi-year consistency; handling scheme and teammate effects.
Robotics and autonomous-agent navigation inspired by human evasive movement (sector: robotics/AI)
- What it could deliver: Path planners that incorporate step-length/turn-angle distributions to navigate dynamic, adversarial environments; improved micro-maneuver realism.
- Tools/products/workflows: Simulation-to-real pipelines; RL agents trained on posterior predictive “human-like” baselines.
- Assumptions/dependencies: Domain adaptation from sports fields to real-world clutter; safety assurances; sensing and localization fidelity.
Public safety and crowd movement modeling (sector: policy/urban planning)
- What it could deliver: Micro-level step-and-turn analogs for pedestrian flows to stress-test evacuation plans; uncertainty-aware ghost baselines for route choices.
- Tools/products/workflows: Agent-based models for facilities and events; policy simulations under various layouts and guidance cues.
- Assumptions/dependencies: Availability of fine-grained crowd tracking; transferability from athletic to pedestrian contexts; ethics and privacy constraints.
Integrity monitoring and market surveillance (sector: policy/regulation)
- What it could deliver: Anomaly detection by comparing observed movement against predicted distributions to flag atypical patterns for review (e.g., suspicious non-effort).
- Tools/products/workflows: Compliance dashboards; forensic analytics post-event.
- Assumptions/dependencies: Access to raw or derived tracking metrics; strict false-positive controls; governance and due process.
AR training and on-field feedback systems (sector: hardware/software/performance)
- What it could deliver: Heads-up visualization of higher-value lanes and “ghost lanes” during controlled drills; individualized cueing to shape movement tendencies.
- Tools/products/workflows: Wearable AR with precise localization; drill scripting engines.
- Assumptions/dependencies: Near-centimeter tracking in practice; safety considerations; cognitive load management.
Open benchmarks and MOOCs for Bayesian movement modeling (sector: academia/education)
- What it could deliver: Standardized datasets and leaderboards for step-and-turn modeling; open curricula for multilevel and circular modeling in sports.
- Tools/products/workflows: Data repositories, notebooks, grading rubrics, and competitions.
- Assumptions/dependencies: Data licensing and anonymization; community governance.

Cross-cutting assumptions and dependencies

Data access and quality: Requires high-frequency, labeled tracking data (10 Hz), accurate event tags, and consistent field standardization.
Model scope: Current simulations are single-step and hold other players fixed; full multi-agent dynamics need additional modeling.
Valuation dependence: Hypothetical evaluation relies on a well-calibrated yards-gained (or EP/WP) model; miscalibration propagates to evaluations.
Computation and MLOps: Bayesian MCMC (via brms/Stan) entails non-trivial compute; productionization needs model versioning, monitoring, and retraining workflows.
Context and fairness: Movement metrics are confounded by scheme, offensive line (OL) quality, and opponent strength; context-aware splits and hierarchical controls can help mitigate this bias.
Governance and ethics: Privacy, licensing, and competitive integrity concerns govern sharing and use of tracking-derived analyses.


Glossary

Anchoring strategy: A feature-engineering method that uses a focal entity (e.g., the ball carrier) as a reference and orders other agents by their distance to it. "We use the anchoring strategy as described in \citet{horton2020learning} and \citet{yurko2020going} to derive the dynamic within-play features of interest."
Bayesian multilevel models: Hierarchical Bayesian models that include parameters at multiple levels (e.g., player and team) to capture group-level variability. "Specifically, we develop Bayesian multilevel models for frame-level player movement"
Bearing angle: The direction of movement measured from the positive x-axis of the field coordinate system, computed from the displacement between successive locations. "the bearing angle $b_t \in [-\pi, \pi]$ is defined as $b_t = \text{atan}2 (y_{t+1} - y_t, x_{t+1} - x_t)$"
Big Data Bowl: The NFL’s annual public analytics competition that releases tracking data for research. "analytics competition known as the Big Data Bowl"
brms: An R package that interfaces with Stan for fitting Bayesian regression models using high-level formula syntax. "via the brms package in R"
CatBoost: A gradient boosting machine learning library particularly effective with categorical features. "we train a gradient boosting model using the CatBoost library"
Completion probability model: A predictive model estimating the probability that a pass is completed. "and perform evaluation using a completion probability model."
Conditional density estimation: Methods for estimating the probability distribution of a response given covariates. "Using conditional density estimation, this work evaluates defensive pass coverage"
Credible interval: A Bayesian interval estimate that contains the parameter with a specified posterior probability. "For each player, the posterior mean and corresponding 95\% credible interval are depicted."
Deep latent variable models: Generative models with latent (hidden) variables used for complex sequence or trajectory prediction. "\citet{felsen2018where}, \citet{gu2023deep}, and \citet{fassmeyer2025interactive} use deep latent variable models to forecast multi-agent movements in team sports."
Directional persistence: The tendency of an agent to continue turning in the same direction across successive time steps. "This captures the notion of directional persistence, reflecting a player's tendency to make consecutive turns in a similar direction."
Effective sample size: An MCMC diagnostic estimating how many independent draws would carry the same information as the autocorrelated MCMC draws. "We also observe no problematic effective sample size for each parameter"
Expected points: An expected value metric mapping game state to the average points a team can expect to score. "expected points—a commonly-used, interpretable utility function for play valuation and in-game decision making in the NFL"
Expected possession value: The expected value of a possession at a given instant, integrating over future outcomes. "to estimate expected possession value."
Expected yards gained: The expected number of yards a ball carrier will gain from a given frame based on current spatial context. "we illustrate using the expected yards gained by ball carriers at different time points within a play."
Ghosting: A technique that compares observed player positioning to a baseline “ghost” trajectory drawn from typical or role-based behavior. "commonly known as ghosting."
Gradient boosting: An ensemble learning method that builds predictive models by sequentially adding trees to correct prior errors. "we train a gradient boosting model"
Imitation learning: Learning policies by mimicking expert behavior from trajectory data. "a coordinated multi-agent imitation learning framework"
Macrotransition model: A higher-level model capturing discrete possession events (e.g., passes, shots) over coarser time scales. "a macrotransition model for possession-level events like passes, shots, and turnovers."
Markov chain Monte Carlo (MCMC): A class of algorithms that sample from complex posterior distributions via Markov chains. "posterior distributions are estimated using Markov chain Monte Carlo (MCMC)"
Microtransition movement model: A fine-grained model capturing frame-to-frame player motions within a possession. "a microtransition movement model for all players"
Next Gen Stats: The NFL’s player and ball tracking system using embedded sensors to capture spatiotemporal data. "Next Gen Stats system"
No-U-Turn Sampler: An adaptive variant of Hamiltonian Monte Carlo that chooses the trajectory length automatically, stopping a trajectory once it begins to double back on itself (a "U-turn"). "through a no-U-turn sampler"
Posterior mean: The average of a parameter’s posterior distribution, used as a point estimate. "posterior mean estimates"
Posterior predictive distributions: The distributions of new data obtained by integrating the likelihood over the posterior of the parameters. "we sample from the posterior predictive distributions"
Posterior predictive simulation: Generating simulated outcomes from the posterior predictive distribution to assess model implications. "we perform posterior predictive simulation to generate hypothetical ball carrier steps"
Random effects: Model terms representing group-specific deviations from population-level effects. "random effects for race context, jockeys, and horses."
R-hat statistic: A convergence diagnostic comparing within- and between-chain variance in MCMC; values near 1 indicate that the chains have mixed. "$\hat R$ values close to 1"
Role-conditioned ghosts: Baseline trajectories conditioned on a player’s tactical role and context. "role-conditioned ghosts"
Scaled arcsine transformation: A transformation that normalizes bounded data and then applies the arcsine function to it. "we choose a scaled arcsine transformation."
Step-and-turn models: Movement models that decompose motion into step lengths and turn angles at each frame. "Using the step-and-turn models, we perform posterior predictive simulation"
Step length: The distance traveled between successive frames or locations. "step length (distance between successive locations)"
Step selection analysis: A framework comparing observed steps to available alternatives to infer movement preferences. "We adopt a step selection analysis perspective"
Stochastic process model: A probabilistic model describing system evolution over time with inherent randomness. "propose a stochastic process model for the evolution of a basketball possession"
Tan-half link function: A link function for circular means that maps a real-valued linear predictor $\eta$ to a mean angle $\mu = 2\arctan(\eta) \in (-\pi, \pi)$, equivalently $\eta = \tan(\mu/2)$. "we use a tan-half link function"
Turn angle: The change in movement direction between successive steps. "turn angle (change in direction between successive steps)"
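Von Mises distribution: A circular probability distribution for angles, parameterized by a mean direction $\mu$ and a concentration $\kappa \ge 0$ (larger $\kappa$ concentrates mass more tightly around $\mu$). "we use a von Mises response distribution"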
Win probability: The probability that a team will win from a given game state. "expected points and win probability models"
Yards gained model: A predictive model estimating yards gained (and thus ending field position) from current frame features. "many NFL teams and vendors have their own version of the yards gained model"
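The step length, bearing angle, and turn angle entries above can be computed directly from frame-by-frame coordinates. The following NumPy sketch is illustrative: the function name `step_and_turn` is hypothetical, and wrapping turn angles to $[-\pi, \pi)$ is an assumed convention, chosen so that a small change of heading across the $\pm\pi$ boundary is not reported as a near-full rotation.

```python
import numpy as np


def step_and_turn(x, y):
    """Compute step lengths, bearings, and turn angles from a sequence of locations.

    x, y: 1-D arrays of frame-by-frame positions (e.g., 10 Hz tracking).
    Returns steps and bearings of length n - 1, and turns of length n - 2.
    """
    dx = np.diff(x)
    dy = np.diff(y)
    steps = np.hypot(dx, dy)       # distance between successive locations
    bearings = np.arctan2(dy, dx)  # b_t = atan2(y_{t+1} - y_t, x_{t+1} - x_t)
    raw_turns = np.diff(bearings)  # change in direction between successive steps
    # Wrap to [-pi, pi) so crossing the +/-pi boundary yields a small turn.
    turns = (raw_turns + np.pi) % (2 * np.pi) - np.pi
    return steps, bearings, turns


# A player moving right, then up, then left around a unit square.
steps, bearings, turns = step_and_turn(np.array([0.0, 1.0, 1.0, 0.0]),
                                       np.array([0.0, 0.0, 1.0, 1.0]))
print(steps, turns)
```

In this square example every step has length 1 and both turns are $+\pi/2$ (counterclockwise), since the bearings are $0$, $\pi/2$, and $\pi$.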
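The R-hat entry above can be made concrete with the classic split-R-hat computation, which compares between-chain and within-chain variance after splitting each chain in half. This NumPy sketch is for intuition only: current Stan-based tooling, including the summaries brms reports, uses a rank-normalized refinement of this statistic, so its values will differ somewhat. The function name `split_rhat` and the simulated chains are hypothetical.

```python
import numpy as np


def split_rhat(draws):
    """Classic split-R-hat for a single parameter.

    draws: array of shape (n_chains, n_iterations) of post-warmup MCMC draws.
    """
    draws = np.asarray(draws, dtype=float)
    n_chains, n_iter = draws.shape
    half = n_iter // 2
    # Split each chain into two halves so within-chain drift also inflates R-hat.
    split = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = split.shape[1]
    chain_means = split.mean(axis=1)
    within = split.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)          # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n  # pooled posterior variance estimate
    return float(np.sqrt(var_plus / within))


rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 1000))                      # four chains, same target distribution
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck in a different region
print(split_rhat(mixed), split_rhat(stuck))
```

With four chains sampling the same distribution the statistic sits very close to 1, while a single chain stuck in a different region pushes it well above commonly used thresholds such as 1.01.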
