Generalization of V-JEPA 2 Action Anticipation Beyond Kitchen Environments
Determine how well the V-JEPA 2 model for human action anticipation, evaluated in the paper on the Epic-Kitchens-100 (EK100) benchmark, generalizes to environments outside kitchens. Doing so requires assessing performance on datasets drawn from non-kitchen domains and reporting comparable metrics (e.g., mean-class recall-at-5 for verb, noun, and action).
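To make the metric concrete, the following is a minimal sketch of mean-class recall-at-k, assuming per-sample class scores and integer ground-truth labels are available as arrays. The function name `mean_class_recall_at_k` and the choice to skip classes that never occur in the ground truth are assumptions made for illustration; the official EK100 evaluation code may differ in such details.

```python
import numpy as np


def mean_class_recall_at_k(scores, labels, k=5):
    """Average, over ground-truth classes, of the per-class top-k recall.

    scores: array of shape (num_samples, num_classes), higher means more likely.
    labels: array of shape (num_samples,), integer class indices.
    Classes with no ground-truth samples are skipped (an assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    k = min(k, scores.shape[1])  # guard against k exceeding the number of classes

    # Indices of the k highest-scoring classes for each sample (order within top-k is irrelevant).
    topk = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]

    # A sample is a hit when its true label appears among its top-k predictions.
    hits = (topk == labels[:, None]).any(axis=1)

    # Recall is computed per class first, then averaged, so rare classes weigh as much as frequent ones.
    per_class = [hits[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(per_class))
```

Under these assumptions, the same function would be called three times per dataset, once each with verb, noun, and action scores and labels, so that results on a non-kitchen dataset can be reported in the same form as the kitchen-domain numbers.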
References
Third, the EK100 benchmark is limited to kitchen environments, with a closed well-defined vocabulary, and we do not know how well V-JEPA 2 generalizes to other environments.
— V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
(2506.09985 - Assran et al., 11 Jun 2025) in Section: Prediction: Probe-based Action Anticipation, Limitations