Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs

Published 26 Oct 2023 in cs.MM | (2310.17193v1)

Abstract: Automatic evaluation systems are a fundamental topic in sports technology. In many sports, including figure skating, automated evaluation methods based on pose estimation have been proposed. However, previous studies have evaluated skaters' skills using 2D analysis. In this paper, we propose an automatic edge error judgment system that uses a monocular smartphone camera and inertial sensors, which enable us to analyze 3D motion. Edge error is one of the most significant scoring items and is challenging to judge automatically because it is a 3D motion. The results show that the model using 3D joint position coordinates estimated from the monocular camera as the input feature had the highest accuracy, 83%, on data from unknown skaters. We also performed a detailed motion analysis for edge error judgment. These results indicate that the monocular camera can be used to judge edge errors automatically. We will provide the figure skating single Lutz jump dataset, including pre-processed videos and labels, at https://github.com/ryota-takedalab/JudgeAI-LutzEdge.


Summary

- The paper introduces an automated edge error judgment system using supervised learning based on 3D pose estimation.
- It compares data from a monocular smartphone camera and IMUs, with the camera approach achieving an average accuracy of 83.56% and an F-measure of 81.01%.
- The study finds that downsampling to 12 fps enhances performance and highlights the left foot's 3D joint position as a key indicator of edge errors.

This paper (2310.17193) addresses the challenge of automatically judging edge errors in figure skating, specifically for the single Lutz jump. Edge errors, which relate to the correct use of the skate blade's inside or outside edge during take-off, are a significant scoring item but are difficult to judge accurately, even visually, due to the requirement for analyzing subtle 3D movements. Traditional automatic systems in figure skating have often relied on 2D analysis or focused on other aspects like jump classification or under-rotation detection.

The authors propose an automatic edge error judgment system using supervised learning classification based on 3D pose estimation. They compare two different data sources for 3D pose: a monocular smartphone camera and Inertial Measurement Units (IMUs). The goal is to determine which method is more effective for this specific judgment task and to analyze the key pose features contributing to the decision.

The proposed framework involves collecting synchronized video data from a monocular smartphone camera and 3D motion data from IMUs attached to the skater.

Data Collection:
- Monocular Camera: An iPhone 13 is used to capture videos at 240 fps.
- IMUs: Perception Neuron 3 sensors (17 attached to the body) capture 3D joint positions and Euler angles at 60 fps.

3D Pose Estimation and Feature Creation:
- From Monocular Video: The system detects and tracks the skater using YOLOv3 (Redmon et al., 2018) and SORT (Bewley et al., 2016). Jump sequences are identified and trimmed automatically based on changes in bounding-box velocity. StridedTransformer-Pose3D (Li et al., 2021), trained on Human3.6M, then estimates 17 3D joint positions from the trimmed video frames.
- From IMUs: The PN3 data is processed using the vendor's Axis Studio software to obtain 3D joint position coordinates (17 joints) and the pose angle of the left skate blade.
- Feature Processing: For both modalities, data is time-aligned on the jump take-off moment. 3D joint positions are normalized (hip at the origin, the feet's z-coordinate at ground level). Input features are then built by choosing a modality (camera or IMU), a pose data type (joint positions, left foot angle, or both combined), and a sampling rate (12 fps or 60 fps). This results in ten different feature sets for comparison.
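
The normalization and downsampling steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the joint indices follow the common Human3.6M 17-joint ordering (0 = hip, 3 = right foot, 6 = left foot), the z axis is assumed to point up, and "hip at origin, feet at ground level" is interpreted here as centring the hip in x and y while shifting the lowest foot to z = 0. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z), z assumed to point up
Frame = List[Joint]                 # one pose: a list of joints


def normalize_frame(frame: Frame, hip: int = 0,
                    feet: Sequence[int] = (3, 6)) -> Frame:
    """Centre the hip at x = y = 0 and put the lowest foot at z = 0.

    The default indices are an assumption (Human3.6M-style ordering).
    """
    hip_x, hip_y, _ = frame[hip]
    ground_z = min(frame[i][2] for i in feet)
    return [(x - hip_x, y - hip_y, z - ground_z) for x, y, z in frame]


def downsample(frames: List[Frame], src_fps: int, dst_fps: int) -> List[Frame]:
    """Keep every (src_fps // dst_fps)-th frame, e.g. 60 fps -> 12 fps keeps 1 in 5."""
    if dst_fps <= 0 or src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a positive multiple of dst_fps")
    return frames[:: src_fps // dst_fps]


def flatten(frames: List[Frame]) -> List[float]:
    """Flatten a clip into one feature vector ordered [frame][joint][axis]."""
    return [value for frame in frames for joint in frame for value in joint]
```

With this layout, one second of 17-joint poses at 12 fps yields a vector of 12 × 17 × 3 = 612 values, against 3060 at 60 fps, which gives a sense of how much downsampling shrinks the input for a small dataset.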

Classification:
- Logistic Regression is used as the classification model. This choice is motivated by the small dataset size and the interpretability of regression coefficients for feature importance analysis.
- The model performs binary classification: predicting whether a jump has an "edge error" (positive class) or a "correct edge" (negative class).
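
To make the classification step concrete, here is a minimal logistic regression trained by batch gradient descent, written with the standard library only. It is a sketch under stated assumptions: the paper does not specify its solver, regularization, or hyperparameters, so the learning rate, epoch count, and the toy two-feature data below are all illustrative. Labels follow the text: 1 = edge error, 0 = correct edge.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights w and bias b by minimizing the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (edge error) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is weak.
X_toy = [[-2.0, 0.1], [-1.0, -0.1], [-0.5, 0.1],
         [0.5, -0.1], [1.0, 0.1], [2.0, -0.1]]
y_toy = [0, 0, 0, 1, 1, 1]
w_toy, b_toy = fit_logistic(X_toy, y_toy)
```

Because the model is linear, each fitted weight maps to exactly one input feature, which is what makes the coefficient-based importance analysis mentioned above possible.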

Experiments and Results:

The study involved 6 university figure skaters performing single Lutz jumps. A total of 232 valid jump samples (100 edge errors, 132 correct edges), judged by an official referee, were collected.

Player-specific cross-validation (leave-one-skater-out) was used to evaluate the generalization performance of the models to unknown skaters. Accuracy and F-measure were the primary metrics.
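
The evaluation protocol can be sketched as below: each skater in turn is held out as the test set, and the model is trained on the remaining skaters. The function names are hypothetical, and the F-measure is assumed here to be the standard F1 score (harmonic mean of precision and recall) with "edge error" as the positive class; the summary does not state the exact definition used.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_skater_out(
    skater_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_skater, train_indices, test_indices) for each skater."""
    for held_out in sorted(set(skater_ids)):
        train = [i for i, s in enumerate(skater_ids) if s != held_out]
        test = [i for i, s in enumerate(skater_ids) if s == held_out]
        yield held_out, train, test


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score with label 1 (edge error) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Splitting by skater rather than by sample keeps every jump from the test skater out of training, so the averaged scores estimate performance on a skater the model has never seen.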

The key findings from the experiments are:

- The model using 3D joint position coordinates estimated from the monocular camera (StridedTransformer-Pose3D), downsampled to 12 fps, achieved the highest average accuracy of 83.56% and an F-measure of 81.01%.
- Features derived from IMUs, including the left skate pose angle, which directly corresponds to the judging rule, showed lower accuracy (PN3 joint positions: ~74%; PN3 left foot angle: ~60%). This suggests that 3D joint positions from monocular vision were more effective in capturing the distinguishing features for edge error judgment in this setup.
- Downsampling the camera-based 3D pose features from 60 fps to 12 fps slightly improved accuracy, possibly because the smaller input reduces overfitting on the limited dataset.
- Player-specific cross-validation revealed substantial variation in accuracy depending on the skater used for testing (ranging from 68.75% to 100%). This highlights differences in how individual skaters execute edge errors or correct jumps.
- Analysis of feature importance in the best-performing camera-based model showed high importance for the left foot's 3D position, suggesting that the relative movement and position of the left foot, even without explicitly using blade angle, are strong indicators captured by the monocular pose estimation.
- Visualization of left foot trajectories showed clearer separation between error and correct jumps with camera-based estimation than with IMUs for some skaters, consistent with the model performance and feature importance results.
- Discrepancies were observed between the actual blade tilt (visible in video) and the left foot angle reported by IMUs for one skater, suggesting potential limitations or sensitivity to setup and calibration issues with the IMUs in a figure skating context.
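
One simple way to read feature importance from a linear model, in the spirit of the analysis above, is to sum the absolute coefficients belonging to each joint. The sketch below is an illustration only: it assumes the feature vector is flattened in [frame][joint][axis] order and that features share a comparable scale, and it is not claimed to be the aggregation the authors used. The function and joint names are hypothetical.

```python
from typing import Dict, Sequence


def importance_per_joint(coefs: Sequence[float],
                         joint_names: Sequence[str]) -> Dict[str, float]:
    """Sum |coefficient| over all frames and axes for each joint.

    Assumes coefs is laid out as [frame][joint][axis] with 3 axes per joint.
    Returns joints ordered from most to least important.
    """
    per_frame = len(joint_names) * 3
    if per_frame == 0 or len(coefs) % per_frame != 0:
        raise ValueError("coefficient count does not match the joint layout")
    totals = [0.0] * len(joint_names)
    for k, c in enumerate(coefs):
        totals[(k % per_frame) // 3] += abs(c)
    ranked = sorted(zip(joint_names, totals), key=lambda kv: kv[1], reverse=True)
    return dict(ranked)


# Hypothetical coefficients for 2 frames x 2 joints x 3 axes.
demo = importance_per_joint(
    [0.1, 0.1, 0.1, 1.0, -2.0, 0.5,
     0.0, 0.2, 0.0, -1.0, 1.0, 0.5],
    ["hip", "left_foot"],
)
```

Taking absolute values matters because a large negative weight is as influential as a large positive one; only the sign of its effect on the "edge error" probability differs.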

Practical Implications and Future Work:

The results indicate that 3D pose estimation from a single smartphone camera can be a practical and sufficient method for automated edge error judgment in figure skating, offering higher convenience and lower equipment costs compared to IMUs or multi-camera systems. This could enable athletes and coaches to use readily available technology for training feedback.

However, the study identified limitations: the dataset is relatively small and collected from a limited number of skaters, leading to variability in cross-validation results. Future work should involve collecting data from more skaters to improve model generalization. While the current best model uses joint positions effectively, explicitly estimating the ice skate's pose angle from monocular video would align more directly with official judging rules and is suggested as a direction for future research. Addressing the observed inconsistencies with IMU data collection is also necessary if that modality is to be reliably used. The authors plan to release the collected dataset to facilitate further research in this area.
