
GaitSTR: Gait Recognition with Sequential Two-stream Refinement

Published 2 Apr 2024 in cs.CV (arXiv:2404.02345v1)

Abstract: Gait recognition aims to identify a person based on their walking sequences, serving as a useful biometric modality as it can be observed from long distances without requiring cooperation from the subject. In representing a person's walking sequence, silhouettes and skeletons are the two primary modalities used. Silhouette sequences lack detailed part information when overlapping occurs between different body segments and are affected by carried objects and clothing. Skeletons, comprising joints and bones connecting the joints, provide more accurate part information for different segments; however, they are sensitive to occlusions and low-quality images, causing inconsistencies in frame-wise results within a sequence. In this paper, we explore the use of a two-stream representation of skeletons for gait recognition, alongside silhouettes. By fusing the combined data of silhouettes and skeletons, we refine the two-stream skeletons, joints, and bones through self-correction in graph convolution, along with cross-modal correction with temporal consistency from silhouettes. We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods without extra annotations.


Summary

  • The paper introduces a novel two-stream framework that refines skeleton data using silhouette integration to significantly boost gait recognition accuracy.
  • It employs intra-modal and inter-modal refinements to correct jittery joint and bone positions, ensuring temporal coherence across frames.
  • Experimental results on CASIA-B and OUMVLP datasets demonstrate notable Rank-1 improvements, highlighting its potential in robust biometric systems.

GaitSTR: Analyzing the Framework for Gait Recognition with Sequential Two-stream Refinement

The paper "GaitSTR: Gait Recognition with Sequential Two-stream Refinement" provides a novel framework for gait recognition, a biometric modality that identifies individuals based on their walking patterns. This framework employs a two-stream representation combining skeletons and silhouettes, significantly improving on previous state-of-the-art methodologies by integrating structural corrections and temporal consistencies. This essay covers the methodology, technical advancements, and implications of the GaitSTR framework for gait recognition.

Methodology Overview

GaitSTR improves gait recognition accuracy through an enhanced skeleton representation integrated with silhouettes. The primary innovation lies in refining the two-stream skeletons, comprising joints and bones, and subsequently fusing them with silhouettes for robust gait recognition (Figure 1).

Figure 1: Visualization of the (a) silhouette and (b) skeleton sequences used for gait recognition. Silhouettes show different contours under different clothing and carried objects, while skeletons suffer from jittery per-frame detections within a video.

The framework's architecture comprises several critical components:

  • Skeleton Correction Network: rectifies inconsistencies and jitter in joint and bone positions (the two-stream skeleton representation is sketched below).
  • Cross-Modal Adapter (CMA): coordinates information flow between skeletons and silhouettes, enabling cross-modal feature integration and refinement.
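
Both skeleton streams come from the same pose estimates: a bone is just the vector between a joint and its parent. Below is a minimal sketch of deriving the bone stream from a joint sequence, assuming a hypothetical COCO-style 17-joint layout; the topology is illustrative, not necessarily the paper's.

```python
import numpy as np

# Hypothetical COCO-style 17-joint layout; each bone is (child, parent).
# The exact topology is an illustrative assumption.
COCO_BONES = [
    (1, 0), (2, 0), (3, 1), (4, 2),         # nose -> eyes -> ears
    (5, 0), (6, 0),                         # nose -> shoulders
    (7, 5), (8, 6), (9, 7), (10, 8),        # shoulders -> elbows -> wrists
    (11, 5), (12, 6),                       # shoulders -> hips
    (13, 11), (14, 12), (15, 13), (16, 14)  # hips -> knees -> ankles
]

def joints_to_bones(joints: np.ndarray) -> np.ndarray:
    """joints: (T, 17, 2) sequence of 2D joint coordinates.
    Returns bone vectors of shape (T, 16, 2): B[i] = J[child] - J[parent]."""
    children = [c for c, _ in COCO_BONES]
    parents = [p for _, p in COCO_BONES]
    return joints[:, children] - joints[:, parents]

seq = np.random.randn(30, 17, 2).astype(np.float32)  # toy 30-frame sequence
print(joints_to_bones(seq).shape)  # (30, 16, 2)
```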

The skeleton correction process applies two refinements:

  1. Intra-Modal Refinement: corrects joint and bone discrepancies using internal multi-layer feature aggregation.
  2. Inter-Modal Refinement: uses silhouette features to further correct skeleton inconsistencies, enforcing temporal consistency across frames (Figure 2).

Figure 2: The proposed GaitSTR architecture. Trapezoids denote trainable modules; modules with the same color and fill pattern within a model share weights.
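
The abstract characterizes this as self-correction in graph convolution plus cross-modal correction from silhouettes. The following is a minimal, hypothetical PyTorch sketch of that residual pattern: a graph-convolution layer over the skeleton graph feeds a head that also sees per-frame silhouette features and predicts an offset for each joint. The layer sizes, the concatenation-based fusion, and the single-layer depth are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One spatial graph convolution over the skeleton graph:
    X' = A_hat X W, with A_hat the symmetrically normalized
    adjacency matrix (self-loops added)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).rsqrt()                       # D^{-1/2}
        self.register_buffer("A_hat", d[:, None] * a * d[None, :])
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                              # x: (B, T, J, C)
        return self.lin(torch.einsum("ij,btjc->btic", self.A_hat, x))

class SkeletonCorrection(nn.Module):
    """Residual refinement J' = J + dJ: the offset dJ is predicted from
    graph-conv features of the skeleton itself (intra-modal) concatenated
    with per-frame silhouette features (inter-modal). Sizes are guesses."""
    def __init__(self, adj, sil_dim=128, hidden=64):
        super().__init__()
        self.gcn = GraphConv(2, hidden, adj)
        self.head = nn.Linear(hidden + sil_dim, 2)

    def forward(self, joints, sil_feat):
        # joints: (B, T, J, 2) 2D coordinates; sil_feat: (B, T, sil_dim)
        h = torch.relu(self.gcn(joints))                          # intra-modal
        s = sil_feat[:, :, None, :].expand(-1, -1, h.size(2), -1)
        delta = self.head(torch.cat([h, s], dim=-1))              # inter-modal
        return joints + delta                                     # residual correction

# Toy usage with a chain-shaped 17-joint skeleton (topology is illustrative).
J = 17
adj = torch.zeros(J, J)
for i in range(J - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
net = SkeletonCorrection(adj)
out = net(torch.randn(2, 30, J, 2), torch.randn(2, 30, 128))
print(out.shape)  # torch.Size([2, 30, 17, 2])
```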

Implementation and Experimental Results

GaitSTR is benchmarked on the public CASIA-B and OUMVLP datasets, where it surpasses prior models in recognizing individuals from their walking patterns. Rank-1 accuracy improves notably, particularly in scenarios affected by occlusions and varying viewpoints.
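
Rank-1 accuracy, the headline metric here, counts a probe sequence as correct when its single nearest gallery embedding belongs to the same identity. A self-contained toy sketch of the metric follows; the embeddings and identity labels are synthetic, not the paper's data.

```python
import numpy as np

def rank1_accuracy(gallery, g_ids, probe, p_ids):
    """Fraction of probes whose nearest gallery embedding (Euclidean
    distance) carries the matching identity label."""
    d = ((probe[:, None, :] - gallery[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(axis=1)              # index of closest gallery entry
    return float((g_ids[nearest] == p_ids).mean())

# Toy data: 4 identities, one gallery and one probe embedding each,
# drawn as small perturbations of a shared per-identity vector.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))
gallery = base + 0.01 * rng.normal(size=base.shape)
probe = base + 0.01 * rng.normal(size=base.shape)
ids = np.arange(4)
print(rank1_accuracy(gallery, ids, probe, ids))  # 1.0 on this easy toy set
```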

  • Datasets:
    • CASIA-B: GaitSTR improves recognition rates by refining skeleton sequences under normal (NM), carrying (BG), and clothing-variation (CL) conditions.
    • OUMVLP: the large-scale evaluation further validates the robustness of GaitSTR, achieving higher average recognition scores than existing methods.

Figure 3: Architecture of the skeleton correction network. F_J and F_B denote the frame-wise joint and bone features encoded from J (joints) and B (bones), respectively.

Trade-offs and Considerations

Skeleton refinement complicates the processing pipeline, but the added cost is offset by higher recognition accuracy and robustness to environmental variation. Limitations include the computational overhead of the two-stream architecture and sensitivity to silhouette-extraction errors, which can degrade skeleton refinement.

Implications and Future Directions

GaitSTR offers a refined method for processing and recognizing human gait patterns, paving the way for more accurate and reliable biometric systems in high-security environments. Future work could integrate the framework with other biometric modalities, such as facial recognition, to further strengthen identity verification (Figure 4).

Figure 4: Visualization of successful and failed refined skeletons with GaitSTR. For each example, from left to right: the original skeletons, the refined skeletons, and their neighboring frames.

GaitSTR's innovative cross-modal approach to gait recognition represents a significant step forward in the application of deep learning techniques to biometrics, offering a robust solution to identity recognition challenges involving diverse environmental factors and occlusions.

Conclusion

The GaitSTR framework marks a significant advance in gait recognition, demonstrating the effectiveness of integrating skeletal and silhouette data. By refining skeleton predictions with silhouettes through a cross-modal refinement strategy, GaitSTR sets a new standard for gait recognition systems and makes a notable contribution to biometric identity recognition.
