
GaitSTR: Gait Recognition with Sequential Two-stream Refinement

Published 2 Apr 2024 in cs.CV (arXiv:2404.02345v1)

Abstract: Gait recognition aims to identify a person based on their walking sequences, serving as a useful biometric modality as it can be observed from long distances without requiring cooperation from the subject. In representing a person's walking sequence, silhouettes and skeletons are the two primary modalities used. Silhouette sequences lack detailed part information when overlapping occurs between different body segments and are affected by carried objects and clothing. Skeletons, comprising joints and bones connecting the joints, provide more accurate part information for different segments; however, they are sensitive to occlusions and low-quality images, causing inconsistencies in frame-wise results within a sequence. In this paper, we explore the use of a two-stream representation of skeletons for gait recognition, alongside silhouettes. By fusing the combined data of silhouettes and skeletons, we refine the two-stream skeletons, joints, and bones through self-correction in graph convolution, along with cross-modal correction with temporal consistency from silhouettes. We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods without extra annotations.


Summary

  • The paper introduces a novel two-stream framework that refines skeleton data using silhouette integration to significantly boost gait recognition accuracy.
  • It employs intra-modal and inter-modal refinements to correct jittery joint and bone positions, ensuring temporal coherence across frames.
  • Experimental results on CASIA-B and OUMVLP datasets demonstrate notable Rank-1 improvements, highlighting its potential in robust biometric systems.

GaitSTR: Analyzing the Framework for Gait Recognition with Sequential Two-stream Refinement

The paper "GaitSTR: Gait Recognition with Sequential Two-stream Refinement" provides a novel framework for gait recognition, a biometric modality that identifies individuals based on their walking patterns. This framework employs a two-stream representation combining skeletons and silhouettes, significantly improving on previous state-of-the-art methodologies by integrating structural corrections and temporal consistencies. This essay covers the methodology, technical advancements, and implications of the GaitSTR framework for gait recognition.

Methodology Overview

GaitSTR improves gait recognition accuracy through an enhanced skeleton representation integrated with silhouettes. The primary innovation lies in refining the two-stream skeletons, comprising joints and bones, and subsequently fusing them with silhouettes for robust gait recognition (Figure 1).

Figure 1: Visualization of the (a) silhouette and (b) skeleton sequences used for gait recognition. Silhouettes show different contours under different clothing and carried objects, while skeletons suffer from jittery per-frame detections within a video.

The framework's architecture comprises several critical components:

  • Skeleton Correction Network: rectifies inconsistencies and jitter in joint and bone positions (the two-stream skeleton representation is sketched below).
  • Cross-Modal Adapter (CMA): coordinates information flow between skeletons and silhouettes, enabling cross-modal feature integration and refinement.
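
Both skeleton streams come from the same pose estimates: a bone is just the vector between a joint and its parent. Below is a minimal sketch of deriving the bone stream from a joint sequence, assuming a hypothetical COCO-style 17-joint layout; the topology is illustrative, not necessarily the paper's.

```python
import numpy as np

# Hypothetical COCO-style 17-joint layout; each bone is (child, parent).
# The exact topology is an illustrative assumption.
COCO_BONES = [
    (1, 0), (2, 0), (3, 1), (4, 2),         # nose -> eyes -> ears
    (5, 0), (6, 0),                         # nose -> shoulders
    (7, 5), (8, 6), (9, 7), (10, 8),        # shoulders -> elbows -> wrists
    (11, 5), (12, 6),                       # shoulders -> hips
    (13, 11), (14, 12), (15, 13), (16, 14)  # hips -> knees -> ankles
]

def joints_to_bones(joints: np.ndarray) -> np.ndarray:
    """joints: (T, 17, 2) sequence of 2D joint coordinates.
    Returns bone vectors of shape (T, 16, 2): B[i] = J[child] - J[parent]."""
    children = [c for c, _ in COCO_BONES]
    parents = [p for _, p in COCO_BONES]
    return joints[:, children] - joints[:, parents]

seq = np.random.randn(30, 17, 2).astype(np.float32)  # toy 30-frame sequence
print(joints_to_bones(seq).shape)  # (30, 16, 2)
```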

The skeleton correction process applies two refinements:

  1. Intra-Modal Refinement: corrects joint and bone discrepancies using internal multi-layer feature aggregation.
  2. Inter-Modal Refinement: uses silhouette features to further correct skeleton inconsistencies, enforcing temporal consistency across frames (Figure 2).

Figure 2: The proposed GaitSTR architecture. Trapezoids denote trainable modules; modules with the same color and fill pattern within a model share weights.
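
The abstract characterizes this as self-correction in graph convolution plus cross-modal correction from silhouettes. The following is a minimal, hypothetical PyTorch sketch of that residual pattern: a graph-convolution layer over the skeleton graph feeds a head that also sees per-frame silhouette features and predicts an offset for each joint. The layer sizes, the concatenation-based fusion, and the single-layer depth are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One spatial graph convolution over the skeleton graph:
    X' = A_hat X W, with A_hat the symmetrically normalized
    adjacency matrix (self-loops added)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).rsqrt()                       # D^{-1/2}
        self.register_buffer("A_hat", d[:, None] * a * d[None, :])
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                              # x: (B, T, J, C)
        return self.lin(torch.einsum("ij,btjc->btic", self.A_hat, x))

class SkeletonCorrection(nn.Module):
    """Residual refinement J' = J + dJ: the offset dJ is predicted from
    graph-conv features of the skeleton itself (intra-modal) concatenated
    with per-frame silhouette features (inter-modal). Sizes are guesses."""
    def __init__(self, adj, sil_dim=128, hidden=64):
        super().__init__()
        self.gcn = GraphConv(2, hidden, adj)
        self.head = nn.Linear(hidden + sil_dim, 2)

    def forward(self, joints, sil_feat):
        # joints: (B, T, J, 2) 2D coordinates; sil_feat: (B, T, sil_dim)
        h = torch.relu(self.gcn(joints))                          # intra-modal
        s = sil_feat[:, :, None, :].expand(-1, -1, h.size(2), -1)
        delta = self.head(torch.cat([h, s], dim=-1))              # inter-modal
        return joints + delta                                     # residual correction

# Toy usage with a chain-shaped 17-joint skeleton (topology is illustrative).
J = 17
adj = torch.zeros(J, J)
for i in range(J - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
net = SkeletonCorrection(adj)
out = net(torch.randn(2, 30, J, 2), torch.randn(2, 30, 128))
print(out.shape)  # torch.Size([2, 30, 17, 2])
```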

Implementation and Experimental Results

GaitSTR is benchmarked on the public CASIA-B and OUMVLP datasets, where it surpasses prior models in recognizing individuals from their walking patterns. Rank-1 accuracy improves notably, particularly in scenarios affected by occlusions and varying viewpoints.
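
Rank-1 accuracy, the headline metric here, counts a probe sequence as correct when its single nearest gallery embedding belongs to the same identity. A self-contained toy sketch of the metric follows; the embeddings and identity labels are synthetic, not the paper's data.

```python
import numpy as np

def rank1_accuracy(gallery, g_ids, probe, p_ids):
    """Fraction of probes whose nearest gallery embedding (Euclidean
    distance) carries the matching identity label."""
    d = ((probe[:, None, :] - gallery[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(axis=1)              # index of closest gallery entry
    return float((g_ids[nearest] == p_ids).mean())

# Toy data: 4 identities, one gallery and one probe embedding each,
# drawn as small perturbations of a shared per-identity vector.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))
gallery = base + 0.01 * rng.normal(size=base.shape)
probe = base + 0.01 * rng.normal(size=base.shape)
ids = np.arange(4)
print(rank1_accuracy(gallery, ids, probe, ids))  # 1.0 on this easy toy set
```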

  • Datasets:
    • CASIA-B: GaitSTR improves recognition rates by refining skeleton sequences under normal (NM), carrying (BG), and clothing-variation (CL) conditions.
    • OUMVLP: the large-scale evaluation further validates the robustness of GaitSTR, achieving higher average recognition scores than existing methods.

Figure 3: Architecture of the skeleton correction network. F_J and F_B denote the frame-wise joint and bone features encoded from J (joints) and B (bones), respectively.

Trade-offs and Considerations

Skeleton refinement complicates the processing pipeline, but the added cost is offset by higher recognition accuracy and robustness to environmental variation. Limitations include the computational overhead of the two-stream architecture and sensitivity to silhouette-extraction errors, which can degrade skeleton refinement.

Implications and Future Directions

GaitSTR offers a refined method for processing and recognizing human gait patterns, paving the way for more accurate and reliable biometric systems in high-security environments. Future work could integrate the framework with other biometric modalities, such as facial recognition, to further strengthen identity verification (Figure 4).

Figure 4: Visualization of successful and failed refined skeletons with GaitSTR. For each example, from left to right: the original skeletons, the refined skeletons, and their neighboring frames.

GaitSTR's innovative cross-modal approach to gait recognition represents a significant step forward in the application of deep learning techniques to biometrics, offering a robust solution to identity recognition challenges involving diverse environmental factors and occlusions.

Conclusion

The GaitSTR framework marks a significant advance in gait recognition, demonstrating the effectiveness of integrating skeletal and silhouette data. By refining skeleton predictions with silhouettes through a cross-modal refinement strategy, GaitSTR sets a new standard for gait recognition systems and makes a notable contribution to biometric identity recognition.
