HOKEM: Human and Object Keypoint-based Extension Module for Human-Object Interaction Detection

Published 25 Jun 2023 in cs.CV and cs.LG (arXiv:2306.14260v1)

Abstract: Human-object interaction (HOI) detection, which captures relationships between humans and objects, is an important task in the semantic understanding of images. When human and object keypoints extracted from an image are processed with a graph convolutional network (GCN) to detect HOI, it is crucial to extract appropriate object keypoints regardless of the object type and to design a GCN that accurately captures the spatial relationships between keypoints. This paper presents the human and object keypoint-based extension module (HOKEM), an easy-to-use extension module that improves the accuracy of conventional detection models. The proposed object keypoint extraction method is simple yet accurately represents the shapes of various objects. Moreover, the proposed human-object adaptive GCN (HO-AGCN), which introduces adaptive graph optimization and an attention mechanism, accurately captures the spatial relationships between keypoints. Experiments on the HOI dataset V-COCO showed that HOKEM boosts the accuracy of an appearance-based model by a large margin.
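The abstract does not spell out the HO-AGCN architecture, but its core idea, a fixed keypoint-adjacency graph augmented with a learned adaptive offset and gated by an attention term, can be sketched in a few lines. Everything below is an illustrative assumption: the function name, the node counts, the row-normalization, and the simple sigmoid channel gate are stand-ins, not the paper's actual design.

```python
import numpy as np

def adaptive_gcn_layer(X, A, B, W):
    """One sketch of an adaptive graph-convolution step over keypoints.

    X: (N, C) node features for N keypoints
    A: (N, N) fixed adjacency from the human/object keypoint graph
    B: (N, N) learnable offset that adapts the graph topology
    W: (C, F) learnable feature transform
    """
    A_hat = A + B                                  # adaptive topology: fixed graph + learned edges
    deg = A_hat.sum(axis=1, keepdims=True) + 1e-6  # row degrees (avoid divide-by-zero)
    A_norm = A_hat / deg                           # row-normalize the adjacency
    H = A_norm @ X @ W                             # aggregate neighbors, then transform features
    # toy channel-attention gate: sigmoid of mean-pooled features
    gate = 1.0 / (1.0 + np.exp(-H.mean(axis=0, keepdims=True)))
    return np.maximum(H * gate, 0.0)               # ReLU

# Illustrative usage: 17 human + 9 object keypoints (counts are assumptions)
rng = np.random.default_rng(0)
N, C, F = 17 + 9, 3, 16
X = rng.standard_normal((N, C))
A = np.eye(N)                                      # self-loops only; real edges would be added
B = 0.01 * rng.standard_normal((N, N))             # small learned perturbation
W = rng.standard_normal((C, F))
out = adaptive_gcn_layer(X, A, B, W)
print(out.shape)
```

In a trained model, `B` would be a parameter updated by backpropagation, which is what lets the graph topology adapt beyond the hand-designed skeleton edges.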

