
RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow

Published 28 Sep 2022 in cs.CV, cs.LG, and cs.RO | (2209.14408v3)

Abstract: When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness, especially in scenarios where the traditional geometric descriptions and heuristics used in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its poor adaptability to noisy, unclipped, raw RGB data has limited its application in other fields. To push for the advancement and adoption of action recognition in AVs, this work proposes a novel two-stage action recognition system, termed RALACs. RALACs formulates the problem of action recognition for road scenes and bridges the gap between it and the established field of human action recognition. This work shows how attention layers can encode the relations across agents, and stresses that such a scheme can be class-agnostic. Furthermore, to address the dynamic nature of agents on the road, RALACs introduces a novel approach to adapting Region of Interest (ROI) Alignment to agent tracks for downstream action classification. Finally, our scheme also considers the problem of active agent detection, fusing optical flow maps to discern relevant agents in a road scene. We show that the proposed scheme outperforms the baseline on the ICCV 2021 ROAD Challenge dataset, and by deploying it on a real vehicle platform we provide preliminary insight into the usefulness of action recognition in decision making.
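The abstract's interaction-encoding idea, attention over per-agent features with no class labels involved, can be illustrated with a minimal sketch. This is an untrained, single-head, projection-free stand-in for the paper's learned attention layers; the function name and shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def encode_agent_interactions(agent_feats: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over per-agent feature vectors.

    agent_feats: (N, D) array, one D-dim feature per tracked agent.
    Returns (N, D) features mixed with cross-agent context. The scheme is
    class-agnostic: no agent class labels enter the computation.
    """
    n, d = agent_feats.shape
    scores = agent_feats @ agent_feats.T / np.sqrt(d)   # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over agents
    return weights @ agent_feats                        # context-enriched features

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))           # 4 agents, 8-dim features each
ctx = encode_agent_interactions(feats)
print(ctx.shape)                          # (4, 8)
```

Because the attention weights are computed purely from feature similarity, the same module handles pedestrians, cyclists, and vehicles alike, which is what makes the scheme class-agnostic.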

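The active-agent detection step, using optical flow to discern which agents are moving, can be sketched as a simple flow-magnitude gate per agent box. The threshold and box format here are illustrative assumptions standing in for the paper's learned fusion of flow maps.

```python
import numpy as np

def active_agents(flow: np.ndarray, boxes, thresh: float = 1.0):
    """Flag agents whose mean optical-flow magnitude exceeds a threshold.

    flow:  (H, W, 2) dense flow field (dx, dy per pixel), e.g. from RAFT.
    boxes: per-agent (x1, y1, x2, y2) pixel boxes from a tracker.
    The fixed threshold is an illustrative stand-in for a learned fusion.
    """
    mag = np.linalg.norm(flow, axis=-1)              # (H, W) per-pixel flow speed
    flags = []
    for x1, y1, x2, y2 in boxes:
        flags.append(float(mag[y1:y2, x1:x2].mean()) > thresh)
    return flags

flow = np.zeros((100, 100, 2))
flow[20:40, 20:40] = 3.0                 # one region of coherent motion
print(active_agents(flow, [(20, 20, 40, 40), (60, 60, 80, 80)]))
# [True, False]
```

In the paper this cue is fused into the detection pipeline rather than thresholded directly, but the sketch captures why flow helps: a parked car and a car pulling out look identical in a single RGB frame yet differ sharply in flow magnitude.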
