REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation
Abstract: Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we propose a novel architecture, inference method, and relation prediction model. Our proposed solution, the REACT model, achieves the highest inference speed among existing SGG models, improving object detection accuracy without sacrificing relation prediction performance. Compared to state-of-the-art approaches, REACT is 2.7 times faster (with a latency of 23 ms) and improves object detection accuracy by 58.51%. Furthermore, our proposal significantly reduces model size, with an average of 5.5x fewer parameters. Code is available at https://github.com/Maelic/SGG-Benchmark