Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval

Published 22 Jul 2024 in cs.MM | (2407.15376v1)

Abstract: Existing 3D cross-modal retrieval methods rely heavily on category distribution priors in the training set, which reduces their effectiveness on unseen categories in open-set environments. To tackle this problem, we propose the Structure-Aware Residual-Center Representation (SRCR) framework for self-supervised open-set 3D cross-modal retrieval. To address the center deviation caused by differences in category distribution, we learn a Residual-Center Embedding (RCE) for each object with nested auto-encoders, rather than mapping objects directly to modality or category centers. In addition, we introduce Hierarchical Structure Learning (HSL), which exploits high-order correlations among objects to improve generalization by constructing a heterogeneous hypergraph from hierarchical inter-modality, intra-object, and implicit-category correlations. Extensive experiments and ablation studies on four benchmarks show that the proposed framework outperforms state-of-the-art methods.
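
The abstract mentions capturing high-order correlations among objects with a hypergraph. As a rough illustration of that general idea only, the sketch below builds one hyperedge per object from its nearest neighbours and runs a single mean-aggregation step (vertex to hyperedge to vertex). The k-nearest-neighbour construction, the function names `knn_hyperedges` and `hypergraph_smooth`, and the toy features are all assumptions made for this example; the abstract does not specify how SRCR builds its heterogeneous hypergraph, so this should not be read as the authors' method.

```python
import math

def knn_hyperedges(features, k):
    """Assumed construction: one hyperedge per object, containing the
    object itself plus its k nearest neighbours (Euclidean distance)."""
    edges = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        edges.append({i} | {j for _, j in dists[:k]})
    return edges

def hypergraph_smooth(features, edges):
    """One propagation step: average vertex features into each hyperedge,
    then average the incident hyperedge features back into each vertex."""
    dim = len(features[0])
    # Hyperedge feature = mean of its member vertices.
    edge_feats = [
        [sum(features[v][d] for v in e) / len(e) for d in range(dim)]
        for e in edges
    ]
    out = []
    for v in range(len(features)):
        # Every vertex belongs to at least its own hyperedge, so this is non-empty.
        incident = [idx for idx, e in enumerate(edges) if v in e]
        out.append(
            [sum(edge_feats[idx][d] for idx in incident) / len(incident)
             for d in range(dim)]
        )
    return out

# Toy 2-D "embeddings" forming two well-separated groups (hypothetical data).
features = [[0, 0], [0, 1], [5, 5], [5, 6]]
edges = knn_hyperedges(features, k=1)
smoothed = hypergraph_smooth(features, edges)
print(edges)
print(smoothed)
```

In this toy case each object is grouped with its closest neighbour, and the propagation step pulls the two members of each group toward their shared mean. That is the sense in which group-level structure, rather than only pairwise similarity, can inform each object's representation.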

Authors (3)
