SCANNER: Knowledge-Enhanced Approach for Robust Multi-modal Named Entity Recognition of Unseen Entities

Published 2 Apr 2024 in cs.CL and cs.AI | (2404.01914v1)

Abstract: Recent advances in named entity recognition (NER) have extended the task to incorporate visual signals, leading to variants such as multi-modal NER (MNER) and grounded MNER (GMNER). A key challenge in these tasks is that a model must generalize to entities unseen during training and must handle training samples with noisy annotations. To address these challenges, we propose SCANNER (Span CANdidate detection and recognition for NER), a model capable of effectively handling all three NER variants. SCANNER has a two-stage structure: we extract entity candidates in the first stage and use them as queries to retrieve knowledge, effectively pulling knowledge from various sources. Utilizing this entity-centric knowledge boosts performance on unseen entities. Furthermore, to tackle the challenges arising from noisy annotations in NER datasets, we introduce a novel self-distillation method that enhances the robustness and accuracy of our model when training data carry inherent uncertainties. Our approach demonstrates competitive performance on the NER benchmark and surpasses existing methods on both MNER and GMNER benchmarks. Further analysis shows that the proposed distillation and knowledge utilization methods improve the performance of our model on various benchmarks.
