
UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification

Published 4 May 2024 in cs.LG and cs.AI (arXiv:2405.03714v2)

Abstract: Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally made use of a dual encoder (DE) to embed the queries and label texts, and one-vs-all (OvA) classifiers to rerank the labels shortlisted by the DE. While such methods have shown empirical success, a major drawback is their computational cost, often requiring up to 16 GPUs to train on the largest public dataset. This high cost is a consequence of calculating the loss over the entire label space. While shortlisting strategies have been proposed for classifiers, we aim to study such methods for the DE framework. In this work, we develop UniDEC, a loss-independent, end-to-end trainable framework which trains the DE and classifier together in a unified manner with a multi-class loss, while reducing the computational cost by 4-16x. This is done via the proposed pick-some-label (PSL) reduction, which computes the loss on only a subset of positive and negative labels. These labels are carefully chosen in-batch so as to maximise their supervisory signal. Not only does the proposed framework achieve state-of-the-art results on datasets with labels in the order of millions, it is also computationally and resource efficient, achieving this performance on a single GPU. Code is made available at https://github.com/the-catalyst/UniDEC.
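The core idea of the pick-some-label reduction, as the abstract describes it, is to evaluate a multi-class (softmax) loss over only a small picked subset of positive and negative labels rather than the full label space. The sketch below illustrates that idea in minimal NumPy; the function name, argument layout, and sampling of picked labels are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def psl_loss(query_emb, label_emb, picked_pos, picked_neg):
    """Illustrative pick-some-label softmax loss for one query.

    query_emb:  (d,) embedding of the query.
    label_emb:  (L, d) embeddings of the in-batch labels.
    picked_pos: indices into label_emb chosen as positives.
    picked_neg: indices into label_emb chosen as negatives.
    The softmax is computed over the picked subset only, so the
    cost scales with len(picked_pos) + len(picked_neg), not L.
    """
    idx = np.concatenate([picked_pos, picked_neg]).astype(int)
    logits = label_emb[idx] @ query_emb        # similarity scores
    logits = logits - logits.max()             # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # multi-class loss averaged over the picked positives
    return -np.log(probs[: len(picked_pos)]).mean()
```

For a query embedded close to its positive labels and far from the picked negatives, this loss is small; it grows as negatives score higher than positives, which is the supervisory signal the in-batch picking strategy aims to maximise.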
