
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages

Published 27 Mar 2024 in cs.CL and cs.LG (arXiv:2403.18336v1)

Abstract: User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual LLMs for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.
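The abstract describes annotations as typed entity spans connected by typed relations, with baselines for extracting both. The sketch below shows one possible way to represent such annotations and score predictions against them. The type names (`DRUG`, `DISORDER`, `CAUSED`), the character-offset span encoding, and the exact-match micro-F1 metric are illustrative assumptions. They are not the corpus's actual schema or the paper's evaluation protocol.

```python
from dataclasses import dataclass

# Illustrative only: the labels below are hypothetical placeholders,
# not the 12 entity types / 13 relation types defined by the corpus.

@dataclass(frozen=True)
class Entity:
    label: str   # hypothetical entity type, e.g. "DRUG"
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive

@dataclass(frozen=True)
class Relation:
    label: str   # hypothetical relation type, e.g. "CAUSED"
    head: Entity
    tail: Entity

def prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Micro precision, recall and F1 with exact matching (assumed metric)."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# German example: "After ibuprofen I had a severe headache."
text = "Nach Ibuprofen hatte ich starke Kopfschmerzen."
drug = Entity("DRUG", 5, 14)        # "Ibuprofen"
adr = Entity("DISORDER", 32, 45)    # "Kopfschmerzen"
gold_entities = {drug, adr}
gold_relations = {Relation("CAUSED", drug, adr)}

# A predicted ADR span that also covers "starke" counts as a miss under exact matching.
wide_adr = Entity("DISORDER", 25, 45)  # "starke Kopfschmerzen"
pred_entities = {drug, wide_adr}
pred_relations = {Relation("CAUSED", drug, wide_adr)}

entity_scores = prf(gold_entities, pred_entities)
relation_scores = prf(gold_relations, pred_relations)
print(entity_scores, relation_scores)
```

Because a relation is scored through its argument spans, one boundary disagreement (here the modifier "starke", "severe") costs half the entity score and the entire relation score. Overlap-based matching is a common, more lenient alternative when span boundaries are contentious.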
