
MaiBaam Annotation Guidelines

Published 9 Mar 2024 in cs.CL (arXiv:2403.05902v2)

Abstract: This document provides the annotation guidelines for MaiBaam, a Bavarian corpus manually annotated with part-of-speech (POS) tags, syntactic dependencies, and German lemmas. MaiBaam is part of the Universal Dependencies (UD) project, and our annotations build on the general and German UD version 2 guidelines. In this document, we detail how to preprocess and tokenize Bavarian data, give an overview of the POS tags and dependency relations we use, explain annotation decisions that would also apply to closely related languages such as German, and, lastly, introduce and motivate decisions that are specific to Bavarian grammar.

