
German's Next Language Model

Published 21 Oct 2020 in cs.CL and cs.LG | (2010.10906v4)

Abstract: In this work we present the experiments which led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation-driven approach in training these models, and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. Our trained models will be made publicly available to the research community.

Citations (252)

Summary

  • The paper outlines detailed formatting standards and submission instructions for COLING-2020, ensuring consistency and clarity in manuscript preparation.
  • The paper advocates for using LaTeX with precise specifications on fonts, margins, and layout to generate compliant and professional PDF submissions.
  • The paper emphasizes ethical practices with a double-blind review process and mandatory CC-BY licensing, promoting open-access research dissemination.

Overview of COLING-2020 Proceedings Instructions

This paper serves as a comprehensive guide for authors submitting their work to the COLING-2020 conference, detailing specifications that ensure uniformity and adherence to the expected format. The document not only exemplifies its own guidelines but also provides historical context by tracing its evolution through previous COLING and ACL proceedings. The guidelines cover a wide array of aspects related to manuscript preparation, including formatting standards, submission requirements, and specifics for the camera-ready versions of accepted papers.

Formatting and Manuscript Preparation

The authors emphasize uniform manuscript formatting to keep the conference proceedings consistent. The document calls for a single-column format on A4 paper, with strict margin and font guidelines. Notably, Times Roman or Times New Roman is recommended for a uniform appearance across submissions.

The paper includes a detailed section on electronic manuscript preparation, strongly favoring the use of LaTeX over Microsoft Word due to its efficiency in creating compliant PDF files. Adherence to the COLING-2020 style file is emphasized to minimize discrepancies.

Submission and Review Process

The document also addresses the submission process, requiring authors to present their work anonymously so that review can be double-blind. The instructions explain how to handle citations, self-references, and author information in both the submission and the final camera-ready paper.

Numerical Results and Specifications

Among the numerical specifications, the paper sets font sizes for the various parts of the manuscript. For instance, it specifies a 15 pt bold font for paper titles, with smaller sizes for other text elements, such as 11 pt for the main document text and 10 pt for the bibliography.

Licensing and Ethical Considerations

The document mandates that final papers be licensed under the Creative Commons Attribution 4.0 International Licence (CC-BY). This requirement underscores the conference's commitment to open access, allowing for adaptation and redistribution of research while ensuring proper author attribution.

Implications and Future Developments

The paper's meticulous approach to formatting and submission guidelines reflects an ongoing effort within the academic community to streamline the dissemination process, facilitating a barrier-free exchange of ideas and research findings. As academic conferences continue to expand their reach and audience, the emphasis on standardized guidelines will remain fundamental in managing the ever-increasing volume of submissions.

Future conferences may consider further enhancements to the submission process, possibly incorporating automated tools for format verification and increasing support for diverse manuscript preparation platforms. Additionally, embracing more inclusive policies for non-English terms could broaden the accessibility and comprehension of research for a global audience.

Conclusion

In summary, the instructions for the COLING-2020 proceedings provide a detailed and structured approach to manuscript preparation, ensuring consistency and professionalism across all submissions. By adhering to these guidelines, authors contribute to a cohesive and accessible body of work that supports the advancement of computational linguistics research.
