On the Scaling Laws of Geographical Representation in Language Models
Published 29 Feb 2024 in cs.CL and cs.AI | (2402.19406v2)
Abstract: Language models have long been shown to embed geographical information in their hidden representations. This line of work has recently been revisited by extending this result to Large Language Models (LLMs). In this paper, we propose to fill the gap between well-established and recent literature by observing how geographical knowledge evolves when scaling language models. We show that geographical knowledge is observable even for tiny models, and that it scales consistently as we increase the model size. Notably, we observe that larger language models cannot mitigate the geographical bias that is inherent to the training data.
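To make the probing setup concrete, below is a minimal sketch of the kind of linear probe commonly used in this line of work: a ridge regression trained to predict geographic coordinates from a model's hidden representations of place names. The model choice (Pythia-70M), the toy place list, and the mean-pooling strategy are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a linear probe for geographical knowledge, assuming a
# HuggingFace model and scikit-learn. Place names, coordinates, and the
# model are illustrative, not the paper's exact experimental setup.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Toy probing data: entity name -> (latitude, longitude).
places = ["Paris", "Tokyo", "Nairobi", "Lima", "Sydney", "Oslo", "Cairo", "Toronto"]
coords = np.array([
    [48.86, 2.35], [35.68, 139.69], [-1.29, 36.82], [-12.05, -77.04],
    [-33.87, 151.21], [59.91, 10.75], [30.04, 31.24], [43.65, -79.38],
])

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModel.from_pretrained("EleutherAI/pythia-70m")

def embed(name: str) -> np.ndarray:
    """Mean-pool the last hidden state over the place name's tokens."""
    inputs = tokenizer(name, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

X = np.stack([embed(p) for p in places])
X_tr, X_te, y_tr, y_te = train_test_split(X, coords, test_size=0.25, random_state=0)

# A ridge regression maps hidden representations to coordinates; performance
# on held-out places measures how much geography is linearly decodable.
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("Held-out R^2:", probe.score(X_te, y_te))
```

Higher held-out R² indicates that more geographical structure is linearly decodable from the representations; repeating this probe across checkpoints of increasing size traces the scaling trend the paper studies.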