Exploring the LLM Journey from Cognition to Expression with Linear Representations

Published 27 May 2024 in cs.CL and cs.AI (arXiv:2405.16964v2)

Abstract: This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in LLMs, focusing on Baichuan-7B and Baichuan-33B, two models from an advanced bilingual (Chinese and English) LLM series. We define and explore the models' cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, analogous to neural signal processing in human cognition. Expressive capability is defined as a model's ability to produce word-level output. Our findings reveal a sequential development pattern: cognitive abilities are largely established during Pretraining, whereas expressive abilities advance predominantly during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the architectural design of LLMs. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, that bridge the gap between cognitive and expressive capabilities. This research reveals a potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of LLM training processes.
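To make the abstract's core measurement concrete, here is a minimal, self-contained sketch of the general linear-probing idea it alludes to: a linear classifier trained on hidden-state vectors estimates how much task information the hidden space carries (cognition), which can then be compared against the accuracy of the model's word-level answers (expression). All names, the synthetic hidden states, and the simulated output accuracy below are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of a linear probe in the spirit of the paper's "cognitive capability"
# measurement, using synthetic stand-ins for real model hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for hidden states of shape (n_examples, d_model) collected at one
# layer, and the gold label for each example (e.g. the correct answer choice).
n, d = 1000, 64
labels = rng.integers(0, 4, size=n)
directions = rng.normal(size=(4, d))      # one latent direction per class
hidden = directions[labels] + 2.0 * rng.normal(size=(n, d))

X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)

# Cognitive-capability proxy: accuracy of a linear probe on the hidden space.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe (cognition) accuracy:", probe.score(X_te, y_te))

# Expressive-capability proxy: accuracy of the model's own word-level answers.
# Here we fake a noisier output channel to mimic a pretrained-only model whose
# expression lags behind its internal representations (a hypothetical setup).
model_answers = np.where(rng.random(len(y_te)) < 0.6,
                         y_te, rng.integers(0, 4, size=len(y_te)))
print("output (expression) accuracy:", (model_answers == y_te).mean())
```

Under the paper's thesis, a pretrained-only checkpoint should show probe accuracy well above output accuracy, with SFT and RLHF narrowing that gap as expressive capability catches up to cognitive capability.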
