Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics

Published 28 Oct 2024 in cs.CL and cs.AI (arXiv:2410.21353v1)

Abstract: While interpretability research has shed light on some internal algorithms used by transformer-based LLMs, reasoning in natural language, with its deep contextuality and ambiguity, defies easy categorization. As a result, it is challenging to formulate the clear, well-motivated questions for circuit analysis that depend on the well-defined in-domain and out-of-domain examples causal interventions require. Although significant work has investigated circuits for specific tasks, such as indirect object identification (IOI), deciphering natural language reasoning through circuits remains difficult due to its inherent complexity. In this work, we take initial steps toward characterizing causal reasoning in LLMs by analyzing clear-cut cause-and-effect sentences like "I opened an umbrella because it started raining," where causal interventions become possible through carefully crafted scenarios in GPT-2 small. Our findings indicate that causal syntax is localized within the first two to three layers, while certain heads in later layers exhibit heightened sensitivity to nonsensical variations of causal sentences. This suggests that models may infer reasoning by (1) detecting syntactic cues and (2) isolating distinct heads in the final layers that focus on semantic relationships.
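The causal-intervention workflow the abstract alludes to is, in spirit, activation patching: run the model on a clean (sensible) causal sentence, cache the per-layer activations, re-run on a corrupted (nonsensical) variant while splicing in one cached activation at a time, and measure how much of the clean behavior each patch recovers. The sketch below illustrates only that loop with a stub additive "model" standing in for GPT-2 small; all names and numbers are illustrative, not the paper's code, and a real experiment would use GPT-2 small via an interpretability library.

```python
# Activation-patching sketch. A stub additive "model" stands in for
# GPT-2 small: each "layer" reads the input and writes into a running
# residual value. All weights and inputs here are illustrative.

def run_with_cache(x, weights):
    """Forward pass; return final output and the residual after each layer."""
    h, acts = 0.0, []
    for w in weights:
        h = h + w * x        # toy layer: read input, write to residual
        acts.append(h)
    return h, acts

def run_patched(x, weights, layer, patched_act):
    """Forward pass, but overwrite the residual after `layer` with a cached value."""
    h = 0.0
    for i, w in enumerate(weights):
        h = h + w * x
        if i == layer:
            h = patched_act  # splice in the cached clean activation
    return h

weights = [1.0, 10.0, 0.1]        # layer 1 contributes most to the output
clean_x, corrupt_x = 5.0, 0.0     # "sensible" vs "nonsensical" input

clean_out, clean_acts = run_with_cache(clean_x, weights)
corrupt_out, _ = run_with_cache(corrupt_x, weights)

# Fraction of the clean/corrupt output gap each single-layer patch recovers.
recovery = []
for layer in range(len(weights)):
    patched = run_patched(corrupt_x, weights, layer, clean_acts[layer])
    recovery.append((patched - corrupt_out) / (clean_out - corrupt_out))
    print(f"patch layer {layer}: recovered {recovery[-1]:.2f}")
```

In the toy run, patching layer 1 recovers nearly all of the clean output, mirroring how the paper localizes causal syntax to particular layers: the layers (or heads) whose patched activations restore most of the clean behavior are the ones carrying the relevant information.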

