
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Published 8 Oct 2024 in cs.LG, cs.AI, and cs.CL (arXiv:2410.05603v1)

Abstract: LLMs have demonstrated remarkable in-context learning (ICL) capabilities. In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term "task superposition". We provide empirical evidence of this phenomenon across various LLM families and scales and show that this phenomenon emerges even if we train the model to in-context learn one task at a time. We offer theoretical explanations that this capability is well within the expressive power of transformers. We also explore how LLMs internally compose task vectors during superposition. Furthermore, we show that larger models can solve more ICL tasks in parallel, and better calibrate their output distribution. Our findings offer insights into the latent capabilities of LLMs, further substantiate the perspective of "LLMs as superposition of simulators", and raise questions about the mechanisms enabling simultaneous task execution.

Summary

  • The paper empirically demonstrates that LLMs perform multiple distinct tasks concurrently through in-context learning (task superposition).
  • It reveals that even models initially trained for single tasks, such as a small GPT-2 on retrieval, can exhibit task superposition during inference.
  • The study provides theoretical insights via task vector analysis and scaling, confirming Transformers’ inherent capacity for multitasking.

Task Superposition in LLMs

The paper, "Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition," explores the phenomenon of task superposition in LLMs: the ability to perform multiple distinct tasks simultaneously during a single inference call. The research offers empirical evidence across various LLM families and scales, and provides theoretical explanations suggesting that this capability is well within the expressive power of Transformers.

Overview of Findings

The authors present several key findings:

  1. Empirical Evidence of Task Superposition: The paper presents comprehensive empirical evidence indicating that LLMs such as GPT-3.5 and Llama-3 can execute multiple In-Context Learning (ICL) tasks concurrently. When prompted with examples from multiple tasks, the models generate solutions corresponding to all tasks present. This phenomenon is prevalent across different tasks and LLM families.
  2. Training Insights: Task superposition can emerge even when models are trained to in-context learn only a single task at a time. This is demonstrated by training a small GPT-2 model from scratch on individual retrieval tasks; the model nevertheless exhibits task superposition at inference.
  3. Theoretical Construction: A theoretical construction is provided to demonstrate that a seven-layer Transformer can perform task superposition. This supports the assertion that Transformers inherently have the capacity to handle multiple tasks simultaneously.
  4. Task Vector Analysis: The study examines the internal workings of LLMs, analyzing how task vectors (vector representations of tasks in the embedding space) are combined during task superposition. The observation that LLMs internally compose these vectors substantiates the task superposition effect.
  5. Model Scaling: As LLMs grow in size, they can handle more tasks in parallel and better align their output distribution with the distribution of tasks presented in the prompt, i.e., larger models are better calibrated to the in-context task mixture.
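The empirical setup behind findings 1 and 5 can be illustrated with a minimal sketch. The helper names below (`build_superposed_prompt`, `task_mixture`) and the toy tasks are hypothetical, not from the paper: the idea is simply to interleave few-shot examples from several tasks in one prompt, sample many completions for a query, and measure the empirical distribution of which task each completion solves.

```python
import random
from collections import Counter

# Two toy ICL tasks over the same string inputs (illustrative stand-ins
# for the paper's computationally distinct tasks).
TASKS = {
    "upper": lambda s: s.upper(),
    "reverse": lambda s: s[::-1],
}

def build_superposed_prompt(words, mix, seed=0):
    """Interleave few-shot examples from several tasks into one prompt.

    `mix` maps task name -> number of in-context examples for that task.
    Returns the prompt text and the held-out query word.
    """
    rng = random.Random(seed)
    shots = []
    for task, n in mix.items():
        for w in rng.sample(words, n):
            shots.append(f"{w} -> {TASKS[task](w)}")
    rng.shuffle(shots)  # mix the tasks' examples together
    query = rng.choice(words)
    return "\n".join(shots) + f"\n{query} ->", query

def task_mixture(query, samples):
    """Classify sampled completions by which task they solve and
    return the empirical task distribution of the outputs."""
    counts = Counter()
    for out in samples:
        for task, fn in TASKS.items():
            if out.strip() == fn(query):
                counts[task] += 1
    total = sum(counts.values()) or 1
    return {t: counts[t] / total for t in TASKS}

words = ["apple", "stone", "river", "cloud", "grape"]
prompt, query = build_superposed_prompt(words, {"upper": 3, "reverse": 3})
# With a real LLM, `samples` would come from repeated sampling on `prompt`;
# here a fabricated 70/30 output mix just shows the measurement.
fake_samples = [query.upper()] * 7 + [query[::-1]] * 3
print(task_mixture(query, fake_samples))  # -> {'upper': 0.7, 'reverse': 0.3}
```

Calibration (finding 5) would then be quantified by comparing this measured mixture against the proportions of each task's examples in the prompt.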

Implications and Future Directions

The implications of this research are both practical and theoretical. From a practical standpoint, understanding task superposition could lead to more efficient utilization of LLMs in applications requiring multitasking capabilities. Theoretically, this work provides insights into the latent capabilities of LLMs and pushes forward the perspective of “LLMs as superposition of simulators.”

Additionally, the finding that task vectors internally combine during task superposition opens new avenues for exploring the mechanistic operations of LLMs. However, leveraging this ability remains challenging due to generation collapse, in which LLMs tend to commit to a single task after the initial generated tokens; this invites future research into decoding strategies that maintain simultaneous task execution.
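One hypothetical decoding strategy for the collapse problem, not proposed in the paper, is to avoid committing at the first position: sample several distinct first tokens from the model's first-token distribution and continue each greedily, so the task mixture encoded there survives into full completions. The function and the toy stand-in model below are assumptions for illustration.

```python
import random

def sample_without_collapse(first_token_dist, continue_greedy, k=5, seed=0):
    """Hypothetical anti-collapse decoding sketch: rather than taking the
    single most likely first token (after which the model commits to one
    task), draw k first tokens from the first-token distribution and
    continue each one greedily, preserving the superposed task mix.

    `first_token_dist`: dict mapping first token -> probability.
    `continue_greedy`: callable mapping a chosen first token to a completion.
    """
    rng = random.Random(seed)
    tokens, probs = zip(*first_token_dist.items())
    starts = rng.choices(tokens, weights=probs, k=k)
    return [continue_greedy(t) for t in starts]

# Toy stand-in for an LLM whose first token determines the task it
# commits to, e.g. "A..." -> uppercase, "e..." -> reversal.
first_dist = {"A": 0.6, "e": 0.4}
completions = {"A": "APPLE", "e": "elppa"}
outs = sample_without_collapse(first_dist, completions.get)
print(outs)
```

Whether such first-token resampling recovers the full in-context task distribution in practice is exactly the kind of open question the generation-collapse observation raises.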

Conclusion

This paper significantly contributes to our understanding of LLMs' in-context learning capabilities, revealing their ability to perform task superposition. By doing so, it enriches our comprehension of how these models can be harnessed for complex multitasking scenarios, further inviting exploration into decoding strategies that can fully capitalize on this capability. These findings are poised to influence both the development of LLM technologies and the theoretical frameworks through which we understand neural network function.
