Adapting language models to handle sequences longer than those seen during training
Develop effective methods to adapt pretrained transformer-based language models so that, at inference time, they can handle sequences longer than the context length used during pretraining.
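One widely used family of approaches rescales positional information so that inference-time positions fall back inside the trained range. The sketch below is illustrative only, not the method of the cited paper: it shows linear position interpolation for rotary position embeddings (RoPE), where positions are shrunk by `train_len / seq_len` before the rotation is applied. All function names and the toy dimensions are assumptions for this example.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary-embedding angles for each (position, frequency-pair)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (seq, dim // 2)

def apply_rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq, dim)."""
    seq, dim = x.shape
    ang = rope_angles(positions, dim, base)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Rotate each 2-D feature pair by its position-dependent angle.
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def interpolated_positions(seq_len, train_len):
    """Linear position interpolation: rescale positions so the largest
    inference position maps back inside the trained range."""
    scale = min(1.0, train_len / seq_len)
    return np.arange(seq_len) * scale
```

For example, with `train_len = 4096` and an 8192-token input, every rescaled position stays below 4096, so the model only ever sees rotation angles it encountered during training; because RoPE is a pure rotation, the per-token norms are unchanged.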
References
Given the rapidly growing costs of self-attention, adapting LMs for longer sequences than those seen during training has been a longstanding open problem.
— Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
(arXiv:2512.12167, Gelberg et al., 13 Dec 2025), Section 2, "Context extension for RoPE"