Designing a self-supervised objective for high-level language representations
Develop a self-supervised pretraining objective tailored specifically to high-level language representations: one that moves beyond token-level prediction and enables language models to learn and predict in an abstract representation space rather than directly over discrete tokens.
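To make the problem concrete, below is a minimal PyTorch sketch of one possible instantiation, loosely in the spirit of the next-concept-prediction framing of the cited paper: token embeddings are pooled into chunks, each chunk is snapped to its nearest entry in a learned codebook (a discrete "concept"), and a causal model is trained to predict the next concept id, so the loss lives in the latent concept space rather than over tokens. All names (NextConceptPredictor, chunk_len, etc.) are illustrative assumptions, and the quantizer omits the straight-through estimator, commitment loss, and positional encodings a working system would need; this is a sketch of the idea, not the cited paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextConceptPredictor(nn.Module):
    """Hypothetical sketch: next-concept prediction in a discrete latent space.

    Token embeddings are mean-pooled into fixed-size chunks, each chunk is
    assigned to its nearest codebook entry ("concept"), and a causal
    Transformer predicts the next concept id instead of the next token.
    """

    def __init__(self, d_model=512, codebook_size=1024, chunk_len=8):
        super().__init__()
        self.chunk_len = chunk_len
        self.codebook = nn.Embedding(codebook_size, d_model)  # discrete concept inventory
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, codebook_size)  # scores over concept ids

    def quantize(self, token_emb):
        """Map (B, T, D) token embeddings to (B, T // chunk_len) concept ids."""
        B, T, D = token_emb.shape
        chunks = token_emb[:, : T - T % self.chunk_len]
        chunks = chunks.reshape(B, -1, self.chunk_len, D).mean(dim=2)
        # Nearest-neighbour assignment; a full VQ pipeline would add a
        # straight-through estimator and commitment loss here.
        dists = (chunks.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dists.argmin(dim=-1)

    def forward(self, token_emb):
        codes = self.quantize(token_emb)          # (B, n_chunks) concept ids
        concept_emb = self.codebook(codes)        # back to continuous space
        mask = nn.Transformer.generate_square_subsequent_mask(
            codes.size(1)).to(token_emb.device)
        hidden = self.backbone(concept_emb, mask=mask)
        logits = self.head(hidden[:, :-1])        # predict concept t+1 from <= t
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               codes[:, 1:].reshape(-1))

# Usage: embeddings could come from any frozen or jointly trained encoder.
loss = NextConceptPredictor()(torch.randn(4, 64, 512))
loss.backward()
```

The open question is precisely which objective to place in the forward pass: nearest-neighbour quantization plus cross-entropy over codes is only one candidate, and the design of the abstraction (chunking, codebook learning, prediction target) is what remains unresolved.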
References
Despite these advancements, designing a self-supervised pretraining objective specifically for high-level language representations remains an open challenge.
— Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models
(2602.08984 - Liu et al., 9 Feb 2026) in Section 6.2 (Related Work: Abstract-level Modeling in Language Models)