Finding representations in language models
Develop principled, reliable methods for identifying the latent representations that correspond to linguistic abstractions in deep neural language models (LMs).
References
However, finding representations in LMs remains an open problem.
— Perturbation: A simple and efficient adversarial tracer for representation learning in language models
(2603.23821 - Rozner et al., 25 Mar 2026) in Section 1: Introduction