Fully unsupervised detection of the lexical identity subspace (LIS)
Develop a fully unsupervised method to identify the lexical identity subspace in MLP intermediate activations of transformer language models, enabling removal of lexical-identity directions without relying on WordNet sense annotations or other labeled resources.
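To make the two steps in this problem statement concrete (estimate a subspace, then remove it), here is a minimal NumPy sketch. It is not the method from the cited paper. The estimator is an assumed heuristic: take the top-k principal directions of the mean MLP activation per token type, using only tokenizer token IDs as grouping keys, with no sense annotations. Whether those directions coincide with the LIS is exactly the open question. The function names `candidate_subspace` and `remove_subspace` and the parameter `k` are hypothetical.

```python
import numpy as np


def candidate_subspace(acts, token_ids, k):
    """Return an orthonormal (d, k) basis for a candidate subspace.

    Assumed heuristic, not the paper's method: the top-k principal
    directions of the per-token-type mean activations.
    acts: (n, d) MLP intermediate activations; token_ids: (n,) token IDs.
    """
    acts = np.asarray(acts, dtype=float)
    token_ids = np.asarray(token_ids).ravel()
    types, inverse = np.unique(token_ids, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Mean activation vector for each distinct token type.
    sums = np.zeros((len(types), acts.shape[1]))
    np.add.at(sums, inverse, acts)
    counts = np.bincount(inverse, minlength=len(types))[:, None]
    means = sums / counts

    # PCA via SVD of the centered type means; rows of vt are orthonormal.
    centered = means - means.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def remove_subspace(acts, basis):
    """Project activations onto the orthogonal complement of `basis`."""
    acts = np.asarray(acts, dtype=float)
    return acts - (acts @ basis) @ basis.T
```

The removal step is a plain orthogonal projection, so it works with any orthonormal basis. If a better unsupervised estimator is found, only `candidate_subspace` would need to change.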
References
This partial recovery suggests the subspace is identifiable without labels, though a fully unsupervised method remains open.
— Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics
(2604.00443 - Hou et al., 1 Apr 2026) in Appendix, Section "Unsupervised LIS detection"