Fully unsupervised detection of the lexical identity subspace (LIS)

Develop a fully unsupervised method to identify the lexical identity subspace in MLP intermediate activations of transformer language models, enabling removal of lexical-identity directions without relying on WordNet sense annotations or other labeled resources.

Background

Hou et al. construct a lexical identity subspace (LIS) by contrasting representations of words with those of their WordNet synonyms, and show that projecting out a small number of LIS dimensions substantially reduces the lexical-identity confound in superposition metrics.
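To make the construction concrete, here is a minimal sketch of one plausible reading of it. It assumes, without confirmation from the source, that "contrasting" means taking activation differences between each word and its synonym, and that the subspace is the span of the top singular directions of those differences. The function names `estimate_subspace` and `project_out`, the toy data, and the choice of SVD are all illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def estimate_subspace(word_acts, synonym_acts, k):
    """Return a (k, d) orthonormal basis for the top-k contrast directions.

    Assumption: the contrast is a plain difference of paired activations.
    """
    diffs = word_acts - synonym_acts  # shape (n_pairs, d)
    # Rows of vt are orthonormal right singular vectors, sorted by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]

def project_out(acts, basis):
    """Remove from each row of acts its component inside span(basis)."""
    return acts - (acts @ basis.T) @ basis

# Toy data: paired activations share a "meaning" part and differ only
# inside a hidden k-dimensional subspace standing in for lexical identity.
rng = np.random.default_rng(0)
d, n_pairs, k = 16, 200, 2
true_dirs = np.linalg.qr(rng.normal(size=(d, k)))[0].T  # (k, d), orthonormal rows
shared = rng.normal(size=(n_pairs, d))
word_acts = shared + 3.0 * (rng.normal(size=(n_pairs, k)) @ true_dirs)
synonym_acts = shared + 3.0 * (rng.normal(size=(n_pairs, k)) @ true_dirs)

basis = estimate_subspace(word_acts, synonym_acts, k)
cleaned = project_out(word_acts, basis)
```

In this toy setting the paired differences lie entirely in the hidden subspace, so the estimated basis spans it and `cleaned` has no remaining component along it. Real activations would not separate this cleanly, and the number of dimensions to remove would have to be chosen empirically.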

They also report a preliminary label-free approximation that substitutes nearest neighbors for WordNet synonyms, but note that it only partially recovers the LIS and state explicitly that a fully unsupervised method remains open.
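The nearest-neighbor substitution can be illustrated with a short sketch. The source does not say which representation space or similarity measure is used, so cosine similarity over a generic embedding matrix is an assumption here, and `nearest_neighbor_pairs` is a hypothetical helper name.

```python
import numpy as np

def nearest_neighbor_pairs(embeddings):
    """For each row, return the index of its most cosine-similar other row.

    Assumption: cosine similarity in some embedding space is an acceptable
    stand-in for synonymy; the source does not specify the measure.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a word must not pair with itself
    return np.argmax(sims, axis=1)

# Toy embeddings: rows 0 and 1 point roughly one way, rows 2 and 3 another.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbors = nearest_neighbor_pairs(emb)
```

Pairs of the form (i, neighbors[i]) could then replace the labeled synonym pairs in whatever contrast procedure is used. Because a nearest neighbor is not guaranteed to share the word's meaning, such pairs mix meaning differences into the contrast, which is one possible reason the recovery is only partial.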

References

This partial recovery suggests the subspace is identifiable without labels, though a fully unsupervised method remains open.

Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics (2604.00443 - Hou et al., 1 Apr 2026) in Appendix, Section "Unsupervised LIS detection"