What constitutes a domain for an LLM?

Characterize what constitutes a domain for a large language model using a model-native definition that can be operationalized to study expert specialization and routing behavior, rather than relying on externally imposed human categories.

Background

To quantify expert specialization, the paper proposes defining domains by unsupervised clustering of the model's output embedding matrix, arguing that the resulting clusters capture the model's own semantic structure. The proposal is motivated by the lack of a settled, model-internal definition of a domain.
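The clustering procedure is not spelled out here. As a minimal sketch, assuming spherical k-means (cosine similarity) over the rows of the output embedding matrix, each vocabulary token can be assigned a cluster id that serves as its model-native "domain". The function name `cluster_output_embeddings`, the farthest-first seeding, and the choice of `k` are illustrative assumptions, not the authors' method.

```python
import numpy as np


def cluster_output_embeddings(W_out, k, n_iter=50, seed=0):
    """Hypothetical helper: assign each vocabulary token a 'domain' id via spherical k-means.

    W_out: array of shape (vocab_size, d_model), one output-embedding row per token.
    Returns an int array of shape (vocab_size,) with cluster ids in [0, k).
    """
    rng = np.random.default_rng(seed)
    # Assumption: cluster by direction only, so rows are L2-normalized (cosine similarity).
    X = W_out / np.linalg.norm(W_out, axis=1, keepdims=True)

    # Farthest-first seeding: start from a random token, then repeatedly add the
    # token whose best similarity to the seeds chosen so far is lowest.
    seeds = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        best_sim = np.max(X @ X[seeds].T, axis=1)
        seeds.append(int(np.argmin(best_sim)))
    centroids = X[seeds].copy()

    # Initial assignment: each token joins its most similar seed.
    labels = np.argmax(X @ centroids.T, axis=1)
    for _ in range(n_iter):
        # Update step: each centroid becomes the normalized mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centroids[j] = mean / norm
        # Assignment step: each token joins the most similar centroid.
        new_labels = np.argmax(X @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Toy check: two well-separated groups of 10 "tokens" in a 3-d embedding space.
toy_W = np.vstack([
    np.random.default_rng(1).normal([5.0, 0.0, 0.0], 0.1, size=(10, 3)),
    np.random.default_rng(2).normal([0.0, 5.0, 0.0], 0.1, size=(10, 3)),
])
toy_labels = cluster_output_embeddings(toy_W, k=2)
print(toy_labels)
```

In practice `W_out` would be the model's unembedding matrix, and how it is extracted depends on the framework. Normalizing rows is a design choice that ignores embedding norm, which may or may not match what the paper does.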

Clarifying a model-native notion of domains would enable principled evaluation of routing and functional specialization, and would help distinguish broad semantic domains from the fine-grained task roles attributed to individual experts.
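One hypothetical way to turn model-native domains into an evaluation of routing is a per-expert specialization score: take the tokens the router sent to one expert, look up each token's domain id (for example, a cluster id obtained by clustering the output embedding matrix), and measure how concentrated that distribution is. The sketch below uses one minus normalized entropy; the function name `expert_specialization` and the metric itself are assumptions for illustration, not a measure defined in the source.

```python
import math
from collections import Counter


def expert_specialization(routed_token_ids, token_domain, num_domains):
    """Hypothetical score: 1 - H(p) / log(num_domains), where p is the distribution
    of domain ids over the tokens one expert received from the router.

    Returns 1.0 when every routed token falls in a single domain and 0.0 when the
    tokens are spread uniformly over all num_domains domains.
    """
    if num_domains < 2:
        raise ValueError("need at least two domains for a normalized entropy")
    counts = Counter(token_domain[t] for t in routed_token_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("expert received no tokens")
    # Shannon entropy (natural log) of the expert's domain distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(num_domains)


# Toy usage: tokens 0 and 1 belong to domain 0, tokens 2 and 3 to domain 1.
toy_domains = {0: 0, 1: 0, 2: 1, 3: 1}
focused = expert_specialization([0, 1, 0, 1], toy_domains, num_domains=2)
spread = expert_specialization([0, 2, 1, 3], toy_domains, num_domains=2)
print(focused, spread)
```

Recomputing such a score under coarser and finer clusterings is one possible probe of whether an expert tracks a broad semantic domain or a narrower task role; this is a suggestion under the stated assumptions, not a result from the paper.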

References

What constitutes a domain for an LLM remains an open question.

The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level (2604.02178 - Herbst et al., 2 Apr 2026), Section 6.2, "Experts in the output embedding space"