Persistence of the normative shift at extreme model scales

Determine whether the alignment-induced normative shift and the base-model predictive advantage persist, weaken, or disappear at model scales well beyond those tested. The answer would establish whether the effect is inherent to alignment or is mitigated by increased model capacity.

Background

The paper reports that the base-model advantage tends to grow with model size within the evaluated range, suggesting that richer pre-training representations may be shifted by alignment.

It remains unresolved whether this trend continues at much larger scales or whether extremely capable models reduce or eliminate the normative–descriptive trade-off introduced by alignment.

References

Several open questions follow naturally. Finally, testing whether the effect persists at extreme scale would clarify whether the normative shift is inherent to alignment or diminishes as models grow more capable.

— Alignment Makes Language Models Normative, Not Descriptive (2603.17218 - Shapira et al., 17 Mar 2026) in Discussion and Conclusion
