Alignment methods that preserve behavioral distributions while adding helpfulness
Develop alignment techniques that preserve the empirical distribution of human strategic behavior encoded in pre-trained models while improving helpfulness and instruction-following, thereby avoiding the distributional collapse that degrades behavioral prediction.
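One way to make "distributional collapse" concrete is to compare a model's predicted distribution over actions against the empirical distribution of human choices. The sketch below is illustrative only: the three-action game, the probability values, and the names `human`, `base_model`, and `aligned_model` are hypothetical and are not taken from the cited paper, and KL divergence is just one possible choice of metric.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in nats; lower values indicate a more peaked distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical numbers for illustration only (not results from the paper).
# Each list is a probability distribution over the same three actions.
human = [0.50, 0.30, 0.20]          # assumed empirical human choice frequencies
base_model = [0.45, 0.35, 0.20]     # assumed pre-trained model prediction
aligned_model = [0.90, 0.08, 0.02]  # assumed aligned model, mass collapsed on one action

kl_base = kl_divergence(human, base_model)
kl_aligned = kl_divergence(human, aligned_model)

print(f"KL(human || base)    = {kl_base:.4f}")
print(f"KL(human || aligned) = {kl_aligned:.4f}")
print(f"entropy base = {entropy(base_model):.4f}, aligned = {entropy(aligned_model):.4f}")
```

Under these assumed numbers, the aligned distribution has lower entropy and a larger divergence from the human distribution. A distribution-preserving alignment method would aim to keep that divergence close to the base model's while still improving helpfulness.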
References
Several open questions follow naturally. From an alignment perspective, developing methods that preserve empirical behavioral distributions while adding helpfulness is a natural direction.
— Alignment Makes Language Models Normative, Not Descriptive
(2603.17218 - Shapira et al., 17 Mar 2026) in Discussion and Conclusion