Specify the target character for powerful language models
Characterize the normative persona, or "character," that powerful language models should embody to reduce misalignment risk. This includes defining the desirable traits, values, and behavioral commitments to be instilled through pretraining and post-training.
References
Further, a better understanding of exactly what character powerful LLMs ought to have remains an open question.
— Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
(2601.10160 - Tice et al., 15 Jan 2026) in Section 7, Future Work – Deep Character Training