Persistence of the memorization signature under non–cross-entropy training paradigms
Determine whether the architecture-invariant memorization signature that is detectable after supervised fine-tuning with cross-entropy loss also persists when language models are trained or adapted under alternative paradigms, such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), instruction tuning, or continual pretraining. A study of this question could probe the same model before and after adaptation, as in the sketch below.
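One way to make the question concrete is a before/after probe: evaluate a memorization detector on a cross-entropy fine-tuned checkpoint and again on its DPO- (or RLHF-) adapted counterpart, then check whether the signature remains separable. The sketch below is a minimal illustration only: it substitutes a crude loss-gap proxy for the paper's actual architecture-invariant detector, and the model names and probe sets (`memorized_probes`, `held_out_probes`) are placeholders, not artifacts from the paper.

```python
# Hypothetical sketch: test whether a memorization signature survives
# adaptation with a non-cross-entropy objective (e.g., DPO).
# `signature_score` is an illustrative stand-in, NOT the detector
# proposed in the paper.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def sequence_losses(model, tokenizer, texts, device="cpu"):
    """Mean next-token cross-entropy per sequence under teacher forcing."""
    losses = []
    model.eval()
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            out = model(ids, labels=ids)  # causal LM shifts labels internally
            losses.append(out.loss.item())
    return losses


def signature_score(model, tokenizer, memorized, held_out):
    """Illustrative proxy: loss gap between held-out and memorized probes.

    A large positive gap means memorized sequences remain unusually easy
    for the model, i.e., a memorization signal is still detectable.
    """
    mem = sequence_losses(model, tokenizer, memorized)
    new = sequence_losses(model, tokenizer, held_out)
    return sum(new) / len(new) - sum(mem) / len(mem)


# Usage (checkpoint names and probe lists are placeholders):
# tok = AutoTokenizer.from_pretrained("base-sft-model")
# before = AutoModelForCausalLM.from_pretrained("base-sft-model")
# after = AutoModelForCausalLM.from_pretrained("dpo-adapted-model")
# gap_before = signature_score(before, tok, memorized_probes, held_out_probes)
# gap_after = signature_score(after, tok, memorized_probes, held_out_probes)
# print(f"signature gap: {gap_before:.3f} -> {gap_after:.3f}")
```

If the gap (or, in the paper's setting, the detector's separability) collapses after adaptation, the signature does not persist under that training paradigm; if it is largely unchanged, the signature is robust to the new objective.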
References
"Whether it persists under other training paradigms remains open."
— Learning the Signature of Memorization in Autoregressive Language Models
(arXiv:2604.03199, Ilić et al., 3 Apr 2026), Discussion, "Beyond Cross-Entropy Fine-Tuning" subsection