Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning
Abstract: Self-play post-training has emerged as an effective approach for finetuning LLMs, turning a weak LLM into a strong one without preference data. However, the theoretical foundations of self-play finetuning remain underexplored. In this work, we address this gap by connecting self-play finetuning with adversarial imitation learning, formulating the finetuning procedure as a min-max game between the model and a regularized implicit reward player parameterized by the model itself. This perspective unifies self-play imitation and general preference alignment within a common framework. Under this formulation, we present a game-theoretic analysis showing that self-play finetuning converges to its equilibrium. Guided by this theory, we propose a new self-play imitation finetuning algorithm based on a $\chi^2$-divergence variational objective with bounded rewards and improved stability. Experiments on a variety of LLM finetuning tasks demonstrate consistent improvements over existing self-play methods and validate our theoretical insights.
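To illustrate the kind of min-max formulation the abstract describes, a minimal sketch in the style of standard adversarial imitation learning is given below; the symbols $\pi_\theta$ (model policy), $\pi_{\mathrm{data}}$ (data distribution), $r$ (implicit reward), and the regularizer $\psi$ are assumptions for exposition, not the paper's exact definitions:

$$
\min_{\pi_\theta} \; \max_{r} \;\;
\mathbb{E}_{y \sim \pi_{\mathrm{data}}(\cdot \mid x)}\big[ r(x, y) \big]
\;-\; \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[ r(x, y) \big]
\;-\; \psi(r),
$$

where the model player $\pi_\theta$ minimizes and the reward player $r$ maximizes the gap between data and model samples, and the convex regularizer $\psi(r)$ (e.g., one inducing a $\chi^2$-type divergence) keeps the implicit reward bounded, matching the stability motivation stated in the abstract.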