Identify or develop the optimal neural audio codec for general-purpose self-supervised speech representation learning
Determine which neural audio codec architecture, training methodology, and quantization strategy produce discrete unit sequences best suited for general-purpose self-supervised speech representation learning across diverse downstream speech tasks, when those units serve as the exclusive input during pre-training.
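To make the quantization side of the question concrete, the sketch below shows residual vector quantization (RVQ), the strategy used by neural codecs such as SoundStream and EnCodec, turning continuous feature frames into the parallel discrete token streams a model would pre-train on. This is a minimal illustration with random codebooks and hypothetical dimensions, not the actual codec or settings studied in the paper.

```python
import numpy as np

def rvq_encode(frames, codebooks):
    """Residual vector quantization: each stage quantizes the residual
    left by the previous stage, emitting one index per stage per frame."""
    residual = frames.astype(np.float64)
    indices = []
    for cb in codebooks:
        # nearest codeword per frame (squared Euclidean distance)
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        indices.append(idx)
        residual = residual - cb[idx]   # pass residual to the next stage
    return np.stack(indices, axis=0)    # shape: (n_stages, n_frames)

# Hypothetical sizes: 50 frames of 16-dim features, 4 RVQ stages of 256 codes
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16))
codebooks = [rng.normal(size=(256, 16)) for _ in range(4)]
tokens = rvq_encode(frames, codebooks)
print(tokens.shape)  # (4, 50): four parallel discrete token streams
```

Each additional stage refines the approximation of the frame, so a codec's choice of stage count and codebook size directly shapes the discrete vocabulary a pre-training model sees, which is part of what makes codec selection an open question.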
References
Identifying or developing the optimal codec—whose discrete units are ideally suited for general-purpose speech representation learning—remains an open research question.
— Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
(arXiv:2511.16639, Tseng et al., 20 Nov 2025), Section: Discussion and Limitations, bullet "Codec Selection"