What makes for good action tokenizers for VLA optimization
Determine the design principles and criteria that make discrete action tokenizers effective for optimizing Vision-Language-Action (VLA) models, that is, models that fine-tune Vision-Language Models under a native autoregressive paradigm. The criteria should go beyond reconstruction fidelity metrics and account for training efficiency and downstream performance.
References
Consequently, the fundamental question of what makes for good action tokenizers remains unanswered.
— ActionCodec: What Makes for Good Action Tokenizers
(2602.15397 - Dong et al., 17 Feb 2026) in Section 1 (Abstract), Page 1