Downstream performance impact of SuperBPE at Trinity’s experimental scale
Determine whether the SuperBPE tokenizer, which is trained by first running standard BPE with whitespace pretokenization, pausing at a transition point, and then resuming training without the whitespace constraint so that multi-word tokens can be learned, yields measurable downstream performance improvements for Trinity language models at the experimental scale considered, beyond the demonstrated token-compression gains.
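
To make the two-stage procedure concrete, below is a minimal, self-contained Python sketch of the idea. The corpus, merge counts, and helper names (train_superbpe, apply_merge, most_frequent_pair) are illustrative assumptions, not the tokenizer code behind Trinity or the original SuperBPE release: stage 1 merges pairs only within whitespace-delimited words, and stage 2 drops that constraint so high-frequency multi-word sequences can fuse into single tokens.

# Toy two-stage BPE in the spirit of SuperBPE. Everything here
# (corpus, merge budgets, function names) is an illustrative assumption.
from collections import Counter

def most_frequent_pair(seqs):
    # Count adjacent symbol pairs across all sequences.
    pairs = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def apply_merge(seq, pair):
    # Replace every occurrence of `pair` in `seq` with the fused symbol.
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def train_superbpe(corpus, transition_merges, total_merges):
    # Stage 1: standard BPE. Pretokenize on whitespace so merges
    # can never cross a word boundary.
    word_seqs = [list(word) for text in corpus for word in text.split()]
    merges = []
    while len(merges) < transition_merges:
        pair = most_frequent_pair(word_seqs)
        if pair is None:
            break
        merges.append(pair)
        word_seqs = [apply_merge(s, pair) for s in word_seqs]

    # Stage 2: drop the whitespace constraint. Re-segment each document
    # as one character sequence (spaces become ordinary symbols), replay
    # the stage-1 merges, then keep merging so multi-word tokens can form.
    doc_seqs = [list(text) for text in corpus]
    for pair in merges:
        doc_seqs = [apply_merge(s, pair) for s in doc_seqs]
    while len(merges) < total_merges:
        pair = most_frequent_pair(doc_seqs)
        if pair is None:
            break
        merges.append(pair)
        doc_seqs = [apply_merge(s, pair) for s in doc_seqs]
    return merges

corpus = ["the cat sat on the mat", "the cat sat on the hat"] * 4
merges = train_superbpe(corpus, transition_merges=6, total_merges=12)
print(merges)  # later merges may fuse tokens across spaces, e.g. ('the', ' ')

On real data, the stage-2 merges are what deliver the compression gains quoted below: frequent word sequences become single tokens, so the same text needs fewer tokens than under whitespace-constrained BPE.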
References
While our SuperBPE variant achieved substantially better compression — particularly on English text (~29% fewer tokens) and reasoning traces (~27% fewer tokens) — we were unable to reproduce a corresponding improvement in downstream model performance at our experimental scale.
— Arcee Trinity Large Technical Report
(2602.17004, Singh et al., 19 Feb 2026), subsection "Vocabulary Size" under "Tokenizer", Section 2 (Architecture)