Generalization of JEPA+DAAM to tonal and morphologically rich languages

Determine whether the JEPA encoder with Density Adaptive Attention (DAAM) for self-supervised speech representation learning, which has been evaluated only on English LibriLight data, generalizes to tonal and morphologically rich languages.

Background

The proposed framework trains and evaluates a JEPA encoder augmented with Density Adaptive Attention on English speech from the LibriLight corpus. As noted in the limitations, the experiments are monolingual and do not assess behavior on languages with different phonological and morphological properties.

The authors explicitly state that generalization to tonal and morphologically rich languages remains open. It is therefore still to be evaluated whether the learned representations and the associated tokenization maintain their quality across diverse language families, for example in tonal languages, where pitch contours distinguish lexical meaning.

References

Generalization to tonal and morphologically rich languages remains open.

JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention (2512.07168 - Ioannides et al., 8 Dec 2025) in Limitations and Future Work, item 2 (Monolingual evaluation)