Efficient hyperparameter tuning for LLM-JEPA
Develop an efficient hyperparameter tuning method for the LLM-JEPA training objective that identifies good values of the JEPA loss weight λ and the number of predictor tokens k used in the tied-weights predictor. The goal is to reduce the significant cost of an exhaustive grid search, since the best accuracy may occur anywhere in the tested grid (λ, k) ∈ {0.5, 1.0, 2.0, 4.0} × {0, 1, 2, 3, 4}, which has 20 configurations.
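To make the cost question concrete, here is a minimal sketch of one possible alternative to exhaustive search: a greedy neighbor search over the same (λ, k) grid. This is not a method from the paper, which reports no efficient procedure. It assumes accuracy changes gradually between adjacent grid points, and it can stop at a local optimum when that assumption fails. The names `greedy_grid_search` and `evaluate` are hypothetical; `evaluate(lam, k)` stands in for a full training-and-evaluation run that returns accuracy.

```python
from typing import Callable, Dict, Tuple

# The grid tested in the problem statement.
LAMBDAS = [0.5, 1.0, 2.0, 4.0]
KS = [0, 1, 2, 3, 4]


def greedy_grid_search(
    evaluate: Callable[[float, int], float],
    start: Tuple[int, int] = (0, 0),
) -> Tuple[Tuple[float, int], float, int]:
    """Hill-climb over grid indices, caching every evaluated point.

    `evaluate` is a hypothetical stand-in for one training run.
    Returns ((lambda, k), best_score, number_of_runs_used).
    """
    cache: Dict[Tuple[int, int], float] = {}

    def score(i: int, j: int) -> float:
        # Each grid point is trained at most once.
        if (i, j) not in cache:
            cache[(i, j)] = evaluate(LAMBDAS[i], KS[j])
        return cache[(i, j)]

    current = start
    best = score(*current)
    improved = True
    while improved:
        improved = False
        i, j = current
        # Score the four axis-aligned neighbors and move to the best one
        # that strictly improves on the current score.
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < len(LAMBDAS) and 0 <= nj < len(KS):
                s = score(ni, nj)
                if s > best:
                    best, current, improved = s, (ni, nj), True

    i, j = current
    return (LAMBDAS[i], KS[j]), best, len(cache)
```

The third return value reports how many of the 20 configurations were actually trained, so the saving can be compared directly against the full grid. Restarting from several start points is one simple way to reduce the risk of stopping at a local optimum, at the price of more runs.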
References
While we have not identified an efficient method to explore this space, we empirically observe that adjacent grid points often yield similar accuracy, suggesting the potential for a more efficient tuning algorithm.
— LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
(2509.14252 - Huang et al., 11 Sep 2025) in Appendix, Subsection "Hyperparameter Tuning for LLM-JEPA"