Causes and mitigation of train–evaluation gap under high reset probabilities
Determine the underlying causes of the observed train–evaluation performance gap at high frontier reset probabilities in Frontier Checkpointing, and investigate whether randomizing checkpoint states across procedurally generated layouts recovers the benefits of high reset probabilities without incurring overfitting.
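Since the note assumes familiarity with the mechanism, a minimal sketch of what such a checkpoint-randomization intervention could look like may help. All names here are hypothetical and the paper's actual algorithm may differ: the idea is that, with probability `p_reset`, an episode resumes from a saved frontier state, and the diversity intervention pools checkpoints across many procedurally generated layout seeds rather than a single fixed layout.

```python
import random

class FrontierResetter:
    """Hypothetical sketch of frontier checkpointing with layout-randomized
    checkpoints (not the paper's implementation)."""

    def __init__(self, p_reset, layout_seeds, rng=None):
        self.p_reset = p_reset
        self.layout_seeds = layout_seeds  # procedurally generated layouts
        # Per-layout buffers of frontier states reached during training.
        self.checkpoints = {seed: [] for seed in layout_seeds}
        self.rng = rng or random.Random(0)

    def save_checkpoint(self, layout_seed, state):
        """Record a frontier state reached on a given layout."""
        self.checkpoints[layout_seed].append(state)

    def sample_reset(self):
        """Return (layout_seed, state_or_None).

        Sampling the layout first, then a checkpoint within it, is what
        spreads frontier resets across layouts; a None state means the
        episode starts from the layout's initial configuration instead.
        """
        seed = self.rng.choice(self.layout_seeds)
        states = self.checkpoints[seed]
        if states and self.rng.random() < self.p_reset:
            return seed, self.rng.choice(states)  # resume from a frontier state
        return seed, None  # full reset from the layout start

# Usage: high reset probability, checkpoints pooled across three layouts.
resetter = FrontierResetter(p_reset=0.9, layout_seeds=[0, 1, 2])
resetter.save_checkpoint(1, state={"pos": (3, 4)})
seed, state = resetter.sample_reset()
```

Under this sketch, the overfitting hypothesis corresponds to the degenerate case where `layout_seeds` contains a single seed, so high `p_reset` repeatedly replays the same checkpoint configurations.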
References
While the train-eval gap at high reset probabilities shown in Figure~\ref{fig:eval-train-gap} suggests overfitting to checkpoint configurations, we cannot definitively rule out alternative explanations. Future work should investigate whether diversity interventions such as randomizing checkpoints across procedurally generated layouts can recover the benefits of high reset without the overfitting cost.
— SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding
(2603.09036 - Zabounidis et al., 10 Mar 2026) in Appendix, Section "Limitations and Future Work"