Credit horizon selection in LLM-based generative optimization
Determine how many steps of a multi-step agentic system's execution trace to include in the learning context (the credit horizon) when using large language model–based generative optimization. In particular, assess whether the optimizer should update the system from immediate per-step feedback or only after all feedback for the full multi-step process has been observed, so that its updates are effective.
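To make the choice concrete, the credit horizon can be read as a window over the execution trace that decides what the optimizer sees. The following is a minimal sketch under that assumption, not the method of the cited paper: `Step`, `build_learning_context`, and the `horizon` parameter are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One step of an agent's execution trace (hypothetical structure)."""
    action: str
    feedback: str


def build_learning_context(trace: List[Step], horizon: Optional[int] = None) -> str:
    """Format the most recent `horizon` steps for an optimizer prompt.

    horizon=1    -> only the latest step and its immediate feedback
    horizon=None -> the full trace, i.e. all feedback observed so far
    """
    if horizon is not None and horizon < 1:
        raise ValueError("horizon must be a positive integer or None")
    # Slicing with a horizon longer than the trace simply returns the whole trace.
    window = trace if horizon is None else trace[-horizon:]
    offset = len(trace) - len(window)  # keep original step indices in the output
    lines = [
        f"step {offset + i}: action={s.action!r} feedback={s.feedback!r}"
        for i, s in enumerate(window)
    ]
    return "\n".join(lines)
```

With `horizon=1` the optimizer is shown a single step and its instantaneous feedback; with `horizon=None` it is shown the whole trace. Intermediate values fall between the two extremes the problem statement asks about, and which setting leads to better updates is exactly what remains open.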
References
More generally, it is unclear how many steps of the process's execution trace should be included in the learning context for the optimizer. Should we optimize the agent for instantaneous feedback, or should we only optimize it until all feedback in the multi-step process has been observed?
— Understanding the Challenges in Iterative Generative Optimization with LLMs
(2603.23994 - Nie et al., 25 Mar 2026) in Learning Context (Credit Horizon), Section 2