Efficacy of agentic systems on long-horizon, weakly supervised research loops

Determine whether existing large language model–based agentic systems can effectively tackle long-horizon scientific research loops in AI development that are costly to execute and weakly supervised, rather than only performing well on well-scoped tasks with rapid feedback.

Background

The paper motivates ASI-Evolve by observing that most recent agentic systems perform strongly on narrowly scoped tasks with immediate feedback, but offer little evidence of effectiveness on the costly, long-horizon research cycles that drive real AI progress. These cycles involve hypothesis generation, implementation across large codebases, expensive experimentation, and synthesis of complex, multi-dimensional feedback signals.

The authors frame this as a central uncertainty motivating their work and propose ASI-Evolve to address it through a learn–design–experiment–analyze loop augmented with a cognition base and an analyzer. Although the paper presents evidence that ASI-Evolve can make progress in this regime, its abstract explicitly flags the broader question as unresolved: whether agentic systems can tackle such long-horizon, weakly supervised research loops.

References

While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress.

— ASI-Evolve: AI Accelerates AI (2603.29640 - Xu et al., 31 Mar 2026) in Abstract (Page 1)
