Autonomy of LLMs for production-scale, specification-driven software construction
Determine whether large language models (LLMs) can autonomously build production-scale software systems from explicit specifications.
References
Although LLMs have demonstrated impressive coding capabilities, their ability to autonomously build production-scale software from explicit specifications remains an open question.
— SWE-AGI: Benchmarking Specification-Driven Software Construction with MoonBit in the Era of Autonomous Agents
(2602.09447 - Zhang et al., 10 Feb 2026) in Abstract (Page 1)