Unclear evolution of LLM-driven architecture synthesis under iterative refinement

Characterize how Large Language Model–driven neural architecture synthesis evolves under iterative supervised refinement, focusing specifically on changes in syntactic validity (compilation and execution reliability), structural novelty at the source-code level, and the maintenance of architectural diversity as the generator specializes.

Background

The paper positions a code-oriented LLM as a generator of PyTorch neural architectures and studies its behavior across 22 cycles of generate–evaluate–select–fine-tune. The authors emphasize that prior work has focused on search outcomes rather than the reliability and diversity of the generator itself. As a result, they flag uncertainty around how the generator’s output distribution changes under iterative refinement, particularly in terms of validity, novelty, and diversity retention.
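The paper does not reproduce its evaluation code here, but the "syntactic validity" axis can be sketched with a minimal, stdlib-only proxy: parse each generated source string and report the fraction that are valid Python. This is an illustrative assumption, not the authors' implementation; their notion of validity also covers execution reliability, which would additionally require instantiating each PyTorch model and running a forward pass.

```python
import ast

def syntactic_validity_rate(candidates: list[str]) -> float:
    """Fraction of generated source strings that parse as valid Python.

    A proxy for compilation reliability only; execution reliability
    would further require running the generated model, which needs
    PyTorch and is omitted from this sketch.
    """
    if not candidates:
        return 0.0
    valid = 0
    for src in candidates:
        try:
            ast.parse(src)  # syntactic check only; never executes the code
            valid += 1
        except SyntaxError:
            pass
    return valid / len(candidates)

# Hypothetical generator outputs: one well-formed, one broken.
candidates = [
    "import torch.nn as nn\nmodel = nn.Sequential(nn.Linear(8, 4), nn.ReLU())",
    "def forward(self, x) return self.layer(x)",  # missing colon -> SyntaxError
]
print(syntactic_validity_rate(candidates))  # → 0.5
```

Tracking this rate across the 22 refinement cycles is one way to observe whether supervised fine-tuning on selected survivors improves or degrades the generator's reliability.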

This problem is central to evaluating whether LLMs can be shaped into robust architectural priors that produce executable, high-quality, and diverse models, beyond simply optimizing for peak accuracy.
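The diversity-retention question can likewise be made concrete with a simple source-level proxy. The paper's exact metric is not given in this excerpt, so the sketch below assumes mean pairwise string dissimilarity (via `difflib.SequenceMatcher`) as a stand-in; an AST- or graph-based measure would be a natural refinement.

```python
from difflib import SequenceMatcher
from itertools import combinations

def population_diversity(sources: list[str]) -> float:
    """Mean pairwise dissimilarity (1 - match ratio) over a population
    of generated source strings; higher means more diverse outputs.

    A stdlib proxy for source-level architectural diversity; the
    paper's own metric may differ.
    """
    pairs = list(combinations(sources, 2))
    if not pairs:
        return 0.0
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return 1.0 - sum(sims) / len(sims)

# Hypothetical generation round: a duplicate pair plus one distinct design.
pop = [
    "nn.Sequential(nn.Linear(8, 4), nn.ReLU())",
    "nn.Sequential(nn.Linear(8, 4), nn.ReLU())",  # exact repeat: no novelty
    "nn.Sequential(nn.Conv1d(3, 16, 3), nn.GELU(), nn.Flatten())",
]
print(round(population_diversity(pop), 3))
```

A collapsing generator (mode collapse under specialization) would show this score trending toward zero across refinement cycles even while peak accuracy holds steady.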

References

Specifically, it remains unclear how LLM-driven synthesis evolves under iterative refinement, particularly regarding syntactic validity, structural novelty, and the ability to maintain diversity as the model specializes.

From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures (2601.02997 - Khalid et al., 6 Jan 2026) in Section 1 (Introduction)