NNGPT Framework: Next-Gen AutoML
- NNGPT is an LLM-driven framework that integrates automated neural architecture search, hyperparameter tuning, and code synthesis in a closed-loop pipeline.
- It employs five synergistic LLM-based pipelines, including zero-shot synthesis, hyperparameter recommendation, and code-aware performance prediction, to enhance efficiency.
- The framework achieves state-of-the-art results by continuously improving models through reward-driven fine-tuning and rigorous validation on benchmarks like the LEMUR corpus.
NNGPT refers to an LLM-centric framework for self-improving neural network generation and optimization, proposed as a foundation for next-generation AutoML. NNGPT reconceptualizes automated neural architecture search, hyperparameter tuning, and code synthesis as closed-loop tasks orchestrated by an LLM agent, integrating prompt-based model generation, validation, code execution, performance prediction, and on-the-fly fine-tuning. The framework achieves state-of-the-art efficiency and performance in computer vision neural design, leveraging structured prompting and differentiable adapters for rapid, data-driven iteration, as demonstrated on the LEMUR corpus and PyTorch backends (Kochnev et al., 25 Nov 2025).
1. System Architecture and Closed-Loop Cycle
At its core, NNGPT operates a fully closed-loop AutoML pipeline, positioning the LLM as the central agent that incrementally improves itself and the models it generates. The cycle entails:
- Retrieval and configuration: Given a prompt (specifying task, dataset, and resource budget) and context set (retrieved from the LEMUR corpus), a configuration module emits a JSON/YAML scaffold outlining requirements and constraints.
- Prompt assembly: This scaffold, augmented by relevant reference implementations, is rendered into a structured instruction block encompassing both textual and code exemplars.
- LLM generation: The LLM receives the prompt and outputs a full candidate containing executable PyTorch code and training specifications.
- Validation: An automated validator enforces schema, shape, type, and code constraints, optionally correcting or reprompting if errors are detected.
- Execution and logging: The validated candidate is executed by a trainer, producing per-epoch metric logs and auxiliary runtime data. All artifacts are logged for further analysis and retraining.
- Code-aware performance prediction: A predictor ingests the training code and early-epoch metrics, estimating both the final accuracy and an ideal early-stopping point. Low-confidence runs may be terminated or reallocated.
- Self-improvement: After a batch of runs, the LLM generator is fine-tuned via LoRA adapters using successful (prompt, model, metrics) tuples, and the predictor is retrained for improved regression on logged outcomes.
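As an illustration of the retrieval-and-configuration step, the JSON/YAML scaffold might look roughly as follows. Field names here are hypothetical; the paper does not specify the exact schema:

```python
import json

def build_config(task: str, dataset: str, budget_epochs: int, context_ids: list) -> str:
    """Illustrative configuration scaffold emitted by the configuration module.

    All field names are assumptions made for this sketch, not NNGPT's actual schema.
    """
    scaffold = {
        "task": task,                       # e.g. image classification
        "dataset": dataset,                 # dataset identifier from the prompt
        "budget": {"max_epochs": budget_epochs},
        "constraints": {
            "framework": "pytorch",         # generated code must be executable PyTorch
            "output_format": "json",        # validator expects a structured candidate
        },
        "context": context_ids,             # exemplar IDs retrieved from LEMUR
    }
    return json.dumps(scaffold, indent=2)

print(build_config("image_classification", "cifar10", 30, ["lemur-0412", "lemur-0877"]))
```

The scaffold is then rendered into the structured instruction block described in the prompt-assembly step.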
This tightly interlocked workflow distinguishes NNGPT from classical search-based AutoML, as the model improves through direct experience, reward-driven policy gradients, and supervised fine-tuning within a reproducible, auditable protocol (Kochnev et al., 25 Nov 2025).
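A minimal sketch of the validation step, assuming the validator's duties include checking that a candidate parses as Python and defines a model class; the actual constraint set is richer, covering schema, shape, and type checks:

```python
import ast

def basic_validate(candidate_code: str) -> list:
    """Return a list of error strings; an empty list means the candidate passes.

    Illustrative subset of NNGPT-style validation: syntactic parseability
    plus the presence of at least one class definition (the model).
    """
    errors = []
    try:
        tree = ast.parse(candidate_code)
    except SyntaxError as exc:
        return [f"syntax error: {exc}"]
    has_class = any(isinstance(node, ast.ClassDef) for node in ast.walk(tree))
    if not has_class:
        errors.append("no model class defined")
    return errors

print(basic_validate("class Net:\n    pass\n"))  # [] — passes
print(basic_validate("def train(:\n"))           # syntax error reported
```

On failure, the error list would be fed back into a corrective reprompt, as in the validation step above.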
2. Integrated LLM-Based Pipelines
NNGPT's innovation arises from the integration of five synergistic LLM-based pipelines atop a shared backend:
- Zero-Shot Architecture Synthesis: Uses Few-Shot Architecture Prompting (FSAP) to produce executable neural architectures from natural language task descriptions, reference code blocks, and dataset metadata. Hash-based deduplication ensures diversity and reduces redundant experiments.
- Hyperparameter Recommendation: Predicts optimal hyperparameters directly from model code and task context using a structured prompt schema, achieving root mean square error (RMSE) $0.60$ on the LEMUR corpus versus $0.64$ for Optuna.
- Code-Aware Accuracy and Early-Stop Prediction: Employs a DeepSeek-Coder-1.3B backbone fine-tuned with QLoRA to regress final accuracy and early-stop epoch from raw code and early validation metrics, reaching an RMSE of approximately 0.145.
- Retrieval-Augmented Block Synthesis (NN-RAG): Queries a curated SQLite index of 1,289 scope-closed PyTorch blocks for plug-and-play architectures, with 73% executability on randomized insertions. The backend extracts dependency closures and assembles modules respecting Python scope and import order.
- Reinforcement Learning (RL) with Closed-Loop Generation: Utilizes a policy-gradient–trained LLM as a policy to generate novel architectures, with reward aggregating syntactic validity, runtime feasibility, and short-run accuracy, and includes channel-mutation of architectures via torch.fx (Kochnev et al., 25 Nov 2025).
These pipelines operate within a unified seven-stage backbone (retrieval, config, prompt assembly, LLM generation, validation/execution, LoRA fine-tuning, logging), enabling continuous, data-driven improvement.
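To make the RL reward concrete, the following sketches a composite reward aggregating the three signals named above: syntactic validity, runtime feasibility, and short-run accuracy. The weights and the gated aggregation rule are assumptions for illustration, not taken from the paper:

```python
def composite_reward(is_valid_syntax: bool, runs_without_error: bool,
                     short_run_accuracy: float,
                     w_syntax: float = 0.2, w_runtime: float = 0.2,
                     w_accuracy: float = 0.6) -> float:
    """Aggregate syntactic validity, runtime feasibility, and short-run accuracy.

    Hard gating: a candidate that fails to parse earns nothing, and one that
    crashes at runtime earns no accuracy credit. Weights are illustrative.
    """
    if not is_valid_syntax:
        return 0.0
    reward = w_syntax
    if not runs_without_error:
        return reward
    reward += w_runtime + w_accuracy * short_run_accuracy
    return reward

print(composite_reward(True, True, 0.85))   # ≈ 0.91
print(composite_reward(True, False, 0.85))  # 0.2 — valid code, but crashes
```

A scalar reward of this shape is what a policy-gradient update over the generator LLM would consume.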
3. Validation, Fine-Tuning, and Dataset Management
All NNGPT outputs and experiments are validated, logged, and reprocessed using the LEMUR dataset—a large, auditable corpus of executable neural networks and unified preprocessing code. The system:
- Retrieves exemplars for contextual prompting and in-context learning.
- Fine-tunes on accepted (input prompt, code, and metrics) pairs, employing LoRA adapters for rapid, parameter-efficient transfer.
- Fine-tunes on (code, early metrics) to (final accuracy, optimal stop epoch), improving early-termination heuristics.
- Performs rigorous hash-based deduplication (via MD5) on whitespace-normalized code, saving an estimated 200–300 GPU-hours on synthetic architecture generation.
- Logs all results for reproducible, queryable benchmarking (Kochnev et al., 25 Nov 2025).
This design guarantees that all self-improvement cycles are traceable and auditable.
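The hash-based deduplication described above can be sketched as follows. The whitespace normalization here is a simple assumption; the paper states only that MD5 is applied to whitespace-normalized code:

```python
import hashlib

def code_fingerprint(source: str) -> str:
    """MD5 over whitespace-normalized source, so candidates that differ
    only in spacing or blank lines collapse to a single fingerprint."""
    normalized = " ".join(source.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

seen = set()

def is_duplicate(source: str) -> bool:
    """Record the fingerprint and report whether it was already seen."""
    fp = code_fingerprint(source)
    if fp in seen:
        return True
    seen.add(fp)
    return False

a = "class Net:\n    pass"
b = "class Net:  \n\n    pass"   # same code, different whitespace
print(is_duplicate(a))  # False (first occurrence)
print(is_duplicate(b))  # True  (whitespace-only variant)
```

Skipping whitespace-only variants before training is what yields the GPU-hour savings cited above.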
4. Key Algorithms and Pseudocode
A representative end-to-end run is encapsulated by the following pseudocode:
```text
procedure NNGPT_Run(task P, budget T):
    R ← LEMUR.query(task=P)
    config ← build_config(task=P, budget=T, context=R)
    prompt ← assemble_prompt(config)
    C_raw ← LLM.generate(prompt)
    if not Validator.check(C_raw):
        C_raw ← LLM.regenerate(prompt_with_errors)
    C ← Validator.fix_and_parse(C_raw)
    (m_1:T, u) ← Trainer.execute(C, T)
    log_run(P, R, C, m_1:T, u)
    if early_stop_pred:
        (acc_hat, t_hat) ← Predictor.predict(C, m_1:t_0)
        if acc_hat < threshold: terminate_run()
    if LoRA_update_condition:
        G_θ ← LoRA.finetune(G_θ, recent_successes)
        H_φ ← retrain_predictor(H_φ, all_logged)
    return final_metrics(m_1:T)
```
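The early-termination branch can be illustrated with a stand-in predictor. The real predictor H_φ is a fine-tuned DeepSeek-Coder model regressing on code and metrics; this toy version merely extrapolates the early validation curve to show the control flow:

```python
def should_terminate(early_val_acc: list, threshold: float = 0.5) -> bool:
    """Toy stand-in for Predictor.predict: project final accuracy as the best
    early-epoch validation accuracy plus the latest improvement, and flag
    low-confidence runs that fall below the threshold. Purely illustrative."""
    if len(early_val_acc) < 2:
        return False  # not enough evidence to stop yet
    trend = early_val_acc[-1] - early_val_acc[-2]
    predicted_final = min(1.0, max(early_val_acc) + max(trend, 0.0))
    return predicted_final < threshold

print(should_terminate([0.10, 0.12, 0.13]))  # True: projected to stay below 0.5
print(should_terminate([0.35, 0.48, 0.55]))  # False: projected above 0.5
```

In the full pipeline the freed budget from terminated runs is reallocated to more promising candidates.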
5. Quantitative Evaluation and Comparison
NNGPT is empirically benchmarked on the LEMUR dataset and diverse computer vision tasks. Notable results include:
| Task/Pipeline | Metric / Result | Baseline Comparison |
|---|---|---|
| Zero-shot synthesis | Balanced mean acc. 53.1% | Baseline: 51.5% |
| Hyperparameter recommendation | RMSE ≈ 0.60 | Optuna: 0.64 |
| Code accuracy pred. | RMSE ≈ 0.145 | - |
| NN-RAG executability | 941/1,289 (73%) | - |
| RL one-epoch (MNIST) | 0.9876 avg. accuracy | AlexNet: 0.7088 |
This suggests that LLM-driven pipelines can match or outperform classical search-based AutoML in accuracy and efficiency, achieving similar results with substantially fewer trials and lower compute cost. NNGPT's one-shot predictions, RL-based architecture search, and retrieval-augmented synthesis represent substantial compute savings over >20K trial-based approaches (Kochnev et al., 25 Nov 2025).
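For reference, the RMSE figures reported in the table follow the standard definition; a minimal helper:

```python
import math

def rmse(predicted: list, actual: list) -> float:
    """Root mean square error between predicted and observed values."""
    assert predicted and len(predicted) == len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

# Hypothetical predicted vs. observed accuracies, just to show the computation.
print(rmse([0.7, 0.5, 0.9], [0.6, 0.5, 0.8]))  # ≈ 0.0816
```

Lower RMSE means the recommender's or predictor's outputs sit closer to the observed outcomes, which is the basis of the comparison against Optuna above.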
6. Design Considerations, Trade-Offs, and Future Directions
NNGPT's efficacy and generality depend on architectural, prompt, and backend choices:
- Performance degrades when prompt context exceeds 3–4 examples due to LLM context length limits; retrieval-augmented construction may mitigate this bottleneck.
- Prolonged LoRA fine-tuning on hyperparameter prompts leads to overfitting; early-stopping and strong regularization are required.
- Correlation in early-stop prediction varies by task; model uncertainty quantification is needed for robust early termination.
- Current scaling is limited to single 24 GB GPU nodes; large-scale extensions (e.g., ImageNet-1k, segmentation) necessitate distributed orchestration.
- Potential research extensions: hardware-constrained architecture search via prompt-objective integration, multi-objective RL pipelines, expanded retrieval of user-supplied modules, and architectural novelty enforcement.
NNGPT provides a reproducible, framework-agnostic foundation for generative AutoML workflows, demonstrating the capability of LLMs to autonomously generate, validate, predict, and improve neural network designs in a rigorously closed loop (Kochnev et al., 25 Nov 2025).