InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

Published 22 May 2025 in cs.AI, cs.CL, and cs.CV | (2505.16938v3)

Abstract: AI is accelerating the transformation of scientific research paradigms, not only enhancing research efficiency but also driving innovation. We introduce InternAgent, a unified closed-loop multi-agent framework to conduct Autonomous Scientific Research (ASR) across various scientific research fields, enabling researchers to tackle complicated problems in these fields with unprecedented speed and precision. InternAgent highlights three key advantages: 1) Scalability: InternAgent has demonstrated its versatility across 12 scientific research tasks, capable of generating innovative ideas to enhance the performance of baseline code. 2) Interactivity: InternAgent provides an interface for human expert feedback and multi-agent interaction in automated end-to-end processes, allowing for the seamless integration of domain expert knowledge. 3) Efficiency: InternAgent has achieved promising performance gains in several scientific fields with significantly less time cost compared to human efforts. For instance, in reaction yield prediction, it increased from 27.6% to 35.4% in just 12 hours; in enhancer activity prediction, accuracy rose from 0.65 to 0.79 with only 4 hours of processing; and in 2D semantic segmentation, precision advanced from 78.8% to 81.0% in a mere 30 hours.


Summary

The paper introduces InternAgent, a closed-loop framework that automates scientific research from hypothesis generation to verification.
It employs specialized agents for literature survey, code analysis, idea innovation, and assessment to construct detailed research methodologies.
Experimental results across 12 tasks demonstrate improved performance and stability, outperforming existing systems like Dolphin.

InternAgent: An Autonomous Scientific Research Framework

The paper "InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification" (2505.16938) introduces InternAgent, a unified closed-loop multi-agent framework designed for Autonomous Scientific Research (ASR) across various scientific domains. InternAgent automates the entire research cycle, including idea generation, methodology construction, experiment execution, and results feedback.

Key Components and Capabilities

InternAgent comprises four primary modules that facilitate autonomous scientific research:

Self-Evolving Idea Generation: This module employs specialized agents, including a Survey Agent, Code Review Agent, Idea Innovation Agent, and Assessment Agent, to generate and refine research ideas autonomously.
Human-Interactive Feedback: Integrates human insights to guide agent behavior, ensuring alignment with complex user requirements and practical applicability.
Idea-to-Methodology Construction: Systematically translates research ideas into concrete, implementable methodologies.
Multi-Round Experiment Planning and Execution: Designs experimental plans and decomposes the experimental process to validate the effectiveness of InternAgent-generated modules.

Figure 1: The InternAgent pipeline, illustrating the flow of information and interactions between its core modules.

The Survey Agent adaptively aligns with user-specified requirements, employing literature review and deep research modes to explore existing methodologies. The relevance of each retrieved document is modeled as a function $R: \mathcal{L}_{abs} \times \mathcal{T} \rightarrow [0, 1]$, where $\mathcal{L}_{abs}$ denotes the abstracts of the retrieved literature $\mathcal{L}$, $\mathcal{T}$ denotes the task descriptions, and $R(l, t)$ measures the relevance of a document $l$ to the task $t$.
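
To make the shape of $R$ concrete, here is a minimal sketch of a scorer that maps an abstract and a task description to a value in $[0, 1]$. It is not the paper's method: the Jaccard token overlap is a toy stand-in for whatever model-based judgment the Survey Agent uses, and the names `tokenize`, `relevance`, `select_relevant`, and the `threshold` parameter are assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(abstract: str, task: str) -> float:
    """Toy stand-in for R(l, t): Jaccard overlap of token sets, in [0, 1]."""
    a, t = tokenize(abstract), tokenize(task)
    if not a or not t:
        return 0.0
    return len(a & t) / len(a | t)


def select_relevant(abstracts: list[str], task: str, threshold: float = 0.1) -> list[str]:
    """Keep abstracts scoring at or above a hypothetical threshold, best first."""
    scored = [(relevance(a, task), a) for a in abstracts]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return [a for _, a in sorted(kept, key=lambda pair: pair[0], reverse=True)]
```

Whatever scorer replaces the toy overlap, keeping the output in $[0, 1]$ lets a single threshold or a top-k cut decide which documents are read in full.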

The Code Review Agent analyzes code structures, dependencies, and functionalities to identify potential enhancements. It manages user-provided code and searches for relevant codebases, performing thorough analyses at both the repository and file levels.

The Idea Innovation Agent generates and evolves ideas, balancing creativity and rigor. The idea generation process is represented by the function $G: (\mathcal{T}, \mathcal{B}, \mathcal{L}) \rightarrow \mathcal{I}$, where $\mathcal{B}$ denotes the analysis of baseline methods and $\mathcal{I}$ is the set of generated ideas.

The Assessment Agent evaluates ideas using multidimensional scoring, considering coherence, credibility, verifiability, novelty, and alignment.
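
A minimal sketch of how scores on these five dimensions could be combined into a single ranking signal follows. The 0 to 10 scale, the equal default weights, and the names `IdeaScores`, `aggregate`, and `rank` are assumptions for illustration; the summary does not state how the Assessment Agent aggregates its scores.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class IdeaScores:
    """One score per dimension named in the text; the 0-10 scale is an assumption."""
    coherence: float
    credibility: float
    verifiability: float
    novelty: float
    alignment: float


def aggregate(scores: IdeaScores, weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean over the five dimensions (equal weights unless given)."""
    names = [f.name for f in fields(scores)]
    weights = weights or {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return sum(weights[name] * getattr(scores, name) for name in names) / total


def rank(ideas: dict[str, IdeaScores]) -> list[str]:
    """Return idea names ordered from highest to lowest aggregate score."""
    return sorted(ideas, key=lambda name: aggregate(ideas[name]), reverse=True)
```

Keeping the per-dimension scores separate until the final step makes it possible to reweight, for example toward verifiability, without rescoring the ideas.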

Human-interactive feedback refines ideas based on insights and critiques, with the iterative process facilitating continuous improvement.

The Orchestration Agent coordinates all other agents, synchronizing tasks and managing data flow.

Figure 2: The idea evolution process, showcasing the iterative refinement of research concepts through multiple stages.
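
The iterative refinement described above can be sketched as a generate, score, and select loop. This is an illustrative reading rather than the paper's algorithm: `refine` and `score` are hypothetical stand-ins for the Idea Innovation Agent and the Assessment Agent, and the keep-the-best-k selection rule and the `rounds` and `keep` parameters are assumptions.

```python
from typing import Callable, List


def evolve_ideas(
    seeds: List[str],
    refine: Callable[[str], List[str]],
    score: Callable[[str], float],
    rounds: int = 2,
    keep: int = 2,
) -> List[str]:
    """Iteratively expand ideas with `refine` and keep the `keep` best by `score`.

    `refine` and `score` are hypothetical stand-ins for the Idea Innovation
    Agent and the Assessment Agent; the keep-top-k rule is an assumption.
    """
    population = list(seeds)
    for _ in range(rounds):
        # Parents stay in the pool so a refinement must beat them to survive.
        candidates = population + [v for idea in population for v in refine(idea)]
        unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
        population = sorted(unique, key=score, reverse=True)[:keep]
    return population
```

Human feedback fits into such a loop as an extra candidate source or as an adjustment to `score` between rounds.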

The Methodology Development Agent constructs basic method structures by integrating ideas with baseline code and relevant literature. The transformation function is represented as $T: \mathcal{I} \times \mathcal{T} \times \mathcal{B} \times \mathcal{L} \rightarrow \mathcal{M}$, where $\mathcal{I}$ denotes research ideas, $\mathcal{T}$ includes task descriptions, $\mathcal{B}$ represents baseline methods, $\mathcal{L}$ is the literature corpus, and $\mathcal{M}$ is the resulting methodological framework.
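
Read together with the idea generation function $G$, the two mappings compose into a single path from the task inputs to a methodology. The display below is a sketch of that composition using the summary's own symbols; treating each calligraphic symbol as a value as well as a set follows the notation above and is an assumption about the intended reading.

$$
\mathcal{I} = G(\mathcal{T}, \mathcal{B}, \mathcal{L}), \qquad \mathcal{M} = T(\mathcal{I}, \mathcal{T}, \mathcal{B}, \mathcal{L}) = T\big(G(\mathcal{T}, \mathcal{B}, \mathcal{L}), \mathcal{T}, \mathcal{B}, \mathcal{L}\big)
$$

Under this reading, the task description, baseline analysis, and literature influence the methodology twice: once through the ideas they produce and once directly as arguments of $T$.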

The exception-guided debugging framework converts abstract, textual methodology descriptions into executable code. The coder module uses the Aider coding assistant for single-file tasks and the OpenHands framework for complex repository-level code.
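
A minimal sketch of an exception-guided loop is shown below: run the candidate code, and if it raises, hand the source and the traceback to a repair step before retrying. This is a generic illustration, not the Aider or OpenHands API; `run_with_repair`, the `repair` callable (a stand-in for an LLM-backed coder), and `max_rounds` are hypothetical names.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_with_repair(
    source: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[Optional[dict], str]:
    """Execute `source`; on an exception, pass the code and traceback to `repair`.

    `repair` is a hypothetical stand-in for an LLM-backed coder that returns
    revised source. Returns (namespace, final_source); namespace is None if
    the code still fails after `max_rounds` repairs.
    """
    for attempt in range(max_rounds + 1):
        namespace: dict = {}
        try:
            exec(source, namespace)  # run the candidate implementation
            return namespace, source
        except Exception:
            if attempt == max_rounds:
                break  # repair budget exhausted
            error_report = traceback.format_exc()  # the exception guides the next fix
            source = repair(source, error_report)
    return None, source
```

For example, a toy `repair` that prepends a missing variable definition turns the failing snippet `result = totl + 1` into one that runs. Note that `exec` runs arbitrary code in the current process, so a real system would execute candidates in a sandbox or subprocess.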

Experimental Validation and Results

InternAgent has been validated across 12 scientific research tasks, demonstrating its versatility and effectiveness. The tasks span multiple modalities, including science, time series, natural language, image, and point cloud.

The experiments indicate that the self-evolving idea generation process lets InternAgent produce stronger ideas in each domain and implement them automatically. For example, in Reaction Yield Prediction, methods proposed by InternAgent outperform those proposed by Dolphin by a wide margin (+3.6 in maximum performance).

The paper reports runtime statistics for all 12 tasks, covering training cost (GPU hours) and monetary cost for both the idea generation stage (self-evolving idea generation and idea-to-methodology construction) and the code execution and debugging stage. Generating one idea costs about \$0.6 with GPT-4o, which the authors consider cost-efficient.

The Survey Agent searches for domain-related papers, automatically selects the most relevant literature to read, and extracts task-related information. In deep research mode, it also searches for literature on the specific technical terms used in the generated ideas.

The results show that InternAgent improves both the performance and the stability of the results, which further reflects the quality of its ideas and code implementations.

Figure 3: The InternAgent software platform, highlighting its various functionalities and user interfaces for scientific research tasks.

Comparison with Existing Systems

InternAgent is compared with existing auto-research systems such as Dolphin (Yuan et al., 7 Jan 2025) and AI-Scientist-V2 (Yamada et al., 10 Apr 2025).

InternAgent achieves a higher performance improvement rate than Dolphin. This is attributed to the idea-to-methodology feature, which makes high-level ideas concrete and integrates submodules into the baseline code progressively.

InternAgent supports repo-level tasks such as AutoPCDet, AutoVLM, and AutoTPPR, and outperforms the baselines on these tasks. For example, on Auto2DSeg, the InternAgent pipeline improves the DeepLabV3Plus baseline (Chen et al., 2018) from 78.8% to 81.0%. This is attributed to the detailed methodology, the code comprehension achieved by the Code Review Agent, and the auto-exploration ability of the coder agent.

Human evaluations comparing the novelty of ideas generated by InternAgent and AI-Scientist-V2 (Yamada et al., 10 Apr 2025) across various research tasks show InternAgent outperforming AI-Scientist-V2 in all aspects, especially in overall rating and soundness.

Implications and Future Directions

The development of InternAgent represents a significant step toward automating scientific research. Key technical challenges that need to be addressed in the future include:

Knowledge Retrieval: Establishing connections and relationships between papers.
Knowledge Understanding and Representation: Utilizing VLM/LLM to accurately analyze relevant academic papers.
Agent Capability Enhancement: Improving the ability of AI systems to autonomously perform complex tasks in scientific research.
Scientific Discovery-related Benchmark Construction: Evaluating the value that an idea can bring, rather than simply evaluating its novelty.

Conclusion

InternAgent is a promising framework for automating scientific research across diverse domains. By integrating self-evolving idea generation, human-interactive feedback, idea-to-methodology construction, and multi-round experiment planning and execution, InternAgent facilitates the entire research cycle from hypothesis generation to verification. The framework's modular design and extensive experimental validation demonstrate its potential to accelerate scientific discovery and reduce the dependence on human effort in scientific research.
