From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery

Published 19 May 2025 in cs.CL | (2505.13259v1)

Abstract: LLMs are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI collaboration. This survey systematically charts this burgeoning field, placing a central focus on the changing roles and escalating capabilities of LLMs in science. Through the lens of the scientific method, we introduce a foundational three-level taxonomy-Tool, Analyst, and Scientist-to delineate their escalating autonomy and evolving responsibilities within the research lifecycle. We further identify pivotal challenges and future research trajectories such as robotic automation, self-improvement, and ethical governance. Overall, this survey provides a conceptual architecture and strategic foresight to navigate and shape the future of AI-driven scientific discovery, fostering both rapid innovation and responsible advancement. Github Repository: https://github.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery.

Abstract PDF Upgrade to Chat

Summary

The paper presents a taxonomy categorizing LLMs as tools, analysts, and scientists to illustrate their evolving roles in scientific discovery.
The paper introduces iterative methodologies and benchmarks (e.g., IdeaBench, DiscoveryWorld) to assess LLM performance in hypothesis generation and data analysis.
The findings highlight challenges in transparency, continuous self-improvement, and ethics, paving the way for future autonomous research advancements.

LLMs in Scientific Discovery: From Automation to Autonomy

This survey paper (2505.13259) examines the transformative role of LLMs in scientific discovery, focusing on their increasing autonomy and impact on research processes. It introduces a taxonomy that classifies LLMs' involvement in scientific tasks based on their level of autonomy: LLM as Tool, LLM as Analyst, and LLM as Scientist. The paper identifies challenges and future research directions, including robotic automation, self-improvement, and ethical governance.

Three Levels of LLM Autonomy

The authors propose a three-tiered taxonomy to categorize the roles of LLMs in scientific discovery based on their level of autonomy.

LLM as Tool (Level 1): LLMs perform specific, well-defined tasks under direct human supervision, such as literature summarization or code generation.
LLM as Analyst (Level 2): LLMs exhibit greater autonomy in processing complex information, conducting analyses, and offering insights with reduced human intervention.
LLM as Scientist (Level 3): LLM-based systems autonomously conduct major research stages, from formulating hypotheses to interpreting results and suggesting new avenues of inquiry.

The paper argues that LLMs are progressing from Level 1 to Level 3, indicating a significant shift towards AI-driven autonomous scientific discovery.

LLM as Tool: Task-Specific Automation

LLMs at Level 1 function as tools, augmenting human researchers by automating discrete tasks within the scientific method.

Figure 1: Stages of the scientific method with corresponding LLM applications and research topics.

These tasks include:

Literature Review and Information Gathering: LLMs assist in literature search, retrieval, and summarization. Recent "Deep Research" products enhance literature web search, organization, and report generation.
Idea Generation and Hypothesis Formulation: LLMs generate novel research ideas and hypotheses, with benchmarks like IdeaBench (Guo et al., 2024) evaluating their capability.
Experiment Planning and Execution: LLMs aid in designing causal discovery experiments and generating code for AI research.
Data Analysis and Organization: LLMs automate data organization, presentation, and analysis, including tabular and chart data processing.
Conclusion and Hypothesis Validation: LLMs provide feedback on claims and conclusions, with research focusing on LLMs as reviewers for AI papers.
Iteration and Refinement: LLMs iteratively refine research hypotheses using theorem provers and agent frameworks.

LLM as Analyst: Enhanced Information Processing

At Level 2, LLMs function as analysts, demonstrating greater autonomy in complex information processing and analytical reasoning. They can independently manage sequences of tasks within boundaries set by human researchers. Examples include:

Automated Machine Learning (AutoML): LLMs are used in automated modeling of machine learning tasks, with frameworks like IMPROVE (Xue et al., 25 Feb 2025) emphasizing iterative refinement.
Data Modeling and Analysis: LLMs assist in statistical data modeling and hypothesis validation, with benchmarks like BLADE (Gu et al., 2024) improving evaluation robustness.
Function Discovery: LLMs enhance function discovery by leveraging prior domain knowledge and incorporating feedback from clustered memory storage.
Natural Science Research: LLMs are applied to autonomous research workflows for natural science discovery, such as in the biomedical domain with BioResearcher (Luo et al., 2024).
General Research: Benchmarks like DiscoveryWorld (Jansen et al., 2024) evaluate LLMs on diverse tasks from different stages of scientific discovery.

Level 3 signifies a leap in autonomy, where LLM-based systems operate as scientists, capable of orchestrating and navigating multiple stages of the scientific discovery process with considerable independence. These systems can formulate hypotheses, plan and execute experiments, analyze data, and propose new research questions. Agent Laboratory (Schmidgall et al., 8 Jan 2025) and AI Scientist (Lu et al., 2024) are examples of systems capable of autonomously producing research outputs.

Iterative refinement involves sophisticated feedback loops that enable not just incremental improvements but also fundamental reassessments of the research trajectory. AI Scientist (v1 and v2) incorporates highly automated internal review and refinement processes. Zochi [Intology2025Zochi] integrates human expertise for macro-level guidance, where feedback can trigger complete re-evaluations of hypotheses or designs.

Challenges and Future Directions

The paper identifies several challenges and future directions for LLM-based scientific discovery:

Fully-Autonomous Research Cycle: Developing LLM-based systems capable of engaging in a truly autonomous research cycle, discerning implications, and identifying subsequent investigations.
Robotic Automation: Integrating LLMs with robotic systems to enable physical laboratory experiments and broaden research in fields requiring physical interaction.
Transparency and Interpretability: Moving towards systems designed for verifiable reasoning and justifiable conclusions, ensuring the AI's internal logic aligns with scientific principles.
Continuous Self-Improvement: Incorporating online reinforcement learning frameworks to enable systems to learn from ongoing engagement and adapt research strategies.
Ethics and Societal Alignment: Embedding ethical constraints directly in scientific AI design frameworks, ensuring advancements serve human well-being and the common good.

Taxonomy of LLM-Based Scientific Discovery

Figure 2: Taxonomy of research works in LLM-based scientific discovery with detailed categorization.

The paper consolidates 90 research works within the scope of the three levels of autonomy (Figure 2).

Conclusion

The survey concludes that LLMs are significantly impacting scientific discovery, evolving from simple automation tools to increasingly autonomous agents. Addressing the identified challenges and pursuing the outlined future directions will be crucial for realizing the full potential of AI as a transformative partner in scientific exploration.

Markdown Report Issue