STELLA: Self-Evolving LLM Agent for Biomedical Research

Published 1 Jul 2025 in cs.AI, cs.CL, and q-bio.BM | (2507.02004v1)

Abstract: The rapid growth of biomedical data, tools, and literature has created a fragmented research landscape that outpaces human expertise. While AI agents offer a solution, they typically rely on static, manually curated toolsets, limiting their ability to adapt and scale. Here, we introduce STELLA, a self-evolving AI agent designed to overcome these limitations. STELLA employs a multi-agent architecture that autonomously improves its own capabilities through two core mechanisms: an evolving Template Library for reasoning strategies and a dynamic Tool Ocean that expands as a Tool Creation Agent automatically discovers and integrates new bioinformatics tools. This allows STELLA to learn from experience. We demonstrate that STELLA achieves state-of-the-art accuracy on a suite of biomedical benchmarks, scoring approximately 26% on Humanity's Last Exam: Biomedicine, 54% on LAB-Bench: DBQA, and 63% on LAB-Bench: LitQA, outperforming leading models by up to 6 percentage points. More importantly, we show that its performance systematically improves with experience; for instance, its accuracy on the Humanity's Last Exam benchmark almost doubles with increased trials. STELLA represents a significant advance towards AI Agent systems that can learn and grow, dynamically scaling their expertise to accelerate the pace of biomedical discovery.


Summary

The paper introduces STELLA, a self-evolving LLM agent that autonomously enhances biomedical research through continuous learning and dynamic tool integration.
The methodology leverages a Template Library and an expanding Tool Ocean to optimize reasoning strategies and integrate new bioinformatics tools.
Benchmarking results demonstrate that STELLA achieves state-of-the-art accuracy and progressive improvements through iterative test-time evolution.


Introduction to STELLA's Architecture

STELLA is a multi-agent system for AI-driven biomedical research built around a self-evolving architecture. It is designed to adapt and extend its own capabilities as biomedical data and methods grow more diverse and complex. Its specialized agents (a Manager Agent, Dev Agent, Critic Agent, and Tool Creation Agent) work together to support continuous learning and tool integration, so the system can take on new problems without relying on a manually curated toolset.

Figure 1: Overall Framework of STELLA, a self-evolving LLM Agent for Biomedical Research. (A) Overview of STELLA's multi-agent architecture, leveraging key agents such as Manager Agent, Dev Agent, Critic Agent, and Tool Creation Agent. (B and C) Illustration of STELLA's self-evolving mechanisms, highlighting the adaptability of the Template Library and Tool Ocean.

Self-Evolving Mechanisms

STELLA's self-evolution rests on two mechanisms: the Template Library and the Tool Ocean.

Template Library: This component accumulates and optimizes successful reasoning strategies, allowing STELLA to generalize experiences into reusable templates. This mechanism enables the system to apply learned solutions to similar future tasks more efficiently.
Tool Ocean: This repository of bioinformatics tools expands as the Tool Creation Agent automatically discovers and integrates new ones. Because tools are added as they emerge, STELLA is not limited to a static set of functions and can keep pace with a rapidly changing field.
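The two mechanisms above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the `record_success` and `register` methods, the `tool_creation_agent` function, and the placeholder tool it creates are hypothetical and are not taken from the STELLA implementation, which presumably relies on LLM calls rather than simple dictionary lookups.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Template:
    """A reusable reasoning strategy (hypothetical structure)."""
    task_type: str
    steps: List[str]
    successes: int = 0


@dataclass
class TemplateLibrary:
    """Keeps strategies that worked, keyed by task type (an assumption)."""
    templates: Dict[str, Template] = field(default_factory=dict)

    def record_success(self, task_type: str, steps: List[str]) -> Template:
        # Store the strategy the first time it works; later calls only
        # increment the success counter of the stored template.
        tpl = self.templates.get(task_type)
        if tpl is None:
            tpl = Template(task_type=task_type, steps=list(steps))
            self.templates[task_type] = tpl
        tpl.successes += 1
        return tpl

    def lookup(self, task_type: str) -> Optional[Template]:
        return self.templates.get(task_type)


@dataclass
class ToolOcean:
    """A growing registry of callable tools (hypothetical interface)."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def has(self, name: str) -> bool:
        return name in self.tools

    def call(self, name: str, query: str) -> str:
        return self.tools[name](query)


def tool_creation_agent(ocean: ToolOcean, needed: str) -> bool:
    """Stand-in for the Tool Creation Agent.

    Registers a placeholder tool when `needed` is missing and returns
    True if a tool was added. A real agent would discover and wrap an
    actual bioinformatics tool here.
    """
    if ocean.has(needed):
        return False
    ocean.register(needed, lambda query: f"[{needed}] result for {query}")
    return True


# Usage: the ocean grows on demand, and a successful strategy is stored.
library = TemplateLibrary()
ocean = ToolOcean()
tool_creation_agent(ocean, "tool_a")  # "tool_a" is a made-up tool name
library.record_success("example_task", ["plan", "call tool_a", "summarize"])
```

The design choice this sketch highlights is that both stores are append-on-demand: nothing has to be curated in advance, and a later task of the same type can start from the stored template instead of reasoning from scratch.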

Comparative Performance and Benchmarking

STELLA was evaluated on three biomedical benchmarks: Humanity's Last Exam: Biomedicine, LAB-Bench: DBQA, and LAB-Bench: LitQA. It scores approximately 26%, 54%, and 63% on them respectively, which the authors report as state-of-the-art accuracy, outperforming leading models by up to 6 percentage points.

Figure 2: (A) STELLA's benchmark results compared with state-of-the-art LLMs and agents. (B) Demonstration of test-time self-evolving effects showing accuracy improvements with an increasing number of trials.

Test-Time Evolution and Continuous Improvement

A distinctive feature of STELLA is that its performance improves with experience at test time. The authors report that accuracy rises systematically as the number of trials increases; on the Humanity's Last Exam benchmark, for instance, accuracy almost doubles with increased trials. This suggests that STELLA refines its strategies and toolset as it works rather than relying only on what it started with.
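To make the idea of accuracy rising with the number of trials concrete, here is a toy Python simulation. It is a sketch under strong assumptions and not the paper's evaluation protocol: each task is assumed to need exactly one tool, the agent is assumed to gain a fixed number of missing tools after every trial, and the task names, tool names, and resulting numbers are invented for illustration.

```python
from typing import Dict, List, Set


def run_trials(tasks: Dict[str, str], n_trials: int,
               tools_per_trial: int = 1) -> List[float]:
    """Return per-trial accuracy for a toy agent that gains tools over time.

    `tasks` maps a task name to the single tool it needs (hypothetical
    setup). A task counts as solved only if its tool is already known.
    """
    known_tools: Set[str] = set()
    accuracies: List[float] = []
    for _ in range(n_trials):
        solved = 0
        missing: List[str] = []
        for tool in tasks.values():
            if tool in known_tools:
                solved += 1
            elif tool not in missing:
                missing.append(tool)
        accuracies.append(solved / len(tasks))
        # "Evolve" between trials: add a limited number of missing tools.
        known_tools.update(missing[:tools_per_trial])
    return accuracies


# Made-up tasks and tool names, used only to exercise the loop.
toy_tasks = {"q1": "tool_a", "q2": "tool_a", "q3": "tool_b", "q4": "tool_c"}
print(run_trials(toy_tasks, n_trials=4))  # [0.0, 0.5, 0.75, 1.0]
```

The toy curve climbs only because capabilities persist between trials, which is the property the test-time evolution analysis is meant to demonstrate; the shape and values of the real curve are those reported in Figure 2(B), not the ones printed here.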

Conclusion

STELLA shows how an AI system can learn autonomously and integrate new tools on its own, which broadens the efficiency and scope of biomedical research applications. By moving from a static toolset to a dynamic, evolving one, it offers a different model for AI-based scientific inquiry, with the potential to accelerate discovery and to make research agents more adaptable in complex, fast-changing environments. Future work will focus on real-world deployment and on better interfaces for collaboration with human researchers.