- The paper presents a comprehensive survey of agentic LLMs, detailing advancements in chain-of-thought, self-reflection, and tool integration.
- It categorizes models into reasoning, acting, and interacting domains, highlighting their interdependence and real-world applications.
- The survey outlines future research directions to address data quality, hallucination, scalable agent behavior, and safety concerns.
Agentic LLMs: A Survey of Reasoning, Acting, and Interacting Models
This paper provides a comprehensive survey of the emerging field of agentic LLMs, defining them as LLMs that can reason, act, and interact (2503.23037). It organizes the literature based on these three core capabilities and discusses applications and future research directions.
Taxonomy of Agentic LLMs
The paper categorizes agentic LLMs into three main areas, reflecting their ability to reason, act, and interact:
- Reasoning: This category focuses on enhancing LLMs' decision-making through improved reasoning, reflection, and information retrieval. Sub-areas include multi-step reasoning, self-reflection, and retrieval augmentation.
- Acting: This category deals with LLMs that can perform actions in the real world, often as assistants. It covers world models, robot/tool integration, and various applications of LLM assistants.
- Interacting: This category explores LLMs in multi-agent systems, focusing on collaborative task-solving and simulating social behaviors. It includes social capabilities, role-based interaction, and open-ended societies.
The authors highlight that these categories are complementary, with advances in one area benefiting others. For instance, retrieval augmentation supports tool use, self-reflection enhances multi-agent collaboration, and reasoning improves all categories.
Reasoning in Agentic LLMs
The survey explores techniques for improving LLM reasoning capabilities, including:
- Chain of Thought (CoT): Prompting LLMs to generate intermediate reasoning steps to solve complex problems. Techniques like "Let's think step by step" have shown significant improvements in performance (2503.23037).
- Self-Consistency: Sampling diverse reasoning paths and selecting the most consistent answer through majority voting to mitigate hallucination (2503.23037).
- Interpreter and Debugger: Reformulating problems in a formal language such as Python so that a specialized system, such as an interpreter, can solve them. Debugger feedback on the generated code further improves code generation (2503.23037).
- Search Tree: Employing external control algorithms to explore a tree of reasoning steps, enabling backtracking and alternative solutions (2503.23037).
- Self-Reflection: LLMs assess and refine their own predictions through prompt-improvement loops, using external memory to store state information (2503.23037).
- Retrieval Augmentation: Augmenting LLMs with external knowledge bases for timely and specialized information retrieval at inference time (2503.23037).
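The self-consistency idea above can be sketched in a few lines: sample several reasoning paths, keep only their final answers, and take a majority vote. This is a minimal illustration, not a real implementation; `sample_reasoning_path` is a hypothetical stand-in for a sampled LLM chain of thought.

```python
from collections import Counter

def sample_reasoning_path(question, seed):
    # Hypothetical stand-in for one sampled chain-of-thought completion;
    # deterministic per seed here purely for illustration.
    fake_final_answers = {0: "42", 1: "42", 2: "41", 3: "42", 4: "7"}
    return fake_final_answers[seed % 5]

def self_consistency(question, n_samples=5):
    # Sample diverse reasoning paths and keep only their final answers.
    answers = [sample_reasoning_path(question, seed) for seed in range(n_samples)]
    # Majority vote: the most frequent final answer wins.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

print(self_consistency("What is 6 * 7?"))
```

In a real system the temperature of the sampler controls path diversity; the vote is over final answers only, so paths that disagree mid-derivation but converge on the same answer still reinforce each other.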
Acting in the World
This section covers LLMs that interact with the world through action models, robots, and tools:
- World Models: Learning surrogate models of the environment to enable sample-efficient training of policies (2503.23037).
- Vision-Language-Action (VLA) Models: Training models on robotic sequences to perform actions in visual scenes based on language prompts. VLA models achieve impressive zero-shot results in complex tasks (2503.23037).
- Robot Planning: Grounding LLMs in robotic affordances by integrating knowledge of the environment and robot capabilities. Techniques like SayCan and Inner Monologue enhance robot planning and interaction (2503.23037).
- Action Tools: Integrating LLMs with external tools through APIs, enabling them to perform tasks like calling search engines or using specialized services. Frameworks like ToolBench and EasyTool facilitate tool calling (2503.23037).
- Assistants: Developing virtual assistants for various applications, including conversational assistance, shopping, flight operations, medical support, and financial trading (2503.23037).
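The tool-calling pattern behind frameworks like ToolBench can be sketched as a registry-plus-dispatch loop: the model emits a structured tool call, the runtime executes it, and the result is fed back into the next model turn. This is a hedged sketch, not any framework's actual API; `fake_llm` and the two registered tools are hypothetical stand-ins.

```python
import json

# A tiny tool registry, sketching how a runtime exposes external APIs to an LLM.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}

def fake_llm(prompt):
    # Hypothetical stand-in for the LLM: emits a structured tool call as JSON.
    if "capital" in prompt:
        return json.dumps({"tool": "lookup", "argument": "capital_of_france"})
    return json.dumps({"tool": "calculator", "argument": "2 + 3"})

def run_agent(prompt):
    # One act step: the model selects a tool and argument, the runtime
    # dispatches the call, and the result would seed the next model turn.
    call = json.loads(fake_llm(prompt))
    return TOOLS[call["tool"]](call["argument"])

print(run_agent("What is the capital of France?"))  # Paris
```

The key design choice is the structured intermediate format (JSON here): it makes the model's intent machine-checkable before any external API is actually invoked.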
Interacting in Multi-Agent Systems
The survey discusses LLMs in multi-agent simulations, focusing on:
- Social Capabilities: Examining social and interactive abilities in LLMs, such as conversation, etiquette, empathy, strategic behavior, and theory of mind. Benchmarks like GTBench and EgoSocialArena are used to evaluate these capabilities (2503.23037).
- Role-Based Interaction: Assigning LLMs distinct roles to perform tasks in pairs or teams, fostering cooperative or adversarial interactions. Frameworks like CAMEL and Multi-Agent Debate (MAD) are used to study role-based interactions (2503.23037).
- Open-Ended Societies: Simulating large-scale agent societies to study emergent behaviors, social dynamics, and norms. Platforms like Generative Agents and OASIS are used to model social interactions at scale (2503.23037).
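The role-based interaction pattern (as in CAMEL-style setups) can be sketched as two role-prompted agents alternating turns: a "user" agent decomposes the task into instructions and an "assistant" agent carries each one out. The scripted agents below are hypothetical stand-ins for role-prompted LLMs; this is a structural sketch, not CAMEL's actual protocol.

```python
def scripted_agent(role, replies):
    # Hypothetical scripted agent standing in for a role-prompted LLM:
    # each call consumes the next canned reply for that role.
    it = iter(replies)
    def respond(message):
        return f"[{role}] " + next(it)
    return respond

def role_play(user_agent, assistant_agent, task, max_turns=2):
    # Role-based loop: the user agent issues an instruction, the assistant
    # agent responds, and the response conditions the next instruction.
    transcript = [task]
    message = task
    for _ in range(max_turns):
        instruction = user_agent(message)
        transcript.append(instruction)
        message = assistant_agent(instruction)
        transcript.append(message)
    return transcript

user = scripted_agent("user", ["Outline the app.", "Now write the code."])
assistant = scripted_agent("assistant", ["Here is an outline.", "Here is the code."])
for line in role_play(user, assistant, "Task: build a trading assistant."):
    print(line)
```

Cooperative and adversarial settings (e.g., Multi-Agent Debate) share this skeleton; they differ mainly in the role prompts and in how the transcript is judged or terminated.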
Research Agenda
The authors propose a research agenda for agentic LLMs, focusing on:
- Training Data: Finetuning LLMs with inference-time reasoning data and exploring convergent reinforcement learning techniques (2503.23037).
- Hallucination: Addressing hallucination through self-verification, mechanistic interpretability, and open-world models (2503.23037).
- Agent Behavior: Scaling simulation infrastructure, distilling reasoning into small models, and modeling agent and human behavior (2503.23037).
- Self-Reflection: Developing in-model self-reflection mechanisms, exploring metacognition and personality, and automating scientific discovery (2503.23037).
- Safety: Addressing responsibility and liability issues, ensuring privacy and fairness, and expanding application areas for assistants (2503.23037).
Conclusion
Agentic LLMs are a rapidly evolving field with significant potential for various applications. The survey highlights the importance of reasoning, acting, and interacting capabilities and provides a roadmap for future research directions. Key areas for development include generating high-quality training data, mitigating hallucinations, scaling agent behavior, improving self-reflection, and ensuring the safety and ethical use of agentic LLMs.