
AIOS: LLM Agent Operating System

Published 25 Mar 2024 in cs.OS, cs.AI, and cs.CL | arXiv:2403.16971v5

Abstract: LLM-based intelligent agents face significant deployment challenges, particularly related to resource management. Allowing unrestricted access to LLM or tool resources can lead to inefficient or even potentially harmful resource allocation and utilization for agents. Furthermore, the absence of proper scheduling and resource management mechanisms in current agent designs hinders concurrent processing and limits overall system efficiency. To address these challenges, this paper proposes the architecture of AIOS (LLM-based AI Agent Operating System) under the context of managing LLM-based agents. It introduces a novel architecture for serving LLM-based agents by isolating resources and LLM-specific services from agent applications into an AIOS kernel. This AIOS kernel provides fundamental services (e.g., scheduling, context management, memory management, storage management, access control) for runtime agents. To enhance usability, AIOS also includes an AIOS SDK, a comprehensive suite of APIs designed for utilizing functionalities provided by the AIOS kernel. Experimental results demonstrate that using AIOS can achieve up to 2.1x faster execution for serving agents built by various agent frameworks. The source code is available at https://github.com/agiresearch/AIOS.


Summary

  • The paper introduces a novel AIOS that isolates LLM-specific services from traditional OS functions, enhancing resource allocation.
  • It employs classic scheduling algorithms, advanced context management, and an adaptive K-LRU memory policy to manage execution efficiently.
  • Empirical results demonstrate up to 2.1× faster execution and improved throughput, underscoring the system’s scalability for LLM agents.


Introduction

The paper "AIOS: LLM Agent Operating System" introduces the architecture of an LLM-based AI Agent Operating System (AIOS), specifically targeting the resource management challenges associated with deploying LLM-based intelligent agents. The primary objective of AIOS is to isolate resources and LLM-specific services from agent applications into an AIOS kernel, thereby improving resource allocation efficiency and enabling proper scheduling for concurrent processing. This operating system is designed to serve LLM-based agents by providing core functionalities such as scheduling, context management, memory management, storage management, and access control. AIOS also includes an SDK with a comprehensive API suite that streamlines the use of AIOS kernel functionalities (Figure 1).

Figure 1: A motivating example of how an agent (i.e., a travel agent) requires both LLM-related and non-LLM-related (i.e., OS) services to complete a task, where red denotes LLM-related services and blue denotes non-LLM-related services.

Architecture of AIOS

The architecture of AIOS is structured into three distinct layers: the application layer, the kernel layer, and the hardware layer.

  1. Application Layer: This layer facilitates the design and development of agent applications. It provides interfaces via the AIOS SDK to request system resources, thereby abstracting the complexities involved in direct resource manipulation and ensuring system isolation (Figure 2).

    Figure 2: An overview of the AIOS architecture of distinct layers. Application layer facilitates the design and development of agent applications. Kernel layer manages core functionalities and resources to serve agent applications. Hardware layer controls and manages physical computing resources and devices to support kernel layer functionalities.

  2. Kernel Layer: The kernel layer encompasses the traditional OS kernel for non-LLM tasks and the AIOS kernel that handles LLM-specific functionalities. Within the AIOS kernel, different modules manage agent queries by decomposing them into execution units that the scheduler orchestrates. This layer provides specialized modules for LLM processing, memory and storage management, and tool usage, with features like context management for handling interruptions efficiently.
  3. Hardware Layer: This controls and manages physical computing resources and devices to support the functionalities of the kernel layer. Although it is less of a focus in AIOS, its efficient management remains crucial for overall system performance.

Kernel Implementation

Scheduler and Context Manager

The scheduler in AIOS centralizes all requests and dispatches them to the appropriate modules; it handles system calls with classic algorithms such as FIFO and Round Robin (RR) to balance resource distribution (Figure 3).

Figure 3: How agent queries are decomposed into AIOS system calls and how AIOS system calls are dispatched and scheduled. We omit the access manager module here as the access-related system calls will not be dispatched by the scheduler.
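The FIFO and Round Robin dispatch described above can be sketched as follows. This is a minimal illustration, not the AIOS API: the `SystemCall` class and its abstract work-unit model are hypothetical simplifications.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class SystemCall:
    """A hypothetical execution unit decomposed from an agent query."""
    agent: str
    name: str
    remaining: int  # abstract units of work left

def fifo(calls):
    """Run each call to completion in arrival order; return completion order."""
    queue, order = deque(calls), []
    while queue:
        call = queue.popleft()
        call.remaining = 0
        order.append(call.name)
    return order

def round_robin(calls, quantum=1):
    """Give each call one quantum per turn; preempt and requeue unfinished calls."""
    queue, order = deque(calls), []
    while queue:
        call = queue.popleft()
        call.remaining -= quantum
        if call.remaining > 0:
            queue.append(call)  # preempted: back of the queue
        else:
            order.append(call.name)
    return order
```

Under RR, a short call (e.g., a quick memory read) finishes ahead of a long-running LLM generation that arrived earlier, which is the fairness property the scheduler exploits when serving many concurrent agents.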

The context manager supports task interruption and resumption via snapshot and restoration processes, managing long-running system calls by preserving intermediate states using text-based and logits-based methods (Figure 4).

Figure 4: Illustration of the logits-based context snapshot and restoration process. We use beam search algorithm where beam width is set to 1 as an example.
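The snapshot/restore cycle for an interruptible decoding loop can be sketched as below. Everything here is illustrative: `decode_step` is a toy deterministic stand-in for picking the argmax token from the model's logits (beam width 1, as in Figure 4), and the class names are not the AIOS interface.

```python
class ContextManager:
    """Minimal sketch of snapshotting partial decoding state (beam width 1)."""

    def __init__(self):
        self.snapshots = {}

    def snapshot(self, call_id, tokens):
        # Preserve the partial token sequence so decoding can resume later.
        self.snapshots[call_id] = list(tokens)

    def restore(self, call_id):
        return list(self.snapshots.pop(call_id))

def decode_step(tokens):
    # Toy "model": next token id is the current length (stand-in for argmax over logits).
    return len(tokens)

def generate(cm, call_id, tokens, budget, target_len):
    """Decode until target_len or until the time budget runs out; snapshot if interrupted."""
    for _ in range(budget):
        if len(tokens) >= target_len:
            return tokens, True          # finished within budget
        tokens.append(decode_step(tokens))
    cm.snapshot(call_id, tokens)         # interrupted: save intermediate state
    return tokens, False
```

The point of the logits-based approach is that an interrupted generation resumes from its saved intermediate state rather than re-decoding from the start, which is what makes preemptive scheduling of long LLM calls affordable.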

Memory and Storage Management

The memory manager handles runtime data storage and retrieval, ensuring efficient resource usage through a K-LRU eviction policy that swaps less recently accessed data to storage when necessary (Figure 5).

Figure 5: Illustration of the memory and storage managers and their relationship. An agent's memory item in its memory block is evicted to storage if memory usage exceeds the memory limit, which defaults to 80% of the memory block size. This threshold is configurable through the AIOS configuration.
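A K-LRU policy with the 80% threshold described above can be sketched as follows. This is an assumption-laden illustration: sizes are modeled as item counts, and the class and method names are hypothetical, not the AIOS memory manager's API. The victim is the key whose K-th most recent access is oldest, with keys accessed fewer than K times evicted first.

```python
class MemoryManager:
    """Sketch of K-LRU eviction with an 80% usage threshold (illustrative names)."""

    def __init__(self, capacity, k=2, threshold=0.8):
        self.k = k
        self.limit = int(capacity * threshold)  # e.g. 80% of the memory block
        self.clock = 0                          # logical access clock
        self.memory = {}                        # key -> value (memory block)
        self.storage = {}                       # key -> value (evicted to storage)
        self.history = {}                       # key -> last k access times

    def _touch(self, key):
        self.clock += 1
        hist = self.history.setdefault(key, [])
        hist.append(self.clock)
        if len(hist) > self.k:
            hist.pop(0)

    def write(self, key, value):
        self.memory[key] = value
        self._touch(key)
        while len(self.memory) > self.limit:
            self._evict()

    def read(self, key):
        if key in self.storage:                 # swap back in on access
            self.write(key, self.storage.pop(key))
        self._touch(key)
        return self.memory[key]

    def _evict(self):
        # K-distance: time of the k-th most recent access; 0 if fewer than k accesses.
        def k_distance(key):
            hist = self.history[key]
            return hist[0] if len(hist) == self.k else 0
        victim = min(self.memory, key=k_distance)
        self.storage[victim] = self.memory.pop(victim)  # evict to storage
```

Compared with plain LRU, tracking the last K accesses keeps frequently reused items (e.g., an agent's working context) resident even if a burst of one-off reads would otherwise push them out.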

The storage manager handles persistent data storage, supporting operations such as versioning and rollback to preserve data integrity as agents evolve their stored state.
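The versioning-and-rollback idea can be sketched minimally as below. The interface is hypothetical, chosen only to illustrate the behavior; AIOS's actual storage manager may expose different operations.

```python
class StorageManager:
    """Sketch of versioned persistent storage with rollback (illustrative interface)."""

    def __init__(self):
        self.versions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        # Each write appends a new version instead of overwriting.
        self.versions.setdefault(key, []).append(value)

    def get(self, key):
        return self.versions[key][-1]  # latest version

    def rollback(self, key, steps=1):
        """Discard the latest `steps` versions and return the restored value."""
        history = self.versions[key]
        if steps >= len(history):
            raise ValueError("cannot roll back past the first version")
        del history[-steps:]
        return history[-1]
```

Keeping the full version history per key is what makes rollback a constant-time pointer move rather than a recovery procedure, at the cost of extra storage.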

Evaluation

AIOS demonstrates its effectiveness through extensive evaluations, achieving up to 2.1× faster execution when serving agents built with various frameworks. The empirical results consistently show AIOS maintaining agent performance on standard benchmarks while significantly improving execution efficiency and system throughput (Figure 6).

Figure 6: Overall execution time and average agent waiting time when agent number increases from 250 to 2000.

Conclusion

AIOS provides an innovative architecture for efficiently managing LLM-based agents by isolating resources and enhancing system functionalities through an AIOS kernel. It facilitates improved scalability, execution efficiency, and resource management for LLM-intensive applications. Future research could explore more advanced scheduling algorithms, optimization of context management, and safety enhancements to further leverage AIOS's potential in various real-world deployments.
