- The paper introduces a model-driven framework that uses hierarchical state machines to systematically guide LLM execution and enhance structured reasoning.
- It details mechanisms for integrating domain-specific best practices into policies and belief models to drive accurate decision-making during task execution.
- Evaluations in use cases like code generation and question answering show improved performance and reduced costs compared to direct LLM approaches.
SHERPA: A Model-Driven Framework for LLM Execution
Introduction
SHERPA proposes a structured approach to enhance LLM performance on complex tasks by integrating hierarchical state machines. While LLMs showcase significant capabilities across various domains, they struggle with structured reasoning, especially when context-specific best practices are absent from their training data. SHERPA addresses this limitation by applying model-driven engineering (MDE) principles: it models system behavior explicitly as hierarchical state machines and uses those models to guide LLM execution.
Framework Overview
SHERPA uses hierarchical state machines to provide fine-grained control over LLM behavior. It incorporates domain-specific best practices into the LLM execution process through well-defined states and transitions. This structure supports both rule-based and LLM-driven decision-making, with actions attached to states and transitions that may invoke LLMs or other machine learning components.
Architecture Components
- State Machine: The core of SHERPA's architecture. States represent different phases of task execution, and transitions define progress between states based on external events or policy decisions. Actions attached to states or transitions can invoke LLMs or other tools to perform calculations or retrieve information.
- Policy: Determines the next transition by considering the current state and belief (information relevant to the task). SHERPA supports diverse policy implementations, including rule-based systems and LLM-driven decisions.
- Belief: A structured record of task-related information, including state transitions and execution logs. Belief integrates external inputs and outputs from actions, providing context for decision-making and tracking performance.
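The three components above can be sketched as a minimal, flat state machine; this is an illustrative reconstruction, not SHERPA's actual implementation, and all class and function names here are hypothetical. A hierarchical variant would additionally let a state own a nested sub-machine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Belief:
    # Structured record of task-related information: accumulated facts
    # plus a log of (state, event, action output) for tracking performance.
    facts: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, state: str, event: str, output: str) -> None:
        self.log.append((state, event, output))
        self.facts[event] = output

@dataclass
class Transition:
    target: str                        # next state
    action: Callable[[Belief], str]    # may wrap an LLM call or a deterministic tool

class StateMachine:
    def __init__(self, initial: str, final: str,
                 transitions: Dict[str, Dict[str, Transition]],
                 policy: Callable):
        self.state, self.final = initial, final
        self.transitions = transitions  # state -> {event name: Transition}
        self.policy = policy            # (state, belief, options) -> event name

    def run(self, belief: Belief) -> Belief:
        while self.state != self.final:
            options = self.transitions[self.state]
            event = self.policy(self.state, belief, options)    # pick a transition
            t = options[event]
            belief.record(self.state, event, t.action(belief))  # execute attached action
            self.state = t.target
        return belief

# Usage: a rule-based policy takes the only available event; an LLM-driven
# policy would instead prompt a model with the current state and belief.
policy = lambda state, belief, options: next(iter(options))
sm = StateMachine("classify", "done", {
    "classify": {"classified": Transition("answer", lambda b: "count-question")},
    "answer":   {"answered":   Transition("done",   lambda b: "3 objects")},
}, policy)
belief = sm.run(Belief())
```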
Example Implementation
In the question answering context, SHERPA structures the process into distinct subtasks based on question type. It applies the LLM to classify question types, extract relevant objects from a scene graph, and count or query objects deterministically. This model-driven approach, using state machines, systematically improves reasoning performance by decomposing complex subtasks while recording a trajectory of transitions and executed actions.
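A simplified sketch of this classify-then-route pattern, with the LLM classification and extraction steps replaced by stand-in functions (the scene graph and all names here are invented for illustration):

```python
# Hypothetical scene graph: object type -> list of instances.
scene_graph = {"cat": ["cat1", "cat2"], "ball": ["ball1"]}

def classify_question(question: str) -> str:
    # Stand-in for the LLM-driven question-type classification.
    return "count" if question.lower().startswith("how many") else "query"

def answer(question: str) -> str:
    qtype = classify_question(question)
    # Stand-in for LLM-driven extraction of the relevant object.
    obj = next((name for name in scene_graph if name in question.lower()), None)
    if obj is None:
        return "unknown object"
    if qtype == "count":
        return str(len(scene_graph[obj]))  # deterministic counting
    return ", ".join(scene_graph[obj])     # deterministic lookup
```

Routing counting to deterministic code rather than the LLM is what makes the decomposition pay off: the model only handles the fuzzy steps.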
Use Cases
Code Generation
SHERPA applies hierarchical state machines to replicate human best practices in code generation, such as test-driven development. The state machines structure the LLM interactions iteratively, refining generated code by testing outputs before finalizing, thereby improving the robustness of the generated code.
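The generate-test-refine loop can be sketched as follows; this is a hedged illustration, not SHERPA's code, and `generate` and `run_tests` are hypothetical callables standing in for an LLM call and a test harness:

```python
def generate_with_tests(prompt, generate, run_tests, max_iters=3):
    """Iteratively refine LLM output: generate, test, feed failures back.

    generate(prompt, feedback) stands in for an LLM call; run_tests(code)
    returns (passed, failure_report).
    """
    feedback = None
    for _ in range(max_iters):
        code = generate(prompt, feedback)
        passed, feedback = run_tests(code)
        if passed:
            return code
    return code  # best effort after max_iters refinement rounds
```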
Class Name Generation
For domain model generation, SHERPA addresses the lack of datasets by integrating pattern identification and iterative refinement states, ensuring generated models align with domain-specific practices. The hierarchical decomposition of generation tasks enables more accurate class detection and naming.
Question Answering
SHERPA offers two strategies: the routing state machine (SM), which classifies and processes questions based on their type, and the ReAct SM, which uses a sequential planning approach to interleave reasoning and acting, improving accuracy across diverse question categories.
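The ReAct-style SM can be sketched as a reason-act-observe loop; this is an assumed simplification, and `llm_step` and the tool registry below are hypothetical stand-ins:

```python
def react_loop(question, llm_step, tools, max_steps=5):
    """Sketch of a ReAct-style sequential SM: reason, act, observe, repeat.

    llm_step(question, trace) stands in for an LLM that returns either
    ("act", tool_name, tool_input) or ("finish", answer).
    """
    trace = []  # accumulated (tool, input, observation) history
    for _ in range(max_steps):
        decision = llm_step(question, trace)
        if decision[0] == "finish":
            return decision[1]
        _, tool, arg = decision
        observation = tools[tool](arg)   # acting step: invoke the chosen tool
        trace.append((tool, arg, observation))
    return None  # gave up after max_steps
```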
Evaluation
SHERPA demonstrates improved performance over direct LLM approaches by implementing structured control through hierarchical state machines. Across various LLMs, SHERPA outperformed the direct approach, especially in tasks with established best practices like code generation and class name generation.
Cost Considerations
State machine designs optimized for execution cost achieved significant reductions in the number of LLM calls while maintaining effectiveness and reliability. The separation of SM design from action implementation in SHERPA allows for rapid experimentation and fine-tuning to optimize both performance and cost.
Conclusion
SHERPA showcases the potential of model-driven frameworks to enhance LLM applications by structuring execution processes through hierarchical state machines. By aligning LLM execution with domain-specific best practices, SHERPA effectively manages complex tasks beyond the reach of direct LLM approaches. Future work will explore diverse textual languages for SM definitions, extensions of SHERPA to other domains, and synergies with advanced decision-making techniques such as reinforcement learning.