- The paper introduces a model-driven framework that uses hierarchical state machines to systematically guide LLM execution and enhance structured reasoning.
- It details mechanisms for integrating domain-specific best practices into policies and belief models to drive accurate decision-making during task execution.
- Evaluations in use cases like code generation and question answering show improved performance and reduced costs compared to direct LLM approaches.
SHERPA: A Model-Driven Framework for LLM Execution
Introduction
SHERPA proposes a structured approach to enhance LLM performance on complex tasks by integrating hierarchical state machines. While LLMs showcase significant capabilities across various domains, they struggle with structured reasoning, especially when context-specific best practices are absent from their training data. SHERPA addresses this limitation by applying model-driven engineering (MDE) principles: it models system behavior explicitly as hierarchical state machines and uses those models to guide LLM execution.
Framework Overview
SHERPA uses hierarchical state machines to provide fine-grained control over LLM behavior. It incorporates domain-specific best practices into the LLM execution process through well-defined states and transitions. This structure supports both rule-based and LLM-driven decision-making, with actions attached to states and transitions that may invoke LLMs or other machine learning components.
Architecture Components
- State Machine: The core of SHERPA's architecture. States represent different phases of task execution, and transitions define progress between states based on external events or policy decisions. Actions attached to states or transitions can invoke LLMs or other tools to perform calculations or retrieve information.
- Policy: Determines the next transition by considering the current state and belief (information relevant to the task). SHERPA supports diverse policy implementations, including rule-based systems and LLM-driven decisions.
- Belief: A structured record of task-related information, including state transitions and execution logs. Belief integrates external inputs and outputs from actions, providing context for decision-making and tracking performance.
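The three components above can be sketched as a minimal, flat state machine; this is an illustrative reconstruction, not SHERPA's actual implementation, and all class and function names here are hypothetical. A hierarchical variant would additionally let a state own a nested sub-machine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Belief:
    # Structured record of task-related information: accumulated facts
    # plus a log of (state, event, action output) for tracking performance.
    facts: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, state: str, event: str, output: str) -> None:
        self.log.append((state, event, output))
        self.facts[event] = output

@dataclass
class Transition:
    target: str                        # next state
    action: Callable[[Belief], str]    # may wrap an LLM call or a deterministic tool

class StateMachine:
    def __init__(self, initial: str, final: str,
                 transitions: Dict[str, Dict[str, Transition]],
                 policy: Callable):
        self.state, self.final = initial, final
        self.transitions = transitions  # state -> {event name: Transition}
        self.policy = policy            # (state, belief, options) -> event name

    def run(self, belief: Belief) -> Belief:
        while self.state != self.final:
            options = self.transitions[self.state]
            event = self.policy(self.state, belief, options)    # pick a transition
            t = options[event]
            belief.record(self.state, event, t.action(belief))  # execute attached action
            self.state = t.target
        return belief

# Usage: a rule-based policy takes the only available event; an LLM-driven
# policy would instead prompt a model with the current state and belief.
policy = lambda state, belief, options: next(iter(options))
sm = StateMachine("classify", "done", {
    "classify": {"classified": Transition("answer", lambda b: "count-question")},
    "answer":   {"answered":   Transition("done",   lambda b: "3 objects")},
}, policy)
belief = sm.run(Belief())
```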
Example Implementation
In the question answering context, SHERPA structures the process into distinct subtasks based on question type. It applies the LLM to classify question types, extract relevant objects from a scene graph, and count or query objects deterministically. This model-driven approach, using state machines, systematically improves reasoning performance by decomposing complex subtasks while recording a trajectory of transitions and executed actions.
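A simplified sketch of this classify-then-route pattern, with the LLM classification and extraction steps replaced by stand-in functions (the scene graph and all names here are invented for illustration):

```python
# Hypothetical scene graph: object type -> list of instances.
scene_graph = {"cat": ["cat1", "cat2"], "ball": ["ball1"]}

def classify_question(question: str) -> str:
    # Stand-in for the LLM-driven question-type classification.
    return "count" if question.lower().startswith("how many") else "query"

def answer(question: str) -> str:
    qtype = classify_question(question)
    # Stand-in for LLM-driven extraction of the relevant object.
    obj = next((name for name in scene_graph if name in question.lower()), None)
    if obj is None:
        return "unknown object"
    if qtype == "count":
        return str(len(scene_graph[obj]))  # deterministic counting
    return ", ".join(scene_graph[obj])     # deterministic lookup
```

Routing counting to deterministic code rather than the LLM is what makes the decomposition pay off: the model only handles the fuzzy steps.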
Use Cases
Code Generation
SHERPA applies hierarchical state machines to replicate human best practices in code generation, such as test-driven development. The state machines structure the LLM interactions iteratively, refining generated code by testing outputs before finalizing, thereby improving the robustness of the generated code.
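The generate-test-refine loop can be sketched as follows; this is a hedged illustration, not SHERPA's code, and `generate` and `run_tests` are hypothetical callables standing in for an LLM call and a test harness:

```python
def generate_with_tests(prompt, generate, run_tests, max_iters=3):
    """Iteratively refine LLM output: generate, test, feed failures back.

    generate(prompt, feedback) stands in for an LLM call; run_tests(code)
    returns (passed, failure_report).
    """
    feedback = None
    for _ in range(max_iters):
        code = generate(prompt, feedback)
        passed, feedback = run_tests(code)
        if passed:
            return code
    return code  # best effort after max_iters refinement rounds
```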
Class Name Generation
For domain model generation, SHERPA addresses the lack of datasets by integrating pattern identification and iterative refinement states, ensuring generated models align with domain-specific practices. The hierarchical decomposition of generation tasks enables more accurate class detection and naming.
Question Answering
SHERPA offers two strategies: the routing state machine (SM), which classifies and processes questions based on their type, and the ReAct SM, which uses a sequential planning approach to interleave reasoning and acting, improving accuracy across diverse question categories.
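The ReAct-style SM can be sketched as a reason-act-observe loop; this is an assumed simplification, and `llm_step` and the tool registry below are hypothetical stand-ins:

```python
def react_loop(question, llm_step, tools, max_steps=5):
    """Sketch of a ReAct-style sequential SM: reason, act, observe, repeat.

    llm_step(question, trace) stands in for an LLM that returns either
    ("act", tool_name, tool_input) or ("finish", answer).
    """
    trace = []  # accumulated (tool, input, observation) history
    for _ in range(max_steps):
        decision = llm_step(question, trace)
        if decision[0] == "finish":
            return decision[1]
        _, tool, arg = decision
        observation = tools[tool](arg)   # acting step: invoke the chosen tool
        trace.append((tool, arg, observation))
    return None  # gave up after max_steps
```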
Evaluation
SHERPA demonstrates improved performance over direct LLM approaches by implementing structured control through hierarchical state machines. Across various LLMs, SHERPA outperformed the direct approach, especially in tasks with established best practices like code generation and class name generation.
Cost Considerations
State machine designs optimized for execution cost achieved significant reductions in the number of LLM calls while maintaining effectiveness and reliability. The separation of SM design from action implementation in SHERPA allows for rapid experimentation and fine-tuning to optimize both performance and cost.
Conclusion
SHERPA showcases the potential of model-driven frameworks to enhance LLM applications by structuring execution processes through hierarchical state machines. By aligning LLM execution with domain-specific best practices, SHERPA effectively manages complex tasks beyond the reach of direct LLM approaches. Future work will explore diverse textual languages for SM definitions, extensions of SHERPA to other domains, and synergies with advanced decision-making techniques such as reinforcement learning.