ExecTune: Effective Steering of Black-Box LLMs with Guide Models

This presentation explores ExecTune, a groundbreaking framework for efficiently steering black-box large language models through trainable guide models. The talk examines the Guide-Core Policy architecture, which separates strategy generation from execution, and demonstrates how optimizing for executability rather than advice quality alone yields superior performance and cost savings. Through rigorous theory and empirical validation across mathematical reasoning and code synthesis benchmarks, ExecTune shows how cheaper models guided by specialized policies can match or exceed the performance of much more expensive systems.
Script
Deploying large language models through APIs creates a hidden cost spiral. Every query to a powerful model burns tokens, and for recurring tasks, those inference costs can dwarf the original training budget. What if you could train a small, reusable guide to steer a black-box model so effectively that a cheap core outperforms an expensive one?
The authors decompose agentic reasoning into two distinct roles. A trainable guide proposes structured strategies, while a black-box core executes them. Critically, the paper proves that performance depends on executability: whether the core can reliably parse and act on what the guide generates, not just whether the advice sounds good.
So how do you train a guide for executability?
ExecTune uses a two-stage curriculum. First, a strong teacher generates strategies, but only those that the actual black-box core can execute successfully enter the training set. Then, reinforcement learning refines the guide with structured rewards: strategies must be parseable, must not leak answers, and must improve over core-only baselines without causing failures.
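The two-stage curriculum above can be sketched as follows. This is a hedged illustration under stated assumptions: `teacher`, `core`, `check_solution`, and the specific reward values are hypothetical stand-ins, not the paper's actual components or numbers; only the structure (executability filtering, then a structured RL reward) follows the description.

```python
# Hypothetical sketch of ExecTune's two-stage curriculum. The callables
# and reward magnitudes are illustrative assumptions, not the paper's.

def build_sft_set(tasks, teacher, core, check_solution):
    """Stage 1: keep only teacher strategies the real core can execute."""
    dataset = []
    for task in tasks:
        strategy = teacher(task)                    # strong teacher proposes
        answer = core(task + "\n" + strategy)       # actual black-box core runs it
        if check_solution(task, answer):            # executability filter
            dataset.append((task, strategy))
    return dataset

def reward(strategy, answer, gold, parseable, core_only_correct):
    """Stage 2: structured RL reward over a single guide output."""
    if not parseable:
        return -1.0                  # strategy must be machine-parseable
    if gold in strategy:
        return -1.0                  # strategy must not leak the answer
    correct = (answer == gold)
    if correct and not core_only_correct:
        return 1.0                   # genuine improvement over core alone
    if core_only_correct and not correct:
        return -0.5                  # guide caused a failure: penalized
    return 0.1 if correct else 0.0   # small credit when both succeed
```

Note how the reward encodes all three constraints from the narration (parseability, no answer leakage, improvement without regression), so the guide is optimized against the deployment core rather than against human-judged advice quality.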
The results are striking. On mathematical reasoning, ExecTune pushes accuracy to 93.56%, beating the core alone by over 22 points. On code synthesis, a cheaper core with ExecTune outperforms a model costing 38% more. The framework doesn't just improve performance—it inverts the cost-accuracy tradeoff.
Traditional advisor models fail because they optimize for the wrong objective. Advice that sounds insightful to a human or a different model can be unparseable or harmful to the actual core. ExecTune solves this by training directly against the deployment target, and because the guide is modular, you can adapt it for new domains or unlearn unwanted behaviors without ever touching the black-box weights.
ExecTune proves that the future of efficient language model deployment is not bigger cores, but smarter guides trained for the cores you already have. Visit EmergentMind.com to learn more and create your own videos.