Man-Made Heuristics Are Dead. Long Live Code Generators!
Abstract: Policy design for various systems controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics for the specific instance in which the policy will be deployed. In this paper, we re-imagine policy design via a novel automated search technique fueled by recent advances in generative models, specifically LLM-driven code generation. We outline the design and implementation of PolicySmith, a framework that applies LLMs to synthesize instance-optimal heuristics. We apply PolicySmith to two long-standing systems policies, web caching and congestion control, highlighting the opportunities opened up by this LLM-driven heuristic search. For caching, PolicySmith discovers heuristics that outperform established baselines on standard open-source traces. For congestion control, we show that PolicySmith can generate safe policies that integrate directly into the Linux kernel.
Explain it Like I'm 14
What is this paper about?
This paper argues that instead of people hand‑crafting simple rules (called “heuristics”) to run computer systems, we can now automatically generate those rules as code with the help of modern AI code generators. The authors built a framework called PolicySmith that uses LLMs to write and improve small chunks of code that make decisions in systems like web caches and Internet traffic control.
The big idea
Rather than trying to find one “best” rule that works everywhere, the paper focuses on finding the “best for this specific situation” rule—what they call an instance‑optimal policy. PolicySmith keeps trying new code, tests it, keeps the best parts, and repeats—so each system can get rules tailored to its workload, hardware, and goals.
What questions are the authors asking?
- Can an AI code generator automatically find good, simple rules (heuristics) for computer systems?
- Can these AI‑written rules beat long‑standing, expert‑made rules in real tasks like web caching and congestion control?
- Can we make the generated code safe, fast, and understandable enough to run even inside the operating system kernel?
How did they do it? (Explained simply)
Think of PolicySmith like a “smart workshop” that trains a team of rule‑making apprentices:
- The Template: This is like a recipe outline. It says what the rule is allowed to do, what information it can see, and the limits (for example, “no slow loops” or “no floating‑point math in the kernel”).
- The Generator: This is the AI that writes candidate rules as code based on the template.
- The Checker: This is the strict reviewer that catches mistakes or disallowed code and gives clear feedback so the AI can fix them.
- The Evaluator: This is a test bench. It runs each candidate rule on realistic data (or a simulator) and gives it a score.
- The Search Loop: It’s like tryouts. Generate many rules, test them, keep the best, use them as examples to inspire better ones next round, and repeat.
In everyday terms: imagine trying lots of cookie recipes. You set some limits (no nuts, must bake in under 10 minutes), the AI proposes recipes, a safety checker rejects bad ones, a taste test scores them, and you then remix the best to get even tastier cookies next time.
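The workshop described above can be sketched as a single loop. This is a minimal, illustrative skeleton of a generate-check-evaluate-select search, not PolicySmith's actual API; the function names (`generate`, `check`, `evaluate`) and the toy "policies" (plain integers standing in for candidate code) are assumptions for the sake of the example.

```python
import random

def search_loop(generate, check, evaluate, rounds=3, keep=2):
    """Generic generate-check-evaluate-select loop.

    generate(exemplars) -> list of candidate policies
    check(candidate)    -> True if the candidate passes static checks
    evaluate(candidate) -> a score (higher is better)
    """
    best = []  # (score, candidate) pairs kept across rounds
    for _ in range(rounds):
        exemplars = [c for _, c in best]             # best-so-far seed the next round
        candidates = generate(exemplars)
        valid = [c for c in candidates if check(c)]  # the Checker rejects bad ones
        scored = [(evaluate(c), c) for c in valid]   # the Evaluator scores the rest
        best = sorted(best + scored, reverse=True)[:keep]
    return best

# Toy demonstration: "policies" are integers, the score is the value itself,
# and the checker only admits even numbers.
random.seed(0)
gen = lambda exemplars: [random.randint(0, 100) for _ in range(5)]
result = search_loop(gen, check=lambda c: c % 2 == 0, evaluate=lambda c: c)
print(result[0][0])  # best even score found across rounds
```

In the real framework, each candidate is a chunk of code filling in the template, the checker is a compiler plus static rules, and the evaluator is a simulator or test bench; the loop structure stays the same.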
Case study 1: Web caching
- What is caching? A cache stores recently used items so the system can fetch them faster next time. When the cache fills up, it needs a rule to decide what to throw out (evict).
- What PolicySmith controls: a small “priority()” function that scores each item; lower‑scoring items get evicted first.
- What info the rule can use: how often something is used, how recently it was used, its size, and summaries of what’s in the cache (like “what counts as a ‘big’ item right now?”). This keeps the rule expressive but still fast.
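To make this concrete, here is a hedged sketch of the *kind* of `priority()` rule the search might produce, combining the signals listed above (frequency, recency, size, and a cache-wide summary). The exact formula and parameter names are illustrative assumptions, not the heuristic PolicySmith actually discovered.

```python
def priority(freq, last_access, size, now, avg_size):
    """Illustrative eviction score: lower means evict sooner.

    freq:        how often the object has been accessed
    last_access: timestamp of the most recent access
    size:        object size in bytes
    now:         current time
    avg_size:    running average object size in the cache
                 (the "what counts as big right now?" summary)
    """
    recency = now - last_access             # seconds since last use
    size_penalty = size / max(avg_size, 1)  # "big" relative to cache contents
    return freq / ((1.0 + recency) * size_penalty)

# A small, hot object outranks a big, stale one, so the stale one is evicted first.
hot = priority(freq=10, last_access=99.0, size=100, now=100.0, avg_size=1000)
cold = priority(freq=1, last_access=10.0, size=5000, now=100.0, avg_size=1000)
print(hot > cold)  # True
```

Because the rule is a single pure function over a few scalars, it stays cheap to evaluate on every eviction while still reacting to the workload through the cache-wide summary.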
Case study 2: Congestion control (in the Linux kernel)
- What is congestion control? It decides how fast to send data over the Internet so connections stay smooth and fair.
- Kernel constraints: The operating system’s kernel is strict—no floating‑point math, limited loops, and safety first. Crashes are not acceptable.
- Safe execution trick: The authors attach the generated logic using eBPF (a safe, verified “mini‑program” system for the kernel). The eBPF verifier acts like a gatekeeper, refusing unsafe code before it can run.
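As a flavor of what a generated policy's logic might look like under these constraints, here is an illustrative AIMD-style (additive-increase, multiplicative-decrease) window update written with integer math only, mirroring the kernel's no-floating-point rule. This is a textbook sketch, not PolicySmith's output, and the real policies are attached as eBPF programs rather than Python.

```python
def on_ack(cwnd, ssthresh, acked):
    """Illustrative window growth on acknowledgment (integer math only)."""
    if cwnd < ssthresh:
        return cwnd + acked               # slow start: grow fast
    return cwnd + max(1, acked // cwnd)   # congestion avoidance: grow slowly

def on_loss(cwnd):
    """Multiplicative decrease on packet loss: halve the window."""
    return max(2, cwnd // 2)

cwnd = 10
cwnd = on_ack(cwnd, ssthresh=64, acked=10)  # slow start: 10 -> 20
cwnd = on_loss(cwnd)                        # loss: 20 -> 10
print(cwnd)
```

The point of the eBPF route is that even if the generator proposes something broken (an unbounded loop, a bad memory access), the verifier rejects it before it ever runs in the kernel.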
What did they find, and why is it important?
Web caching results
- The AI‑generated eviction rules beat many well‑known expert rules on standard, real‑world traces (recorded request patterns) from two datasets (CloudPhysics and MSR).
- A single AI‑found rule, tuned on one trace, often did very well on many other traces in the same dataset—suggesting the rules weren’t just memorizing one pattern.
- When comparing “perfect pickers” (oracles) that always choose the best rule for each trace, adding PolicySmith’s rules raised the upper bound by about 2% over using only classic baselines—meaning there’s real extra performance available.
- Cost and speed: Finding a top rule for one trace took about 5.5 CPU hours of testing and cost under $7 in AI API usage—quite practical.
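The "oracle" comparison above can be illustrated with a toy computation. The hit-rate numbers below are made up purely to show the mechanics; the point is that adding a generated rule to the pool can only raise (never lower) the per-trace best-picker's average.

```python
# Toy hit-rate table (illustrative numbers, NOT the paper's data):
# each policy maps to its hit rate on three traces.
baselines = {
    "LRU": [0.40, 0.55, 0.30],
    "LFU": [0.45, 0.50, 0.28],
}
generated = {
    "PS-1": [0.47, 0.54, 0.33],  # a hypothetical PolicySmith-found rule
}

def oracle_avg(policies):
    """Average hit rate if an oracle picks the best policy for each trace."""
    n_traces = len(next(iter(policies.values())))
    best_per_trace = [max(rates[t] for rates in policies.values())
                      for t in range(n_traces)]
    return sum(best_per_trace) / n_traces

before = oracle_avg(baselines)                  # best classic baseline per trace
after = oracle_avg({**baselines, **generated})  # pool now includes the new rule
print(f"{before:.3f} -> {after:.3f}")
```

In this toy table the generated rule wins on traces 0 and 2, so the oracle average rises; the paper reports an analogous gain of about 2% on real traces.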
Why this matters: Better cache rules mean fewer “misses,” which can make websites and apps feel faster and reduce load on storage systems.
Congestion control results (kernel safety and feasibility)
- Generating safe kernel code is harder: about 63% of candidates compiled cleanly on the first attempt; more compiled after the errors were fed back for fixes (like removing forbidden floating‑point math).
- The successful rules showed a wide range of behaviors—from conservative (lower bandwidth, low delay) to aggressive (higher bandwidth, more delay). This variety proves the search can explore many strategies.
- Key point: It’s feasible to safely generate and test policies even inside the OS kernel with the right checks (eBPF), which is a big step for real‑world adoption.
Why this matters: Smarter, tailored congestion control can mean smoother video calls, faster downloads, and less network lag—without risking system crashes.
What could this change?
- Faster adaptation: Instead of years of manual tweaking, systems could quickly get custom rules for new apps, hardware, or performance goals.
- More transparency than neural nets: The output is human‑readable code (not a tangle of neural network weights), making it easier to inspect, trust, and deploy—especially in safety‑critical places like the kernel.
- A new workflow for engineers: Experts focus on high‑level goals and constraints; the generator explores and proposes the code.
Limitations and next steps
- Knowing when to re‑tune: Systems need good signals to realize “the situation changed” and it’s time to regenerate rules.
- Testing without surprises: Simulators and test harnesses must be realistic, and safety checks must be strong—especially for kernel code.
- Coordinating across components: Real systems have many interacting policies (network, storage, CPU). Future work needs to synthesize rules that play nicely together.
- Developer tools: New tools are needed to prompt, debug, and guide the AI so experts can steer the search effectively.
Takeaway
Hand‑made, one‑size‑fits‑all rules struggle to keep up with today’s fast‑changing systems. PolicySmith shows a practical path to automatically generate small, understandable pieces of code that match each system’s unique needs—and can even run safely inside the operating system. It’s a shift from writing fixed rules to running a smart, repeatable process that discovers the best rule for the job.