Reliability of LLM agents for generating correct concurrent code

Determine whether large language model coding agents can reliably generate correct concurrent code for multi-threaded bespoke OLAP database engines, including correct handling of parallelism, synchronization, and NUMA-aware data placement.

Background

The paper (Wehrstein et al., 2026) demonstrates an LLM-guided pipeline that synthesizes workload-specific OLAP engines. The prototype is deliberately restricted to single-threaded, in-memory execution to isolate the effects of specialization from those of parallelism and I/O.

Extending the approach to multi-threaded execution requires the generated engine to handle parallelism, synchronization, and NUMA-aware data placement correctly. The authors explicitly state that whether LLM agents can reliably generate correct concurrent code at this level of complexity remains an open question, which motivates investigating agent capabilities on complex concurrent systems.
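
One way to make "correct concurrent code" concrete is differential testing: run a multi-threaded operator and a single-threaded reference on the same input and require identical results. The following is a minimal sketch of that idea, not taken from the paper. The function names (`aggregate_sequential`, `aggregate_parallel`, `make_rows`, `results_match`) and the synthetic workload are hypothetical. Python threads are used only to keep the example short, so it says nothing about speedup or NUMA placement.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def aggregate_sequential(rows):
    """Single-threaded reference: SUM(value) GROUP BY key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def aggregate_parallel(rows, num_workers=4):
    """Partitioned aggregation (illustrative, not the paper's design).

    Each worker fills a private hash table; the partial tables are merged
    on the calling thread, so no shared state is mutated concurrently.
    """
    # Strided partitioning: worker i receives rows i, i+n, i+2n, ...
    partitions = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(aggregate_sequential, partitions))

    merged = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            merged[key] += total
    return dict(merged)


def make_rows(n, num_keys=7):
    """Deterministic synthetic (key, value) rows; a hypothetical workload."""
    return [(i % num_keys, i) for i in range(n)]


def results_match(n=10_000, num_workers=4):
    """Differential check: parallel result must equal the reference."""
    rows = make_rows(n)
    return aggregate_parallel(rows, num_workers) == aggregate_sequential(rows)
```

Integer sums are used so that exact equality is a valid oracle. With floating-point aggregates, the merge order can change the result, and the comparison would need a tolerance. A passing check of this kind is evidence for one input only: it does not rule out data races that depend on thread scheduling.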

References

Supporting multi-threaded execution introduces additional challenges, including reasoning about parallelism, synchronization, and NUMA-aware data placement, and raises the open question of whether LLM agents can reliably generate correct concurrent code at this level of complexity.

Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines (2603.02001 - Wehrstein et al., 2 Mar 2026) in Section 7: Conclusion and Future Work