Assessing Meta-Harness performance across different proposer agents

Determine how the performance and effectiveness of Meta-Harness depend on the choice of coding-agent proposer. Doing so calls for a broader, systematic study across multiple proposer agents (beyond Claude Code) and across diverse task domains, one that quantifies how much outcomes vary.

Background

The paper evaluates Meta-Harness on three domains using a single, particularly strong coding-agent proposer (Claude Code). The results show substantial gains, but the authors explicitly note that the current work does not address how these gains vary across proposer agents, and they defer that question to future study.

References

While we evaluate on three diverse domains, our experiments demonstrate that harness search can work with one particularly strong coding-agent proposer (Claude Code); a broader study of how the effect varies across proposer agents remains for future work.

— Meta-Harness: End-to-End Optimization of Model Harnesses (2603.28052 - Lee et al., 30 Mar 2026) in Discussion (Section 5)
