Assessing Meta-Harness performance across different proposer agents
Determine how the performance of Meta-Harness depends on the choice of coding-agent proposer by running a systematic study across multiple proposer agents (beyond Claude Code) and diverse task domains, quantifying how outcomes vary from one proposer to another.
References
While we evaluate on three diverse domains, our experiments demonstrate that harness search can work with one particularly strong coding-agent proposer (Claude Code); a broader study of how the effect varies across proposer agents remains for future work.
— Meta-Harness: End-to-End Optimization of Model Harnesses
(2603.28052 - Lee et al., 30 Mar 2026) in Discussion (Section 5)