Determine whether long Lean proof contexts cause failure on the A5 key lemma

Determine whether overly long Lean proof contexts are what causes Numina-Lean-Agent, which uses Claude Code as the base model, to struggle when formalizing the key lemma of Putnam 2025 A5. That lemma asserts that, among all permutations satisfying a specified property, alternating permutations are the most numerous.

Background

Section 3.2 analyzes the system’s behavior on Putnam 2025 A5, whose core is to prove that, among permutations with a given property, alternating permutations are the most numerous. The authors report that, in several earlier experiments, the model repeatedly got stuck on this critical intermediate lemma.
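The excerpt does not spell out the property, so the following is only an illustrative sketch under a stated assumption: that the property is a prescribed pattern of ascents and descents (an up/down signature), in which case the lemma says the alternating signatures admit the most permutations. The function names are hypothetical and the brute-force check is not part of the paper or of the Lean formalization.

```python
from collections import Counter
from itertools import permutations


def signature(perm):
    """Up/down pattern of a permutation: +1 for an ascent, -1 for a descent."""
    return tuple(1 if b > a else -1 for a, b in zip(perm, perm[1:]))


def is_alternating(sig):
    """True when consecutive steps always change direction."""
    return all(a != b for a, b in zip(sig, sig[1:]))


def maximizers(n):
    """Return (signatures with the most permutations of 1..n, that count).

    ASSUMPTION: the 'property' is having a fixed up/down signature.
    Brute force over all n! permutations, so only practical for small n.
    """
    counts = Counter(signature(p) for p in permutations(range(1, n + 1)))
    best = max(counts.values())
    return sorted(s for s, c in counts.items() if c == best), best


if __name__ == "__main__":
    for n in range(2, 8):
        sigs, best = maximizers(n)
        print(n, best, all(is_alternating(s) for s in sigs))
```

For small n this gives a quick sanity check of the statement the agent is asked to formalize; it says nothing about why the formal proof stalls.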

They hypothesize that excessively long proof contexts degrade the model's instruction-following and its focus on subgoals, and they therefore adopted a subagent strategy that isolates the lemma and solves it separately. The paper states this attribution to long contexts as a conjecture, and the conjecture is what motivates the decomposition approach.

References

The core of A5 is to prove that, among all permutations satisfying a certain property, alternating permutations occur in the largest number. In several previous experiments, the model repeatedly got stuck on this key lemma. We conjecture that this difficulty is caused by overly long contexts, and therefore adopt a subagent strategy that isolates this lemma from the overall proof and handles it separately.

— Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics (2601.14027 - Liu et al., 20 Jan 2026) in Section 3.2 (Putnam-2025-A5)
