Autonomous AI capability on research-level mathematics

Determine how well contemporary AI systems can solve research-level mathematics questions autonomously, without expert human involvement, and thus where such systems currently stand in independent research problem-solving.

Background

The paper proposes a methodology to assess AI systems on authentic research-level mathematics problems whose solutions are known to the authors but unpublished, aiming to minimize data contamination and distinguish genuine problem-solving from search capabilities.

Within this context, the authors explicitly note uncertainty about the current autonomous problem-solving ability of AI systems for research-level mathematics, motivating their experimental setup and future benchmark plans.

References

While commercial AI systems are undoubtedly already at a level where they are useful tools for mathematicians, it is not yet clear where AI systems stand at solving research-level math questions on their own, without an expert in the loop.

First Proof (2602.05192 - Abouzaid et al., 5 Feb 2026) in Section 1 (Introduction), paragraph 3, page 2