Generalizability of template-based LLM-assisted mechanized proof development
Determine the extent to which template-based, LLM-assisted mechanized proof development generalizes to settings where one or more of its enabling preconditions are absent: (i) a closely related, complete proof that can serve as a template; (ii) a domain expert who can identify correspondences and provide targeted guidance; or (iii) a sufficiently capable large language model. In particular, evaluate the performance and reliability of the approach when adapting proofs for transformations that lack a close template, or in domains where human expertise is limited.
References
The success of our experiment depends on several preconditions: the existence of a closely related, complete proof serving as a template; a domain expert able to identify correspondences and provide targeted guidance; and a sufficiently capable LLM. An important open question is how well this approach generalizes to settings where one or more of these preconditions are absent (e.g., proofs without a close template, or domains where the human lacks deep expertise).