Auditing and calibrating LLM-driven verification gates
Develop auditing protocols that calibrate the large-language-model-driven quality gates used in verification and validation. Such protocols should combine adversarial probing, measurement of inter-agent disagreement, and evaluation against held-out physical measurements.
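As a rough illustration of what the three audit signals could look like in practice, the sketch below computes one simple statistic for each. It is not taken from the source paper. The function names, the boolean pass/fail verdict format, and the choice of metrics (verdict flip rate, pairwise disagreement rate, expected calibration error) are all assumptions made for illustration only.

```python
from itertools import combinations


def flip_rate(original, perturbed):
    """Adversarial probing (assumed metric): fraction of items whose gate
    verdict changes under a perturbation that should not affect it."""
    if len(original) != len(perturbed):
        raise ValueError("verdict lists must have equal length")
    if not original:
        return 0.0
    flips = sum(o != p for o, p in zip(original, perturbed))
    return flips / len(original)


def pairwise_disagreement(verdicts):
    """Inter-agent disagreement (assumed metric): fraction of agent pairs,
    pooled over all items, that give different verdicts.

    verdicts: list of per-item lists, one boolean verdict per agent.
    """
    disagree = 0
    total = 0
    for item in verdicts:
        for a, b in combinations(item, 2):
            total += 1
            disagree += a != b
    return disagree / total if total else 0.0


def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Calibration against held-out measurements (assumed metric): the
    bin-weighted mean of |mean gate confidence - measured pass rate|.

    confidences: gate's stated probability in [0, 1] that an item passes.
    outcomes: 1 if the held-out physical measurement confirms a pass, else 0.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have equal length")
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        pass_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_conf - pass_rate)
    return ece
```

In this sketch, a high flip rate or disagreement rate flags a gate whose verdicts are unstable, while a high calibration error flags a gate whose stated confidence does not match what the held-out measurements show. A real protocol would also need a principled choice of perturbations, agents, and acceptance thresholds, none of which the source specifies.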
References
Nine open questions will determine whether instrumented data matures into a recognised substrate for scientific machine learning. Verification of the verifier. Quality gates are LLM-driven; auditing their calibration requires adversarial probing, inter-agent disagreement, and held-out physical measurements.
— Instrumented data for causal scientific machine learning
(2606.07865 - Wilke, 5 Jun 2026) in Section 7, Methodological questions for the community, Item 3