Risk-surface protocols for HITL to automated validation transitions

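Develop protocols that manage the risk surface that opens when validation moves from human-in-the-loop (HITL) review to automated validation. The protocols must address three failure modes: shared-mode error, mitigated by diversity across base LLMs; collusive calibration, mitigated by checks against held-out physical measurements; and the liability failure mode, in which autonomy in production is mistaken for autonomy in liability, mitigated by retaining sign-off at the federation level.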
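One way to picture how the three safeguards could combine is a single acceptance gate that an automated validation result must pass. The Python sketch below is illustrative only. The names `ValidatorVerdict`, `ValidationCase` and `automated_gate`, the `min_distinct` threshold, and the use of RMSE against held-out measurements as the calibration test are all hypothetical assumptions, not the paper's method.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class ValidatorVerdict:
    validator_id: str
    base_llm: str  # hypothetical: identifier of the validator's underlying base-model family
    accepts: bool


@dataclass
class ValidationCase:
    verdicts: List[ValidatorVerdict]
    predictions: List[float]            # model outputs at the held-out measurement points
    held_out_measurements: List[float]  # physical measurements never used during calibration
    federation_signoff: Optional[str] = None  # hypothetical: ID of the accountable signer


def shared_mode_check(verdicts: List[ValidatorVerdict], min_distinct: int = 2) -> bool:
    # Require unanimous acceptance from validators spanning at least `min_distinct`
    # base LLMs, so one model family's blind spot cannot pass a case on its own.
    if not verdicts or not all(v.accepts for v in verdicts):
        return False
    return len({v.base_llm for v in verdicts}) >= min_distinct


def held_out_check(predictions: List[float], measurements: List[float], tol: float) -> bool:
    # Compare against measurements outside the calibration loop; RMSE is an assumed metric.
    if not measurements or len(predictions) != len(measurements):
        return False
    rmse = math.sqrt(mean((p - m) ** 2 for p, m in zip(predictions, measurements)))
    return rmse <= tol


def automated_gate(case: ValidationCase, tol: float, min_distinct: int = 2) -> Tuple[bool, List[str]]:
    # Return (accepted, names of failed checks) so every rejection stays auditable.
    failures: List[str] = []
    if not shared_mode_check(case.verdicts, min_distinct):
        failures.append("shared-mode")
    if not held_out_check(case.predictions, case.held_out_measurements, tol):
        failures.append("calibration")
    # Sign-off is checked independently: passing the automated checks never replaces it.
    if not case.federation_signoff:
        failures.append("sign-off")
    return (not failures, failures)


case = ValidationCase(
    verdicts=[ValidatorVerdict("v1", "family-A", True), ValidatorVerdict("v2", "family-B", True)],
    predictions=[1.0, 2.0, 3.1],
    held_out_measurements=[1.0, 2.1, 3.0],
)
print(automated_gate(case, tol=0.2))  # (False, ['sign-off'])
```

In this sketch the sign-off check stays separate from the two technical checks. A case that satisfies both model diversity and held-out calibration is still rejected until an accountable party at the federation level signs it, which mirrors the point that autonomy in production is not autonomy in liability.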

Background

The paper anticipates that, on fixed problem classes, validation may evolve from human-in-the-loop review toward more automated forms. It flags this transition as a risk surface that must be managed systematically rather than assumed to be safe.

Three classes of failure (shared LLM failure modes, collusive calibration, and liability governance) require explicit procedural safeguards before automation can be considered safe.

References

Nine open questions will determine whether instrumented data matures into a recognised substrate for scientific machine learning. HITL → automated validation as a risk surface. Three failure modes need protocols: shared-mode error (diversity across base LLMs), collusive calibration (held-out physical measurements), and autonomy in production is not autonomy in liability (sign-off retained at federation level).

Instrumented data for causal scientific machine learning (2606.07865 - Wilke, 5 Jun 2026) in Section 7, Methodological questions for the community, Item 8