Improving LLM “oracle” aleatoric uncertainty via prompting
Develop prompting strategies for the Mistral Medium (version 25.05) large language model, used as a confidence-scoring "oracle", so that the confidence scores in its responses more accurately estimate aleatoric uncertainty when answering PubMedQA-style biomedical questions through the specified JSON-format interface.
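A minimal sketch of what such a JSON-format oracle interface could look like. The system-prompt wording, the exact JSON schema (`answer`/`confidence` keys), and the helper names `build_messages` and `parse_oracle_reply` are illustrative assumptions, not the paper's actual protocol or the Mistral API itself:

```python
import json

# Hypothetical system prompt asking the oracle for a calibrated confidence
# score alongside its answer (wording is illustrative, not from the paper).
SYSTEM_PROMPT = (
    "You are a biomedical QA oracle. Answer the question with yes/no/maybe "
    "and report a confidence in [0, 1] reflecting how often this answer "
    "would be correct for questions like this one. "
    'Respond ONLY with JSON: {"answer": "...", "confidence": <float>}'
)

def build_messages(question: str, context: str) -> list:
    """Assemble a chat-style message list for one PubMedQA-style item."""
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]

def parse_oracle_reply(raw: str):
    """Parse and validate the oracle's JSON reply.

    Returns (answer, confidence); raises ValueError on malformed output,
    which a caller could treat as a retry or a failed sample.
    """
    obj = json.loads(raw)
    answer = str(obj["answer"]).strip().lower()
    if answer not in {"yes", "no", "maybe"}:
        raise ValueError(f"unexpected answer: {answer!r}")
    conf = float(obj["confidence"])
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence out of range: {conf}")
    return answer, conf
```

The message list would be sent to the model's chat endpoint and the raw completion passed through `parse_oracle_reply`; strict validation matters here because malformed or out-of-range confidences would otherwise silently corrupt any downstream calibration measurement.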
References
Although more complex prompts were tested to generate responses that better estimate aleatoric uncertainty, we were unable to improve the quality of this approach.
— Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling
(2604.01898 - Khalin et al., 2 Apr 2026) in Subsection "Generating confidence scores from an LLM 'oracle'", Supplementary Methods