Accuracy, justification quality, and human comparability of LLMs on raw first-person clinical narratives
Establish whether large language models can (i) maintain diagnostic accuracy when operating on raw first-person autobiographical testimonies, (ii) provide diagnosis-relevant justifications grounded in those testimonies, and (iii) achieve performance and reasoning that are comparable to those of mental health professionals.
References
This discrepancy highlights a critical research gap: it remains unknown if LLMs can maintain diagnostic accuracy on raw first-person testimonies, provide diagnosis-relevant justifications, and whether their performance and reasoning compare with those of mental health professionals.
— Patterns vs. Patients: Evaluating LLMs against Mental Health Professionals on Personality Disorder Diagnosis through First-Person Narratives
(2512.20298 - Drożdż et al., 23 Dec 2025) in Introduction