Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback

Published 6 May 2025 in cs.CL | (2505.03293v1)

Abstract: LLMs have shown promise in providing scalable mental health support, while evaluating their counseling capability remains crucial to ensure both efficacy and safety. Existing evaluations are limited by the static assessment that focuses on knowledge tests, the single perspective that centers on user experience, and the open-loop framework that lacks actionable feedback. To address these issues, we propose Ψ-Arena, an interactive framework for comprehensive assessment and optimization of LLM-based counselors, featuring three key characteristics: (1) Realistic arena interactions that simulate real-world counseling through multi-stage dialogues with psychologically profiled NPC clients, (2) Tripartite evaluation that integrates assessments from the client, counselor, and supervisor perspectives, and (3) Closed-loop optimization that iteratively improves LLM counselors using diagnostic feedback. Experiments across eight state-of-the-art LLMs show significant performance variations in different real-world scenarios and evaluation perspectives. Moreover, reflection-based optimization results in up to a 141% improvement in counseling performance. We hope Ψ-Arena provides a foundational resource for advancing reliable and human-aligned LLM applications in mental healthcare.

Summary

The paper presents Ψ-Arena, a framework that simulates realistic counseling sessions; its reflection-based optimization improves counseling performance by up to 141%.
It utilizes a tripartite evaluation system assessing client satisfaction, supervisory standards, and counselor self-reflection across 33 dimensions.
The closed-loop optimization iteratively refines LLM counselors using diagnostic feedback from these evaluations.

Introduction

Using LLMs for mental health support is an emerging but challenging area. While such models show potential for scalable psychological counseling, ensuring their efficacy and safety remains crucial. Existing evaluation methods are limited by static assessments that focus on knowledge tests, a single perspective centered on user experience, and open-loop frameworks that provide no actionable feedback. The paper "Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback" (2505.03293) introduces a framework, Ψ-Arena, that addresses these limitations through comprehensive, interactive, closed-loop evaluation of LLM counselors.

Framework Overview

Ψ-Arena distinguishes itself by simulating real-world counseling through multi-stage dialogues with NPC clients possessing rich psychological profiles. This approach incorporates three key features:

Realistic Counseling Interactions: Virtual clients are created with authentic characteristics derived from real counseling records, allowing for meaningful exchanges throughout phases such as trust-building, diagnosis, and solution exploration.
Tripartite Evaluation: Counselor performance is assessed from three perspectives (client, supervisor, and counselor), each offering distinct insights into the model's capabilities across 33 dimensions.
Closed-loop Optimization: Using diagnostic feedback, LLM counselors undergo iterative improvement, enhancing their therapeutic capabilities significantly, evidenced by counseling performance improvements of up to 141%.
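
To make the multi-stage interaction concrete, the following is a minimal Python sketch of how a staged session between a profiled NPC client and a counselor could be driven. It is not the paper's implementation: the names ClientProfile, Session, run_session, the profile fields, the number of turns per stage, and the stub responders are all assumptions introduced for illustration, and in a real setup both responders would be backed by LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Stage names follow the phases named in the text above; the paper may
# define additional or differently named stages.
STAGES = ("trust-building", "diagnosis", "solution exploration")

Turn = Tuple[str, str, str]  # (stage, speaker, text)


@dataclass
class ClientProfile:
    """Hypothetical NPC client profile; the field names are illustrative."""
    presenting_issue: str
    background: str


@dataclass
class Session:
    profile: ClientProfile
    transcript: List[Turn] = field(default_factory=list)


Responder = Callable[[str, List[Turn]], str]


def run_session(profile: ClientProfile,
                client_reply: Responder,
                counselor_reply: Responder,
                turns_per_stage: int = 2) -> Session:
    """Alternate client and counselor turns through each stage in order."""
    session = Session(profile)
    for stage in STAGES:
        for _ in range(turns_per_stage):
            client_text = client_reply(stage, session.transcript)
            session.transcript.append((stage, "client", client_text))
            counselor_text = counselor_reply(stage, session.transcript)
            session.transcript.append((stage, "counselor", counselor_text))
    return session


# Stub responders standing in for LLM-backed client and counselor agents.
def scripted_client(stage: str, transcript: List[Turn]) -> str:
    return f"(client message during {stage})"


def echo_counselor(stage: str, transcript: List[Turn]) -> str:
    last_client_text = transcript[-1][2]
    return f"(counselor reply to: {last_client_text})"


profile = ClientProfile(presenting_issue="exam anxiety",
                        background="second-year student")
session = run_session(profile, scripted_client, echo_counselor)
```

Tagging each turn with its stage keeps the transcript easy to slice later, for example when an evaluator wants to rate only the trust-building phase.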

Evaluation and Results

The study evaluates eight state-of-the-art LLMs and finds significant performance variation across counseling scenarios and evaluation perspectives. This variation underscores the need for interactive, multi-perspective evaluation rather than reliance on a single viewpoint.

Figure 1: The comparison between Ψ-Arena and existing studies on evaluating LLM-based counselors.

Tripartite Feedback and Optimization

The tripartite evaluation system comprises client-oriented, supervisor-oriented, and counselor-oriented scales, ensuring a holistic appraisal of model competencies:

Client Scale emphasizes subjective experience and satisfaction.
Supervisor Scale assesses professional competence and ethical adherence.
Counselor Scale promotes self-reflective practice.
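
As an illustration of how ratings from the three scales might be combined into one report, here is a small Python sketch. The summary does not state how the paper aggregates its 33 dimensions, so the unweighted mean, the 1 to 5 rating range, the dimension names, and the function names perspective_score and tripartite_report are all assumptions.

```python
from statistics import mean
from typing import Dict


def perspective_score(ratings: Dict[str, float]) -> float:
    """Unweighted mean of one perspective's dimension ratings (an assumption)."""
    if not ratings:
        raise ValueError("at least one dimension rating is required")
    return mean(ratings.values())


def tripartite_report(client: Dict[str, float],
                      supervisor: Dict[str, float],
                      counselor: Dict[str, float]) -> dict:
    """Score each perspective separately and flag the weakest one."""
    scores = {
        "client": perspective_score(client),
        "supervisor": perspective_score(supervisor),
        "counselor": perspective_score(counselor),
    }
    weakest = min(scores, key=scores.get)
    return {"scores": scores, "weakest_perspective": weakest}


# Hypothetical ratings on an assumed 1-5 scale; dimension names are made up.
report = tripartite_report(
    client={"satisfaction": 4.0, "feeling understood": 3.0},
    supervisor={"professional competence": 3.0, "ethical adherence": 5.0},
    counselor={"self-reflection": 2.0, "session awareness": 3.0},
)
```

Keeping the three scores separate, rather than collapsing them into a single number, preserves the information about which perspective a model is weakest on.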

The closed-loop feedback mechanism uses these evaluations to refine counselor responses iteratively.
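
The loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the counselor is represented only by a guidance string, evaluate and reflect are hypothetical callables that would wrap arena sessions and an LLM reflection step, and the keep-only-if-better rule is a design choice made here. The helper relative_improvement shows the arithmetic behind a figure such as 141%, which corresponds to a score 2.41 times the baseline.

```python
from typing import Callable, List, Tuple

Evaluate = Callable[[str], Tuple[float, str]]  # guidance -> (score, feedback)
Reflect = Callable[[str, str], str]            # (guidance, feedback) -> guidance


def relative_improvement(before: float, after: float) -> float:
    """Percentage gain of `after` over a positive baseline `before`."""
    if before <= 0:
        raise ValueError("baseline score must be positive")
    return (after - before) / before * 100.0


def optimize(guidance: str, evaluate: Evaluate, reflect: Reflect,
             rounds: int = 3) -> Tuple[str, List[float]]:
    """Revise guidance from feedback, keeping a revision only if it scores higher."""
    best_guidance = guidance
    best_score, feedback = evaluate(best_guidance)
    history = [best_score]
    for _ in range(rounds):
        candidate = reflect(best_guidance, feedback)
        score, candidate_feedback = evaluate(candidate)
        history.append(score)
        if score > best_score:
            best_guidance, best_score = candidate, score
            feedback = candidate_feedback
    return best_guidance, history


# Stubs standing in for the arena evaluation and the LLM reflection step.
TARGETS = ["empathy", "risk assessment"]  # hypothetical skills to cover


def stub_evaluate(guidance: str) -> Tuple[float, str]:
    missing = [t for t in TARGETS if t not in guidance]
    score = 1.0 + (len(TARGETS) - len(missing))
    feedback = missing[0] if missing else ""
    return score, feedback


def stub_reflect(guidance: str, feedback: str) -> str:
    if not feedback:
        return guidance
    return guidance + " Attend to " + feedback + "."


final_guidance, history = optimize("Be a counselor.", stub_evaluate, stub_reflect)
gain = relative_improvement(history[0], max(history))
```

Recording every round's score in history, including rejected candidates, makes it possible to see whether the loop has plateaued.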

Figure 2: Client scale.

Implications and Future Directions

Incorporating closed-loop feedback improves the practical efficacy of LLM-based counseling and aligns it more closely with therapeutic standards. The framework serves as a foundational resource for advancing LLM applications in mental healthcare and may open new avenues for deploying AI in other psychological domains. Future research could explore the scalability of the framework and its application across cultural contexts to improve the inclusivity and personalization of AI-driven mental health support.

Conclusion

The Ψ-Arena framework presents a robust system for evaluating and improving the capabilities of LLM-based psychological counselors. Its comprehensive approach, combining realistic interactions with multi-angle evaluations and feedback-driven optimization, marks a pivotal step towards achieving clinically effective and human-aligned AI in mental healthcare. This work provides a significant contribution to the responsible development of AI applications in therapeutic settings.