Separating Stylistic and Content Quality in LLM-as-a-Judge
Design methods that distinguish stylistic quality from content quality and evaluate each separately when an LLM-as-a-Judge assesses responses, so that polished presentation cannot mask factual inaccuracies.
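One way to picture the separation is a two-pass rubric in which the judge scores content and style in independent calls, and the combined score is gated on content. The sketch below is illustrative only and is not a method from the cited paper. The `Judge` callable, the two prompt templates, and the `content_floor` and `style_weight` parameters are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge interface: takes a prompt, returns a score in [0, 1].
# In practice this would wrap a call to whatever judge model is in use.
Judge = Callable[[str], float]

# Assumed prompt templates; each asks the judge to attend to one axis only.
CONTENT_PROMPT = (
    "Rate ONLY the factual accuracy and completeness of the answer "
    "on a 0-1 scale. Ignore tone, length, and formatting.\n"
    "Question: {q}\nAnswer: {a}"
)
STYLE_PROMPT = (
    "Rate ONLY the clarity, tone, and formatting of the answer "
    "on a 0-1 scale. Ignore whether its claims are true.\n"
    "Question: {q}\nAnswer: {a}"
)


@dataclass(frozen=True)
class Verdict:
    content: float
    style: float
    overall: float


def evaluate(
    question: str,
    answer: str,
    judge: Judge,
    content_floor: float = 0.5,  # assumed threshold, not from the source
    style_weight: float = 0.2,   # assumed weight, not from the source
) -> Verdict:
    """Score content and style in separate judge calls, then combine."""
    content = judge(CONTENT_PROMPT.format(q=question, a=answer))
    style = judge(STYLE_PROMPT.format(q=question, a=answer))

    if content < content_floor:
        # Below the floor, style contributes nothing: a polished but
        # inaccurate answer is reported at its content score.
        overall = content
    else:
        overall = (1 - style_weight) * content + style_weight * style
    return Verdict(content=content, style=style, overall=overall)
```

Reporting `content` and `style` alongside `overall` keeps the two axes inspectable, and the floor is one simple way to stop style from compensating for weak content. Whether a judge model actually follows the "ignore style" instruction is itself part of the open problem, so the separation here is only as good as the judge's compliance.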
References
The open research problems in this context are: Design different methods for classifying stylistic quality and content quality during the evaluation.
— Security in LLM-as-a-Judge: A Comprehensive SoK
(2603.29403 - Masoud et al., 31 Mar 2026) in Section 7.3, Length and Style Bias Exploitation (Challenges and Open Problems)