
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Published 17 Oct 2024 in cs.CL, cs.AI, and cs.LG (arXiv:2410.13640v2)

Abstract: LLM self-evaluation relies on the LLM's own ability to estimate response correctness, which can greatly improve its deployment reliability. In this research track, we propose the Chain-of-Embedding (CoE) in the latent space to enable LLMs to perform output-free self-evaluation. CoE consists of all progressive hidden states produced during inference, which can be treated as the latent thinking path of LLMs. We find that CoE features differ when LLMs respond correctly versus incorrectly; these discrepancies help us estimate LLM response correctness. Experiments in four diverse domains and seven LLMs fully demonstrate the effectiveness of our method. Meanwhile, its label-free design, which requires no training, and its millisecond-level computational cost ensure real-time feedback in large-scale scenarios. More importantly, we provide interesting insights into LLM response correctness from the perspective of hidden-state changes inside LLMs.


Summary

  • The paper introduces Chain-of-Embedding (CoE), a novel latent space method enabling LLMs to self-evaluate response correctness without ground truth labels or external tools.
  • CoE analyzes the geometric features (magnitude and angle changes) of progressive hidden states during inference to quantify the model's 'thinking path' and predict correctness.
  • Evaluations across diverse domains and LLMs demonstrate CoE achieves state-of-the-art label-free self-evaluation performance, significantly improving AUROC, FPR95, and AUPR while being efficient and robust.

Here's a detailed summary of the paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation".

The paper addresses the problem of label-free self-evaluation in LLMs. The goal is to estimate the correctness of an LLM's response without relying on ground truth labels or external tools. The authors propose a novel method called Chain-of-Embedding (CoE) in the latent space to enable LLMs to perform output-free self-evaluation. The CoE consists of all progressive hidden states produced during inference, which the authors treat as the latent thinking path of LLMs. The core idea is that the trajectories of hidden states differ when LLMs generate correct versus incorrect responses.

The authors formalize the CoE as a progressive chain $\bm{H}$ of sentence hidden states:

$$\bm{H} = \underbrace{\bm{h}_0}_{\text{Input State}} \rightarrow \underbrace{\bm{h}_1 \rightarrow \cdots \rightarrow \bm{h}_l \rightarrow \cdots \rightarrow \bm{h}_{L-1}}_{\text{Intermediate Hidden States}} \rightarrow \underbrace{\bm{h}_L}_{\text{Output State}}$$

where:

  • $\bm{h}_l$ is the average embedding at layer $l$
  • $L$ is the number of hidden layers in the model.
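As a toy sketch of how such a chain might be assembled (synthetic data only; in practice the per-layer states would come from a transformer forward pass, e.g. via `output_hidden_states=True` in Hugging Face Transformers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-layer hidden states: (L+1 layers, T tokens, d dims).
num_layers, num_tokens, dim = 4, 6, 8
hidden_states = rng.normal(size=(num_layers + 1, num_tokens, dim))

# Each h_l is the token-averaged ("sentence") embedding at layer l,
# so stacking h_0 ... h_L gives the chain H.
H = hidden_states.mean(axis=1)
print(H.shape)  # (5, 8)
```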

They quantify the CoE's geometric features, focusing on magnitude and angle changes between adjacent states. The magnitude change $M(\bm{h}_l, \bm{h}_{l+1})$ is the L2-norm of the difference between adjacent hidden states, and the angle change $A(\bm{h}_l, \bm{h}_{l+1})$ is the arc cosine of their normalized dot product:

$$M(\bm{h}_l, \bm{h}_{l+1}) = \|\bm{h}_{l+1} - \bm{h}_l\|_2$$

$$A(\bm{h}_l, \bm{h}_{l+1}) = \arccos \left( \frac{\bm{h}_{l+1}^{\top} \bm{h}_l}{\|\bm{h}_{l+1}\|_2 \cdot \|\bm{h}_{l}\|_2} \right)$$

Then, the magnitude and angle features of the whole CoE trajectory, denoted $\mathrm{Mag}(\bm{H})$ and $\mathrm{Ang}(\bm{H})$, are defined as the average changes in magnitude and angle between each adjacent state pair:

$$\mathrm{Mag}(\bm{H}) = \frac{1}{L} \sum_{l=0}^{L-1} \frac{M(\bm{h}_l, \bm{h}_{l+1})}{\mathcal{Z}_\mathrm{Mag}}$$

$$\mathrm{Ang}(\bm{H}) = \frac{1}{L} \sum_{l=0}^{L-1} \frac{A(\bm{h}_l, \bm{h}_{l+1})}{\mathcal{Z}_\mathrm{Ang}}$$

where:

  • $\mathcal{Z}_\mathrm{Mag} = M(\bm{h}_0, \bm{h}_{L})$ and $\mathcal{Z}_\mathrm{Ang} = A(\bm{h}_0, \bm{h}_{L})$ are scaling factors.
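These features can be sketched in a few lines of NumPy (a minimal illustration; the function names are ours, not the authors', and `H` is an array stacking $\bm{h}_0$ through $\bm{h}_L$):

```python
import numpy as np

def pairwise_changes(H):
    """Per-step magnitude M and angle A between adjacent states of H, shape (L+1, d)."""
    mags = np.linalg.norm(np.diff(H, axis=0), axis=1)
    cos = np.sum(H[1:] * H[:-1], axis=1) / (
        np.linalg.norm(H[1:], axis=1) * np.linalg.norm(H[:-1], axis=1))
    angs = np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding error
    return mags, angs

def mag_ang_features(H):
    """Mag(H) and Ang(H): mean per-step changes, scaled by the end-to-end change h_0 -> h_L."""
    L = H.shape[0] - 1
    mags, angs = pairwise_changes(H)
    z_mag = np.linalg.norm(H[-1] - H[0])
    cos0 = H[-1] @ H[0] / (np.linalg.norm(H[-1]) * np.linalg.norm(H[0]))
    z_ang = np.arccos(np.clip(cos0, -1.0, 1.0))
    return mags.sum() / (L * z_mag), angs.sum() / (L * z_ang)
```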

Based on these features, the authors propose two CoE-based metrics for label-free LLM self-evaluation:

  1. CoE-R: Real-Space Combination. This metric combines the magnitude and angle features as their difference, $\mathrm{Mag}(\bm{H}) - \mathrm{Ang}(\bm{H})$:

    $$\text{CoE-R}(\bm{H}) = \frac{1}{L} \sum_{l=0}^{L-1} \left( \frac{M(\bm{h}_l, \bm{h}_{l+1})}{M(\bm{h}_0, \bm{h}_{L})} - \frac{A(\bm{h}_l, \bm{h}_{l+1})}{A(\bm{h}_0, \bm{h}_{L})} \right)$$

  2. CoE-C: Complex-Space Combination. This metric maps each pair of magnitude and angle changes to the complex plane, forming a feature point $C(\bm{h}_l, \bm{h}_{l+1})$:

    $$C(\bm{h}_l, \bm{h}_{l+1}) = M(\bm{h}_l, \bm{h}_{l+1})\, e^{i \cdot A(\bm{h}_l, \bm{h}_{l+1})}$$

    The final $\text{CoE-C}(\bm{H})$ score is the magnitude of the average of these complex numbers:

    $$\text{CoE-C}(\bm{H}) = \frac{1}{L} \sqrt{ \left( \sum_{l=0}^{L-1} M(\bm{h}_l, \bm{h}_{l+1}) \cos A(\bm{h}_l, \bm{h}_{l+1}) \right)^2 + \left( \sum_{l=0}^{L-1} M(\bm{h}_l, \bm{h}_{l+1}) \sin A(\bm{h}_l, \bm{h}_{l+1}) \right)^2 }$$
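Both scores reduce to a few lines of NumPy; this is a sketch under our own naming, not the authors' code, and it exploits the fact that CoE-C is just the modulus of the summed complex steps divided by $L$:

```python
import numpy as np

def pairwise_changes(H):
    # Per-step magnitude M and angle A between adjacent states of H, shape (L+1, d).
    mags = np.linalg.norm(np.diff(H, axis=0), axis=1)
    cos = np.sum(H[1:] * H[:-1], axis=1) / (
        np.linalg.norm(H[1:], axis=1) * np.linalg.norm(H[:-1], axis=1))
    angs = np.arccos(np.clip(cos, -1.0, 1.0))
    return mags, angs

def coe_r(H):
    # Real-space score: mean of (scaled magnitude change - scaled angle change).
    L = H.shape[0] - 1
    mags, angs = pairwise_changes(H)
    z_mag = np.linalg.norm(H[-1] - H[0])
    cos0 = H[-1] @ H[0] / (np.linalg.norm(H[-1]) * np.linalg.norm(H[0]))
    z_ang = np.arccos(np.clip(cos0, -1.0, 1.0))
    return (mags / z_mag - angs / z_ang).sum() / L

def coe_c(H):
    # Complex-space score: |sum of M * exp(i * A)| / L over the trajectory.
    L = H.shape[0] - 1
    mags, angs = pairwise_changes(H)
    return np.abs(np.sum(mags * np.exp(1j * angs))) / L
```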

The authors evaluated their method on four diverse domains: Mathematics (MATH), Reasoning (TheoremQA), Knowledge (MMLU), and Understanding (Belebele), using seven LLMs, including Llama2-7B-Instruct, Llama3-8B-Instruct, Qwen1.5-7B-Instruct, Qwen2-7B-Instruct, Mistral-7B-Instruct, Llama3-70B-Instruct, and Qwen2-72B-Instruct. The evaluation metrics were AUROC, FPR95, and AUPR.

The results showed that CoE achieves state-of-the-art performance across almost all scenarios. On average, CoE improves by 8.30% in AUROC, 5.55% in FPR95, and 5.52% in AUPR compared to the best baseline. Component ablation studies show the positive influence of both magnitude and angle components, with CoE-C demonstrating greater robustness. The method also shows robustness to data ratio variations and multilingual scalability on the MGSM dataset. Efficiency analysis shows the execution costs are at the millisecond level.

The authors provide a theoretical analysis revisiting CoE-C and CoE-R, proving that CoE-C is more robust than CoE-R due to its lower sensitivity to outliers. They also compare their label-free CoE to label-based methods ITI and MIND, demonstrating that CoE maintains consistent applicability across diverse datasets and is robust in complex real-world scenarios.
