Papers
Topics
Authors
Recent
Search
2000 character limit reached

VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding

Published 24 Nov 2024 in cs.CV | (2411.15839v2)

Abstract: Large Vision-LLMs (LVLMs) have demonstrated remarkable capabilities in multimodal task reasoning. However, they often generate responses that appear plausible yet do not accurately reflect the visual content, a phenomenon known as hallucination. Recent approaches have introduced training-free methods to mitigate hallucinations by adjusting the decoding strategy during the inference stage, typically attributing hallucinations to the LLM itself. Our analysis, however, reveals that distortions in the visual encoding process significantly affect the model's reasoning capabilities. Specifically, earlier visual layers may retain key features but gradually distort as the information propagates toward the output layer. Building on these insights, we propose a novel hallucination-mitigation method from the visual encoding perspective: \textbf{V}isu\textbf{a}l \textbf{L}ayer Fus\textbf{i}on Contrastive \textbf{D}ecoding (\textbf{VaLiD}). This method utilizes uncertainty to guide the visual layer selection, correcting distortions in the visual encoding process and thereby enhancing the reliability of the generated content. Experimental results demonstrate the effectiveness of VaLiD in mitigating hallucinations across various benchmarks, achieving state-of-the-art performance when compared to baseline methods. Codes are available at \href{https://github.com/RicardoLuL/VaLiD_LVLMs_hallucinations}{Github}.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.