Papers
Topics
Authors
Recent
Search
2000 character limit reached

Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding

Published 7 Apr 2025 in cs.LG | (2504.04772v1)

Abstract: Real-time scene comprehension is a key advance in artificial intelligence, enhancing robotics, surveillance, and assistive tools. However, hallucination remains a challenge. AI systems often misinterpret visual inputs, detecting nonexistent objects or describing events that never happened. These errors, far from minor, threaten reliability in critical areas like security and autonomous navigation where accuracy is essential. Our approach tackles this by embedding self-awareness into the AI. Instead of trusting initial outputs, our framework continuously assesses them in real time, adjusting confidence thresholds dynamically. When certainty falls below a solid benchmark, it suppresses unreliable claims. Combining YOLOv5's object detection strength with VILA1.5-3B's controlled language generation, we tie descriptions to confirmed visual data. Strengths include dynamic threshold tuning for better accuracy, evidence-based text to reduce hallucination, and real-time performance at 18 frames per second. This feedback-driven design cuts hallucination by 37 percent over traditional methods. Fast, flexible, and reliable, it excels in applications from robotic navigation to security monitoring, aligning AI perception with reality.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.