Papers
Topics
Authors
Recent
Search
2000 character limit reached

Gazeify Then Voiceify: Physical Object Referencing Through Gaze and Voice Interaction with Displayless Smart Glasses

Published 27 Jan 2026 in cs.HC | (2601.19281v1)

Abstract: Smart glasses enhance interactions with the environment by using head-mounted cameras to observe the user's viewpoint, but lack the visual feedback used for common interactions. We introduce Gazeify then Voiceify, a multimodal approach allowing object selection via gaze and voice using displayless smart glasses. Users can select a physical object with their gaze, and the system generates a digital mask and a voice description of the object's semantics. Users can further correct errors through free-form conversation. To demonstrate our approach, we develop an interactive system by integrating advanced object segmentation and detection with a vision-LLM. User studies reveal that participants achieve correct gaze selection in 53% of the task trials and use voice disambiguation to correct 58% of the remaining errors. Participants also rated the system as likable, useful, and easy to use.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.