Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model

Published 25 Oct 2023 in cs.RO | (2310.16405v1)

Abstract: Recognition of the current state is indispensable for the operation of a robot. There are various states to be recognized, such as whether an elevator door is open or closed, whether an object has been grasped correctly, and whether the TV is turned on or off. Until now, these states have been recognized by programmatically describing the state of a point cloud or raw image, by annotating and learning images, by using special sensors, etc. In contrast to these methods, we apply Visual Question Answering (VQA) from a Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset, to such binary state recognition. This idea allows us to intuitively describe state recognition in language without any re-training, thereby improving the recognition ability of robots in a simple and general way. We summarize various techniques in questioning methods and image processing, and clarify their properties through experiments.
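
The idea in the abstract, asking a VQA model a yes/no question about an image and reading the answer as a binary state, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `recognize_binary_state`, the `vqa` callable interface, the majority vote over several question phrasings, and the `fake_vqa` stub that stands in for a real pre-trained vision-language model.

```python
from typing import Any, Callable, Sequence


def recognize_binary_state(
    vqa: Callable[[Any, str], str],
    image: Any,
    questions: Sequence[str],
    threshold: float = 0.5,
) -> bool:
    """Return True if the fraction of 'yes' answers reaches the threshold.

    `vqa` is an assumed interface: it takes an image and a question and
    returns a free-text answer. Voting over several phrasings is an
    illustrative choice, not necessarily the paper's method.
    """
    votes = []
    for question in questions:
        answer = vqa(image, question).strip().lower()
        if answer.startswith("yes"):
            votes.append(1)
        elif answer.startswith("no"):
            votes.append(0)
        # Answers that are neither yes nor no are ignored.
    if not votes:
        raise ValueError("no yes/no answer could be parsed")
    return sum(votes) / len(votes) >= threshold


def fake_vqa(image: Any, question: str) -> str:
    """Stub standing in for a real model; the 'image' is just a label."""
    return "Yes" if image == "open_door" else "No"


if __name__ == "__main__":
    questions = ["Is the elevator door open?", "Is this door open?"]
    print(recognize_binary_state(fake_vqa, "open_door", questions))
    print(recognize_binary_state(fake_vqa, "closed_door", questions))
```

In practice `fake_vqa` would be replaced by a call into an actual VQA model, and the questions would be written for the state of interest (door open, object grasped, TV on).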
