EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Published 15 Apr 2024 in cs.RO and cs.CV (arXiv:2404.09991v1)

Abstract: Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets separately contain egomotion and interaction examples, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark to assess the utility of EgoPet as a pretraining resource to robotic quadruped locomotion, showing that models trained from EgoPet outperform those trained from prior datasets.

Summary

  • The paper introduces EgoPet, a novel egocentric dataset with over 84 hours of footage and 6,646 segments capturing interactive animal behaviors.
  • It outlines key tasks like Visual Interaction, Locomotion, and Vision to Proprioception Prediction, benchmarking AI's understanding of animal actions.
  • Empirical evaluations show models pretrained on EgoPet improve downstream robotic applications, highlighting its impact on AI and animal behavior research.

Introducing the EgoPet Dataset: Advancing AI's Understanding of Animal Behavior from an Egocentric Perspective

Overview of the EgoPet Dataset

The EgoPet dataset offers a novel perspective among egocentric video datasets by focusing on a variety of animals. Comprising over 84 hours of video footage across 6,646 video segments, EgoPet enriches the AI research landscape by providing insights into animal behaviors from the animals' own viewpoints. The dataset predominantly features domestic animals like dogs and cats, but also includes diverse species such as eagles, turtles, and dolphins.

Unique Features and Dataset Composition

EgoPet diverges from traditional video datasets by combining elements of egomotion and interactive behaviors within a singular dataset. Most animal behavior studies utilize either third-person views or non-interactive egocentric videos; EgoPet fills this gap by offering first-person, interactive videos of animals engaging with their environments.

  • Video Source and Variety: The majority of the videos come from platforms like TikTok and YouTube, focusing on pets equipped with mounted cameras, showcasing natural behavior in uncontrolled environments.
  • Video Segmentation: Videos are split into clips and carefully curated so that each segment, filmed from an egocentric point of view, highlights interactive or locomotive behavior.
  • Annotation Detail: Segments are meticulously annotated for both the presence of interactive behaviors and the type of interaction (e.g., interacting with humans, other animals, or objects).
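The annotation structure described above can be sketched as a simple record type. This is an illustrative assumption about how such per-segment metadata might be organized; the field names and categories are hypothetical, not the dataset's published schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SegmentAnnotation:
    """Hypothetical per-segment annotation record (illustrative only)."""
    video_id: str          # identifier of the source video
    start_s: float         # segment start time, in seconds
    end_s: float           # segment end time, in seconds
    species: str           # e.g. "dog", "cat", "eagle"
    is_interaction: bool   # whether an interactive behavior is present
    target: Optional[str]  # interaction target category, e.g. "human",
                           # "animal", or "object"; None for pure locomotion

# Example: a 4.5-second clip of a dog interacting with a human.
seg = SegmentAnnotation("vid_001", 3.0, 7.5, "dog", True, "human")
assert seg.end_s - seg.start_s == 4.5
```

A flat record like this makes it easy to filter segments by species or interaction type when building task-specific training splits.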

Tasks Defined on EgoPet

EgoPet is not just a dataset; it is also a benchmark, thanks to three tasks defined for evaluating AI models: Visual Interaction Prediction (VIP), Locomotion Prediction (LP), and Vision to Proprioception Prediction (VPP).

  1. Visual Interaction Prediction (VIP): This task focuses on recognizing when an animal is interacting with an object or another creature from its own viewpoint, relying on detailed temporal annotations of interactions within the videos.
  2. Locomotion Prediction (LP): This involves predicting the movement trajectory of an animal based on past video frames, which is crucial for understanding navigation and obstacle avoidance behaviors.
  3. Vision to Proprioception Prediction (VPP): The most novel of the three, VPP involves predicting proprioceptive terrain features (such as terrain type) from visual input alone, which has direct applications in robotics, particularly for developing autonomous navigation over uneven or complex terrain.
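The input/output interfaces of the three tasks can be sketched as follows. The clip shape, prediction horizon, and output dimensions here are assumptions chosen for illustration, and the function bodies return dummy values where a real model would run inference:

```python
import numpy as np

# Assumed clip format: T frames of H x W RGB (not the paper's exact spec).
T, H, W = 16, 224, 224
clip = np.zeros((T, H, W, 3), dtype=np.float32)  # placeholder egocentric clip

def visual_interaction_prediction(clip: np.ndarray) -> tuple[bool, str]:
    """VIP: is the animal interacting, and with what target category?"""
    return False, "none"  # a real model would classify the clip

def locomotion_prediction(clip: np.ndarray, horizon: int = 8) -> np.ndarray:
    """LP: predict the animal's future trajectory from past frames."""
    return np.zeros((horizon, 2))  # (horizon, 2) planar waypoints (assumed)

def vision_to_proprioception(clip: np.ndarray) -> np.ndarray:
    """VPP: regress terrain/proprioceptive features from vision alone."""
    return np.zeros(12)  # dimensionality of the feature vector is assumed

assert locomotion_prediction(clip).shape == (8, 2)
```

Framing the tasks this way makes clear that all three share the same video input and differ only in their prediction targets, which is what makes a single pretrained video backbone reusable across them.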

Empirical Evaluations and Initial Findings

Models trained on EgoPet have shown promising results, particularly in tasks that demand a nuanced understanding of physical space and interaction, such as VPP. The dataset's unique composition allows AI systems to better model animal-like perception and interaction skills, outperforming models trained on more conventional datasets like Ego4D and Kinetics 400.

  • Task Performance: Initial benchmarks indicate that tasks like VIP and LP are challenging and far from being solved, highlighting the complexity of animal behavior and the potential for future research.
  • Cross-Dataset Utility: Interestingly, pretraining on EgoPet not only benefits tasks defined on the dataset itself but also enhances performance on downstream robotic tasks, underscoring its value beyond mere animal behavior modeling.

The Road Ahead

The introduction of EgoPet is poised to catalyze further research in both the understanding of animal behavior through AI and the application of these insights into practical domains such as robotics. Future directions might include integrating multimodal data to capture more sensory dimensions (like audio), or expanding the dataset to include a wider variety of animal species. Furthermore, as AI models become more adept at interpreting these complex data inputs, we can anticipate broader applications in autonomous systems development, biological research, and enhancing human-animal interaction technologies.

Conclusion

EgoPet represents a significant step forward in modeling animal behavior through machine learning, offering both a unique dataset and challenging tasks that push the boundaries of current AI capabilities. As research in this area progresses, EgoPet will serve as a benchmark for developing more capable and perceptive AI systems.
