Interactive Fusion of Multi-level Features for Compositional Activity Recognition

Published 10 Dec 2020 in cs.CV | (2012.05689v1)

Abstract: To understand a complex action, multiple sources of information, including appearance, positional, and semantic features, need to be integrated. However, these features are difficult to fuse because they often differ significantly in modality and dimensionality. In this paper, we present a novel framework that accomplishes this goal by interactive fusion, namely, projecting features across different spaces and guiding the projection with an auxiliary prediction task. Specifically, we implement the framework in three steps: positional-to-appearance feature extraction, semantic feature interaction, and semantic-to-positional prediction. We evaluate our approach on two action recognition datasets, Something-Something and Charades. Interactive fusion achieves consistent accuracy gains over off-the-shelf action recognition algorithms. In particular, on Something-Else, the compositional setting of Something-Something, interactive fusion achieves a remarkable gain of 2.9% in top-1 accuracy.
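
The general idea of fusing features that differ in dimensionality can be illustrated with a toy sketch. This is not the paper's architecture: it assumes plain random linear projections into a shared space, element-wise summation as the fusion step, and a mean-squared-error auxiliary loss that predicts a bounding box from the fused vector. All names, dimensions, and helper functions below are hypothetical and chosen only for illustration.

```python
import random

def make_projection(in_dim, out_dim, rng):
    """Random linear map (out_dim x in_dim); a stand-in for a learned layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(matrix, vector):
    """Apply a linear map to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def fuse(appearance, positional, semantic, projections):
    """Project each feature into the shared space, then sum element-wise.

    Summation is an assumed, simplified fusion rule, not the paper's.
    """
    projected = [
        project(projections["appearance"], appearance),
        project(projections["positional"], positional),
        project(projections["semantic"], semantic),
    ]
    return [sum(values) for values in zip(*projected)]

def auxiliary_loss(fused, aux_head, target_position):
    """Mean squared error between the position predicted from the fused
    vector and the target position (an assumed auxiliary objective)."""
    predicted = project(aux_head, fused)
    squared = [(p - t) ** 2 for p, t in zip(predicted, target_position)]
    return sum(squared) / len(target_position)

rng = random.Random(0)
dims = {"appearance": 8, "positional": 4, "semantic": 6}  # hypothetical sizes
shared_dim = 5                                            # hypothetical shared space

projections = {name: make_projection(d, shared_dim, rng) for name, d in dims.items()}
aux_head = make_projection(shared_dim, dims["positional"], rng)

appearance = [rng.random() for _ in range(dims["appearance"])]
positional = [0.1, 0.2, 0.5, 0.6]  # e.g. a normalized bounding box
semantic = [rng.random() for _ in range(dims["semantic"])]

fused = fuse(appearance, positional, semantic, projections)
loss = auxiliary_loss(fused, aux_head, positional)
```

In a trained model the projection matrices would be learned, and the auxiliary loss would be added to the main classification loss so that the fused representation is pushed to retain positional information.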

Citations (15)
