Papers
Topics
Authors
Recent
Search
2000 character limit reached

Multi-attention Networks for Temporal Localization of Video-level Labels

Published 15 Nov 2019 in cs.CV | (1911.06866v1)

Abstract: Temporal localization remains an important challenge in video understanding. In this work, we present our solution to the 3rd YouTube-8M Video Understanding Challenge organized by Google Research. Participants were required to build a segment-level classifier using a large-scale training data set with noisy video-level labels and a relatively small-scale validation data set with accurate segment-level labels. We formulated the problem as a multiple instance multi-label learning and developed an attention-based mechanism to selectively emphasize the important frames by attention weights. The model performance is further improved by constructing multiple sets of attention networks. We further fine-tuned the model using the segment-level data set. Our final model consists of an ensemble of attention/multi-attention networks, deep bag of frames models, recurrent neural networks and convolutional neural networks. It ranked 13th on the private leader board and stands out for its efficient usage of resources.

Citations (7)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.