Robust visual object tracking with top-down reasoning

Abstract

In generic visual tracking, traditional appearance-based trackers suffer from distracting factors such as poor lighting and severe target deformation, as well as from insufficient training data. In this work, we propose to exploit category-specific semantics to boost visual object tracking, and develop a new visual tracking model that augments an appearance-based tracker with a top-down reasoning component. Continuous feedback from this reasoning component guides the tracker to reliably identify candidate regions with consistent semantics across frames and to localize the target object instance more robustly and accurately. Specifically, a generic object recognition model and a semantic activation map method are deployed to provide effective top-down reasoning about object locations for the tracker. In addition, we develop a voting-based scheme for the reasoning component to infer the object semantics. Therefore, even without sufficient training data, the tracker can still obtain reliable top-down clues about the objects. Together with the appearance clues, the tracker can localize objects accurately even in the presence of various major distracting factors. Extensive evaluations on two large-scale benchmark datasets, OTB2013 and OTB2015, clearly demonstrate that the top-down reasoning substantially enhances the robustness of the tracker and yields state-of-the-art performance.
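The abstract describes two mechanisms: inferring the object's category by voting over per-frame recognition results, and fusing a top-down semantic activation map with the bottom-up appearance response to localize the target. The sketch below illustrates that idea with plain NumPy arrays standing in for real tracker and recognition-model outputs; the function names (vote_category, fuse_responses, localize), the fusion weight alpha, and the toy class ids are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from collections import Counter


def vote_category(per_frame_top_classes):
    """Majority vote over the top-1 category predictions of recent frames,
    giving a category estimate that is consistent across frames."""
    return Counter(per_frame_top_classes).most_common(1)[0][0]


def fuse_responses(appearance_map, semantic_map, alpha=0.7):
    """Weighted fusion of the appearance response map and the semantic
    activation map (both H x W), after normalizing each to [0, 1]."""
    def norm(m):
        return (m - m.min()) / (m.max() - m.min() + 1e-8)
    return alpha * norm(appearance_map) + (1.0 - alpha) * norm(semantic_map)


def localize(fused_map):
    """Return the (row, col) peak of the fused response as the target location."""
    return np.unravel_index(np.argmax(fused_map), fused_map.shape)


if __name__ == "__main__":
    # Toy example: random maps stand in for a correlation-filter response
    # and the activation map of the voted category.
    rng = np.random.default_rng(0)
    appearance = rng.random((64, 64))
    activation = rng.random((64, 64))
    category = vote_category([281, 281, 285, 281])  # hypothetical class ids
    peak = localize(fuse_responses(appearance, activation))
    print("voted category:", category, "peak location:", peak)
```

In practice the appearance map would come from the tracker's matching response and the activation map from a CAM-style method for the voted category; this sketch only shows how the two signals could be combined.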

Publication
ACM International Conference on Multimedia (ACM MM)