Multi-human tracking in crowded environments is a challenging problem due to occlusions, pose changes,
viewpoint variation and cluttered background. In this work,
we propose a robust feature learning approach for tracking-by-detection
methods, based on a second-order attention network that can
capture higher-order relationships between salient features at
the early stages of a Convolutional Neural Network (CNN).
Our Guided Second-Order Attention Network (GSAN), unlike
existing attention learning methods, which are weakly supervised, uses a supervisory signal based on the quality
of the self-learned attention maps. More specifically, GSAN
looks into the attended maps of a person having the highest
confidence and supervises itself to look into the correct regions
in the images of the person. Attention maps learned this way
are spatially aligned and thus robust to camera-view changes
and body pose variations. We verify the effectiveness of our
approach by comparing with the state-of-the-art methods on
challenging person re-identification and multi-object tracking
(MOT) datasets.
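
The two ideas above, second-order attention and self-guidance from the highest-confidence sample, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `second_order_attention` and `guidance_loss`, the softmax normalisation, the use of pairwise spatial affinities as the second-order statistic, and the mean-squared-error guidance term are all assumptions chosen for illustration.

```python
import numpy as np


def second_order_attention(feats):
    """Hypothetical second-order attention over a (C, H, W) feature map.

    Returns an (H, W) non-negative map that sums to 1.
    """
    c, h, w = feats.shape
    x = feats.reshape(c, h * w)                # (C, N) with N = H * W
    x = x - x.mean(axis=1, keepdims=True)      # centre each channel
    # Second-order statistic: pairwise affinity between spatial positions.
    affinity = x.T @ x / c                     # (N, N)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    affinity = affinity - affinity.max(axis=1, keepdims=True)
    weights = np.exp(affinity)
    weights /= weights.sum(axis=1, keepdims=True)
    # Saliency of a position: average attention it receives from all positions.
    saliency = weights.mean(axis=0)            # (N,), sums to 1
    return saliency.reshape(h, w)


def guidance_loss(att_maps, confidences):
    """Assumed guidance term for images of one identity.

    att_maps: (B, H, W) attention maps; confidences: (B,) scores.
    The map of the highest-confidence sample acts as a fixed target.
    """
    guide = att_maps[np.argmax(confidences)]
    return float(np.mean((att_maps - guide) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(3, 8, 4, 3))      # 3 images, C=8, H=4, W=3
    maps = np.stack([second_order_attention(f) for f in batch])
    conf = np.array([0.2, 0.9, 0.5])           # made-up confidence scores
    print(maps.shape, guidance_loss(maps, conf))
```

In this sketch the guidance term pulls every attention map of an identity toward the map of its most confident sample, which is one simple way to encourage the spatial alignment described above.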
Funding Info:
This research is partially supported by SERC grant No. 1622500036 from the National Robotics Programme (NRP), Singapore.