HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Title:
HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval
Other Titles:
IEEE Transactions on Multimedia
DOI:
10.1109/TMM.2017.2713410
Publication Date:
01 September 2017
Citation:
J. Lin et al., "HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval," in IEEE Transactions on Multimedia, vol. 19, no. 9, pp. 1968-1983, Sept. 2017. doi: 10.1109/TMM.2017.2713410
Abstract:
With the emerging demand for large-scale video analysis, MPEG initiated the compact descriptor for video analysis (CDVA) standardization in 2014. Beyond the handcrafted descriptors adopted by the current MPEG-CDVA reference model, we study the problem of deep learned global descriptors for video matching, localization, and retrieval. First, inspired by a recent invariance theory, we propose a nested invariance pooling (NIP) method to derive compact deep global descriptors from convolutional neural networks (CNNs) by progressively encoding translation, scale, and rotation invariances into the pooled descriptors. Second, our empirical studies show that a sequence of well-designed pooling moments (e.g., max or average) may drastically impact video matching performance, which motivates us to design hybrid pooling operations via NIP (HNIP). HNIP further improves the discriminability of deep global descriptors. Third, we report the technical merits and performance improvements of combining deep and handcrafted descriptors to better investigate their complementary effects. We evaluate the effectiveness of HNIP within the well-established MPEG-CDVA evaluation framework. Extensive experiments demonstrate that HNIP outperforms the state-of-the-art deep and canonical handcrafted descriptors with significant mAP gains of 5.5% and 4.7%, respectively. In particular, the combination of HNIP-incorporated and handcrafted global descriptors significantly boosts the performance of CDVA core techniques at a comparable descriptor size.
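As a rough illustration of the nested pooling idea described in the abstract, the following Python/NumPy sketch pools CNN feature maps in three nested stages: over positions within a region (translation), over regions of different sizes (scale), and over rotated copies of a frame (rotation). The function names `pool` and `nested_invariance_pool`, the sliding-window region layout, and the particular moment order (square-root, then average, then max) are assumptions made for illustration only; the abstract states just that the choice of pooling moments matters, so the paper should be consulted for the actual configuration.

```python
import numpy as np


def pool(x, axis, moment):
    """Pool x along one axis using a named pooling moment."""
    if moment == "avg":
        return x.mean(axis=axis)
    if moment == "max":
        return x.max(axis=axis)
    if moment == "sqrt":
        # Square-root (root-mean-square) pooling, a second-order moment.
        return np.sqrt((x ** 2).mean(axis=axis))
    raise ValueError(f"unknown pooling moment: {moment}")


def nested_invariance_pool(feature_maps, region_fracs=(1.0, 0.5),
                           moments=("sqrt", "avg", "max")):
    """Hypothetical sketch of nested pooling over translation, scale, rotation.

    feature_maps: array of shape (R, C, H, W), i.e. the feature maps of the
    last convolutional layer for R rotated copies of one frame.
    Returns an L2-normalized descriptor of length C.
    """
    t_moment, s_moment, r_moment = moments
    _, C, H, W = feature_maps.shape
    per_rotation = []
    for fmap in feature_maps:
        region_vectors = []
        for frac in region_fracs:
            h = max(1, int(round(H * frac)))
            w = max(1, int(round(W * frac)))
            # Slide a window of size (h, w) with roughly 50% overlap.
            for top in range(0, H - h + 1, max(1, h // 2)):
                for left in range(0, W - w + 1, max(1, w // 2)):
                    patch = fmap[:, top:top + h, left:left + w].reshape(C, -1)
                    # Stage 1: translation, pool over positions in the region.
                    region_vectors.append(pool(patch, axis=1, moment=t_moment))
        # Stage 2: scale, pool over regions of all sizes.
        per_rotation.append(pool(np.stack(region_vectors), axis=0,
                                 moment=s_moment))
    # Stage 3: rotation, pool over the rotated copies.
    descriptor = pool(np.stack(per_rotation), axis=0, moment=r_moment)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


# Usage with random stand-in features: 4 rotations, 8 channels, 6x6 maps.
rng = np.random.default_rng(0)
features = rng.random((4, 8, 6, 6))
descriptor = nested_invariance_pool(features)
print(descriptor.shape)
```

Because the last stage pools across the rotation axis, reordering the rotated copies leaves the descriptor unchanged, which is the sense in which the sketch is rotation-invariant over the sampled angles. Passing a different `moments` tuple (for example, all `"max"`) gives a simple way to compare pooling choices under these assumptions.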
License type:
PublisherCopyrights
Description:
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
1941-0077
1520-9210
Files uploaded:

tmm-hnip.pdf (PDF, 14.66 MB)