Learning Deep Hierarchical Visual Feature Coding

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/12797

Title:

Learning Deep Hierarchical Visual Feature Coding

Journal Title:

IEEE Transactions on Neural Networks and Learning Systems

DOI:

10.1109/TNNLS.2014.2307532

Publication URL:

http://dx.doi.org/10.1109/TNNLS.2014.2307532

Authors:

Hanlin Goh, Nicolas Thome, Joo-Hwee Lim, Matthieu Cord

Keywords:

Publication Date:

11 March 2014

Citation:

H. Goh, N. Thome, M. Cord and J. H. Lim, "Learning Deep Hierarchical Visual Feature Coding," in IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 12, pp. 2212-2225, Dec. 2014. doi: 10.1109/TNNLS.2014.2307532

Abstract:

In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag-of-words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatial aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact and the model's inference is fast, as compared with sparse coding methods. The low-level representations of descriptors that were learned using this method result in generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.

License type:

PublisherCopyrights

Funding Info:

Description:

(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/12797

ISSN:

2162-237X

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
tnnls-manuscript-final.pdf	1.38 MB	PDF