Learning Deep Hierarchical Visual Feature Coding

Journal Title: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: 11 March 2014
H. Goh, N. Thome, M. Cord and J. H. Lim, "Learning Deep Hierarchical Visual Feature Coding," in IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 12, pp. 2212-2225, Dec. 2014. doi: 10.1109/TNNLS.2014.2307532
In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag-of-words framework with the representational power and adaptability of learning deep architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatially aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact and the model's inference is fast compared with sparse coding methods. The low-level representations of descriptors learned using this method result in generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
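The coding pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the layer sizes, the random weights (standing in for a trained RBM), the 2 x 2 pooling grid, and the function names rbm_encode and spatial_max_pool are all assumptions made for illustration, and the sparse and selective regularization and the supervised fine-tuning are omitted entirely.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rbm_encode(descriptor, weights, hidden_bias):
    """Hidden-unit activation probabilities of an RBM for one local descriptor."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, descriptor)) + b)
        for row, b in zip(weights, hidden_bias)
    ]


def spatial_max_pool(codes, positions, grid=2):
    """Max-pool codes inside each cell of a grid x grid partition of the image.

    positions holds (x, y) descriptor locations normalized to [0, 1).
    Returns the per-cell maxima concatenated into one image-level vector.
    """
    n_hidden = len(codes[0])
    pooled = [[0.0] * n_hidden for _ in range(grid * grid)]
    for code, (x, y) in zip(codes, positions):
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        cell = pooled[row * grid + col]
        for j, value in enumerate(code):
            cell[j] = max(cell[j], value)
    return [value for cell in pooled for value in cell]


# Hypothetical sizes: 128-D SIFT-like inputs, 16 hidden units, 50 descriptors.
N_VISIBLE, N_HIDDEN, N_DESCRIPTORS = 128, 16, 50
rng = random.Random(0)

# Random weights stand in for parameters that would normally be learned.
weights = [[rng.gauss(0.0, 0.1) for _ in range(N_VISIBLE)] for _ in range(N_HIDDEN)]
hidden_bias = [0.0] * N_HIDDEN

# Synthetic descriptors and locations stand in for SIFT sampled from an image.
descriptors = [[rng.random() for _ in range(N_VISIBLE)] for _ in range(N_DESCRIPTORS)]
positions = [(rng.random(), rng.random()) for _ in range(N_DESCRIPTORS)]

codes = [rbm_encode(d, weights, hidden_bias) for d in descriptors]
image_vector = spatial_max_pool(codes, positions)
print(len(image_vector))  # 64 = 2 * 2 cells x 16 hidden units
```

Encoding a descriptor here is a single matrix-vector product followed by a sigmoid, which is consistent with the abstract's remark that inference is fast compared with sparse coding methods, where each descriptor typically requires solving an optimization problem.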
(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.