Fang, F., Liu, Y., & Xu, Q. (2024). Localizing discriminative regions for fine-grained visual recognition: One could be better than many. Neurocomputing, 610, 128611. https://doi.org/10.1016/j.neucom.2024.128611
Abstract:
Fine-grained visual recognition (FGVR) involves distinguishing between highly similar subcategories. To capture subtle differences among closely related subcategories, prevalent methodologies first localize the discriminative region in the image and then classify it. However, previous detection-based and attention-based methods for discriminative region localization have inherent limitations that constrain performance. Deep reinforcement learning (DRL) is a good choice, as it can autonomously determine the optimal actions to achieve a given objective, e.g., localizing the discriminative region. Nevertheless, existing DRL-based approaches learn to simultaneously localize an uncertain number of discriminative regions, which may pose challenges for the DRL agent. Additionally, the optimization of DRL relies on recognition feedback from an FGVR classifier, whereas existing approaches simply employ standard networks as the classifier. To address these challenges, we propose a Reinforced Most-Discriminative Region Localization (RMDRL) module to adaptively localize a single, most discriminative region in the image. To provide precise feedback for training the RMDRL module, we propose a Discriminative Knowledge Self-Distillation (DKSD) module to cultivate a robust FGVR classifier. Our extensive experimentation across nine benchmarks validates the efficacy of our approach for FGVR. Our findings also support that one discriminative region localized by DRL can be better than multiple discriminative regions.
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research/project is supported by the Agency for Science, Technology and Research - AME Programmatic Grant
Grant Reference no.: A18A2b0046