Lin, D., Li, Y., Cheng, Y., Prasad, S., Nwe, T. L., Dong, S., & Guo, A. (2022). Multi-view 3D object retrieval leveraging the aggregation of view and instance attentive features. Knowledge-Based Systems, 247, 108754. https://doi.org/10.1016/j.knosys.2022.108754
In the task of multi-view 3D object retrieval, it is pivotal to aggregate the visual features extracted from multiple view images into a discriminative representation of a 3D object. The existing Multi-View Convolutional Neural Network (MVCNN) paradigm relies on view pooling for feature aggregation, which overlooks (i) the local view-relevant discriminative information within each view image and (ii) the global correlative information across all the view images. To leverage both types of information, we propose two self-attention modules, namely a View Attention Module (VAM) and an Instance Attention Module (IAM), to learn view-attentive and instance-attentive features, respectively. The final representation of a 3D object is the aggregation of three features: the original, the view-attentive, and the instance-attentive features. Furthermore, we propose to employ the ArcFace loss together with a cosine-distance-based triplet-center loss as the metric learning guidance for training our model. Since cosine distance is used to rank the retrieval results, these angular metric learning losses keep the training objective consistent with the testing criterion, thereby facilitating discriminative feature learning. Extensive experiments and ablation studies on four publicly available 3D object retrieval datasets demonstrate the superiority of the proposed method over multiple state-of-the-art methods.
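The two ideas in the abstract — aggregating three feature vectors into one representation, and a triplet-center-style loss measured in cosine distance — can be sketched in plain Python. This is a minimal illustration, not the paper's exact formulation: the element-wise sum aggregation, the margin value, and all function names here are assumptions for exposition.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, the ranking metric at test time."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)


def aggregate(original, view_attentive, instance_attentive):
    """Combine the three feature vectors into the final object representation.
    (Element-wise sum is an illustrative assumption, not the paper's scheme.)"""
    return [o + v + i for o, v, i in zip(original, view_attentive, instance_attentive)]


def triplet_center_loss_cos(feature, centers, label, margin=0.1):
    """Cosine-distance triplet-center loss (sketch): pull the feature toward its
    own class center and push it from the nearest other center by `margin`."""
    d_pos = cosine_distance(feature, centers[label])
    d_neg = min(cosine_distance(feature, c) for l, c in centers.items() if l != label)
    return max(0.0, d_pos - d_neg + margin)
```

Because both the training loss and the retrieval ranking use cosine distance, minimizing `triplet_center_loss_cos` directly improves the ordering produced by `cosine_distance` at test time, which is the train/test consistency the abstract highlights.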
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
This research/project is supported by the A*STAR RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP), Grant Reference No. I2001E0073.