SCA-PVNet: Self-and-cross attention based aggregation of point cloud and multi-view for 3D object retrieval

Title:
SCA-PVNet: Self-and-cross attention based aggregation of point cloud and multi-view for 3D object retrieval
Journal Title:
Knowledge-Based Systems
Publication Date:
11 May 2024
Citation:
Lin, D., Cheng, Y., Guo, A., Mao, S., & Li, Y. (2024). SCA-PVNet: Self-and-cross attention based aggregation of point cloud and multi-view for 3D object retrieval. Knowledge-Based Systems, 296, 111920. https://doi.org/10.1016/j.knosys.2024.111920
Abstract:
To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors for 3D objects represented by a single modality, such as voxels, point clouds, or multi-view images. Leveraging complementary information from multimodal representations of 3D objects is a promising way to further improve retrieval performance. However, multimodal 3D object retrieval has rarely been developed or analyzed on large-scale datasets. In this paper, we propose a self-and-cross-attention-based aggregation of point clouds and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the in-modality aggregation module (IMAM) and the cross-modality aggregation module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism to aggregate multi-view features, whereas CMAM exploits a cross-attention mechanism to enable interaction between point cloud and multi-view features. The final descriptor of a 3D object for retrieval is obtained by concatenating the aggregated feature outputs of both modules. Extensive experiments and analyses on four datasets, ranging from small to large scale, demonstrate the superiority of the proposed SCA-PVNet over state-of-the-art methods. In addition to achieving state-of-the-art retrieval performance, our method is more robust in challenging scenarios where views or points are missing during inference.
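The abstract describes the two aggregation modules only at a high level. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: self-attention over per-view features (IMAM), cross-attention with the point cloud feature as the query (CMAM), and concatenation into the final descriptor. Feature dimensions, head counts, the mean pooling, and the `SCAPVHead` wrapper are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IMAM(nn.Module):
    """In-modality aggregation: self-attention over multi-view features."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, view_feats):           # view_feats: (B, V, dim) per-view features
        fused, _ = self.attn(view_feats, view_feats, view_feats)
        return fused.mean(dim=1)             # pool over views -> (B, dim)

class CMAM(nn.Module):
    """Cross-modality aggregation: point cloud feature attends to view features."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, pc_feat, view_feats):  # pc_feat: (B, dim), view_feats: (B, V, dim)
        q = pc_feat.unsqueeze(1)             # point cloud feature as the query
        fused, _ = self.attn(q, view_feats, view_feats)
        return fused.squeeze(1)              # -> (B, dim)

class SCAPVHead(nn.Module):
    """Concatenate IMAM and CMAM outputs into the final retrieval descriptor."""
    def __init__(self, dim=512):
        super().__init__()
        self.imam, self.cmam = IMAM(dim), CMAM(dim)

    def forward(self, pc_feat, view_feats):
        return torch.cat([self.imam(view_feats),
                          self.cmam(pc_feat, view_feats)], dim=-1)

# Toy usage: 2 objects, 12 views, 512-d backbone features (shapes are assumptions).
pc = torch.randn(2, 512)
views = torch.randn(2, 12, 512)
desc = SCAPVHead(512)(pc, views)             # descriptor of shape (2, 1024)
```

In this sketch the point cloud feature serves as the cross-attention query so that view features are aggregated conditioned on geometry; the paper's actual query/key/value assignment and pooling strategy may differ.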
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research/project is supported by the A*STAR RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP).
Grant Reference no.: I2001E0073
ISSN:
0950-7051
Files uploaded:
kbs-main-finalversion.pdf (3.87 MB, PDF)