Lin, D., Li, Y., Cheng, Y., Prasad, S., Guo, A., & Cao, Y. (2023). Multi-Range View Aggregation Network with Vision Transformer Feature Fusion for 3D Object Retrieval. IEEE Transactions on Multimedia, 1–12. https://doi.org/10.1109/tmm.2023.3246229
View-based methods have achieved state-of-the-art
performance in 3D object retrieval. However, view-based methods
still encounter two major challenges. The first is how to leverage
the inter-view correlation to enhance view-level visual features.
The second is how to effectively fuse view-level features into a
discriminative global descriptor. To address these two challenges,
we propose a multi-range view aggregation network (MRVANet)
with a vision-transformer-based feature fusion scheme
for 3D object retrieval. Unlike existing methods, which aggregate only
neighboring or adjacent views and may therefore introduce
redundant information, we propose a multi-range view
aggregation module that enhances individual view representations
by aggregating not only neighboring views but
also views at different ranges. Furthermore,
to generate the global descriptor, we
employ the multi-head self-attention mechanism
of the vision transformer to fuse the view-level features.
Extensive experiments on three public datasets,
ModelNet40, ShapeNet Core55, and MCB-A, demonstrate
the superiority of the proposed network over the state-of-the-art
methods in 3D object retrieval.
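
As a rough illustration of the two ideas described above, the following minimal sketch first enhances each view feature by aggregating views at several ranges and then fuses the enhanced features with multi-head self-attention. It is not the authors' implementation. The function names `aggregate_multi_range` and `fuse_with_self_attention`, the circular camera arrangement, the range offsets (1, 2, 4), the mean aggregation, the identity query/key/value projections, and the final mean pooling are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def aggregate_multi_range(views, ranges=(1, 2, 4)):
    """Enhance each view by averaging it with views at several offsets.

    Assumption: views are ordered on a circular camera rig, so offsets
    wrap around. Mean aggregation is a stand-in for the learned module.
    """
    n = len(views)
    enhanced = []
    for i in range(n):
        acc = list(views[i])
        count = 1
        for r in ranges:
            # Views r steps before and after view i (circular indexing).
            for j in ((i - r) % n, (i + r) % n):
                acc = [a + b for a, b in zip(acc, views[j])]
                count += 1
        enhanced.append([a / count for a in acc])
    return enhanced


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_self_attention(views, num_heads=2):
    """Fuse view features into one descriptor with multi-head self-attention.

    Simplification: queries, keys, and values are the raw features
    (identity projections); a trained model would learn these projections.
    """
    n, d = len(views), len(views[0])
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    hd = d // num_heads
    attended = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        lo = h * hd
        head = [v[lo:lo + hd] for v in views]
        for i in range(n):
            # Scaled dot-product scores between view i and every view.
            scores = [
                sum(a * b for a, b in zip(head[i], head[j])) / math.sqrt(hd)
                for j in range(n)
            ]
            weights = softmax(scores)
            for j in range(n):
                for k in range(hd):
                    attended[i][lo + k] += weights[j] * head[j][k]
    # Mean-pool the attended view features into a single global descriptor.
    return [sum(row[k] for row in attended) / n for k in range(d)]


# Toy usage: 12 views with 8-dimensional random features.
random.seed(0)
views = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]
descriptor = fuse_with_self_attention(aggregate_multi_range(views), num_heads=2)
print(len(descriptor))
```

In a retrieval setting, descriptors produced this way would typically be compared with a distance such as cosine or Euclidean distance; which metric the paper uses is not stated in the abstract.
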
This research / project is supported by the A*STAR - INDUSTRY ALIGNMENT FUND - INDUSTRY COLLABORATION PROJECTS (IAF-ICP)
Grant Reference no.: I2001E0073