Chen, S., Zhu, H., Chen, X., Lei, Y., Yu, G., & Chen, T. (2023, June). End-to-End 3D Dense Captioning with Vote2Cap-DETR. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr52729.2023.01070
Abstract:
3D dense captioning aims to generate multiple captions localized with their associated object regions. Existing methods follow a sophisticated “detect-then-describe” pipeline equipped with numerous hand-crafted components. However, these hand-crafted components yield sub-optimal performance given the cluttered spatial and class distributions of objects across different scenes. In this paper, we propose a simple yet effective transformer framework, Vote2Cap-DETR, based on the recently popular DEtection TRansformer (DETR). Compared with prior art, our framework has several appealing advantages: 1) without resorting to numerous hand-crafted components, our method is based on a full transformer encoder-decoder architecture with a learnable vote-query-driven object decoder and a caption decoder that produces dense captions in a set-prediction manner; 2) in contrast to the two-stage scheme, our method performs detection and captioning in a single stage; 3) without bells and whistles, extensive experiments on two commonly used datasets, ScanRefer and Nr3D, demonstrate that Vote2Cap-DETR surpasses the current state of the art by 11.13% and 7.11% in CIDEr@0.5IoU, respectively. Code will be released soon.
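To illustrate the one-stage, set-prediction idea the abstract describes, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: a shared scene encoder feeds a fixed set of learnable queries (standing in for the paper's vote queries) through a single decoder, whose per-query features drive a box head and a caption head in parallel, so localization and captioning happen in one stage. All class names, dimensions, and the single-linear caption head are illustrative assumptions; the paper's actual caption decoder is a transformer that generates tokens, which is simplified away here.

import torch
import torch.nn as nn

class OneStageDenseCaptioner(nn.Module):
    """Hypothetical sketch of one-stage dense captioning as set prediction.

    A scene encoder produces context features; learnable queries (a stand-in
    for vote queries) are decoded against that context, and each decoded
    query is mapped in parallel to a box and to caption logits.
    """

    def __init__(self, d_model=256, num_queries=256, vocab_size=1000, max_len=16):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=3)
        # Learnable queries; in Vote2Cap-DETR these are vote queries derived
        # from the scene, which this sketch does not model.
        self.queries = nn.Embedding(num_queries, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=3)
        self.box_head = nn.Linear(d_model, 6)  # box center + size per query
        # Toy caption head: all timestep logits at once, purely to show that
        # captions are predicted per query, in parallel with boxes.
        self.caption_head = nn.Linear(d_model, max_len * vocab_size)
        self.max_len, self.vocab_size = max_len, vocab_size

    def forward(self, point_tokens):
        # point_tokens: (B, N, d_model), pre-encoded point-cloud features.
        memory = self.encoder(point_tokens)                        # (B, N, d)
        q = self.queries.weight.unsqueeze(0).expand(
            point_tokens.size(0), -1, -1)                          # (B, Q, d)
        feats = self.decoder(q, memory)                            # (B, Q, d)
        boxes = self.box_head(feats)                               # (B, Q, 6)
        captions = self.caption_head(feats).view(
            feats.size(0), -1, self.max_len, self.vocab_size)      # (B, Q, T, V)
        return boxes, captions

# Toy usage: 1024 tokens of dim 256 stand in for encoded point-cloud features.
model = OneStageDenseCaptioner()
boxes, captions = model(torch.randn(2, 1024, 256))

Because every query yields both a box and a caption in one forward pass, training can match predictions to ground-truth objects with a set-based assignment (as in DETR), avoiding the hand-crafted proposal and post-processing stages of detect-then-describe pipelines.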
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the A*STAR - MTC Programmatic Grant (Reference no.: A18A2b0046).
This research/project is supported by the A*STAR - RobotHTPO Grant (Reference no.: C211518008).
This research/project is supported by the Singapore Economic Development Board (EDB) - Space Technology Development Grant (STDP) (Reference no.: S22-19016-STDP).
This work is supported by the National Natural Science Foundation of China (Nos. U1909207, 62071127, and 62276176), the Shanghai Natural Science Foundation (No. 23ZR1402900), and the Zhejiang Lab Project (No. 2021KH0AB05).