Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation

Title:
Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation
Journal Title:
2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Keywords:
Publication Date:
15 January 2024
Citation:
Cao, H., Xu, Y., Yang, J., Yin, P., Yuan, S., & Xie, L. (2023). Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 18763–18773. https://doi.org/10.1109/iccv51070.2023.01724
Abstract:
Continual Test-Time Adaptation (CTTA) generalizes conventional Test-Time Adaptation (TTA) by assuming that the target domain is dynamic over time rather than stationary. In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation. The key to MM-CTTA is to adaptively attend to the reliable modality while avoiding catastrophic forgetting during continual domain shifts, which is beyond the capabilities of previous TTA or CTTA methods. To fill this gap, we propose an MM-CTTA method called Continual Cross-Modal Adaptive Clustering (CoMAC) that addresses this task from two perspectives. On one hand, we propose an adaptive dual-stage mechanism to generate reliable cross-modal predictions by attending to the reliable modality based on the class-wise feature-centroid distance in the latent space. On the other hand, to perform test-time adaptation without catastrophic forgetting, we design class-wise momentum queues that capture confident target features for adaptation while stochastically restoring pseudo-source features to revisit source knowledge. We further introduce two new benchmarks to facilitate the exploration of MM-CTTA in the future. Our experimental results show that our method achieves state-of-the-art performance on both benchmarks.
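To make the abstract's description of class-wise momentum queues concrete, the sketch below illustrates the general idea under stated assumptions: one bounded per-class queue collects confident target features, while pseudo-source features are stochastically re-inserted to revisit source knowledge. This is a minimal illustration, not the authors' implementation; the class name, thresholds, and the use of random placeholder vectors for the pseudo-source features are all assumptions for demonstration.

```python
from collections import deque
import random
import numpy as np

class ClassMomentumQueue:
    """Illustrative sketch of a class-wise momentum queue (not CoMAC itself).

    Confident target features are enqueued during adaptation; with
    probability `restore_prob`, a stored pseudo-source feature is
    re-inserted so source knowledge is periodically revisited.
    """

    def __init__(self, num_classes, feat_dim, maxlen=64, restore_prob=0.1, seed=0):
        # One bounded queue per class; old features fall out as new ones arrive.
        self.queues = [deque(maxlen=maxlen) for _ in range(num_classes)]
        # Pseudo-source features would come from the pre-trained source model;
        # random vectors stand in for them here (an assumption of this sketch).
        rng = np.random.default_rng(seed)
        self.pseudo_source = rng.standard_normal((num_classes, feat_dim))
        self.restore_prob = restore_prob
        random.seed(seed)

    def update(self, cls, feature, confidence, threshold=0.9):
        # Keep only confident target features, as the abstract describes.
        if confidence >= threshold:
            self.queues[cls].append(np.asarray(feature, dtype=float))
        # Stochastically restore a pseudo-source feature to avoid forgetting.
        if random.random() < self.restore_prob:
            self.queues[cls].append(self.pseudo_source[cls])

    def centroid(self, cls):
        # Class centroid, e.g. for the feature-centroid distances used to
        # weigh modality reliability; falls back to the pseudo-source feature
        # when the queue is still empty.
        q = self.queues[cls]
        return np.mean(q, axis=0) if q else self.pseudo_source[cls]
```

A caller would update the queues once per test batch and compare incoming features against `centroid(cls)` to decide which modality to trust for each class.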
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the National Research Foundation, Singapore - AI Singapore Programme
Grant Reference no. : AISG2-RP-2021-027

This research / project is supported by the National Research Foundation, Singapore - Medium Sized Center for Advanced Robotics Technology Innovation
Grant Reference no. : NA
Description:
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
NA
Files uploaded:
multi-modal-continual.pdf (10.78 MB, PDF)