Vision and Language Synergy for Rehearsal Free Continual Learning

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/23251

Title:

Vision and Language Synergy for Rehearsal Free Continual Learning

Journal Title:

International Conference on Learning Representations (ICLR) 2025

Publication URL:

https://iclr.cc/virtual/2025/poster/30681

Authors:

Muhammad Anwar Masum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah H, Ryszard Kowalczyk

Publication Date:

28 April 2025

Citation:

Masum, M. A., Pratama, M., Ramasamy, S., Liu, L., Habibullah, H., & Kowalczyk, R. (2025). Vision and language synergy for rehearsal free continual learning [Poster presentation]. International Conference on Learning Representations (ICLR) 2025. https://iclr.cc/virtual/2025/poster/30681

Abstract:

The prompt-based approach has demonstrated its success in continual learning. However, it still suffers from catastrophic forgetting due to inter-task vector similarity and to new components that are not fitted to previously learned tasks. On the other hand, the language-guided approach falls short of its full potential because language knowledge is only minimally utilized and participates little in the prompt-tuning process. To address these problems, we propose a novel prompt-based structure and algorithm that incorporates four key concepts: (1) language as input for prompt generation, (2) task-wise generators, (3) limiting the search space of matching descriptors via soft task-ID prediction, and (4) generated prompts as auxiliary data. Our experimental analysis shows that our method outperforms existing state-of-the-art (SOTA) methods on the CIFAR100, ImageNetR, and CUB datasets by significant margins, i.e., up to 30% in final average accuracy, 24% in cumulative average accuracy, 8% in final forgetting measure, and 7% in cumulative forgetting measure. Our historical analysis confirms that our method maintains the stability-plasticity trade-off in every task. Our robustness analysis shows that the proposed method consistently achieves higher performance than the SOTA methods across various prompt lengths, layer depths, and numbers of generators per task. We provide a comprehensive theoretical analysis and complete numerical results in the appendices. The code is available at https://github.com/anwarmaxsum/LEAPGEN for further study.
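The abstract reports its gains in four continual-learning metrics. The paper's exact formulas are not given in this record and may differ; the sketch below only assumes the definitions commonly used in the continual-learning literature. Given an accuracy matrix where `acc[i][j]` is the accuracy on task j after training on task i, average accuracy after task i is the mean over tasks 0..i, and forgetting after task i averages, over each earlier task j, the drop from its best accuracy at any earlier training stage to its current accuracy. The "cumulative" variants are assumed here to be the means of these per-step values over all tasks. The function name `continual_metrics` and the matrix layout are hypothetical.

```python
from typing import Dict, List, Sequence


def continual_metrics(acc: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Compute common continual-learning metrics (assumed definitions).

    acc[i][j] is the accuracy (%) on task j measured after training on
    task i; only entries with j <= i are read.
    """
    num_tasks = len(acc)
    if num_tasks == 0:
        raise ValueError("accuracy matrix must contain at least one task")

    # Average accuracy after each task i: mean over the tasks seen so far.
    avg_acc: List[float] = [sum(acc[i][: i + 1]) / (i + 1) for i in range(num_tasks)]

    # Forgetting after each task i >= 1: for every earlier task j, the best
    # accuracy reached on j at stages j..i-1 minus the current accuracy on j.
    forgetting: List[float] = []
    for i in range(1, num_tasks):
        drops = [max(acc[k][j] for k in range(j, i)) - acc[i][j] for j in range(i)]
        forgetting.append(sum(drops) / len(drops))

    return {
        "final_avg_acc": avg_acc[-1],
        # Assumption: "cumulative" = mean of the per-step values over all tasks.
        "cumulative_avg_acc": sum(avg_acc) / len(avg_acc),
        "final_forgetting": forgetting[-1] if forgetting else 0.0,
        "cumulative_forgetting": sum(forgetting) / len(forgetting) if forgetting else 0.0,
    }


if __name__ == "__main__":
    # Toy 3-task run (illustrative numbers, not results from the paper).
    print(continual_metrics([[90, 0, 0], [80, 85, 0], [70, 75, 88]]))
```

In the toy run, forgetting after the third task is the mean of a 20-point drop on task 0 (90 to 70) and a 10-point drop on task 1 (85 to 75), i.e. 15 points; lower forgetting values are better, which is how the abstract's forgetting margins should be read.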

License type:

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Funding Info:

There was no specific funding for this research.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/23251

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
6684-vision-and-language-syner-1.pdf	975.30 KB	PDF