Visual Question Answering (VQA) is an ambitious task aiming to answer any image-related
question. However, in reality, it is hard to build such a system
once and for all, since the needs of users are continuously updated,
and the system has to implement new functions. Thus, Continual
Learning (CL) ability is a must in developing advanced
VQA systems. Recently, a pioneering work split a VQA dataset
into disjoint answer sets to study this topic. However, CL
on VQA involves more than the expansion of label sets (new
Answer sets). It is also crucial to study how to answer questions
when deploying VQA systems to new environments (new
Visual scenes) and how to answer questions requiring new
functions (new Question types). Thus, we propose CLOVE,
a benchmark for Continual Learning On Visual quEstion
answering, which contains scene- and function-incremental
settings for the two aforementioned CL scenarios. In terms
of methodology, the main difference between CL on VQA
and classification is that the former additionally involves expanding
reasoning mechanisms and preventing them from being forgotten,
while the latter focuses on class representation. Thus, we
propose a real-data-free replay-based method tailored for CL
on VQA, named Scene Graph as Prompt for Symbolic Replay.
Using a piece of a scene graph as a prompt, it replays
pseudo scene graphs to represent past images, along with
correlated QA pairs. A unified VQA model is also proposed to
utilize the current and replayed data to enhance its QA ability.
Finally, experimental results reveal the challenges in CLOVE
and demonstrate the effectiveness of our method.
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the National Research Foundation - AI Singapore Programme
Grant Reference no. : AISG2-RP-2021-025
This research / project is supported by the National Research Foundation - NRF Fellowship
Grant Reference no. : NRF-NRFF13-2021-0008
This research / project is supported by the National University of Singapore - Start-Up Grant
Grant Reference no. : N.A