Compositional visual question answering requires reasoning
over both semantic and geometric object relations. We propose a novel
tiered reasoning method that dynamically selects object-level candidates
based on language representations and generates robust pairwise relations
among the selected candidate objects. The proposed tiered relation
reasoning method is compatible with most existing
visual reasoning frameworks, yielding significant performance improvements
at very little extra computational cost. Moreover, we propose a
policy network that decides the appropriate reasoning steps based on
question complexity and current reasoning status. In experiments, our
model achieves state-of-the-art performance on two VQA datasets.
Funding Info:
This research was supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-RP-2018-003) and the MOE Tier-1 research grants RG28/18 (S) and RG22/19 (S). F. Lv’s participation was supported by the National Natural Science Foundation of China (Nos. 11829101 and 11931014).