Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering

Title:
Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering
Journal Title:
IEEE Transactions on Multimedia
Keywords:
Publication Date:
20 December 2023
Citation:
Cheng, Y., Fan, H., Lin, D., Sun, Y., Kankanhalli, M., & Lim, J.-H. (2024). Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering. IEEE Transactions on Multimedia, 26, 6131–6141. https://doi.org/10.1109/tmm.2023.3345172
Abstract:
The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions. Existing graph-based methods for VideoQA usually ignore keywords in questions and employ a simple graph to aggregate features without considering relative relations between objects, which may lead to inferior performance. In this paper, we propose a Keyword-aware Relative Spatio-Temporal (KRST) graph network for VideoQA. First, to make question features aware of keywords, we employ an attention mechanism to assign high weights to keywords during question encoding. The keyword-aware question features are then used to guide video graph construction. Second, because relations are relative, we integrate the relative relation modeling to better capture the spatio-temporal dynamics among object nodes. Moreover, we disentangle the spatio-temporal reasoning into an object-level spatial graph and a frame-level temporal graph, which reduces the impact of spatial and temporal relation reasoning on each other. Extensive experiments on the TGIF-QA, MSVD-QA and MSRVTT-QA datasets demonstrate the superiority of our KRST over multiple state-of-the-art methods.
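The keyword-weighting idea in the abstract can be illustrated with a minimal sketch of attention pooling over question tokens. This is not the authors' implementation: the function names `softmax` and `keyword_aware_pool`, the toy feature vectors, and the hand-set relevance scores are all assumptions made for illustration. In the paper the scores would come from a learned attention module, which the abstract does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keyword_aware_pool(token_features, token_scores):
    """Pool token features into one question vector.

    token_features: list of equal-length feature vectors, one per token.
    token_scores:   one relevance score per token (hypothetical here;
                    a trained model would predict these).
    Returns (pooled_vector, attention_weights).
    """
    weights = softmax(token_scores)
    dim = len(token_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, token_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy question: "what is the man holding"
tokens = ["what", "is", "the", "man", "holding"]
# Hypothetical 2-d token features, for illustration only.
features = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0], [1.0, 0.2], [0.3, 1.0]]
# Hand-set scores that favor the content words "man" and "holding".
scores = [0.5, -1.0, -1.0, 2.0, 2.0]

question_vector, attention = keyword_aware_pool(features, scores)
for token, weight in zip(tokens, attention):
    print(f"{token:8s} {weight:.3f}")
```

Under these assumed scores, the pooled question vector is dominated by the features of "man" and "holding", which is the sense in which the question encoding becomes keyword-aware before it guides graph construction.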
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the Agency for Science, Technology and Research (A*STAR) - AME Programmatic Funding Scheme
Grant Reference No.: A18A2b0046
Description:
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
1941-0077
1520-9210