Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/15723

Title:

Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation

Journal Title:

Computer Speech & Language

DOI:

10.1016/j.csl.2020.101095

Publication URL:

https://doi.org/10.1016/j.csl.2020.101095

Authors:

Hung Le, Doyen Sahoo, Nancy F. Chen, Steven C. H. Hoi

Publication Date:

29 March 2020

Citation:

Hung Le, Doyen Sahoo, Nancy F. Chen, Steven C.H. Hoi, Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation, Computer Speech & Language, Volume 63, 2020, 101095, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2020.101095.

Abstract:

This work extends our submission to the Dialogue System Technology Challenge (DSTC7), where we participated in the Audio Visual Scene-aware Dialogue System (AVSD) track. The AVSD track evaluates how dialogue systems understand video scenes and respond to users about the video's visual and audio content. We propose a hierarchical attention approach over user queries, video captions, and audio and visual features that contributes to improved evaluation results. We also apply a nonlinear feature fusion approach to combine the visual and audio features for better knowledge representation. Our proposed model outperforms the baselines in both objective evaluation and human rating. In this extended work, we also provide a more extensive review of the related work, conduct additional experiments with word-level and context-level pretrained embeddings, and investigate different qualitative aspects of the generated responses.

License type:

http://creativecommons.org/licenses/by-nc-nd/4.0/

Funding Info:

The first author is supported by A*STAR Computing and Information Science scholarship (formerly A*STAR Graduate scholarship). The third author is supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046).

URI:

https://oar.a-star.edu.sg/communities-collections/articles/15723

ISSN:

0885-2308
1095-8363

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
post-print.pdf	892.50 KB	PDF