Task-Oriented Multi-Modal Question Answering For Collaborative Applications
Title:
Task-Oriented Multi-Modal Question Answering For Collaborative Applications
Other Titles:
2020 IEEE International Conference on Image Processing (ICIP)
Publication Date:
30 September 2020
Citation:
H. L. Tan et al., "Task-Oriented Multi-Modal Question Answering For Collaborative Applications," 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 1426-1430, doi: 10.1109/ICIP40778.2020.9190659.
Abstract:
Cobots that work in human workspaces and adapt to human needs must understand and respond to human inquiries and instructions. In this paper, we propose a new question answering (QA) task and dataset for human-robot collaboration on task-oriented operation, i.e., task-oriented collaborative QA (TC-QA). Differing from conventional video QA, which answers questions about what happened in video clips constrained by scripts and subtitles, TC-QA aims to establish common ground for task-oriented operation through question answering. We propose an open-ended (OE) answer format comprising a text reply, an image with annotated related objects, and a video with the operation duration to guide operation execution. Designed for grounding, the TC-QA dataset comprises query videos and questions that seek acknowledgement, correction, attention to task-related objects, and information on objects or operations. Because real-world tasks are flexible and training samples are limited, we propose and evaluate a baseline method based on a hybrid approach: deep learning for object detection, hand detection, and gesture recognition, combined with symbolic reasoning to ground questions on observations and provide answers. Our experiments show that the hybrid method is effective for the TC-QA task.
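To illustrate the flavor of the hybrid approach described in the abstract, the minimal sketch below grounds a question on perception outputs with a simple symbolic rule. The detections are hard-coded here (in the paper they would come from learned object, hand, and gesture detectors), and all names, the distance-based rule, and the answer format are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: symbolic grounding of an "attention" question on
# detector outputs. The Detection records stand in for the outputs of the
# deep-learning perception stage; the rule below is an illustrative example
# of symbolic reasoning, not the method from the paper.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    box: tuple  # (x, y, w, h) in pixels

def center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def answer_attention_query(hand: Detection, objects: list) -> dict:
    """Symbolic rule: the attended object is the one nearest the hand.

    Returns an open-ended-style answer: a text reply plus the grounded
    object and its bounding box for annotation.
    """
    hx, hy = center(hand.box)
    nearest = min(
        objects,
        key=lambda o: (center(o.box)[0] - hx) ** 2 + (center(o.box)[1] - hy) ** 2,
    )
    return {
        "text": f"You are reaching for the {nearest.label}.",
        "object": nearest.label,
        "box": nearest.box,
    }

# Example observation (at inference time these would be detector outputs).
hand = Detection("hand", (100, 100, 40, 40))
objects = [
    Detection("screwdriver", (110, 90, 30, 30)),
    Detection("wrench", (300, 200, 50, 20)),
]
print(answer_attention_query(hand, objects)["object"])  # → screwdriver
```

Because the reasoning step is a transparent rule over structured detections rather than an end-to-end learned mapping, it can operate with the limited training samples the abstract highlights.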
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the Agency for Science, Technology and Research - AME Programmatic Funding Scheme
Grant Reference no. : A18A2b0046

This research / project is supported by the National Research Foundation, Singapore - NRF-ISF Joint Call
Grant Reference no. : NRF2015-NRF-ISF001-2541
Description:
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
2381-8549
1522-4880
ISBN:
978-1-7281-6395-6
978-1-7281-6394-9
978-1-7281-6396-3
Files uploaded:

20200522-icip-cameraready-task-oriented-multi-modal-question-answering.pdf (277.24 KB, PDF)