COSM2IC: Optimizing Real-Time Multi-Modal Instruction Comprehension

Journal:
IEEE Robotics and Automation Letters
Publication Date:
28 July 2022
Weerakoon, D., Subbaraju, V., Tran, T., & Misra, A. (2022). COSM2IC: Optimizing Real-Time Multi-Modal Instruction Comprehension. IEEE Robotics and Automation Letters, 7(4), 10697–10704.
Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge to be tackled in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded devices. While model compression can achieve a reduction in computational resources up to a certain point, further optimizations result in a severe drop in accuracy. To minimize this loss in accuracy, we propose the COSM2IC framework, with a lightweight Task Complexity Predictor, that uses multiple sensor inputs to assess the instructional complexity and thereby dynamically switch between a set of models of varying computational intensity such that computationally less demanding models are invoked whenever possible. To demonstrate the benefits of COSM2IC, we utilize a representative human-robot collaborative “table-top target acquisition” task to curate a new multi-modal instruction dataset where a human issues instructions in a natural manner using a combination of visual, verbal, and gestural (pointing) cues. We show that COSM2IC achieves a 3-fold reduction in comprehension latency when compared to a baseline DNN model while suffering an accuracy loss of only ∼5%. When compared to state-of-the-art model compression methods, COSM2IC is able to achieve a further 30% reduction in latency and energy consumption for a comparable performance.
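The switching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Instruction` fields, the rule-based `predict_complexity` scoring (including the assumption that a pointing gesture lowers complexity), the thresholds, and the three placeholder models are all hypothetical and chosen only to show how a cheap predictor might route each instruction to the least demanding model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Instruction:
    # Hypothetical, simplified stand-ins for the visual, verbal and gestural inputs.
    num_objects: int    # objects visible on the table
    num_words: int      # length of the spoken instruction
    has_pointing: bool  # whether a pointing gesture was detected


def predict_complexity(inst: Instruction) -> str:
    """Toy rule-based stand-in for a learned task complexity predictor."""
    score = inst.num_objects + inst.num_words // 5
    if inst.has_pointing:
        score -= 3  # assumption: a pointing cue makes the target easier to resolve
    if score <= 4:
        return "low"
    if score <= 9:
        return "medium"
    return "high"


# Placeholder models of increasing cost; each just returns its own name here.
MODELS: Dict[str, Callable[[Instruction], str]] = {
    "low": lambda inst: "small_model",
    "medium": lambda inst: "medium_model",
    "high": lambda inst: "full_model",
}


def comprehend(inst: Instruction) -> str:
    """Route the instruction to the cheapest model the predictor deems sufficient."""
    level = predict_complexity(inst)
    return MODELS[level](inst)
```

In this sketch the predictor runs on every instruction, so it only pays off if it is much cheaper than the models it selects between, which is the reason the abstract stresses that the predictor is lightweight.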
License type:
Publisher Copyright
Funding Info:
This research / project is supported by the A*STAR - AME Programmatic
Grant Reference no.: A18A2b0046

This research / project is supported by the National Research Foundation - NRF Investigatorship grant
Grant Reference no.: NRF-NRFI05-2019-0007

This research / project is supported by the Ministry of Education, Singapore - Academic Research Fund Tier-1 grant
Grant Reference no.: 19-C220-SMU-008
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.