Gaze Assisted Visual Grounding

Title:
Gaze Assisted Visual Grounding
Journal Title:
Lecture Notes in Computer Science
Publication Date:
02 November 2021
Citation:
Johari, K., Tong, C. T. Z., Subbaraju, V., Kim, J.-J., & Tan, U.-X. (2021). Gaze Assisted Visual Grounding. Lecture Notes in Computer Science, 191–202. doi:10.1007/978-3-030-90525-5_17
Abstract:
There has been increasing demand for visual grounding in various human-robot interaction applications. However, grounding accuracy is often limited by the size of the dataset that can be collected, which remains a challenge. This paper therefore proposes using the natural, implicit input modality of human gaze to assist and improve the visual grounding of human instructions given to robotic agents. To demonstrate the capability, mechanical gear objects are used. We develop a baseline phrase grounding model using a transformer-based text classifier trained on a small corpus, and evaluate the system with and without gaze input to quantify the improvement. Gaze information (obtained from a Microsoft HoloLens 2) improves grounding accuracy from 26% to 65%, enabling more efficient human-robot collaboration and making the approach applicable to hands-free scenarios. The approach is data-efficient, requiring only a small training dataset to ground natural language referring expressions.
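To make the idea of gaze-assisted grounding concrete, the sketch below shows one plausible way a gaze fixation could be fused with the scores of a text-only phrase grounding model. This is an illustrative assumption, not the authors' published pipeline: the function names (gaze_prior, fuse), the Gaussian gaze weighting, the fusion weight alpha, and the example coordinates are all hypothetical.

import numpy as np

def gaze_prior(fixation_xy, object_centers, sigma=50.0):
    # Hypothetical gaze prior: Gaussian weight per candidate object based on
    # pixel distance between its center and the user's gaze fixation point.
    d = np.linalg.norm(object_centers - fixation_xy, axis=1)
    w = np.exp(-(d ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def fuse(text_scores, gaze_weights, alpha=0.5):
    # Late fusion (assumed): convex combination of normalized text-classifier
    # scores and the gaze prior; returns the index of the grounded object.
    text_scores = np.asarray(text_scores, dtype=float)
    text_scores = text_scores / text_scores.sum()
    combined = alpha * text_scores + (1 - alpha) * gaze_weights
    return int(np.argmax(combined))

# Toy example: three candidate gears, gaze fixated near the second one.
centers = np.array([[100.0, 120.0], [310.0, 118.0], [520.0, 125.0]])
fixation = np.array([305.0, 110.0])
text_scores = [0.40, 0.35, 0.25]  # e.g. from a transformer text classifier
print(fuse(text_scores, gaze_prior(fixation, centers)))  # -> 1

In this toy case the text scores alone would pick the wrong gear, while the gaze prior shifts the decision to the fixated object, mirroring how gaze input can compensate for a classifier trained on a small corpus.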
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the A*STAR - AME Programmatic Grant (Reference no.: A18A2b0046).
Description:
This is a post-peer-review, pre-copyedit version of an article published in Social Robotics. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-030-90525-5_17
ISSN:
0302-9743
Files uploaded:

icsr2021-097.pdf (7.25 MB, PDF)