Weerakoon, D., Subbaraju, V., Tran, T., & Misra, A. (2022). SoftSkip. Proceedings of the 30th ACM International Conference on Multimedia. https://doi.org/10.1145/3503161.3548432
Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications and suffer a significant loss of accuracy when applied to REC tasks, which require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual scales are not completely eliminated but are instead approximated with minimal additional computation. Experimental evaluation, using three benchmark REC datasets and an embedded device implementation, shows that LGMDP can achieve 33% latency savings with an accuracy loss of 0.5% to 2%.
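The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the language input produces gating decisions or how skipped scales are approximated, so the per-scale gate scores, the threshold, and the helper names (run_with_soft_skip, toy_full_block, toy_cheap_approx) are all assumptions introduced here for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Features = List[float]


def run_with_soft_skip(
    image_feat: Features,
    scales: Sequence[int],
    gate_scores: Sequence[float],
    full_block: Callable[[Features, int], Features],
    cheap_approx: Callable[[Features, int], Features],
    threshold: float = 0.5,
) -> Tuple[Dict[int, Features], List[int]]:
    """Run the full block only for scales whose gate score passes the threshold.

    gate_scores stands in for language-derived relevance scores (assumed).
    Skipped scales still receive an output, produced by a cheap approximation,
    so later fusion stages see every scale.
    """
    if len(scales) != len(gate_scores):
        raise ValueError("need exactly one gate score per scale")
    outputs: Dict[int, Features] = {}
    executed: List[int] = []
    for scale, score in zip(scales, gate_scores):
        if score >= threshold:
            outputs[scale] = full_block(image_feat, scale)    # expensive path
            executed.append(scale)
        else:
            outputs[scale] = cheap_approx(image_feat, scale)  # soft skip
    return outputs, executed


def toy_full_block(feat: Features, scale: int) -> Features:
    # Hypothetical stand-in for a costly per-scale visual block.
    return [x * scale for x in feat]


def toy_cheap_approx(feat: Features, scale: int) -> Features:
    # Hypothetical stand-in for a low-cost approximation of the same block.
    avg = mean(feat)
    return [avg * scale for _ in feat]


outputs, executed = run_with_soft_skip(
    image_feat=[1.0, 3.0],
    scales=[1, 2, 3],
    gate_scores=[0.9, 0.2, 0.7],  # assumed to be derived from the expression
    full_block=toy_full_block,
    cheap_approx=toy_cheap_approx,
)
print(executed)    # scales that ran the full block
print(outputs[2])  # approximated output for the skipped scale
```

In this toy run, scale 2 falls below the threshold, so it is approximated rather than dropped. That is the property the abstract attributes to SoftSkip, in contrast to pruning that removes a skipped scale's output entirely.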
This research/project is supported by the A*STAR - AME Programmatic (Grant Reference no.: A18A2b0046).
This research/project is supported by the National Research Foundation - NRF Investigatorship (Grant Reference no.: NRF-NRFI05-2019-0007).
This research/project is supported by the Ministry of Education - AcRF Tier-1 grant (Grant Reference no.: 19-C220-SMU-008).