Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting

Title:
Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting
Journal Title:
IEEE Transactions on Image Processing
Publication Date:
10 September 2020
Abstract:
Future human action forecasting from partial observations of activities is an important problem in many practical applications such as assistive robotics, video surveillance and security. We present a method to forecast actions for the unseen future of a video using a neural machine translation technique based on an encoder-decoder architecture. The input to this model is the observed RGB video, and the objective is to forecast the correct future symbolic action sequence. Unlike prior methods that predict one action per frame for some unseen percentage of the video, we predict the complete action sequence required to accomplish the activity. We coin this task action sequence forecasting. To account for two types of uncertainty in future predictions, we propose a novel loss function. We show that a combination of optimal transport and future uncertainty losses helps to improve results. We evaluate our model on three challenging video datasets (Charades, MPII Cooking and Breakfast). We then extend our action sequence forecasting model to perform weakly supervised action forecasting on two challenging datasets, Breakfast and 50Salads. Specifically, we propose a model that predicts the actions of future unseen frames without using frame-level annotations during training. Using Fisher vector features, our supervised model outperforms the state-of-the-art action forecasting model by 0.83% and 7.09% on the Breakfast and 50Salads datasets, respectively. Our weakly supervised model is only 0.6% behind the most recent state-of-the-art supervised model, obtains results comparable to other published fully supervised methods, and sometimes even outperforms them on the Breakfast dataset. Most interestingly, our weakly supervised model outperforms prior models by 1.04%, owing to the proposed weakly supervised architecture and the effective use of the attention mechanism and loss functions.
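The abstract names an optimal transport loss but does not give its form. As a rough illustration only, the sketch below computes an entropy-regularized (Sinkhorn) transport cost between a predicted sequence of class-probability vectors and a one-hot target action sequence. The squared-distance cost, the uniform marginals, and the names `sinkhorn_ot_loss`, `eps` and `n_iters` are hypothetical choices made for this example; they are not taken from the paper and should not be read as the authors' formulation.

```python
import math
from typing import List

Matrix = List[List[float]]


def sq_dist(p: List[float], q: List[float]) -> float:
    """Squared Euclidean distance between two class-probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sinkhorn_ot_loss(pred: Matrix, target: Matrix,
                     eps: float = 0.1, n_iters: int = 200) -> float:
    """Hypothetical entropy-regularized OT cost between two action sequences.

    pred:   n rows of predicted class probabilities (one row per forecast step)
    target: m rows of one-hot ground-truth action labels
    Each step carries uniform mass (1/n for pred, 1/m for target).
    Assumption: squared distance as the ground cost; not the paper's definition.
    """
    n, m = len(pred), len(target)
    cost = [[sq_dist(p, t) for t in target] for p in pred]
    # Gibbs kernel K = exp(-C / eps); a very small eps can underflow to 0.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # source marginal (predicted steps)
    b = [1.0 / m] * m  # target marginal (ground-truth steps)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternate scalings so row sums approach a and column sums match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the loss is <P, C>.
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))


if __name__ == "__main__":
    A, B = [1.0, 0.0], [0.0, 1.0]
    print(sinkhorn_ot_loss([A, B], [A, B]))  # near 0: exact match
    print(sinkhorn_ot_loss([A, A], [A, B]))  # about 1.0: action B never predicted
```

Note that under this toy cost a prediction of `[B, A]` against the target `[A, B]` also scores near zero, because the transport plan is free to re-pair steps. A plain OT term of this kind therefore measures which actions are forecast rather than their order, so in a sketch like this one it would typically be combined with an order-aware term such as per-step cross-entropy.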
License type:
http://creativecommons.org/licenses/by-nc-nd/4.0/
Funding Info:
This research/project is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG-RP-2019-010). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.
ISSN:
1941-0042
Files uploaded:
tip2020.pdf (PDF, 1.47 MB)