Foveated Neural Network: Gaze Prediction on Egocentric Videos

Published in:
IEEE International Conference on Image Processing (ICIP)
Publication Date:
01 September 2017
Abstract:
A novel deep convolutional neural network, named the Foveated Neural Network (FNN), is proposed to predict gaze on the current frame in egocentric videos. The retina-like visual input from the region of interest on the previous frame is analysed and encoded. The fusion of the hidden representation of the previous frame with the feature maps of the current frame guides gaze prediction on the current frame. To model motion, we also feed the dense optical flow between these adjacent frames to FNN as an additional input. Experimental results show that FNN outperforms state-of-the-art algorithms on a publicly available egocentric dataset. Analysis of FNN demonstrates that both the hidden representation of the foveated visual input from the previous frame and the motion information between adjacent frames are effective in improving gaze prediction performance in egocentric videos.
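The "retina-like" visual input mentioned in the abstract can be pictured as a multi-resolution pyramid centred on the gaze point: each level covers a wider window around the fovea but is resampled to a fixed patch size, so acuity falls off toward the periphery. The sketch below is a minimal illustration of that sampling idea in numpy; the function name, patch size, and number of levels are illustrative assumptions, not details from the paper.

```python
import numpy as np

def foveate(frame, cy, cx, patch=8, levels=3):
    """Hypothetical retina-like sampling around a gaze point (cy, cx).

    Level k covers a window 2**k times larger than the fovea but is
    subsampled by the same factor, so every level is ~patch x patch:
    full resolution at the centre, coarser toward the periphery.
    """
    H, W = frame.shape[:2]
    pyramid = []
    for k in range(levels):
        half = patch * (2 ** k) // 2          # window half-size doubles per level
        y0, y1 = max(0, cy - half), min(H, cy + half)
        x0, x1 = max(0, cx - half), min(W, cx + half)
        window = frame[y0:y1, x0:x1]
        step = 2 ** k                          # coarser sampling at outer levels
        pyramid.append(window[::step, ::step])
    return pyramid

frame = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
pyramid = foveate(frame, 32, 32)
print([p.shape for p in pyramid])  # -> [(8, 8), (8, 8), (8, 8)]
```

In the actual network, such foveated patches from the previous frame would be encoded into a hidden representation and fused with the current frame's feature maps; the sketch only covers the sampling stage.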
Funding Info:
Reverse Engineering Visual Intelligence for cognitive Enhancement (REVIVE) (1335H00098)
(c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Files uploaded:

File: icip.pdf (PDF, 656.22 KB)