Self-Supervised Video Representation Learning by Video Incoherence Detection

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/19154

Title:

Self-Supervised Video Representation Learning by Video Incoherence Detection

Journal Title:

IEEE Transactions on Cybernetics

DOI:

10.1109/TCYB.2023.3265393

Publication URL:

http://dx.doi.org/10.1109/tcyb.2023.3265393

Authors:

Haozhi Cao, Yuecong Xu, Kezhi Mao, Lihua Xie, Jianxiong Yin, Simon See, Qianwen Xu, Jianfei Yang

Keywords:

Electrical and Electronic Engineering, Computer Science Applications, Control and Systems Engineering, Software, Human-Computer Interaction, Information Systems

Publication Date:

20 April 2023

Citation:

Cao, H., Xu, Y., Mao, K., Xie, L., Yin, J., See, S., Xu, Q., & Yang, J. (2023). Self-Supervised Video Representation Learning by Video Incoherence Detection. IEEE Transactions on Cybernetics, 1–13. https://doi.org/10.1109/tcyb.2023.3265393

Abstract:

This article introduces a novel self-supervised method that leverages incoherence detection for video representation learning. It stems from the observation that the human visual system can easily identify video incoherence based on a comprehensive understanding of videos. Specifically, we construct the incoherent clip from multiple subclips hierarchically sampled from the same raw video with various lengths of incoherence. The network is trained to learn the high-level representation by predicting the location and length of incoherence given the incoherent clip as input. Additionally, we introduce intravideo contrastive learning to maximize the mutual information between incoherent clips from the same raw video. We evaluate our proposed method through extensive experiments on action recognition and video retrieval using various backbone networks. Experiments show that our proposed method achieves remarkable performance across different backbone networks and different datasets compared to previous coherence-based methods.
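
The clip construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified case with only two subclips and a single point of incoherence, whereas the abstract describes multiple hierarchically sampled subclips. The function name `make_incoherent_clip` and the parameters `clip_len` and `max_gap` are hypothetical, chosen only for illustration. The sketch returns frame indices together with the two quantities the abstract says the network predicts: the location and the length of the incoherence.

```python
import random


def make_incoherent_clip(num_frames, clip_len=16, max_gap=8, rng=None):
    """Return (frame_indices, location, gap) for one incoherent clip.

    Assumption (not from the paper): two subclips and one temporal jump.
    `location` is the position inside the clip where the jump occurs and
    `gap` is the number of raw frames skipped at that jump.
    """
    rng = rng or random.Random()
    # Position of the jump inside the clip, in 1..clip_len-1.
    location = rng.randint(1, clip_len - 1)
    # Number of raw frames skipped at the jump, in 1..max_gap.
    gap = rng.randint(1, max_gap)
    # Raw frames spanned by the clip plus the skipped segment.
    span = clip_len + gap
    if num_frames < span:
        raise ValueError("video too short for the requested clip and gap")
    start = rng.randint(0, num_frames - span)
    # First subclip: consecutive frames up to the jump.
    first = list(range(start, start + location))
    # Second subclip: resumes after skipping `gap` frames.
    second_start = start + location + gap
    second = list(range(second_start, second_start + clip_len - location))
    return first + second, location, gap


if __name__ == "__main__":
    indices, location, gap = make_incoherent_clip(100, rng=random.Random(0))
    print(indices, location, gap)
```

In a training setup along these lines, `indices` would select frames from the raw video to form the network input, while `location` and `gap` would serve as the self-supervised targets. The intravideo contrastive learning mentioned in the abstract is not covered by this sketch.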

License type:

Publisher Copyright

Funding Info:

This research is supported by the National Research Foundation, Singapore - Medium Sized Center for Advanced Robotics Technology Innovation
Grant Reference no.: NA

This research is supported by the Nanyang Technological University, Singapore - NTU Presidential Postdoctoral Fellowship
Grant Reference no.: NA

Description:

© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/19154

ISSN:

2168-2275
2168-2267

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
self-supervised-video-representation-learning-by-video-incoherence-detection-amend.pdf	4.90 MB	PDF