Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering

Title:
Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering
Journal Title:
IEEE Transactions on Circuits and Systems for Video Technology
Publication Date:
23 June 2023
Citation:
Xi, L., Chen, W., Wu, X., Liu, Z., & Li, Z. (2024). Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering. IEEE Transactions on Circuits and Systems for Video Technology, 34(2), 995–1006. https://doi.org/10.1109/tcsvt.2023.3288878
Abstract:
Online unsupervised video object segmentation (UVOS) uses the previous frames as its input to automatically separate the primary object(s) from a streaming video without any further manual annotation. A major challenge is that the model has no access to the future and must rely solely on the history, i.e., the segmentation mask is predicted from the current frame as soon as it is captured. In this work, a novel contrastive motion clustering algorithm that takes optical flow as its input is proposed for online UVOS. It exploits the common fate principle: visual elements tend to be perceived as a group if they share the same motion pattern. We build a simple and effective auto-encoder to iteratively summarize non-learnable prototypical bases for the motion pattern, while the bases in turn help learn the representation of the embedding network. Further, a contrastive learning strategy based on a boundary prior is developed to improve foreground and background feature discrimination in the representation learning stage. The proposed algorithm can be optimized on data of arbitrary scale (i.e., frame, clip, or dataset) and run in an online fashion. Experiments on the DAVIS16, FBMS, and SegTrackV2 datasets show that the accuracy of our method surpasses the previous state-of-the-art (SoTA) online UVOS method by margins of 0.8%, 2.9%, and 1.1%, respectively. Furthermore, by using online deep subspace clustering to tackle motion grouping, our method achieves higher accuracy with 3× faster inference than the SoTA online UVOS method, striking a good trade-off between effectiveness and efficiency. Our code will be available upon the acceptance of our paper.
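Illustrative note (not part of the published abstract): the common fate idea can be sketched with a toy example. The code below is not the paper's method. It swaps the learned auto-encoder and contrastive objective for a plain soft two-prototype clustering of raw optical-flow vectors, and it assumes the boundary prior means that pixels on the image border mostly belong to the background. The function name `cluster_flow` and the parameters `iters` and `temperature` are hypothetical.

```python
import math


def cluster_flow(flow, iters=10, temperature=0.5):
    """Toy motion grouping. `flow` is an H x W grid of (dx, dy) tuples.

    Returns an H x W grid of ints, 1 = foreground, 0 = background.
    This is a soft k-means stand-in, NOT the paper's learned model.
    """
    h, w = len(flow), len(flow[0])
    pixels = [(r, c) for r in range(h) for c in range(w)]
    border = [(r, c) for r, c in pixels if r in (0, h - 1) or c in (0, w - 1)]

    def sqdist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Assumed boundary prior: the mean border flow seeds the background prototype.
    bg = (
        sum(flow[r][c][0] for r, c in border) / len(border),
        sum(flow[r][c][1] for r, c in border) / len(border),
    )
    # The flow vector farthest from the background seed starts the foreground.
    fg = max((flow[r][c] for r, c in pixels), key=lambda v: sqdist(v, bg))

    for _ in range(iters):
        acc_fg, acc_bg = [0.0, 0.0], [0.0, 0.0]
        w_fg = w_bg = 0.0
        for r, c in pixels:
            v = flow[r][c]
            # Soft assignment: two-way softmax over negative squared distances,
            # written as a sigmoid and clamped to avoid overflow in exp().
            z = (sqdist(v, fg) - sqdist(v, bg)) / temperature
            p_fg = 1.0 / (1.0 + math.exp(max(-50.0, min(50.0, z))))
            p_bg = 1.0 - p_fg
            acc_fg[0] += p_fg * v[0]
            acc_fg[1] += p_fg * v[1]
            acc_bg[0] += p_bg * v[0]
            acc_bg[1] += p_bg * v[1]
            w_fg += p_fg
            w_bg += p_bg
        # Update each prototype as the weighted mean of the flow vectors.
        fg = (acc_fg[0] / w_fg, acc_fg[1] / w_fg)
        bg = (acc_bg[0] / w_bg, acc_bg[1] / w_bg)

    # Hard mask: a pixel is foreground if its flow is closer to the fg prototype.
    return [
        [1 if sqdist(flow[r][c], fg) < sqdist(flow[r][c], bg) else 0 for c in range(w)]
        for r in range(h)
    ]
```

Calling `cluster_flow` on a flow field in which a small block moves differently from a static background should mark that block as foreground. Because the sketch reads only the current flow field, it can be run frame by frame, which loosely mirrors the online setting described above.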
License type:
Publisher Copyright
Funding Info:
This work was supported in part by the National Natural Science Foundation of China under Grants 51975029 and U1909215, the Key Research and Development Program of Zhejiang Province under Grant 2021C03050, the Scientific Research Project of Agriculture and Social Development of Hangzhou under Grant No. 2020ZDSJ0881, and in part by the National Natural Science Foundation of China under Grants 61620106012 and 61573048.
Description:
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
1558-2205
1051-8215
Files uploaded:

manuscript-20230620.pdf (PDF, 5.71 MB)