Xi, L., Chen, W., Wu, X., Liu, Z., & Li, Z. (2024). Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering. IEEE Transactions on Circuits and Systems for Video Technology, 34(2), 995–1006. https://doi.org/10.1109/tcsvt.2023.3288878
Abstract:
Online unsupervised video object segmentation (UVOS) takes the previous frames as input to automatically separate the primary object(s) from a streaming video without any further manual annotation. A major challenge is that the model has no access to future frames and must rely solely on the history, i.e., the segmentation mask is predicted from the current frame as soon as it is captured. In this work, we propose a novel contrastive motion clustering algorithm that takes optical flow as input for online UVOS. It exploits the common fate principle, which states that visual elements tend to be perceived as a group if they share the same motion pattern. We build a simple and effective auto-encoder that iteratively summarizes non-learnable prototypical bases for the motion patterns, while the bases in turn help learn the representation of the embedding network.
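To make "iteratively summarize non-learnable prototypical bases" concrete, here is a minimal NumPy sketch. It assumes an EM-style soft clustering update, which is not necessarily the authors' rule. Raw 2-D optical-flow vectors stand in for the embedding network's output, and the names `summarize_bases`, `num_bases`, `num_iters`, and `temperature` are hypothetical. The bases are recomputed from the data rather than trained by gradient descent, and the reconstruction `assign @ bases` plays the role of a decoder. A reconstruction error of this kind is one plausible signal for training the embedding network, but the paper's actual loss may differ.

```python
import numpy as np

def summarize_bases(features, num_bases=2, num_iters=5, temperature=0.1):
    """Hypothetical EM-style summarization of K non-learnable prototypical bases.

    features: (N, D) array of per-pixel motion embeddings (here: raw flow vectors).
    Returns (bases (K, D), soft assignments (N, K), reconstruction (N, D)).
    """
    # L2-normalize so that dot products are cosine similarities.
    x = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    # Farthest-point initialization: start from pixel 0, then repeatedly add
    # the pixel least similar to the bases chosen so far.
    idx = [0]
    for _ in range(1, num_bases):
        sim = (x @ x[idx].T).max(axis=1)
        idx.append(int(sim.argmin()))
    bases = x[idx]
    for _ in range(num_iters):
        # E-step: a softmax over bases gives each pixel a soft cluster assignment.
        logits = x @ bases.T / temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        assign = np.exp(logits)
        assign /= assign.sum(axis=1, keepdims=True)
        # M-step: each basis becomes the re-normalized weighted mean of its pixels.
        bases = assign.T @ x
        bases /= np.linalg.norm(bases, axis=1, keepdims=True) + 1e-8
    # Decoder: reconstruct every embedding as a mixture of the bases.
    recon = assign @ bases
    return bases, assign, recon
```

Called on flow vectors from two rigidly moving regions (for example, a background panning right and an object moving up), this yields one basis per region, and `assign.argmax(axis=1)` gives a hard motion-group label per pixel, which is the common fate grouping in its simplest form.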
Further, a contrastive learning strategy based on a boundary prior is developed to improve the discrimination between foreground and background features in the representation learning stage. The proposed algorithm can be optimized on data of arbitrary scale (i.e., a frame, a clip, or a dataset) and run in an online fashion. Experiments on the DAVIS16, FBMS, and SegTrackV2 datasets show that our method surpasses the accuracy of the previous state-of-the-art (SoTA) online UVOS method by margins of 0.8%, 2.9%, and 1.1%, respectively. Furthermore, by using online deep subspace clustering to tackle motion grouping, our method achieves higher accuracy with 3× faster inference than the SoTA online UVOS method, striking a good trade-off between effectiveness and efficiency. Our code will be made available upon acceptance of our paper.
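The boundary-prior contrastive strategy can likewise be sketched. This version assumes, beyond what the abstract states, that "boundary prior" means pixels on the frame border are treated as background. In this hypothetical version, `boundary_prior_contrastive_loss` takes the basis that most border pixels are closest to as the background prototype. It then applies an InfoNCE-style cross-entropy that pulls border embeddings toward that prototype and away from the other bases. The function names, `width`, and `temperature` are illustrative. The sketch only evaluates the loss value, since training would need an autodiff framework, and the paper's choice of positives and negatives may differ.

```python
import numpy as np

def border_mask(h, w, width=1):
    """Boolean (h, w) mask that is True on the outer `width` pixels of the frame."""
    mask = np.zeros((h, w), dtype=bool)
    mask[:width, :] = True
    mask[-width:, :] = True
    mask[:, :width] = True
    mask[:, -width:] = True
    return mask

def boundary_prior_contrastive_loss(emb, bases, temperature=0.1, width=1):
    """Hypothetical InfoNCE-style loss built on a boundary (background) prior.

    emb:   (H, W, D) per-pixel embeddings.
    bases: (K, D) prototypical bases, e.g. from a clustering step.
    Returns (loss, index of the basis treated as background).
    """
    h, w, d = emb.shape
    x = emb.reshape(-1, d)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    b = bases / (np.linalg.norm(bases, axis=1, keepdims=True) + 1e-8)
    border = border_mask(h, w, width).reshape(-1)
    logits = x[border] @ b.T / temperature  # (num_border_pixels, K)
    # Boundary prior: the basis most border pixels prefer is the background.
    bg = int(np.bincount(logits.argmax(axis=1), minlength=len(b)).argmax())
    # Log-softmax over bases; the background basis is the positive, the rest negatives.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = float(-log_prob[:, bg].mean())
    return loss, bg
```

With two well-separated bases, border pixels that match the background basis drive the loss toward zero. With two identical bases, the border pixels cannot discriminate foreground from background and the loss is exactly log 2.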
License type:
Publisher Copyright
Funding Info:
This work was supported in part by the National Natural Science Foundation of China under Grants 51975029 and U1909215, the Key Research and Development Program of Zhejiang Province under Grant 2021C03050, the Scientific Research Project of Agriculture and Social Development of Hangzhou under Grant 2020ZDSJ0881, and in part by the National Natural Science Foundation of China under Grants 61620106012 and 61573048.