Fang, S., Zhu, Q., Wu, Q., Wu, S., & Xie, S. (2024). Audio–Visual Segmentation based on robust principal component analysis. Expert Systems with Applications, 256, 124885. https://doi.org/10.1016/j.eswa.2024.124885
Abstract:
Audio–Visual Segmentation (AVS) aims to extract the sounding objects from a video. Current learning-based AVS methods are typically supervised, relying on task-specific data annotations and expensive model training. Recognizing that the background of a video captured by a static camera can be represented as a low-rank matrix, we introduce non-convex robust principal component analysis into the AVS task in this paper. This approach is unsupervised and relies only on patterns in the input data. Specifically, the proposed method decomposes each modality into the sum of two parts: a low-rank part representing the background audio and visual information, and a sparse part representing the foreground information. Furthermore, CUR decomposition is employed at each iteration to reduce the computational complexity of the optimization. Experimental results show that the proposed AVS method outperforms supervised methods on the AVS-Bench Single-Source dataset.
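The low-rank plus sparse split the abstract describes can be illustrated with the classic convex RPCA formulation (Principal Component Pursuit), solved by alternating singular-value thresholding and soft-thresholding. This is a minimal sketch of that baseline only; the paper's actual method is non-convex and uses CUR decomposition per iteration, which is not reproduced here, and all parameter choices below are standard defaults, not values from the paper.

```python
import numpy as np

def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L ("background") plus sparse S ("foreground")
    via Principal Component Pursuit, solved with a simple ADMM loop.
    Illustrative convex-RPCA baseline, not the paper's non-convex method."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # standard PCP weight
    if mu is None:
        mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    norm_M = np.linalg.norm(M, "fro")
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                        # scaled dual variable
    for _ in range(max_iter):
        # Low-rank update: singular-value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        sig = np.maximum(sig - 1.0 / mu, 0.0)
        L = (U * sig) @ Vt
        # Sparse update: elementwise soft-thresholding (shrinkage)
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent on the constraint residual M = L + S
        R = M - L - S
        Y = Y + mu * R
        if np.linalg.norm(R, "fro") <= tol * norm_M:
            break
    return L, S

# Synthetic check: rank-5 background plus sparse large-magnitude foreground
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 60))
S_true = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
S_true[mask] = rng.uniform(5, 10, mask.sum()) * rng.choice([-1, 1], mask.sum())
L, S = rpca_pcp(A + S_true)
```

In the AVS setting described by the abstract, `M` would hold stacked frames (or audio features) as columns, so the recovered sparse component localizes the sounding foreground.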
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
No specific funding was received for this research.