Learning Local and Global Temporal Contexts for Video Semantic Segmentation

Title:
Learning Local and Global Temporal Contexts for Video Semantic Segmentation
Journal Title:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords:
Publication Date:
10 April 2024
Citation:
Sun, G., Liu, Y., Ding, H., Wu, M., & Van Gool, L. (2024). Learning Local and Global Temporal Contexts for Video Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–15. https://doi.org/10.1109/tpami.2024.3387326
Abstract:
Contextual information plays a core role in video semantic segmentation (VSS). This paper groups contexts for VSS into two kinds: local temporal contexts (LTC), which are the contexts from neighboring frames, and global temporal contexts (GTC), which are the contexts from the whole video. LTC comprises static and motional contexts, corresponding to static and moving content in neighboring frames, respectively. Both static and motional contexts have been studied previously, but no prior work learns them simultaneously, even though they are highly complementary. Hence, we propose a Coarse-to-Fine Feature Mining (CFFM) technique to learn a unified representation of LTC. CFFM contains two parts: Coarse-to-Fine Feature Assembling (CFFA) and Cross-frame Feature Mining (CFM). CFFA abstracts static and motional contexts, and CFM mines useful information from nearby frames to enhance target features. To exploit further temporal contexts, we propose CFFM++, which additionally learns GTC from the whole video. Specifically, we uniformly sample a number of frames from the video and extract global contextual prototypes by k-means. The information within those prototypes is mined by CFM to refine target features. Experimental results on popular benchmarks demonstrate that CFFM and CFFM++ perform favorably against state-of-the-art methods. The code is available at https://github.com/GuoleiSun/VSS-CFFM.
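The global-context step described in the abstract (uniform frame sampling, k-means prototypes, then mining the prototypes to refine target features) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is in the repository linked above. All function names, shapes, and parameters here are hypothetical, and the mining step is approximated by a single-head dot-product attention with a residual connection and no learned projections, which is only an assumption about how CFM might be simplified.

```python
import numpy as np


def sample_frame_indices(num_frames, num_samples):
    """Pick num_samples indices spread evenly over [0, num_frames - 1]."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)


def kmeans_prototypes(features, k, iters=10, seed=0):
    """Plain Lloyd's k-means on (N, C) features; returns (k, C) centers."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct feature vectors (fancy indexing copies).
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every feature to every center: (N, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def refine_with_prototypes(target, prototypes):
    """Target tokens (queries) attend to prototypes; result is added residually."""
    logits = target @ prototypes.T / np.sqrt(target.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return target + weights @ prototypes


# Toy video: 30 frames, 16 feature tokens per frame, 8 channels (random data).
rng = np.random.default_rng(0)
video_feats = rng.normal(size=(30, 16, 8))

idx = sample_frame_indices(num_frames=30, num_samples=5)
sampled = video_feats[idx].reshape(-1, 8)       # (5 * 16, 8) tokens
prototypes = kmeans_prototypes(sampled, k=4)    # (4, 8) global prototypes
refined = refine_with_prototypes(video_feats[-1], prototypes)  # (16, 8)
```

In this sketch the prototypes depend only on the sampled frames, so they could be computed once per video and reused for every target frame.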
License type:
Publisher Copyright
Funding Info:
There was no specific funding for this research.
Description:
© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
ISSN:
2160-9292
1939-3539
0162-8828
Files uploaded:

File Size Format
22pami-cvpr22-extension.pdf 3.16 MB PDF