Unsupervised Scale-Consistent Depth Learning from Video

Please use this identifier to cite or link to this item: https://oar.a-star.edu.sg/communities-collections/articles/17765

Title:

Unsupervised Scale-Consistent Depth Learning from Video

Journal Title:

International Journal of Computer Vision

DOI:

10.1007/s11263-021-01484-6

Publication URL:

https://doi.org/10.1007/s11263-021-01484-6

Authors:

Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid

Keywords:

Software, Artificial Intelligence, Computer Vision and Pattern Recognition

Publication Date:

18 June 2021

Citation:

Bian, J.-W., Zhan, H., Wang, N., Li, Z., Zhang, L., Shen, C., … Reid, I. (2021). Unsupervised Scale-Consistent Depth Learning from Video. International Journal of Computer Vision. doi:10.1007/s11263-021-01484-6

Abstract:

We propose SC-Depth, a monocular depth estimation method that requires only unlabelled videos for training and enables scale-consistent prediction at inference time. Our contributions include: (i) we propose a geometry consistency loss, which penalizes the inconsistency of predicted depths between adjacent views; (ii) we propose a self-discovered mask to automatically localize moving objects that violate the underlying static scene assumption and cause noisy signals during training; (iii) we demonstrate the efficacy of each component with a detailed ablation study and show high-quality depth estimation results on both the KITTI and NYUv2 datasets. Moreover, thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks can be readily integrated into the ORB-SLAM2 system for more robust and accurate tracking. The proposed hybrid Pseudo-RGBD SLAM shows compelling results on KITTI, and it generalizes well to the KAIST dataset without additional training. Finally, we provide several demos for qualitative evaluation. The source code is released on GitHub.
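
To make contributions (i) and (ii) concrete, here is a minimal sketch in plain Python. It is not the authors' released code. It assumes the inconsistency is measured as a normalized absolute difference between two depth values for the same pixel, and that the mask is one minus that value; this record does not state the formulas, so both are assumptions. The function names are hypothetical, and the step that warps one view's depth into the adjacent view using the predicted camera motion is assumed to have been done already.

```python
def depth_inconsistency(projected_depth, predicted_depth):
    """Per-pixel inconsistency between two depth maps of the same view.

    projected_depth: depths warped in from the adjacent view (assumed given).
    predicted_depth: depths the network predicts directly for this view.
    Both are flat lists of positive floats. The normalized form below is an
    assumption of this sketch; it keeps every value in the range [0, 1).
    """
    return [abs(a - b) / (a + b) for a, b in zip(projected_depth, predicted_depth)]


def geometry_consistency_loss(projected_depth, predicted_depth):
    """Mean inconsistency over all pixels (assumed form of the loss)."""
    diff = depth_inconsistency(projected_depth, predicted_depth)
    return sum(diff) / len(diff)


def self_discovered_mask(projected_depth, predicted_depth):
    """Per-pixel weight: near 1 where the two depths agree, lower where they
    disagree, for example on moving objects (assumed form of the mask)."""
    return [1.0 - d for d in depth_inconsistency(projected_depth, predicted_depth)]


# Three pixels: the first two agree, the third disagrees strongly.
projected = [2.0, 4.0, 10.0]
predicted = [2.0, 4.0, 30.0]
loss = geometry_consistency_loss(projected, predicted)
mask = self_discovered_mask(projected, predicted)
```

In this toy input the third pixel receives a mask weight of 0.5 while the first two keep a weight of 1.0, which illustrates how pixels that break the static-scene assumption could be down-weighted during training.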

License type:

Publisher Copyright

Funding Info:

This work was supported in part by the Australian Centre of Excellence for Robotic Vision (CE140100016) and the ARC Laureate Fellowship FL130100102 to Prof. Ian Reid. It was also supported by the Major Project for New Generation of AI (No. 2018AAA0100403), the Tianjin Natural Science Foundation (No. 18JCYBJC41300 and No. 18ZXZNGX00110), and NSFC (61922046) to Prof. Ming-Ming Cheng.

URI:

https://oar.a-star.edu.sg/communities-collections/articles/17765

ISSN:

0920-5691
1573-1405

Collections:

Institute for Infocomm Research

Files uploaded:

Manuscripts in This Item:

File	Size	Format
sc-depth-journal-2.pdf	4.31 MB	PDF