Shao, S., Pei, Z., Chen, W., Li, R., Liu, Z., & Li, Z. (2024). URCDC-Depth: Uncertainty Rectified Cross-Distillation With CutFlip for Monocular Depth Estimation. IEEE Transactions on Multimedia, 26, 3341–3353. https://doi.org/10.1109/tmm.2023.3310259
Abstract:
This work aims to estimate a high-quality depth
map from a single RGB image. Due to the lack of depth
clues, making full use of the long-range correlation and local
information is critical for accurate depth estimation. To this end,
we introduce an uncertainty rectified cross-distillation between
the Transformer and convolutional neural network (CNN) to
achieve a comprehensive depth estimator. Specifically, we utilize
the depth estimates from the Transformer branch and CNN
branch as pseudo labels to teach each other. At the same time,
the pixel-wise depth uncertainty is modeled to mitigate the
negative impact of noisy pseudo labels. To prevent the large capacity
gap induced by the strong Transformer branch from deteriorating
the cross-distillation, we transfer the feature maps from the
Transformer to the CNN and develop coupling units to assist
the weak CNN branch in leveraging the transferred features.
Furthermore, we introduce CutFlip, a surprisingly simple yet
highly effective data augmentation technique, which forces the
model to focus on more valuable depth reasoning clues apart from
the vertical image position. Extensive experiments demonstrate
that our model, termed URCDC-Depth, outperforms
previous state-of-the-art approaches on the KITTI, NYU-Depthv2
and SUN RGB-D datasets, with no additional computational
burden in the evaluation phase. The source code is available at
https://github.com/ShuweiShao/URCDC-Depth.
License type:
Publisher Copyright
Funding Info:
This research/project is supported by the A*STAR - MTC Programmatic Funds grant.
Grant Reference No.: M23L7b0021
This work was supported in part by the National Natural Science Foundation of China under Grants 61620106012 and 61573048, in part by the Key Research and Development Program of Zhejiang Province under Grant 2021C03050, and in part by the Scientific Research Project of Agriculture and Social Development of Hangzhou under Grant 20212013B11.