He, A., Wang, K., Li, T., Du, C., Xia, S., & Fu, H. (2023). H2Former: An Efficient Hierarchical Hybrid Transformer for Medical Image Segmentation. IEEE Transactions on Medical Imaging, 1–1. https://doi.org/10.1109/tmi.2023.3264513
Accurate medical image segmentation is of great significance for computer-aided diagnosis. Although methods based on convolutional neural networks (CNNs) have achieved good results, they are weak at modeling long-range dependencies, which are essential for building the global context that segmentation tasks require. Transformers can establish long-range dependencies among pixels through self-attention, complementing local convolution. However, Transformers typically neglect multi-scale feature fusion and feature selection, both of which are crucial for medical image segmentation. Moreover, directly applying self-attention to CNNs is challenging because of its quadratic computational complexity on high-resolution feature maps. Therefore, to integrate the merits of CNNs, multi-scale channel attention, and Transformers, we propose an efficient hierarchical hybrid vision Transformer (H2Former) for medical image segmentation. With these merits, the model is data-efficient in the limited-data regime typical of medical imaging. Experimental results show that our approach exceeds previous Transformer, CNN, and hybrid methods on three 2D and two 3D medical image segmentation tasks, while remaining computationally efficient in model parameters, FLOPs, and inference time. For example, H2Former outperforms TransUNet by 2.29% in IoU score on the KVASIR-SEG dataset with only 30.77% of its parameters and 59.23% of its FLOPs.
This research/project is supported by the A*STAR Career Development Fund (Grant Reference No.: C222812010).
This research/project is supported by the AI Singapore Tech Challenge Funding (Grant Reference No.: AISG2-TC-2021-003).
This work is partially supported by the National Natural Science Foundation of China (62272248) and the CAAI-Huawei MindSpore Open Fund (CAAIXSJLJJ2021-025A).