[1] WANG Mengxi, LEI Tao, JIANG Youtao, et al. CNN-Transformer multiorgan segmentation network based on space-frequency collaboration[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1266-1280. [doi:10.11992/tis.202409011]

CNN-Transformer multiorgan segmentation network based on space-frequency collaboration

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 20 Issue: 5 (2025) Pages: 1266-1280 Column: Artificial Intelligence Deans' Forum Publication date: 2025-09-05

Title:: CNN-Transformer multiorgan segmentation network based on space-frequency collaboration

Author(s):: WANG Mengxi¹,²; LEI Tao¹,²; JIANG Youtao¹,²; LIU Le¹,²; LIU Shaoqing¹,²; WANG Yingbo¹,²
1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
2. Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

Keywords:: multiorgan segmentation; space-frequency collaboration; multiview frequency domain; attention mechanism; CNN; Transformer; coattention; local-global feature fusion

CLC:: TP391.4

DOI:: 10.11992/tis.202409011

Abstract:: Current mainstream networks for medical multi-organ segmentation fail to fully exploit the local detail extraction capability of convolutional neural networks (CNNs) and the global information capturing potential of Transformers. They also lack an effective mechanism for collaboratively modeling spatial-domain and frequency-domain features. To address these limitations, we propose a dual-branch CNN-Transformer encoder-decoder network based on space-frequency collaboration. In the local (CNN) branch, a space-frequency collaborative attention mechanism lets the network capture richer local details from both the frequency and spatial domains. In the global (Transformer) branch, a multi-view frequency domain extractor combines spectral layers with self-attention layers, which improves both the joint modeling of spatial and frequency features and the model’s generalization performance. In addition, a local-global feature fusion module integrates the local detail information from the CNN branch with the global information from the Transformer branch, addressing the difficulty of balancing local details against a global receptive field. Experimental results show that the architecture mitigates the organ mis-segmentation caused by blurred boundaries in medical images, significantly improving multi-organ segmentation accuracy while reducing computational cost and parameter count.
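
The abstract does not say how the spectral layers are implemented. As a rough illustration of the general idea of frequency-domain token mixing, and not the authors' module, the following minimal sketch transforms a 1D token sequence to the frequency domain, scales each frequency bin by a gate weight, and transforms back. The names `dft`, `idft`, `spectral_layer`, and `gate` are hypothetical, the gate values are fixed by hand rather than learned, and a naive O(n²) transform stands in for an FFT.

```python
import cmath
from typing import List, Sequence


def dft(signal: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft, including the 1/n normalisation."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_layer(tokens: Sequence[float], gate: Sequence[complex]) -> List[float]:
    """Hypothetical spectral layer: DFT, per-frequency gating, inverse DFT."""
    if len(tokens) != len(gate):
        raise ValueError("gate needs exactly one weight per frequency bin")
    gated = [s * g for s, g in zip(dft(tokens), gate)]
    # Only the real part is returned; a gate that is not conjugate-symmetric
    # would leave an imaginary part that is dropped here.
    return [value.real for value in idft(gated)]


tokens = [1.0, 2.0, 3.0, 4.0]
# An all-pass gate leaves the tokens unchanged (up to rounding error).
identity = spectral_layer(tokens, [1.0, 1.0, 1.0, 1.0])
# Keeping only the zero-frequency bin replaces every token with the mean.
dc_only = spectral_layer(tokens, [1.0, 0.0, 0.0, 0.0])
```

Because each frequency bin depends on every input position, a single gating step mixes information across the whole sequence, which is why layers of this kind are often paired with self-attention for global context. In a trained network the gate would be a learned parameter and the transform would typically be a 2D FFT over the feature map.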


Last Update: 2025-09-05
