
[1] WANG Mengxi, LEI Tao, JIANG Youtao, et al. CNN-Transformer multiorgan segmentation network based on space-frequency collaboration[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1266-1280. [doi:10.11992/tis.202409011]

CNN-Transformer multiorgan segmentation network based on space-frequency collaboration

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 20; Issue: No. 5, 2025; Pages: 1266-1280; Column: Artificial Intelligence Deans' Forum; Publication date: 2025-09-05

Title:: CNN-Transformer multiorgan segmentation network based on space-frequency collaboration

Author(s):: WANG Mengxi^1,2, LEI Tao^1,2, JIANG Youtao^1,2, LIU Le^1,2, LIU Shaoqing^1,2, WANG Yingbo^1,2
1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China;
2. Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

Keywords:: multiorgan segmentation; space-frequency collaboration; multiview frequency domain; attention mechanism; CNN; Transformer; coattention; local-global feature fusion

CLC number:: TP391.4

DOI:: 10.11992/tis.202409011

Abstract:: Current mainstream medical multi-organ segmentation networks fail to fully exploit the local detail extraction capability of convolutional neural networks (CNNs) and the global information capturing potential of Transformers, and they lack collaborative modeling of spatial and frequency domain features. To address these limitations, we propose a dual-branch CNN-Transformer encoder-decoder network based on space-frequency collaboration. In the local branch, a space-frequency collaborative attention is designed so that the network captures richer local detail from both the frequency and spatial domains. In the global branch, a multi-view frequency domain extractor is designed; it models spectral layers and self-attention layers jointly, which improves the model's ability to model spatial and frequency features collaboratively as well as its generalization performance. In addition, a local and global feature fusion module integrates the local detail information of the CNN branch with the global information of the Transformer branch, so the network no longer has to trade local detail against a global receptive field. Experimental results show that the architecture reduces the mis-segmentation caused by blurred organ boundaries in medical images and improves multi-organ segmentation accuracy, while requiring lower computational cost and fewer parameters.
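The abstract names two mechanisms that are easier to picture with a small example: a spectral layer that filters features in the frequency domain, and a fusion step that blends local (CNN) and global (Transformer) features. The NumPy sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names `spectral_layer` and `fuse_local_global`, the FFT-based elementwise filter, and the per-channel sigmoid gate are all hypothetical choices; the abstract does not specify the transform, the filter shape, or the fusion rule.

```python
import numpy as np


def spectral_layer(x, weight):
    """Filter a feature map in the frequency domain (illustrative only).

    x:      real array of shape (C, H, W)
    weight: complex array of shape (C, H, W // 2 + 1); stands in for a
            learnable frequency filter (an assumption, not from the paper)
    """
    _, h, w = x.shape
    # Real 2D FFT over the two spatial axes.
    freq = np.fft.rfft2(x, axes=(-2, -1), norm="ortho")
    # Elementwise reweighting of each frequency component.
    freq = freq * weight
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(freq, s=(h, w), axes=(-2, -1), norm="ortho")


def fuse_local_global(local_feat, global_feat, gate_logits):
    """Blend two (C, H, W) maps with a per-channel sigmoid gate.

    The gate is an assumed fusion rule chosen for brevity.
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logits))  # shape (C,)
    gate = gate[:, None, None]                 # broadcast over H and W
    return gate * local_feat + (1.0 - gate) * global_feat


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))

# An all-ones filter leaves the features unchanged.
identity = np.ones((4, 8, 5), dtype=complex)
y = spectral_layer(x, identity)

# Keeping only the zero-frequency term yields each channel's spatial mean.
dc_only = np.zeros((4, 8, 5), dtype=complex)
dc_only[:, 0, 0] = 1.0
smooth = spectral_layer(x, dc_only)

# Zero logits give an equal blend of the two inputs.
fused = fuse_local_global(x, y, np.zeros(4))
```

The two filters at the end show the range such a layer spans: the all-ones filter is a pass-through, while the zero-frequency filter discards all spatial detail. A trained filter would sit between these extremes, which is one plausible reading of how a frequency branch could complement spatial attention.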


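Memo

Received: 2024-09-06.
Funding: National Natural Science Foundation of China (62271296, 62201334); Shaanxi Innovation Capability Support Program (2025RS-CXTD-012); Youth Innovation Team Project of Shaanxi Universities (23JP014, 23JP022).
About the authors: WANG Mengxi, master's student. Main research interests: computer vision and machine learning. E-mail: 202007020606@sust.edu.cn.
LEI Tao, professor, doctoral supervisor, vice dean of the School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, IEEE Senior Member. Main research interests: computer vision and machine learning. He has published more than 90 academic papers. E-mail: leitao@sust.edu.cn.
JIANG Youtao, master's student. Main research interests: computer vision and machine learning. E-mail: 2819423992@qq.com.
Corresponding author: LEI Tao. E-mail: leitao@sust.edu.cn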

Last Update: 2025-09-05
