[1]WANG Mengxi,LEI Tao,JIANG Youtao,et al.CNN-Transformer multiorgan segmentation network based on space-frequency collaboration[J].CAAI Transactions on Intelligent Systems,2025,20(5):1266-1280.[doi:10.11992/tis.202409011]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
20
Issue:
2025, No. 5
Page number:
1266-1280
Column:
Artificial Intelligence Deans' Forum
Publication date:
2025-09-05
- Title:
CNN-Transformer multiorgan segmentation network based on space-frequency collaboration
- Author(s):
WANG Mengxi (1, 2); LEI Tao (1, 2); JIANG Youtao (1, 2); LIU Le (1, 2); LIU Shaoqing (1, 2); WANG Yingbo (1, 2)
1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China;
2. Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
- Keywords:
multiorgan segmentation; space-frequency collaboration; multiview frequency domain; attention mechanism; CNN; Transformer; coattention; local-global feature fusion
- CLC:
TP391.4
- DOI:
10.11992/tis.202409011
- Abstract:
Current mainstream networks for multi-organ segmentation in medical images do not fully exploit either the local detail extraction capability of convolutional neural networks (CNNs) or the global information capture of Transformers, and they lack an effective mechanism for collaboratively modeling spatial- and frequency-domain features. To address these limitations, we propose a dual-branch CNN-Transformer encoder-decoder network with space-frequency collaboration. In the local branch, space-frequency collaborative attention allows the network to capture richer local details from both the frequency and spatial domains. In the global branch, a multi-view frequency domain extractor jointly models spectral layers and self-attention layers, which improves both the model's joint modeling of spatial and frequency features and its generalization performance. In addition, a local-global feature fusion module integrates the local detail information of the CNN branch with the global information of the Transformer branch, so that the network can balance local details against a global receptive field. Experimental results demonstrate that the architecture mitigates the organ mis-segmentation caused by blurred boundaries in medical images, significantly improving multi-organ segmentation accuracy while reducing computational cost and the number of parameters.
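The abstract does not give the internals of the space-frequency modules, so the following is only a minimal illustrative sketch of the general idea of combining a frequency-domain filter with a spatial attention gate. It is not the authors' implementation. The function names (`spectral_layer`, `spatial_attention`, `space_frequency_block`), the FFT-based filtering, and the sigmoid gate are all assumptions chosen for illustration, written in NumPy without any training code.

```python
import numpy as np


def spectral_layer(x, weight):
    """Filter a real (C, H, W) feature map in the frequency domain.

    `weight` is a hypothetical learnable complex filter of shape
    (C, H, W // 2 + 1), matching the output of a real 2D FFT.
    """
    spec = np.fft.rfft2(x, axes=(-2, -1), norm="ortho")
    spec = spec * weight  # element-wise frequency filtering
    return np.fft.irfft2(spec, s=x.shape[-2:], axes=(-2, -1), norm="ortho")


def spatial_attention(x):
    """Assumed spatial gate: sigmoid of the channel-mean map, shape (1, H, W)."""
    mean_map = x.mean(axis=0, keepdims=True)
    return 1.0 / (1.0 + np.exp(-mean_map))


def space_frequency_block(x, weight):
    """Combine both views: gate the frequency-filtered features spatially
    and add them back to the input as a residual."""
    freq_features = spectral_layer(x, weight)
    gate = spatial_attention(x)
    return x + gate * freq_features


# Usage: a random 4-channel 8x8 feature map with an all-pass filter.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
weight = np.ones((4, 8, 8 // 2 + 1), dtype=complex)
out = space_frequency_block(x, weight)
print(out.shape)
```

With an all-ones filter the spectral layer reduces to the identity, which is a convenient sanity check; in a trained network the filter would be learned, and the paper's actual attention and fusion designs may differ substantially from this sketch.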