WANG Mengxi,LEI Tao,JIANG Youtao,et al.CNN-Transformer multiorgan segmentation network based on space-frequency collaboration[J].CAAI Transactions on Intelligent Systems,2025,20(5):1266-1280.[doi:10.11992/tis.202409011]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: No. 5, 2025
Pages: 1266-1280
Section: Artificial Intelligence Deans' Forum
Publication date: 2025-09-05
- Title: CNN-Transformer multiorgan segmentation network based on space-frequency collaboration
- Author(s): WANG Mengxi (1,2), LEI Tao (1,2), JIANG Youtao (1,2), LIU Le (1,2), LIU Shaoqing (1,2), WANG Yingbo (1,2)
1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China;
2. Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China
- Keywords: multiorgan segmentation; space-frequency collaboration; multiview frequency domain; attention mechanism; CNN; Transformer; coattention; local-global feature fusion
- CLC number: TP391.4
- DOI: 10.11992/tis.202409011
- Abstract:
Current mainstream networks for medical multi-organ segmentation do not fully exploit either the local detail extraction strength of convolutional neural networks (CNNs) or the global information capturing potential of Transformers, and they lack a mechanism for collaboratively modeling spatial-domain and frequency-domain features. To address these limitations, we propose a CNN-Transformer dual-branch encoder-decoder network based on space-frequency collaboration. In the local branch, a space-frequency collaborative attention module lets the network capture richer local detail from both the frequency and spatial domains. In the global branch, a multi-view frequency domain extractor jointly models spectral layers and self-attention layers, improving both the model's ability to model spatial and frequency features collaboratively and its generalization performance. In addition, a local-global feature fusion module integrates the local detail information of the CNN branch with the global information of the Transformer branch, so that the network no longer has to trade local detail against a global receptive field. Experimental results show that the architecture overcomes the mis-segmentation caused by blurred organ boundaries in medical images and effectively improves multi-organ segmentation accuracy, while requiring lower computational cost and fewer parameters.
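The abstract does not say how the spectral layers inside the multi-view frequency domain extractor are built. Purely as an illustration of what a "spectral layer" can mean, the sketch below assumes a GFNet-style global filter: transform a token sequence to the frequency domain, scale each frequency by a learnable complex weight, and transform back. This is an assumption, not the authors' published design. The names `dft`, `idft`, and `spectral_filter` are hypothetical, a 1D sequence stands in for 2D feature maps, and a naive O(N²) DFT replaces the FFT a real implementation would use.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse DFT, including the 1/N normalization so idft(dft(x)) == x."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_filter(tokens: Sequence[float], weights: Sequence[complex]) -> List[float]:
    """Hypothetical GFNet-style spectral layer (assumed, not the paper's design).

    Frequency k of the input is scaled by weights[k]. Every frequency depends
    on every token, so each output token depends on the whole sequence.
    """
    if len(tokens) != len(weights):
        raise ValueError("need one filter weight per frequency")
    spectrum = dft([complex(v) for v in tokens])
    filtered = [s * w for s, w in zip(spectrum, weights)]
    # Keeping only the real part is exact when the weights are conjugate-symmetric
    # (weights[k] == conj(weights[-k])); otherwise the imaginary part is dropped.
    return [v.real for v in idft(filtered)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(spectral_filter(x, [1, 1, 1, 1]))  # approximately the input (identity filter)
    print(spectral_filter(x, [1, 0, 0, 0]))  # every token becomes the mean, about 2.5
```

With all-ones weights the layer is an identity; keeping only the zero-frequency weight replaces every token with the sequence mean. A single elementwise multiplication in the frequency domain therefore mixes information across all positions, which is one plausible reason to pair spectral layers with self-attention in a global branch.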
Memo
Received: 2024-09-06.
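Funding: National Natural Science Foundation of China (62271296, 62201334); Innovation Capability Support Program of Shaanxi Province (2025RS-CXTD-012); Youth Innovation Team Program of Shaanxi Universities (23JP014, 23JP022).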
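About the authors:
WANG Mengxi, master's student. Research interests: computer vision and machine learning. E-mail: 202007020606@sust.edu.cn.
LEI Tao, professor, doctoral supervisor, vice dean of the School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, and IEEE Senior Member. Research interests: computer vision and machine learning. Has published more than 90 academic papers. E-mail: leitao@sust.edu.cn.
JIANG Youtao, master's student. Research interests: computer vision and machine learning. E-mail: 2819423992@qq.com.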
Corresponding author: LEI Tao. E-mail: leitao@sust.edu.cn
Last Update: 2025-09-05