[1] TAN Liwei, ZHANG Shujun, HAN Qi, et al. Gate normalized encoder-decoder network for medical image report generation[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 411-419. [doi:10.11992/tis.202207013]
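CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
2024, No. 2
Pages:
411-419
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date: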
2024-03-05
- Title:
-
Gate normalized encoder-decoder network for medical image report generation
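- Authors:
-
TAN Liwei1, ZHANG Shujun2, HAN Qi2, GUO Qi1, WANG Hongyan3
-
1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China;
2. School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China;
3. Qingdao Cadre Health Care Service Center, Qingdao 266071, Shandong, China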
- Author(s):
-
TAN Liwei1, ZHANG Shujun2, HAN Qi2, GUO Qi1, WANG Hongyan3
-
1. School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China;
2. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China;
3. Qingdao Cadre Health Care Servi
-
- Keywords:
-
medical image processing; text processing; feature extraction; information fusion; channel coding; deep learning; report generator; gray difference
- CLC number:
-
TP391.4;R445
- DOI:
-
10.11992/tis.202207013
- Document code:
-
2023-11-17
- Abstract:
-
Automatic generation of medical image reports can reduce physicians' workload and lower the incidence of misdiagnosis and missed diagnosis. Lesions in medical images are usually small, and their gray-level difference from normal regions is hard to discern, so generated text tends to omit keywords and the resulting reports are inaccurate. To address this, a gate normalized encoder-decoder network for medical image report generation is proposed. A gated channel transformation unit optimizes visual feature extraction, strengthens the differences between features, and automatically selects key features. A gate normalization algorithm is also proposed: it integrates contextual information along the channel dimension, activating inter-channel neuron activity in the shallow layers of the network and suppressing it in the deep layers. This filters out invalid features and lets textual and visual semantics interact fully, which improves the quality of the generated reports. Experimental results on two widely used benchmark datasets, IU X-Ray and MIMIC-CXR, show that the model achieves advanced performance and that the generated reports have better visual-semantic consistency.
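The abstract names a gated channel transformation unit but does not give its equations. As a rough illustration only, the sketch below follows a commonly used formulation of gated channel transformation (a per-channel L2 embedding, normalization across channels, then a tanh gate). The function name `gated_channel_transform`, the parameters `alpha`, `gamma`, `beta`, and the toy feature values are all assumptions made for this example; the paper's actual unit and its gate normalization algorithm may differ and are not reproduced here.

```python
import math


def gated_channel_transform(x, alpha, gamma, beta, eps=1e-5):
    """Apply a GCT-style gate to a feature map (illustrative sketch).

    x: list of C channels, each a flat list of H*W floats.
    alpha, gamma, beta: per-channel lists of length C (learned in practice).
    """
    num_channels = len(x)
    # 1) Global context embedding: scaled L2 norm of each channel.
    embedding = [
        alpha[c] * math.sqrt(sum(v * v for v in x[c]) + eps)
        for c in range(num_channels)
    ]
    # 2) Channel normalization: divide by the RMS of all embeddings,
    #    so each channel is measured relative to the others.
    rms = math.sqrt(sum(s * s for s in embedding) / num_channels + eps)
    normalized = [s / rms for s in embedding]
    # 3) Gating: scale each channel by 1 + tanh(gamma * s_hat + beta).
    gates = [
        1.0 + math.tanh(gamma[c] * normalized[c] + beta[c])
        for c in range(num_channels)
    ]
    return [[g * v for v in channel] for g, channel in zip(gates, x)]


# Toy input: a weak channel and a strong channel (hypothetical values).
features = [[0.1, 0.2, 0.1, 0.2], [2.0, 1.5, 2.5, 1.0]]

# With gamma = beta = 0 every gate equals 1, so the unit is an identity map.
identity = gated_channel_transform(features, [1.0, 1.0], [0.0, 0.0], [0.0, 0.0])

# With positive gamma the stronger channel receives the larger gate.
gated = gated_channel_transform(features, [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
```

In this formulation a positive `gamma` amplifies channels whose response is large relative to the others, while a negative `gamma` suppresses them, which is one way a network could encourage or damp competition between channels at different depths.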