
TAN Liwei, ZHANG Shujun, HAN Qi, et al. Gate normalized encoder-decoder network for medical image report generation[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 411-419. [doi:10.11992/tis.202207013]


Gate normalized encoder-decoder network for medical image report generation

CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 19; Issue: No. 2, 2024; Pages: 411-419; Section: Academic Papers, Natural Language Processing and Understanding; Publication date: 2024-03-05

Author(s): TAN Liwei¹, ZHANG Shujun², HAN Qi², GUO Qi¹, WANG Hongyan³
1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China;
2. School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China;
3. Qingdao Cadre Health Care Service Center, Qingdao 266071, China

Keywords: medical image processing; text processing; feature extraction; information fusion; channel coding; deep learning; report generator; gray difference

CLC number: TP391.4; R445

DOI: 10.11992/tis.202207013

Document code: 2023-11-17

Abstract: Automatic generation of medical image reports can reduce physicians' workload and lower the incidence of misdiagnosis and missed diagnosis. Medical images are distinctive in that lesions are usually small and hard to distinguish from normal regions by gray level, so generated text tends to miss key terms and the resulting reports are not accurate enough. To address this, a gate normalized encoder-decoder network for medical image report generation is proposed. A gated channel transformation unit optimizes visual feature extraction, strengthening the differences between features and automatically selecting key features. A gate normalization algorithm is also proposed: it integrates contextual information along the channel dimension, activates inter-channel neuron activity in the shallow layers of the network and suppresses it in the deep layers, and filters out invalid features, so that textual and visual semantics interact fully and report quality improves. Experimental results on two widely used benchmark datasets, IU X-Ray and MIMIC-CXR, show that the model achieves advanced performance and that the generated reports have better visual-semantic consistency.
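The abstract names a gated channel transformation unit but gives no formulas. As a rough illustration only, the sketch below follows the general gated channel transformation recipe of Yang et al. (CVPR 2020) as it is commonly stated: a per-channel L2-norm embedding, a normalization across channels, and a `1 + tanh` gate. It is not the authors' gate normalization algorithm. The function names, the nested-list feature map layout, and the toy parameter values are all assumptions made for this example.

```python
import math


def channel_gates(x, alpha, gamma, beta, eps=1e-5):
    """Return one multiplicative gate per channel.

    x is a feature map given as C channels, each a list of H rows of W floats.
    alpha, gamma, beta are per-channel parameters (learned in a real network;
    fixed by hand here, which is an assumption of this sketch).
    """
    # 1. Global context embedding: scaled L2 norm of each channel.
    s = [a * math.sqrt(sum(v * v for row in ch for v in row) + eps)
         for ch, a in zip(x, alpha)]
    # 2. Channel normalization: rescale each embedding relative to all channels.
    scale = math.sqrt(len(s)) / math.sqrt(sum(v * v for v in s) + eps)
    # 3. Gating: 1 + tanh(.) lies in (0, 2); a gate of 1 leaves a channel unchanged.
    return [1.0 + math.tanh(g * v * scale + b)
            for v, g, b in zip(s, gamma, beta)]


def gated_channel_transform(x, alpha, gamma, beta, eps=1e-5):
    """Multiply every value in each channel by that channel's gate."""
    gates = channel_gates(x, alpha, gamma, beta, eps)
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


# Toy feature map: a weak channel and a strong channel, each 2x2.
x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
alpha = [1.0, 1.0]
print(channel_gates(x, alpha, [1.0, 1.0], [0.0, 0.0]))    # gamma > 0
print(channel_gates(x, alpha, [-1.0, -1.0], [0.0, 0.0]))  # gamma < 0
```

In this sketch, a positive `gamma` gives the stronger channel the larger gate, a negative `gamma` suppresses the stronger channel more, and `gamma = beta = 0` leaves the feature map unchanged.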

References:
[1] JIANG Ting, XI Xiaoming, YUE Houguang. Classification of pulmonary nodules by semi-supervised FCM based on prior distribution[J]. CAAI transactions on intelligent systems, 2017, 12(5): 729–734.
[2] YANG Xiaolan, QIANG Yan, ZHAO Juanjuan, et al. Hashing retrieval for CT images of pulmonary nodules based on medical signs and convolutional neural networks[J]. CAAI transactions on intelligent systems, 2017, 12(6): 857–864.
[3] ROBINSON A E, HAMMON P S, DE SA V R. Explaining brightness illusions using spatial filtering and local response normalization[J]. Vision research, 2007, 47(12): 1631–1644.
[4] WU Yuxin, HE Kaiming. Group normalization[C]//European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[5] YANG Zongxin, ZHU Linchao, WU Yu, et al. Gated channel transformation for visual recognition[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11791-11800.
[6] DEMNER-FUSHMAN D, KOHLI M D, ROSENMAN M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American medical informatics association, 2015, 23(2): 304–310.
[7] JOHNSON A E W, POLLARD T J, GREENBAUM N R, et al. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs[EB/OL]. (2019-11-14)[2022-07-11]. https://arxiv.org/abs/1901.07042.pdf.
[8] KISILEV P, WALACH E, BARKAN E, et al. From medical image to automatic medical report generation[J]. IBM journal of research and development, 2015, 59(2/3): 1-7.
[9] KISILEV P, SASON E, BARKAN E, et al. Medical image description using multi-task-loss CNN[M]//Deep Learning and Data Labeling for Medical Applications. Cham: Springer International Publishing, 2016: 121-129.
[10] SHIN H C, ROBERTS K, LU Le, et al. Learning to read chest X-rays: recurrent neural cascade model for automated image annotation[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2497-2506.
[11] ZHANG Zizhao, XIE Yuanpu, XING Fuyong, et al. MDNet: a semantically and visually interpretable medical image diagnosis network[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3549-3557.
[12] WANG Xiaosong, PENG Yifan, LU Le, et al. TieNet: text-image embedding network for common thorax disease classification and reporting in chest X-rays[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9049-9058.
[13] LIAO Fangzhou, LIANG Ming, LI Zhe, et al. Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-OR network[J]. IEEE transactions on neural networks and learning systems, 2019, 30(11): 3484–3495.
[14] XUE Yuan, XU Tao, RODNEY LONG L, et al. Multimodal recurrent model with attention for automated radiology report generation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2018: 457-466.
[15] YANG Shaokang, NIU Jianwei, WU Jiyan, et al. Automatic ultrasound image report generation with adaptive multimodal attention mechanism[J]. Neurocomputing, 2021, 427: 40–49.
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[17] ALFARGHALY O, KHALED R, ELKORANY A, et al. Automated radiology report generation using conditioned transformers[J]. Informatics in medicine unlocked, 2021, 24: 100557.
[18] CHEN Zhihong, SONG Yan, CHANG T H, et al. Generating radiology reports via memory-driven transformer[EB/OL]. (2022-04-28)[2022-07-11]. https://arxiv.org/abs/2010.16056.pdf.
[19] CHEN Zhihong, SHEN Yaling, SONG Yan, et al. Cross-modal memory networks for radiology report generation[EB/OL]. (2022-04-28)[2022-07-11]. https://arxiv.org/abs/2204.13258.pdf.
[20] LIU Fenglin, YIN Changchang, WU Xian, et al. Contrastive attention for automatic chest X-ray report generation[EB/OL]. (2022-01-09)[2022-07-11]. https://arxiv.org/abs/2106.06965.pdf.
[21] LIU Fenglin, WU Xian, GE Shen, et al. Exploring and distilling posterior and prior knowledge for radiology report generation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 13748-13757.
[22] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics-ACL ’02. Morristown: Association for Computational Linguistics, 2001: 311-318.
[23] DENKOWSKI M, LAVIE A. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems[C]//Proceedings of the Sixth Workshop on Statistical Machine Translation. Stroudsburg: ACL Press, 2011: 85-91.
[24] LIN C Y. Rouge: A package for automatic evaluation of summaries[C]//Proceedings of Workshop on Text Summarization Branches Out. Barcelona: ACL, 2004: 74-81.
[25] KRAUSE J, JOHNSON J, KRISHNA R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3337-3345.
[26] JING Baoyu, XIE Pengtao, XING E. On the automatic generation of medical imaging reports[EB/OL]. (2018-07-20)[2022-07-11]. https://arxiv.org/abs/1711.08195.pdf.
[27] JING Baoyu, WANG Zeya, XING E. Show, describe and conclude: on exploiting the structure information of chest X-ray reports[EB/OL]. (2020-07-23)[2022-07-11]. https://arxiv.org/abs/2004.12274.pdf.
[28] WANG Fuyu, LIANG Xiaodan, XU Lin, et al. Unifying relational sentence generation and retrieval for medical image report composition[J]. IEEE transactions on cybernetics, 2022, 52(6): 5015–5025.
[29] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3156-3164.
[30] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International Conference on Machine Learning. New York: ACM Press, 2015: 2048-2057.
[31] LU Jiasen, XIONG Caiming, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[EB/OL]. (2017-06-07)[2022-07-11]. https://arxiv.org/abs/1612.01887.pdf.
[32] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1179-1195.
[33] ANDERSON P, HE Xiaodong, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[EB/OL]. (2018-03-14)[2022-07-11]. https://arxiv.org/abs/1707.07998.pdf.

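Memo

Received: 2022-07-11.
Funding: Shandong Province Higher Education Young Innovative Talent Recruitment and Cultivation Program, "Artificial Intelligence and Medical Image Analysis Innovation Team" construction project.
About the authors: TAN Liwei, master's student; main research interest: computer vision. E-mail: 2020110009@qust.edu.cn; ZHANG Shujun, associate professor; main research interests: computer vision and virtual reality technology. Has published 27 academic papers as first author. E-mail: zhangsj@qust.edu.cn; HAN Qi, master's student; main research interest: computer vision. E-mail: hanqi@mails.qust.edu.cn
Corresponding author: ZHANG Shujun. E-mail: zhangsj@qust.edu.cn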

