[1]TAN Liwei,ZHANG Shujun,HAN Qi,et al.Gate normalized encoder-decoder network for medical image report generation[J].CAAI Transactions on Intelligent Systems,2024,19(2):411-419.[doi:10.11992/tis.202207013]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
19
Issue:
2024, No. 2
Page number:
411-419
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2024-03-05
- Title:
Gate normalized encoder-decoder network for medical image report generation
- Author(s):
TAN Liwei¹; ZHANG Shujun²; HAN Qi²; GUO Qi¹; WANG Hongyan³
1. School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China;
2. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China;
3. Qingdao Cadre Health Care Servi
- Keywords:
medical image processing; text processing; feature extraction; information fusion; channel coding; deep learning; report generator; gray difference
- CLC:
TP391.4; R445
- DOI:
10.11992/tis.202207013
- Abstract:
Automatic generation of medical image reports can reduce doctors' workload and lower the rate of misdiagnosis and missed diagnosis. Medical images pose a particular difficulty: lesions are usually small, and their gray difference from normal areas is subtle, which leads to missing keywords in the generated text and to inaccurate reports. This paper develops a gate normalized encoder-decoder network for medical image report generation. The network optimizes visual feature extraction through a gated channel transformation unit, which enhances the difference between features and automatically screens key features. A gate normalization algorithm is designed to combine contextual information along the channel dimension, activate the neurons between channels in the shallow layers of the network, inhibit neuron activity in the deep layers, and filter out invalid features, so that text and visual semantics can interact fully and the quality of the generated reports improves. Experimental results on two widely used benchmark datasets, IU X-Ray and MIMIC-CXR, show that the model achieves advanced performance and generates image reports with better visual-semantic consistency.
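
The abstract names a gated channel transformation unit but gives no formula. As a rough illustration only, the sketch below follows the commonly published form of gated channel transformation (per-channel L2 embedding, normalization across channels, tanh gate). This is an assumption, not the authors' gate normalization algorithm. The function name `gated_channel_transform` and the parameters `alpha`, `gamma`, `beta`, and `eps` are hypothetical names chosen for this example.

```python
import numpy as np


def gated_channel_transform(x, alpha, gamma, beta, eps=1e-5):
    """Illustrative gated channel transformation (assumed formulation).

    x:     feature map of shape (N, C, H, W)
    alpha: per-channel embedding weights, shape (C,)
    gamma: per-channel gating weights, shape (C,)
    beta:  per-channel gating biases, shape (C,)
    """
    a = alpha.reshape(1, -1, 1, 1)
    g = gamma.reshape(1, -1, 1, 1)
    b = beta.reshape(1, -1, 1, 1)

    # Global context embedding: scaled L2 norm of each channel over H and W.
    embedding = a * np.sqrt(np.sum(x ** 2, axis=(2, 3), keepdims=True) + eps)

    # Channel normalization: divide by the root mean square across channels,
    # so each channel's embedding is expressed relative to the others.
    rms = np.sqrt(np.mean(embedding ** 2, axis=1, keepdims=True) + eps)
    normalized = embedding / rms

    # Gating: a factor in (0, 2) that rescales every channel of the input.
    gate = 1.0 + np.tanh(g * normalized + b)
    return x * gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(2, 4, 3, 3))
    channels = features.shape[1]
    out = gated_channel_transform(
        features, np.ones(channels), np.full(channels, 0.5), np.zeros(channels)
    )
    print(out.shape)
```

With `gamma` and `beta` set to zero the gate equals 1 and the unit passes features through unchanged, which is why this kind of unit can be inserted into an existing encoder without disturbing it at initialization.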