WANG Lu, DING Mufei, ZHOU He, et al. Developing and employing large language models in medicine[J]. CAAI Transactions on Intelligent Systems, 2025, 20(6): 1295-1303. [doi:10.11992/tis.202410020]

A Systematic Review of the Development and Application of Medical Large Language Models

References:
[1] OpenAI. Introducing ChatGPT[EB/OL]. (2022-11-30)[2025-08-29]. https://openai.com/index/chatgpt.
[2] BISWAS S S. Role of chat GPT in public health[J]. Annals of biomedical engineering, 2023, 51(5): 868-869.
[3] SHEN Yongliang, SONG Kaitao, TAN Xu, et al. HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face[J]. Advances in neural information processing systems, 2023, 36: 1-27.
[4] SINGHAL K, TU T, GOTTWEIS J, et al. Towards expert-level medical question answering with large language models[EB/OL]. (2023-05-16)[2025-08-29]. https://arxiv.org/abs/2305.09617.
[5] YANG Xi, CHEN A, POURNEJATIAN N, et al. GatorTron: a large clinical language model to unlock patient information from unstructured electronic health records[EB/OL]. (2022-12-16)[2025-08-29]. https://arxiv.org/abs/2203.03540.
[6] HUANG Zhi, BIANCHI F, YUKSEKGONUL M, et al. A visual-language foundation model for pathology image analysis using medical Twitter[J]. Nature medicine, 2023, 29(9): 2307-2316.
[7] TIU E, TALIUS E, PATEL P, et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning[J]. Nature biomedical engineering, 2022, 6(12): 1399-1406.
[8] YALAMANCHILI A, SENGUPTA B, SONG J, et al. Quality of large language model responses to radiation oncology patient care questions[J]. JAMA network open, 2024, 7(4): e244630.
[9] CUI Yiming, YANG Ziqing, YAO Xin. Efficient and effective text encoding for Chinese LLaMA and Alpaca[EB/OL]. (2024-02-23)[2025-08-29]. https://arxiv.org/abs/2304.08177.
[10] YANG Songhua, ZHAO Hanjie, ZHU Senbin, et al. Zhongjing: enhancing the Chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue[J]. Proceedings of the AAAI conference on artificial intelligence, 2024, 38(17): 19368-19376.
[11] PIERI S, MULLAPPILLY S S, KHAN F S, et al. BiMediX: bilingual medical mixture of experts LLM[EB/OL]. (2024-12-10)[2025-08-29]. https://arxiv.org/abs/2402.13253.
[12] SUKEDA I, SUZUKI M, SAKAJI H, et al. JMedLoRA: medical domain adaptation on Japanese large language models using instruction-tuning[EB/OL]. (2023-12-01)[2025-08-29]. https://arxiv.org/abs/2310.10083.
[13] DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[EB/OL]. (2019-05-24)[2025-08-29]. https://arxiv.org/abs/1810.04805.
[14] LIN Zeming, AKIN H, RAO R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model[J]. Science, 2023, 379(6637): 1123-1130.
[15] TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models[EB/OL]. (2023-02-27)[2025-08-29]. https://arxiv.org/abs/2302.13971.
[16] BOLTON E, VENIGALLA A, YASUNAGA M, et al. BioMedLM: a 2.7B parameter language model trained on biomedical text[EB/OL]. (2024-03-27)[2025-08-29]. https://arxiv.org/abs/2403.18421.
[17] LEE J, YOON W, KIM S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240.
[18] JIANG L Y, LIU X C, NEJATIAN N P, et al. Health system-scale language models are all-purpose prediction engines[J]. Nature, 2023, 619(7969): 357-362.
[19] NIJKAMP E, RUFFOLO J A, WEINSTEIN E N, et al. ProGen2: exploring the boundaries of protein language models[J]. Cell systems, 2023, 14(11): 968-978.
[20] Facebook Research. ESM (evolutionary scale modeling): ESMFold code and model release[EB/OL]. (2024-08-01)[2025-08-29]. https://github.com/facebookresearch/esm.
[21] LUO Renqian, SUN Liai, XIA Yingce, et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining[J]. Briefings in bioinformatics, 2022, 23(6): bbac409.
[22] HAN Tianyu, ADAMS L C, PAPAIOANNOU J M, et al. MedAlpaca: an open-source collection of medical conversational AI models and training data[EB/OL]. (2023-04-14)[2025-08-29]. https://arxiv.org/abs/2304.08247.
[23] GE Jin, SUN S, OWENS J, et al. Development of a liver disease-specific large language model chat interface using retrieval-augmented generation[J]. Hepatology, 2024, 80(5): 1158-1168.
[24] XU Xuhai, YAO Bingsheng, DONG Yuanzhe, et al. Leveraging large language models for mental health prediction via online text data[EB/OL]. (2024-01-28)[2025-08-29]. https://arxiv.org/abs/2307.14385.
[25] ZHOU Yukun, CHIA M A, WAGNER S K, et al. A foundation model for generalizable disease detection from retinal images[J]. Nature, 2023, 622(7981): 156-163.
[26] PENG Cheng, YANG Xi, CHEN Aokun, et al. A study of generative large language model for medical research and healthcare[J]. NPJ digital medicine, 2023, 6(1): 210.
[27] LU M Y, CHEN B, WILLIAMSON D F K, et al. A visual-language foundation model for computational pathology[J]. Nature medicine, 2024, 30(3): 863-874.
[28] CHEN R J, DING Tong, LU M Y, et al. Towards a general-purpose foundation model for computational pathology[J]. Nature medicine, 2024, 30(3): 850-862.
[29] PAN A, MUSHEYEV D, BOCKELMAN D, et al. Assessment of artificial intelligence chatbot responses to top searched queries about cancer[J]. JAMA oncology, 2023, 9(10): 1437-1440.
[30] HAVER H L, AMBINDER E B, BAHL M, et al. Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT[J]. Radiology, 2023, 307(4): e230424.
[31] AYERS J W, ZHU Z, POLIAK A, et al. Evaluating artificial intelligence responses to public health questions[J]. JAMA network open, 2023, 6(6): e2317517.
[32] PUGLIESE N, WAI-SUN WONG V W, SCHATTENBERG J M, et al. Accuracy, reliability, and comprehensibility of ChatGPT-generated medical responses for patients with nonalcoholic fatty liver disease[J]. Clinical gastroenterology and hepatology, 2024, 22(4): 886-889.
[33] HENSON J B, GLISSEN BROWN J R, LEE J P, et al. Evaluation of the potential utility of an artificial intelligence chatbot in gastroesophageal reflux disease management[J]. The American journal of gastroenterology, 2023, 118(12): 2276-2279.
[34] FINK M A, BISCHOFF A, FINK C A, et al. Potential of ChatGPT and GPT-4 for data mining of free-text CT reports on lung cancer[J]. Radiology, 2023, 308(3): e231362.
[35] TAYEBI ARASTEH S, HAN Tianyu, LOTFINIA M, et al. Large language models streamline automated machine learning for clinical studies[J]. Nature communications, 2024, 15(1): 1603.
[36] YAN Chao, GRABOWSKA M E, DICKSON A L, et al. Leveraging generative AI to prioritize drug repurposing candidates for Alzheimer’s disease with real-world clinical validation[J]. NPJ digital medicine, 2024, 7(1): 46.
[37] UEDA D, MITSUYAMA Y, TAKITA H, et al. ChatGPT’s diagnostic performance from patient history and imaging findings on the Diagnosis Please quizzes[J]. Radiology, 2023, 308(1): e231040.
[38] KUNG T H, CHEATHAM M, MEDENILLA A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models[J]. PLoS digital health, 2023, 2(2): e0000198.
[39] MIHALACHE A, HUANG R S, POPOVIC M M, et al. Performance of an upgraded artificial intelligence chatbot for ophthalmic knowledge assessment[J]. JAMA ophthalmology, 2023, 141(8): 798-800.
[40] BHAYANA R, KRISHNA S, BLEAKNEY R R. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations[J]. Radiology, 2023, 307(5): e230582.
[41] ALI R, TANG O Y, CONNOLLY I D, et al. Performance of ChatGPT and GPT-4 on neurosurgery written board examinations[J]. Neurosurgery, 2023, 93(6): 1353-1365.
[42] SCHUBERT M C, WICK W, VENKATARAMANI V. Performance of large language models on a neurology board-style examination[J]. JAMA network open, 2023, 6(12): e2346721.
[43] SUCHMAN K, GARG S, TRINDADE A J. Chat generative pretrained transformer fails the multiple-choice American College of Gastroenterology self-assessment test[J]. The American journal of gastroenterology, 2023, 118(12): 2280-2282.
[44] GOODMAN R S, RANDALL PATRINELY J, STONE C A Jr, et al. Accuracy and reliability of chatbot responses to physician questions[J]. JAMA network open, 2023, 6(10): e2336483.
[45] ZACK T, LEHMAN E, SUZGUN M, et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study[J]. The lancet digital health, 2024, 6(1): e12-e22.
[46] MCGOWAN A, GUI Yunlai, DOBBS M, et al. ChatGPT and Bard exhibit spontaneous citation fabrication during psychiatry literature search[J]. Psychiatry research, 2023, 326: 115334.
[47] BENARY M, WANG X D, SCHMIDT M, et al. Leveraging large language models for decision support in personalized oncology[J]. JAMA network open, 2023, 6(11): e2343689.
[48] MUKHERJEE P, HOU B, LANFREDI R B, et al. Feasibility of using the privacy-preserving large language model Vicuna for labeling radiology reports[J]. Radiology, 2023, 309(1): e231147.
[49] RAHSEPAR A A, TAVAKOLI N, KIM G H J, et al. How AI responds to common lung cancer questions: ChatGPT vs Google Bard[J]. Radiology, 2023, 307(5): e230922.
[50] SANDMANN S, RIEPENHAUSEN S, PLAGWITZ L, et al. Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks[J]. Nature communications, 2024, 15(1): 2050.
[51] WU Shaohong, TONG Wenjuan, LI Mingde, et al. Collaborative enhancement of consistency and accuracy in US diagnosis of thyroid nodules using large language models[J]. Radiology, 2024, 310(3): e232255.
[52] SAVAGE T, NAYAK A, GALLO R, et al. Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine[J]. NPJ digital medicine, 2024, 7(1): 20.
[53] AMIN K S, DAVIS M A, DOSHI R, et al. Accuracy of ChatGPT, Google Bard, and Microsoft Bing for simplifying radiology reports[J]. Radiology, 2023, 309(2): e232561.
[54] LI D J, KAO Yuchen, TSAI S J, et al. Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists[J]. Psychiatry and clinical neurosciences, 2024, 78(6): 347-352.
[55] LIM Z W, PUSHPANATHAN K, YEW S M E, et al. Benchmarking large language models’ performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard[J]. eBioMedicine, 2023, 95: 104770.
[56] OMIYE J A, LESTER J C, SPICHAK S, et al. Large language models propagate race-based medicine[J]. NPJ digital medicine, 2023, 6(1): 195.
[57] GERTZ R J, BUNCK A C, LENNARTZ S, et al. GPT-4 for automated determination of radiological study and protocol based on radiology request forms: a feasibility study[J]. Radiology, 2023, 307(5): e230877.
[58] GARCIA P, MA S P, SHAH S, et al. Artificial intelligence-generated draft replies to patient inbox messages[J]. JAMA network open, 2024, 7(3): e243201.
[59] BERNSTEIN I A, ZHANG Y V, GOVIL D, et al. Comparison of ophthalmologist and large language model chatbot responses to online patient eye care questions[J]. JAMA network open, 2023, 6(8): e2330320.
[60] DECKER H, TRANG K, RAMIREZ J, et al. Large language model-based chatbot vs surgeon-generated informed consent documentation for common procedures[J]. JAMA network open, 2023, 6(10): e2336997.
[61] RAU A, RAU S, ZOELLER D, et al. A context-based chatbot surpasses trained radiologists and generic ChatGPT in following the ACR appropriateness guidelines[J]. Radiology, 2023, 308(1): e230970.
[62] AYERS J W, POLIAK A, DREDZE M, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum[J]. JAMA internal medicine, 2023, 183(6): 589-596.
[63] LEE T C, STALLER K, BOTOMAN V, et al. ChatGPT answers common patient questions about colonoscopy[J]. Gastroenterology, 2023, 165(2): 509-511.
[64] BLEASE C, WORTHEN A, TOROUS J. Psychiatrists’ experiences and opinions of generative artificial intelligence in mental healthcare: an online mixed methods survey[J]. Psychiatry research, 2024, 333: 115724.
[65] EPPLER M, GANJAVI C, RAMACCIOTTI L S, et al. Awareness and use of ChatGPT and large language models: a prospective cross-sectional global survey in urology[J]. European urology, 2024, 85(2): 146-153.
[66] JIN J Q, DOBRY A S. ChatGPT for healthcare providers and patients: practical implications within dermatology[J]. Journal of the American Academy of Dermatology, 2023, 89(4): 870-871.
[67] YOUNG J N, O’HAGAN R, POPLAUSKY D, et al. The utility of ChatGPT in generating patient-facing and clinical responses for melanoma[J]. Journal of the American Academy of Dermatology, 2023, 89(3): 602-604.
[68] BENJAMENS S, DHUNNOO P, MESKÓ B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database[J]. NPJ digital medicine, 2020, 3: 118.
[69] BROWN S. Partnerships between health authorities and Amazon Alexa raise many possibilities: and just as many questions[J]. Canadian medical association journal, 2019, 191(41): E1141-E1142.
[70] SCHULMAN K A, NIELSEN P K Jr, PATEL K. AI alone will not reduce the administrative burden of health care[J]. JAMA, 2023, 330(22): 2159-2160.
[71] MELLO M M, ROSE S. Denial: artificial intelligence tools and health insurance coverage decisions[J]. JAMA health forum, 2024, 5(3): e240622.
[72] HARRER S. Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine[J]. eBioMedicine, 2023, 90: 104512.
[73] MINSSEN T, VAYENA E, GLENN COHEN I. The challenges for regulating medical use of ChatGPT and other large language models[J]. JAMA, 2023, 330(4): 315-316.
[74] SEZGIN E, SIRRIANNI J, LINWOOD S L. Operationalizing and implementing pretrained, large artificial intelligence linguistic models in the US health care system: outlook of generative pretrained transformer 3 (GPT-3) as a service model[J]. JMIR medical informatics, 2022, 10(2): e32875.
[75] SUN L, HUANG Y, WANG H, et al. TrustLLM: trustworthiness in large language models[EB/OL]. (2024-01-10)[2025-08-29]. https://arxiv.org/abs/2401.05561.
[76] CABRAL S, RESTREPO D, KANJEE Z, et al. Clinical reasoning of a generative artificial intelligence model compared with physicians[J]. JAMA internal medicine, 2024, 184(5): 581-583.
[77] BHAYANA R. Chatbots and large language models in radiology: a practical primer for clinical and research applications[J]. Radiology, 2024, 310(1): e232756.
[78] LI Hanzhou, MOON J T, PURKAYASTHA S, et al. Ethics of large language models in medicine and medical research[J]. The lancet digital health, 2023, 5(6): e333-e335.

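Memo

Received: 2024-10-15.
Foundation item: National Natural Science Foundation of China (92259104).
Author biographies:
WANG Lu, doctoral candidate; main research interests: medical data analysis and natural language processing; has published 7 academic papers. E-mail: luwang@sj-hospital.org.
DING Mufei, master's student; main research interest: medical image processing. E-mail: dingmuou@163.com.
SONG Jiangdian, associate professor, Ph.D., executive committee member of the Digital Medicine Branch of the China Computer Federation; main research interests: medical image processing and artificial intelligence; has led 2 National Natural Science Foundation of China projects and published 34 academic papers. E-mail: jdsong@cmu.edu.cn.
Corresponding author: SONG Jiangdian. E-mail: jdsong@cmu.edu.cn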

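Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 145-1, Nantong Street, Nangang District, Harbin, Heilongjiang 150001, China. Tel: 0451-82534001, 82518134. E-mail: tis@vip.sina.com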