HUANG Heyan, LI Silin, LAN Tianwei, et al. A survey on the safety of large language model: classification, evaluation, attribution, mitigation and prospect[J]. CAAI Transactions on Intelligent Systems, 2025, 20(1): 2-32. [doi:10.11992/tis.202401006]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
- Title:
A survey on the safety of large language model: classification, evaluation, attribution, mitigation and prospect
- Author(s) (Chinese names):
黄河燕1, 李思霖1, 兰天伟1, 邱昱力1, 柳泽明2, 姚嘉树1, 曾理1, 单赢宇1, 施晓明3, 郭宇航1
- Author(s):
HUANG Heyan1, LI Silin1, LAN Tianwei1, QIU Yuli1, LIU Zeming2, YAO Jiashu1, ZENG Li1, SHAN Yingyu1, SHI Xiaoming3, GUO Yuhang1
1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China;
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China;
3. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin 150001, China
- Keywords:
large language model; model safety; generated content safety; safety classification; safety risk evaluation; safety risk attribution; safety risk mitigation measures; safety research prospect
- Classification number:
- DOI:
10.11992/tis.202401006
- Abstract:
Large language models can provide answers comparable to human level in multiple fields, and they demonstrate a wealth of emergent capabilities in fields and tasks on which they have not been trained. However, artificial intelligence systems based on large language models currently have many potential safety hazards. For example, large language models are vulnerable to attacks that are difficult to detect, including intricate and elusive ones. The content they generate may be illegal, leak information, express hatred or bias, or contain errors. Moreover, in practical applications, the abuse of large language models is also an important issue: generated content may cause trouble at multiple levels, such as countries, social groups, and fields. This paper aims to explore and classify the safety risks faced by large language models, review existing evaluation methods, study the causal mechanisms behind the safety risks, and summarize existing solutions. Specifically, this paper identifies 10 safety risks of large language models and categorizes them into two aspects: the safety risks of the model itself and the safety risks of the generated content. It also systematically analyzes the safety risks of the model itself from two perspectives, life cycle and hazard level, and introduces existing methods for risk assessment of large language models, the causes of their safety risks, and the corresponding mitigation methods. The safety risk of large language models is an important issue that needs to be solved urgently.
Corresponding author: GUO Yuhang (郭宇航). E-mail: guoyuhang@bit.edu.cn
Last Update: