[1] LUO Ling, LI Shuokai, HE Qing, et al. Winter Olympic Q & A system based on knowledge map, TF-IDF and BERT model[J]. CAAI Transactions on Intelligent Systems, 2021, 16(4): 819-826. [doi:10.11992/tis.202105047]

Winter Olympic Q & A system based on knowledge map, TF-IDF and BERT model

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 16
Issue:
Issue 4, 2021
Pages:
819-826
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2021-07-05

Article Info

Title:
Winter Olympic Q & A system based on knowledge map, TF-IDF and BERT model
Author(s):
LUO Ling (1, 2), LI Shuokai (1, 2), HE Qing (1, 2), YANG Chengqi (2), WANG Yuyangheng (2), CHEN Tianyu (2)
1. Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Keywords:
intelligent Q & A; Winter Olympics Q & A; dialogue model; knowledge map; TF-IDF; BERT
CLC number:
TP391
DOI:
10.11992/tis.202105047
Abstract:
Traditional information retrieval technology can no longer meet people's requirements for efficient information acquisition, so intelligent question answering systems have emerged and become an important research hotspot in natural language processing. Targeting Chinese-language Winter Olympics question answering, this paper proposes three Winter Olympics Q&A system models, based respectively on a knowledge graph, term frequency-inverse document frequency (TF-IDF), and bidirectional encoder representation from transformers (BERT). The paper constructs a Winter Olympics Q&A data set for the first time and integrates the three methods into a single system, which users can query to obtain Winter Olympics knowledge quickly and accurately. The three models are then evaluated by measuring each model's answer acceptance rate. The experimental results show that the BERT model performs slightly better overall than the knowledge graph and TF-IDF models: its acceptance rate exceeds 96% on each of the three types of questions, whereas the knowledge graph and TF-IDF models are less effective than the BERT model on composite statistical question-answer pairs.
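The abstract does not describe how the TF-IDF model is implemented, so the following is only a minimal sketch of one common way to answer questions with TF-IDF: represent each stored question as a TF-IDF vector, score a user query against them by cosine similarity, and return the answer paired with the best match. The Q&A pairs, the function names (`build_index`, `answer`), the smoothed IDF formula, and the whitespace tokenizer are all assumptions made for illustration; they are not the authors' data set or method, and Chinese text would additionally need a word segmenter in place of `tokenize`.

```python
import math
from collections import Counter

# Hypothetical toy Q&A pairs for illustration only; not the paper's data set.
QA_PAIRS = [
    ("which city hosted the 2022 winter olympics", "Beijing"),
    ("which city hosted the 2018 winter olympics", "PyeongChang"),
    ("how many players are on the ice per team in ice hockey", "Six"),
]


def tokenize(text):
    """Whitespace tokenizer (assumption; Chinese would need segmentation)."""
    return text.lower().split()


def build_index(questions):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF vector per question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n_docs = len(docs)
    # Document frequency: number of questions containing each term.
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(query, qa_pairs, idf, vectors):
    """Return (answer, score) for the stored question most similar to the query."""
    query_tf = Counter(tokenize(query))
    # Terms never seen in the stored questions carry no weight.
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scores = [cosine(query_vec, vec) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return qa_pairs[best][1], scores[best]


idf, vectors = build_index([q for q, _ in QA_PAIRS])
reply, score = answer("which city hosted the winter olympics in 2018", QA_PAIRS, idf, vectors)
print(reply)  # PyeongChang
```

A real system would likely also apply a minimum similarity threshold so that queries unrelated to every stored question are rejected rather than matched to the nearest one; this sketch always returns the top-scoring pair.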


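Memo

Received: 2021-05-31.
Funding: National Key Research and Development Program of China (2017YFB1002104).
About the authors: LUO Ling, female, master's degree; her main research interests are natural language processing and reinforcement learning. LI Shuokai, PhD candidate; main research interests are data mining, recommender systems, and meta-learning. HE Qing, research professor and doctoral supervisor; Deputy Secretary-General and Executive Council Member of the Chinese Association for Artificial Intelligence, Secretary-General of its Knowledge Engineering and Distributed Intelligence Technical Committee, and standing member of its Machine Learning Technical Committee; senior member of the China Computer Federation and member of its Artificial Intelligence and Pattern Recognition Technical Committee; member of the Cloud Computing Expert Committee of the Chinese Institute of Electronics. His main research interests are machine learning, data mining, text mining, and cloud-based distributed parallel data mining. He has led or participated in multiple research projects, including projects under the National "863" and "973" Programs and the National Natural Science Foundation of China. At the end of 2008, commissioned by the China Mobile Research Institute, HE Qing led his data mining team at the Institute of Computing Technology, Chinese Academy of Sciences, in jointly developing a cloud-based parallel data mining platform for mining real TB-scale data, achieving high-performance, low-cost data mining. He has published nearly 100 academic papers.
Corresponding author: HE Qing. E-mail: heqing@ict.ac.cn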