[1] LIU Zeming, CHENG Zihao, LIU Jingjing, et al. Evaluation of Chinese multiskill dialogues[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1281-1293. [doi:10.11992/tis.202411001]

Evaluation of Chinese multiskill dialogues

References:
[1] BAI Jinze, BAI Shuai, CHU Yunfei, et al. Qwen technical report[EB/OL]. (2023-09-28)[2024-11-01]. https://arxiv.org/pdf/2309.16609.
[2] YANG Aiyuan, XIAO Bin, WANG Bingning, et al. Baichuan 2: open large-scale language models[EB/OL]. (2023-09-19)[2024-11-01]. https://arxiv.org/abs/2309.10305.
[3] TOUVRON H, MARTIN L, STONE K, et al. Llama 2: Open foundation and fine-tuned chat models[EB/OL]. (2023-07-18)[2024-11-01]. https://arxiv.org/abs/2307.09288.
[4] ZENG Aohan, XU Bin, WANG Bowen, et al. ChatGLM: a family of large language models from GLM-130B to GLM-4 all tools[EB/OL]. (2024-07-30)[2024-11-01]. https://arxiv.org/abs/2406.12793v2.
[5] ADIWARDANA D, LUONG M T, SO D R, et al. Towards a human-like open-domain chatbot[EB/OL]. (2020-02-27)[2024-11-01]. https://arxiv.org/abs/2001.09977.
[6] ROLLER S, DINAN E, GOYAL N, et al. Recipes for building an open-domain chatbot[EB/OL]. (2020-04-30)[2024-11-01]. https://arxiv.org/abs/2004.13637v2.
[7] SHUSTER K, JU Da, ROLLER S, et al. The dialogue dodecathlon: open-domain knowledge and image grounded conversational agents[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2453–2470.
[8] MA Zhonghong, WU Xichang. Gender bias in social chatbots: a conversation test study based on xiaoice series of chatbots[J]. Chinese journal of journalism & communication, 2024, 46(4): 72-89. (in Chinese)
[9] ZHAO Yanyan, LU Xin, ZHAO Weixiang, et al. Survey on emotional dialogue techniques[J]. Journal of software, 2024, 35(3): 1377-1402. (in Chinese)
[10] FANG Xiaomian. Design of human-computer interaction system for English intelligent conversation robot based on speech recognition[J]. Automation & instrumentation, 2023(4): 225-228, 232. (in Chinese)
[11] CHE Wanxiang, DOU Zhicheng, FENG Yansong, et al. Towards a comprehensive understanding of the impact of large language models on natural language processing: challenges, opportunities and future directions[J]. Scientia sinica (informationis), 2023, 53(9): 1645-1687. (in Chinese)
[12] WANG Xi, ZENG Guangping, QIAO Zhu. Design and implementation of mental health oriented service robot[J]. Manufacturing automation, 2021, 43(6): 137-141. (in Chinese)
[13] SMITH E M, WILLIAMSON M, SHUSTER K, et al. Can you put it all together: evaluating conversational agents’ ability to blend skills[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2021-2030.
[14] LIU Zeming, WANG Haifeng, NIU Zhengyu, et al. Towards conversational recommendation over multi-type dialogs[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 1036-1049.
[15] LIU C W, LOWE R, SERBAN I V, et al. How NOT to evaluate your dialogue system: an empirical study of unsupervised evaluation metrics for dialogue response generation[EB/OL]. (2016-03-25)[2024-11-01]. https://arxiv.org/abs/1603.08023.
[16] YEH Y T, ESKENAZI M, MEHRI S. A comprehensive assessment of dialog evaluation metrics[EB/OL]. (2021-07-07)[2024-11-01]. https://arxiv.org/abs/2106.03706v4.
[17] SELLAM T, DAS D, PARIKH A. BLEURT: learning robust metrics for text generation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. [S. l. ]: Association for Computational Linguistics, 2020: 7881-7892.
[18] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia: ACL, 2002: 311-318.
[19] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor: Association for Computational Linguistics, 2005: 65-72.
[20] LIU Yangyang, DONG Tao. Research on the structure of chat robot based on dialogue model[J]. Information technology and informatization, 2023(1): 13-16. (in Chinese)
[21] LI Yanran, SU Hui, SHEN Xiaoyu, et al. DailyDialog: a manually labelled multi-turn dialogue dataset[EB/OL]. (2017-10-11)[2024-11-01]. https://arxiv.org/abs/1710.03957v1.
[22] GOPALAKRISHNAN K, HEDAYATNIA B, CHEN Qinlang, et al. Topical-chat: towards knowledge-grounded open-domain conversations[C]//Interspeech 2019. Graz: ISCA, 2019: 1891-1895.
[23] ZHANG Saizheng, DINAN E, URBANEK J, et al. Personalizing dialogue agents: I have a dog, do you have pets too?[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne: Association for Computational Linguistics, 2018: 2204-2213.
[24] DINAN E, LOGACHEVA V, MALYKH V, et al. The second conversational intelligence challenge (ConvAI2)[C]//The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations. Cham: Springer International Publishing, 2020: 187-208.
[25] WEI Zelin, ZHANG Shuai, WANG Jianchao. Implementation of question answering based on knowledge graph[J]. Software engineering, 2021, 24(2): 38-44. (in Chinese)
[26] YE Jianhui, HAN Bowen, ZHOU Fan, et al. Design of man-machine dialogue control robot based on natural language processing[J]. China science and technology information, 2020(22): 63-65. (in Chinese)
[27] ZHANG Yuxuan, SHA Licheng, WANG Haixia, et al. Research on system architecture and key technologies of intelligent conversation robot for power grid dispatching[J]. Electronic design engineering, 2022, 30(11): 45-49. (in Chinese)
[28] LI Jiwei, GALLEY M, BROCKETT C, et al. A diversity-promoting objective function for neural conversation models[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego: ACL, 2016: 110-119.
[29] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 4566-4575.
[30] MEHRI S, ESKENAZI M. USR: an unsupervised and reference free evaluation metric for dialog generation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. [S. l. ]: Association for Computational Linguistics, 2020: 681–707.
[31] HUANG Lishan, YE Zheng, QIN Jinghui, et al. GRADE: automatic graph-enhanced coherence metric for evaluating open-domain dialogue systems[EB/OL]. (2020-10-08)[2024-11-01]. https://arxiv.org/abs/2010.03994v1.
[32] PANG Bo, NIJKAMP E, HAN Wenjuan, et al. Towards holistic and automatic evaluation of open-domain dialogue generation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3619–3629.
[33] GHAZARIAN S, WEISCHEDEL R, GALSTYAN A, et al. Predictive engagement: an efficient metric for automatic evaluation of open-domain dialogue systems[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 7789-7796.
[34] HORI C, HORI T. End-to-end conversation modeling track in DSTC6[EB/OL]. (2018-01-30)[2024-11-01]. https://arxiv.org/abs/1706.07440v2.
[35] MEHRI S, ESKENAZI M. USR: an unsupervised and reference free evaluation metric for dialog generation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 681–707.
[36] GUNASEKARA C, KIM S, D’HARO L F, et al. Overview of the ninth dialog system technology challenge: DSTC9[J]. IEEE/ACM transactions on audio, speech, and language processing, 2024, 32: 4066-4076.
[37] ZHENG Lianmin, CHIANG W L, SHENG Ying, et al. Judging LLM-as-a-judge with MT-bench and Chatbot Arena[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2023: 46595-46623.
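[38] China Computer Federation, Chinese Information Processing Society of China, Baidu. 2021 Language and Intelligence Challenge: multi-skill dialogue task[EB/OL]. (2021-05-16)[2024-11-01]. https://aistudio.baidu.com/aistudio/competition/detail/67. (in Chinese)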
[39] China Computer Federation. Qianyan: multi-skill dialogue[EB/OL]. (2021-01-24)[2024-11-01]. https://www.datafountain.cn/competitions/470. (in Chinese)
[40] WANG Yida, KE Pei, ZHENG Yinhe, et al. A large-scale Chinese short-text conversation dataset[EB/OL]. (2022-04-26)[2024-11-01]. https://arxiv.org/abs/2008.03946v2.
[41] WU Wenquan, GUO Zhen, ZHOU Xiangyang, et al. Proactive human-machine conversation with explicit conversation goal[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: ACL, 2019: 3794–3804.
[42] XU Xinchao, GOU Zhibin, WU Wenquan, et al. Long time No see! open-domain conversation with long-term persona memory[C]//Findings of the Association for Computational Linguistics: ACL 2022. Dublin: ACL, 2022: 2639–2650.
[43] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]//Text Summarization Branches Out. Barcelona: Association for Computational Linguistics, 2004: 74–81.
[44] ZHANG Tianyi, KISHORE V, WU F, et al. BERTscore: evaluating text generation with BERT[C]//Proceedings of the International Conference on Learning Representations. New Orleans: OpenReview.net, 2019: 1-43.
[45] KIROS R, ZHU Yukun, SALAKHUTDINOV R, et al. Skip-thought vectors[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal: Curran Associates Inc., 2015: 3294-3302.
[46] FORGUES G, PINEAU J, LARCHEVÊQUE J M, et al. Bootstrapping dialog systems with word embeddings[C]//Proceedings of NIPS Modern Machine Learning and Natural Language Processing Workshop. Montreal: Curran Associates Inc., 2014: 1-5.

Memo

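Received: 2024-11-01.
Funding: National Key Research and Development Program of China (2023YFF0725600); National Natural Science Foundation of China (62406015).
About the authors: LIU Zeming, assistant professor, Ph.D., member of the Technical Committee on Large Models and Generation of the Chinese Information Processing Society of China (CIPS), and deputy secretary-general and founding member of the CIPS Technical Committee on Embodied Intelligence (in preparation). Main research interests: natural language processing, dialogue systems, large models, and embodied intelligence. Has led projects funded by the National Natural Science Foundation of China, a task under a Young Scientist Project of the National Key Research and Development Program, the CCF-Baidu Songguo Fund, and several university-industry research collaborations. Honors include Beihang Outstanding Young Scholar and "Outstanding Innovation and Entrepreneurship Mentor" of the Beijing Division of the China International College Students' Innovation Competition. Holds 10 granted invention patents and has published more than 40 academic papers, including more than 20 as first or corresponding author. E-mail: zmliu@buaa.edu.cn.
CHENG Zihao. Main research interests: natural language processing and tool learning. E-mail: zihaocheng@buaa.edu.c.
WANG Yunhong, professor, dean of the School of Computer Science and Engineering, Beihang University; chair of the Intelligent Interaction Technical Committee of the Chinese Association for Artificial Intelligence (CAAI), executive council member of CAAI, and executive council member of the China Society of Image and Graphics; Fellow of the Institute of Electrical and Electronics Engineers (IEEE), the International Association for Pattern Recognition (IAPR), the China Computer Federation (CCF), and CAAI. Has led projects under the National High Technology Research and Development Program (863 Program), the National Key Basic Research Development Program, and the National Natural Science Foundation of China, among others. Received the second prize of the National Technology Invention Award, the China Youth Science and Technology Award, and the first prize of the Beijing Teaching Achievement Award; was named an Advanced Individual of the 863 Program by the Ministry of Science and Technology and was selected for the Ministry of Education's New Century Excellent Talents program. Received the IAPR Maria Petrou Prize for women scientists, the first Chinese recipient since the prize was established. Holds more than 30 granted invention patents and has published more than 200 academic papers. E-mail: yhwang@buaa.edu.cn.
Corresponding author: WANG Yunhong. E-mail: yhwang@buaa.edu.cn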

Last Update: 2025-09-05
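Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 145-1, Nantong Street, Nangang District, Harbin, Heilongjiang Province (150001). Tel: 0451-82534001, 82518134. E-mail: tis@vip.sina.com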