[1] LIU Zeming, CHENG Zihao, LIU Jingjing, et al. Evaluation of Chinese multiskill dialogues[J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1281-1293. [doi:10.11992/tis.202411001]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: 2025(5)
Pages: 1281-1293
Column: Artificial Intelligence Deans Forum
Publication date: 2025-09-05
Title: Evaluation of Chinese multiskill dialogues
Author(s): LIU Zeming; CHENG Zihao; LIU Jingjing; YANG Xiao; GUO Yuanfang; WANG Yunhong
Affiliation: School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Keywords: multiskill dialogue; dialogue evaluation; chit-chat; open-domain dialogue; conversational recommendation; persona-chat; knowledge-grounded dialogue; large language model
CLC: TP39
DOI: 10.11992/tis.202411001
Abstract: Accurately evaluating the capabilities of a multiskill dialogue system is important for satisfying users' diverse demands, including social chit-chat, in-depth knowledge-grounded discussion, persona-based conversation, and conversational recommendation. Current benchmarks concentrate on assessing individual dialogue skills and cannot efficiently evaluate multiple dialogue skills concurrently. To facilitate the evaluation of multiskill dialogues, this study establishes a Chinese benchmark, the Multi-Skill Dialogue Evaluation benchmark (MSDE). MSDE contains 1,781 dialogues and 21,218 utterances covering four common dialogue tasks: chit-chat, knowledge-grounded dialogue, persona-based dialogue, and conversational recommendation. We performed extensive experiments on MSDE and examined the correlation between automatic and human evaluation metrics. Results indicate that (1) among the four dialogue tasks, chit-chat is the most difficult to evaluate, while knowledge-grounded dialogue is the easiest; (2) the various metrics differ significantly in performance on MSDE; and (3) in human evaluation, the difficulty of assessing each metric varies across dialogue tasks. Part of the data is available at https://github.com/IRIP-LLM/MSDE; the full dataset will be released after curation.
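As a rough illustration of the metric-correlation analysis mentioned in the abstract (a minimal sketch under assumed data, not the authors' released code), the following Python snippet computes Pearson and Spearman correlations between automatic metric scores and human ratings; all variable names and example values are hypothetical.

    # Minimal sketch (hypothetical data; not the authors' code) of correlating
    # an automatic dialogue metric with human quality ratings.
    from scipy.stats import pearsonr, spearmanr

    auto_scores = [0.41, 0.55, 0.32, 0.78, 0.60]   # e.g., per-dialogue metric scores
    human_scores = [2.0, 3.5, 1.5, 4.0, 3.0]       # e.g., 1-5 human quality ratings

    r, r_p = pearsonr(auto_scores, human_scores)       # linear correlation
    rho, rho_p = spearmanr(auto_scores, human_scores)  # rank correlation
    print(f"Pearson r = {r:.3f} (p = {r_p:.3f})")
    print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")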