[1] ZHANG Hongxi, CAI Zhijie. Construction of a Tibetan verb-ending type dataset for automatic question answering [J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1207-1216. [doi:10.11992/tis.202410002]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 20
Issue: 2025 (5)
Page number: 1207-1216
Column: Academic Papers: Natural Language Processing and Understanding
Publication date: 2025-09-05
- Title: Construction of a Tibetan verb-ending type dataset for automatic question answering
- Author(s): ZHANG Hongxi (1, 2); CAI Zhijie (1, 2)
  1. College of Computer Science and Technology, Qinghai Normal University, Xining 810016, China
  2. The State Key Laboratory of Tibetan Intelligence, Xining 810008, China
- Keywords: natural language processing; Tibetan; automatic Q&A; TiQuAD_36414 dataset; Q&A template; verb; la case auxiliary word; effectiveness
- CLC: TP391
- DOI: 10.11992/tis.202410002
- Abstract:
The Tibetan automatic question answering (Q&A) dataset serves as a crucial data foundation for advancing research in Tibetan automatic Q&A technologies. To address the lack of automatic Q&A datasets in Tibetan, this paper first analyzes the current status of automatic Q&A dataset construction in English, Chinese, and Tibetan, and on that basis examines the features of the most common verb-ending type sentences in Tibetan. It then constructs templates for sentences and questions and proposes a template-based method for building a Tibetan automatic Q&A dataset from “verb-ending + la case auxiliary word” sentences. A new Tibetan automatic Q&A dataset, TiQuAD_36414, is generated with this approach. Finally, the validity of the dataset is verified using the mean opinion score (MOS) method, along with the F1 and exact match (EM) scores of the BiDAF (bidirectional attention flow), RNet (gated self-matching networks), and QANet (question answering net) models. The experimental results show that the TiQuAD_36414 dataset performs better than the baseline Tibetan Q&A dataset.
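The F1 and EM scores mentioned in the abstract are commonly computed per question by comparing the predicted answer span with the reference answer. The following is a minimal sketch of that style of scoring, not the authors' evaluation script. It assumes answers are tokenized into syllables by splitting on the Tibetan tsheg (U+0F0B), the shad (U+0F0D), and whitespace; the function names `tokenize`, `exact_match`, and `f1_score` are hypothetical, and the paper may use a different tokenization or normalization.

```python
import re
from collections import Counter

# Assumption: syllables are delimited by tsheg (U+0F0B), shad (U+0F0D), or whitespace.
DELIMITERS = re.compile(r"[\u0f0b\u0f0d\s]+")


def tokenize(text):
    """Split an answer string into non-empty syllable tokens."""
    return [tok for tok in DELIMITERS.split(text.strip()) if tok]


def exact_match(prediction, reference):
    """Return 1.0 if the token sequences are identical, else 0.0."""
    return float(tokenize(prediction) == tokenize(reference))


def f1_score(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = tokenize(prediction)
    ref_tokens = tokenize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores would then typically be the mean of these per-question values over the test set, usually reported as percentages.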