[1] REN Qingji, CAI Zhijie. A method for enhancing Tibetan text data based on adjective knowledge base[J]. CAAI Transactions on Intelligent Systems, 2026, 21(2): 519-528. [doi:10.11992/tis.202503033]

A method for enhancing Tibetan text data based on adjective knowledge base

References:
[1] SHORTEN C, KHOSHGOFTAAR T M, FURHT B. Text data augmentation for deep learning[J]. Journal of big data, 2021, 8(1): 101.
[2] JIANG Di. Syllable number morphology and morphological types of Tibetan adjectives[J]. Journal of Chinese linguistics, 2020(00): 1-27.
[3] LITAKE O, YAGNIK N, LABHSETWAR S. IndiText boost: text augmentation for low resource India languages[EB/OL]. (2024-01-23)[2025-03-24]. https://arxiv.org/abs/2401.13085.
[4] ZHANG Hu, ZHANG Ying, YANG Zhizhuo, et al. Data augmentation based automatic answering of reading comprehension in college entrance examination[J]. Journal of Chinese information processing, 2021, 35(9): 132-140.
[5] GE Yizhou, XU Xiang, YANG Suorong, et al. Survey on sequence data augmentation[J]. Journal of frontiers of computer science & technology, 2021, 15(7): 1207-1219.
[6] GHOSH S, TYAGI U, SURI M, et al. ACLM: a selective-denoising based generative data augmentation approach for low-resource complex NER[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Toronto: Association for Computational Linguistics, 2023: 104-125.
[7] YAN Ge, LI Yu, ZHANG Shu, et al. Data augmentation for deep learning of judgment documents[C]//Intelligence Science and Big Data Engineering. Big Data and Machine Learning. Cham: Springer International Publishing, 2019: 232-242.
[8] WANG Kechao, GUO Junjun, ZHANG Yafei, et al. A Chinese-Vietnamese parallel corpus expansion method based on back translation and proportional extraction Siamese network screening[J]. Computer engineering and science, 2022, 44(10): 1861-1868.
[9] ZHANG Jinyi, TIAN Ye, MAO Jiannan, et al. WCC-JC: a web-crawled corpus for Japanese-Chinese neural machine translation[J]. Applied sciences, 2022, 12(12): 6002.
[10] HOANG V C D, KOEHN P, HAFFARI G, et al. Iterative back-translation for neural machine translation[C]//Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. Melbourne: Association for Computational Linguistics, 2018: 18-24.
[11] QI Ruiyan, LI Longjie, XU Shicheng, et al. Named entity recognition based on span and category enhancement for Chinese news[J]. Chinese journal of intelligent science and technology, 2024, 6(4): 495-508.
[12] ZHOU Chunting, MA Xuezhe, HU Junjie, et al. Handling syntactic divergence in low-resource machine translation[EB/OL]. (2019-08-30)[2025-03-24]. https://arxiv.org/abs/1909.00040.
[13] LIAO Junwei. Research on natural language generation techniques in the large language model era of deep learning[D]. Chengdu: University of Electronic Science and Technology of China, 2023.
[14] ZHANG Xiang, ZHAO Junbo, LECUN Y. Character-level convolutional networks for text classification[J]. Advances in neural information processing systems, 2015: 649-657.
[15] WEI J, ZOU Kai. EDA: easy data augmentation techniques for boosting performance on text classification tasks[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: Association for Computational Linguistics, 2019: 6382-6388.
[16] COULOMBE C. Text data augmentation made simple by leveraging NLP cloud APIs[EB/OL]. (2018-12-05)[2025-03-24]. https://arxiv.org/abs/1812.04718.
[17] FADAEE M, BISAZZA A, MONZ C. Data augmentation for low-resource neural machine translation[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vancouver: Association for Computational Linguistics, 2017: 567-573.
[18] ZHANG Rong, LIU Yuan. Multi-level data augmentation method for aspect-based sentiment analysis[J]. Frontiers of data & computing, 2023, 5(5): 140-153.
[19] YOU Congcong, GAO Shengxiang, YU Zhengtao, et al. A Chinese-Vietnamese neural machine translation method based on synonym data augmentation[J]. Computer engineering and science, 2021, 43(8): 1497-1502.
[20] WANG Chao. A study on Tibetan-Chinese machine translation method based on data enhancement technology[D]. Lhasa: Xizang University, 2023.
[21] SE Chajia, BAN Mabao, CAI Rangjia, et al. Tibetan pre-training language model combined with data enhancement method[J]. Journal of Chinese information processing, 2024, 38(9): 66-72.
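[22] MA Jinwu. Clarification of the four structures of Tibetan grammar[M]. Beijing: Nationalities Publishing House, 2008.
[23] JI Taijia. General theory of modern Tibetan grammar[M]. Xining: Qinghai Nationalities Publishing House, 2022.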
[24] MA Lamaocao. Corpus of Tibetan words describe attributes based on function[D]. Lanzhou: Northwest University for Nationalities, 2013.
[25] ZHOU Maotai. The research on the classification of Tibetan adjectives and it’s emotion[D]. Lanzhou: Northwest University for Nationalities, 2020.
[26] QUN Nuo, LI Xing, QIU Xipeng, et al. End-to-end neural text classification for Tibetan[C]//Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Cham: Springer International Publishing, 2017: 472-480.
[27] CER D, DIAB M, AGIRRE E, et al. SemEval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation[C]//Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Vancouver: ACL, 2017: 1-14.
[28] GAO Tianyu, YAO Xingcheng, CHEN Danqi. SimCSE: simple contrastive learning of sentence embeddings[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021: 6894-6910.
[29] SANGJEE D. sangjeedondrub/tibetan-roberta-base, Hugging Face[EB/OL]. (2024-06-25)[2025-03-24]. https://huggingface.co/sangjeedondrub/Tibetan-roberta-base.
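[30] State Key Laboratory of Tibetan Intelligent Information Processing and Application (Qinghai Normal University), Engineering Research Center of Open Source Software and Real-Time System, Ministry of Education (Lanzhou University). Tibetan pre-trained language model TBERT, GitHub[EB/OL]. (2023-10-08)[2025-03-24]. https://github.com/Dslab-NLP/Tibetan-PLM.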
[31] LIU Sisi, DENG Junjie, SUN Yuan, et al. TiBERT: Tibetan pre-trained language model[C]//2022 IEEE International Conference on Systems, Man, and Cybernetics. Prague: IEEE, 2022: 2956-2961.
[32] YANG Ziqing, XU Zihang, CUI Yiming, et al. CINO: a Chinese minority pre-trained language model[C]//Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju: COLING, 2022: 3937-3949.
[33] LIN Ronghua. Convolutional neural network based sentence classification algorithm[D]. Hangzhou: Zhejiang University, 2015.

Memo

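Received: 2025-03-24.
Funding: National Natural Science Foundation of China (61866032, 61966031); Qinghai Provincial Department of Science and Technology project (2019-SF-129); Key Laboratory of Tibetan Information Processing, Ministry of Education project (2020-ZJ-Y05).
About the authors: REN Qingji, PhD candidate; main research interests are Tibetan information processing and Tibetan natural language processing. E-mail: 1054808891@qq.com. CAI Zhijie, professor, doctoral supervisor, PhD; main research interests are Tibetan information processing and Tibetan natural language processing; has published 64 academic papers. E-mail: Czjqhsd@163.com.
Corresponding author: CAI Zhijie. E-mail: Czjqhsd@163.com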

Last Update: 1900-01-01
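Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 1, No. 145 Nantong Street, Nangang District, Harbin, Heilongjiang Province (150001). Tel: 0451-82534001, 82518134. E-mail: tis@vip.sina.com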