[1] ZHU Yefen, XIAN Yantuan, YU Zhengtao, et al. Joint model for Thai word segmentation and part-of-speech tagging via a local Transformer[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 401-410. [doi:10.11992/tis.202209034]

Joint model for Thai word segmentation and part-of-speech tagging via a local Transformer

References:
[1] JOUSIMO J, LAOKULRAT N, CARR B, et al. Thai word segmentation with bi-directional RNN [EB/OL]. (2019-10-03)[2023-11-14]. https://github.com/sertiscorp.
[2] KITTINARADORN R, TITIPAT A, CHAOVAVANICH K, et al. DeepCut: A Thai word tokenization library using Deep Neural Network [EB/OL]. (2019-11-11) [2023-11-14]. http://doi.org/10.5281/zenodo.345770.
[3] CHORMAI P, PRASERTSOM P, RUTHERFORD A. AttaCut: a fast and accurate neural Thai word segmenter [EB/OL]. (2019-12-16) [2023-11-14]. https://arxiv.org/abs/1911.07056.
[4] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE transactions on signal processing, 1997, 45(11): 2673–2681.
[5] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84–90.
[6] DONG Chuanhai, ZHANG Jiajun, ZONG Chengqing, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition[M]//Natural Language Understanding and Intelligent Applications. Cham: Springer International Publishing, 2016: 239-250.
[7] DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2018-11-11) [2023-11-14]. https://arxiv.org/abs/1810.04805.pdf.
[8] LAFFERTY J, MCCALLUM A, PEREIRA F. Conditional random fields: probabilistic models for segmenting and labeling sequence data [C]//Proceedings of the Eighteenth International Conference on Machine Learning. Williamstown: ICML, 2001: 282–289.
[9] LIU Yinhan, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2019-07-26) [2023-11-14]. https://arxiv.org/abs/1907.11692.
[10] HONG T, KIM D, JI M, et al. BROS: a pre-trained language model focusing on text and layout for better key information extraction from documents[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.]: AAAI, 2022: 10767-10775.
[11] ZHANG Taolin, WANG Chengyu, HU Nan, et al. DKPLM: decomposable knowledge-enhanced pre-trained language model for natural language understanding[C]//Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.]: AAAI, 2022: 11703-11711.
[12] LOWPHANSIRIKUL L, POLPANUMAS C, JANTRAKULCHAI N, et al. WangchanBERTa: pretraining transformer-based Thai language models [EB/OL]. (2021-05-20) [2023-11-14]. https://arxiv.org/abs/2101.09635.
[13] NG H, LOW J K. Chinese part-of-speech tagging: one-at-a-time or all-at-once? word-based or character-based? [C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Barcelona: EMNLP, 2004: 277-284.
[14] SØGAARD A, GOLDBERG Y. Deep multi-task learning with low level tasks supervised at lower layers[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin: Association for Computational Linguistics, 2016: 231-235.
[15] JIANG Wenbin, HUANG Liang, LIU Qun, et al. A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging[C]// Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. Columbus: [s.n.], 2008: 897-904.
[16] SUN Weiwei. A stacked sub-word model for joint Chinese word segmentation and Part-of-Speech tagging[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland: [s.n.], 2011: 1385-1394.
[17] ZENG Xiaodong, WONG D F, CHAO L S, et al. Graph-based semi-supervised model for joint Chinese word segmentation and part-of-speech tagging[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia: [s.n.], 2013: 770-779.
[18] PAN Huashan, YAN Xin, ZHOU Feng, et al. A Khmer word segmentation and part-of-speech tagging method based on cascaded conditional random fields[J]. Journal of Chinese information processing, 2016, 30(4): 110–116.
[19] TIAN Yuanhe, SONG Yan, AO Xiang, et al. Joint Chinese word segmentation and part-of-speech tagging via two-way attentions of auto-analyzed knowledge[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Seattle: [s.n.], 2020: 8286-8296.
[20] BUOY R, TAING N, KOR S. Joint Khmer word segmentation and part-of-speech tagging using deep learning[EB/OL]. (2021-03-31)[2022-01-01]. https://arxiv.org/abs/2103.16801.pdf.
[21] LI Y, LI Xiaomin, WANG Yiru, et al. Character-based joint word segmentation and part-of-speech tagging for Tibetan based on deep learning[J]. Transactions on Asian and low-resource language information processing, 2022: 2375-4699.
[22] YUAN Lichi. A joint method for Chinese word segmentation and part-of-speech labeling based on deep neural network[J]. Soft Computing, 2022, 26(12): 5607–5616.
[23] LIN Songkai, MAO Cunli, YU Zhengtao, et al. A method of Myanmar word segmentation based on convolution neural network[J]. Journal of Chinese information processing, 2018, 32(6): 62–70,79.
[24] XIANG Yan, XU Ying, YU Zhengtao, et al. CNN-based text multi-classifier using filters initialised by N-gram vector[J]. International journal of information and communication technology, 2019, 15(4): 419.
[25] GUO Zhen, ZHANG Yujie, SU Chen, et al. Character-level dependency model for joint word segmentation, POS tagging, and dependency parsing in Chinese[J]. Journal of Chinese information processing, 2014, 28(6): 1–8.
[26] LIU Yijia, CHE Wanxiang, LIU Ting, et al. A comparison study of sequence labeling methods for Chinese word segmentation, POS tagging models[J]. Journal of Chinese information processing, 2013, 27(4): 30–36.
[27] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. (2017-06-12) [2023-01-01]. https://arxiv.org/abs/1706.03762.
[28] PHATTHIYAPHAIBUN W, CHAOVAVANICH K, POLPANUMAS C, et al. PyThaiNLP: Thai natural language processing in Python [EB/OL]. (2022-07-03) [2023-01-01]. https://github.com/PyThaiNLP/pythainlp.
[29] UDOMCHAROENCHAIKIT C, BOONKWAN P, VATEEKUL P. Adversarial evaluation of robust neural sequential tagging methods for Thai language[J]. Transactions on Asian and low-resource language information processing, 2020: 1-25.
[30] KINGMA D P, BA J. Adam: a method for stochastic optimization [EB/OL]. (2014-12-24) [2023-11-14]. https://arxiv.org/abs/1412.6980.

Memo

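Received: 2022-09-16.
Funding: National Natural Science Foundation of China (62266028); Yunnan Provincial Major Science and Technology Special Program (202002AD080001).
About the authors: ZHU Yefen, master's student. Main research interests: natural language processing and lexical analysis. E-mail: 846415516@qq.com; XIAN Yantuan, associate professor. Main research interests: natural language processing and information extraction. Has led or participated in 10 projects funded by the National Natural Science Foundation of China, the Yunnan Provincial Natural Science Foundation, and other government funding programs; has led more than 2 industry-commissioned projects; holds more than 10 granted patents and software copyrights; has published more than 20 academic papers. E-mail: xianyt@kust.edu.cn; YU Zhengtao, professor. Main research interests: natural language processing, information retrieval, machine translation, and machine learning. Has led or participated in 30 projects funded by the National Natural Science Foundation of China, the Yunnan Provincial Natural Science Foundation, and other government funding programs; has led more than 20 industry-commissioned projects; holds more than 50 granted patents and software copyrights; has published more than 80 academic papers. E-mail: ztyu@hotmail.com
Corresponding author: XIAN Yantuan. E-mail: xianyt@kust.edu.cn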

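Last Update: 1900-01-01
Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 145-1, Nantong Street, Nangang District, Harbin, Heilongjiang Province (150001). Tel: 0451-82534001, 82518134. E-mail: tis@vip.sina.com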