[1] LIU Yanchao, GUO Jianyi, YU Zhengtao, et al. A hybrid method to recognize complex Vietnamese named entity incorporating entity properties[J]. CAAI Transactions on Intelligent Systems, 2016, 11(4): 503-512. [doi:10.11992/tis.201606009]

A hybrid method to recognize complex Vietnamese named entity incorporating entity properties

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume: 11
Issue: 2016, No. 4
Pages: 503-512
Section:
Publication date: 2016-07-25

Article information

Title:
A hybrid method to recognize complex Vietnamese named entity incorporating entity properties
Author(s):
LIU Yanchao (1), GUO Jianyi (1, 2), YU Zhengtao (1, 2), ZHOU Lanjiang (1, 2), YAN Xin (1, 2), CHEN Xiuqin (3)
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
2. Key Laboratory of Pattern Recognition and Intelligent Computing of Yunnan College (Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology), Kunming 650500, China;
3. School of International Education, Kunming University of Science and Technology, Kunming 650093, China
Keywords:
Vietnamese; entity library construction; entity recognition; maximum entropy; rule set; entity characteristics
CLC number:
TP391
DOI:
10.11992/tis.201606009
Abstract:
Named entity recognition (NER) is a fundamental task in natural language processing. Complex Vietnamese named entities are hard to recognize, and existing methods achieve only moderate F-scores. To address these problems, this paper proposes a hybrid Vietnamese NER method that incorporates entity properties and an entity library. First, effective local and global features are selected according to the characteristics of the Vietnamese language and its named entities, and a maximum entropy model is built to recognize Vietnamese named entities. Second, Vietnamese named entities are recognized with a set of named-entity rules formulated in this paper. Third, the two sets of results are combined, with the rules as the main principle and the statistical model as a supplement. Finally, after manual proofreading, the correctly labeled entities are added to the entity library, dynamically expanding it and providing a rich corpus and basis for formulating rules and selecting features. Experimental results show that the method effectively combines the strengths of the rule-based and statistical approaches so that each compensates for the other's weaknesses, and that it significantly improves precision, recall, and F-score.
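The combination step ("rules as the main principle, statistics as the supplement") and the dynamic entity library can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation. Entities are `(start, end, type)` token spans. The rule set is reduced to longest matching against the entity library plus a hypothetical trigger-word rule for person names (`ông` "Mr.", `bà` "Mrs."). The statistical spans are hard-coded to stand in for the maximum entropy model's output. Manual proofreading is simulated by accepting every merged span.

```python
from typing import Dict, List, Tuple

# A span is (start, end, type) over token indices; end is exclusive.
Span = Tuple[int, int, str]
Library = Dict[Tuple[str, ...], str]

# Hypothetical trigger rule (assumption): a title such as "ông" (Mr.) or
# "bà" (Mrs.) is usually followed by a person name in capitalized syllables.
PERSON_TRIGGERS = {"ông", "bà"}


def rule_recognize(tokens: List[str], library: Library, max_len: int = 6) -> List[Span]:
    """Rule pass: longest match against the entity library, then the trigger rule."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        # Forward maximum matching: try the longest library entry first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in library:
                spans.append((i, i + n, library[key]))
                i += n
                break
        else:
            # No library entry starts at i; try the trigger-word rule.
            j = i + 1
            if tokens[i].lower() in PERSON_TRIGGERS:
                while j < len(tokens) and tokens[j][:1].isupper():
                    j += 1
                if j > i + 1:
                    spans.append((i + 1, j, "PER"))
            i = j
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge(rule_spans: List[Span], stat_spans: List[Span]) -> List[Span]:
    """Rules first, statistics as a supplement: keep every rule span and add
    a statistical span only if it overlaps no rule span."""
    merged = list(rule_spans)
    for s in stat_spans:
        if not any(overlaps(s, r) for r in rule_spans):
            merged.append(s)
    return sorted(merged)


def expand_library(library: Library, tokens: List[str], confirmed: List[Span]) -> int:
    """Add manually confirmed entities to the library; return how many were new."""
    added = 0
    for start, end, label in confirmed:
        key = tuple(tokens[start:end])
        if key not in library:
            library[key] = label
            added += 1
    return added


if __name__ == "__main__":
    tokens = "ông Nguyễn Văn An làm việc tại Vietcombank ở Hà Nội".split()
    library: Library = {("Hà", "Nội"): "LOC"}
    rule_spans = rule_recognize(tokens, library)
    # Hypothetical max-ent output: (1, 3) conflicts with the rule span (1, 4).
    stat_spans = [(1, 3, "PER"), (7, 8, "ORG")]
    merged = merge(rule_spans, stat_spans)
    print(merged)  # [(1, 4, 'PER'), (7, 8, 'ORG'), (9, 11, 'LOC')]
    # Simulated manual proofreading: accept every merged span.
    print(expand_library(library, tokens, merged))  # 2
    print(rule_recognize(tokens, library) == merged)  # True
```

On a conflict, the sketch simply discards any statistical span that overlaps a rule span; the paper may resolve conflicts differently. Re-running the rule pass after the library is expanded shows how entities that only the statistical model found (here the organization name) become reachable by the rules in later rounds.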

Memo:
Received: 2014-04-01.
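About the authors: LIU Yanchao, male, born in 1990, master's student; research interests: natural language processing and information extraction. GUO Jianyi, female, born in 1964, professor and master's supervisor; research interests: natural language processing, information extraction, and machine learning. She has led or participated in several projects funded by the National Natural Science Foundation of China, the Yunnan Provincial Major Special Fund for Information Technology, and the Yunnan Provincial Natural Science Foundation. She has received one First Prize of the Yunnan Provincial Science and Technology Progress Award and one Second Prize of the Yunnan Provincial Natural Science Award, published more than 70 papers, and served as chief editor of two textbooks. YU Zhengtao, male, born in 1970, Ph.D., professor and doctoral supervisor; research interests: natural language processing, information retrieval, and machine learning. As first-ranked recipient, he has received one First Prize of the Yunnan Provincial Science and Technology Progress Award, one Second Prize of the Yunnan Provincial Natural Science Award, and one Third Prize of the Yunnan Provincial Science and Technology Progress Award. He has published more than 150 papers, over 80 of which are indexed by SCI or EI.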
Corresponding author: GUO Jianyi. E-mail: gjade86@hotmail.com.