[1] LIU Yanchao, GUO Jianyi, YU Zhengtao, et al. A hybrid method to recognize complex Vietnamese named entity incorporating entity properties[J]. CAAI Transactions on Intelligent Systems, 2016, 11(4): 503-512. [doi:10.11992/tis.201606009]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 11
Issue: 2016, No. 4
Page number: 503-512
Column: Academic Papers: Knowledge Engineering
Publication date: 2016-07-25
- Title:
-
A hybrid method to recognize complex Vietnamese named entity incorporating entity properties
- Author(s):
-
LIU Yanchao1; GUO Jianyi1,2; YU Zhengtao1,2; ZHOU Lanjiang1,2; YAN Xin1,2; CHEN Xiuqin3
-
1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
2. Key Laboratory of Pattern Recognition and Intelligent Computing of Yunnan College, Kunming 650500, China;
3. The School of International Education, Kunming University of Science and Technology, Kunming 650093, China
-
- Keywords:
-
Vietnamese; entity library construction; entity recognition; maximum entropy; rules set; entity characters
- CLC:
-
TP391
- DOI:
-
10.11992/tis.201606009
- Abstract:
-
Named entity recognition (NER) is a basic task in natural language processing. To address the low F values and the difficulty of recognizing complex Vietnamese named entities, a hybrid method incorporating entity properties is proposed. First, local and global features are selected according to the characteristics of the Vietnamese language and its entities, and a maximum entropy model is built to recognize Vietnamese named entities. Second, Vietnamese entities are recognized using the named entity rules obtained. Then the two sets of recognition results are combined, with the rules taken as the primary source and the statistical model as the supplement. Finally, the correctly recognized entities are added to the entity corpus after manual correction, dynamically expanding the corpus, which in turn provides richer data and a basis for determining rules and selecting features. Experimental results show that the method effectively exploits both rules and statistics, and that recognition accuracy, recall, and F value are all significantly improved.
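
The combination step described in the abstract (rules as the main principle, statistics as the supplement) can be illustrated with a minimal sketch. The abstract does not specify how conflicts between the two recognizers are resolved, so the policy below is an assumption: every rule-based span is kept, and a statistical span is added only when it does not overlap any rule-based span. The names `Span`, `overlaps`, and `merge_rule_first`, the token-index span representation, and the example sentence and labels are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical span representation: (start token index, end token index (exclusive), label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Return True when two half-open token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_rule_first(rule_spans: List[Span], stat_spans: List[Span]) -> List[Span]:
    """Combine two recognizers' outputs, giving rule-based spans priority.

    Assumed policy (not stated in the abstract): keep every rule-based span,
    and add a statistical span only if it overlaps no rule-based span.
    """
    merged = list(rule_spans)
    for span in stat_spans:
        if not any(overlaps(span, rule) for rule in rule_spans):
            merged.append(span)
    return sorted(merged)


# Illustrative input only; the sentence and entity labels are made up.
tokens = ["Ông", "Nguyễn", "Văn", "A", "đến", "Hà", "Nội"]
rule_spans = [(1, 4, "PER")]                  # e.g. produced by a person-name rule
stat_spans = [(0, 2, "PER"), (5, 7, "LOC")]   # e.g. produced by a statistical tagger

for start, end, label in merge_rule_first(rule_spans, stat_spans):
    print(label, " ".join(tokens[start:end]))
```

Here the statistical span (0, 2) conflicts with the rule-based span (1, 4) and is dropped, while the non-conflicting location span (5, 7) is kept as a supplement.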