[1] JIAN Ping, ZONG Cheng-qing. A new approach to identifying Chinese maximal-length phrases using bidirectional labeling [J]. CAAI Transactions on Intelligent Systems, 2009, 4(5): 406-413. [doi:10.3969/j.issn.1673-4785.2009.05.004] (Original Chinese title: 基于双向标注融合的汉语最长短语识别方法)
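CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 4
Issue: No. 5, 2009
Pages: 406-413
Section: Research Papers: Natural Language Processing and Understanding
Publication date: 2009-10-25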
- Title: A new approach to identifying Chinese maximal-length phrases using bidirectional labeling
- Article ID: 1673-4785(2009)05-0406-08
- Author(s): JIAN Ping (鉴萍), ZONG Cheng-qing (宗成庆)
- Affiliation: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Keywords: maximal-length noun phrase identification; prepositional phrase identification; sequence labeling; bidirectional labeling; fork position
- CLC number: TP391
- DOI: 10.3969/j.issn.1673-4785.2009.05.004
- Document code: A
- Abstract: Chinese maximal-length phrases (maximal-length noun phrases and prepositional phrases) have distinctive linguistic properties. When such phrases are labeled with a deterministic, classifier-based sequence labeler run in both directions, left-to-right and right-to-left, the two sets of results are complementary. Building on this observation, this paper identifies Chinese maximal-length noun phrases and prepositional phrases with deterministic bidirectional labeling and proposes a probabilistic strategy based on "fork positions" to fuse the two labelings. Experiments on the Penn Chinese Treebank, a segmented, part-of-speech-tagged, and fully bracketed corpus, show that the fusion algorithm effectively exploits the complementary strengths of the two directions and yields good phrase identification performance.
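The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes each direction yields one label and one classifier probability per token, treats a "fork position" as the start of a run of tokens on which the two labelings disagree, and keeps, for each such run, the direction with the higher summed log-probability. The name `fuse_bidirectional`, the BIO-style labels, and the scoring rule are all hypothetical.

```python
from math import log
from typing import List, Tuple

# One (label, classifier probability) pair per token, in left-to-right order.
Labeled = List[Tuple[str, float]]


def fuse_bidirectional(l2r: Labeled, r2l: Labeled) -> List[str]:
    """Fuse left-to-right and right-to-left labelings of one sentence.

    Assumed rule (not taken from the paper): a maximal run of tokens on which
    the two directions disagree starts at a "fork position"; for each run,
    keep the labels of the direction with the higher summed log-probability.
    """
    if len(l2r) != len(r2l):
        raise ValueError("both labelings must cover the same tokens")

    def score(segment: Labeled) -> float:
        # Clamp probabilities so that log(0) cannot occur.
        return sum(log(max(p, 1e-12)) for _, p in segment)

    fused: List[str] = []
    i, n = 0, len(l2r)
    while i < n:
        if l2r[i][0] == r2l[i][0]:
            # Both directions agree: accept the shared label.
            fused.append(l2r[i][0])
            i += 1
            continue
        # Fork position found: extend to the end of the disagreeing run.
        j = i
        while j < n and l2r[j][0] != r2l[j][0]:
            j += 1
        winner = l2r if score(l2r[i:j]) >= score(r2l[i:j]) else r2l
        fused.extend(label for label, _ in winner[i:j])
        i = j
    return fused


# Toy input: the two directions disagree only on the third token.
l2r = [("B", 0.9), ("I", 0.6), ("I", 0.55), ("O", 0.9)]
r2l = [("B", 0.8), ("I", 0.7), ("O", 0.95), ("O", 0.85)]
print(fuse_bidirectional(l2r, r2l))
```

Choosing per run rather than per token keeps each disputed stretch internally consistent with one direction, although the fused sequence may still need a validity check at the run boundaries.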
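Memo
About the authors:
JIAN Ping, female, born in 1982, Ph.D. candidate. Her main research interests are natural language processing and dependency parsing.
ZONG Cheng-qing, male, born in 1963, professor and doctoral supervisor. He is deputy director of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; associate editor of the international journal IEEE Intelligent Systems; invited academic advisor and chair professor at Tsinghua University; adjunct professor at the Graduate University of the Chinese Academy of Sciences; executive member of the Asian Federation of Natural Language Processing (AFNLP); council member of the Chinese Association for Artificial Intelligence and deputy director of its natural language processing technical committee; and council member of the Chinese Information Processing Society of China and deputy director of its machine translation technical committee. He has served as program committee chair or member for a number of international conferences. His main research interests are the theory and methods of natural language processing, machine translation, and human-computer dialogue. As principal investigator he has led more than 10 national and international cooperation projects and has filed several national invention patent applications. He has published more than 70 academic papers and one monograph.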
Last Update: 2009-12-29