[1] YANG Zhi-hao, HONG Li, LIN Hong-fei, et al. Extraction of information on protein-protein interaction from biomedical literatures using an SVM[J]. CAAI Transactions on Intelligent Systems, 2008, 3(04): 361-369.

Extraction of information on protein-protein interaction from biomedical literatures using an SVM

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
3
Issue:
No. 04, 2008
Pages:
361-369
Publication date:
2008-08-25

Article information

Title:
Extraction of information on protein-protein interaction from biomedical literatures using an SVM
Article number:
1673-4785(2008)04-0361-09
Author(s):
YANG Zhi-hao1, HONG Li2, LIN Hong-fei1, LI Yan-peng1
1. College of Electronic and Information Engineering, Dalian University of Technology, Dalian 116024, China;
2. Department of Mathematics and Computer, Chaoyang Teachers College, Chaoyang 122000, China
Keywords:
interaction extraction; link grammar; support vector machine (SVM)
CLC number:
TP391
Document code:
A
Abstract:
Automatically extracting protein (gene) interaction information from biomedical literature is important for building protein knowledge networks, predicting protein relations and functions, and developing new drugs. This paper presents a method for extracting protein-protein interactions from biomedical literature using a support vector machine (SVM). Besides word features, keyword features, entity distance features and link path features, the method exploits the relatively high precision of link grammar parsing by adding the result of a link-grammar-based extraction step as a further feature. Experimental results show that, compared with other extraction systems evaluated on the same test corpus, the method achieves clearly higher recall and a higher F-score.
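The abstract describes the classifier only at the level of feature types. The sketch below illustrates the general shape of such a pipeline under stated assumptions, not the authors' implementation: each candidate protein pair in a sentence becomes a sparse binary feature vector (bag of words between the two mentions, interaction-keyword indicators, a bucketed entity distance), and a linear SVM scores it. The keyword list, feature names, distance buckets and toy sentences are hypothetical; the link path and link grammar features are omitted because they need a link grammar parser; and the SVM is trained with a minimal Pegasos-style stochastic subgradient solver in plain Python rather than whatever SVM package the authors used. Precision, recall and F-score use their standard definitions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]

# Hypothetical interaction keyword list, for illustration only.
INTERACTION_KEYWORDS = {
    "bind", "binds", "interact", "interacts", "activate", "activates",
    "inhibit", "inhibits", "phosphorylate", "phosphorylates",
}


def extract_features(tokens: Sequence[str], i: int, j: int) -> Features:
    """Sparse binary features for the candidate protein pair at positions i < j."""
    between = [t.lower() for t in tokens[i + 1:j]]
    feats: Features = {"bias": 1.0}
    # Word features: bag of words between the two protein mentions.
    for w in between:
        feats["w=" + w] = 1.0
    # Keyword features: interaction keywords occurring between the mentions.
    keywords = [w for w in between if w in INTERACTION_KEYWORDS]
    if keywords:
        feats["has_keyword"] = 1.0
    for k in keywords:
        feats["kw=" + k] = 1.0
    # Entity distance feature: number of intervening tokens, bucketed (hypothetical buckets).
    d = len(between)
    feats["dist<=3" if d <= 3 else ("dist<=8" if d <= 8 else "dist>8")] = 1.0
    return feats


def score(w: Features, x: Features) -> float:
    """Linear decision value w . x over sparse vectors."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[Features, int]], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Features:
    """Pegasos-style SGD on lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), y in {+1, -1}."""
    rng = random.Random(seed)
    w: Features = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            t += 1
            x, y = data[idx]
            eta = 1.0 / (lam * t)              # step size 1 / (lam * t)
            violated = y * score(w, x) < 1.0   # margin test uses w before the update
            for k in w:                        # regularization: shrink every weight
                w[k] *= 1.0 - eta * lam
            if violated:                       # hinge-loss subgradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: Features, x: Features) -> int:
    return 1 if score(w, x) > 0 else -1


def precision_recall_f(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score, treating +1 as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g != 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Made-up toy sentences: (tokens, index of protein 1, index of protein 2, label).
    examples = [
        ("PROTA binds PROTB".split(), 0, 2, 1),
        ("PROTA interacts with PROTB".split(), 0, 3, 1),
        ("PROTA and PROTB were measured".split(), 0, 2, -1),
        ("PROTA was purified , and later PROTB".split(), 0, 6, -1),
    ]
    data = [(extract_features(toks, i, j), y) for toks, i, j, y in examples]
    w = train_linear_svm(data)
    preds = [predict(w, x) for x, _ in data]
    print("P/R/F on the toy training set:", precision_recall_f([y for _, y in data], preds))
```

The hand-written solver only keeps the example dependency-free; in practice a dedicated SVM library (for example LIBSVM, or scikit-learn's `LinearSVC`) would take the same feature vectors and labels in {+1, -1}.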

Notes:
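Received: 2008-05-07.
Foundation items: National Natural Science Foundation of China (60373095, 60673039); National High Technology Research and Development Program of China ("863" Program) (2006AA01Z151).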
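About the authors:
YANG Zhi-hao, male, born in 1973, is a lecturer. His main research interests are text mining and Chinese information processing. He has published more than 20 academic papers.
HONG Li, female, born in 1962, is an associate professor. Her main research interest is intelligent information processing.
LIN Hong-fei, male, born in 1962, is a professor and doctoral supervisor. His main research interests include search engines, text mining, affective computing, Chinese information processing and business intelligence. He has led two projects funded by the National Natural Science Foundation of China and one National 863 Program project, and has published more than 100 academic papers.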
Corresponding author: YANG Zhi-hao. E-mail: Yangzh@dlut.edu.cn.
Last update: 2009-05-18