[1] CHEN Xiao-feng, WANG Shi-tong, CAO Su-qun. Gene function analysis of semisupervised multilabel learning[J]. CAAI Transactions on Intelligent Systems, 2008, 3(1): 83-90.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
3
Issue:
2008, No. 1
Pages:
83-90
Section:
Academic Papers: Machine Learning
Publication date:
2008-02-25
- Title:
-
Gene function analysis of semisupervised multilabel learning
- Article ID:
-
1673-4785(2008)01-0083-08
- Author(s):
-
CHEN Xiao-feng (陈晓峰)1, WANG Shi-tong (王士同)1, CAO Su-qun (曹苏群)1,2
-
1. School of Information Technology, Jiangnan University, Wuxi 214122, China;
2. Department of Mechanical Engineering, Huaiyin Institute of Technology, Huai'an 223001, China
-
- Keywords:
-
semisupervised; multilabel; self-training; support vector machine
- CLC number:
-
TP181
- Document code:
-
A
- Abstract:
-
Conventional machine learning mainly addresses single-label learning, in which every sample has exactly one label. In bioinformatics, however, a gene typically has one or more functions, so it may need more than one label. Multilabel learning therefore identifies the functions of biologically related gene groups more effectively than conventional learning approaches. Current research mainly focuses on supervised multilabel learning; semisupervised multilabel learning, which learns from both labeled and unlabeled gene expression data, remains an unsolved problem. In this paper, a semisupervised multilabel learning algorithm named SML_SVM is presented as an effective multilabel learner for gene function analysis. First, SML_SVM transforms the semisupervised multilabel learning problem into a corresponding semisupervised single-label learning problem by the PT4 method. Then it labels the unlabeled examples using the maximum a posteriori (MAP) principle in combination with the K-nearest neighbor method. Finally, it solves the resulting single-label learning problem using SVM. The distinctive characteristic of the proposed algorithm is its efficient integration of SVM-based single-label learning with the MAP and K-nearest neighbor methods. Experimental results on a real Yeast gene expression dataset and a Genbase protein dataset show that the proposed SML_SVM algorithm outperforms the PT4-based MLSVM method and self-training MLSVM.
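The first two steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' SML_SVM implementation. It assumes that PT4 denotes the binary relevance transformation (one 0/1 target vector per label) and that the MAP step uses smoothed neighbor-count statistics in the style of ML-kNN; the function names, the smoothing constant `s`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def pt4_transform(label_sets, n_labels):
    """Assumed PT4 (binary relevance): one 0/1 target list per label."""
    return [[1 if lab in labels else 0 for labels in label_sets]
            for lab in range(n_labels)]


def k_nearest(x, points, k):
    """Indices of the k points closest to x (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))
    return order[:k]


def map_knn_label(x, X_lab, y_bin, k=2, s=1.0):
    """MAP estimate of one binary label from k-nearest-neighbor counts."""
    n = len(X_lab)
    # Smoothed prior P(label = 1) from the labeled examples.
    prior1 = (s + sum(y_bin)) / (2 * s + n)
    # counts[b][c]: number of labeled examples with label value b whose
    # k nearest neighbors (leave-one-out) contain c positive examples.
    counts = {0: Counter(), 1: Counter()}
    for i in range(n):
        rest_X = X_lab[:i] + X_lab[i + 1:]
        rest_y = y_bin[:i] + y_bin[i + 1:]
        c = sum(rest_y[j] for j in k_nearest(X_lab[i], rest_X, k))
        counts[y_bin[i]][c] += 1
    # Number of positive neighbors of the unlabeled example.
    c = sum(y_bin[j] for j in k_nearest(x, X_lab, k))

    def likelihood(b):
        # Smoothed P(c positive neighbors | label = b).
        return (s + counts[b][c]) / (s * (k + 1) + sum(counts[b].values()))

    post1 = prior1 * likelihood(1)
    post0 = (1 - prior1) * likelihood(0)
    return 1 if post1 > post0 else 0


def pseudo_label(X_lab, label_sets, X_unl, n_labels, k=2):
    """Estimate a label set for every unlabeled example, label by label."""
    targets = pt4_transform(label_sets, n_labels)
    return [{lab for lab in range(n_labels)
             if map_knn_label(x, X_lab, targets[lab], k) == 1}
            for x in X_unl]


# Toy data (hypothetical): two clusters; label 0 everywhere, label 1 only
# in the cluster near (5, 5).
X_lab = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
label_sets = [{0}, {0}, {0}, {0, 1}, {0, 1}, {0, 1}]
X_unl = [[0.2, 0.1], [5.5, 5.2]]
print(pseudo_label(X_lab, label_sets, X_unl, n_labels=2))  # [{0}, {0, 1}]
```

In the final step described in the abstract, each per-label binary problem, built from the labeled and pseudo-labeled examples together, would be handed to an SVM trainer (for example `sklearn.svm.SVC`); that step is left out here to keep the sketch free of third-party dependencies.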
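Memo
Received: 2007-04-13.
Funding:
National "863" Program of China (2006AA10Z313);
National Natural Science Foundation of China (60773206/F020106, 60704047/F030304); National Defense Applied Basic Research Fund (A1420461266);
Ministry of Education Program for Cross-Century Excellent Talents (NCET040496);
Key Scientific Research Fund of the Ministry of Education (105087).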
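About the authors:
CHEN Xiao-feng, male, born in 1977, Ph.D. candidate. His main research interests are machine learning and pattern recognition.
WANG Shi-tong, male, born in 1964, professor and doctoral supervisor. His main research interests include fuzzy artificial intelligence, pattern recognition, image processing, and bioinformatics. He has studied or worked abroad in the UK, Japan, and Hong Kong on more than ten occasions and has published dozens of papers in major domestic and international journals.
CAO Su-qun, male, born in 1976, Ph.D. candidate. His main research interests include pattern recognition, image processing, and software engineering.
Corresponding author: WANG Shi-tong. wxwangst@yahoo.com.cn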
Last Update:
2009-05-10