[1] CHEN Xiao-feng, WANG Shi-tong, CAO Su-qun. Gene function analysis of semi-supervised multi-label learning[J]. CAAI Transactions on Intelligent Systems, 2008, 3(01): 83-90.

Gene function analysis of semi-supervised multi-label learning

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 3
Issue:
No. 01, 2008
Pages:
83-90
Section:
Publication date:
2008-02-25

Article Info

Title:
Gene function analysis of semi-supervised multi-label learning
Article ID:
1673-4785(2008)01-0083-08
Author(s):
CHEN Xiao-feng1, WANG Shi-tong1, CAO Su-qun1,2
1. School of Information Technology, Jiangnan University, Wuxi 214122, China;
2. Department of Mechanical Engineering, Huaiyin Institute of Technology, Huai'an 223001, China
Keywords:
semi-supervised; multi-label; self-training; support vector machine
CLC number:
TP181
Document code:
A
Abstract:
Conventional machine learning mainly addresses single-label learning, in which every sample has exactly one label. In bioinformatics, however, a gene usually has at least one function and often more than one, so it needs more than one label. Multi-label learning is therefore more effective than conventional learning approaches for identifying the functions of related gene groups. Current research focuses mainly on supervised multi-label learning; how to learn a multi-label model from both labeled and unlabeled gene expression data remains an open problem. This paper presents a semi-supervised multi-label learning algorithm, named SML_SVM, for the functional analysis of genes that have at least one function. First, SML_SVM transforms the semi-supervised multi-label learning problem into corresponding semi-supervised single-label learning problems using the PT4 method. It then labels the unlabeled examples using the maximum a posteriori (MAP) principle in combination with the K-nearest neighbor method. Finally, it solves the resulting single-label learning problems using SVM. The distinctive characteristic of the proposed algorithm is its efficient integration of SVM-based single-label learning with the MAP and K-nearest neighbor methods. Experimental results on a real Yeast gene expression dataset and a Genbase protein dataset show that SML_SVM outperforms both the PT4-based MLSVM method and self-training MLSVM.
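
As a rough illustration of the first two steps described in the abstract, the following Python sketch applies a PT4-style transformation (taken here to mean one binary problem per label, often called binary relevance) and then estimates the labels of unlabeled samples with a K-nearest-neighbor MAP rule. The abstract does not specify the exact estimator, so the smoothed counting rule below follows the ML-kNN style and is an assumption. The function names (`pt4_transform`, `knn_map_label`, `label_unlabeled`), the smoothing constant `s`, and the toy data are all hypothetical. The final SVM step is not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def pt4_transform(label_sets, labels):
    """PT4-style transformation: one 0/1 target list per label."""
    return {lab: [int(lab in s) for s in label_sets] for lab in labels}


def neighbor_count(x, X, y, k, skip=None):
    """Number of positive samples among the k points of X nearest to x."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: dist(x, X[i]))
    return sum(y[i] for i in order[:k])


def knn_map_label(x, X, y, k=3, s=1.0):
    """MAP estimate (assumed ML-kNN-style rule) of one binary label for x."""
    n = len(X)
    prior = {1: (s + sum(y)) / (2 * s + n)}
    prior[0] = 1.0 - prior[1]
    # hist[b][c]: number of labeled samples with label b that have
    # exactly c positive samples among their own k nearest neighbors.
    hist = {0: [0] * (k + 1), 1: [0] * (k + 1)}
    for i in range(n):
        hist[y[i]][neighbor_count(X[i], X, y, k, skip=i)] += 1
    c = neighbor_count(x, X, y, k)
    # Smoothed likelihood of seeing c positive neighbors, times the prior.
    post = {b: prior[b] * (s + hist[b][c]) / (s * (k + 1) + sum(hist[b]))
            for b in (0, 1)}
    return 1 if post[1] > post[0] else 0


def label_unlabeled(X_lab, label_sets, X_unl, labels, k=3):
    """Assign a label list to every unlabeled sample, one label at a time."""
    targets = pt4_transform(label_sets, labels)
    return [[lab for lab in labels
             if knn_map_label(x, X_lab, targets[lab], k)]
            for x in X_unl]


# Toy data (hypothetical): two clusters; the second carries both labels.
X_lab = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
label_sets = [{"a"}] * 4 + [{"a", "b"}] * 4
X_unl = [(0.5, 0.5), (5.5, 5.5)]
estimated = label_unlabeled(X_lab, label_sets, X_unl, ["a", "b"])
print(estimated)  # [['a'], ['a', 'b']]
```

In a full pipeline along the lines of the abstract, the labeled samples together with these newly labeled samples would be passed, one binary problem per label, to an SVM trainer.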

Memo

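Received: 2007-04-13.
Funding:
National "863" Program of China (2006AA10Z313);
National Natural Science Foundation of China (60773206/F020106, 60704047/F030304); National Defense Applied Basic Research Foundation (A1420461266);
Ministry of Education Support Program for Cross-Century Excellent Talents (NCET040496);
Key Scientific Research Foundation of the Ministry of Education (105087).
About the authors:
CHEN Xiao-feng, male, born in 1977, Ph.D. candidate. His main research interests are machine learning and pattern recognition.
WANG Shi-tong, male, born in 1964, professor and doctoral supervisor. His main research interests include fuzzy artificial intelligence, pattern recognition, image processing, and bioinformatics. He has made more than ten research visits to the United Kingdom, Japan, and Hong Kong, and has published dozens of papers in major journals in China and abroad.
CAO Su-qun, male, born in 1976, Ph.D. candidate. His main research interests include pattern recognition, image processing, and software engineering.
Corresponding author: WANG Shi-tong. wxwangst@yahoo.com.cn
Last Update: 2009-05-10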