CHEN Xiao-feng, WANG Shi-tong, CAO Su-qun. Gene function analysis of semi-supervised multi-label learning[J]. CAAI Transactions on Intelligent Systems, 2008, 3(01): 83-90.





Gene function analysis of semisupervised multilabel learning
CHEN Xiao-feng1 WANG Shi-tong1 CAO Su-qun12
1.School of Information Technology, Jiangnan University, Wuxi 214122 , Ch ina;
 2.Department of Mechanical Engineering, Huaiyin Institute of Technology, H uai’an 223001,China
semisupervised multilabel selftraining support vector machine
Conventional machine learning mainly addresses single-label learning, in which every sample has exactly one label. In bioinformatics, however, a gene usually has at least one function and may have several, so it may need more than one label. Multi-label learning is therefore more effective than conventional learning approaches for identifying the functions of biologically related gene groups. Current research focuses mainly on supervised multi-label learning; how to learn effectively from both labeled and unlabeled gene expression data in a semi-supervised multi-label setting remains an open problem. In this paper, a semi-supervised multi-label learning algorithm named SML_SVM is presented for the functional analysis of genes with at least one function. First, SML_SVM transforms the semi-supervised multi-label learning problem into corresponding semi-supervised single-label learning problems using the PT4 method. It then estimates the labels of unlabeled examples using the maximum a posteriori (MAP) principle in combination with the K-nearest neighbor method. Finally, it solves the resulting single-label learning problems using SVM. The distinctive characteristic of the proposed algorithm is its efficient integration of SVM-based single-label learning with the MAP and K-nearest neighbor methods. Experimental results on a real Yeast gene expression dataset and a Genbase protein dataset show that SML_SVM outperforms both the PT4-based MLSVM method and self-training MLSVM.
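The first two steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact estimator, so the MAP rule below assumes an ML-KNN-style count-based posterior with a smoothing constant `s`, and PT4 is taken to mean one binary problem per label. All function names and the toy data are hypothetical. The final SVM step is omitted; each binary dataset, extended with the estimated labels, would be passed to an SVM trainer.

```python
from math import dist

def pt4_transform(Y, n_labels):
    """PT4 as assumed here: one 0/1 target vector per label."""
    return [[1 if l in labels else 0 for labels in Y] for l in range(n_labels)]

def knn_indices(x, X, k, skip=None):
    """Indices of the k points in X closest to x, optionally skipping one index."""
    candidates = [i for i in range(len(X)) if i != skip]
    return sorted(candidates, key=lambda i: dist(x, X[i]))[:k]

def map_knn_label(x, X_lab, y_bin, k=3, s=1.0):
    """MAP decision for one label from the number of positive neighbours.

    Assumed ML-KNN-style rule: compare prior * likelihood for "has label"
    against "does not have label", with Laplace smoothing s.
    """
    m = len(X_lab)
    prior1 = (s + sum(y_bin)) / (2 * s + m)
    prior0 = 1.0 - prior1
    # c1[j] / c0[j]: labeled samples with / without the label that have
    # exactly j positive samples among their k nearest neighbours.
    c1 = [0] * (k + 1)
    c0 = [0] * (k + 1)
    for i in range(m):
        j = sum(y_bin[n] for n in knn_indices(X_lab[i], X_lab, k, skip=i))
        if y_bin[i]:
            c1[j] += 1
        else:
            c0[j] += 1
    j = sum(y_bin[n] for n in knn_indices(x, X_lab, k))
    like1 = (s + c1[j]) / (s * (k + 1) + sum(c1))
    like0 = (s + c0[j]) / (s * (k + 1) + sum(c0))
    return 1 if prior1 * like1 > prior0 * like0 else 0

def estimate_labels(x, X_lab, Y_lab, n_labels, k=3):
    """Estimated label set of an unlabeled sample, one MAP decision per label."""
    targets = pt4_transform(Y_lab, n_labels)
    return {l for l in range(n_labels) if map_knn_label(x, X_lab, targets[l], k)}

# Toy data (made up): two clusters of "genes" with function sets {0} and {1, 2}.
X_lab = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
Y_lab = [{0}, {0}, {0}, {0}, {1, 2}, {1, 2}, {1, 2}, {1, 2}]

print(sorted(estimate_labels((0.5, 0.5), X_lab, Y_lab, 3)))  # [0]
print(sorted(estimate_labels((5.5, 5.5), X_lab, Y_lab, 3)))  # [1, 2]
```

Treating each label independently keeps every sub-problem a plain binary task, which is what lets a standard SVM be applied afterwards.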






Supported by the National Natural Science Foundation of China (60773206/F020106, 60704047/F030304) and the National Defense Applied Basic Research Foundation (A1420461266).
Last Update: 2009-05-10