LncRNA recognition by fusing multiple features and its function prediction
CHANG Zheng, MENG Jun, SHI Yunsheng, MO Fengran
School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China
Keywords: lncRNA; identification; feature extraction; multiple features fusion; machine learning; interrelationship; network construction; function prediction
Traditional plant lncRNA identification methods based on a single feature have clear limitations. This paper therefore proposes a method that integrates the open reading frame (ORF), secondary structure, and k-mer features of RNA sequences. Three classical classification models (Gaussian naive Bayes, support vector machine, and gradient boosting decision tree) are trained on the fused features, and their classification results are integrated. The method was evaluated by cross-validation and performed well. Its accuracy reached 89% when tested on the Arabidopsis thaliana dataset, and on the same dataset it outperformed the widely used prediction tools CPAT, CNCI, and PLEK. In addition, based on competing endogenous RNA (ceRNA) rules and RNA structure information, targets were predicted for lncRNA-microRNA and microRNA-mRNA pairs and then filtered by rules. Existing tools were used to construct RNA interaction regulatory networks from the retained pairs, and the regulatory relationships within network modules were analyzed to predict the functions of the lncRNAs in each module. Finally, Gene Ontology (GO) term analysis was used to infer the biological regulatory functions in which these lncRNAs may be involved.
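To make the feature-fusion step concrete, the following is a minimal Python sketch of how ORF, secondary-structure, and k-mer features could be concatenated into one vector. It is not the authors' implementation. It assumes k = 1 to 3, takes the ORF to be the longest AUG-to-stop reading frame on the forward strand, and expects the secondary structure as a dot-bracket string plus minimum free energy (MFE) from an external folding tool such as RNAfold. All function names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}


def normalize(seq):
    """Upper-case the sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def kmer_frequencies(seq, k_values=(1, 2, 3)):
    """Normalized k-mer frequencies over the ACGU alphabet for each k (4 + 16 + 64 values)."""
    seq = normalize(seq)
    features = []
    for k in k_values:
        kmers = ["".join(p) for p in product("ACGU", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:  # windows containing N or other symbols are skipped
                counts[word] += 1
        total = sum(counts.values())
        features.extend(counts[w] / total if total else 0.0 for w in kmers)
    return features


def longest_orf(seq):
    """Length in nt of the longest AUG...stop ORF in the three forward frames."""
    seq = normalize(seq)
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def structure_features(dot_bracket, mfe):
    """Paired-base fraction and length-normalized MFE from an external folding result."""
    n = len(dot_bracket)
    paired = sum(ch in "()" for ch in dot_bracket)
    return [paired / n, mfe / n]


def fuse_features(seq, dot_bracket, mfe):
    """Concatenate ORF, secondary-structure and k-mer features into one vector."""
    orf_len = longest_orf(seq)
    return ([orf_len, orf_len / len(seq)]
            + structure_features(dot_bracket, mfe)
            + kmer_frequencies(seq))


vec = fuse_features("AUGGCCUAAGG", "((.....))..", -1.2)
print(len(vec), vec[0])
```

For this toy input the sketch prints `88 9`. The vector has 2 ORF values, 2 structure values, and 84 k-mer frequencies, and the longest ORF (AUG GCC UAA) is 9 nt.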

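The abstract does not say how the outputs of the three classifiers are integrated. The sketch below assumes simple majority (hard) voting with scikit-learn's VotingClassifier and estimates accuracy by stratified 5-fold cross-validation. Hyperparameters are library defaults. The synthetic data from make_classification is only a placeholder for the fused feature matrix, so the printed scores say nothing about the results reported for Arabidopsis thaliana.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_ensemble():
    """GNB, SVM and GBDT combined by majority vote (the voting rule is an assumption)."""
    return VotingClassifier(
        estimators=[
            ("gnb", GaussianNB()),
            # SVMs are scale-sensitive, so features are standardized first.
            ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
            ("gbdt", GradientBoostingClassifier(random_state=0)),
        ],
        voting="hard",
    )


def cross_validated_accuracy(X, y, folds=5):
    """Per-fold accuracy of the ensemble under stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    return cross_val_score(build_ensemble(), X, y, cv=cv, scoring="accuracy")


# Synthetic placeholder for an 88-dimensional fused feature matrix
# (label 1 = lncRNA, 0 = protein-coding); NOT the paper's data.
X, y = make_classification(n_samples=300, n_features=88, n_informative=10, random_state=0)
scores = cross_validated_accuracy(X, y)
print("per-fold accuracy:", np.round(scores, 3), "mean:", round(scores.mean(), 3))
```

With three voters and two classes, hard voting cannot tie, which makes majority voting a convenient default. Soft voting would instead average predicted probabilities and would require SVC(probability=True).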

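The ceRNA filtering and network-construction step could look roughly like the sketch below. It assumes a filter commonly used for ceRNA pairs, which may differ from the paper's own rules. Under this filter, an lncRNA and an mRNA are kept if they share at least two predicted miRNAs and the overlap is significant under a hypergeometric test (p < 0.05). Connected components of the resulting lncRNA-miRNA-mRNA graph stand in for the modules that dedicated network tools would detect. The thresholds, function names, and toy data are hypothetical.

```python
from collections import defaultdict, deque
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw of n items from N, of which K are 'successes'."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def cerna_pairs(lnc_targets, mrna_targets, n_mirna, min_shared=2, alpha=0.05):
    """Keep lncRNA-mRNA pairs sharing >= min_shared miRNAs with hypergeometric p < alpha.

    lnc_targets / mrna_targets map an RNA id to the set of miRNAs predicted to target it.
    """
    kept = []
    for lnc, lnc_mirs in lnc_targets.items():
        for mrna, mrna_mirs in mrna_targets.items():
            shared = lnc_mirs & mrna_mirs
            if len(shared) < min_shared:
                continue
            p = hypergeom_sf(len(shared), n_mirna, len(lnc_mirs), len(mrna_mirs))
            if p < alpha:
                kept.append((lnc, mrna, shared, p))
    return kept


def network_modules(pairs):
    """Build the lncRNA-miRNA-mRNA graph from kept pairs and return its connected components."""
    graph = defaultdict(set)
    for lnc, mrna, shared, _ in pairs:
        for mir in shared:
            graph[lnc].add(mir)
            graph[mir].add(lnc)
            graph[mir].add(mrna)
            graph[mrna].add(mir)
    seen, modules = set(), []
    for node in list(graph):
        if node in seen:
            continue
        component, queue = set(), deque([node])
        seen.add(node)
        while queue:  # breadth-first search over one component
            cur = queue.popleft()
            component.add(cur)
            for nb in graph[cur]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        modules.append(component)
    return modules


lnc_targets = {"lnc1": {"m1", "m2", "m3"}}
mrna_targets = {"gA": {"m1", "m2", "m3", "m4"}, "gB": {"m5"}}
pairs = cerna_pairs(lnc_targets, mrna_targets, n_mirna=20)
modules = network_modules(pairs)
print([(l, m, round(p, 4)) for l, m, _, p in pairs], modules)
```

In the toy data, lnc1 and gA share three of 20 miRNAs, giving p = 17/4845 ≈ 0.0035. The pair is therefore kept, and one module containing lnc1, m1, m2, m3, and gA is returned. gB shares no miRNA with lnc1 and is dropped.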

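Function prediction for a module can then be sketched as guilt by association: GO terms over-represented among the module's mRNAs are transferred to the lncRNAs in the same module. The code below assumes a one-sided hypergeometric test against a background gene set, followed by Benjamini-Hochberg correction. The annotation dictionary and the placeholder terms GO:A and GO:B are hypothetical. A real analysis would load GO annotations for the organism of interest.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw of n items from N, of which K are 'successes'."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def go_enrichment(module_genes, go_annotations, background):
    """One-sided hypergeometric test per GO term plus Benjamini-Hochberg q-values.

    go_annotations maps a GO term id to the set of genes annotated with it.
    Returns (term, overlap, p, q) tuples sorted by p.
    """
    module = set(module_genes) & background
    N, n = len(background), len(module)
    results = []
    for term, genes in go_annotations.items():
        annotated = genes & background
        k = len(module & annotated)
        if k == 0:
            continue  # terms with no module genes cannot be enriched
        results.append((term, k, hypergeom_sf(k, N, len(annotated), n)))
    results.sort(key=lambda r: r[2])
    m = len(results)
    qvalues, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # step-up: q_i = min over j >= i of p_j * m / j
        running_min = min(running_min, results[rank - 1][2] * m / rank)
        qvalues[rank - 1] = running_min
    return [(t, k, p, q) for (t, k, p), q in zip(results, qvalues)]


background = {f"g{i}" for i in range(1, 11)}
module_mrnas = {"g1", "g2", "g3"}  # protein-coding members of one network module
go_annotations = {"GO:A": {"g1", "g2", "g3", "g4"}, "GO:B": {"g5", "g6"}}
enriched = go_enrichment(module_mrnas, go_annotations, background)
print(enriched)
```

For the toy input, GO:A covers all three module genes (p = 4/120 ≈ 0.033). GO:B has no overlap with the module and is skipped. Under the guilt-by-association assumption, any lncRNA in this module would then be tentatively assigned GO:A.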