[1] CHANG Zheng, MENG Jun, SHI Yunsheng, et al. LncRNA recognition by fusing multiple features and its function prediction[J]. CAAI Transactions on Intelligent Systems, 2018, 13(6): 928-934. [doi:10.11992/tis.201806008]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
13
Issue:
No. 6, 2018
Pages:
928-934
Section:
Academic Papers: Machine Learning
Publication date:
2018-10-25
- Title:
-
LncRNA recognition by fusing multiple features and its function prediction
- Author(s):
-
CHANG Zheng, MENG Jun, SHI Yunsheng, MO Fengran
-
School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
- Keywords:
-
lncRNA; identification; feature extraction; multiple features fusion; machine learning; interrelationship; network construction; function prediction
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.201806008
- Abstract:
-
To address the limitations of traditional plant lncRNA identification based on a single feature, this paper proposes a method that fuses multiple features of RNA sequences: the open reading frame, secondary structure, and k-mers. Three classical classification models (Gaussian naive Bayes, support vector machine, and gradient boosting decision tree) are trained, and their classification results are integrated. Performance was evaluated with cross-validation; the method outperformed the popular prediction tools CPAT, CNCI, and PLEK overall, reaching an overall accuracy of 89% on the Arabidopsis thaliana dataset. In addition, based on competing endogenous RNA rules and RNA structure information, targets were predicted and filtered for lncRNA-microRNA and microRNA-mRNA pairs, the predictions were integrated into an interaction network, and the functions of lncRNAs in the network modules were predicted. Gene Ontology term analysis was then used to predict the biological regulatory processes in which mRNA-associated lncRNAs may participate and to infer their corresponding functions.
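The feature fusion and result integration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of k = 3, the forward-strand-only ORF scan, and majority voting as the integration rule are all assumptions made for illustration. The abstract does not say how the three models' outputs are combined, and secondary-structure features are omitted here because they would need an external folding program.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k=3):
    """Relative frequencies of all 4**k k-mers over ACGT, in a fixed order."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest ATG..stop ORF.

    Assumption: only the three forward-strand reading frames are scanned.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def feature_vector(seq, k=3):
    """Fuse ORF features and k-mer frequencies into one flat vector."""
    orf = longest_orf_length(seq)
    coverage = orf / len(seq) if seq else 0.0
    return [orf, coverage] + kmer_frequencies(seq, k)


def majority_vote(predictions):
    """Combine per-model label lists (one list per model) by majority vote.

    Assumption: hard voting; the paper's actual integration rule may differ.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

In use, `feature_vector` would be applied to every transcript to build the training matrix, each of the three classifiers would be fitted on that matrix, and `majority_vote` would merge their predicted labels. With an odd number of binary classifiers, the vote never ties.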
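Memo
Received: 2018-06-04.
Funding: National Natural Science Foundation of China (61472061); Dalian University of Technology Graduate Education Reform Fund (Jg2017015); Dalian University of Technology Undergraduate Innovation Training Program (2018101410201011019).
Author biographies: CHANG Zheng, male, born in 1995, master's student; his main research interests are machine learning, data mining, and bioinformatics. MENG Jun, female, born in 1964, professor, doctoral supervisor, PhD; her main research interests are machine learning, data mining, and big data processing. She has led or participated in projects funded by the National Natural Science Foundation of China, National Major Special Projects, Ministry of Education special programs, and the provincial natural science foundation, and has published more than 70 papers in international SCI-indexed journals and domestic core journals. SHI Yunsheng, male, born in 1994, master's student; his main research interests are machine learning, data mining, and bioinformatics.
Corresponding author: MENG Jun. E-mail: mengjun@dlut.edu.cn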
Last Update:
2018-12-25