[1] CHANG Zheng, MENG Jun, SHI Yunsheng, et al. LncRNA recognition by fusing multiple features and its function prediction[J]. CAAI Transactions on Intelligent Systems, 2018, 13(06): 928-934. [doi:10.11992/tis.201806008]

LncRNA recognition by fusing multiple features and its function prediction

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
13
Issue:
2018, No. 06
Pages:
928-934
Publication date:
2018-10-25

Article Info

Title:
LncRNA recognition by fusing multiple features and its function prediction
Author(s):
CHANG Zheng, MENG Jun, SHI Yunsheng, MO Fengran
School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
Keywords:
lncRNA identification; feature extraction; multiple features fusion; machine learning; interrelationship; network construction; function prediction
CLC number:
TP391
DOI:
10.11992/tis.201806008
Abstract:
To address the limitations of traditional plant lncRNA identification methods that rely on a single feature, this paper proposes a method that fuses multiple features of RNA sequences: the open reading frame, secondary structure, and k-mers. Three classical classification models, Gaussian naive Bayes, support vector machine, and gradient boosting decision tree, are trained on the fused features, and their classification results are integrated. Performance was evaluated using cross-validation. On the Arabidopsis thaliana dataset, the overall accuracy reached 89%, and the overall performance exceeded that of the popular CPAT, CNCI, and PLEK prediction tools. In addition, based on the endogenous competition rules and RNA structure information, targets were predicted and filtered for lncRNA-microRNA and microRNA-mRNA pairs, and the predicted data were integrated to build an interaction network, which was used to predict the functions of the lncRNAs in the network modules. Through Gene Ontology (GO) term analysis, the biological regulatory processes in which mRNA-associated lncRNAs may participate are predicted, and their corresponding functions are inferred.
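As a rough illustration of the feature-fusion idea in the abstract, the sketch below builds one feature vector per sequence from the longest open reading frame and k-mer frequencies, and combines three classifier outputs by majority vote. It is not the authors' implementation: the function names, the choice of k = 1, 2, 3, the forward-strand-only ORF scan, and the majority-vote rule are all assumptions made for illustration, and the secondary-structure feature is left out because it requires an external folding tool.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k):
    """Normalized counts of all 4**k k-mers over ACGT, in a fixed order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest ATG..stop ORF
    found in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def fused_features(seq, ks=(1, 2, 3)):
    """Concatenate ORF features and k-mer frequencies into one vector.
    (Assumed layout: [orf_len, orf_coverage, 1-mers, 2-mers, 3-mers].)"""
    orf_len = longest_orf_length(seq)
    features = [float(orf_len), orf_len / len(seq) if seq else 0.0]
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def majority_vote(labels):
    """Combine 0/1 labels from an odd number of classifiers."""
    return int(sum(labels) > len(labels) / 2)
```

A vector from `fused_features` could then be passed to any three classifiers (for example the Gaussian naive Bayes, SVM, and gradient boosting models named in the abstract), with `majority_vote` applied to their per-sequence labels. How the paper actually integrates the three results is not stated in the abstract, so the voting rule here is only one plausible choice.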



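Memo:
Received: 2018-06-04.
Funding: National Natural Science Foundation of China (61472061); Dalian University of Technology Graduate Education Reform Fund (Jg2017015); Dalian University of Technology Undergraduate Innovation Training Program (2018101410201011019).
About the authors: CHANG Zheng, male, born in 1995, master's student; main research interests are machine learning, data mining, and bioinformatics. MENG Jun, female, born in 1964, professor, doctoral supervisor, Ph.D.; main research interests are machine learning, data mining, and big data processing. She has led or participated in projects funded by the National Natural Science Foundation of China, National Major Special Projects, Ministry of Education special projects, and provincial natural science foundations, and has published more than 70 papers in SCI-indexed international journals and core Chinese journals. SHI Yunsheng, male, born in 1994, master's student; main research interests are machine learning, data mining, and bioinformatics.
Corresponding author: MENG Jun. E-mail: mengjun@dlut.edu.cn
Last Update: 2018-12-25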