[1] ZHANG Li, MA Yue, WU Dongyang. Estimation of transcription variant expression level based on multi-condition multi-sample RNA-Seq data[J]. CAAI Transactions on Intelligent Systems, 2021, 16(6): 1126-1135. [doi:10.11992/tis.202101028]

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
16
Issue:
No. 6, 2021
Pages:
1126-1135
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2021-11-05

Article Info

Title:
Estimation of transcription variant expression level based on multi-condition multi-sample RNA-Seq data
Author(s):
ZHANG Li¹, MA Yue², WU Dongyang¹
1. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210016, China;
2. College of Integrated Chinese and Western Medicine, Jiangsu Health Vocational College, Nanjing 210018, China
Keywords:
RNA-Seq; multi-condition; multi-sample; isoform; expression estimation; sparsity; read bias; data noise
CLC number:
TP391
DOI:
10.11992/tis.202101028
Abstract:
When analyzing multi-condition multi-sample (MCMS) RNA-Seq data, existing methods for estimating transcription variant (isoform) expression levels ignore the fact that read distributions are highly similar across samples. This study proposes MCMS-Seq, a method for estimating transcription variant expression levels from multi-condition multi-sample RNA-Seq data. The method builds a joint bias estimation model that extracts the similarity of read distributions across samples while accounting for the influence of both global and local biases on the read distribution. In addition, two regularization terms, an $L_2/L_1$ group sparsity constraint and an $L_1$ sparsity constraint, are added to reflect the sparsity between genes and transcription variants and to eliminate the influence of technical errors and data noise. Validation on several real datasets shows that MCMS-Seq yields more accurate estimates of transcription variant expression levels and supports more meaningful biological interpretation.
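The abstract names two regularization terms but does not give the objective function on this page. As a minimal sketch of what such penalties commonly look like, the snippet below computes a generic $L_1$ norm and a generic $L_2/L_1$ group norm (the sum of per-group Euclidean norms), together with their standard proximal operators. The function names, the grouping of isoforms by gene, and the threshold `t` are illustrative assumptions, not the MCMS-Seq implementation.

```python
import numpy as np

def l1_norm(x):
    """L1 penalty: sum of absolute values (promotes element-wise sparsity)."""
    return float(np.sum(np.abs(x)))

def group_l2_l1_norm(x, groups):
    """L2/L1 penalty: sum over groups of each group's Euclidean norm.

    `groups` is a list of index lists; here each group is assumed to hold
    the isoforms of one gene (an illustrative assumption).
    """
    return float(sum(np.linalg.norm(x[idx]) for idx in groups))

def prox_l1(x, t):
    """Soft-thresholding: proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_group_l2_l1(x, groups, t):
    """Block soft-thresholding: proximal operator of t * sum_g ||x_g||_2.

    Assumes the groups do not overlap. A group whose norm is at most t
    is set to zero as a whole; otherwise it is shrunk toward zero.
    """
    out = np.zeros_like(x, dtype=float)
    for idx in groups:
        block = x[idx]
        norm = np.linalg.norm(block)
        if norm > t:
            out[idx] = (1.0 - t / norm) * block
    return out

# Hypothetical expression vector: two genes with two isoforms each.
x = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
shrunk = prox_group_l2_l1(x, groups, t=1.0)
```

In this toy vector the second group has a small norm, so block soft-thresholding removes it entirely, while element-wise soft-thresholding acts on each entry separately. This illustrates why a group penalty can switch off whole genes and an $L_1$ penalty can switch off individual isoforms.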

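Memo

Received: 2021-01-18.
Funding: National Natural Science Foundation of China (61802193); Natural Science Foundation of Jiangsu Province (BK20170934); Nanjing Forestry University Youth Science and Technology Innovation Fund (CX2017031); Shanwei Provincial Special Fund for Science and Technology Innovation Strategy (2018D2002).
About the authors: ZHANG Li, lecturer, Ph.D., research interests include machine learning and bioinformatics; MA Yue, teaching assistant, M.S., research interests include molecular biology and neurological diseases; WU Dongyang, lecturer, Ph.D., research interests include data mining and bioinformatics.
Corresponding author: ZHANG Li. E-mail: lizhang@njfu.edu.cn
Last Update: 2021-12-25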