[1] ZHANG Li, MA Yue, WU Dongyang. Estimation of transcription variant expression level based on multi-condition multi-sample RNA-Seq data[J]. CAAI Transactions on Intelligent Systems, 2021, 16(6): 1126-1135. [doi:10.11992/tis.202101028]
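CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
16
Issue:
No. 6, 2021
Pages:
1126-1135
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date: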
2021-11-05
- Title:
-
Estimation of transcription variant expression level based on multi-condition multi-sample RNA-Seq data
- Author(s):
-
ZHANG Li1, MA Yue2, WU Dongyang1
-
1. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210016, China;
2. College of Integrated Chinese and Western Medicine, Jiangsu Health Vocational College, Nanjing 210018, China
-
- Keywords:
-
RNA-Seq; multi-condition; multi-sample; isoform; expression estimation; sparsity; read bias; data noise
- Classification number (CLC):
-
TP391
- DOI:
-
10.11992/tis.202101028
- Abstract:
-
When processing multi-condition multi-sample RNA-Seq data, existing methods ignore the high similarity of read distributions across samples. This paper proposes MCMS-Seq, a method for estimating the expression levels of splice isoforms (transcription variants) from multi-condition multi-sample RNA-Seq data. The method builds a joint bias estimation model that extracts the similarity of read distributions between samples while accounting for the influence of both global and local biases on the read distribution. In addition, two regularization terms, an $L_2/L_1$ group sparsity constraint and an $L_1$ sparsity constraint, are added to reflect the sparsity between genes and isoforms and to eliminate the influence of technical errors and data noise. Validation on multiple real datasets shows that MCMS-Seq obtains more accurate isoform expression levels and also provides more meaningful biological interpretations.
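The abstract names two regularization terms but does not give the objective function. As a rough illustration only, the sketch below shows the generic textbook form of such a combined penalty (an $L_2/L_1$ group term plus an $L_1$ term over non-overlapping groups) and its standard proximal operator. It is not the MCMS-Seq model itself. The function names, the grouping of isoforms by gene, and the weights `lam_group` and `lam_l1` are all assumptions made for this example.

```python
import math


def sparse_group_penalty(theta, groups, lam_group, lam_l1):
    """Generic penalty: lam_group * sum_g ||theta_g||_2 + lam_l1 * ||theta||_1.

    theta  : list of isoform expression values (hypothetical layout)
    groups : list of index lists, one per gene (assumed non-overlapping)
    """
    group_term = sum(math.sqrt(sum(theta[i] ** 2 for i in idx)) for idx in groups)
    l1_term = sum(abs(v) for v in theta)
    return lam_group * group_term + lam_l1 * l1_term


def prox_sparse_group(theta, groups, lam_group, lam_l1):
    """Proximal operator of the penalty above (unit step size)."""
    out = list(theta)
    for idx in groups:
        # Element-wise soft-thresholding accounts for the L1 term.
        soft = [math.copysign(max(abs(theta[i]) - lam_l1, 0.0), theta[i]) for i in idx]
        norm = math.sqrt(sum(s * s for s in soft))
        # Group shrinkage accounts for the L2/L1 term; a group whose
        # thresholded norm is at most lam_group is set to zero entirely.
        scale = max(1.0 - lam_group / norm, 0.0) if norm > 0 else 0.0
        for i, s in zip(idx, soft):
            out[i] = scale * s
    return out


if __name__ == "__main__":
    # Two hypothetical genes with two isoforms each.
    theta = [3.0, 4.0, 0.1, 0.0]
    groups = [[0, 1], [2, 3]]
    print(sparse_group_penalty(theta, groups, lam_group=1.0, lam_l1=1.0))
    print(prox_sparse_group(theta, groups, lam_group=1.0, lam_l1=0.5))
```

In this toy input the second gene carries only a small value, so the proximal step removes it as a whole group, while the first gene is shrunk but kept. This is the kind of behavior the two sparsity terms are meant to encourage.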
Last Update:
2021-12-25