[1] ZHAO Can, DUAN Qiong, HE Zengyou. Protein inference method based on probabilistic graphical model[J]. CAAI Transactions on Intelligent Systems, 2016, 11(3): 376-383. [doi:10.11992/tis.201603051]
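Editorial Office of CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
11
Issue:
2016, No. 3
Pages:
376-383
Section:
Academic Papers: Foundations of Brain Cognition
Publication date: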
2016-06-25
- Title:
-
Protein inference method based on probabilistic graphical model
- Author(s):
-
ZHAO Can (赵璨), DUAN Qiong (段琼), HE Zengyou (何增有)
-
School of Software, Dalian University of Technology, Dalian 116620, Liaoning, China
-
- Keywords:
-
protein inference; peptide inference; shotgun proteomics; probabilistic graphical model
- CLC number:
-
TP393
- DOI:
-
10.11992/tis.201603051
- Abstract:
-
Proteomics is an emerging discipline that studies, on a large scale, all the proteins expressed in a cell and how they change. A central goal of proteomics is the fast and accurate identification of the proteins in a cell or tissue. Protein identification generally consists of two steps: peptide identification and protein inference. Peptide identification recovers peptide sequences from raw tandem mass spectrometry (MS/MS) data, while protein inference determines, from the identified peptides, which proteins are truly present in the sample. The inherent uncertainty of MS data and the complexity of the proteome make protein inference difficult. In this article, we propose PGMPi, a method based on a probabilistic graphical model that takes into account the influence of the tandem mass spectrometry data on the probability that a protein is present. The method casts protein inference as the problem of solving a probabilistic graphical model: the set of proteins actually present in the sample is inferred by finding the proteins' maximum a posteriori probabilities. PGMPi not only performs protein inference effectively but also introduces only one parameter, which improves the algorithm's stability. Experimental results show that the method outperforms existing state-of-the-art protein inference algorithms.
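To make the idea of "protein inference as maximum a posteriori estimation" concrete, the following is a minimal Python sketch on a toy peptide-protein bipartite graph. It is not the PGMPi model, whose structure and parameter are not given in the abstract. The noisy-OR likelihood, the function name `map_protein_set`, and the values of `prior`, `emit`, and `noise` are all assumptions chosen for illustration, and the exhaustive search is only practical for a handful of proteins.

```python
from itertools import product
from math import log


def map_protein_set(peptide_scores, peptide_to_proteins,
                    prior=0.3, emit=0.9, noise=0.01):
    """Brute-force MAP estimate of the present-protein set (toy model).

    peptide_scores: peptide -> probability that its identification is correct.
    peptide_to_proteins: peptide -> list of proteins that contain it.
    prior, emit, noise: hypothetical parameters of an assumed noisy-OR model.
    """
    proteins = sorted({p for ps in peptide_to_proteins.values() for p in ps})
    best_state, best_logp = None, float("-inf")

    # Enumerate every presence/absence assignment of the proteins.
    for bits in product([0, 1], repeat=len(proteins)):
        present = dict(zip(proteins, bits))
        # Independent Bernoulli prior on each protein.
        logp = sum(log(prior) if b else log(1 - prior) for b in bits)

        for pep, parents in peptide_to_proteins.items():
            # Noisy-OR: the peptide is absent only if there is no noise and
            # every present parent protein fails to produce it.
            p_absent = 1 - noise
            for prot in parents:
                if present[prot]:
                    p_absent *= 1 - emit
            p_present = 1 - p_absent
            # Treat the identification score as soft evidence and
            # marginalize over the hidden peptide state.
            s = peptide_scores[pep]
            logp += log(s * p_present + (1 - s) * p_absent)

        if logp > best_logp:
            best_state, best_logp = present, logp

    return {p for p, b in best_state.items() if b}, best_logp


if __name__ == "__main__":
    scores = {"pepA": 0.95, "pepB": 0.90, "pepC": 0.10}
    mapping = {"pepA": ["P1"], "pepB": ["P1", "P2"], "pepC": ["P3"]}
    print(map_protein_set(scores, mapping))
```

With these toy numbers the MAP set contains only P1. The shared peptide pepB is already explained by P1, so the prior penalizes adding P2, and the low-scoring pepC is not enough evidence for P3. This is the kind of shared-peptide ambiguity that makes protein inference hard.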
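Memo
Received: 2016-03-20; Revised: (not listed).
Funding: National Natural Science Foundation of China (61572094).
Author biographies: ZHAO Can, female, born in 1991, master's student; her research interests are bioinformatics, protein inference, and PPI network inference. DUAN Qiong, male, born in 1990, master's student; his research interests are bioinformatics and top-down protein inference. HE Zengyou, male, born in 1976, associate professor; his research interests are data mining and bioinformatics. His papers have appeared in leading journals and conferences in these fields, and he has published one academic monograph.
Corresponding author: HE Zengyou. E-mail: zyhe@dlut.edu.cn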