[1] CAO Hantong, CHEN Jing. Prediction of multitype protein interactions combining Doc2vec and GCN[J]. CAAI Transactions on Intelligent Systems, 2023, 18(6): 1165-1172. [doi:10.11992/tis.202212029]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
18
Issue:
No. 6, 2023
Pages:
1165-1172
Column:
Academic Papers: Machine Learning
Publication date:
2023-11-05
- Title:
-
Prediction of multitype protein interactions combining Doc2vec and GCN
- Author(s):
-
CAO Hantong (曹汉童)1, CHEN Jing (陈璟)1,2
-
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China;
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
-
- Keywords:
-
PPI network; graph neural network; protein function prediction; deep learning; biological significance; complex network; graph convolutional network (GCN); unsupervised learning; protein sequence
- CLC number:
-
TP391; Q811.4
- DOI:
-
10.11992/tis.202212029
- Abstract:
-
The study of multitype protein-protein interactions (PPIs) is fundamental to understanding biological processes and revealing disease mechanisms from a systems perspective. Existing multitype PPI prediction methods, such as GNN-PPI and PIPR, show a considerable decline in test accuracy when the data set is partitioned by breadth-first or depth-first search. This paper therefore proposes GDP (GCN Doc2vec PPI), a new multitype PPI prediction method based on the idea of Doc2vec and on graph convolutional networks (GCNs). The method does not rely on the physical or biological properties of proteins: it encodes each protein from its sequence information alone, then aggregates protein features over the network structure to form PPI representations, which are used for multitype prediction. Experimental results show that the method effectively improves multitype PPI prediction accuracy on real data sets of different scales, especially for PPIs between new proteins that do not appear in the training set.
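The pipeline outlined in the abstract (sequence-only encoding followed by graph-based feature aggregation) can be illustrated with a rough sketch. This is not the authors' implementation, and every detail below is an assumption made for illustration: the k-mer tokenization, the placeholder feature vectors standing in for trained Doc2vec embeddings, the single standard GCN propagation step, the element-wise product used to combine two proteins into a pair feature, and all function names.

```python
import math


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words'.

    Assumed tokenization: Doc2vec needs a token list per document,
    and overlapping k-mers are one common way to get it from a sequence.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def gcn_layer(adj, feats, weight):
    """One standard GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each protein keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(weight), len(weight[0])
    # Aggregate neighbor features (norm @ feats).
    agg = [[sum(norm[i][m] * feats[m][c] for m in range(n))
            for c in range(d_in)] for i in range(n)]
    # Linear transform followed by ReLU (agg @ weight).
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]


def pair_feature(node_feats, i, j):
    """Combine two protein embeddings into one PPI feature.

    Element-wise product is an assumption, not taken from the paper.
    """
    return [a * b for a, b in zip(node_feats[i], node_feats[j])]


# Toy PPI network with three proteins connected as 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Placeholder vectors standing in for trained Doc2vec embeddings.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity weight matrix so the effect of aggregation is easy to inspect.
weight = [[1.0, 0.0], [0.0, 1.0]]

tokens = kmers("MKTAY")                 # words that would be fed to Doc2vec
hidden = gcn_layer(adj, feats, weight)  # structure-aware protein features
edge_01 = pair_feature(hidden, 0, 1)    # input to a multi-label classifier
```

In a full system, the placeholder vectors would be replaced by embeddings learned from the k-mer token lists, and each pair feature would feed a classifier that outputs one score per interaction type.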
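Memo
Received: 2022-12-30.
Funding: Jiangsu Provincial Natural Science Foundation Youth Program (BK20150159).
About the authors: CAO Hantong, master's student; main research interest: bioinformatics. CHEN Jing, associate professor, Ph.D.; main research interest: bioinformatics. CHEN Jing has led one Jiangsu Provincial Youth Fund project, participated in three National Natural Science Foundation of China projects, applied for 13 invention patents (5 granted), received 4 provincial or ministerial awards, and published more than 20 academic papers.
Corresponding author: CHEN Jing, E-mail: chenjing@jiangnan.edu.cn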