[1] ZHANG Zhihui, YANG Yan, ZHANG Yiling. Deep mutual information maximization method for incomplete multi-view clustering[J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 12-22. [doi:10.11992/tis.202203051]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
18
Issue:
No. 1, 2023
Pages:
12-22
Column:
Academic Papers: Machine Learning
Publication date:
2023-01-05
- Title:
-
Deep mutual information maximization method for incomplete multi-view clustering
- Author(s):
-
ZHANG Zhihui, YANG Yan, ZHANG Yiling
-
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
- Keywords:
-
data mining; clustering; incomplete multi-view clustering; multi-view representation learning; deep learning; autoencoder; mutual information; self-paced learning
- CLC number:
-
TP391
- DOI:
-
10.11992/tis.202203051
- Abstract:
-
Multi-view clustering is an active research topic in unsupervised learning. Many strong multi-view clustering methods have appeared in recent years, but most assume that every view is complete. In real-world scenarios, however, data are easily lost during collection, which leaves some views incomplete. In addition, many methods rely on traditional machine learning, i.e., shallow models, for feature learning, which makes it difficult to mine the complex information in high-dimensional data. To address these problems, this paper proposes a deep mutual information maximization method for incomplete multi-view clustering. First, deep autoencoders extract the deep latent features of each view, and consistency across views is learned by maximizing the mutual information between the latent representations. Then, the missing data in incomplete views are completed using the common latent representation of the multiple views. In addition, a self-paced learning strategy fine-tunes the network, learning the samples in the dataset from easy to difficult to obtain a more clustering-friendly feature representation. Finally, experiments on several real-world datasets verify the effectiveness of the proposed method.
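The abstract does not say how the mutual information is estimated or how the self-paced weights are defined, so the following is only a minimal sketch under stated assumptions. It assumes each view's latent representation has been mapped to a row-wise soft assignment (a probability vector per sample), estimates the mutual information from the resulting joint distribution, and uses a hard loss threshold for the self-paced step. The function names `mutual_information` and `self_paced_weights` are hypothetical and are not taken from the paper.

```python
import math


def mutual_information(z1, z2, eps=1e-12):
    """Estimate I(Z1; Z2) from soft assignments of two views (assumed estimator).

    z1, z2: lists of n rows; each row is a probability vector of length k.
    """
    n, k = len(z1), len(z1[0])
    # Joint distribution: P[a][b] is the sample mean of z1[i][a] * z2[i][b].
    joint = [[sum(z1[i][a] * z2[i][b] for i in range(n)) / n
              for b in range(k)] for a in range(k)]
    # Symmetrize so that neither view is treated as privileged.
    joint = [[(joint[a][b] + joint[b][a]) / 2 for b in range(k)]
             for a in range(k)]
    # Marginals of the joint distribution.
    row = [sum(joint[a]) for a in range(k)]
    col = [sum(joint[a][b] for a in range(k)) for b in range(k)]
    mi = 0.0
    for a in range(k):
        for b in range(k):
            p = joint[a][b]
            if p > eps:  # skip empty cells, where p * log(p) tends to 0
                mi += p * math.log(p / (row[a] * col[b]))
    return mi


def self_paced_weights(losses, threshold):
    """Hard self-paced weights (assumed scheme): 1 for easy samples, else 0."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


# Two views that agree perfectly on two balanced clusters.
agree = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
# A view that carries no cluster information at all.
uninformative = [[0.5, 0.5]] * 4

mi_agree = mutual_information(agree, agree)          # equals log(2)
mi_none = mutual_information(agree, uninformative)   # equals 0
weights = self_paced_weights([0.1, 0.9, 0.4], threshold=0.5)
```

In this sketch, training would maximize the estimate (for example, by minimizing its negative), and raising `threshold` over successive rounds admits harder samples, which is one common way to realize the easy-to-difficult schedule the abstract describes.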
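Memo
Received: 2022-03-24.
Funding: National Natural Science Foundation of China (61976247).
About the authors: ZHANG Zhihui, master's student; research interests: data mining and multi-view clustering. YANG Yan, professor, doctoral supervisor, Ph.D., Academic and Technical Leader of Sichuan Province, CCF Distinguished Member; research interests: artificial intelligence, big data analysis and mining, ensemble learning, multi-view learning, cluster analysis, and spatio-temporal mining. Has led more than 10 research projects, including projects funded by the National Natural Science Foundation of China, and has published more than 230 academic papers, one of which was selected among China's 100 Most Internationally Influential Academic Papers of 2021. ZHANG Yiling, Ph.D.; research interests: multi-view learning, multi-task learning, cluster analysis, and spatio-temporal mining.
Corresponding author: YANG Yan. E-mail: yyang@swjtu.edu.cn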