[1]江冰,谷飞洋,何增有.去冗余Top-k对比序列模式挖掘[J].智能系统学报,2018,13(5):680-686.[doi:10.11992/tis.201702019]
JIANG Bing,GU Feiyang,HE Zengyou.Mining Top-k non-redundant distinguishing sequential patterns[J].CAAI Transactions on Intelligent Systems,2018,13(5):680-686.[doi:10.11992/tis.201702019]
点击复制
《智能系统学报》[ISSN 1673-4785/CN 23-1538/TP] 卷:
13
期数:
2018年第5期
页码:
680-686
栏目:
学术论文—机器学习
出版日期:
2018-09-05
- Title:
-
Mining Top-k non-redundant distinguishing sequential patterns
- 作者:
-
江冰, 谷飞洋, 何增有
-
大连理工大学 软件学院, 辽宁 大连 116621
- Author(s):
-
JIANG Bing, GU Feiyang, HE Zengyou
-
Software School, Dalian University of Technology, Dalian 116621, China
-
- 关键词:
-
对比序列模式; 广度优先; 冗余序列模式; 模式挖掘; Top-k
- Keywords:
-
distinguishing sequential pattern; breadth-first; redundant sequential patterns; pattern mining; Top-k
- 分类号:
-
TP393
- DOI:
-
10.11992/tis.201702019
- 摘要:
-
对比序列模式可以用来表征不同类别数据集之间的差异。在生物信息、物流管理、电子商务等领域,对比序列模式有着广泛的应用。Top-k对比序列模式挖掘的目标是发现数据集中对比度最高的前k个序列模式。在Top-k对比序列模式挖掘中,可能挖掘出冗余的序列模式。目前,虽然有Top-k对比序列模式发现算法被提出,但这些算法并未考虑冗余序列模式的问题。为此,本文提出了基于广度优先生成树的去冗余Top-k对比序列模式挖掘算法BFM(breadth-first miner)。使用BFM算法可以有效地解决冗余问题,得到去冗余的Top-k对比序列模式。在BFM算法的基础上,提出了性能更好的算法PBFM(pruning breadth-first miner)。通过在真实数据集上的实验分析与对比 ,验证了本文算法的有效性。
- Abstract:
-
Distinct sequential patterns can be used to characterize different categories of datasets. In the field of bioinformatics, logistics management, and e-commerce, the comparison of sequential pattern has a wide range of applications. The goal of the Top-k distinguishing sequential pattern mining is to find k patterns with the highest contrast in a given data set. However, in the Top-k distinguishing sequential pattern mining, there is a redundancy problem with respect to the set of reported sequential patterns, which is not considered by the algorithm. Therefore, in this paper, a non-redundant Top-k distinguishing sequential pattern mining algorithm, breadth-first miner (BFM), is proposed based on breadth-first spanning tree. The redundancy problem is effectively solved using the BFM algorithm. Based on the BFM algorithm, a better algorithm, pruning breadth-first miner (PBFM), is proposed. Through the experimental analysis and comparison on the real data set, the validity of the algorithm is verified.
备注/Memo
收稿日期:2017-02-26。
基金项目:国家自然科学基金项目(61572094);大学生创新创业训练计划项目(2017101410901010382).
作者简介:江冰,女,1995年生,硕士研究生,主要研究方向为数据挖掘和生物信息;谷飞洋,男,1991年生,硕士研究生,主要研究方向为数据挖掘和生物信息;何增有,男,1976年生,教授,博士生导师,博士,主要研究方向为数据挖掘、生物信息。获得省部级奖励3项。发表国际期刊论文40余篇,据Google Scholar统计,发表的论文被引用2600余次,出版英文学术专著1部。
通讯作者:何增有.E-mail:zyhe@dlut.edu.cn.
更新日期/Last Update:
2018-10-25