[1]江冰,谷飞洋,何增有.去冗余Top-k对比序列模式挖掘[J].智能系统学报,2018,13(05):680-686.[doi:10.11992/tis.201702019]
 JIANG Bing,GU Feiyang,HE Zengyou.Mining Top-k non-redundant distinguishing sequential patterns[J].CAAI Transactions on Intelligent Systems,2018,13(05):680-686.[doi:10.11992/tis.201702019]
点击复制

去冗余Top-k对比序列模式挖掘(/HTML)
分享到:

《智能系统学报》[ISSN:1673-4785/CN:23-1538/TP]

卷:
第13卷
期数:
2018年05期
页码:
680-686
栏目:
出版日期:
2018-09-05

文章信息/Info

Title:
Mining Top-k non-redundant distinguishing sequential patterns
作者:
江冰 谷飞洋 何增有
大连理工大学 软件学院, 辽宁 大连 116621
Author(s):
JIANG Bing GU Feiyang HE Zengyou
Software School, Dalian University of Technology, Dalian 116621, China
关键词:
对比序列模式广度优先冗余序列模式模式挖掘Top-k
Keywords:
distinguishing sequential patternbreadth-firstredundant sequential patternspattern miningTop-k
分类号:
TP393
DOI:
10.11992/tis.201702019
摘要:
对比序列模式可以用来表征不同类别数据集之间的差异。在生物信息、物流管理、电子商务等领域,对比序列模式有着广泛的应用。Top-k对比序列模式挖掘的目标是发现数据集中对比度最高的前k个序列模式。在Top-k对比序列模式挖掘中,可能挖掘出冗余的序列模式。目前,虽然有Top-k对比序列模式发现算法被提出,但这些算法并未考虑冗余序列模式的问题。为此,本文提出了基于广度优先生成树的去冗余Top-k对比序列模式挖掘算法BFM(breadth-first miner)。使用BFM算法可以有效地解决冗余问题,得到去冗余的Top-k对比序列模式。在BFM算法的基础上,提出了性能更好的算法PBFM(pruning breadth-first miner)。通过在真实数据集上的实验分析与对比 ,验证了本文算法的有效性。
Abstract:
Distinct sequential patterns can be used to characterize different categories of datasets. In the field of bioinformatics, logistics management, and e-commerce, the comparison of sequential pattern has a wide range of applications. The goal of the Top-k distinguishing sequential pattern mining is to find k patterns with the highest contrast in a given data set. However, in the Top-k distinguishing sequential pattern mining, there is a redundancy problem with respect to the set of reported sequential patterns, which is not considered by the algorithm. Therefore, in this paper, a non-redundant Top-k distinguishing sequential pattern mining algorithm, breadth-first miner (BFM), is proposed based on breadth-first spanning tree. The redundancy problem is effectively solved using the BFM algorithm. Based on the BFM algorithm, a better algorithm, pruning breadth-first miner (PBFM), is proposed. Through the experimental analysis and comparison on the real data set, the validity of the algorithm is verified.

参考文献/References:

[1] ZHANG Minghua, KAO Ben, CHEUNG D W, et al. Mining periodic patterns with gap requirement from sequences [J]. ACM transactions on knowledge discovery from data, 2007, 1(2):7.
[2] PEI Jian, WANG Haixun, LIU Jian, et al. Discovering frequent closed partial orders from strings[J]. IEEE transactions on knowledge and data engineering, 2006, 18(11):1467-1481.
[3] YAN Xifeng, HAN Jiawei, AFSHAR R. CloSpan:mining:closed sequential patterns in large datasets[C]//Proceedings of the 3rd SIAM International Conference on Data Mining. San Francisco, USA, 2003:166-177.
[4] JI Xiaonan, BAILEY J, DONG Guozhu. Mining minimal distinguishing subsequence patterns with gap constraints [J]. Knowledge and information systems, 2007, 11(3):259-286.
[5] ZAKI M J. SPADE:an efficient algorithm for mining frequent sequences[J]. Machine learning, 2001, 42(1/2):31-60.
[6] YANG Hao, DUAN Lei, DONG Guozhu, et al. Mining itemset-based distinguishing sequential patterns with gap constraint[C]//Proceedings of the 20th International Conference on Database Systems for Advanced Applications. Hanoi, Vietnam, 2015:39-54.
[7] 杨皓, 段磊, 胡斌, 等. 带间隔约束的Top-k对比序列模式挖掘[J]. 软件学报, 2015, 26(11):2994-3009 YANG Hao, DUAN Lei, HU Bin, et al. Mining Top-k distinguishing sequential patterns with gap constraint[J]. Journal of software, 2015, 26(11):2994-3009
[8] ZHENG Zhigang, WEI Wei, LIU Chunming, et al. An effective contrast sequential pattern mining approach to taxpayer behavior analysis[J]. World wide web, 2016, 19(4):633-651.
[9] GHOSH S, FENG Mengling, NGUYEN H, et al. Risk prediction for acute hypotensive patients by using gap constrained sequential contrast patterns[J]. AMIA annual symposium proceedings, 2014, 2014:1748-1757.
[10] FOURNIER-VIGER P, TSENG VS. Mining Top-K Non-redundant association rules[C]//Proceedings of the 20th International Symposium on Foundations of Intelligent Systems. Macau, China, 2012, 7661:31-40.
[11] FOURNIER-VIGER P, TSENG V S. TNS:mining top-k non-redundant sequential rules[C]//Proceedings of the 28th Symposium on Applied Computing. Coimbra, Portugal, 2013:164-166.
[12] KAMEYA Y, SATO T. RP-growth:Top-k mining of relevant patterns with minimum support raising[C]//Proceedings of the 12th SIAM International Conference on Data Mining. Anaheim, USA, 2012:816-827.
[13] CHAN S, KAO B, YIP C L, et al. Mining emerging substrings[C]//Proceedings of 8th International Conference on Database Systems for Advanced Applications. Kyoto, Japan, 2003:119-126.
[14] JI Xiaonan, BAILEY J, DONG Guozhu. Mining minimal distinguishing subsequence patterns with gap constraints [J]. Knowledge and information systems, 2007, 11(3):259-286.
[15] DENG Kang, ZAIANE O R. An occurrence based approach to mine emerging sequences[C]//Proceedings of the 12th International Conference on Data Warehousing and Knowledge Discovery. Bilbao, Spain, 2010:275-284.
[16] YU H H, CHEN Chunhao, TSENG V S. Mining emerging patterns from time series data with time gap constraint[J]. International journal of innovative computing information and control, 2011, 7(9):5515-5528.
[17] WANG Xianming, DUAN Lei, DONG Guozhu, et al. Efficient mining of density-aware distinguishing sequential patterns with gap constraints[C]//Proceedings of the 19th International Conference on Database Systems for Advanced Applications. Bali, Indonesia, 2014:372-387.

备注/Memo

备注/Memo:
收稿日期:2017-02-26。
基金项目:国家自然科学基金项目(61572094);大学生创新创业训练计划项目(2017101410901010382).
作者简介:江冰,女,1995年生,硕士研究生,主要研究方向为数据挖掘和生物信息;谷飞洋,男,1991年生,硕士研究生,主要研究方向为数据挖掘和生物信息;何增有,男,1976年生,教授,博士生导师,博士,主要研究方向为数据挖掘、生物信息。获得省部级奖励3项。发表国际期刊论文40余篇,据Google Scholar统计,发表的论文被引用2600余次,出版英文学术专著1部。
通讯作者:何增有.E-mail:zyhe@dlut.edu.cn.
更新日期/Last Update: 2018-10-25