[1] LI Hailin, LONG Fangju. Association rules analysis of time series based on synchronization frequent tree[J]. CAAI Transactions on Intelligent Systems, 2021, 16(3):502-510. [doi:10.11992/tis.202008012]

Association rules analysis of time series based on synchronization frequent tree

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 16
Issue:
No. 3, 2021
Pages:
502-510
Column:
Academic Papers: Knowledge Engineering
Publication date:
2021-05-05

Article Info

Title:
Association rules analysis of time series based on synchronization frequent tree
Author(s):
LI Hailin 1,2; LONG Fangju 1
1. Department of Information Systems, Huaqiao University, Quanzhou 362021, China;
2. Research Center of Applied Statistics and Big Data, Huaqiao University, Xiamen 361021, China
Keywords:
time series; linear segmentation; trend item-location; transaction set representation; frequent itemsets; synchronize frequent trees; association rules; time efficiency
Classification number:
TP311.13
DOI:
10.11992/tis.202008012
Abstract:
The classic Apriori and frequent pattern growth (FP-growth) algorithms cannot directly mine association rules from time series data. To address this, a synchronization frequent tree (SFT) algorithm is proposed. Exploiting the one-dimensional nature of the time attribute of a time series, we define a trend item-position representation for time series data and build a basic tree from the first time series. Intersecting the information held in a leaf node of the tree with that of a list item then determines whether the item forms a frequent K-itemset together with all the nodes on that branch. In the SFT algorithm, data in the trend item-position representation occupies less memory than the original data, and no candidate frequent itemsets are generated during mining, so the algorithm shows good time performance throughout the mining process. Numerical experiments on commodity data and stock data show that the results of the SFT algorithm are consistent with those of the five comparison algorithms and that, across all data scales and different support counts, its time complexity is better than that of the comparison algorithms.
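The idea of judging frequency by intersecting position information can be illustrated with a small sketch. This is not the authors' SFT algorithm, and it builds no tree: it is a simplified vertical-mining illustration under stated assumptions. The trend symbols 'U', 'D' and 'S', the threshold `eps`, and the function names are all hypothetical, and point-to-point differences stand in for the linear segmentation used in the paper.

```python
def trend_symbols(series, eps=0.0):
    """Label each consecutive pair as 'U' (up), 'D' (down) or 'S' (steady).

    Assumption: point-to-point differences replace linear segmentation.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if diff > eps:
            symbols.append("U")
        elif diff < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return symbols


def item_positions(named_series, eps=0.0):
    """Map each (series name, trend) item to the set of positions where it occurs."""
    table = {}
    for name, values in named_series.items():
        for pos, sym in enumerate(trend_symbols(values, eps)):
            table.setdefault((name, sym), set()).add(pos)
    return table


def frequent_itemsets(table, min_count):
    """Grow itemsets depth-first; support is the size of the intersected position sets.

    No candidate itemsets are enumerated up front: an itemset is extended
    only while the shared positions still reach min_count.
    """
    items = sorted(
        ((item, pos) for item, pos in table.items() if len(pos) >= min_count),
        key=lambda pair: pair[0],
    )
    result = {}

    def extend(prefix, positions, start):
        for i in range(start, len(items)):
            item, pos = items[i]
            common = positions & pos if prefix else pos
            if len(common) >= min_count:
                itemset = prefix + (item,)
                result[itemset] = len(common)
                extend(itemset, common, i + 1)

    extend((), set(), 0)
    return result


if __name__ == "__main__":
    data = {"A": [1, 2, 3, 2, 3], "B": [5, 6, 7, 6, 7]}
    table = item_positions(data)
    for itemset, count in frequent_itemsets(table, min_count=2).items():
        print(itemset, count)
```

Storing only positions per trend item, rather than the raw values, is what keeps the support check down to a set intersection in this sketch.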


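Memo

Received: 2020-08-12.
Funding: National Natural Science Foundation of China (71771094, 61300139); Natural Science Foundation of Fujian Province (2019J01067); Fujian Provincial Social Science Planning General Project (FJ2020B088).
About the authors: LI Hailin, professor and doctoral supervisor, works mainly on data mining and decision support, has led two National Natural Science Foundation of China projects and four provincial- or ministerial-level funded projects, and has published more than 60 academic papers. LONG Fangju, master's student, works mainly on data mining and enterprise management.
Corresponding author: LI Hailin. E-mail: hailin@hqu.edu.cn
Last Update: 2021-06-25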