[1] LI Hailin, LONG Fangju. Association rules analysis of time series based on synchronization frequent tree[J]. CAAI Transactions on Intelligent Systems, 2021, 16(3): 502-510. [doi:10.11992/tis.202008012]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
16
Issue:
2021, No. 3
Pages:
502-510
Section:
Academic Papers: Knowledge Engineering
Publication date:
2021-05-05
- Title:
-
Association rules analysis of time series based on synchronization frequent tree
- Author(s):
-
LI Hailin¹,², LONG Fangju¹
-
1. Department of Information Systems, Huaqiao University, Quanzhou 362021, China;
2. Research Center of Applied Statistics and Big Data, Huaqiao University, Xiamen 361021, China
-
- Keywords:
-
time series; linear segmentation; trend item-position; transaction set representation; frequent itemsets; synchronization frequent tree; association rules; time efficiency
- CLC number:
-
TP311.13
- DOI:
-
10.11992/tis.202008012
- Abstract:
-
The classic Apriori and frequent pattern growth (FP-growth) algorithms cannot mine association rules directly from time series data. To address this problem, a synchronization frequent tree (SFT) algorithm is proposed. Exploiting the one-dimensional nature of the time attribute of a time series, a trend item-position representation is defined for time series data, and the first time series is built into a base tree. By intersecting the information held in a leaf node of the tree with that of a list item, the algorithm determines whether the item forms a frequent K-itemset with all the nodes on that branch. In the SFT algorithm, data in the trend item-position representation has better memory occupancy than the original data, and no candidate frequent itemsets are generated during mining, so the algorithm shows good time performance throughout the mining process. Numerical experiments on commodity data and stock data show that the results of the SFT algorithm are consistent with those of the five comparison algorithms, and that its time complexity is better than that of the comparison algorithms at every data scale and for different support counts.
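The idea of deciding frequency by intersection, without generating candidate itemsets, can be illustrated with a small sketch. This is not the authors' SFT implementation: the abstract does not give the tree construction or the segmentation details, so the code below assumes a simplified encoding (one trend symbol per consecutive pair of points instead of linear segmentation) and uses a plain vertical, Eclat-style intersection of series-id sets. The names `trend_items`, `mine_frequent`, and `rule_confidence` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def trend_items(series, eps=0.0):
    """Encode a numeric series as (trend, position) items.

    Assumption: the trend is taken from consecutive differences
    ("U" up, "D" down, "F" flat), not from linear segmentation.
    """
    items = []
    for pos in range(len(series) - 1):
        diff = series[pos + 1] - series[pos]
        trend = "U" if diff > eps else "D" if diff < -eps else "F"
        items.append((trend, pos))
    return items


def _order(item):
    # Canonical item order: by position first, then by trend symbol.
    return (item[1], item[0])


def mine_frequent(series_list, min_count):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    # Vertical layout: each item maps to the ids of the series containing it.
    tidsets = defaultdict(set)
    for sid, series in enumerate(series_list):
        for item in trend_items(series):
            tidsets[item].add(sid)

    frequent_items = sorted(
        (i for i, tids in tidsets.items() if len(tids) >= min_count),
        key=_order,
    )
    result = {}

    def extend(prefix, prefix_tids, start):
        # Grow a branch item by item; support comes from set intersection,
        # so an infrequent extension is dropped as soon as it is seen.
        for k in range(start, len(frequent_items)):
            item = frequent_items[k]
            tids = prefix_tids & tidsets[item]
            if len(tids) >= min_count:
                itemset = prefix + (item,)
                result[itemset] = len(tids)
                extend(itemset, tids, k + 1)

    extend((), set(range(len(series_list))), 0)
    return result


def rule_confidence(freq, antecedent, consequent):
    """Confidence of antecedent -> consequent from mined support counts."""
    union = tuple(sorted(set(antecedent) | set(consequent), key=_order))
    return freq[union] / freq[tuple(sorted(antecedent, key=_order))]
```

For example, calling `mine_frequent([[1, 2, 3, 2], [1, 3, 4, 1], [2, 1, 3, 2]], 2)` encodes each series as three trend items and keeps only the itemsets shared by at least two series; `rule_confidence` then divides the support of the combined itemset by the support of the antecedent, which is the usual definition of rule confidence.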
Last Update:
2021-06-25