PU Shiye, LIU Sanyang, BAI Yiguang. Imbalanced data classification of network topology characteristics[J]. CAAI Transactions on Intelligent Systems, 2019, 14(05): 889-896. [doi:10.11992/tis.201812014]

Imbalanced data classification of network topology characteristics

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
14
Issue:
2019, No. 05
Pages:
889-896
Publication date:
2019-09-05

Article Info

Title:
Imbalanced data classification of network topology characteristics
Author(s):
PU Shiye, LIU Sanyang, BAI Yiguang
School of Mathematics and Statistics, Xidian University, Xi'an, Shaanxi 710126, China
Keywords:
imbalanced data; similarity; network structure; accuracy rate; topology; physical feature
CLC number:
TP391.9
DOI:
10.11992/tis.201812014
Abstract:
Real-world datasets are commonly imbalanced. To address the imbalanced classification problem, this paper builds a network structure over the dataset to fully mine the topological features that lie beyond the positional information of the sample points, analyzes the connection characteristics of the network nodes, and assigns each node a different efficiency. The similarity measure between a node to be classified and each sub-network is computed, and the integrity measure between the node and each sub-network is then derived from a new probability model. On this basis, an imbalanced data classification method based on network topology features is constructed, in which an imbalance factor c is introduced to reduce the influence of the difference between the numbers of positive and negative samples. Experimental results show that the algorithm effectively improves classification accuracy; on datasets with pronounced topological features in particular, it improves both classification performance and adaptability compared with traditional classification methods.
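The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions, not the authors' method. It assumes that each class forms a k-nearest-neighbour sub-network, that a node's "efficiency" is its degree normalised by the largest degree in its sub-network, that the similarity between a test node and a sub-network is an efficiency-weighted Gaussian kernel over the test node's k nearest members, and that the imbalance factor c enters as a weight (n_max / n_label)^c on each class score. The function names `knn_graph`, `node_efficiency`, `class_probabilities`, `predict` and the parameters `k` and `sigma` are all hypothetical.

```python
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph over the points of one class."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # the k closest other points of the same class become neighbours of i
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def node_efficiency(adj):
    """Hypothetical node 'efficiency': degree divided by the largest degree in the sub-network."""
    max_deg = max((len(nbrs) for nbrs in adj), default=0) or 1
    return [len(nbrs) / max_deg for nbrs in adj]


def class_probabilities(x, data, k=3, sigma=1.0, c=0.5):
    """Return a probability for each class label given test point x.

    data -- dict mapping label -> list of feature tuples (one sub-network per label)
    c    -- hypothetical imbalance factor; each class score is multiplied by
            (n_max / n_label) ** c, so c = 0 disables the correction and a larger
            c up-weights smaller classes more strongly
    """
    n_max = max(len(pts) for pts in data.values())
    scores = {}
    for label, pts in data.items():
        eff = node_efficiency(knn_graph(pts, k))
        # attach x to its k nearest members of this sub-network
        nearest = sorted(range(len(pts)), key=lambda j: math.dist(x, pts[j]))[:k]
        # similarity: mean efficiency-weighted Gaussian kernel over those links
        sim = sum(eff[j] * math.exp(-math.dist(x, pts[j]) ** 2 / sigma ** 2)
                  for j in nearest) / len(nearest)
        scores[label] = sim * (n_max / len(pts)) ** c
    total = sum(scores.values()) or 1.0
    # normalising across classes turns the scores into a probability model
    return {label: s / total for label, s in scores.items()}


def predict(x, data, **kwargs):
    """Assign x to the class with the highest probability."""
    probs = class_probabilities(x, data, **kwargs)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    data = {
        "majority": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)],
        "minority": [(2.0, 2.0), (2.2, 1.9), (1.9, 2.1)],
    }
    print(predict((2.1, 2.0), data))
    print(class_probabilities((1.0, 1.0), data))
```

With the toy data above, the first call prints `minority`. Because the majority class is denser, its members tend to lie closer to a borderline test point; the (n_max / n_label)^c weight is one simple way to offset that, and raising c shifts probability mass toward the smaller class.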

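Memo:
Received: 2018-12-12.
Foundation items: National Natural Science Foundation of China (61877046); Natural Science Foundation of Shaanxi Province (2017JM1001).
About the authors: PU Shiye, male, born in 1993, master's student; main research interests: data mining and complex networks. LIU Sanyang, male, born in 1959, professor and doctoral supervisor, National Teaching Master, selected as a leading talent of the National High-Level Talents Special Support Program (Ten Thousand Talents Program); main research interests: optimization methods and their applications, system modeling, and information networks. He has led 5 National Natural Science Foundation of China projects and more than 10 Ministry of Education projects, and has received 3 national teaching achievement awards. He has published more than 500 academic papers, including global hot papers, ESI highly cited papers, and one of the 2015 Top 100 Most Influential Academic Papers in China, as well as more than 10 textbooks, 2 of which received national awards. BAI Yiguang, male, born in 1993, doctoral student; main research interests: complex network function and robustness, and applications of large-scale parallel optimization in networks. He has published 7 academic papers.
Corresponding author: PU Shiye. E-mail: psy2361@126.com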