
[1] SHAO Dongheng, YANG Wenyuan, ZHAO Hong. Label distribution learning based on k-means algorithm[J]. CAAI Transactions on Intelligent Systems, 2017, 12(3): 325-332. [doi:10.11992/tis.201704024]


Label distribution learning based on k-means algorithm


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 12 Issue: 2017, No. 3 Pages: 325-332 Column: Academic Papers: Intelligent Systems Publication date: 2017-06-25

Title:: Label distribution learning based on k-means algorithm

Author(s):: SHAO Dongheng, YANG Wenyuan, ZHAO Hong; Lab of Granular Computing, Minnan Normal University, Zhangzhou 363000, Fujian, China

Keywords:: label distribution; clustering; k-means; Minkowski distance; multi-label; weight matrix; mean vector; softmax function

CLC number:: TP181

DOI:: 10.11992/tis.201704024

Abstract:: Label distribution learning (LDL) is a machine learning paradigm proposed in recent years. It handles certain kinds of label ambiguity well, in particular cases where the labels relevant to an instance differ in importance. Existing LDL algorithms build parametric models from conditional probabilities, but they do not fully exploit the relationship between features and labels. Starting from the assumption that samples with similar features should have similar label distributions, this study clusters the training samples with k-means, a prototype-based clustering algorithm, and proposes label distribution learning based on the k-means algorithm (LDLKM). The method first computes the mean vector of each cluster with k-means, and then computes the mean vector of the label distributions corresponding to each cluster. Finally, the distances between the test instances and the mean vectors obtained from the training set are used as weights to predict the label distributions of the test set. Experiments were conducted on six public data sets, comparing LDLKM with three existing LDL algorithms on five evaluation measures. The results show that the proposed LDLKM algorithm is effective.
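The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`minkowski`, `kmeans`, `fit`, `predict`) are hypothetical, the farthest-point initialization is chosen here only to make the example deterministic, and turning distances into weights with a softmax over negative Minkowski distances is an assumption suggested by the keywords rather than a detail confirmed by the abstract.

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def mean_vector(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(X, k, p=2, iters=100):
    """Lloyd-style k-means with a deterministic farthest-point start (assumption)."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(minkowski(x, c, p) for c in centers))
        centers.append(list(far))
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: minkowski(x, centers[j], p)) for x in X
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_vector(members)
    return centers, assign


def fit(X, D, k, p=2):
    """Cluster the features X, then average the label distributions D per cluster."""
    centers, assign = kmeans(X, k, p=p)
    label_means = []
    for j in range(k):
        members = [d for d, a in zip(D, assign) if a == j]
        # Fall back to the global mean distribution for an empty cluster.
        label_means.append(mean_vector(members if members else D))
    return centers, label_means


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, centers, label_means, p=2):
    """Mix the cluster mean label distributions, weighted by closeness to x."""
    weights = softmax([-minkowski(x, c, p) for c in centers])
    n_labels = len(label_means[0])
    return [
        sum(w * lm[i] for w, lm in zip(weights, label_means))
        for i in range(n_labels)
    ]


# Toy data: four training instances, two labels, two well-separated groups.
X_train = [[0, 0], [0, 1], [10, 10], [10, 11]]
D_train = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
centers, label_means = fit(X_train, D_train, k=2)
prediction = predict([0, 0.5], centers, label_means)
print(prediction)
```

Because the weights sum to one and every cluster mean is itself a distribution, the prediction is again a valid label distribution. A test instance near the first group receives a prediction close to that group's mean distribution.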



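Memo

Received: 2017-04-19.
Funding: National Natural Science Foundation of China (61379049, 61379089).
Author biographies: SHAO Dongheng, male, born in 1992, master's student; main research interest: label distribution learning. YANG Wenyuan, male, born in 1967, associate professor, Ph.D.; main research interests: machine learning and label distribution learning; has published more than 20 academic papers. ZHAO Hong, female, born in 1979, associate professor; main research interests: granular computing and hierarchical classification learning; has published more than 40 academic papers.
Corresponding author: YANG Wenyuan. E-mail: yangwy@xmu.edu.cn.

Last Update: 2017-06-25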
