
Citation: DENG Siyu, LIU Fulun, HUANG Yuting, et al. Active learning through PageRank[J]. CAAI Transactions on Intelligent Systems, 2019, 14(3): 551-559. [doi:10.11992/tis.201804052]


Active Learning through PageRank


Journal: CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 14; Issue: 3 (2019); Pages: 551-559; Section: Academic Papers, Machine Learning; Published: 2019-05-05

Title:: Active learning through PageRank

Author(s):: DENG Siyu¹, LIU Fulun¹, HUANG Yuting¹, WANG Min²; 1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China;
2. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu 610500, China

Keywords:: classification; active learning; PageRank; neighborhood; clustering; binary tree

CLC number:: TP181

DOI:: 10.11992/tis.201804052

Abstract:: In many classification tasks, unlabeled samples are plentiful, while obtaining a label for each sample is time-consuming and expensive. Active learning aims to train an accurate classifier at minimum labeling cost by querying the labels of the most informative samples. In this paper, we propose a PageRank-based active learning algorithm (PAL) that makes full use of the sample distribution for effective sample selection. First, following PageRank, we compute in sequence the neighborhoods, the score matrix, and the ranking vector from the similarity relationships among samples. Next, we select representative samples and build a binary tree from their similarity relationships. Then, we use this binary tree to cluster, label, and predict the representative samples. Finally, we use the representative samples as the training set to classify the remaining samples. Experiments on eight public datasets compare PAL with five traditional classification algorithms and three popular active learning algorithms; the results show that PAL achieves higher classification accuracy.
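The abstract describes the pipeline only at a high level, so the following is a minimal plain-Python sketch of the general idea under stated assumptions, not the authors' PAL implementation. Assumed here: similarity is 1/(1 + Euclidean distance); a sample's neighborhood is its k nearest neighbors; the score matrix spreads each sample's rank over its neighbors in proportion to similarity; and the ranking vector comes from power iteration with damping 0.85. The paper's binary-tree clustering step is replaced by a simpler, swapped-in rule: greedily take top-ranked samples not already inside a chosen sample's neighborhood, query only their labels, and classify every other sample by its nearest representative (1-NN). All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(X, k):
    """For each sample i, return its k nearest neighbours as (j, similarity).

    Assumption: similarity = 1 / (1 + distance), chosen only for illustration.
    """
    graph = []
    for i, xi in enumerate(X):
        dists = sorted((euclidean(xi, xj), j) for j, xj in enumerate(X) if j != i)
        graph.append([(j, 1.0 / (1.0 + d)) for d, j in dists[:k]])
    return graph


def pagerank(graph, damping=0.85, iters=100, tol=1e-10):
    """Power iteration on an implicit column-stochastic score matrix:
    each sample passes its rank to its neighbours in proportion to similarity."""
    n = len(graph)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, nbrs in enumerate(graph):
            total = sum(s for _, s in nbrs)
            for j, s in nbrs:
                new[j] += damping * rank[i] * s / total
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


def select_representatives(rank, graph, m):
    """Hypothetical stand-in for the binary-tree step: take the highest-ranked
    samples, skipping any sample already in a chosen representative's neighbourhood."""
    order = sorted(range(len(rank)), key=lambda i: rank[i], reverse=True)
    chosen, covered = [], set()
    for i in order:
        if i in covered:
            continue
        chosen.append(i)
        covered.add(i)
        covered.update(j for j, _ in graph[i])
        if len(chosen) == m:
            break
    return chosen


def classify_rest(X, reps, rep_labels):
    """Give every sample the label of its nearest representative (1-NN)."""
    return [rep_labels[min(reps, key=lambda r: euclidean(x, X[r]))] for x in X]


# Toy data: two tight clusters of four points each.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
oracle = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical labels; only representatives are queried
graph = knn_graph(X, k=3)
rank = pagerank(graph)
reps = select_representatives(rank, graph, m=2)
labels = classify_rest(X, reps, {r: oracle[r] for r in reps})
print(reps, labels)
```

On this toy data each cluster forms its own neighborhood graph, so the coverage rule picks one representative per cluster and only two labels are queried; 1-NN then labels the other six samples. The coverage rule is one simple way to spread queries across dense regions; the paper's binary-tree clustering is presumably what plays this role in PAL.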



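Memo

Received: 2018-04-26.
Foundation item: National Natural Science Foundation of China (61379089).
About the authors: DENG Siyu, female, born in 1993, master's student; her main research interests are cost-sensitive learning and active learning. LIU Fulun, male, born in 1993, master's student; his main research interests are cost-sensitive learning, rough sets, and active learning. HUANG Yuting, female, born in 1996; her main research interest is recommender systems.
Corresponding author: WANG Min. E-mail: wangmin80616@163.com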

