DENG Siyu, LIU Fulun, HUANG Yuting, et al. Active learning through PageRank[J]. CAAI Transactions on Intelligent Systems, 2019, 14(03): 551-559. [doi:10.11992/tis.201804052]





Active learning through PageRank
DENG Siyu¹, LIU Fulun¹, HUANG Yuting¹, WANG Min²
1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China;
2. School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu 610500, China
Keywords: classification; active learning; PageRank; neighborhood; clustering; binary tree
In many classification tasks, unlabeled samples are plentiful, but obtaining a label for each sample is expensive and time-consuming. The goal of active learning is to train an accurate classifier at minimum cost by labeling only the most informative samples. In this paper, we propose a PageRank-based active learning algorithm (PAL) that makes full use of the sample distribution for effective sample selection. First, following PageRank theory, we compute in turn the neighborhoods, the score matrix, and the ranking vector from the similarity relationships in the data. Next, we select representative samples and build a binary tree to express the relationships among them. Then, we use the binary tree to cluster, label, and predict the representative samples. Finally, we use the representative samples as the training set for classifying the remaining samples. Experiments on eight datasets compare PAL with five traditional classification algorithms and three state-of-the-art active learning algorithms. The results show that PAL achieves higher classification accuracy than the compared algorithms.
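The abstract does not give the paper's exact definitions of the neighborhood, score matrix, or ranking vector. As a rough illustration of the PageRank step only, the sketch below assumes that each sample links to its k nearest neighbors under Euclidean distance and ranks samples with the standard PageRank power iteration (damping factor 0.85); the highest-ranked samples are then treated as representatives. The names `knn_graph`, `pagerank`, and `select_representatives` and the parameters `k` and `m` are hypothetical, and PAL's binary-tree clustering and labeling steps are not shown.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbors.

    Assumption: a Euclidean kNN graph stands in for the paper's
    neighborhood definition, which the abstract does not specify.
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def pagerank(neighbors, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank by power iteration.

    Each point spreads its score evenly over its k neighbors, so points
    that many others list as neighbors (dense-region centers) score high.
    Every node has k out-links, so there are no dangling nodes and the
    scores always sum to 1.
    """
    n = len(neighbors)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [(1.0 - damping) / n] * n  # uniform teleport term
        for i, out in enumerate(neighbors):
            share = damping * rank[i] / len(out)
            for j in out:
                new_rank[j] += share
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def select_representatives(points, k=3, m=2):
    """Return the indices of the m highest-ranked points (hypothetical helper)."""
    rank = pagerank(knn_graph(points, k))
    return sorted(range(len(points)), key=lambda i: rank[i], reverse=True)[:m]


if __name__ == "__main__":
    # Two tight clusters of five points; indices 0 and 5 are the centers.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9)]
    print(sorted(select_representatives(pts, k=3, m=2)))  # [0, 5]
```

In this toy example the two cluster centers rank highest because every other point in their cluster lists them among its three nearest neighbors, which is the intuition behind using PageRank scores to pick representative samples.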

