[1] LIU Yanglei, LIANG Jiye, GAO Jiawei, et al. Semi-supervised multi-label learning algorithm based on Tri-training[J]. CAAI Transactions on Intelligent Systems, 2013, 8(5): 439-445. [doi:10.3969/j.issn.1673-4785.201305033]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
8
Issue:
No. 5, 2013
Pages:
439-445
Column:
Academic Papers: Machine Learning
Publication date:
2013-10-25
- Title:
-
Semi-supervised multi-label learning algorithm based on Tri-training
- Article ID:
-
1673-4785(2013)05-439-07
- Author(s):
-
LIU Yanglei1,2, LIANG Jiye1,2, GAO Jiawei1,2, YANG Jing1,2
-
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
- Keywords:
-
multi-label learning; semi-supervised learning; Tri-training
- CLC number:
-
TP181
- DOI:
-
10.3969/j.issn.1673-4785.201305033
- Document code:
-
A
- Abstract:
-
Traditional multi-label learning is supervised and requires training samples with complete category labels. However, when the data set is large and the number of categories is high, obtaining a training set with complete labels is very difficult. Therefore, within the framework of semi-supervised co-training, a semi-supervised multi-label learning algorithm based on Tri-training (SMLT) is proposed. In the learning stage, SMLT introduces a virtual label and then, for each pair of labels, uses the Tri-training co-training mechanism to train the corresponding classifier. In the prediction stage, a new sample is fed to the classifiers obtained above; the multi-label learning problem is transformed into a label ranking problem according to the number of votes each label receives, and the number of votes for the virtual label is used as the threshold that splits the label ranking. Comparative experiments on four commonly used multi-label data sets from UCI show that SMLT outperforms the other compared algorithms in most cases on four evaluation metrics, which verifies the effectiveness of the algorithm.
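The prediction stage described in the abstract (pairwise voting, with the virtual label's vote count as the cut-off) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pairwise classifiers are assumed to be already trained (in the paper, by Tri-training) and are replaced here by toy functions, and the names `VIRTUAL`, `count_votes`, `predict_label_set`, `make_toy_classifier`, and `toy_score` are hypothetical. Treating a label as relevant only when it receives strictly more votes than the virtual label is also an assumption, since the abstract does not say how ties are handled.

```python
from itertools import combinations

VIRTUAL = "__virtual__"  # hypothetical name for the virtual label


def count_votes(x, classifiers):
    """Count one vote per pairwise classifier for the label it prefers on x.

    `classifiers` maps a label pair (a, b) to a callable returning a or b.
    """
    votes = {}
    for (a, b), clf in classifiers.items():
        votes.setdefault(a, 0)
        votes.setdefault(b, 0)
        winner = clf(x)
        if winner not in (a, b):
            raise ValueError(f"classifier for {(a, b)} returned {winner!r}")
        votes[winner] += 1
    return votes


def predict_label_set(x, classifiers):
    """Rank real labels by votes and keep those that beat the virtual label."""
    votes = count_votes(x, classifiers)
    threshold = votes[VIRTUAL]
    real_labels = [label for label in votes if label != VIRTUAL]
    # sorted() is stable, so labels with equal votes keep their input order.
    ranking = sorted(real_labels, key=lambda label: -votes[label])
    # Assumption: "relevant" means strictly more votes than the virtual label.
    relevant = [label for label in ranking if votes[label] > threshold]
    return ranking, relevant


# --- Toy stand-ins for trained pairwise classifiers (illustration only) ---
LABELS = ["sports", "politics", "music"]


def toy_score(label, x):
    """Feature i scores label i; the virtual label has a fixed score of 0.5."""
    return 0.5 if label == VIRTUAL else x[LABELS.index(label)]


def make_toy_classifier(a, b):
    """Return a classifier that prefers whichever of a, b has the higher toy score."""
    def classify(x):
        return a if toy_score(a, x) >= toy_score(b, x) else b
    return classify


classifiers = {
    (a, b): make_toy_classifier(a, b)
    for a, b in combinations(LABELS + [VIRTUAL], 2)
}

ranking, relevant = predict_label_set([0.9, 0.2, 0.7], classifiers)
print(ranking)   # ['sports', 'music', 'politics']
print(relevant)  # ['sports', 'music']
```

In the toy example, "sports" wins all three of its pairwise comparisons, "music" wins two, the virtual label wins one, and "politics" wins none, so only the labels ranked above the virtual label are returned as the predicted label set.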
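Memo
Received: 2013-05-09. Published online: 2013-09-29.
Funding: National "973" Program Preliminary Research Special Project (2011CB311805); Shanxi Province Key Science and Technology Research Program (20110321027-01); Shanxi Province Science and Technology Infrastructure Platform Construction Project (2012091002-0101).
Corresponding author: LIANG Jiye. E-mail: ljy@sxu.edu.cn.
About the authors:
LIU Yanglei, male, born in 1990, master's student. His main research interest is machine learning. He has published 3 academic papers and holds 3 registered computer software copyrights.
LIANG Jiye, male, born in 1962, professor, doctoral supervisor, Ph.D. His main research interests include machine learning, computational intelligence, and data mining. He has led 1 key project of the National Natural Science Foundation of China, 2 National "863" Program projects, 1 National "973" Program preliminary research special project, and 4 National Natural Science Foundation of China projects. He has published more than 150 academic papers and 2 books, and holds 8 invention patents.
GAO Jiawei, male, born in 1980, lecturer. His main research interest is machine learning. He has participated in 1 National "863" Program project, 3 National Natural Science Foundation of China projects, and 4 Shanxi Provincial Natural Science Foundation projects, and has published more than 10 academic papers.
Last Update: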
2013-11-28