[1] LUAN Xun, GAO Wei. Two-pass AUC optimization[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3): 395-398. [doi:10.11992/tis.201706079]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
13
Issue:
2018, No. 3
Pages:
395-398
Section:
Research Papers: Machine Learning
Publication date:
2018-05-05
- Title:
-
Two-pass AUC optimization
- Author(s):
-
LUAN Xun, GAO Wei
-
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
-
- Keywords:
-
machine learning; AUC; ROC; one-pass learning; online learning; ranking; stochastic gradient descent; statistics
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.201706079
- Abstract:
-
The area under the ROC curve (AUC) is an important performance measure in machine learning, widely used in class-imbalanced learning, cost-sensitive learning, learning to rank, and other tasks. Because AUC is defined over pairs of positive and negative instances, traditional optimization methods must store the entire dataset and therefore do not scale to large data. To address large-scale problems, the one-pass AUC (OPAUC) algorithm was proposed previously; it scans the data only once and optimizes AUC by storing first- and second-order statistics. In many real applications, however, the second-order statistics still incur high storage and computational costs, especially for high-dimensional datasets. This paper proposes a new two-pass AUC optimization algorithm, TPAUC. The basic idea is to scan the data twice: the first pass computes the means of the positive and negative instances, and the second pass optimizes AUC by stochastic gradient descent. By making two passes, the algorithm stores only first-order statistics and avoids storing and computing second-order statistics, which improves efficiency. Experiments verify the effectiveness of the proposed method.
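The two-pass idea in the abstract can be illustrated with a minimal sketch. It is not the paper's TPAUC update rule, which the abstract does not spell out. As a stated assumption, the second pass below runs stochastic gradient descent on a squared surrogate loss that compares each instance with the mean of the opposite class, so only the two class means (first-order statistics) are stored. The names `two_pass_auc_sketch`, `stream_fn`, `lr`, `reg`, and `auc` are hypothetical.

```python
import numpy as np

def two_pass_auc_sketch(stream_fn, dim, lr=0.01, reg=1e-3):
    """Illustrative two-pass linear scorer (assumed surrogate, not the paper's rule).

    stream_fn() must return a fresh iterator over (x, y) with y in {+1, -1}.
    """
    # Pass 1: accumulate first-order statistics (per-class sums and counts).
    sums = {1: np.zeros(dim), -1: np.zeros(dim)}
    counts = {1: 0, -1: 0}
    for x, y in stream_fn():
        sums[int(y)] += x
        counts[int(y)] += 1
    mean_pos = sums[1] / counts[1]
    mean_neg = sums[-1] / counts[-1]

    # Pass 2: SGD on 0.5 * (1 - w.diff)^2 + 0.5 * reg * |w|^2, where diff is
    # the instance minus the opposite-class mean, oriented positive-minus-negative.
    w = np.zeros(dim)
    for x, y in stream_fn():
        diff = (x - mean_neg) if int(y) == 1 else (mean_pos - x)
        grad = reg * w - (1.0 - w @ diff) * diff
        w -= lr * grad
    return w, mean_pos, mean_neg

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
```

The `auc` helper materializes every positive-negative pair, so it is meant only for evaluating small samples. The training loop never forms pairs, and its memory use is linear in the feature dimension.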
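Memo
Received: 2017-06-24.
Funding: Young Scientists Fund of the National Natural Science Foundation of China (61503179); Jiangsu Province Youth Fund project (BK20150586).
About the author: LUAN Xun, male, born in 1994, master's student. His main research interests are large-scale machine learning and recommender systems.
Corresponding author: GAO Wei. E-mail: gaow@lamda.nju.edu.cn
Last Update: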
2018-06-25