GUO Shaocheng, CHEN Songcan. Sparsified factorization machine[J]. CAAI Transactions on Intelligent Systems, 2017, 12(06): 816-822. [doi:10.11992/tis.201706030]

Sparsified factorization machine

CAAI Transactions on Intelligent Systems 《智能系统学报》 [ISSN:1673-4785/CN:23-1538/TP]

Volume:
Vol. 12
Issue:
No. 6, 2017
Pages:
816-822
Section:
Publication date:
2017-12-25

Article Info

Title:
Sparsified factorization machine
Author(s):
GUO Shaocheng, CHEN Songcan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords:
factorization machine; sparsity; sparse group lasso; feature selection; recommender systems
CLC number:
TP391
DOI:
10.11992/tis.201706030
Abstract:
Factorization machine (FM) is a recently proposed second-order linear model. Unlike ordinary second-order models, FM factorizes the coefficients of its second-order (interaction) terms, and this special structure makes it particularly suitable for high-dimensional, sparse data. Although FM has been applied in recommender systems, it does not explicitly account for the sparsity of the variables, especially when the variables carry structured-sparsity information. Because of FM's second-order feature structure, feature selection should satisfy the following property: the linear term and the second-order terms involving the same feature should be selected or discarded together; both should be discarded when the feature is noise, and both should be retained when the feature is informative. Based on this structural property, this paper proposes a sparse group lasso-based factorization machine (SGL-FM). By adding a sparse group lasso regularizer to the loss function, SGL-FM achieves not only between-group sparsity but also within-group sparsity. From another point of view, within-group sparsity also controls the dimensionality k of the factorization, so the model can adapt k automatically to different datasets. Experimental results show that the proposed method yields a sparser model than FM while achieving comparable or even better accuracy.
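For context, the FM prediction function (following Rendle [3]) and one plausible form of the SGL-FM objective implied by the abstract are sketched below. The grouping is an assumption drawn from the abstract: the group for feature i collects its linear coefficient w_i and its factor vector v_i, and the penalty follows the sparse group lasso of Simon et al. [7]; the exact loss and weighting used in the paper may differ.

\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j, \qquad \mathbf{v}_i \in \mathbb{R}^k

\min_{w_0,\, \mathbf{w},\, V} \; \sum_{m=1}^{N} \ell\big(y_m, \hat{y}(\mathbf{x}_m)\big) + \lambda \sum_{i=1}^{n} \Big[ (1-\alpha)\, \big\| (w_i, \mathbf{v}_i) \big\|_2 + \alpha \big( |w_i| + \| \mathbf{v}_i \|_1 \big) \Big]

The l2 part can zero out an entire group, removing a feature from the linear and interaction terms at once, while the l1 part zeroes individual components of v_i, which is what lets the effective factorization dimensionality adapt to the data. As a further illustration only (not the authors' implementation), a minimal proximal update for one such group, in the forward-backward splitting style of Duchi and Singer [11], might look as follows; the function name and arguments are hypothetical.

import numpy as np

def prox_sgl_group(w_i, v_i, lam, alpha, eta):
    """Hypothetical proximal step for one feature group (w_i, v_i) under the
    penalty lam * [(1 - alpha) * ||g||_2 + alpha * ||g||_1], step size eta."""
    g = np.concatenate(([float(w_i)], np.asarray(v_i, dtype=float)))
    # Within-group l1 part: element-wise soft thresholding.
    g = np.sign(g) * np.maximum(np.abs(g) - eta * lam * alpha, 0.0)
    # Between-group l2 part: shrink the whole group, possibly exactly to zero.
    norm = np.linalg.norm(g)
    if norm > 0.0:
        g *= max(0.0, 1.0 - eta * lam * (1.0 - alpha) / norm)
    return g[0], g[1:]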

References:

[1] RAO C R, TOUTENBURG H. Linear models[M]. New York: Springer, 1995: 3-18.
[2] ADOMAVICIUS G, TUZHILIN A. Context-aware recommender systems[M]. US: Springer, 2015: 191-226.
[3] RENDLE S. Factorization machines[C]//IEEE 10th International Conference on Data Mining. Sydney, Australia, 2010: 995-1000.
[4] RENDLE S. Learning recommender systems with adaptive regularization[C]//Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. Seattle, USA, 2012: 133-142.
[5] TIBSHIRANI R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1996, 58(1): 267-288.
[6] YUAN M, LIN Y. Model selection and estimation in regression with grouped variables[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006, 68(1): 49-67.
[7] SIMON N, FRIEDMAN J, HASTIE T, et al. A sparse-group lasso[J]. Journal of Computational and Graphical Statistics, 2013, 22(2): 231-245.
[8] BLONDEL M, FUJINO A, UEDA N, et al. Higher-order factorization machines[C]//Advances in Neural Information Processing Systems. Barcelona, Spain, 2016: 3351-3359.
[9] LI M, LIU Z, SMOLA A J, et al. DiFacto: distributed factorization machines[C]//Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. San Francisco, USA, 2016: 377-386.
[10] CHIN W S, YUAN B, YANG M Y, et al. An efficient alternating Newton method for learning factorization machines[R]. National Taiwan University, 2016.
[11] DUCHI J, SINGER Y. Efficient online and batch learning using forward backward splitting[J]. Journal of Machine Learning Research, 2009, 10(12): 2899-2934.
[12] RENDLE S. Factorization machines with libFM[J]. ACM Transactions on Intelligent Systems and Technology, 2012, 3(3): 57.
[13] LIU J, YE J. Moreau-Yosida regularization for grouped tree structure learning[C]//Advances in Neural Information Processing Systems. Vancouver, Canada, 2010: 1459-1467.

Similar References:

[1] MA Jialin, ZHANG Yongjun, WANG Zhijian. Multi-topic extraction algorithm based on concept clusters[J]. CAAI Transactions on Intelligent Systems, 2015, 10(02): 261. [doi:10.3969/j.issn.1673-4785.201405066]

Memo:
Received: 2017-06-09.
Foundation item: National Natural Science Foundation of China (61472186).
About the authors: GUO Shaocheng, born in 1993, is a master's student. His main research interests include machine learning and pattern recognition. CHEN Songcan, born in 1962, is a professor, doctoral supervisor, and Ph.D. He chairs the Machine Learning Technical Committee of the Chinese Association for Artificial Intelligence (CAAI) and is a senior member of CCF. His main research interests include pattern recognition, machine learning, and neural computing. He has published many papers in mainstream international journals and at top conferences and has won multiple awards.
Corresponding author: CHEN Songcan. E-mail: s.chen@nuaa.edu.cn.
Last Update: 2018-01-03