GUO Shaocheng, CHEN Songcan. Sparsified factorization machine[J]. CAAI Transactions on Intelligent Systems, 2017, 12(6): 816-822. [doi: 10.11992/tis.201706030]
CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 12
Issue: No. 6, 2017
Pages: 816-822
Section: Academic Papers - Foundations of Artificial Intelligence
Publication date: 2017-12-25
Title: Sparsified factorization machine
Author(s): GUO Shaocheng, CHEN Songcan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Keywords: factorization machine; sparsity; sparse group lasso; feature selection; recommender systems
CLC number: TP391
DOI: 10.11992/tis.201706030
Abstract: The factorization machine (FM) is a recently proposed second-order linear model. Unlike ordinary second-order models, FM factorizes the coefficients of its second-order interaction terms, and this special structure makes it particularly suitable for high-dimensional, sparse data. Although FM has been applied in recommender systems, it does not explicitly account for the sparsity of the variables, especially when the variables carry structured sparsity information. Because of FM's second-order feature structure, feature selection should satisfy the following property: the linear term and the second-order terms that involve the same feature should be selected or discarded together; when the feature is noise, both should be discarded, and when the feature is informative, both should be selected. Based on this structural property, this paper proposes a sparse group Lasso-based factorization machine (SGL-FM). By adding a sparse group Lasso regularizer to the loss function, SGL-FM achieves sparsity not only between groups but also within groups. Viewed from another angle, within-group sparsity also amounts to controlling the factorization dimensionality k, so that k adapts automatically to datasets with different properties. The experimental results show that the proposed method yields a sparser model than FM while maintaining comparable or even better accuracy.
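To make the construction concrete, the equations below sketch the standard second-order FM prediction function together with one plausible form of the sparse group Lasso objective described in the abstract. The grouping of each linear weight w_i with its factor vector v_i, the generic loss ℓ, and the trade-off parameters λ1 and λ2 are illustrative assumptions; the exact objective used in the paper may differ.

% Standard second-order FM prediction with d features and factor dimension k
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d}\sum_{j=i+1}^{d} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j, \qquad \mathbf{v}_i \in \mathbb{R}^k

% A sparse group Lasso regularized objective of the kind described above,
% grouping w_i with v_i for each feature i (an illustrative assumption)
\min_{w_0,\, \mathbf{w},\, V} \; \sum_{n=1}^{N} \ell\bigl(y_n, \hat{y}(\mathbf{x}_n)\bigr) + \lambda_1 \sum_{i=1}^{d} \bigl( |w_i| + \|\mathbf{v}_i\|_1 \bigr) + \lambda_2 \sum_{i=1}^{d} \bigl\| (w_i, \mathbf{v}_i) \bigr\|_2

Under this reading, the group term weighted by \lambda_2 can zero out an entire group, removing a noisy feature's linear and interaction terms at once, while the \ell_1 term weighted by \lambda_1 zeros individual entries of \mathbf{v}_i inside surviving groups, which in effect adapts the factorization dimensionality k to the data.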
Memo
Received: 2017-06-09.
Foundation item: National Natural Science Foundation of China (61472186).
About the authors: GUO Shaocheng, born in 1993, M.S. candidate. His main research interests include machine learning and pattern recognition. CHEN Songcan, born in 1962, Ph.D., professor and Ph.D. supervisor, director of the Machine Learning Technical Committee of the Chinese Association for Artificial Intelligence (CAAI), and senior member of CCF. His main research interests include pattern recognition, machine learning, and neural computing. He has published numerous papers in mainstream international journals and at top conferences and has received multiple awards.
Corresponding author: CHEN Songcan. E-mail: s.chen@nuaa.edu.cn.
Last update: 2018-01-03