[1] GU Junhua, XIE Zhijian, WU Junyan, et al. Parallel collaborative filtering recommendation algorithm based on graph walk[J]. CAAI Transactions on Intelligent Systems, 2019, 14(4): 743-751. [doi:10.11992/tis.201806002]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 14
Issue: No. 4, 2019
Pages: 743-751
Column: Academic Papers: Machine Learning
Publication date: 2019-07-02
- Title:
-
Parallel collaborative filtering recommendation algorithm based on graph walk
- Author(s):
-
GU Junhua (顾军华)1,2, XIE Zhijian (谢志坚)1,2, WU Junyan (武君艳)1,2, XU Xinyun (许馨匀)1,2, ZHANG Suqi (张素琪)3
-
1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China;
2. Hebei Province Key Laboratory of Big Data Computing, Tianjin 300401, China;
3. School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China
-
- Keywords:
-
collaborative filtering; recommendation; user network graph; walk; similarity; indirect similarity; parallel; Spark platform
- Classification number (CLC):
-
TP391
- DOI:
-
10.11992/tis.201806002
- Abstract:
-
This paper addresses the data sparsity and scalability problems of collaborative filtering recommendation algorithms. To address sparsity, an intersection-ratio coefficient is introduced into the traditional Pearson correlation similarity to compute the direct similarity between users, which accounts for the proportion of items that two users have rated in common. An indirect similarity calculation method based on graph walk is also proposed: it builds a user network graph from the direct similarities between users, computes the indirect similarities between users by walking on that graph, and then makes recommendations. To address scalability, the method is parallelized on the Spark platform, which mitigates the problem caused by growing data size. Experimental results on the MovieLens and IPTV datasets show that the proposed algorithm performs well on both, effectively improves recommendation accuracy, and scales well in a distributed environment.
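The two similarity steps named in the abstract can be illustrated with a small single-machine sketch. It is not the paper's implementation: the abstract gives no formulas, so the overlap ratio is assumed here to be |common items| / |union of items|, a walk path is assumed to score as the product of its edge weights, and the best path is kept per reached user. All function names and the toy ratings are hypothetical, and the Spark parallelization is not shown.

```python
from itertools import combinations
from math import sqrt


def direct_similarity(ratings_u, ratings_v):
    """Pearson correlation over co-rated items, scaled by an overlap ratio.

    ASSUMPTION: the overlap ratio is |common| / |union|; the paper's exact
    intersection-ratio coefficient is not stated in the abstract.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((ratings_v[i] - mean_v) ** 2 for i in common)
    )
    if den == 0:
        return 0.0
    overlap = len(common) / len(ratings_u.keys() | ratings_v.keys())
    return overlap * num / den


def build_user_graph(ratings, threshold=0.0):
    """Connect two users when their direct similarity exceeds `threshold`."""
    graph = {u: {} for u in ratings}
    for u, v in combinations(ratings, 2):
        s = direct_similarity(ratings[u], ratings[v])
        if s > threshold:
            graph[u][v] = s
            graph[v][u] = s
    return graph


def indirect_similarity(graph, source, steps=2):
    """Walk up to `steps` hops from `source` and score non-neighbours.

    ASSUMPTION: a path scores as the product of its edge weights, and the
    best-scoring path to each user is kept.
    """
    frontier = {source: 1.0}
    best = {}
    for _ in range(steps):
        next_frontier = {}
        for node, score in frontier.items():
            for nbr, weight in graph[node].items():
                if nbr == source:
                    continue
                cand = score * weight
                if cand > next_frontier.get(nbr, 0.0):
                    next_frontier[nbr] = cand
        for node, score in next_frontier.items():
            # Only users without a direct edge to `source` get an indirect score.
            if node not in graph[source] and score > best.get(node, 0.0):
                best[node] = score
        frontier = next_frontier
    return best


# Toy data: A and C share no rated items, but both overlap with B.
ratings = {
    "A": {"i1": 5, "i2": 1, "i3": 4},
    "B": {"i1": 4, "i2": 2, "i3": 5, "i4": 1, "i5": 5},
    "C": {"i4": 2, "i5": 4, "i6": 3},
}
graph = build_user_graph(ratings)
indirect = indirect_similarity(graph, "A", steps=2)
print(indirect)
```

In this toy case users A and C have no co-rated items, so their direct similarity is zero, yet the walk through B still assigns them a positive indirect similarity. That is the sense in which a graph walk can compensate for sparse rating data.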
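Memo
Received: 2018-06-01.
Funding: Hebei Province Science and Technology Program (17210305D); Tianjin Science and Technology Program (16ZXHLSF0023); Natural Science Foundation of Tianjin (15JCQNJC00600).
About the authors: GU Junhua, male, born in 1966, professor and doctoral supervisor, CCF member, executive council member of the Chinese Society of Discrete Mathematics, and vice chairman of the Hebei Computer Society. His main research interests include data mining and intelligent information processing. He has completed more than 30 research projects and published more than 50 academic papers. XIE Zhijian, male, born in 1995, master's student; his main research interests are data mining and machine learning. WU Junyan, female, born in 1994, master's student; her main research interests are data mining and computer simulation.
Corresponding author: ZHANG Suqi. E-mail: zhangsuqie@163.com
Last Update: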
2019-08-25