
[1] WANG Ding, MEN Changqian, WANG Wenjian. A kernel contextual bandit recommendation algorithm[J]. CAAI Transactions on Intelligent Systems, 2022, 17(3): 625-633. [doi:10.11992/tis.202105039]


A kernel contextual bandit recommendation algorithm


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 17 Issue: No. 3, 2022 Pages: 625-633 Column: Artificial Intelligence Deans' Forum Publication date: 2022-05-05

Title:: A kernel contextual bandit recommendation algorithm

Author(s):: WANG Ding¹, MEN Changqian¹, WANG Wenjian¹,²; 1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China

Keywords:: personalized recommendation; changing scenarios; multi-armed bandits; linear contextual bandits; kernel method; click-through rate; nonlinear; exploration-exploitation dilemma

CLC number:: TP181

DOI:: 10.11992/tis.202105039

Abstract:: Personalized recommendation services are increasingly important in the Internet era, but conventional recommendation algorithms adapt poorly to highly changing scenarios. Applying the linear contextual bandit algorithm LinUCB (linear upper confidence bound) to personalized recommendation can effectively mitigate these limitations, but its accuracy is not high. To address the low recommendation accuracy of LinUCB, this paper proposes an improved algorithm, K-UCB (kernel upper confidence bound). The algorithm drops the unreasonable linearity assumption of LinUCB and uses the kernel method to fit the nonlinear relation between the predicted reward and the context, yielding a new method for computing the upper confidence bound of the predicted reward on nonlinear data, which is used to resolve the exploration-exploitation dilemma in the recommendation process. Experiments show that the proposed K-UCB algorithm achieves a higher click-through rate (CTR) than other recommendation algorithms based on multi-armed bandits and better meets the needs of personalized recommendation in changing scenarios.
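
The idea summarized in the abstract (replace the linear reward model of LinUCB with a kernel model and still rank arms by an upper confidence bound) can be illustrated with a generic kernelized UCB score. The sketch below is not the authors' K-UCB implementation: it assumes one kernel ridge regression per arm, an RBF kernel, and hypothetical parameter names (`alpha` for the exploration weight, `lam` for the ridge term, `gamma` for the kernel width) chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_ucb_score(X_hist, y_hist, x_new, alpha=1.0, lam=1.0, gamma=1.0):
    """Upper confidence bound for one arm, from kernel ridge regression.

    X_hist: (n, d) contexts observed when this arm was shown
    y_hist: (n,) observed rewards (e.g. 1 = click, 0 = no click)
    x_new:  (d,) current context
    """
    X_hist = np.asarray(X_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
    k_xx = rbf_kernel(x_new, x_new, gamma)[0, 0]

    if len(y_hist) == 0:
        # No observations yet: zero mean, width given by the prior term only.
        return alpha * np.sqrt(k_xx / lam)

    K = rbf_kernel(X_hist, X_hist, gamma)            # (n, n) Gram matrix
    k_x = rbf_kernel(X_hist, x_new, gamma)[:, 0]     # (n,) kernel vector
    A = K + lam * np.eye(len(y_hist))

    mean = k_x @ np.linalg.solve(A, y_hist)          # predicted reward
    # Kernel form of the LinUCB width x^T (Phi^T Phi + lam I)^{-1} x.
    var = (k_xx - k_x @ np.linalg.solve(A, k_x)) / lam
    return mean + alpha * np.sqrt(max(var, 0.0))


def choose_arm(histories, x_new, **params):
    """Pick the arm with the highest score; histories is a list of (X, y)."""
    scores = [kernel_ucb_score(X, y, x_new, **params) for X, y in histories]
    return int(np.argmax(scores))
```

In this sketch the confidence width shrinks for contexts close to ones already observed for an arm and returns to its prior value for unfamiliar contexts, which is what drives exploration. Re-solving the linear system at every step costs cubic time in the number of observations; a practical version would update the inverse incrementally.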



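Memo

Received: 2021-05-26.
Funding: National Natural Science Foundation of China (62076154, U1805263); Central Government Guided Local Science and Technology Development Fund (YDZX20201400001224); Natural Science Foundation of Shanxi Province (201901D111030); Key Research and Development Program of Shanxi Province for International Science and Technology Cooperation (201903D421050).
About the authors: WANG Ding, master's student; main research interest: machine learning. MEN Changqian, lecturer; main research interests: support vector machines, machine learning theory, and kernel methods. WANG Wenjian, professor and doctoral supervisor, dean of the College of Computer and Information Technology, Shanxi University; main research interests: computational intelligence, machine learning, and data mining. She has led 4 National Natural Science Foundation of China projects and published more than 150 academic papers.
Corresponding author: WANG Wenjian. E-mail: wjwang@sxu.edu.cn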

Last Update: 1900-01-01
