[1]WANG Ding,MEN Changqian,WANG Wenjian.A kernel contextual bandit recommendation algorithm[J].CAAI Transactions on Intelligent Systems,2022,17(3):625-633.[doi:10.11992/tis.202105039]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 17
Issue: 2022(3)
Pages: 625-633
Column: Artificial Intelligence Deans Forum (人工智能院长论坛)
Publication date: 2022-05-05
- Title:
-
A kernel contextual bandit recommendation algorithm
- Author(s):
-
WANG Ding1; MEN Changqian1; WANG Wenjian1,2
-
1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
-
- Keywords:
-
personalized recommendation; changing scenarios; multi-armed bandits; linear contextual bandits; kernel method; click-through rate; nonlinear; exploration-exploitation dilemma
- CLC:
-
TP181
- DOI:
-
10.11992/tis.202105039
- Abstract:
-
Personalized recommendation is becoming increasingly important in the Internet era; however, conventional recommendation algorithms cannot adapt to rapidly changing scenarios. Applying the linear contextual bandit algorithm (linear upper confidence bound, LinUCB) to personalized recommendation can effectively overcome the limitations of conventional recommendation algorithms, but its accuracy is not sufficiently high. Herein, an improved kernel upper confidence bound (K-UCB) algorithm is proposed to address the insufficient recommendation accuracy of the LinUCB algorithm. The proposed algorithm removes the restrictive linearity assumption of the LinUCB algorithm and uses the kernel method to fit the nonlinear relation between the expected reward and the context. A new method for calculating the upper confidence bound of the estimated reward on nonlinear data is established to balance exploration and exploitation in the recommendation process. Experiments show that the proposed K-UCB algorithm achieves higher recommendation accuracy than other recommendation algorithms based on multi-armed bandits and better meets the need for personalized recommendation in changing scenarios.
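To make the abstract's idea concrete, the sketch below shows a generic kernelized UCB recommendation step, not the paper's exact K-UCB: expected rewards are fitted by kernel ridge regression on past (context, reward) pairs, an uncertainty bonus derived from the kernel matrix drives exploration, and the item whose upper confidence bound is highest is recommended. The RBF kernel, the parameters alpha, lam, and gamma, and the class name KernelUCB are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel between two batches of context vectors (assumed choice).
    d = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d)

class KernelUCB:
    """Illustrative kernelized UCB for contextual bandit recommendation.

    Each round the learner sees one context vector per candidate item,
    scores every item by a kernel-ridge estimate of its expected reward
    plus an uncertainty bonus, recommends the best-scoring item, and
    updates with the observed reward (e.g. click / no-click).
    """

    def __init__(self, alpha=1.0, lam=1.0, gamma=1.0):
        self.alpha = alpha      # exploration strength
        self.lam = lam          # ridge regularization
        self.gamma = gamma      # RBF kernel bandwidth
        self.X = None           # contexts of items played so far, shape (t, d)
        self.y = None           # observed rewards, shape (t,)

    def _scores(self, contexts):
        contexts = np.atleast_2d(contexts)
        if self.X is None:
            # No data yet: every item is equally unknown.
            return np.zeros(len(contexts)), np.ones(len(contexts))
        K_inv = np.linalg.inv(rbf_kernel(self.X, self.X, self.gamma)
                              + self.lam * np.eye(len(self.X)))
        k_star = rbf_kernel(contexts, self.X, self.gamma)      # (n_items, t)
        mu = k_star @ K_inv @ self.y                            # estimated expected reward
        # Kernel-induced predictive variance; k(x, x) = 1 for the RBF kernel.
        var = np.maximum(1.0 - np.sum((k_star @ K_inv) * k_star, axis=1), 0.0)
        return mu, np.sqrt(var)

    def select(self, contexts):
        # Upper confidence bound: estimated reward + exploration bonus.
        mu, sigma = self._scores(contexts)
        return int(np.argmax(mu + self.alpha * sigma))

    def update(self, context, reward):
        context = np.atleast_2d(context)
        if self.X is None:
            self.X, self.y = context, np.array([reward], dtype=float)
        else:
            self.X = np.vstack([self.X, context])
            self.y = np.append(self.y, reward)
```

A typical recommendation loop would call select() with one context vector per candidate item, display the chosen item, observe the click/no-click reward, and pass that pair to update(); the exploration term shrinks for contexts similar to those already observed, which is how the algorithm balances exploration and exploitation.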