[1]WANG Ding,MEN Changqian,WANG Wenjian.A kernel contextual bandit recommendation algorithm[J].CAAI Transactions on Intelligent Systems,2022,17(3):625-633.[doi:10.11992/tis.202105039]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 17
Issue: 2022(3)
Pages: 625-633
Column: Artificial Intelligence Deans Forum (人工智能院长论坛)
Publication date: 2022-05-05
- Title:
-
A kernel contextual bandit recommendation algorithm
- Author(s):
-
WANG Ding1; MEN Changqian1; WANG Wenjian1,2
-
1. College of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
-
- Keywords:
-
personalized recommendation; changing scenarios; multi-armed bandits; linear contextual bandits; kernel method; click-through rate; nonlinear; exploration-exploitation dilemma
- CLC:
-
TP181
- DOI:
-
10.11992/tis.202105039
- Abstract:
-
Personalized recommendation is becoming increasingly important in the Internet era; however, conventional recommendation algorithms cannot adapt to rapidly changing scenarios. Applying the linear contextual bandit algorithm (linear upper confidence bound, LinUCB) to personalized recommendation can effectively overcome the limitations of conventional recommendation algorithms, but its accuracy is not sufficiently high. Herein, an improved kernel upper confidence bound (K-UCB) algorithm is proposed to address the insufficient recommendation accuracy of the LinUCB algorithm. The proposed algorithm removes the restrictive linearity assumption of the LinUCB algorithm and uses the kernel method to fit the nonlinear relation between the expected reward and the context. A new method for calculating the upper confidence bound of the estimated reward on nonlinear data is established to balance exploration and exploitation in the recommendation process. Experiments show that the proposed K-UCB algorithm achieves higher recommendation accuracy than other recommendation algorithms based on multi-armed bandits and better meets the need for personalized recommendation in changing scenarios.
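To make the abstract's idea concrete, the sketch below shows a generic kernelized UCB recommendation step, not the paper's exact K-UCB: expected rewards are fitted by kernel ridge regression on past (context, reward) pairs, an uncertainty bonus derived from the kernel matrix drives exploration, and the item whose upper confidence bound is highest is recommended. The RBF kernel, the parameters alpha, lam, and gamma, and the class name KernelUCB are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel between two batches of context vectors (assumed choice).
    d = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d)

class KernelUCB:
    """Illustrative kernelized UCB for contextual bandit recommendation.

    Each round the learner sees one context vector per candidate item,
    scores every item by a kernel-ridge estimate of its expected reward
    plus an uncertainty bonus, recommends the best-scoring item, and
    updates with the observed reward (e.g. click / no-click).
    """

    def __init__(self, alpha=1.0, lam=1.0, gamma=1.0):
        self.alpha = alpha      # exploration strength
        self.lam = lam          # ridge regularization
        self.gamma = gamma      # RBF kernel bandwidth
        self.X = None           # contexts of items played so far, shape (t, d)
        self.y = None           # observed rewards, shape (t,)

    def _scores(self, contexts):
        contexts = np.atleast_2d(contexts)
        if self.X is None:
            # No data yet: every item is equally unknown.
            return np.zeros(len(contexts)), np.ones(len(contexts))
        K_inv = np.linalg.inv(rbf_kernel(self.X, self.X, self.gamma)
                              + self.lam * np.eye(len(self.X)))
        k_star = rbf_kernel(contexts, self.X, self.gamma)      # (n_items, t)
        mu = k_star @ K_inv @ self.y                            # estimated expected reward
        # Kernel-induced predictive variance; k(x, x) = 1 for the RBF kernel.
        var = np.maximum(1.0 - np.sum((k_star @ K_inv) * k_star, axis=1), 0.0)
        return mu, np.sqrt(var)

    def select(self, contexts):
        # Upper confidence bound: estimated reward + exploration bonus.
        mu, sigma = self._scores(contexts)
        return int(np.argmax(mu + self.alpha * sigma))

    def update(self, context, reward):
        context = np.atleast_2d(context)
        if self.X is None:
            self.X, self.y = context, np.array([reward], dtype=float)
        else:
            self.X = np.vstack([self.X, context])
            self.y = np.append(self.y, reward)
```

A typical recommendation loop would call select() with one context vector per candidate item, display the chosen item, observe the click/no-click reward, and pass that pair to update(); the exploration term shrinks for contexts similar to those already observed, which is how the algorithm balances exploration and exploitation.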