[1]ZHANG Sen,ZHANG Chen,LIN Peiguang,et al.Web search topic analysis based on user search query logs[J].CAAI Transactions on Intelligent Systems,2017,12(5):668-677.[doi:10.11992/tis.201706096]
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
12
Issue:
2017, No. 5
Page number:
668-677
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2017-10-25
- Title:
-
Web search topic analysis based on user search query logs
- Author(s):
-
ZHANG Sen1; ZHANG Chen1,2; LIN Peiguang1; ZHANG Chunyun1; GUO Yuchao1; REN Weilong1; REN Ke2
-
1. School of Computer Science & Technology, Shandong University of Finance & Economics, Jinan 250014, China;
2. Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China
-
- Keywords:
-
web search; search engine; natural language processing; topic model; data mining; burstiness; temporal analysis; parameter estimate
- CLC:
-
TP391
- DOI:
-
10.11992/tis.201706096
- Abstract:
-
Web search analysis plays a critical role in improving the performance of contemporary search engines, and search engine accuracy can be improved further by analyzing the search properties of individual users. Most existing models, such as the click graph and its variants, focus on the common characteristics of the group. As yet, however, there has been little investigation of models that capture both the collective characteristics of the group and the unique characteristics of individual users. In this paper, we investigate user-specific web search analysis, obtaining the topic distributions of individual users' search queries by determining the burstiness of their searches. We propose two topic models: the search burstiness model (SBM) and the coupling-sensitive search burstiness model (CS-SBM). The SBM assumes that the query words and URL are topically independent, whereas the CS-SBM assumes that they are topically relevant. The obtained topic distribution information is stored in skewed Dirichlet priors, and a beta distribution is used to capture the temporal properties of the user searches. Our experimental results show that each user’s web search trail has unique characteristics and that, on a large amount of real query log data, our proposed models have advantages over the latent Dirichlet allocation (LDA) and topics over time (TOT) models in generalization performance and effectively describe the temporal change process of user search queries.
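
To make the temporal component concrete, the following is a minimal sketch of how a beta distribution might be fitted to the timestamps of one user's queries on a single topic. The abstract does not state the estimation procedure, so the moment-matching fit shown here is an assumption, and the function names `normalize_times` and `fit_beta_moments` are hypothetical.

```python
from statistics import mean, pvariance


def normalize_times(times, eps=1e-3):
    """Rescale raw timestamps into the open interval (0, 1).

    The eps margin keeps values off the endpoints, where the beta
    density can be zero or unbounded. (Illustrative choice only.)
    """
    lo, hi = min(times), max(times)
    span = hi - lo
    if span == 0:
        raise ValueError("need at least two distinct timestamps")
    return [eps + (1.0 - 2.0 * eps) * (t - lo) / span for t in times]


def fit_beta_moments(xs):
    """Method-of-moments estimate of Beta(a, b) for values in (0, 1)."""
    m = mean(xs)
    v = pvariance(xs)
    # A beta distribution requires 0 < v < m * (1 - m).
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample moments are not compatible with a beta fit")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Queries spread evenly over the window give a symmetric fit.
a_sym, b_sym = fit_beta_moments([0.2, 0.4, 0.6, 0.8])

# Queries bunched early in the window (a burst) give a < b,
# so the fitted density places most of its mass near the start.
a_burst, b_burst = fit_beta_moments([0.05, 0.1, 0.1, 0.15, 0.2, 0.6])
```

Under this sketch, a burst of searches early in the observation window yields a fitted density skewed toward 0, while a burst late in the window yields one skewed toward 1. That skew is the kind of temporal signal a beta distribution can capture.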