[1] YANG Xiao, MA Jun, YANG Tong-feng, et al. Automatic multidocument summarization based on the latent Dirichlet topic allocation model[J]. CAAI Transactions on Intelligent Systems, 2010, 5(2): 169-176.
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
5
Issue:
2010, No. 2
Page number:
169-176
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2010-04-25
- Title:
-
Automatic multidocument summarization based on the latent Dirichlet topic allocation model
- Author(s):
-
YANG Xiao1; MA Jun2; YANG Tong-feng2; DU Yan-qi2; SHAO Hai-min2
-
1. School of Information Management, Shandong Economic University, Ji’nan 250014, China;
2. School of Computer Science and Technology, Shandong University, Ji’nan 250101, China
-
- Keywords:
-
multidocument summarization; sentence scoring; topic model; latent Dirichlet allocation; number of topics
- CLC:
-
TP391
- DOI:
-
-
- Abstract:
-
The problem of multidocument summarization using probabilistic topic models has begun to receive considerable attention. A multidocument summarization method was proposed based on the latent Dirichlet allocation (LDA) model, itself a representative probabilistic generative topic model. In this method, the number of topics in the LDA model was determined by model perplexity, and the probabilistic sentence distribution over topics and the probabilistic topic distribution over words were obtained by Gibbs sampling. The importance of each topic was determined by the sum of its weights over all sentences. Two sentence-scoring methods were proposed, one based on sentence distribution and the other on topic distribution. Evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, both proposed methods outperformed the state-of-the-art SumBasic system and two other LDA-based summarization systems on all ROUGE scores on the DUC2002 generic multidocument summarization test set.
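The topic-importance step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the matrix `theta` (one row per sentence, one column per topic) stands in for the sentence-topic distribution that Gibbs sampling would produce, the normalization of topic importance is a convenience, and scoring a sentence by the importance-weighted sum of its topic probabilities is only one plausible reading of a sentence-distribution-based score, not the authors' published method. The function names are hypothetical.

```python
def topic_importance(theta):
    """Sum each topic's weight over all sentences, then normalize to sum to 1.

    theta: list of rows, one per sentence; each row is that sentence's
    probability distribution over topics (assumed input, e.g. from LDA).
    """
    num_topics = len(theta[0])
    totals = [sum(row[k] for row in theta) for k in range(num_topics)]
    norm = sum(totals)
    return [t / norm for t in totals]


def score_sentences(theta, importance):
    """Assumed scoring rule: importance-weighted sum of topic probabilities."""
    return [sum(w * p for w, p in zip(importance, row)) for row in theta]


# Toy sentence-topic distribution: 3 sentences, 2 topics (made-up values).
theta = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.2, 0.8],
]

importance = topic_importance(theta)
scores = score_sentences(theta, importance)

# Sentence indices ordered from highest to lowest score.
ranking = sorted(range(len(theta)), key=lambda i: scores[i], reverse=True)
```

In this toy input the first topic carries more total weight across sentences, so sentences concentrated on it rank higher. A summarizer would then pick top-ranked sentences up to a length limit, typically with some redundancy control that is not shown here.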