[1] ZHOU Yipeng, DU Junping. Semantic tagging of a topic model based on associated words[J]. CAAI Transactions on Intelligent Systems, 2012, 7(04): 327-332.

Semantic tagging of a topic model based on associated words

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 7
Issue:
2012, No. 04
Pages:
327-332
Section:
Publication date:
2012-08-25

Article Information

Title:
Semantic tagging of a topic model based on associated words
Article ID:
1673-4785(2012)04-0327-06
Author(s):
ZHOU Yipeng (1), DU Junping (2)
1. School of Computer Science and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
2. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
Keywords:
topic analysis; semantic tagging; generative model; associated words; association rule
CLC number:
TP391
Document code:
A
Abstract:
Probabilistic topic models are widely used to describe topics in Internet topic analysis, but the resulting models are difficult for general users to interpret. This paper proposes an automatic semantic tagging method for probabilistic topic models. First, association rule mining based on semantic categories is used to find associated topic words, which form a candidate tag set. Next, a relevance function built on the probability distribution of the associated words in the data set computes the relevance between each candidate tag and the topic model. Finally, maximal marginal relevance is used to select tags with high semantic coverage and discrimination. Experiments on tagging topic models in the food safety and tourism domains show that, compared with tagging by maximum-probability topic words, the proposed method clearly improves tagging accuracy and resolves the problem of a single semantic category in multi-tag labeling, expressing richer semantics with fewer tags. This helps achieve more accurate topic tracking and topic information retrieval.
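
The final selection step named in the abstract, maximal marginal relevance (MMR), can be illustrated with a minimal Python sketch. This is the generic greedy MMR procedure, not the authors' implementation: the abstract does not give the paper's relevance function, similarity measure, or trade-off weight, so the function name `select_tags_mmr`, the parameter `lam`, the category-based similarity, and all tags and scores below are hypothetical values chosen only for illustration.

```python
from typing import Callable, Dict, List


def select_tags_mmr(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick up to k tags, trading relevance against redundancy.

    relevance:  tag -> relevance to the topic model (assumed precomputed).
    similarity: pairwise tag similarity in [0, 1] (assumed given).
    lam:        weight on relevance; (1 - lam) weights the redundancy penalty.
    """
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def score(tag: str) -> float:
            # Redundancy is the highest similarity to any already chosen tag.
            redundancy = max(
                (similarity(tag, chosen) for chosen in selected), default=0.0
            )
            return lam * relevance[tag] - (1.0 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: relevance scores and semantic categories are invented.
toy_relevance = {
    "milk powder": 0.90,
    "infant formula": 0.85,
    "melamine": 0.80,
    "recall": 0.60,
}
toy_category = {
    "milk powder": "product",
    "infant formula": "product",
    "melamine": "substance",
    "recall": "event",
}


def toy_similarity(a: str, b: str) -> float:
    # Assumption: tags in the same semantic category count as fully redundant.
    return 1.0 if toy_category[a] == toy_category[b] else 0.0


print(select_tags_mmr(toy_relevance, toy_similarity, k=3))
# ['milk powder', 'melamine', 'recall']
```

On this toy data, ranking by relevance alone would return two tags from the same category ("milk powder" and "infant formula"), whereas the MMR selection covers three different categories with the same number of tags, which is the kind of coverage the abstract describes.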

Memo

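Memo:
Received: 2012-04-20.
Published online: 2012-07-12.
Funding: National Basic Research Program of China ("973" Program) (2012CB821206); National Natural Science Foundation of China (91024001, 61070142); Beijing Natural Science Foundation (4111002).
Corresponding author: ZHOU Yipeng.
E-mail: yipengzhou@163.com.
About the authors:
ZHOU Yipeng, male, born in 1976, lecturer. His main research interests are artificial intelligence and Web mining.
DU Junping, female, born in 1963, professor and doctoral supervisor. Her main research interests are artificial intelligence and data mining. She has undertaken a number of projects funded by the national "863" and "973" programs, the National Natural Science Foundation of China, and the Beijing Natural Science Foundation, and has published more than 150 academic papers.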
Last Update: 2012-09-27