[1] ZHOU Yipeng, DU Junping. Semantic tagging of a topic model based on associated words[J]. CAAI Transactions on Intelligent Systems, 2012, 7(4): 327-332.
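CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
7
Issue:
2012, No. 4
Pages:
327-332
Section:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2012-08-25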
- Title:
-
Semantic tagging of a topic model based on associated words
- Article ID:
-
1673-4785(2012)04-0327-06
- Author(s):
-
ZHOU Yipeng (周亦鹏)1, DU Junping (杜军平)2
-
1. School of Computer Science and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
2. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
-
- Keywords:
-
topic analysis; semantic tagging; generative model; associated words; association rule
- Classification code:
-
TP391
- Document code:
-
A
- Abstract:
-
Probabilistic topic models are widely used to describe topics in Internet topic analysis, but the semantics of such a model are difficult for general users to understand. This paper proposes an automatic semantic tagging method for probabilistic topic models. First, an association rule mining algorithm based on semantic categories extracts associated topic words, which form the candidate tag set. Second, a relevance function built on the probability distribution of the associated words in the data set computes the semantic relevance between each candidate tag and the topic model. Finally, a maximal marginal relevance method selects tags with high semantic coverage and discrimination. Experiments on tagging topic models in the food safety and tourism domains show that, compared with tagging by maximum-probability topic words, the proposed method clearly improves tagging accuracy. It also resolves the problem of multi-tag labeling being confined to a single semantic category, so that a smaller number of tags expresses richer semantics. This supports more accurate topic tracking and topic information retrieval.
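The final selection step named in the abstract, maximal marginal relevance (MMR), can be illustrated with a minimal Python sketch. The abstract does not give the paper's relevance function or its tag-to-tag similarity, so everything concrete here is an assumption: the relevance scores are hypothetical toy values, Jaccard overlap of word sets stands in for tag similarity, and the names `mmr_select`, `jaccard`, and the trade-off weight `lam` are illustrative only.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Word-set overlap; an assumed stand-in for tag-to-tag similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(
    relevance: Dict[str, float],
    words: Dict[str, FrozenSet[str]],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick up to k tags, trading relevance against redundancy."""
    selected: List[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < k:

        def marginal(tag: str) -> float:
            # Redundancy is the highest similarity to any tag already chosen.
            redundancy = max(
                (jaccard(words[tag], words[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[tag] - (1.0 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidate tags and relevance scores (not from the paper).
relevance = {
    "milk powder melamine": 0.9,
    "melamine milk": 0.85,
    "product recall": 0.6,
}
words = {tag: frozenset(tag.split()) for tag in relevance}

diverse = mmr_select(relevance, words, k=2, lam=0.5)
by_relevance_only = mmr_select(relevance, words, k=2, lam=1.0)
```

With `lam=0.5` the second pick is "product recall" rather than the near-duplicate "melamine milk", which is the behavior the abstract describes as avoiding a single semantic category. With `lam=1.0` the redundancy term vanishes and the selection falls back to ranking by relevance alone.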
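Memo
Received: 2012-04-20.
Published online: 2012-07-12.
Foundation items: National "973" Program of China (2012CB821206); National Natural Science Foundation of China (91024001, 61070142); Beijing Natural Science Foundation (4111002).
Corresponding author: ZHOU Yipeng.
E-mail: yipengzhou@163.com.
About the authors:
ZHOU Yipeng, male, born in 1976, lecturer. His main research interests are artificial intelligence and Web mining.
DU Junping, female, born in 1963, professor and doctoral supervisor. Her main research interests are artificial intelligence and data mining. She has led a number of projects funded by the National "863" and "973" Programs, the National Natural Science Foundation of China, and the Beijing Natural Science Foundation, and has published more than 150 academic papers.
Last Update: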
2012-09-27