[1] YUAN Ding-rong (袁鼎荣), ZHONG Ning (钟宁). Initiative retrieval of web information[J]. CAAI Transactions on Intelligent Systems, 2010, 5(02): 112-116.

Initiative retrieval of web information (Web页面信息主动检索模型)

CAAI Transactions on Intelligent Systems (智能系统学报) [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 5
Issue:
2010, No. 02
Pages:
112-116
Column:
Publication date:
2010-04-25

Article Info

Title:
Initiative retrieval of web information
Article ID:
1673-4785(2010)02-0112-05
Author(s):
YUAN Ding-rong (袁鼎荣) 1,2; ZHONG Ning (钟宁) 1
1. The International WIC Institute, Beijing University of Technology, Beijing 100022, China;
2. College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
Keywords:
Web block; page information tree; user characteristics tree; initiative retrieval
CLC number:
TP301.6
Document code:
A
Abstract:
A single web page carries far more information than any particular user needs. To retrieve the information a particular user is interested in quickly and accurately from the current page, the authors propose an initiative retrieval model for web page information. In this model, the current web page is first transformed into a page information tree according to the characteristics of its blocks. A user characteristics tree is then constructed from the user's past browsing behavior and mined to produce the set of information the user needs. Finally, the needed information is retrieved from the current page to obtain the user's interest information set. The basic principles of initiative retrieval are detailed, the corresponding algorithms are described, and the model's feasibility is verified experimentally.
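The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `Node` class, the function names, the whitespace-level tokenizer, and the frequency threshold `min_count` are all assumptions made for illustration, and simple term counting stands in for whatever mining procedure the paper actually uses.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a label, the text it holds, and its children."""
    label: str
    text: str = ""
    children: list[Node] = field(default_factory=list)

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def tokens(text):
    """Lowercase alphabetic tokens (an assumed, very crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_user_tree(history):
    """Group previously browsed (topic, text) pairs under one node per topic."""
    root = Node("user")
    by_topic = {}
    for topic, text in history:
        node = by_topic.get(topic)
        if node is None:
            node = by_topic[topic] = Node(topic)
            root.children.append(node)
        node.text += " " + text
    return root


def mine_interest_terms(user_tree, min_count=2):
    """Keep terms seen at least min_count times in the user tree (assumed rule)."""
    counts = Counter()
    for node in user_tree.walk():
        counts.update(tokens(node.text))
    return {term for term, count in counts.items() if count >= min_count}


def retrieve(page_tree, interest_terms):
    """Return labels of leaf blocks that share at least one interest term."""
    return [
        node.label
        for node in page_tree.walk()
        if not node.children and interest_terms & set(tokens(node.text))
    ]


# Toy page information tree: one leaf per page block.
page = Node("page", children=[
    Node("nav", "home login register"),
    Node("sports", "football league results"),
    Node("tech", "python data mining tutorial"),
])
# Toy browsing history used to build the user characteristics tree.
history = [
    ("tech", "data mining with python"),
    ("tech", "mining web data"),
    ("news", "weather today"),
]
terms = mine_interest_terms(build_user_tree(history))
print(sorted(terms), retrieve(page, terms))
```

In this toy run only the terms that recur in the browsing history survive the threshold, so only the block that mentions them is returned. A realistic version would need stop-word handling, Chinese word segmentation, and block weighting, none of which is specified in the abstract.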


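Memo

Received: 2009-12-04.
Funding:
National Natural Science Foundation of China, Major Research Plan (90718020);
Australian Research Council Discovery Grant (DP0667060).
Corresponding author: YUAN Ding-rong. E-mail: dryuan@mailbox.gxnu.edu.cn.
About the authors:
YUAN Ding-rong, male, born in 1967, associate professor. His main research interests include text information processing, web intelligence, machine learning, and data mining. He has led or been a principal participant in 4 national or provincial and ministerial projects and has published more than 20 academic papers.
ZHONG Ning, male, born in 1956, professor and doctoral supervisor. His main research interests include web intelligence, knowledge discovery and data mining, rough sets and soft computing, intelligent agent technology and applications, and brain informatics. He has published many academic papers.
Last Update: 2010-05-24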