[1] LIU Bo, YANG Lu-ming, DENG Yun-long. An intelligence data cleaning strategy for XML database using PSO[J]. CAAI Transactions on Intelligent Systems, 2008, 3(03): 226-234.

An intelligence data cleaning strategy for XML database using PSO

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
3
Issue:
2008, No. 03
Pages:
226-234
Section:
Publication date:
2008-06-25

Article Info

Title:
An intelligence data cleaning strategy for XML database using PSO
Article ID:
1673-4785(2008)03-0226-08
Author(s):
LIU Bo1; YANG Lu-ming1; DENG Yun-long2
1. College of Information Science and Engineering, Central South University, Changsha 410083, Hunan, China;
2. The Third Xiangya Hospital, Central South University, Changsha 410013, Hunan, China
Keywords:
XML key; particle swarm optimization; data cleaning; hidden Markov model
CLC number:
TP311.13
Document code:
A
Abstract:
To improve XML data quality, this paper proposes a new XML data cleaning method that is based on XML keys and combines a multi-template hidden Markov model (HMM) information extraction strategy with particle swarm optimization (PSO). To improve the efficiency of parallel detection of similar XML records, a wave function is used to optimize the PSO algorithm. A series of simulations indicated that, compared with other XML data cleaning algorithms, the improved method has a stronger adaptive learning capability, requires less human involvement and less computation, and improves time performance by about 94%.
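
For readers unfamiliar with PSO, the following is a minimal sketch of the textbook global-best form of the algorithm. It is not the wave-function variant described in the abstract, whose details are not given on this page. The function name `pso_minimize`, the parameter values (`w`, `c1`, `c2`, swarm size, iteration count), and the `sphere` test objective are all illustrative assumptions rather than values from the paper.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO (hypothetical helper, not the paper's variant)."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Illustrative objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a data cleaning setting the objective would presumably score candidate parameters (for example, similarity thresholds for duplicate detection) instead of a toy function, but that mapping is an assumption here and is not taken from the paper.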

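Memo

Received: 2007-06-25
Funding: Science and Technology Innovation Project of Hunan Information Vocational College (108652006011); Scientific Research Fund of the Hunan Provincial Education Department (05c671)
About the authors:
LIU Bo, male, born in 1969, doctoral candidate. His main research interests are software engineering and database technology.
YANG Lu-ming, male, born in 1947, professor and doctoral supervisor. His main research interests are information systems and database technology. He has published more than 40 academic papers.
DENG Yun-long, male, born in 1962, professor, doctoral supervisor, Ph.D. His main research interest is software psychology. He was selected as an expert in the first cohort of Hunan Province's New Century "121" Talent Project, has received a number of research and teaching awards, and has published more than 50 academic papers.
Corresponding author: LIU Bo, E-mail: ltbo99@yahoo.com.cn
Last Update: 2009-05-14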