[1] CHEN Jie, SUN Zhong-gui, ZHOU Shu-feng. Applying image classification using wavelets to digitization of document information [J]. CAAI Transactions on Intelligent Systems, 2010, 5(2): 185-188.
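CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
5
Issue:
No. 2, 2010
Pages:
185-188
Section:
Research papers: natural language processing and understanding
Publication date:
2010-04-25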
- Title:
-
Applying image classification using wavelets to digitization of document information
- Article ID:
-
1673-4785(2010)02-0185-04
- Author(s):
-
CHEN Jie1, SUN Zhong-gui2, ZHOU Shu-feng2
-
1. Library of Liaocheng University, Liaocheng, Shandong 252059, China;
2. College of Mathematics Science, Liaocheng University, Liaocheng, Shandong 252059, China
-
- Keywords:
-
digitized documents; OCR; wavelet; text image
- Classification code:
-
TP18; TN911.72
- Document code:
-
A
- Abstract:
-
Optical character recognition (OCR) technology still distinguishes text regions from image regions with limited accuracy, which lowers the efficiency of OCR in digitizing document information. After analyzing the main steps of OCR as applied in a digital library, the authors propose a wavelet-based method for separating text from images. The method first applies a wavelet decomposition to the scanned region, then constructs a decomposition energy from the resulting coefficients, and finally classifies the region as text or image according to the magnitude of that energy. The results indicate that the method separates text from images well, runs fast and automatically, and reduces the manual intervention needed when OCR is used for digitization, which helps raise the level of automation of the digitization process. Simulation experiments verify the effectiveness of the method.
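The three steps named in the abstract (wavelet decomposition, an energy built from the coefficients, and a decision based on that energy) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which wavelet, which energy definition, or which decision rule the paper uses. The sketch assumes a single-level 2D Haar transform, defines energy as the mean squared detail coefficient, and assumes that text blocks have higher detail energy than smooth image blocks. The function names and the threshold value are hypothetical.

```python
def haar2d(block):
    """Single-level 2D Haar transform of a 2D list with even dimensions.

    Returns four sub-bands: approximation, horizontal difference,
    vertical difference, diagonal difference.
    """
    rows, cols = len(block), len(block[0])
    if rows % 2 or cols % 2:
        raise ValueError("block dimensions must be even")
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)  # local average
            h_row.append((a - b + d - e) / 2.0)  # change along a row
            v_row.append((a + b - d - e) / 2.0)  # change along a column
            d_row.append((a - b - d + e) / 2.0)  # diagonal change
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def detail_energy(block):
    """Mean squared value of the three detail sub-bands (assumed definition)."""
    _, horiz, vert, diag = haar2d(block)
    total, count = 0.0, 0
    for band in (horiz, vert, diag):
        for row in band:
            for value in row:
                total += value * value
                count += 1
    return total / count


def classify_block(block, threshold=500.0):
    """Label a grayscale block; threshold and direction are assumptions."""
    return "text" if detail_energy(block) > threshold else "image"


# Synthetic 8x8 blocks: sharp vertical strokes versus a smooth gradient.
strokes = [[255 if c % 2 else 0 for c in range(8)] for _ in range(8)]
gradient = [[r + c for c in range(8)] for r in range(8)]

print(classify_block(strokes))
print(classify_block(gradient))
```

In practice the threshold would have to be tuned on scanned pages, and a library such as PyWavelets could replace the hand-written Haar step.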
Last Update:
2010-05-24