
[1]陈杰,孙忠贵,周书锋.小波的文本图像区分及其在文献信息数字化中的应用[J].智能系统学报,2010,5(2):185-188.
　CHEN Jie,SUN Zhong-gui,ZHOU Shu-feng.Applying image classification using wavelets to digitization of document information[J].CAAI Transactions on Intelligent Systems,2010,5(2):185-188.


Applying image classification using wavelets to digitization of document information


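CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]; Volume: 5; Issue: 2010, No. 2; Pages: 185-188; Section: Academic Papers (Natural Language Processing and Understanding); Publication date: 2010-04-25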

Title:: Applying image classification using wavelets to digitization of document information

Article ID:: 1673-4785(2010)02-0185-04

作者:: 陈杰¹，孙忠贵²，周书锋²; 1.聊城大学图书馆，山东聊城 252059；
?2.聊城大学数学科学学院，山东聊城 252059

Author(s):: CHEN Jie^1, SUN Zhong-gui²，ZHOU Shu-feng²; 1. Library of Liaocheng University, Liaocheng 252059, China；
2.College of Mathematics Science, Liaocheng University, Liaocheng 252059, China

Keywords:: document digitization; OCR; wavelet; text image

CLC number:: TP18; TN911.72

Document code:: A

Abstract:: Current optical character recognition (OCR) technology does not yet distinguish text regions from image regions accurately enough, which reduces the efficiency of OCR in the digitization of document information. After analyzing the main steps of applying OCR in a digital library, the authors propose a wavelet-based method for distinguishing text from images. The method first decomposes the scanned region with a wavelet transform, then computes a decomposition energy from the wavelet coefficients, and finally classifies the region as text or image according to the magnitude of that energy. The results show that the method distinguishes text from images well and is fast and automatic: it reduces manual intervention when OCR is used to digitize documents and so helps raise the level of automation of the digitization process. Simulation experiments verified the effectiveness of the method.
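The abstract outlines three steps: wavelet decomposition of a scanned region, an energy computed from the decomposition coefficients, and a text/image decision based on the size of that energy. The paper's actual wavelet, energy definition and decision rule are not given on this page, so the Python sketch below is only an illustration under stated assumptions: a one-level orthonormal 2-D Haar transform, subband energy taken as the sum of squared coefficients, and a hypothetical threshold (`threshold=0.1`) on the fraction of energy held by the three detail subbands. It also assumes that text, with its sharp stroke edges, puts more energy into the detail subbands than continuous-tone images do; that is a common heuristic, not a claim from the paper. All function names are hypothetical.

```python
"""Sketch: separating text blocks from image blocks by wavelet detail energy.

Assumptions (not from the paper): one-level 2-D Haar transform, subband
energy = sum of squared coefficients, and a hypothetical threshold on the
fraction of energy held by the three detail subbands.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(block: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One-level orthonormal 2-D Haar transform of a block with even sides.

    Returns (approx, detail_rows, detail_cols, detail_diag).
    """
    h, w = len(block), len(block[0])
    if h % 2 or w % 2:
        raise ValueError("block height and width must both be even")
    approx: Matrix = []
    d_rows: Matrix = []
    d_cols: Matrix = []
    d_diag: Matrix = []
    for i in range(0, h, 2):
        ra, rr, rc, rd = [], [], [], []
        for j in range(0, w, 2):
            # 2x2 neighbourhood:  a b
            #                     c d
            a, b = block[i][j], block[i][j + 1]
            c, d = block[i + 1][j], block[i + 1][j + 1]
            ra.append((a + b + c + d) / 2)  # scaled local average
            rr.append((a + b - c - d) / 2)  # top row minus bottom row
            rc.append((a - b + c - d) / 2)  # left column minus right column
            rd.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(ra)
        d_rows.append(rr)
        d_cols.append(rc)
        d_diag.append(rd)
    return approx, d_rows, d_cols, d_diag


def energy(band: Matrix) -> float:
    """Sum of squared coefficients in one subband."""
    return sum(x * x for row in band for x in row)


def detail_energy_ratio(block: Matrix) -> float:
    """Fraction of the block's total energy carried by the detail subbands."""
    approx, d_rows, d_cols, d_diag = haar_dwt2(block)
    detail = energy(d_rows) + energy(d_cols) + energy(d_diag)
    total = detail + energy(approx)
    return 0.0 if total == 0 else detail / total


def classify_block(block: Matrix, threshold: float = 0.1) -> str:
    """Label a block 'text' or 'image'; the default threshold is hypothetical."""
    return "text" if detail_energy_ratio(block) >= threshold else "image"


if __name__ == "__main__":
    # Toy 4x4 blocks on a [0, 1] intensity scale (illustrative, not paper data).
    stripes = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]  # sharp, stroke-like edges
    smooth = [[0.5 + 0.01 * j for j in range(4)] for _ in range(4)]  # gentle ramp
    print(classify_block(stripes), detail_energy_ratio(stripes))  # text 0.5
    print(classify_block(smooth))  # image
```

Because these Haar filters are orthonormal, the four subband energies add up to the block's pixel energy, so the ratio always lies between 0 and 1 and is unchanged if every pixel is multiplied by the same constant. A real pipeline would also tile the scanned page into blocks, classify each one, and tune the threshold on labeled scans; the abstract does not specify any of these details.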


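Memo

Received: 2009-12-05.
Foundation item: Supported by the Young Teachers' Research Fund of Liaocheng University (X0810029).
Corresponding author: SUN Zhong-gui. E-mail: altlp@vip.sina.com.
About the authors:
CHEN Jie, female, born in 1974. Her main research interests include library resource development and information retrieval. She has published 10 academic papers.
SUN Zhong-gui, male, born in 1971, associate professor. His main research interests include information processing and machine learning. He has published 15 academic papers.
ZHOU Shu-feng, male, born in 1973, lecturer. His main research interests include machine learning and computer applications. He has published 13 academic papers.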

Last Update: 2010-05-24
