SUN Qingmei, JIN Cong. Image annotation method based on visual attention mechanism and conditional random field[J]. CAAI Transactions on Intelligent Systems, 2016, 11(4): 442-448. [doi:10.11992/tis.201606004]

Image annotation method based on visual attention mechanism and conditional random field

CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 11
Issue:
No. 4, 2016
Pages:
442-448
Publication date:
2016-07-25

Article Info

Title:
Image annotation method based on visual attention mechanism and conditional random field
Author(s):
SUN Qingmei (孙庆美), JIN Cong (金聪)
School of Computer, Central China Normal University, Wuhan 430079, China
Keywords:
automatic image annotation; visual attention mechanism; inter-word correlation; conditional random fields
CLC number:
TP391
DOI:
10.11992/tis.201606004
Abstract:
Traditional image annotation methods treat all image regions equally, ignoring the way people actually perceive images. To address this, an image annotation method based on a visual attention mechanism and a conditional random field, called VAMCRF, is proposed. First, because people pay more attention to salient regions when interpreting an image, the salient regions are extracted with a visual attention mechanism and assigned semantic labels by a support vector machine. Next, the non-salient regions are labeled with a k-NN clustering algorithm. Finally, because the labels of salient and non-salient regions are logically related, a conditional random field (CRF) model exploits this inter-word correlation to correct and determine the final label vector of the image. Experiments on the Corel5k, IAPR TC-12, and ESP Game datasets, evaluated by average precision, average recall, and F1, confirm that the proposed method outperforms traditional annotation methods.
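The three-stage pipeline in the abstract (SVM labels for salient regions, k-NN labels for non-salient regions, CRF-style correction via inter-word correlation) can be sketched in miniature. Everything below — the label vocabulary, the correlation scores, and the unary confidences standing in for SVM and k-NN outputs — is an illustrative assumption, not the authors' implementation; the exhaustive joint argmax over two regions merely plays the role of CRF inference.

```python
import math

# Toy label vocabulary and a hypothetical inter-word correlation table
# (in the paper this correlation would be estimated from training data).
CORR = {  # pairwise compatibility: higher means labels co-occur more often
    ("sky", "water"): 0.8, ("sky", "boat"): 0.6, ("water", "boat"): 0.9,
    ("sky", "car"): 0.3, ("water", "car"): 0.1, ("boat", "car"): 0.1,
}

def pair_score(a, b):
    """Symmetric pairwise potential between two candidate labels."""
    if a == b:
        return 1.0
    return CORR.get((a, b), CORR.get((b, a), 0.0))

def annotate(salient_scores, non_salient_scores):
    """Pick one label per region, then refine with pairwise correlation.

    salient_scores / non_salient_scores: dicts mapping label -> unary
    confidence (stand-ins for SVM and k-NN outputs respectively).
    The joint argmax over both regions stands in for CRF inference.
    """
    best, best_val = None, -math.inf
    for a, ua in salient_scores.items():
        for b, ub in non_salient_scores.items():
            val = ua + ub + pair_score(a, b)  # unary + pairwise potential
            if val > best_val:
                best, best_val = (a, b), val
    return best

# Example: the SVM slightly prefers "car" for the salient region, but the
# k-NN strongly suggests "water" nearby; correlation flips the result.
salient = {"boat": 0.55, "car": 0.60}
non_salient = {"water": 0.9, "sky": 0.4}
print(annotate(salient, non_salient))  # → ('boat', 'water')
```

The point of the correction step is visible in the example: taken alone, the unary scores would label the salient region "car", but the strong boat–water correlation overrides that choice, mirroring how the CRF in the paper uses inter-word correlation to revise per-region decisions.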

References:

[1] WANG Meng, NI Bingbing, HUA Xiansheng, et al. Assistive tagging:a survey of multimedia tagging with human-computer joint exploration[J]. ACM computing surveys, 2012, 44(4):25.
[2] JIN Cong, JIN Shuwei. Image distance metric learning based on neighborhood sets for automatic image annotation[J]. Journal of visual communication and image representation, 2016, 34:167-175.
[3] DUYGULU P, BARNARD K, DE FREITAS J F G, et al. Object recognition as machine translation:learning a lexicon for a fixed image vocabulary[C]//Proceedings of the 7th European Conference on Computer Vision. Berlin Heidelberg:Springer-Verlag, 2002:97-112.
[4] JEON J, LAVRENKO V, MANMATHA R. Automatic image annotation and retrieval using cross-media relevance models[C]//Proceedings of the 26th annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA:ACM, 2003:119-126.
[5] LOOG M. Semi-supervised linear discriminant analysis through moment-constraint parameter estimation[J]. Pattern recognition letters, 2014, 37:24-31.
[6] FU Hong, CHI Zheru, FENG Dagan. Recognition of attentive objects with a concept association network for image annotation[J]. Pattern recognition, 2010, 43(10):3539-3547.
[7] FAREED M M S, AHMED G, CHUN Qi. Salient region detection through sparse reconstruction and graph-based ranking[J]. Journal of visual communication and image representation, 2015, 32:144-155.
[8] JIA Cong, QI Jinqing, LI Xiaohui, et al. Saliency detection via a unified generative and discriminative model[J]. Neurocomputing, 2016, 173:406-417.
[9] KHANDOKER A H, PALANISWAMI M, KARMAKAR C K. Support vector machines for automated recognition of obstructive sleep apnea syndrome from ECG recordings[J]. IEEE transactions on information technology in biomedicine, 2009, 13(1):37-48.
[10] PRUTEANU-MALINICI I, MAJOROS W H, OHLER U. Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields[J]. Bioinformatics, 2013, 29(13):i27-i35.
[11] VERMA Y, JAWAHAR C V. Image annotation using metric learning in semantic neighbourhoods[C]//Proceedings of the 12th European Conference on Computer Vision. Berlin Heidelberg:Springer, 2012:836-849.
[12] NAKAYAMA H. Linear distance metric learning for large-scale generic image recognition[D]. Tokyo, Japan:The University of Tokyo, 2011.
[13] FENG S L, MANMATHA R, LAVRENKO V. Multiple Bernoulli relevance models for image and video annotation[C]//Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, DC, USA: IEEE, 2004, 2: II-1002-II-1009.
[14] MAKADIA A, PAVLOVIC V, KUMAR S. A new baseline for image annotation[C]//Proceedings of the European Conference on Computer Vision. Berlin Heidelberg:Springer-Verlag, 2008:316-329.
[15] GUILLAUMIN M, MENSINK T, VERBEEK J, et al. TagProp:discriminative metric learning in nearest neighbor models for image auto-annotation[C]//Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. Kyoto:IEEE, 2009:309-316.

Similar Articles:

[1] LYU Guoning, GAO Min. Scene text detection and localization scheme with visual perception mechanism[J]. CAAI Transactions on Intelligent Systems, 2017, 12(4): 563. [doi:10.11992/tis.201604011]

Memo:
Received: 2016-06-02.
Foundation item: National Social Science Foundation of China (13BTQ050).
Biographies: SUN Qingmei, born in 1989, female, master's student; her main research interest is digital image processing. JIN Cong, born in 1960, female, professor, PhD; her main research interest is digital image processing.
Corresponding author: JIN Cong. E-mail: jinc26@aliyun.com.