SHAN Yi, YANG Jinfu, WU Suishuo, et al. Skip feature pyramid network with a global receptive field for small object detection[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1144-1151. [doi:10.11992/tis.201905041]

Skip feature pyramid network with a global receptive field for small object detection

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 14
Issue:
No. 6, 2019
Pages:
1144-1151
Publication date:
2019-11-05

Article Info

Title:
Skip feature pyramid network with a global receptive field for small object detection
Author(s):
SHAN Yi(1,2), YANG Jinfu(1,2), WU Suishuo(1,2), XU Bingbing(1,2)
1. Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China;
2. Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China
Keywords:
skip feature pyramid network; global receptive field; object detection; deep learning; feature extraction; convolutional neural network; dilated convolution; image processing
CLC number:
TP183
DOI:
10.11992/tis.201905041
Abstract:
With the development of deep learning, objects can be detected with high accuracy and efficiency. However, the detection of small objects remains challenging. The main reason for this is that the relationship between high-level semantic information and low-level feature maps is not fully utilized. To solve this problem, we propose a novel detection framework, called the skip feature pyramid network with a global receptive field, to improve the ability to detect small objects. Unlike previous detection architectures, the skip feature pyramid architecture fuses high-level semantic information with low-level feature maps to obtain detailed information. To extract global information from a network, we apply a global receptive field (GRF) with convolution kernels of different sizes and different dilated convolution steps. The experimental results on PASCAL VOC and MS COCO datasets show that the proposed approach realizes significant improvements over other comparable detection models.
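The GRF module described above enlarges the receptive field by combining convolution kernels of different sizes with different dilation rates. As a hedged illustration (not code from the paper), the pure-Python sketch below computes the effective kernel size of a dilated convolution and the cumulative receptive field of a stack of such layers; the three-layer configuration with dilation rates 1, 2, 4 is a hypothetical example, not the paper's exact setting.

```python
def effective_kernel_size(k: int, d: int) -> int:
    """A k x k kernel with dilation rate d spans k + (k - 1)(d - 1) input positions."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Cumulative receptive field of stacked conv layers.

    Each layer is (kernel_size, dilation, stride). `jump` tracks how far apart
    neighboring output positions are when mapped back to input coordinates.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel_size(k, d) - 1) * jump
        jump *= s
    return rf

# Hypothetical GRF-style branch: three 3x3 convs with dilation rates 1, 2, 4.
print(effective_kernel_size(3, 4))                         # → 9
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))  # → 15
```

With the same number of parameters as three plain 3×3 convolutions (receptive field 7), the dilated stack sees a 15-pixel window, which is how dilation buys global context without extra cost.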

References:

[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH, USA, 2014:580-587.
[2] GIRSHICK R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, 2015:1440-1448.
[3] UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition[J]. International journal of computer vision, 2013, 104(2):154-171.
[4] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(6):1137-1149.
[5] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, 2016:779-788.
[6] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands, 2016:21-37.
[7] BELL S, ZITNICK C L, BALA K, et al. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, 2016:2874-2883.
[8] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, 2017:936-944.
[9] FU Chengyang, LIU Wei, RANGA A, et al. DSSD: deconvolutional single shot detector[J]. arXiv:1701.06659, 2017.
[10] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, 2016:770-778.
[11] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[J]. arXiv:1511.07122, 2015.
[12] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[13] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The Pascal visual object classes (VOC) challenge[J]. International journal of computer vision, 2010, 88(2):303-338.
[14] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO:common objects in context[C]//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland, 2014:740-755.
[15] DAI Jifeng, LI Yi, HE Kaiming, et al. R-FCN: object detection via region-based fully convolutional networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain, 2016:379-387.
[16] SHEN Zhiqiang, LIU Zhuang, LI Jianguo, et al. DSOD: learning deeply supervised object detectors from scratch[C]//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy, 2017:1937-1945.
[17] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, 2017:6517-6525.
[18] ZHOU Peng, NI Bingbing, GENG Cong, et al. Scale-transferrable object detection[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018:528-537.
[19] GIDARIS S, KOMODAKIS N. Object detection via a multi-region and semantic segmentation-aware CNN model[C]//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, 2015:1134-1142.
[20] HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, 2017:2261-2269.

Memo:
Received: 2019-05-23.
Foundation items: National Natural Science Foundation of China (6153302); Beijing Natural Science Foundation (4182009).
About the authors: SHAN Yi, male, born in 1992, master's student; his main research interests are deep learning and computer vision. YANG Jinfu, male, born in 1977, professor; his main research interests are machine learning, machine vision, intelligent computing, and intelligent systems. In recent years he has undertaken more than 20 research projects, including national major science and engineering projects, the National Key R&D Program, the National 973 Program, the National 863 Program, the National Natural Science Foundation of China, and the Beijing Natural Science Foundation; he has applied for more than 30 national invention patents (more than 20 granted), registered more than 10 software copyrights, and published more than 80 academic papers. WU Suishuo, male, born in 1997, master's student; his main research interests are deep learning and computer vision.
Corresponding author: SHAN Yi. E-mail: 15732036708@163.com
Last Update: 2019-12-25