GE Yuanyuan, XU Youjiang, ZHAO Shuai, et al. Detection of small and dense traffic signs in self-driving scenarios[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3): 366-372. [doi: 10.11992/tis.201706040]

Detection of small and dense traffic signs in self-driving scenarios

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
13
Issue:
2018, No. 3
Pages:
366-372
Publication date:
2018-05-05

Article Info

Title:
Detection of small and dense traffic signs in self-driving scenarios
Author(s):
GE Yuanyuan 1, XU Youjiang 1, ZHAO Shuai 2, HAN Yahong 1
1. School of Computer Science and Technology, Tianjin University, Tianjin 300350, China;
2. Data Resource Center, China Automotive Technology and Research Center, Tianjin 300300, China
Keywords:
traffic sign; object detection; deep learning; aggregate feature; CNN; feature map; region proposal; self-driving
CLC number:
TP183
DOI:
10.11992/tis.201706040
Abstract:
In self-driving scenarios, the detection and recognition of traffic signs are critical to understanding the environment around the vehicle. Images captured while driving contain many small traffic signs, which existing object detection methods struggle to find. To detect these small traffic signs accurately, we use the shallow VGG16 network as the backbone of the R-FCN object detection framework and modify VGG16 in two ways. First, we reduce the downsampling factor of the feature maps: the layers after conv4_3 are removed, and the RPN extracts region proposals on the shallow conv4_3 feature map. Second, we concatenate features: the same-scale feature maps of the conv4_1, conv4_2, and conv4_3 layers are joined to form an aggregated feature. The improved detection framework finds more small objects and performs well on the traffic-sign dataset provided by UISEE Technology (驭势科技), reaching a detection mAP of 65%.
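
To make the two modifications concrete, the following minimal PyTorch sketch (an illustration under stated assumptions, not the authors' released code) truncates VGG16 after conv4_3 and concatenates the conv4_1, conv4_2, and conv4_3 outputs into an aggregated feature. The torchvision layer indices, the class name AggregatedVGG16Backbone, and the 512x512 input are assumptions of this sketch; the RPN and R-FCN heads that would consume the output are omitted.

import torch
import torch.nn as nn
from torchvision.models import vgg16

class AggregatedVGG16Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        features = vgg16(weights=None).features
        # features[0:17] covers conv1_1 ... pool3 (overall stride 8).
        self.stem = features[:17]
        # conv4_1, conv4_2, conv4_3, each followed by its ReLU.
        self.conv4_1 = features[17:19]
        self.conv4_2 = features[19:21]
        self.conv4_3 = features[21:23]
        # pool4, the conv5_x block, and the classifier are dropped, so the
        # feature map keeps 1/8 of the input resolution instead of 1/16.

    def forward(self, x):
        x = self.stem(x)
        f1 = self.conv4_1(x)
        f2 = self.conv4_2(f1)
        f3 = self.conv4_3(f2)
        # The three maps share the same spatial size and have 512 channels
        # each, so channel-wise concatenation yields a 1536-channel
        # aggregated feature for the RPN and detection heads to consume.
        return torch.cat([f1, f2, f3], dim=1)

backbone = AggregatedVGG16Backbone()
out = backbone(torch.randn(1, 3, 512, 512))
print(out.shape)  # torch.Size([1, 1536, 64, 64])

Because pooling stops at pool3, the aggregated map has stride 8 rather than the stride 16 of a conv5-based backbone, preserving more spatial detail for small signs.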

Memo:
Received: 2017-06-10.
Foundation item: National Natural Science Foundation of China (61472276).
About the authors: GE Yuanyuan, female, born in 1991, master's student; her main research interest is object detection. XU Youjiang, male, born in 1992, master's student; his main research interest is video action recognition. ZHAO Shuai, male, born in 1988, master's student; his main research interests are deep learning and machine learning, vehicle dynamics, autonomous driving technology, and driving behavior analysis.
Corresponding author: HAN Yahong. E-mail: yahong@tju.edu.cn.
Last Update: 2018-06-25