[1] ZHANG Xinyu, ZOU Zhenhong, LI Zhiwei, et al. Deep multi-modal fusion in object detection for autonomous driving[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 758-771. [doi:10.11992/tis.202002010]

Deep multi-modal fusion in object detection for autonomous driving

References:
[1] URMSON C, ANHALT J, BAGNELL D, et al. Autonomous driving in urban environments: boss and the urban challenge[J]. Journal of field robotics, 2008, 25(8): 425-466.
[2] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes challenge 2007 Results[EB/OL]. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html.
[3] HILLEL A B, LERNER R, LEVI D, et al. Recent progress in road and lane detection: a survey[J]. Machine vision and applications, 2014, 25(3): 727-745.
[4] FENG D, HAASE-SCHUETZ C, ROSENBAUM L, et al. Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges[J]. arXiv preprint arXiv: 1902.07830, 2019.
[5] LUO Junhai, YANG Yang. An overview of target detection methods based on data fusion[J]. Control and decision, 2020, 35(1): 1-15. (in Chinese)
[6] HARIHARAN B, ARBELÁEZ P, GIRSHICK R, et al. Simultaneous detection and segmentation[C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 297-312.
[7] KANG K, LI H, YAN J, et al. T-CNN: tubelets with convolutional neural networks for object detection from videos[J]. IEEE transactions on circuits and systems for video technology, 2018, 28(10): 2896-2907.
[8] ZOU Z, SHI Z, GUO Y, et al. Object detection in 20 years: a survey[J]. arXiv preprint arXiv: 1905.05055, 2019.
[9] ARNOLD E, AL-JARRAH O Y, DIANATI M, et al. A survey on 3D object detection methods for autonomous driving applications[J]. IEEE transactions on intelligent transportation systems, 2019, 20(10): 3782-3795.
[10] MEES O, EITEL A, BURGARD W. Choosing smartly: Adaptive multimodal fusion for object detection in changing environments[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, South Korea, 2016: 151-156.
[11] EITEL A, SPRINGENBERG J T, SPINELLO L, et al. Multimodal deep learning for robust RGB-D object recognition[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany, 2015: 681-687.
[12] YU J, JIANG Y, WANG Z, et al. UnitBox: an advanced object detection network[C]//ACM International Conference on Multimedia. Amsterdam, Netherlands, 2016: 516-520.
[13] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 658-666.
[14] REDMON J, FARHADI A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv: 1804.02767, 2018.
[15] REN S, HE K, GIRSHICK R, et al. Faster RCNN: towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 91-99.
[16] YANG W, ZHANG X, TIAN Y, et al. Deep learning for single image super-resolution: a brief review[J]. IEEE transactions on multimedia, 2019, 21(12): 3106-3121.
[17] LU Y, LU C, TANG C K. Online video object detection using association LSTM[C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2363-2371.
[18] WANG S, ZHOU Y, YAN J, et al. Fully motion-aware network for video object detection[C]//The European Conference on Computer Vision. Munich, Germany, 2018: 557-573.
[19] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA, 2012: 3354-3361.
[20] SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]//IEEE Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 2446-2454.
[21] CAESAR H, BANKITI V, LANG A H, et al. NuScenes: A multimodal dataset for autonomous driving[J]. arXiv preprint arXiv: 1903.11027, 2019.
[22] HUANG X, CHENG X, GENG Q, et al. The Apolloscape dataset for autonomous driving[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 954-960.
[23] CHEN X, MA H, WAN J, et al. Multi-view 3D object detection network for autonomous driving[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 1907-1915.
[24] MUR-ARTAL R, TARDOS J D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE transactions on robotics, 2017, 33(5): 1255-1262.
[25] CHADWICK S, MADDERN W, NEWMAN P. Distant vehicle detection using radar and vision[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 8311-8317.
[26] BIJELIC M, GRUBER T. Seeing through fog without seeing fog: deep sensor fusion in the absence of labeled training data[C]//IEEE Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 11621-11631.
[27] DU X, ANG M H, KARAMAN S, et al. A general pipeline for 3D detection of vehicles[C]//IEEE International Conference on Robotics and Automation. Brisbane, Australia, 2018: 3194-3200.
[28] BANERJEE K, NOTZ D, WINDELEN J, et al. Online camera lidar fusion and object detection on hybrid data for autonomous driving[C]//IEEE Intelligent Vehicles Symposium. Changshu, China, 2018: 1632-1638.
[29] LIU J, ZHANG S, WANG S, et al. Multispectral deep neural networks for pedestrian detection[C]//British Machine Vision Conference. York, UK, 2016: 1-13.
[30] FISCHER V, HERMAN M, BEHNKE S. Multispectral pedestrian detection using deep fusion convolutional neural networks[C]//European Symposium on Artificial Neural Networks. Bruges, Belgium, 2016: 27-29.
[31] VIOLA P, JONES M. Rapid object detection using a boosted cascade of simple features[C]//IEEE Conference on Computer Vision and Pattern Recognition. Kauai, USA, 2001: 511-518.
[32] VIOLA P, JONES M. Robust real-time face detection[J]. International journal of computer vision, 2004, 57: 137-154.
[33] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. San Diego, USA, 2005: 886-893.
[34] FELZENSZWALB P, MCALLESTER D, RAMANAN D. A discriminatively trained, multiscale, deformable part model[C]//IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA, 2008: 1-8.
[35] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. Lake Tahoe, USA, 2012: 1097-1105.
[36] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 580-587.
[37] GIRSHICK R. Fast R-CNN[C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1440-1448.
[38] GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 38(1): 142-158.
[39] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 2117-2125.
[40] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 779-788.
[41] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//European Conference on Computer Vision. Amsterdam, Netherlands, 2016: 21-37.
[42] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 7263-7271.
[43] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2980-2988.
[44] MEYER G P, LADDHA A, KEE E, et al. LaserNet: an efficient probabilistic 3D object detector for autonomous driving[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 12677-12686.
[45] QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 652-660.
[46] QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]//Advances in Neural Information Processing Systems. Long Beach, USA, 2017: 5099-5108.
[47] YANG B, LUO W, URTASUN R. PIXOR: real-time 3D object detection from point clouds[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 7652-7660.
[48] ASVADI A, GARROTE L, PREMEBIDA C, et al. Multimodal vehicle detection: fusing 3D LIDAR and color camera data[J]. Pattern recognition letters, 2018, 115: 20-29.
[49] SCHLOSSER J, CHOW C K, KIRA Z. Fusing LIDAR and images for pedestrian detection using convolutional neural networks[C]//IEEE International Conference on Robotics and Automation. Stockholm, Sweden, 2016: 2198-2205.
[50] QI C R, GUIBAS L J. Frustum PointNets for 3D object detection from RGB-D data[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 918-927.
[51] GUAN D, CAO Y, YANG J, et al. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection[J]. Information fusion, 2018, 50: 148-157.
[52] YANG B, LIANG M, URTASUN R, et al. HDNET: exploiting HD maps for 3D object detection[J]. Proceedings of machine learning research, 2018, 87: 146-155.
[53] ZHOU T, JIANG K, XIAO Z, et al. Object detection using multi-sensor fusion based on deep learning[C]//COTA International Conference of Transportation. Nanjing, China, 2019: 5770-5782.
[54] CHO H, SEO Y W, KUMAR B. A multi-sensor fusion system for moving object detection and tracking in urban driving environments[C]//IEEE International Conference on Robotics and Automation. Hong Kong, China, 2014: 1836-1843.
[55] DOU J, XUE J, FANG J. SEG-VoxelNet for 3D vehicle detection from RGB and lidar data[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 4362-4368.
[56] LIANG M, YANG B, CHEN Y, et al. Multi-task multi-sensor fusion for 3D object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 7345-7353.
[57] SINDAGI V A, ZHOU Y, TUZEL O. MVX-Net: multimodal VoxelNet for 3D object detection[C]//2019 International Conference on Robotics and Automation. Montreal, Canada, 2019: 7276-7282.
[58] LIANG M, YANG B, WANG S, et al. Deep continuous fusion for multi-sensor 3D object detection[C]//The European Conference on Computer Vision. Munich, Germany, 2018: 641-656.
[59] WANG Z, ZHAN W, TOMIZUKA M. Fusing bird’s eye view LIDAR point cloud and front view camera image for deep object detection[C]//IEEE Intelligent Vehicles Symposium. Changshu, China, 2018: 1-6.
[60] KIM J, KOH J, KIM Y, et al. Robust deep multi-modal learning based on gated information fusion network[C]//Asian Conference on Computer Vision. Perth, Australia, 2018: 90-106.
[61] CASAS S, LUO W, URTASUN R. IntentNet: learning to predict intention from raw sensor data[J]. Proceedings of machine learning research, 2018, 87: 947-956.
[62] KU J, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation[C]//2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain, 2018: 5750-5757.
[63] PFEUFFER A, DIETMAYER K. Optimal sensor data fusion architecture for object detection in adverse weather conditions[C]//International Conference on Information Fusion. Cambridge, UK, 2018: 1-8.
[64] XU D, ANGUELOV D, JAIN A. PointFusion: deep sensor fusion for 3D bounding box estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 244-253.
[65] DU X, ANG M H, RUS D. Car detection for autonomous vehicle: lidar and vision fusion approach through deep learning framework[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada, 2017: 749-754.
[66] MATTI D, EKENEL H K, THIRAN J. Combining LiDAR space clustering and convolutional neural networks for pedestrian detection[C]//IEEE International Conference on Advanced Video and Signal Based Surveillance. Lecce, Italy, 2017: 1-6.
[67] SCHNEIDER L, JASCH M. Multimodal neural networks: RGB-D for semantic segmentation and object detection[C]//Scandinavian Conference on Image Analysis. Norrköping, Sweden, 2017: 98-109.
[68] OH S, KANG H. Object detection and classification by decision-level fusion for intelligent vehicle systems[J]. Sensors (Basel), 2017, 17(1): 207-214.
[69] KIM T, GHOSH J. Robust detection of nonmotorized road users using deep learning on optical and lidar data[C]//IEEE International Conference on Intelligent Transportation Systems. Rio de Janeiro, Brazil, 2016: 271-276.
[70] BAI M, MATTYUS G, HOMAYOUNFAR N, et al. Deep multi-sensor lane detection[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain, 2018: 3102-3109.
[71] CALTAGIRONE L, BELLONE M, SVENSSON L, et al. LIDAR-camera fusion for road detection using fully convolutional neural networks[J]. Robotics and autonomous systems, 2019, 111: 125-131.
[72] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.
[73] ZHANG X, ZHOU M, LIU H, et al. A cognitively-inspired system architecture for the Mengshi cognitive vehicle[J]. Cognitive computation, 2020, 12(1): 140-149.
[74] ZOU Q, JIANG H, DAI Q, et al. Robust lane detection from continuous driving scenes using deep neural networks[J]. IEEE transactions on vehicular technology, 2020, 69(1): 41-54.
[75] ZHANG X, GAO H, GUO M, et al. A study on key technologies of unmanned driving[J]. CAAI transactions on intelligence technology, 2016, 1(1): 4-13.
Similar articles:
[1] BI Xiao-jun, ZHANG Yan-shuang. A routing algorithm for wireless sensor networks based on an immune algorithm[J]. CAAI Transactions on Intelligent Systems, 2009, 4(1): 67.
[2] HU Guanglong, QIN Shiyin. High precision detection of a mobile object under dynamic imaging based on SURF and Mean shift[J]. CAAI Transactions on Intelligent Systems, 2012, 7(1): 61.
[3] HAN Zheng, LIU Huaping, HUANG Wenbing, et al. Kinect-based object grasping by manipulator[J]. CAAI Transactions on Intelligent Systems, 2013, 8(2): 149. [doi:10.3969/j.issn.1673-4785.201212038]
[4] HAN Yanbin, GUO Xiaopeng, WEI Yanwen, et al. An improved shadow removal algorithm based on RGB and HSI color spaces[J]. CAAI Transactions on Intelligent Systems, 2015, 10(5): 769. [doi:10.11992/tis.201410010]
[5] ZENG Xianhua, YI Ronghui, HE Shanshan. Interactive image segmentation based on manifold ranking[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1): 117. [doi:10.11992/tis.201505037]
[6] GE Yuanyuan, XU Youjiang, ZHAO Shuai, et al. Detection of small and dense traffic signs in self-driving scenarios[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3): 366. [doi:10.11992/tis.201706040]
[7] MO Hongwei, WANG Haibo. Research on human behavior detection based on Faster R-CNN[J]. CAAI Transactions on Intelligent Systems, 2018, 13(6): 967. [doi:10.11992/tis.201801025]
[8] NING Xin, LI Weijun, TIAN Weijuan, et al. Adaptive template update of discriminant KCF for visual tracking[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1): 121. [doi:10.11992/tis.201806038]
[9] WU Pengying, ZHANG Jianming, PENG Jian, et al. Research on pedestrian detection based on multi-layer convolution feature in real scene[J]. CAAI Transactions on Intelligent Systems, 2019, 14(2): 306. [doi:10.11992/tis.201710019]
[10] LIU Zhao, ZHANG Liming, GENG Meixiao, et al. Object detection of high-voltage cable based on improved Faster R-CNN[J]. CAAI Transactions on Intelligent Systems, 2019, 14(4): 627. [doi:10.11992/tis.201905026]

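Memo

Received: 2020-02-14.
Funding: National Key R&D Program of China (2018YFE0204300); Beijing Science and Technology Plan Project (Z191100007419008); Guoqiang Research Institute Project (2019GQG1010).
About the authors: ZHANG Xinyu, research fellow, leader of the Tsinghua Mengshi intelligent vehicle team and visiting scholar at the University of Cambridge. Main research interests: intelligent driving and multi-modal information fusion. Principal investigator of a National Key R&D Program project; multiple first- and second-place finishes in top domestic autonomous driving competitions; recipient of the second prize of the 2019 Wu Wenjun AI Science and Technology Progress Award; author of 30 SCI/EI-indexed papers in the intelligent driving field, one of which was selected as an ESI highly cited paper. LIU Huaping, associate professor and doctoral supervisor, council member of the Chinese Association for Artificial Intelligence (CAAI), secretary-general of the CAAI Technical Committee on Cognitive Systems and Information Processing, IEEE Senior Member. Main research interests: multi-modal perception, learning, and control for intelligent robots. LI Jun, academician of the Chinese Academy of Engineering and president of the China Society of Automotive Engineers. Main research interests: intelligent connected vehicles and automotive powertrains. Has long led product R&D and technological innovation at major Chinese automotive enterprises, with multiple research achievements in automotive powertrains, new energy vehicles, and intelligent connected vehicles. Awards: one first prize and one second prize of the National Science and Technology Progress Award, one second prize of the National Technological Invention Award, three special prizes and two first prizes of the China Automotive Industry Science and Technology Progress Award, two first prizes and one second prize of the National Machinery Industry Science and Technology Progress Award, and the 2012 Ho Leung Ho Lee Prize for Scientific and Technological Innovation. Holds 9 granted patents and has published 98 academic papers and 1 monograph.
Corresponding author: LIU Huaping. E-mail: hpliu@tsinghua.edu.cn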

Last Update: 2020-07-25