ZHANG Xinyu, ZOU Zhenhong, LI Zhiwei, et al. Deep multi-modal fusion in object detection for autonomous driving[J]. CAAI Transactions on Intelligent Systems, 2020, 15(4): 758-771. [doi:10.11992/tis.202002010]

Deep multi-modal fusion in object detection for autonomous driving

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 15
Issue:
No. 4, 2020
Pages:
758-771
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2020-10-30

Article Information

Title:
Deep multi-modal fusion in object detection for autonomous driving
Author(s):
ZHANG Xinyu (1,2), ZOU Zhenhong (1,2), LI Zhiwei (1,2), LIU Huaping (3), LI Jun (1,2)
1. State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing 100084, China;
2. School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China;
3. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Keywords:
data fusion; object detection; autonomous driving; deep learning; multimodal perception; computer vision; sensor; survey
CLC number:
TP274; TP212
DOI:
10.11992/tis.202002010
Abstract:
In autonomous driving, there is growing interest in using multiple sensors to improve the accuracy of object detection models, so research on data fusion for object detection has substantial academic and practical value. This paper surveys the data fusion methods used in deep object detection models for autonomous driving in recent years. It first reviews the development of deep object detection and data fusion in autonomous driving, together with existing surveys. It then examines the field from three aspects: multi-modal object detection, fusion levels, and fusion computation methods, giving a comprehensive picture of the state of the art. In addition, the paper proposes a rationality analysis of data fusion from three further perspectives: methods, robustness, and redundancy. Finally, open issues are discussed, and the challenges, strategies, and prospects are summarized.
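A note on the abstract's taxonomy: the fusion levels (data level, feature level, decision level) and fusion computation methods (e.g., concatenation, weighted addition, learned gating) that the survey reviews can be illustrated with a minimal, hypothetical PyTorch-style sketch. The module below is not code from the paper or from any method it cites; the tensor shapes, channel counts, and gating design are assumptions chosen purely for illustration.

import torch
import torch.nn as nn

class FusionLevelsDemo(nn.Module):
    """Toy illustration of data-, feature-, and decision-level fusion."""

    def __init__(self, c_feat=16, n_classes=3):
        super().__init__()
        # Data-level (early) fusion: concatenate raw RGB (3 channels) and a
        # projected LiDAR depth map (1 channel) before feature extraction.
        self.early_conv = nn.Conv2d(3 + 1, c_feat, kernel_size=3, padding=1)
        # Feature-level fusion: a 1x1 conv predicts per-pixel weights for the
        # two branches (learned gating, one possible "calculation method").
        self.gate = nn.Sequential(
            nn.Conv2d(2 * c_feat, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def data_fusion(self, rgb, depth):
        return self.early_conv(torch.cat([rgb, depth], dim=1))

    def feature_fusion(self, f_cam, f_lidar):
        w = self.gate(torch.cat([f_cam, f_lidar], dim=1))  # per-pixel weights
        return w[:, 0:1] * f_cam + w[:, 1:2] * f_lidar     # weighted sum

    @staticmethod
    def decision_fusion(scores_cam, scores_lidar):
        # Decision-level (late) fusion: average per-branch class scores.
        return 0.5 * (scores_cam + scores_lidar)

if __name__ == "__main__":
    B, H, W, C = 2, 32, 32, 16
    demo = FusionLevelsDemo(c_feat=C)
    rgb, depth = torch.rand(B, 3, H, W), torch.rand(B, 1, H, W)
    f_cam, f_lidar = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
    print(demo.data_fusion(rgb, depth).shape)         # data level
    print(demo.feature_fusion(f_cam, f_lidar).shape)  # feature level
    print(demo.decision_fusion(torch.rand(B, 3), torch.rand(B, 3)).shape)

In the survey's terms, the three calls correspond to fusing before feature extraction (data level), during feature extraction (feature level), and after per-branch prediction (decision level).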

References:

[1] URMSON C, ANHALT J, BAGNELL D, et al. Autonomous driving in urban environments: boss and the urban challenge[J]. Journal of field robotics, 2008, 25(8): 425-466.
[2] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes challenge 2007 results[EB/OL]. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html.
[3] HILLEL A B, LERNER R, LEVI D, et al. Recent progress in road and lane detection: a survey[J]. Machine vision and applications, 2014, 25(3): 727-745.
[4] FENG D, HAASE-SCHUETZ C, ROSENBAUM L, et al. Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges[J]. arXiv preprint arXiv: 1902.07830, 2019.
[5] LUO Junhai, YANG Yang. An overview of target detection methods based on data fusion[J]. Control and decision, 2020, 35(1): 1-15. (in Chinese)
[6] HARIHARAN B, ARBELÁEZ P, GIRSHICK R, et al. Simultaneous detection and segmentation[C]//European Conference on Computer Vision. Zurich, Switzerland, 2014: 297-312.
[7] KANG K, LI H, YAN J, et al. T-CNN: tubelets with convolutional neural networks for object detection from videos[J]. IEEE transactions on circuits and systems for video technology, 2018, 28(10): 2896-2907.
[8] ZOU Z, SHI Z, GUO Y, et al. Object detection in 20 years: a survey[J]. arXiv preprint arXiv: 1905.05055, 2019.
[9] ARNOLD E, AL-JARRAH O Y, DIANATI M, et al. A survey on 3D object detection methods for autonomous driving applications[J]. IEEE transactions on intelligent transportation systems, 2019, 20(10): 3782-3795.
[10] MEES O, EITEL A, BURGARD W. Choosing smartly: adaptive multimodal fusion for object detection in changing environments[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, South Korea, 2016: 151-156.
[11] EITEL A, SPRINGENBERG J T, SPINELLO L, et al. Multimodal deep learning for robust RGB-D object recognition[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany, 2015: 681-687.
[12] YU J, JIANG Y, WANG Z, et al. UnitBox: an advanced object detection network[C]//ACM International Conference on Multimedia. Amsterdam, Netherlands, 2016: 516-520.
[13] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 658-666.
[14] REDMON J, FARHADI A. YOLOv3: an incremental improvement[J]. arXiv preprint arXiv: 1804.02767, 2018.
[15] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. Montreal, Canada, 2015: 91-99.
[16] YANG W, ZHANG X, TIAN Y, et al. Deep learning for single image super-resolution: a brief review[J]. IEEE transactions on multimedia, 2019, 21(12): 3106-3121.
[17] LU Y, LU C, TANG C K. Online video object detection using association LSTM[C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2363-2371.
[18] WANG S, ZHOU Y, YAN J, et al. Fully motion-aware network for video object detection[C]//The European Conference on Computer Vision. Munich, Germany, 2018: 557-573.
[19] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA, 2012: 3354-3361.
[20] SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]//IEEE Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 2446-2454.
[21] CAESAR H, BANKITI V, LANG A H, et al. NuScenes: a multimodal dataset for autonomous driving[J]. arXiv preprint arXiv: 1903.11027, 2019.
[22] HUANG X, CHENG X, GENG Q, et al. The Apolloscape dataset for autonomous driving[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 954-960.
[23] CHEN X, MA H, WAN J, et al. Multi-view 3D object detection network for autonomous driving[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 1907-1915.
[24] MUR-ARTAL R, TARDOS J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE transactions on robotics, 2017, 33(5): 1255-1262.
[25] CHADWICK S, MADDERN W, NEWMAN P. Distant vehicle detection using radar and vision[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 8311-8317.
[26] BIJELIC M, GRUBER T. Seeing through fog without seeing fog: deep sensor fusion in the absence of labeled training data[C]//IEEE Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 11621-11631.
[27] DU X, ANG M H, KARAMAN S, et al. A general pipeline for 3D detection of vehicles[C]//IEEE International Conference on Robotics and Automation. Brisbane, Australia, 2018: 3194-3200.
[28] BANERJEE K, NOTZ D, WINDELEN J, et al. Online camera lidar fusion and object detection on hybrid data for autonomous driving[C]//IEEE Intelligent Vehicles Symposium. Changshu, China, 2018: 1632-1638.
[29] LIU J, ZHANG S, WANG S, et al. Multispectral deep neural networks for pedestrian detection[C]//British Machine Vision Conference. York, UK, 2016: 1-13.
[30] FISCHER V, HERMAN M, BEHNKE S. Multispectral pedestrian detection using deep fusion convolutional neural networks[C]//European Symposium on Artificial Neural Networks. Bruges, Belgium, 2016: 27-29.
[31] VIOLA P, JONES M. Rapid object detection using a boosted cascade of simple features[C]//IEEE Conference on Computer Vision and Pattern Recognition. Kauai, USA, 2001: 511-518.
[32] VIOLA P, JONES M. Robust real-time face detection[J]. International journal of computer vision, 2004, 57(2): 137-154.
[33] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. San Diego, USA, 2005: 886-893.
[34] FELZENSZWALB P, MCALLESTER D, RAMANAN D. A discriminatively trained, multiscale, deformable part model[C]//IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA, 2008: 1-8.
[35] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. Lake Tahoe, USA, 2012: 1097-1105.
[36] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 580-587.
[37] GIRSHICK R. Fast R-CNN[C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1440-1448.
[38] GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 38(1): 142-158.
[39] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 2117-2125.
[40] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 779-788.
[41] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//European Conference on Computer Vision. Amsterdam, Netherlands, 2016: 21-37.
[42] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 7263-7271.
[43] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2980-2988.
[44] MEYER G P, LADDHA A, KEE E, et al. LaserNet: an efficient probabilistic 3D object detector for autonomous driving[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 12677-12686.
[45] QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 652-660.
[46] QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]//Advances in Neural Information Processing Systems. Long Beach, USA, 2017: 5099-5108.
[47] YANG B, LUO W, URTASUN R. PIXOR: real-time 3D object detection from point clouds[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 7652-7660.
[48] ASVADI A, GARROTE L, PREMEBIDA C, et al. Multimodal vehicle detection: fusing 3D LIDAR and color camera data[J]. Pattern recognition letters, 2018, 115: 20-29.
[49] SCHLOSSER J, CHOW C K, KIRA Z. Fusing LIDAR and images for pedestrian detection using convolutional neural networks[C]//IEEE International Conference on Robotics and Automation. Stockholm, Sweden, 2016: 2198-2205.
[50] QI C R, LIU W, WU C, et al. Frustum PointNets for 3D object detection from RGB-D data[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 918-927.
[51] GUAN D, CAO Y, YANG J, et al. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection[J]. Information fusion, 2018, 50: 148-157.
[52] YANG B, LIANG M, URTASUN R. HDNET: exploiting HD maps for 3D object detection[J]. Proceedings of machine learning research, 2018, 87: 146-155.
[53] ZHOU T, JIANG K, XIAO Z, et al. Object detection using multi-sensor fusion based on deep learning[C]//COTA International Conference of Transportation Professionals. Nanjing, China, 2019: 5770-5782.
[54] CHO H, SEO Y W, KUMAR B. A multi-sensor fusion system for moving object detection and tracking in urban driving environments[C]//IEEE International Conference on Robotics and Automation. Hong Kong, China, 2014: 1836-1843.
[55] DOU J, XUE J, FANG J. SEG-VoxelNet for 3D vehicle detection from RGB and lidar data[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 4362-4368.
[56] LIANG M, YANG B, CHEN Y, et al. Multi-task multi-sensor fusion for 3D object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 7345-7353.
[57] SINDAGI V A, ZHOU Y, TUZEL O. MVX-Net: multimodal VoxelNet for 3D object detection[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 7276-7282.
[58] LIANG M, YANG B, WANG S, et al. Deep continuous fusion for multi-sensor 3D object detection[C]//The European Conference on Computer Vision. Munich, Germany, 2018: 641-656.
[59] WANG Z, ZHAN W, TOMIZUKA M. Fusing bird’s eye view LIDAR point cloud and front view camera image for deep object detection[C]//IEEE Intelligent Vehicles Symposium. Changshu, China, 2018: 1-6.
[60] KIM J, KOH J, KIM Y, et al. Robust deep multi-modal learning based on gated information fusion network[C]//Asian Conference on Computer Vision. Perth, Australia, 2018: 90-106.
[61] CASAS S, LUO W, URTASUN R. IntentNet: learning to predict intention from raw sensor data[J]. Proceedings of machine learning research, 2018, 87: 947-956.
[62] KU J, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain, 2018: 5750-5757.
[63] PFEUFFER A, DIETMAYER K. Optimal sensor data fusion architecture for object detection in adverse weather conditions[C]//International Conference on Information Fusion. Cambridge, UK, 2018: 1-8.
[64] XU D, ANGUELOV D, JAIN A. PointFusion: deep sensor fusion for 3D bounding box estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 244-253.
[65] DU X, ANG M H, RUS D. Car detection for autonomous vehicle: lidar and vision fusion approach through deep learning framework[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada, 2017: 749-754.
[66] MATTI D, EKENEL H K, THIRAN J. Combining LiDAR space clustering and convolutional neural networks for pedestrian detection[C]//IEEE International Conference on Advanced Video and Signal Based Surveillance. Lecce, Italy, 2017: 1-6.
[67] SCHNEIDER L, JASCH M. Multimodal neural networks: RGB-D for semantic segmentation and object detection[C]//Scandinavian Conference on Image Analysis. Norrköping, Sweden, 2017: 98-109.
[68] OH S, KANG H. Object detection and classification by decision-level fusion for intelligent vehicle systems[J]. Sensors (Basel), 2017, 17(1): 207-214.
[69] KIM T, GHOSH J. Robust detection of nonmotorized road users using deep learning on optical and lidar data[C]//IEEE International Conference on Intelligent Transportation Systems. Rio de Janeiro, Brazil, 2016: 271-276.
[70] BAI M, MATTYUS G, HOMAYOUNFAR N, et al. Deep multi-sensor lane detection[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain, 2018: 3102-3109.
[71] CALTAGIRONE L, BELLONE M, SVENSSON L, et al. LIDAR-camera fusion for road detection using fully convolutional neural networks[J]. Robotics and autonomous systems, 2019, 111: 125-131.
[72] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.
[73] ZHANG X, ZHOU M, LIU H, et al. A cognitively-inspired system architecture for the Mengshi cognitive vehicle[J]. Cognitive computation, 2020, 12(1): 140-149.
[74] ZOU Q, JIANG H, DAI Q, et al. Robust lane detection from continuous driving scenes using deep neural networks[J]. IEEE transactions on vehicular technology, 2020, 69(1): 41-54.
[75] ZHANG X, GAO H, GUO M, et al. A study on key technologies of unmanned driving[J]. CAAI transactions on intelligence technology, 2016, 1(1): 4-13.


Memo:

Received: 2020-02-14.
Foundation items: National Key Research and Development Program of China (2018YFE0204300); Beijing Municipal Science and Technology Program (Z191100007419008); Tsinghua University Guoqiang Research Institute Project (2019GQG1010).
About the authors:
ZHANG Xinyu, research fellow, leader of the Tsinghua Mengshi intelligent vehicle team, and former visiting scholar at the University of Cambridge. His main research interests are intelligent driving and multi-modal information fusion. He is the principal investigator of a National Key R&D Program project, has repeatedly won championship and runner-up titles in China's top autonomous-driving competitions, received the second prize of the 2019 Wu Wenjun Artificial Intelligence Science and Technology Progress Award, and has published 30 SCI/EI-indexed papers on intelligent driving, one of which was selected as an ESI highly cited paper.
LIU Huaping, associate professor and doctoral supervisor, deputy director of the Youth Work Committee of the Chinese Institute of Command and Control, council member of the Chinese Association for Artificial Intelligence (CAAI), secretary-general of the CAAI Technical Committee on Cognitive Systems and Information Processing, and IEEE senior member. His main research interests are multi-modal perception, learning, and control for intelligent robots. He is a recipient of the National Science Fund for Distinguished Young Scholars, the Shuguang Innovation Award of the Chinese Institute of Command and Control, the "Young Innovation Star" award of the National High-Tech R&D Program (863 Program) during the 12th Five-Year Plan, and the Andy Chi Best Paper Award of the IEEE Instrumentation and Measurement Society (IMS).
LI Jun, academician of the Chinese Academy of Engineering and president of the China Society of Automotive Engineers. His main research interests are intelligent connected vehicles and automotive powertrains. He has long led product development and technological innovation at major Chinese automobile enterprises, with research achievements in powertrains, new energy vehicles, and intelligent connected vehicles. His awards include one first prize and one second prize of the National Science and Technology Progress Award, one second prize of the National Technological Invention Award, three special prizes and two first prizes of the China Automotive Industry Science and Technology Progress Award, two first prizes and one second prize of the National Machinery Industry Science and Technology Progress Award, and the 2012 Ho Leung Ho Lee Award for Science and Technology Innovation. He holds 9 granted patents and has published 98 academic papers and 1 monograph.
Corresponding author: LIU Huaping. E-mail: hpliu@tsinghua.edu.cn.
Last update: 2020-07-25