[1]陈丽,马楠,逄桂林,等.多视角数据融合的特征平衡YOLOv3行人检测研究[J].智能系统学报,2021,16(1):57-65.[doi:10.11992/tis.202010003]
CHEN Li,MA Nan,PANG Guilin,et al.Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection[J].CAAI Transactions on Intelligent Systems,2021,16(1):57-65.[doi:10.11992/tis.202010003]
CAAI Transactions on Intelligent Systems 《智能系统学报》 [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 16
Issue: No. 1, 2021
Pages: 57-65
Section: Academic Papers - Machine Perception and Pattern Recognition
Publication date: 2021-01-05
Title: Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection
Authors: CHEN Li1, MA Nan1,2, PANG Guilin3, GAO Yue4, LI Jiahong1,2, ZHANG Guoping1, WU Zhixuan1, YAO Yongqiang1
1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China;
2. College of Robotics, Beijing Union University, Beijing 100101, China;
3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
4. School of Software, Tsinghua University, Beijing 100085, China
Keywords: multi-view data; self-supervised learning; feature point matching; feature fusion; YOLOv3 network; balanced feature; complex scene; pedestrian detection
CLC number: TP391
DOI: 10.11992/tis.202010003
Abstract: Pedestrian detection in complex scenes is difficult because of occlusion and the low accuracy of long-distance detection. To address these problems, this paper proposes a multi-view data fusion, feature-balanced YOLOv3 pedestrian detection model (MVBYOLO) with two parts: a self-supervised multi-view feature point fusion model (Self-MVFM) and a balanced YOLOv3 network (BYOLO). Self-MVFM learns features from two or more input views in a self-supervised manner, fuses the multi-view information through feature point matching, and applies a weighted smoothing algorithm to remove the color differences that arise during fusion. BYOLO fuses high-level semantic features and low-level detail features at the same resolution to obtain balanced, semantically enhanced multi-level features, which improves the accuracy of detecting pedestrians in front of the vehicle in complex scenes. Comparative experiments on the VOC dataset verify the effectiveness of the proposed method: the final AP reaches 80.14%, 2.89% higher than that of the original YOLOv3 network.
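The record does not reproduce the paper's formulas, but the weighted smoothing step in Self-MVFM can be illustrated with a minimal sketch. Assuming the two views have already been registered via feature point matching, one common weighted smoothing scheme is a linear cross-fade over the overlapping columns; the function name, the fixed vertical seam, and the linear weights below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def blend_overlap(left_img, right_img, overlap):
    """Hypothetical weighted-smoothing blend of two registered views that share an
    `overlap`-pixel-wide vertical seam (a stand-in for Self-MVFM's smoothing step).

    left_img, right_img: HxWx3 float arrays; returns the stitched result with a
    linear cross-fade over the overlap, which suppresses the color difference.
    """
    # Linear weights: 1 -> 0 for the left view across the overlap, 0 -> 1 for the right.
    alpha = np.linspace(1.0, 0.0, overlap).reshape(1, overlap, 1)
    blended = alpha * left_img[:, -overlap:] + (1.0 - alpha) * right_img[:, :overlap]
    return np.concatenate(
        [left_img[:, :-overlap], blended, right_img[:, overlap:]], axis=1
    )

# Toy usage: two synthetic 4x8 "views" with different brightness, overlapping by 3 columns.
a = np.full((4, 8, 3), 0.2)
b = np.full((4, 8, 3), 0.8)
pano = blend_overlap(a, b, overlap=3)
print(pano.shape)  # (4, 13, 3)
```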
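Likewise, the "balanced feature" idea in BYOLO, fusing high-level semantic and low-level detail features at a single resolution, can be sketched as below. This follows the generic balanced-feature-pyramid recipe (resize all levels, average them, redistribute the result); the residual add-back, the shared channel width, and the function names are assumptions, since the record does not describe BYOLO's exact wiring.

```python
import torch
import torch.nn.functional as F

def balance_features(feats, target_level=1):
    """Sketch of balanced multi-level feature fusion over a YOLOv3-style pyramid.

    Assumes all levels share the same channel width so they can be averaged;
    in a real network a 1x1 conv would usually align the channels first.
    """
    target_size = feats[target_level].shape[-2:]
    # 1) Bring every pyramid level to one common resolution.
    resized = [F.interpolate(f, size=target_size, mode="nearest") for f in feats]
    # 2) Integrate them into a single balanced, semantically enhanced map.
    balanced = torch.stack(resized, dim=0).mean(dim=0)
    # 3) Scatter the balanced map back onto each original level as a residual.
    return [
        f + F.interpolate(balanced, size=f.shape[-2:], mode="nearest")
        for f in feats
    ]

# Toy pyramid: three levels at strides 8/16/32 for a 256x256 input.
pyramid = [torch.randn(1, 256, 32, 32),
           torch.randn(1, 256, 16, 16),
           torch.randn(1, 256, 8, 8)]
out = balance_features(pyramid)
print([o.shape for o in out])
```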
Memo
Received: 2020-10-07.
Foundation items: National Natural Science Foundation of China (61871038, 61931012, 6183034); Common Pre-research Program of the Equipment Development Department of the Central Military Commission (41412040302); Leading Talent Program of the Beijing Union University "Talent Strengthening Optimization Plan" (BPHR2020AZ02); Beijing Union University Postgraduate Research and Innovation Funding Project (YZ2020K001)
Biographies: CHEN Li, master's student; research interests include multi-view data fusion and pedestrian action recognition. MA Nan, professor, PhD; research interests include interactive cognition, knowledge discovery, and intelligent systems; led teams that won the championship (Leading Award) of the virtual-scenario event at the 2018, 2019, and 2020 WIC World Autonomous Driving Challenges; holds 7 authorized invention patents and 13 software copyrights; has published more than 50 academic papers and edited 3 monographs and textbooks. PANG Guilin, master's student; research interests include computer vision and lane line detection.
Corresponding author: MA Nan. E-mail: xxtmanan@buu.edu.cn
Last update: 2021-02-25