<-Previous Article Next Article->

[1]CHEN Li,MA Nan,PANG Guilin,et al.Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection[J].CAAI Transactions on Intelligent Systems,2021,16(1):57-65.[doi:10.11992/tis.202010003]

Copy

Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection

PDF Download HTML

CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume: 16 Number of periods: 2021 1 Page number: 57-65 Column: 学术论文—机器感知与模式识别 Public date: 2021-01-05

Title:: Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection

Author(s):: CHEN Li¹; MA Nan¹; 2; PANG Guilin³; GAO Yue⁴; LI Jiahong¹; 2; ZHANG Guoping¹; WU Zhixuan¹; YAO Yongqiang¹; 1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China;
2. College of Robotics, Beijing Union University, Beijing 100101, China;
3. School of Computer and Information Technology, Beijing Jiaoton

Keywords:: multi-view data; self- supervised learning; feature point matching; feature fusion; YOLOv3 network; balanced feature; complex scene; pedestrian detection

CLC:: TP391

DOI:: 10.11992/tis.202010003

Abstract:: Because of the occlusion and low accuracy of long-distance detection, pedestrian detection in complex scenes is difficult. Therefore, a pedestrian detection method based on multi-view data fusion and balanced YOLOv3 (MVBYOLO) is proposed, including the self-supervised network for multi-view fusion model (Self-MVFM) and balanced YOLOv3 network (BYOLO). Self-MVFM fuses two or more input perspective data through a self-supervised network and incorporates a weighted smoothing algorithm to solve the color difference problem during the fusion; BYOLO uses the same resolution to fuse high- and low-level semantic features to obtain balanced semantic information, thereby enhancing multi-level features and improving the accuracy of pedestrian detection in front of vehicles in complex scenes. A comparative experiment is conducted on the VOC dataset to verify the effectiveness of the proposed method. The final AP value reaches 80.14%. The experimental results indicate that compared with the original YOLOv3 network, the accuracy of the MVBYOLO is increased by 2.89%.

References:: [1] 马楠, 高跃, 李佳洪, 等. 自驾驶中的交互认知[J]. 中国科学:信息科学, 2018, 48(8):1083-1096
MA Nan, GAO Yue, LI Jiahong, et al. Interactive cognition in self-driving[J]. Scientia sinca informationis, 2018, 48(8):1083-1096
[2] LI Deyi, MA Nan, GAO Yue. Future vehicles:learnable wheeled robots[J]. Science China information sciences, 2020, 63(9):193201.
[3] 贲晛烨, 徐森, 王科俊. 行人步态的特征表达及识别综述[J]. 模式识别与人工智能, 2012, 25(1):71-81
BENXianye, XU Sen, WANG Kejun. Review on pedestrian gait feature expression and recognition[J]. PR and AI, 2012, 25(1):71-81
[4] CHEN Li, MA Nan, WANG P, et al. Survey of pedestrian action recognition techniques for autonomous driving[J]. Tsinghua science and technology, 2020, 25(4):458-470.
[5] 赵永强, 饶元, 董世鹏, 等. 深度学习目标检测方法综述[J]. 中国图象图形学报, 2020, 25(4):629-654
ZHAO Yongqiang, RAO Yuan, DONG Shipeng, et al. Survey on deep learning object detection[J]. Journal of image and graphics, 2020, 25(4):629-654
[6] FARENZENA M, BAZZANI L, PERINA A, et al. Person re-identification by symmetry-driven accumulation of local features[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA, 2010:2360-2367.
[7] ANDRILUKA M, ROTH S, SCHIELE B. Pictorial structures revisited:people detection and articulated pose estimation[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA, 2009:1014-1021.
[8] WEN Chao, ZHANG Yinda, LI Zhuwen, et al. Pixel2Mesh++:multi-view 3D mesh generation via deformation[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South), 2019:1042-1051.
[9] CHEN Rui, HAN Songfang, XU Jing, et al. Point-based multi-view stereo network[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South), 2019:1538-1547.
[10] YI Hongwei, WEI Zizhuang, DING Mingyu, et al. Pyramid multi-view stereo net with self-adaptive view aggregation[C]//16th European Conference on Computer Vision. Glasgow, UK, 2020:766-782.
[11] YU Changqian, WANG Jingbo, PENG Chao, et al. Bisenet:bilateral segmentation network for real-time semantic segmentation[C]//15th European Conference on Computer Vision. Munich, Germany, 2018:334-349.
[12] SU Hang, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition[C]//2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, 2015:945-953.
[13] FENG Yifan, ZHANG Zizhao, ZHAO Xibin, et al. GVCNN:group-view convolutional neural networks for 3D shape recognition[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018:264-272.
[14] DONG Junting, JIANG Wen, HUANG Qixing, et al. Fast and robust multi-person 3D pose estimation from multiple views[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA, 2019:7784-7793.
[15] HOU Yunzhong, ZHENG Liang, GOULD S. Multiview detection with feature perspective transformation[C]//16th European Conference on Computer Vision. Glasgow, UK, 2020:1-18.
[16] LIU Hong, XU Tao, WANG Xiangdong, et al. Related HOG features for human detection using cascaded adaboost and SVM classifiers[C]//19th International Conference on Advances in Multimedia Modeling. Huangshan, China, 2013:345-355.
[17] FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE transactions on pattern analysis and machine intelligence, 2009, 32(9):1627-1645.
[18] KAEWTRAKULPONG P, BOWDEN R. An improved adaptive background mixture model for real-time tracking with shadow detection[M]//REMAGNINO P, JONES G A, PARAGIOS N, et al. Video-Based Surveillance Systems. Boston, MA:Springer, 2002:135-144.
[19] BARNICH O, VAN DROOGENBROECK M. ViBe:a universal background subtraction algorithm for video sequences[J]. IEEE transactions on image processing, 2010, 20(6):1709-1724.
[20] WANG Hanzi, SUTER D. A consensus-based method for tracking:modelling background scenario and foreground appearance[J]. Pattern recognition, 2007, 40(3):1091-1105.
[21] HOFMANN M, TIEFENBACHER P, RIGOLL G. Background segmentation with feedback:the pixel-based adaptive segmenter[C]//2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, USA, 2012:38-43.
[22] KRIZHEVSKY A, SUTSKEVER I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA, 2012:1097-1105.
[23] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014:580-587.
[24] GIRSHICK R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, 2015:1440-1448.
[25] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN:towards real-time object detection with region proposal networks[C]//Proceedings of the 2015 Conference and Workshop on Neural Information Processing Systems. Montreal, Canada, 2015:91-99.
[26] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once:unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA, 2016:779-788.
[27] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD:single shot multibox detector[C]//14th European Conference on Computer Vision. Amsterdam, The Netherlands, 2016:21-37.
[28] ZHANG Shifeng, WEN Longyin, BIAN Xiao, et al. Single-shot refinement neural network for object detection[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018:4203-4212.
[29] DETONE D, MALISIEWICZ T, RABINOVICH A. SuperPoint:self-supervised interest point detection and description[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Salt Lake City, USA, 2018:224-236.
[30] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA, 2016:770-778.
[31] LIN T Y, DOLLáR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017:936-944.
[32] CAO Guimei, XIE Xuemei, YANG Wenzhe, et al. Feature-fused SSD:fast detection for small objects[C]//Proceedings Volume 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017). Qingdao, China, 2018:106151E.
[33] HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017:2261-2269.
[34] HE Kaiming, GKIOXARI G, DOLLáR P, et al. Mask R-CNN[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy, 2017:2980-2988.

Similar References:

Memo

Last Update: 2021-02-25

Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection PDF DownloadHTML

Memo

Research on multi-view data fusion and balanced YOLOv3 for pedestrian detection

PDF Download HTML