[1] SHEN Tianxiao, HAN Yiyuan, HAN Bing, et al. Recognition of driver’s eye movement based on the human visual cortex two-stream model[J]. CAAI Transactions on Intelligent Systems, 2022, 17(1): 41–49. [doi: 10.11992/tis.202106051]

Recognition of driver’s eye movement based on the human visual cortex two-stream model

References:
[1] National Bureau of Statistics. Statistical communiqué of the People’s Republic of China on the 2019 national economic and social development[N]. People’s Daily, 2020-02-29(5).
[2] JAIN D K, JAIN R, LAN Xiangyuan, et al. Driver distraction detection using capsule network[J]. Neural computing and applications, 2021, 33(11): 6183–6196.
[3] LE T H N, ZHENG Yutong, ZHU Chenchen, et al. Multiple scale faster-RCNN approach to driver’s cell-phone usage and hands on steering wheel detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. New York, USA: IEEE, 2016: 46–53.
[4] WANG Rongben, GUO Keyou, CHU Jiangwei, et al. Study on the eye location method in driver fatigue state surveillance[J]. Journal of highway and transportation research and development, 2003(5): 111–114.
[5] ZHANG Jie. Driver’s viewpoint distribution based on the eye tracker[J]. Hunan communication science and technology, 2012, 38(4): 153–155, 170.
[6] YUAN Wei, XU Yuanxin, GUO Yingshi, et al. Fixation transfer characteristics of drivers during lane change and straight drive[J]. Journal of Chang’an university (natural science edition), 2015, 35(5): 124–130.
[7] MISHKIN M, UNGERLEIDER L G, MACKO K A. Object vision and spatial vision: two cortical pathways[J]. Trends in neurosciences, 1983, 6: 414–417.
[8] KOOTSTRA G, DE BOER B, SCHOMAKER L R B. Predicting eye fixations on complex visual stimuli using local symmetry[J]. Cognitive computation, 2011, 3(1): 223–240.
[9] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[EB/OL]. (2012-12-01) [2021-05-30]. https://arxiv.org/abs/1212.0402.
[10] KAY W, CARREIRA J, SIMONYAN K, et al. The kinetics human action video dataset[EB/OL]. (2017-05-19) [2021-05-30]. https://arxiv.org/abs/1705.06950.
[11] SIGURDSSON G A, GUPTA A, SCHMID C, et al. Charades-ego: a large-scale dataset of paired third and first person videos[EB/OL]. (2018-04-30) [2021-05-30]. https://arxiv.org/abs/1804.09626.
[12] DAMEN Dima, DOUGHTY H, FARINELLA G M, et al. Scaling egocentric vision: the dataset[M]//Computer vision–ECCV 2018. Cham: Springer International Publishing, 2018: 753–771.
[13] JIANG Lai, XU Mai, WANG Zulin. Predicting video saliency with object-to-motion CNN and two-layer convolutional LSTM[EB/OL]. (2017-09-19) [2021-06-30]. https://arxiv.org/abs/1709.06316.
[14] LI Yin, LIU Miao, REHG J M. In the eye of beholder: joint learning of gaze and actions in first person video[C]//2018 European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 619–635.
[15] LI Yin, YE Zhefan, REHG J M. Delving into egocentric actions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2015: 287–295.
[16] MATHE S, SMINCHISESCU C. Actions in the eye: dynamic gaze datasets and learnt saliency models for visual recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1408–1424.
[17] MARSZALEK M, LAPTEV I, SCHMID C. Actions in context[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2009: 2929–2936.
[18] RODRIGUEZ M. Spatio-temporal maximum average correlation height templates in action recognition and video summarization[EB/OL]. (2013-12-10) [2021-06-30]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.5006.
[19] JUDD T, EHINGER K, DURAND F, et al. Learning to predict where humans look[C]//2009 IEEE 12th International Conference on Computer Vision. New York, USA: IEEE, 2009: 2106–2113.
[20] PAPADOPOULOS D P, CLARKE A D F, KELLER F, et al. Training object class detectors from eye tracking data[C]//Computer vision–ECCV 2014. Berlin, Germany: Springer, 2014: 361–376.
[21] EVERINGHAM M, GOOL L V, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International journal of computer vision, 2010, 88(2): 303–338.
[22] JI Shuiwang, XU Wei, YANG Ming, et al. 3D convolutional neural networks for human action recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(1): 221–231.
[23] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[EB/OL]. (2014-11-12) [2021-06-30]. https://arxiv.org/abs/1406.2199.
[24] NG Joey H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2015: 4694–4702.
[25] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735–1780.
[26] WANG Limin, XIONG Yuanjun, WANG Zhe, et al. Temporal segment networks: towards good practices for deep action recognition[C]//Computer vision–ECCV 2016. Berlin, Germany: Springer, 2016: 20–36.
[27] LIN Ji, GAN Chuang, HAN Song. TSM: temporal shift module for efficient video understanding[C]//2019 IEEE/CVF International Conference on Computer Vision. New York, USA: IEEE, 2019: 7082–7092.
[28] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision. New York, USA: IEEE, 2015: 4489–4497.
[29] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2016: 770–778.
[30] TRAN D, RAY J, SHOU Zheng, et al. ConvNet architecture search for spatiotemporal feature learning[EB/OL]. (2017-08-16) [2021-06-30]. https://arxiv.org/abs/1708.05038.
[31] TRAN D, WANG Heng, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2018: 6450–6459.
[32] FEICHTENHOFER C, FAN Haoqi, MALIK J, et al. SlowFast networks for video recognition[C]//2019 IEEE/CVF International Conference on Computer Vision. New York, USA: IEEE, 2019: 6201–6210.
[33] PENG Jinshuan, GAO Cuicui, GUO Yingshi. Drivers’ visual characteristics and mental load based on entropy rates[J]. Journal of Chongqing Jiaotong University (natural science edition), 2014, 33(2): 118–121.
[34] YUAN Wei, FU Rui, MA Yong, et al. Effects of vehicle speed and traffic sign text height on drivers’ visual search patterns[J]. Journal of traffic and transportation engineering, 2011, 11(1): 119–126.