
[1] SHEN Tianxiao, HAN Yiyuan, HAN Bing, et al. Recognition of driver’s eye movement based on the human visual cortex two-stream model[J]. CAAI Transactions on Intelligent Systems, 2022, 17(1): 41-49. [doi:10.11992/tis.202106051]


Recognition of driver’s eye movement based on the human visual cortex two-stream model


CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 17; Issue: No. 1, 2022; Pages: 41-49; Section: Academic Papers - Machine Learning; Publication date: 2022-01-05

Title:: Recognition of driver’s eye movement based on the human visual cortex two-stream model

Author(s):: SHEN Tianxiao¹, HAN Yiyuan¹, HAN Bing¹, GAO Xinbo²
1. School of Electronic Engineering, Xidian University, Xi’an 710071, China
2. Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Keywords:: eye movement video dataset; action recognition; deep learning; road safety; aided driving; eye tracking; human visual system; behavioral research

Classification number:: TP391

DOI:: 10.11992/tis.202106051

Abstract:: Dangerous driver behavior increases the incidence of traffic accidents. Most current research on driver behavior uses facial recognition and similar methods to identify abnormal behaviors such as fatigued driving and phone use. These methods classify driver behavior only objectively and ignore drivers’ subjective mental state while driving. An eye tracker is an effective tool for recording and analyzing drivers’ eye movement data; it gives a clear view of what drivers are thinking and makes it possible to summarize their visual cognition patterns. Because no dataset of driver eye movement behavior currently exists, this paper first builds VIPDAR_5, an eye movement video dataset recorded in real road scenes. Compared with conventional datasets, it contains more camera motion, illumination change, and line-of-sight occlusion. To address these challenges, the paper proposes TWNet, a model based on the two pathways of the human visual cortex, which improves recognition of driver eye movement behavior by simulating the human visual mechanism. In addition, an adaptive max-pooling layer and channel weight settings reduce the number of parameters and improve accuracy. Experimental results on the VIPDAR_5 dataset indicate that, compared with existing methods, the proposed model recognizes driver eye movement behavior effectively.
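The abstract names adaptive max pooling and channel weighting but gives no implementation details. As a rough illustration only, and not the authors' TWNet, the Python sketch below shows how adaptive max pooling can reduce feature sequences of different lengths to a fixed size, so that the outputs of two pathways can be combined by a weighted sum. The function names `adaptive_max_pool_1d` and `fuse_streams`, the binning rule, and the reading of "channel weight setting" as a weighted sum of two pathways are all assumptions made for this example.

```python
from typing import List, Sequence


def adaptive_max_pool_1d(values: Sequence[float], out_size: int) -> List[float]:
    """Max-pool a sequence of any length down (or up) to exactly out_size bins.

    Assumed binning rule: bin i covers indices
    floor(i * n / out_size) up to ceil((i + 1) * n / out_size), exclusive.
    """
    n = len(values)
    if n == 0 or out_size <= 0:
        raise ValueError("need a non-empty input and a positive out_size")
    pooled = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = ((i + 1) * n + out_size - 1) // out_size  # integer ceiling
        pooled.append(max(values[start:end]))
    return pooled


def fuse_streams(
    stream_a: Sequence[float],
    stream_b: Sequence[float],
    weight_a: float,
    weight_b: float,
) -> List[float]:
    """Weighted element-wise sum of two equally sized feature vectors.

    Hypothetical stand-in for per-pathway channel weights.
    """
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same length before fusion")
    return [weight_a * a + weight_b * b for a, b in zip(stream_a, stream_b)]


if __name__ == "__main__":
    # Two pathways produce features of different lengths (made-up numbers).
    pathway_one = [1, 9, 3, 4, 8, 2]
    pathway_two = [5, 2, 7, 1]

    # Pool both to the same fixed size so they can be fused.
    pooled_one = adaptive_max_pool_1d(pathway_one, 3)
    pooled_two = adaptive_max_pool_1d(pathway_two, 3)

    print(fuse_streams(pooled_one, pooled_two, weight_a=2, weight_b=1))
```

The pooling step itself has no learned parameters, and its output size does not depend on the input size. That property is why adaptive pooling is often used to keep later layers small, though whether TWNet applies it in this way is not stated in the abstract.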



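Memo

Received: 2021-07-01.
Funding: National Natural Science Foundation of China (61572384, 62076190, 41831072); Xidian University Graduate Innovation Fund.
About the authors: SHEN Tianxiao, master's student; main research interests are deep learning, human eye movement behavior, and action recognition. HAN Yiyuan, Ph.D. candidate; main research interests are deep learning, human visual attention, and human eye movement behavior. HAN Bing, professor and doctoral supervisor; main research interests are pattern recognition, computer vision, and aurora image analysis. She has led or participated in Key Program and General Program projects of the National Natural Science Foundation of China, a first-class China Postdoctoral Science Foundation grant, marine public welfare projects, and youth projects; has published more than 30 papers; and holds 13 granted national invention patents, one of which has been transferred to industry. She has received two provincial Science and Technology Progress Awards and one provincial first prize for science and technology in higher education.
Corresponding author: HAN Bing, E-mail: bhan@xidian.edu.cn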

