[1]赵荣峰,卢宝莉,唐小江,等.面向智能座舱的多源混合模态数据集及层次化融合分类方法[J].智能系统学报,2026,21(1):83-94.[doi:10.11992/tis.202507024]
 ZHAO Rongfeng,LU Baoli,TANG Xiaojiang,et al.Multi-source hybrid-modality dataset and hierarchical fusion classification method for intelligent cockpits[J].CAAI Transactions on Intelligent Systems,2026,21(1):83-94.[doi:10.11992/tis.202507024]

Multi-source hybrid-modality dataset and hierarchical fusion classification method for intelligent cockpits

参考文献/References:
[1] 郗来乐, 林声浩, 王震, 等. 智能网联汽车自动驾驶安全: 威胁、攻击与防护[J]. 软件学报, 2025, 36(4): 1859-1880 XI Laile, LIN Shenghao, WANG Zhen, et al. Autonomous driving security of intelligent connected vehicles: threats, attacks, and defenses[J]. Journal of software, 2025, 36(4): 1859-1880
[2] 褚万里, 郭鹏, 章捷, 等. 机动车驾驶员疲劳驾驶检测方法研究综述[J]. 电子设计工程, 2025, 33(4): 36-41 CHU Wanli, GUO Peng, ZHANG Jie, et al. Review of research on fatigue driving detection methods for motor vehicle drivers[J]. Electronic design engineering, 2025, 33(4): 36-41
[3] 王润民, 朱宇, 赵祥模, 等. 自动驾驶测试场景研究进展[J]. 交通运输工程学报, 2021, 21(2): 21-37 WANG Runmin, ZHU Yu, ZHAO Xiangmo, et al. Research progress on test scenario of autonomous driving[J]. Journal of traffic and transportation engineering, 2021, 21(2): 21-37
[4] GAO Fei, GE Xiaojun, LI Jinyu, et al. Intelligent cockpits for connected vehicles: taxonomy, architecture, interaction technologies, and future directions[J]. Sensors, 2024, 24(16): 5172
[5] 刘佳雨. 自动-人工驾驶车辆混行下快速路合流区交通安全评价[D]. 哈尔滨: 哈尔滨工业大学, 2021. LIU Jiayu. Traffic safety evaluation of freeway merging areas under mixed traffic of automated and human-driven vehicles[D]. Harbin: Harbin Institute of Technology, 2021.
[6] GRIGORESCU S, TRASNEA B, COCIAS T, et al. A survey of deep learning techniques for autonomous driving[J]. Journal of field robotics, 2020, 37(3): 362-386
[7] BALTRUŠAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: a survey and taxonomy[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 41(2): 423-443
[8] 张辉, 杜瑞, 钟杭, 等. 电力设施多模态精细化机器人巡检关键技术及应用[J]. 自动化学报, 2025, 51(1): 20-42 ZHANG Hui, DU Rui, ZHONG Hang, et al. The key technology and application of multi-modal fine robot inspection for power facilities[J]. Acta automatica sinica, 2025, 51(1): 20-42
[9] CHEN Long, LI Yuchen, HUANG Chao, et al. Milestones in autonomous driving and intelligent vehicles: survey of surveys[J]. IEEE transactions on intelligent vehicles, 2023, 8(2): 1046-1056
[10] XU Peng, ZHU Xiatian, CLIFTON D A. Multimodal learning with Transformers: a survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2023, 45(10): 12113-12132
[11] SCHULDT C, LAPTEV I, CAPUTO B. Recognizing human actions: a local SVM approach[C]//Proceedings of the 17th International Conference on Pattern Recognition. Piscataway: IEEE, 2004: 32-36.
[12] GORELICK L, BLANK M, SHECHTMAN E, et al. Actions as space-time shapes[J]. IEEE transactions on pattern analysis and machine intelligence, 2007, 29(12): 2247-2253
[13] MARSZALEK M, LAPTEV I, SCHMID C. Actions in context[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 2929-2936.
[14] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[EB/OL]. (2012-12-03)[2025-07-24]. https://arxiv.org/abs/1212.0402.
[15] KUEHNE H, JHUANG H, STIEFELHAGEN R, et al. HMDB51: a large video database for human motion recognition[C]//High Performance Computing in Science and Engineering ’12. Berlin: Springer, 2013: 571-582.
[16] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? a new model and the kinetics dataset[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4724-4733.
[17] SHAHROUDY A, LIU Jun, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1010-1019.
[18] GU Chunhui, SUN Chen, ROSS D A, et al. AVA: a video dataset of spatio-temporally localized atomic visual actions[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6047-6056.
[19] RASOULI A, KOTSERUBA I, TSOTSOS J K. Are they going to cross? a benchmark dataset and baseline for pedestrian crosswalk behavior[C]//2017 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2018: 206-213.
[20] SUN Pei, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: waymo open dataset[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2443-2451.
[21] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[EB/OL]. (2020-05-05)[2025-07-24]. https://arxiv.org/abs/1903.11027.
[22] CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[EB/OL]. (2016-04-07)[2025-07-24]. https://arxiv.org/abs/1604.01685.
[23] MARTIN M, ROITBERG A, HAURILET M, et al. Drive&Act: a multi-modal dataset for fine-grained driver behavior recognition in autonomous vehicles[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2020: 2801-2810.
[24] ORTEGA J D, KOSE N, CAÑAS P, et al. DMD: a large-scale multi-modal driver monitoring dataset for attention and alertness analysis[C]//Computer Vision – ECCV 2020 Workshops. Cham: Springer, 2020: 387-405.
[25] ZHAO Chihang, GAO Yongsheng, HE Jie, et al. Recognition of driving postures by multiwavelet transform and multilayer perceptron classifier[J]. Engineering applications of artificial intelligence, 2012, 25(8): 1677-1686
[26] ABOUELNAGA Y, ERAQI H M, MOUSTAFA M N. Real-time distracted driver posture classification[EB/OL]. (2018-11-29)[2025-07-24]. https://arxiv.org/abs/1706.09498.
[27] FEICHTENHOFER C, FAN Haoqi, MALIK J, et al. SlowFast networks for video recognition[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6201-6210.
[28] WANG Huogen, SONG Zhanjie, LI Wanqing, et al. A hybrid network for large-scale action recognition from RGB and depth modalities[J]. Sensors, 2020, 20(11): 3305
[29] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. (2021-02-26)[2025-07-24]. https://arxiv.org/abs/2103.00020.
[30] CHENG Feng, WANG Xizi, LEI Jie, et al. VindLU: a recipe for effective video-and-language pretraining[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 10739-10750.
[31] LI Kunchang, LI Xinhao, WANG Yi, et al. VideoMamba: state space model for efficient video understanding[C]//Computer Vision–ECCV 2024. Cham: Springer, 2025: 237-255.
[32] ZHANG Zhengyou. Flexible camera calibration by viewing a plane from unknown orientations[C]//Proceedings of the Seventh IEEE International Conference on Computer Vision. Piscataway: IEEE, 1999: 666-673.
[33] HUANG Zhilin, LIANG Quanmin, YU Yijie, et al. Bilateral event mining and complementary for event stream super-resolution[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 34-43.
相似文献/References:
[1]宋婉茹,赵晴晴,陈昌红,等.行人重识别研究综述[J].智能系统学报,2017,12(6):770.[doi:10.11992/tis.201706084]
 SONG Wanru,ZHAO Qingqing,CHEN Changhong,et al.Survey on pedestrian re-identification research[J].CAAI Transactions on Intelligent Systems,2017,12(6):770.[doi:10.11992/tis.201706084]
[2]朱文霖,刘华平,王博文,等.基于视-触跨模态感知的智能导盲系统[J].智能系统学报,2020,15(1):33.[doi:10.11992/tis.201908015]
 ZHU Wenlin,LIU Huaping,WANG Bowen,et al.An intelligent blind guidance system based on visual-touch cross-modal perception[J].CAAI Transactions on Intelligent Systems,2020,15(1):33.[doi:10.11992/tis.201908015]
[3]徐坚.语义图支持的阅读理解型问题的自动生成[J].智能系统学报,2024,19(2):420.[doi:10.11992/tis.202207001]
 XU Jian.Generating reading comprehension questions automatically based on semantic graphs[J].CAAI Transactions on Intelligent Systems,2024,19(2):420.[doi:10.11992/tis.202207001]
[4]吴一全,庞雅轩.手机表面缺陷的机器视觉检测方法研究进展[J].智能系统学报,2025,20(1):33.[doi:10.11992/tis.202312036]
 WU Yiquan,PANG Yaxuan.Research progress of mobile phone surface defect detection based on machine vision[J].CAAI Transactions on Intelligent Systems,2025,20(1):33.[doi:10.11992/tis.202312036]
[5]宫彦,王乃棒,张新钰,等.面向智能网联汽车的 BEV 感知技术与发展趋势[J].智能系统学报,2026,21(1):41.[doi:10.11992/tis.202505027]
 GONG Yan,WANG Naibang,ZHANG Xinyu,et al.BEV perception technologies and development trends for intelligent connected vehicles[J].CAAI Transactions on Intelligent Systems,2026,21(1):41.[doi:10.11992/tis.202505027]

备注/Memo

Received: 2025-07-16.
Funding: Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund (L233036).
About the authors: ZHAO Rongfeng, master's degree candidate. Main research interests: intelligent-cockpit multimodality, multimodal large models, and video understanding. Honors include the "Outstanding Conscript" title and a citation for merit, the provincial-level gold award in the "Youth Depicting League History" special competition of the 2022 "Challenge Cup" Capital College Students' Entrepreneurship Plan Competition ("Qingchuang Beijing"), the 2022 National Encouragement Scholarship, and the 2023 Beijing "Outstanding Graduate" title. E-mail: zhaorongfeng23@semi.ac.cn. LU Baoli, assistant researcher, Ph.D.; senior member of the China Computer Federation, member of the Youth Working Committee of the Chinese Association for Artificial Intelligence; served as organizing chair of the IEEE HPBD&IS 2021 and IEEE HDIS 2022 international conferences. Main research interests: computer vision, intelligent systems, and AI-assisted diagnosis and treatment. Has participated, as subproject leader and core member, in more than 10 projects funded by the National Key R&D Program, the National Natural Science Foundation of China, and the Beijing Natural Science Foundation, among others; holds 10 granted invention patents; won the algorithm-track championship of the 2025 Yangtze River Delta (Wuhu) Computing Power and Algorithm Innovation Application Competition; and has published more than 20 academic papers. E-mail: lubaoli@semi.ac.cn. NING Xin, researcher and doctoral supervisor; senior member of the China Computer Federation, the Chinese Association for Artificial Intelligence, and the China Society of Image and Graphics; listed among the world's top 2% scientists for 2022-2024; member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences. Has led 5 projects funded by the National Key R&D Program, the NSFC Young Scientists Fund and General Program, and the Beijing Natural Science Foundation, among others; holds more than 30 granted invention patents; received the Second Prize of the Chinese Institute of Electronics Science and Technology Progress Award and the First Prize of the inaugural Young Chip Innovation Award of the Institute of Semiconductors, Chinese Academy of Sciences; selected for the institute's Young Researcher Program; has published more than 100 academic papers and authored 1 English monograph. E-mail: ningxin@semi.ac.cn.
Corresponding author: LU Baoli. E-mail: lubaoli@semi.ac.cn

更新日期/Last Update: 2026-01-05