[1]ZHAO Rongfeng,LU Baoli,TANG Xiaojiang,et al.Multi-source hybrid-modality dataset and hierarchical fusion classification method for intelligent cockpits[J].CAAI Transactions on Intelligent Systems,2026,21(1):83-94.[doi:10.11992/tis.202507024]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 21
Issue: 2026(1)
Pages: 83-94
Column: Academic Papers: Machine Learning
Publication date: 2026-03-05
- Title:
Multi-source hybrid-modality dataset and hierarchical fusion classification method for intelligent cockpits
- Author(s):
ZHAO Rongfeng1,2; LU Baoli1; TANG Xiaojiang1; HU Min4; LI Weijun1,3; NING Xin1,2
- Affiliation(s):
1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China;
2. College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China;
3. School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China;
4. Beijing Ratu Technology Co., Ltd., Beijing 100096, China
- Keywords:
intelligent cockpit; dataset; multimodal fusion; visual multimodality; behavior classification; dangerous behavior; behavior recognition; multi-source data
- CLC:
TP391.4
- DOI:
10.11992/tis.202507024
- Abstract:
Open-source data for intelligent cockpits in the driving domain are scarce, with limited modality dimensions, insufficient annotations, and restricted scene diversity. To address these challenges, a multi-source hybrid-modality dataset has been constructed. This dataset incorporates RGB, depth, and infrared visual data, along with structured textual data detailing vehicle information and driving scenarios. A dual-layer annotation scheme is applied to capture ten behavior categories. Leveraging this dataset, a hierarchical multimodal fusion framework is proposed to enhance feature extraction via cross-modal information exchange and semantically guided fusion mechanisms. Experiments on video classification tasks reveal significant improvements in environmental understanding when RGB data are combined with additional modalities: using the full range of modalities yields a 15.75% increase in accuracy over RGB alone. These results validate the effectiveness of the multi-source hybrid-modality dataset in advancing intelligent cockpit systems.
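The two fusion stages the abstract names (cross-modal information exchange, then semantically guided fusion) could be sketched in a deliberately simplified form as below. All function names, the mean-based exchange, and the softmax weighting are illustrative assumptions for this sketch, not the paper's actual method or API:

```python
import numpy as np

def semantic_weights(text_feat: np.ndarray, modality_feats: dict) -> dict:
    """Softmax weights per modality from similarity to a text embedding.

    Hypothetical stand-in for 'semantic guidance': modalities whose features
    align better with the structured-text embedding get larger weights.
    """
    scores = {m: float(text_feat @ f) for m, f in modality_feats.items()}
    mx = max(scores.values())                      # subtract max for stability
    exps = {m: np.exp(s - mx) for m, s in scores.items()}
    z = sum(exps.values())
    return {m: e / z for m, e in exps.items()}

def fuse_modalities(modality_feats: dict, text_feat: np.ndarray) -> np.ndarray:
    # Stage 1 (assumed form): cross-modal exchange -- each modality's feature
    # is blended with the cross-modality mean so information is shared.
    mean = np.mean(list(modality_feats.values()), axis=0)
    exchanged = {m: 0.5 * f + 0.5 * mean for m, f in modality_feats.items()}
    # Stage 2 (assumed form): semantically guided weighted sum over modalities.
    w = semantic_weights(text_feat, exchanged)
    return sum(w[m] * f for m, f in exchanged.items())

rng = np.random.default_rng(0)
feats = {m: rng.standard_normal(16) for m in ("rgb", "depth", "ir")}
text = rng.standard_normal(16)                     # structured-text embedding
fused = fuse_modalities(feats, text)
print(fused.shape)  # → (16,)
```

In practice each stage would be a learned network layer (e.g. cross-attention rather than a fixed mean), but the sketch shows why the text stream matters: it decides how much each visual modality contributes to the fused representation.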