[1] YAN He, LIU Lingkun, HUANG Junbin, et al. Video summarization model based on the multiscale attention mechanism and bidirectional gated recurrent network[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 446-454. [doi:10.11992/tis.202209048]
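CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 19
Issue: No. 2, 2024
Pages: 446-454
Section: Academic Papers: Fundamentals of Artificial Intelligence
Publication date: 2024-03-05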
- Title: Video summarization model based on the multiscale attention mechanism and bidirectional gated recurrent network
- Author(s): YAN He (闫河), LIU Lingkun (刘灵坤), HUANG Junbin (黄俊滨), ZHANG Ye (张烨), DUAN Siyu (段思宇)
  Liangjiang College of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China
- Keywords: video summarization; self-attention mechanism; importance score; long-range dependence; computer vision; bidirectional gated recurrent neural network (BiGRU); non-maximum suppression (NMS); kernel temporal segmentation (KTS)
- CLC number: TP391.41
- DOI: 10.11992/tis.202209048
- Document code: 2023-11-14
- Abstract: In video summarization, global attention over long video sequences yields attention values with high variance, the importance scores of generated key frames deviate substantially, and the boundary values of time-series nodes lack long-range dependence, which leads to poor semantic coherence within segments. To address these problems, the attention module is improved by combining segmented local self-attention with global self-attention to capture key features of both local and global video sequences and to reduce the variance of the attention values. A bidirectional gated recurrent network (BiGRU) is introduced in parallel; the outputs of the two branches are fed separately into an improved classification-regression module, and the results are then additively fused. Finally, non-maximum suppression (NMS) and kernel temporal segmentation (KTS) are used to filter segments and split them into high-quality representative shots, and a knapsack combinatorial optimization algorithm generates the final summary. The resulting video summarization model, which combines a multiscale attention mechanism with a BiGRU, is called LG-RU (local and global attentions combine with the BiGRU). Comparative experiments on the standard and augmented settings of the TvSum and SumMe datasets show that the model achieves a higher F-score, confirming that it summarizes videos robustly while maintaining high accuracy.
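The post-processing steps named in the abstract (NMS filtering followed by knapsack selection) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `(start, end, score)` segment layout, the IoU threshold of 0.5, the frame budget, and the function names `temporal_iou`, `nms_1d`, and `knapsack_select` are all hypothetical. KTS shot segmentation is omitted, so the segments that survive NMS are treated directly as candidate shots.

```python
from typing import List, Tuple

# Hypothetical layout: (start_frame, end_frame_exclusive, importance_score)
Segment = Tuple[int, int, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(segments: List[Segment], iou_threshold: float = 0.5) -> List[Segment]:
    """Greedy NMS: visit segments by descending score and keep a segment
    only if its IoU with every already-kept segment is within the threshold."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


def knapsack_select(lengths: List[int], scores: List[float], budget: int) -> List[int]:
    """0/1 knapsack via dynamic programming.

    Returns the indices of the shots that maximize the total score while
    their total length stays within `budget` frames (an assumed budget).
    """
    n = len(lengths)
    # best[i][c] = best total score using the first i shots with capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v
    # Backtrack through the table to recover which shots were chosen
    chosen: List[int] = []
    c = budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


if __name__ == "__main__":
    candidates = [(0, 10, 0.9), (2, 12, 0.8), (20, 30, 0.7), (28, 40, 0.4)]
    shots = nms_1d(candidates, iou_threshold=0.5)
    picked = knapsack_select(
        [end - start for start, end, _ in shots],
        [score for _, _, score in shots],
        budget=20,
    )
    print([shots[i] for i in picked])
```

In this toy input, the second candidate overlaps the first with an IoU of about 0.67 and is suppressed; the knapsack step then picks the two highest-scoring shots that fit into the 20-frame budget.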
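Memo
Received: 2022-09-23.
Funding: National Key R&D Program of China, "Intelligent Robots" Key Special Project (2018YFB1308602); General Program of the National Natural Science Foundation of China (61173184); Natural Science Foundation of Chongqing (cstc2018jcyjAX0694).
About the authors: YAN He, PhD, professor. Main research interests: multiscale geometric analysis of images, object tracking, and pattern recognition. Has led one General Program project of the National Natural Science Foundation of China and one China Postdoctoral Science Foundation project, as well as two Natural Science Foundation of Chongqing projects and two visiting scholar fund projects of Ministry of Education key laboratories; participated as institutional lead in one project of the Ministry of Science and Technology's 13th Five-Year key R&D program "Intelligent Robots" key special project; took part in more than 10 provincial and ministerial projects. Has published more than 90 academic papers. E-mail: yanhe@cqut.edu.cn; LIU Lingkun, master's student. Main research interests: deep-learning-based video summarization, video understanding, and object detection. E-mail: LiuLingK@stu.cqut.edu.cn; HUANG Junbin, master's student. Main research interests: deep-learning-based video summarization and video captioning methods. E-mail: huangjunbin@2020.cqut.edu.cn
Corresponding author: YAN He. E-mail: yanhe@cqut.edu.cn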