[1] HUANG Yucheng, XIAO Ziwang, WU Danfeng, et al. Spatiotemporal fusion and discriminative augmentation for improved Siamese tracking[J]. CAAI Transactions on Intelligent Systems, 2024, 19(5): 1218-1227. [doi:10.11992/tis.202306005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
19
Issue:
No. 5, 2024
Pages:
1218-1227
Column:
Academic Papers: Intelligent Systems
Publication date:
2024-09-05
- Title:
-
Spatiotemporal fusion and discriminative augmentation for improved Siamese tracking
- Author(s):
-
HUANG Yucheng1, XIAO Ziwang1, WU Danfeng2, HAMDULLA A1
-
1. School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China;
2. College of Robotics, Beijing Union University, Beijing 100101, China
- Keywords:
-
artificial intelligence; deep learning; computer vision; object tracking; neural network; Transformer; feature fusion; temporal modeling
- CLC number:
-
TP18
- DOI:
-
10.11992/tis.202306005
- Document code:
-
2024-08-30
- Abstract:
-
Siamese trackers have considerably improved tracking performance. However, current trackers struggle to describe changes in target appearance accurately, so their performance degrades under challenges such as occlusion and scale variation. In addition, cluttered backgrounds produce distracting responses in the response map and mislead target localization. To address these issues, two Transformer-based modules are introduced to improve Siamese trackers. The spatiotemporal fusion module uses cross-attention for global feature association and iteratively accumulates historical cues, improving robustness to target appearance changes. The discriminative enhancement module associates semantic information between the target and the search region to strengthen target discrimination. In addition, adaptive spatial-channel weighted feature fusion is used to fully exploit the spatiotemporal information in spatial distribution and semantic similarity. The proposed modules can be embedded into mainstream Siamese trackers, and experiments on public datasets demonstrate the advantages of the approach.
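As an illustration of the cross-attention association and iterative accumulation of historical cues mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`cross_attention`, `accumulate_history`), the residual update, and the omission of learned projection matrices are assumptions made to keep the example short.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of d-dim vectors (e.g. current-frame features).
    keys, values: lists of vectors (e.g. accumulated historical features).
    Returns one attended vector per query.
    """
    d = len(queries[0])
    dv = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dv)])
    return out


def accumulate_history(frames):
    """Hypothetical iterative fusion: each new frame attends to a running memory.

    frames: list of frames, each a list of feature vectors.
    The residual sum (current + attended) is an assumption of this sketch.
    """
    memory = frames[0]
    for current in frames[1:]:
        attended = cross_attention(current, memory, memory)
        memory = [[c + a for c, a in zip(cv, av)]
                  for cv, av in zip(current, attended)]
    return memory
```

Here every query attends to all memory positions, which is the global association the abstract refers to; a practical tracker would add learned query, key, and value projections, multiple heads, and normalization.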
Last Update:
2024-09-05