[1] HUANG Yucheng, XIAO Ziwang, WU Danfeng, et al. Spatiotemporal fusion and discriminative augmentation for improved Siamese tracking[J]. CAAI Transactions on Intelligent Systems, 2024, 19(5): 1218-1227. [doi:10.11992/tis.202306005]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 5
Pages: 1218-1227
Column: Academic Papers: Intelligent Systems
Publication date: 2024-09-05
- Title: Spatiotemporal fusion and discriminative augmentation for improved Siamese tracking
- Author(s): HUANG Yucheng (1); XIAO Ziwang (1); WU Danfeng (2); HAMDULLA A (1)
  1. School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  2. College of Robotics, Beijing Union University, Beijing 100101, China
- Keywords: artificial intelligence; deep learning; computer vision; object tracking; neural network; Transformer; feature fusion; temporal modeling
- CLC: TP18
- DOI: 10.11992/tis.202306005
- Abstract:
Siamese trackers have considerably improved tracking performance. However, current trackers struggle to describe changes in target appearance accurately, so their performance degrades under occlusion and scale changes. In addition, cluttered backgrounds can interfere with the tracker response and mislead target localization. To address these problems, two Transformer-based modules are introduced to improve Siamese trackers. Specifically, the spatiotemporal fusion module uses a cross-attention mechanism for global feature association and iteratively accumulates historical cues, which improves robustness to target appearance changes. Meanwhile, the discriminative enhancement module associates semantic information between the target and the search area to strengthen target discrimination. In addition, adaptive weighted channel-spatial fusion is used to fully exploit the spatiotemporal information in spatial distribution and semantic similarity. The proposed modules can be embedded into mainstream Siamese trackers and exhibit superior performance on public datasets.
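
As a rough illustration of the cross-attention idea mentioned in the abstract, the following minimal sketch shows scaled dot-product cross-attention and one possible way to accumulate information across frames. It is not the authors' implementation: the abstract does not specify the architecture, so the function names (`cross_attention`, `accumulate_history`), the absence of learned projections, and the residual update rule are all assumptions made for illustration. Pure Python is used only to keep the example dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of d-dimensional token vectors
    keys, values: equal-length lists of token vectors from another source
    Returns one attended vector per query token.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value tokens, one output dimension at a time.
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


def accumulate_history(frames):
    """Fold per-frame token sets into a running memory.

    Assumed (hypothetical) update rule, not taken from the paper:
        memory <- memory + CrossAttention(memory, frame, frame)
    """
    memory = [list(token) for token in frames[0]]
    for frame in frames[1:]:
        update = cross_attention(memory, frame, frame)
        memory = [
            [m + u for m, u in zip(m_tok, u_tok)]
            for m_tok, u_tok in zip(memory, update)
        ]
    return memory


# Toy input: three frames, each with two 2-dimensional feature tokens.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.3], [0.1, 0.7]],
]
memory = accumulate_history(frames)
```

In a real tracker, the queries, keys, and values would normally pass through learned linear projections and multiple attention heads, and the features would come from a backbone network rather than hand-written lists.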