[1]YANG Yuyu,YANG Xiao,PAN Zaiyu,et al.Domain adaptive semantic segmentation based on prototype-guided and adaptive feature fusion[J].CAAI Transactions on Intelligent Systems,2025,20(1):150-161.[doi:10.11992/tis.202403010]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 20
Issue: 2025, No. 1
Pages: 150-161
Column: Academic Papers - Intelligent Systems
Publication date: 2025-01-05
Title: Domain adaptive semantic segmentation based on prototype-guided and adaptive feature fusion
Authors: YANG Yuyu; YANG Xiao; PAN Zaiyu; WANG Jun
Affiliation: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
Keywords: deep learning; unsupervised learning; domain adaptation; semantic segmentation; attention mechanism; self-training; self-adaptation; transfer learning; prototype guidance
CLC: TP301
DOI: 10.11992/tis.202403010
Abstract: Unsupervised domain adaptation is important for reducing the data-annotation workload of computer vision tasks, particularly pixel-level semantic segmentation. However, dispersed feature distributions and class imbalance in the target domain, which manifest as blurred class boundaries and insufficient samples for certain categories, limit its performance. To address these issues, this paper proposes a prototype-guided adaptive feature fusion model. It incorporates a prototype-guided dual attention network that fuses spatial and channel attention features to enhance class-wise compactness. It also introduces an adaptive feature fusion module that flexibly adjusts the importance of each feature, enabling the network to capture more class-discriminative features across spatial locations and channels and further improving segmentation performance. Experimental results on two challenging synthetic-to-real benchmarks, GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes, demonstrate the effectiveness of the proposed method and its capability to handle complex scenes and imbalanced data.
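For readers who want a concrete picture of the fusion step, the following is a minimal, hypothetical PyTorch sketch of the adaptive feature fusion idea summarized in the abstract: two attention-refined feature maps (a spatial branch and a channel branch) are combined with learned, per-location weights. All module and variable names here are illustrative assumptions for exposition, not the paper's actual implementation.

# Hypothetical sketch of adaptive fusion of spatial- and channel-attention
# features; names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class AdaptiveFeatureFusion(nn.Module):
    """Fuses two attention-refined feature maps with learned per-pixel weights."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict one weight per branch at every spatial location
        # from the concatenated branch features.
        self.weight_head = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),  # the two branch weights sum to 1 at each pixel
        )

    def forward(self, spatial_feat: torch.Tensor, channel_feat: torch.Tensor) -> torch.Tensor:
        w = self.weight_head(torch.cat([spatial_feat, channel_feat], dim=1))
        # w[:, 0:1] weighs the spatial branch, w[:, 1:2] the channel branch;
        # each (B, 1, H, W) weight map broadcasts over the channel dimension.
        return w[:, 0:1] * spatial_feat + w[:, 1:2] * channel_feat

if __name__ == "__main__":
    fusion = AdaptiveFeatureFusion(channels=256)
    s = torch.randn(2, 256, 64, 128)  # spatial-attention features
    c = torch.randn(2, 256, 64, 128)  # channel-attention features
    print(fusion(s, c).shape)  # torch.Size([2, 256, 64, 128])

The softmax makes the fused output a convex combination of the two branches at every pixel, so the network can emphasize whichever attention stream is more class-discriminative at each location; predicting per-channel rather than per-pixel weights would be an equally plausible variant under these assumptions.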