[1] LIU Wanjun, ZHAO Siqi, QU Haicheng, et al. Combining foreground feature reinforcement and region mask self-attention for fine-grained image classification[J]. CAAI Transactions on Intelligent Systems, 2022, 17(6): 1134-1144. [doi:10.11992/tis.202109029]
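CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
17
Issue:
No. 6, 2022
Pages:
1134-1144
Section:
Research Papers: Machine Perception and Pattern Recognition
Publication date:
2022-11-05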
- Title:
-
Combining foreground feature reinforcement and region mask self-attention for fine-grained image classification
- Author(s):
-
LIU Wanjun (刘万军), ZHAO Siqi (赵思琪), QU Haicheng (曲海成), WANG Yuping (王宇萍)
-
School of Software, Liaoning Technical University, Huludao 125105, China
-
- Keywords:
-
fine-grained image classification; object localization; region mask; self-attention; diverse features; feature reinforcement; residual network; deep learning
- CLC number:
-
TP391.4
- DOI:
-
10.11992/tis.202109029
- Document code:
-
2022-10-08
- Abstract:
-
Fine-grained image classification suffers from interference by irrelevant background information and from the difficulty of extracting features that discriminate between subcategories. To address these problems, this study proposes a fine-grained image classification method that combines foreground feature reinforcement with region mask self-attention. First, ResNet50 extracts global features from the input image. Next, a foreground feature reinforcement network locates the foreground object in the input image, eliminating background interference while enhancing the features of the foreground object so that it stands out. Finally, the feature-enhanced foreground object is passed through a region mask self-attention network, which learns rich and diverse features that distinguish it from other subclasses. A multi-branch loss function constrains feature learning throughout training. Experiments show that the model achieves accuracies of 88.0%, 95.3%, and 93.6% on the fine-grained image datasets CUB-200-2011, Stanford Cars, and FGVC-Aircraft, respectively, outperforming other mainstream methods.
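The abstract names two steps that are easy to picture in code: locating the foreground object from backbone features, and combining several branch losses. The sketch below is illustrative only and is not the authors' implementation. It assumes a common heuristic (sum activations over channels, keep cells above the mean, take their bounding box) and a weighted sum of per-branch cross-entropy terms. The function names `locate_foreground` and `multi_branch_loss`, the mean threshold, and the branch weights are all hypothetical; the region mask self-attention network is not modeled here.

```python
import math


def locate_foreground(feature_map):
    """Return (top, left, bottom, right) of the assumed foreground region.

    feature_map is a nested list shaped [channels][height][width].
    Assumption: cells whose channel-summed activation exceeds the mean
    activation belong to the foreground object.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    # Collapse the channel axis into a single 2-D saliency map.
    saliency = [
        [sum(feature_map[c][y][x] for c in range(channels)) for x in range(width)]
        for y in range(height)
    ]
    mean = sum(sum(row) for row in saliency) / (height * width)
    hits = [
        (y, x)
        for y in range(height)
        for x in range(width)
        if saliency[y][x] > mean
    ]
    if not hits:
        # No cell stands out: fall back to the whole map.
        return (0, 0, height - 1, width - 1)
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))


def cross_entropy(logits, label):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


def multi_branch_loss(branch_logits, label, weights=None):
    """Weighted sum of cross-entropy losses, one term per branch.

    Assumption: every branch predicts the same label; weights default to 1.
    """
    if weights is None:
        weights = [1.0] * len(branch_logits)
    return sum(w * cross_entropy(l, label) for w, l in zip(weights, branch_logits))


if __name__ == "__main__":
    # One-channel 4x4 toy feature map with a bright 2x2 centre.
    fmap = [[[0, 0, 0, 0], [0, 5, 6, 0], [0, 4, 7, 0], [0, 0, 0, 0]]]
    print(locate_foreground(fmap))
    # Two branches (for example, global and foreground) with uniform logits.
    print(multi_branch_loss([[0.0, 0.0], [0.0, 0.0]], label=0))
```

In a real pipeline the bounding box would be scaled back to input-image coordinates and used to crop the foreground before the next branch; that step is omitted here.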
Last Update:
1900-01-01