[1]GONG Dahan,CHEN Hui,CHEN Shijiang,et al.Matching with agreement for cross-modal image-text retrieval[J].CAAI Transactions on Intelligent Systems,2021,16(6):1143-1150.[doi:10.11992/tis.202108013]
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
16
Issue:
2021, No. 6
Page number:
1143-1150
Column:
Wu Wenjun Artificial Intelligence Science and Technology Award Forum
Publication date:
2021-11-05
- Title:
-
Matching with agreement for cross-modal image-text retrieval
- Author(s):
-
GONG Dahan (1, 2); CHEN Hui (2, 3); CHEN Shijiang (4); BAO Yongjun (5); DING Guiguang (1, 2)
-
1. School of Software, Tsinghua University, Beijing 100084, China;
2. Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China;
3. Department of Automation, Tsinghua University, Beijing 1000
-
- Keywords:
-
artificial intelligence; computer vision; vision and language; cross-modal retrieval; matching with agreement; attention; convolutional neural network; recurrent neural network; gated recurrent unit
- CLC:
-
TP18
- DOI:
-
10.11992/tis.202108013
- Abstract:
-
Cross-modal image-text retrieval is an important task for understanding the correspondence between vision and language. Most existing methods use various attention modules to explore region-to-word and word-to-region alignments and to capture fine-grained cross-modal correlations. However, the inconsistency between attention-based alignments has rarely been considered. This study proposes a matching with agreement (MAG) method that exploits alignment consistency to enhance cross-modal retrieval performance. An attention mechanism is first adopted to obtain cross-modal alignments, which are then used to compute a cross-modal matching agreement through a novel competitive voting strategy. This agreement measures the consistency of cross-modal matching and effectively improves performance. Extensive experiments on two benchmark datasets, Flickr30K and MS COCO, show that the MAG method achieves state-of-the-art performance, demonstrating its effectiveness.
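
The notion of alignment consistency mentioned in the abstract can be illustrated with a minimal sketch. This is not the MAG method or its competitive voting strategy, whose details the abstract does not give. As an assumption for illustration, soft attention is replaced with a hard argmax over cosine similarities, and "agreement" is taken to mean that a region and a word select each other in both directions. The function names `cosine` and `alignment_agreement` and the toy feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_agreement(regions, words):
    """Return mutually consistent (region, word) index pairs and a score.

    Assumption: hard argmax stands in for attention weights, and the score
    is the fraction of mutually agreeing pairs, normalised by the size of
    the smaller modality. Neither choice comes from the paper.
    """
    sim = [[cosine(r, w) for w in words] for r in regions]

    # Region-to-word alignment: each region selects its most similar word.
    region_to_word = [
        max(range(len(words)), key=lambda j: sim[i][j])
        for i in range(len(regions))
    ]
    # Word-to-region alignment: each word selects its most similar region.
    word_to_region = [
        max(range(len(regions)), key=lambda i: sim[i][j])
        for j in range(len(words))
    ]

    # A pair agrees when the two directions point at each other.
    agreed = [
        (i, j) for i, j in enumerate(region_to_word) if word_to_region[j] == i
    ]
    score = len(agreed) / min(len(regions), len(words))
    return agreed, score


if __name__ == "__main__":
    # Toy 2-D features: three image regions and two words (made-up values).
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    words = [[1.0, 0.1], [0.1, 1.0]]
    pairs, score = alignment_agreement(regions, words)
    print(pairs, score)
```

In this toy input the third region selects a word that does not select it back, so it is an example of an inconsistent alignment: it contributes nothing to the agreed pairs.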