[1]XU Boxiang,LIU Li,QIU Taorong.Near-duplicate document image retrieval based on three-stream convolutional Siamese network[J].CAAI Transactions on Intelligent Systems,2022,17(3):515-522.[doi:10.11992/tis.202105018]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
17
Issue:
2022, No. 3
Page number:
515-522
Column:
Academic Papers: Machine Perception and Pattern Recognition
Publication date:
2022-05-05
- Title:
Near-duplicate document image retrieval based on three-stream convolutional Siamese network
- Author(s):
XU Boxiang; LIU Li; QIU Taorong
School of Information Engineering, Nanchang University, Nanchang 330031, China
- Keywords:
near-duplicate document image; image retrieval; three-stream convolutional Siamese network; triplet loss; image variations; triplet; feature extraction; robustness
- CLC:
TP391
- DOI:
10.11992/tis.202105018
- Abstract:
In traditional near-duplicate document image retrieval methods, the variations among near-duplicate document images must be identified manually beforehand, a step that is easily influenced by human subjectivity. To solve this problem, we propose a three-stream convolutional Siamese network for near-duplicate document image retrieval that automatically learns the variation types among near-duplicate document images. The input to the network is a triplet consisting of a query image, a near-duplicate of the query, and a non-near-duplicate image. Training with the triplet loss drives the distance between the query image and its near-duplicate to be smaller than the distance between the query and the non-near-duplicate image. The approach achieves mAP values of 98.76% and 96.50% on two datasets, respectively, outperforming state-of-the-art near-duplicate document image retrieval methods.
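The triplet objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the distance function or the margin, so Euclidean distance and a margin of 1.0 are assumptions, and the short lists of floats stand in for the embeddings that the three convolutional streams would produce. The function names `euclidean` and `triplet_loss` are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value; not given in the abstract
) -> float:
    """Hinge-style triplet loss: max(0, d(q, p) - d(q, n) + margin).

    The loss is zero once the near-duplicate (positive) is closer to the
    query than the non-near-duplicate (negative) by at least `margin`.
    """
    d_pos = euclidean(query, positive)
    d_neg = euclidean(query, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings standing in for the network outputs.
query = [0.0, 0.0]
near_duplicate = [0.0, 1.0]   # distance 1 from the query
non_duplicate = [3.0, 4.0]    # distance 5 from the query

satisfied = triplet_loss(query, near_duplicate, non_duplicate)
violated = triplet_loss(query, non_duplicate, near_duplicate)
```

In this toy case the first triplet already satisfies the ordering, so its loss is zero, while the swapped triplet is penalized. During training, minimizing such a loss over many triplets is what pushes near-duplicates closer to the query than non-near-duplicates.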