[1]宫大汉,陈辉,陈仕江,等.一致性协议匹配的跨模态图像文本检索方法[J].智能系统学报,2021,16(6):1143-1150.[doi:10.11992/tis.202108013]
 GONG Dahan,CHEN Hui,CHEN Shijiang,et al.Matching with agreement for cross-modal image-text retrieval[J].CAAI Transactions on Intelligent Systems,2021,16(6):1143-1150.[doi:10.11992/tis.202108013]
点击复制

一致性协议匹配的跨模态图像文本检索方法

参考文献/References:
[1] WANG Liwei, LI Yin, LAZEBNIK S. Learning deep structure-preserving image-text embeddings[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA, 2016: 5005-5013.
[2] FAGHRI F, FLEET D J, KIROS J R, et al. VSE++: Improving visual-semantic embeddings with hard negatives[EB/OL]. (2018-07-29)[2021-07-30] https://arxiv.org/pdf/1707.05612.
[3] KARPATHY A, LI Feifei. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 3128-3137.
[4] NAM H, HA J W, KIM J. Dual attention networks for multimodal reasoning and matching[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 2156-2164.
[5] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International Conference on Machine Learning. Sydney, Australia, 2015: 2048-2057.
[6] LEE K H, CHEN Xi, HUA Gang, et al. Stacked cross attention for image-text matching[M]//FERRARI V, HEBERT M, SMINCHISESCU C, et al. Proceedings of the 15th European Conference on Computer Vision-ECCV 2018. Munich, Germany: Springer, 2018: 201-216.
[7] FROME A, CORRADO G S, SHLENS J, et al. DeViSE: A deep visual-semantic embedding model[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Nevada, USA, 2013: 2121-2129.
[8] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10). https://arxiv.org/pdf/1409.1556.
[9] MIKOLOV T, CHEN Kai, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07)[2021-07-30] https://arxiv.org/pdf/1301.3781.
[10] KIROS R, SALAKHUTDINOV R, ZEMEL R S. Unifying visual-semantic embeddings with multimodal neural language models[EB/OL]. (2014-11-10). https://arxiv.org/pdf/1411.2539.
[11] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014-12-11)[2021-07-30] https://arxiv.org/pdf/1412.3555.
[12] NIU Zhenxing, ZHOU Mo, WANG Le, et al. Hierarchical multimodal LSTM for dense visual-semantic embedding[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 1899-1907.
[13] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada, 2015: 91-99.
[14] CHEN Hui, DING Guiguang, LIN Zijia, et al. Cross-modal image-text retrieval with semantic consistency[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, French, 2019: 1749-1757.
[15] YOUNG P, LAI A, HODOSH M, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions[J]. Transactions of the association for computational linguistics, 2014, 2(1): 67-78.
[16] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft coco: Common objects in context[C]//13th European Conference on Computer Vision-ECCV 2014. Zurich, Switzerland, 2014: 740-755.
[17] PASZKE A, GROSS S, CHINTALA S, et al. Automatic differentiation in PyTorch[C]//31st Conference on Neural Information Processing Systems. Long Beach, USA, 2017.
[18] KINGMA D P, BA J L. Adam: A method for stochastic optimization[EB/OL]. (2015-04-23)[2021-08-01] https://arxiv.org/pdf/1412.6980.
[19] ZHENG Zhedong, ZHENG Liang, GARRETT M, et al. Dual-path convolutional image-text embeddings with instance loss[J]. ACM transactions on multimedia computing, communications, and applications, 2020, 16(2): 51.
[20] HUANG Yan, WANG Wei, WANG Liang. Instance-aware image and sentence matching with selective multimodal LSTM[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA, 2017: 2310-2318.
[21] WANG Yaxiong, YANG Hao, QIAN Xueming, et al. Position focused attention network for image-text matching[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. Macao, China, 2019: 3792-3798.
[22] SONG Yale, SOLEYMANI M. Polysemous visual-semantic embedding for cross-modal retrieval[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019.
[23] 陈丹, 李永忠, 于沛泽, 等. 跨模态行人重识别研究与展望[J]. 计算机系统应用, 2020, 29(10): 20-28
CHEN Dan, LI Yongzhong, YU Peizhe, et al. Research and prospect of cross modality person re-identification[J]. Computer systems & applications, 2020, 29(10): 20-28
[24] 刘天瑜, 刘正熙. 跨模态行人重识别研究综述[J]. 现代计算机, 2021, 27(7): 135-139
LIU Tianyu, LIU Zhengxi. Overview of cross modality person Re-identification research[J]. Modern computer, 2021, 27(7): 135-139
[25] 姚伟娜. 基于深度哈希算法的图像—文本跨模态检索研究[D]. 北京: 北京交通大学, 2018.
YAO Weina. Image-text cross-modal retrieval based on deep hashing method[D]. Beijing: Beijing Jiaotong University, 2018.
相似文献/References:
[1]李德毅.网络时代人工智能研究与发展[J].智能系统学报,2009,4(1):1.
 LI De-yi.AI research and development in the network age[J].CAAI Transactions on Intelligent Systems,2009,4():1.
[2]赵克勤.二元联系数A+Bi的理论基础与基本算法及在人工智能中的应用[J].智能系统学报,2008,3(6):476.
 ZHAO Ke-qin.The theoretical basis and basic algorithm of binary connection A+Bi and its application in AI[J].CAAI Transactions on Intelligent Systems,2008,3():476.
[3]徐玉如,庞永杰,甘 永,等.智能水下机器人技术展望[J].智能系统学报,2006,1(1):9.
 XU Yu-ru,PANG Yong-jie,GAN Yong,et al.AUV—state-of-the-art and prospect[J].CAAI Transactions on Intelligent Systems,2006,1():9.
[4]王志良.人工心理与人工情感[J].智能系统学报,2006,1(1):38.
 WANG Zhi-liang.Artificial psychology and artificial emotion[J].CAAI Transactions on Intelligent Systems,2006,1():38.
[5]赵克勤.集对分析的不确定性系统理论在AI中的应用[J].智能系统学报,2006,1(2):16.
 ZHAO Ke-qin.The application of uncertainty systems theory of set pair analysis (SPU)in the artificial intelligence[J].CAAI Transactions on Intelligent Systems,2006,1():16.
[6]秦裕林,朱新民,朱 丹.Herbert Simon在最后几年里的两个研究方向[J].智能系统学报,2006,1(2):11.
 QIN Yu-lin,ZHU Xin-min,ZHU Dan.Herbert Simons two research directions in his lost years[J].CAAI Transactions on Intelligent Systems,2006,1():11.
[7]谷文祥,李 丽,李丹丹.规划识别的研究及其应用[J].智能系统学报,2007,2(1):1.
 GU Wen-xiang,LI Li,LI Dan-dan.Research and application of plan recognition[J].CAAI Transactions on Intelligent Systems,2007,2():1.
[8]杨春燕,蔡 文.可拓信息-知识-智能形式化体系研究[J].智能系统学报,2007,2(3):8.
 YANG Chun-yan,CAI Wen.A formalized system of extension information-knowledge-intelligence[J].CAAI Transactions on Intelligent Systems,2007,2():8.
[9]夏 凡,王 宏.基于局部异常行为检测的欺骗识别研究[J].智能系统学报,2007,2(5):12.
 XIA Fan,WANG Hong.Methodologies for deception detection based on abnormal b ehavior[J].CAAI Transactions on Intelligent Systems,2007,2():12.
[10]赵克勤.SPA的同异反系统理论在人工智能研究中的应用[J].智能系统学报,2007,2(5):20.
 ZHAO Ke-qin.The application of SPAbased identicaldiscrepancycontrary system theory in artificial intelligence research[J].CAAI Transactions on Intelligent Systems,2007,2():20.
[11]李雪,蒋树强.智能交互的物体识别增量学习技术综述[J].智能系统学报,2017,12(2):140.[doi:10.11992/tis.201701006]
 LI Xue,JIANG Shuqiang.Incremental learning and object recognition system based on intelligent HCI: a survey[J].CAAI Transactions on Intelligent Systems,2017,12():140.[doi:10.11992/tis.201701006]
[12]刘彪,黄蓉蓉,林和,等.基于卷积神经网络的盲文音乐识别研究[J].智能系统学报,2019,14(1):186.[doi:10.11992/tis.201805002]
 LIU Biao,HUANG Rongrong,LIN He,et al.Research on braille music recognition based on convolutional neural networks[J].CAAI Transactions on Intelligent Systems,2019,14():186.[doi:10.11992/tis.201805002]
[13]王凯诚,鲁华祥,龚国良,等.基于注意力机制的显著性目标检测方法[J].智能系统学报,2020,15(5):956.[doi:10.11992/tis.201903001]
 WANG Kaicheng,LU Huaxiang,GONG Guoliang,et al.Salient object detection method based on the attention mechanism[J].CAAI Transactions on Intelligent Systems,2020,15():956.[doi:10.11992/tis.201903001]
[14]宫大汉,于龙龙,陈辉,等.面向车规级芯片的对象检测模型优化方法[J].智能系统学报,2021,16(5):900.[doi:10.11992/tis.202107057]
 GONG Dahan,YU Longlong,CHEN Hui,et al.Object detection model optimization method for car-level chips[J].CAAI Transactions on Intelligent Systems,2021,16():900.[doi:10.11992/tis.202107057]

备注/Memo

收稿日期:2021-08-13。
基金项目:国家自然科学基金项目(61925107,U1936202);中国博士后科学基金创新人才支持计划项目(BX2021161)
作者简介:宫大汉,博士研究生,主要研究方向为图像语义理解、卷积神经网络压缩加速;陈辉,助理研究员,博士,主要研究方向为图像语义理解、多媒体信息处理;丁贵广,副教授,博士,主要研究方向为多媒体信息处理、计算机视觉感知。主持基金委重点项目、重点研发项目等国家级项目数十项。曾获国家科技进步二等奖、吴文俊人工智能科技进步一等奖、中国电子学会技术发明一等奖等。发表学术论文近百篇,引用量近7 000次.
通讯作者:丁贵广.E-mail:dinggg@tsinghua.edu.cn

更新日期/Last Update: 2021-12-25
Copyright @ 《 智能系统学报》 编辑部
地址:(150001)黑龙江省哈尔滨市南岗区南通大街145-1号楼 电话:0451- 82534001、82518134