KANG Jie, LIU Wei. A cross-modal retrieval algorithm of decoration cases on feature fusion[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 429-437. [doi:10.11992/tis.202207030]

A cross-modal retrieval algorithm of decoration cases on feature fusion

References:
[1] LIU Ying, GUO Yingying, FANG Jie, et al. Survey of research on deep learning cross-modal image-text retrieval[J]. Journal of frontiers of computer science and technology, 2022, 16(3): 489–511.
[2] XU Wenwan, ZHOU Xiaoping, WANG Jia. Overview of cross-modal retrieval technology[J]. Computer engineering and applications, 2022, 58(23): 12–23.
[3] GONG Dahan, CHEN Hui, CHEN Shijiang, et al. Matching with agreement for cross-modal image-text retrieval[J]. CAAI transactions on intelligent systems, 2021, 16(6): 1143–1150.
[4] LIU Zhuokun, LIU Huaping, HUANG Wenmei, et al. Audiovisual cross-modal retrieval for surface material[J]. CAAI transactions on intelligent systems, 2019, 14(3): 423–429.
[5] ZHANG Qi, LEI Zhen, ZHANG Zhaoxiang, et al. Context-aware attention network for image-text retrieval[C]//Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 3536-3545.
[6] DONG Jianfeng, LI Xirong, XU Chaoxi, et al. Dual encoding for video retrieval by text[EB/OL]. (2020-10-10)[2022-01-01]. http://arxiv.org/abs/2009.05381.
[7] RAMACHANDRAM D, TAYLOR G W. Deep multimodal learning: a survey on recent advances and trends[J]. IEEE signal processing magazine, 2017, 34(6): 96–108.
[8] PENG Liangkang, LU Xiangming, XU Qingbo. Research progress of cross-modal hash retrieval based on deep learning[J]. Data communications, 2022(3): 32–38.
[9] XU Xugeng, FANG Xiaozhao, SUN Weijun, et al. Semantics embedding and reconstructing for cross-modal hashing retrieval[J]. Application research of computers, 2022, 39(6): 1645–1650, 1672.
[10] FENG Xia, HU Zhiyi, LIU Caihua. Survey of research progress on cross-modal retrieval[J]. Computer science, 2021, 48(8): 13–23.
[11] YIN Qiyue, HUANG Yan, ZHANG Junge, et al. Survey on deep learning based cross-modal retrieval[J]. Journal of image and graphics, 2021, 26(6): 1368–1388.
[12] RASIWASIA N, PEREIRA J C, COVIELLO E, et al. A new approach to cross-modal multimedia retrieval[C]//Proceedings of the 18th ACM International Conference on Multimedia. New York: ACM, 2010: 251-260.
[13] ZHANG Hong, LIU Yun, MA Zhigang. Fusing inherent and external knowledge with nonlinear learning for cross-media retrieval[J]. Neurocomputing, 2013, 119: 10–16.
[14] KAN Meina, SHAN Shiguang, ZHANG Haihong, et al. Multi-view discriminant analysis[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 38(1): 188–194.
[15] YANG Xiangfei, LI Chunna, SHAO Yuanhai. Robust multi-view discriminant analysis with view-consistency[J]. Information sciences, 2022, 596: 153–168.
[16] PENG Yuxin, QI Jinwei, HUANG Xin, et al. CCL: cross-modal correlation learning with multigrained fusion by hierarchical network[J]. IEEE transactions on multimedia, 2018, 20(2): 405–420.
[17] ZHANG Ying, LU Huchuan. Deep cross-modal projection learning for image-text matching[C]//European Conference on Computer Vision. Cham: Springer, 2018: 707-723.
[18] CHEN Xi, PENG Jiao, ZHANG Pengfei, et al. Cross-modal retrieval algorithm for image and text based on pretrained models and encoders[J]. Journal of Beijing University of Posts and Telecommunications, 2023, 46(5): 112–117.
[19] ZHEN Liangli, HU Peng, WANG Xu, et al. Deep supervised cross-modal retrieval[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2020: 10386-10395.
[20] ZHANG Yifan, ZHOU Wengang, WANG Min, et al. Deep relation embedding for cross-modal retrieval[J]. IEEE transactions on image processing, 2021, 30: 617–627.
[21] WANG Cheng, YANG Haojin, MEINEL C. Deep semantic mapping for cross-modal retrieval[C]//2015 IEEE 27th International Conference on Tools with Artificial Intelligence. Vietri sul Mare: IEEE, 2016: 234-241.
[22] XU Xing, HE Li, LU Huimin, et al. Deep adversarial metric learning for cross-modal retrieval[J]. World wide web, 2019, 22(2): 657–672.
[23] ZHANG Qi, LEI Zhen, ZHANG Zhaoxiang, et al. Context-aware attention network for image-text retrieval[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 3533-3542.
[24] KANG Jie, LIU Wei. A crossmodal retrieval method for intelligent matching of decoration cases[J]. CAAI transactions on intelligent systems, 2022, 17(4): 714–720.
[25] DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2018-11-11)[2022-01-01]. https://arxiv.org/abs/1810.04805.pdf.
[26] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]// International Conference on Learning Representations. Cambridge: MIT Press, 2015: 1768-1776.
[27] HU Jie, SHEN Li, SUN Gang. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[28] GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2414-2423.
[29] WANG Zihao, LIU Xihui, LI Hongsheng, et al. CAMP: cross-modal adaptive message passing for text-image retrieval[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2020: 5763-5772.
[30] FUKUI A, PARK D H, YANG D, et al. Multimodal compact bilinear pooling for visual question answering and visual grounding[EB/OL]. (2016-06-06)[2022-01-01]. https://arxiv.org/abs/1606.01847.pdf.
Similar references:
[1] KANG Jie, LIU Wei. A crossmodal retrieval method for intelligent matching of decoration cases[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 714. [doi:10.11992/tis.202106012]

Memo

Received: 2022-07-20.
Foundation item: Key Research and Development Program of Shaanxi Province (2021GY-022).
About the authors: KANG Jie, associate professor; her main research interests are machine learning and pattern recognition. In recent years she has led or participated in more than 20 teaching and research projects, holds 2 authorized invention patents, and has published more than 20 academic papers. E-mail: kangjie@sust.edu.cn. LIU Wei, master's student; his main research interests are digital image processing and multimodal representation learning. E-mail: 535473833@qq.com.
Corresponding author: KANG Jie. E-mail: kangjie@sust.edu.cn.
