HUANG Qingming, WANG Shuhui, XU Qianqian, et al. Image video centered cross-media analysis and reasoning[J]. CAAI Transactions on Intelligent Systems, 2021, 16(5): 835–848. [doi:10.11992/tis.202105042]

Image video centered cross-media analysis and reasoning

References:
[1] HUBEL D H, WIESEL T N. Early exploration of the visual cortex[J]. Neuron, 1998, 20(3): 401–412.
[2] MARR D. Vision: a computational investigation into the human representation and processing of visual information[M]. Cambridge: The MIT Press, 2010.
[3] CHOMSKY N. Aspects of the theory of syntax[M]. Cambridge, MA: The MIT Press, 1965.
[4] MCGURK H, MACDONALD J. Hearing lips and seeing voices[J]. Nature, 1976, 264(5588): 746–748.
[5] NEWELL A, SIMON H A. Computer science as empirical inquiry: symbols and search[J]. Communications of the ACM, 1976, 19(3): 113–126.
[6] PETERSON G W, SAMPSON J R JR, REARDON R C. Career development and services: a cognitive approach[M]. Thomson Brooks/Cole Publishing Co, 1991.
[7] DESOLNEUX A, MOISAN L, MOREL J M. From gestalt theory to image analysis[M]. New York: Springer, 2008.
[8] PARK H J, FRISTON K. Structural and functional brain networks: from connections to cognition[J]. Science, 2013, 342(6158): 1238411.
[9] WANG Shuhui, YAN Xu, HUANG Qingming. Overview of research on cross-media analysis and reasoning technology[J]. Computer science, 2021, 48(3): 79–86.
[10] ZHANG Shiliang, TIAN Qi, HUA Gang, et al. Descriptive visual words and visual phrases for image applications[C]//Proceedings of the 17th ACM International Conference on Multimedia. Beijing, China, 2009: 75–84.
[11] WU Yiling, WANG Shuhui, SONG Guoli, et al. Learning fragment self-attention embeddings for image-text matching[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France, 2019: 2088–2096.
[12] SONG Guoli, WANG Shuhui, HUANG Qingming, et al. Harmonized multimodal learning with Gaussian process latent variable models[J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 43(3): 858–872.
[13] WANG Shuhui, HUANG Qingming, JIANG Shuqiang, et al. S3MKL: scalable semi-supervised multiple kernel learning for real-world image applications[J]. IEEE transactions on multimedia, 2012, 14(4): 1259–1274.
[14] XU Qianqian, HUANG Qingming, JIANG Tingting, et al. HodgeRank on random graphs for subjective video quality assessment[J]. IEEE transactions on multimedia, 2012, 14(3): 844–857.
[15] LI Liang, JIANG Shuqiang, HUANG Qingming. Learning hierarchical semantic description via mixed-norm regularization for image understanding[J]. IEEE transactions on multimedia, 2012, 14(5): 1401–1413.
[16] SHEN Li, WANG Shuhui, SUN Gang, et al. Multi-level discriminative dictionary learning towards hierarchical visual categorization[C]//2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 383–390.
[17] DENG Jincan, LI Liang, ZHANG Beichen, et al. Syntax-guided hierarchical attention network for video captioning[J]. IEEE transactions on circuits and systems for video technology, 2021(99):1.
[18] CHEN Yangyu, WANG Shuhui, ZHANG Weigang, et al. Less is more: picking informative frames for video captioning[C]//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany, 2018: 367–384.
[19] LIU Zhenhuan, DENG Jincan, LI Liang, et al. IR-GAN: image manipulation with linguistic instruction by increment reasoning[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 322–330.
[20] QI Zhaobo, WANG Shuhui, SU Chi, et al. Towards more explainability: concept knowledge mining network for event recognition[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 3857–3865.
[21] HAN Xinzhe, WANG Shuhui, SU Chi, et al. Interpretable visual reasoning via probabilistic formulation under natural supervision[C]//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK, 2020: 553–570.
[22] LI Zhaopeng, XU Qianqian, JIANG Yangbangyan, et al. Quaternion-based knowledge graph network for recommendation[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 880–888.
[23] ZHANG Beichen, LI Liang, SU Li, et al. Structural semantic adversarial active learning for image captioning[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 1112–1121.
[24] YANG Shijie, LI Liang, WANG Shuhui, et al. Structured stochastic recurrent network for linguistic video prediction[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France, 2019: 21–29.

Memo

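Received: 2021-05-27.
Foundation items: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102000); National Natural Science Foundation of China (62022083, 61976202, 61771457, 61732007).
About the authors: HUANG Qingming, professor and doctoral supervisor. His main research interests are multimedia analysis and computer vision. He is an IEEE Fellow and a recipient of the State Council Special Government Allowance. He serves on the editorial boards of journals including IEEE TCSVT and Acta Automatica Sinica. He received the First Prize of the Wu Wenjun Artificial Intelligence Natural Science Award as first contributor. He has led numerous projects, including:
- a Science and Technology Innovation 2030 "New Generation Artificial Intelligence" major project;
- key projects and key international cooperation projects of the National Natural Science Foundation of China;
- projects under the National 973 Program;
- projects under the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences.
He has published more than 170 academic papers.
WANG Shuhui, research professor and doctoral supervisor. His main research interests are cross-media analysis and reasoning, and image and video understanding. His awards include the 2020 Wu Wenjun Artificial Intelligence Natural Science Award, First Prize (second contributor), the CCF Science and Technology Award (2012), and the Best Paper Award of the National Conference on Multimedia. He has published more than 50 academic papers.
XU Qianqian, associate research professor. Her main research interests are data mining and machine learning. Her awards and honors include:
- the Wu Wenjun Artificial Intelligence Natural Science Award, First Prize (third contributor);
- the CAAI Best Youth Science and Technology Achievement Award;
- the CSIG Shi Qingyun Female Scientist Award;
- the Wu Wenjun Artificial Intelligence Outstanding Youth Award;
- the ACM China SIGMM Rising Star Award;
- the CAAI Outstanding Doctoral Dissertation Award;
- the CAS Top 100 Outstanding Doctoral Dissertations;
- the CCF-Tencent Rhino-Bird Research Fund;
- the inaugural CAAI-Huawei MindSpore Academic Award Fund.
She has published more than 40 academic papers.
Corresponding author: WANG Shuhui. E-mail: wangshuhui@ict.ac.cn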

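Copyright © Editorial Office of CAAI Transactions on Intelligent Systems
Address: Building 145-1, Nantong Street, Nangang District, Harbin, Heilongjiang Province (150001). Tel: 0451-82534001, 0451-82518134. E-mail: tis@vip.sina.com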