
[1] HUANG Qingming, WANG Shuhui, XU Qianqian, et al. Image video centered cross-media analysis and reasoning[J]. CAAI Transactions on Intelligent Systems, 2021, 16(5): 835-848. [doi:10.11992/tis.202105042]


Image video centered cross-media analysis and reasoning


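CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume: 16; Issue: No. 5, 2021; Pages: 835-848; Column: First Prize of the Wu Wenjun Artificial Intelligence Natural Science Award; Publication date: 2021-09-05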

Title:: Image video centered cross-media analysis and reasoning

Author(s):: HUANG Qingming¹,², WANG Shuhui², XU Qianqian², LI Liang², JIANG Shuqiang²
1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China;
2. Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Keywords:: cross-media; image video; unified representation; correlative understanding; explainable reasoning; human-computer collaboration; knowledge graph; content management and service

CLC number:: TP37

DOI:: 10.11992/tis.202105042

Abstract:: Bridging the heterogeneity gap and the semantic gap between cross-media data and cross-media knowledge, and effectively managing and exploiting massive volumes of cross-media data, are bottleneck problems that must be solved to develop a new generation of artificial intelligence. Targeting massive online cross-media content, represented by images and videos, and drawing on the mechanisms of human perception and cognition, this paper studies key technologies including unified and symbolic representation of cross-media content, deep correlative understanding of cross-media data, and human-like cross-media intelligent reasoning. Building on these technologies, it addresses the common problem of knowledge scarcity in new-generation artificial intelligence by studying the construction of large-scale cross-media knowledge graphs and human-machine collaborative annotation. This work provides key support for advancing from cross-media perception to cognition and offers feasible approaches for popular applications in cross-media content management and services, such as cross-media understanding, retrieval, and content transformation and generation.

References:: [1] HUBEL D H, WIESEL T N. Early exploration of the visual cortex[J]. Neuron, 1998, 20(3): 401–412.
[2] MARR D. Vision: a computational investigation into the human representation and processing of visual information[M]. Cambridge: The MIT Press, 2010.
[3] CHOMSKY N. Aspects of the theory of syntax[M]. Cambridge, MA: The MIT Press, 1965.
[4] MCGURK H, MACDONALD J. Hearing lips and seeing voices[J]. Nature, 1976, 264(5588): 746–748.
[5] NEWELL A, SIMON H A. Computer science as empirical inquiry: symbols and search[J]. Communications of the ACM, 1976, 19(3): 113–126.
[6] PETERSON G W, SAMPSON J R JR, REARDON R C. Career development and services: a cognitive approach[M]. Thomson Brooks/Cole Publishing Co, 1991.
[7] DESOLNEUX A, MOISAN L, MOREL J M. From gestalt theory to image analysis[M]. New York: Springer, 2008.
[8] PARK H J, FRISTON K. Structural and functional brain networks: from connections to cognition[J]. Science, 2013, 342(6158): 1238411.
[9] WANG Shuhui, YAN Xu, HUANG Qingming. Overview of research on cross-media analysis and reasoning technology[J]. Computer Science, 2021, 48(3): 79–86. (in Chinese)
[10] ZHANG Shiliang, TIAN Qi, HUA Gang, et al. Descriptive visual words and visual phrases for image applications[C]//Proceedings of the 17th ACM International Conference on Multimedia. Beijing, China, 2009: 75-84.
[11] WU Yiling, WANG Shuhui, SONG Guoli, et al. Learning fragment self-attention embeddings for image-text matching[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France, 2019: 2088-2096.
[12] SONG Guoli, WANG Shuhui, HUANG Qingming, et al. Harmonized multimodal learning with gaussian process latent variable models[J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 43(3): 858–872.
[13] WANG Shuhui, HUANG Qingming, JIANG Shuqiang, et al. S³MKL: scalable semi-supervised multiple kernel learning for real-world image applications[J]. IEEE transactions on multimedia, 2012, 14(4): 1259–1274.
[14] XU Qianqian, HUANG Qingming, JIANG Tingting, et al. HodgeRank on random graphs for subjective video quality assessment[J]. IEEE transactions on multimedia, 2012, 14(3): 844–857.
[15] LI Liang, JIANG Shuqiang, HUANG Qingming. Learning hierarchical semantic description via mixed-norm regularization for image understanding[J]. IEEE transactions on multimedia, 2012, 14(5): 1401–1413.
[16] SHEN Li, WANG Shuhui, SUN Gang, et al. Multi-level discriminative dictionary learning towards hierarchical visual categorization[C]//2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 383-390.
[17] DENG Jincan, LI Liang, ZHANG Beichen, et al. Syntax-guided hierarchical attention network for video captioning[J]. IEEE transactions on circuits and systems for video technology, 2021(99):1.
[18] CHEN Yangyu, WANG Shuhui, ZHANG Weigang, et al. Less is more: picking informative frames for video captioning[C]//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany, 2018: 367-384.
[19] LIU Zhenhuan, DENG Jincan, LI Liang, et al. IR-GAN: image manipulation with linguistic instruction by increment reasoning[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 322-330.
[20] QI Zhaobo, WANG Shuhui, SU Chi, et al. Towards more explainability: concept knowledge mining network for event recognition[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 3857-3865.
[21] HAN Xinzhe, WANG Shuhui, SU Chi, et al. Interpretable visual reasoning via probabilistic formulation under natural supervision[C]//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK, 2020: 553-570.
[22] LI Zhaopeng, XU Qianqian, JIANG Yangbangyan, et al. Quaternion-based knowledge graph network for recommendation[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 880-888.
[23] ZHANG Beichen, LI Liang, SU Li, et al. Structural semantic adversarial active learning for image captioning[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 1112-1121.
[24] YANG Shijie, LI Liang, WANG Shuhui, et al. Structured stochastic recurrent network for linguistic video prediction[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France, 2019: 21-29.

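Memo

Received: 2021-05-27.
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102000); National Natural Science Foundation of China (62022083, 61976202, 61771457, 61732007).
About the authors:
HUANG Qingming, professor and doctoral supervisor. His main research interests are multimedia analysis and computer vision. He is an IEEE Fellow, a recipient of the State Council Special Government Allowance, and an editorial board member of journals including IEEE TCSVT and Acta Automatica Sinica. He received the First Prize of the Wu Wenjun Artificial Intelligence Natural Science Award (first contributor). He has led a number of projects, including the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, key projects and key international cooperation projects of the National Natural Science Foundation of China, National 973 Program projects, and the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences. He has published more than 170 academic papers.
WANG Shuhui, researcher and doctoral supervisor. His main research interests are cross-media analysis and reasoning and image and video understanding. He received the 2020 First Prize of the Wu Wenjun Artificial Intelligence Natural Science Award (second contributor), the CCF Science and Technology Award (2012), and the Best Paper Award of the National Conference on Multimedia, among others. He has published more than 50 academic papers.
XU Qianqian, associate researcher. Her main research interests are data mining and machine learning. She received the First Prize of the Wu Wenjun Artificial Intelligence Natural Science Award (third contributor), the CAAI Best Young Scientific and Technological Achievement Award, the CSIG Shi Qingyun Women Scientist Award, the Wu Wenjun Artificial Intelligence Outstanding Youth Award, the ACM China SIGMM Rising Star Award, the CAAI Outstanding Doctoral Dissertation Award, the Chinese Academy of Sciences Top 100 Outstanding Doctoral Dissertations Award, the CCF-Tencent Rhino-Bird Research Fund, and the first CAAI-Huawei MindSpore Academic Award Fund, among others. She has published more than 40 academic papers.
Corresponding author: WANG Shuhui. E-mail: wangshuhui@ict.ac.cn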

Last Update: 1900-01-01
