[1] HUANG Qingming, WANG Shuhui, XU Qianqian, et al. Image video centered cross-media analysis and reasoning[J]. CAAI Transactions on Intelligent Systems, 2021, 16(5): 835–848. [doi:10.11992/tis.202105042]

Image video centered cross-media analysis and reasoning

References:
[1] HUBEL D H, WIESEL T N. Early exploration of the visual cortex[J]. Neuron, 1998, 20(3): 401–412.
[2] MARR D. Vision: a computational investigation into the human representation and processing of visual information[M]. Cambridge: The MIT Press, 2010.
[3] CHOMSKY N. Aspects of the theory of syntax[M]. Cambridge, MA: The MIT Press, 1965.
[4] MCGURK H, MACDONALD J. Hearing lips and seeing voices[J]. Nature, 1976, 264(5588): 746–748.
[5] NEWELL A, SIMON H A. Computer science as empirical inquiry: symbols and search[J]. Communications of the ACM, 1976, 19(3): 113–126.
[6] PETERSON G W, SAMPSON J R JR, REARDON R C. Career development and services: a cognitive approach[M]. Thomson Brooks/Cole Publishing Co, 1991.
[7] DESOLNEUX A, MOISAN L, MOREL J M. From gestalt theory to image analysis[M]. New York: Springer, 2008.
[8] PARK H J, FRISTON K. Structural and functional brain networks: from connections to cognition[J]. Science, 2013, 342(6158): 1238411.
[9] WANG Shuhui, YAN Xu, HUANG Qingming. Overview of research on cross-media analysis and reasoning technology[J]. Computer science, 2021, 48(3): 79–86.
[10] ZHANG Shiliang, TIAN Qi, HUA Gang, et al. Descriptive visual words and visual phrases for image applications[C]//Proceedings of the 17th ACM International Conference on Multimedia. Beijing, China, 2009: 75–84.
[11] WU Yiling, WANG Shuhui, SONG Guoli, et al. Learning fragment self-attention embeddings for image-text matching[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France, 2019: 2088–2096.
[12] SONG Guoli, WANG Shuhui, HUANG Qingming, et al. Harmonized multimodal learning with gaussian process latent variable models[J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 43(3): 858–872.
[13] WANG Shuhui, HUANG Qingming, JIANG Shuqiang, et al. S3MKL: scalable semi-supervised multiple kernel learning for real-world image applications[J]. IEEE transactions on multimedia, 2012, 14(4): 1259–1274.
[14] XU Qianqian, HUANG Qingming, JIANG Tingting, et al. HodgeRank on random graphs for subjective video quality assessment[J]. IEEE transactions on multimedia, 2012, 14(3): 844–857.
[15] LI Liang, JIANG Shuqiang, HUANG Qingming. Learning hierarchical semantic description via mixed-norm regularization for image understanding[J]. IEEE transactions on multimedia, 2012, 14(5): 1401–1413.
[16] SHEN Li, WANG Shuhui, SUN Gang, et al. Multi-level discriminative dictionary learning towards hierarchical visual categorization[C]//2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, USA, 2013: 383–390.
[17] DENG Jincan, LI Liang, ZHANG Beichen, et al. Syntax-guided hierarchical attention network for video captioning[J]. IEEE transactions on circuits and systems for video technology, 2021(99):1.
[18] CHEN Yangyu, WANG Shuhui, ZHANG Weigang, et al. Less is more: picking informative frames for video captioning[C]//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany, 2018: 367–384.
[19] LIU Zhenhuan, DENG Jincan, LI Liang, et al. IR-GAN: image manipulation with linguistic instruction by increment reasoning[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 322–330.
[20] QI Zhaobo, WANG Shuhui, SU Chi, et al. Towards more explainability: concept knowledge mining network for event recognition[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 3857–3865.
[21] HAN Xinzhe, WANG Shuhui, SU Chi, et al. Interpretable visual reasoning via probabilistic formulation under natural supervision[C]//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK, 2020: 553–570.
[22] LI Zhaopeng, XU Qianqian, JIANG Yangbangyan, et al. Quaternion-based knowledge graph network for recommendation[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 880–888.
[23] ZHANG Beichen, LI Liang, SU Li, et al. Structural semantic adversarial active learning for image captioning[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA, 2020: 1112–1121.
[24] YANG Shijie, LI Liang, WANG Shuhui, et al. Structured stochastic recurrent network for linguistic video prediction[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France, 2019: 21–29.


Copyright © CAAI Transactions on Intelligent Systems