LI Xue,JIANG Shuqiang.Incremental learning and object recognition system based on intelligent HCI: a survey[J].CAAI Transactions on Intelligent Systems,2017,(02):140-149.[doi:10.11992/tis.201701006]





Incremental learning and object recognition system based on intelligent HCI: a survey
LI Xue (1, 2), JIANG Shuqiang (2)
1. College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China;
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
artificial intelligence; human-computer interaction; computer vision; object recognition; machine learning; multimodality; robotics; interactive learning
Intelligent human-computer interaction (HCI) systems focus on the interaction between computers and humans and study whether computers can understand human instructions. A further aim is to make this interaction more autonomous and interactive. To some extent, incremental learning is a way to realize this goal. This paper briefly introduces the tasks, background, and information sources of intelligent HCI systems and then concentrates on incremental learning. Like the human learning mechanism, incremental learning acquires new knowledge continuously, which gives intelligent HCI systems the ability to grow on their own. The paper surveys work on incremental learning, covering the mechanisms involved and their respective advantages and disadvantages, and highlights future research directions.






Corresponding author: JIANG Shuqiang. E-mail: sqjiang@ict.ac.cn.
Last Update: 1900-01-01