
[1] LIU Bing, LI Ruilin, FENG Jufu. A brief introduction to deep metric learning[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1064-1072. [doi:10.11992/tis.201906045]


A brief introduction to deep metric learning (深度度量学习综述)


CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP] Volume: 14 Issue: 2019, No. 6 Pages: 1064-1072 Column: Review Publication date: 2019-11-05

Title:: A brief introduction to deep metric learning

Author(s):: LIU Bing (刘冰)^1,2, LI Ruilin (李瑞麟)^1,2, FENG Jufu (封举富)^1,2; 1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
2. Key Laboratory of Machine Perception (MOE), Peking University, Beijing 100871, China

Keywords:: deep metric learning; deep learning; machine learning; contrastive loss; triplet loss; proxy loss; softmax classification; temperature

CLC number:: TP181

DOI:: 10.11992/tis.201906045

Abstract:: Deep metric learning (DML) has recently become one of the most attractive research areas in machine learning, and its key problem is how to measure the similarity between objects effectively. For existing loss functions that rely on pairs or triplets, the number of possible combinations of positive and negative samples is extremely large, so a reasonable solution is to sample only those positive and negative samples that are meaningful for training, an approach known as hard example mining. To reduce the computational cost of mining meaningful samples, the proxy loss uses a set of proxy points that is much smaller than the sample set. This review summarizes representative DML algorithms in chronological order and discusses their relationship with softmax classification, finding that these two seemingly parallel lines of research share a consistent underlying idea. It then examines a number of improved algorithms designed to increase the discriminative power of softmax and introduces them into metric learning, so as to further reduce intra-class distance, enlarge inter-class distance, and improve the discriminative performance of the algorithm.
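The two ideas contrasted in the abstract can be illustrated with a minimal NumPy sketch. It is not the formulation used in the paper: the function names, the batch-hard mining rule, the margin of 0.2, the use of one proxy per class, and the temperature of 0.1 are all assumptions chosen for illustration. The first function computes a triplet loss using only the hardest positive and hardest negative for each anchor in a batch. The second replaces sample-to-sample comparisons with comparisons against a small set of proxies, which reduces to a softmax cross-entropy over temperature-scaled cosine similarities.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)


def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """Triplet loss with in-batch hard example mining (illustrative variant).

    For each anchor, only the farthest same-class sample and the closest
    other-class sample are used, instead of every possible triplet.
    """
    d = pairwise_sq_dists(emb, emb)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # exclude the anchor itself
    neg_mask = ~same
    hardest_pos = np.where(pos_mask, d, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, d, np.inf).min(axis=1)
    # Anchors without any positive or any negative in the batch are skipped.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return losses.mean() if losses.size else 0.0


def proxy_softmax_loss(emb, labels, proxies, temperature=0.1):
    """Softmax cross-entropy against one proxy per class (assumed setup).

    Embeddings and proxies are L2-normalized, so the logits are cosine
    similarities divided by the temperature.
    """
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = e @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


# Toy usage with random data: 6 samples, 3 classes, 4-dimensional embeddings.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])
emb = rng.normal(size=(6, 4))
proxies = rng.normal(size=(3, 4))
print(batch_hard_triplet_loss(emb, labels))
print(proxy_softmax_loss(emb, labels, proxies))
```

In this sketch the triplet loss compares samples with each other, so its cost grows with the batch size squared, whereas the proxy loss compares each sample only with the class proxies. With normalized vectors the proxy loss has the same form as a softmax classifier whose weight rows act as proxies, which is one way to read the connection the abstract draws between the two lines of research.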



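Memo

Received: 2019-06-24.
Funding: Key Program of the National Natural Science Foundation of China (61333015).
Author biographies: LIU Bing, female, born in 1994, Ph.D. candidate; her main research interests are deep learning, computer vision, and biometric recognition. LI Ruilin, male, born in 1995, master's student; his main research interests are deep learning, computer vision, and biometric recognition. FENG Jufu, male, born in 1967, professor and doctoral supervisor; his main research interests are image processing, pattern recognition, machine learning, and biometric recognition. He has led or participated in a number of projects funded by the National Natural Science Foundation of China, the National Key Technology R&D Program of the 11th Five-Year Plan, and the 973 Program. His awards include a second prize in the Science and Technology Awards of Chinese Universities, an outstanding paper award at the first Asian Conference on Computer Vision, the Peking University Antai Teacher Award, the Peking University 大众电脑 Excellence Award, and the Peking University Antai Project Award. He has published more than 300 academic papers.
Corresponding author: FENG Jufu. E-mail: fjf@cis.pku.edu.cn

Last Update: 2019-12-25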
