HU Yue, LUO Dongyang, HUA Kui, et al. Overview on deep learning[J]. CAAI Transactions on Intelligent Systems, 2019, 14(01): 1-19. [doi:10.11992/tis.201808019]

Overview on deep learning

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
Vol. 14
Issue:
No. 1, 2019
Pages:
1-19
Publication date:
2019-01-05

Article Information

Title:
Overview on deep learning
Author(s):
HU Yue1, LUO Dongyang1, HUA Kui1, LU Haiming2, ZHANG Xuegong1,3
1. Department of Automation, Tsinghua University, Beijing 100084, China;
2. Institute of Information Technology, Tsinghua University, Beijing 100084, China;
3. School of Life Sciences, Tsinghua University, Beijing 100084, China
Keywords:
deep learning; machine learning; convolutional neural network; recursive neural network; multilayer perceptron; auto-encoder; learning algorithms; machine learning theory
CLC number:
TP18
DOI:
10.11992/tis.201808019
Abstract:
Machine learning is a discipline that involves learning rules from data with mathematical models and computer algorithms. It is becoming one of the core technologies in the field of artificial intelligence, and it is useful for many applications that require mining rules from complex data. In recent years, various deep neural network models have achieved remarkable results in many fields, and this has given rise to an interesting new branch of machine learning: deep learning. Deep learning leads the new wave of studies on the theories, methods, and applications of machine learning. This article reviews the relationships and differences between deep learning and previous machine learning methods, summarizes the key principles and typical optimization algorithms of representative deep learning methods, and discusses some remaining problems that need to be further addressed.

Memo

Received: 2018-08-24.
Foundation item: National Natural Science Foundation of China (61721003).
About the authors: HU Yue, male, born in 1994, senior engineer and master's student; his main research interests are computational advertising and data mining. LUO Dongyang, male, born in 1992, Ph.D. candidate; his main research interests are bioinformatics and data mining. HUA Kui, male, born in 1991, Ph.D. candidate; his main research interests are bioinformatics and machine learning.
Corresponding author: ZHANG Xuegong. E-mail: zhangxg@tsinghua.edu.cn