HU Yue, LUO Dongyang, HUA Kui, et al. Overview on deep learning[J]. CAAI Transactions on Intelligent Systems, 2019, 14(1):1-19. doi:10.11992/tis.201808019.

Overview on deep learning

References:
[1] MCMAHAN H B, HOLT G, SCULLEY D, et al. Ad click prediction:a view from the trenches[C]//Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Chicago, USA, 2013:1222-1230.
[2] GRAEPEL T, CANDELA J Q, BORCHERT T, et al. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine[C]//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel, 2010:13-20.
[3] HE Xinran, PAN Junfeng, JIN Ou, et al. Practical lessons from predicting clicks on ads at Facebook[C]//Proceedings of the 8th International Workshop on Data Mining for Online Advertising. New York, USA, 2014:1-9.
[4] CHEN Tianqi, HE Tong. Higgs boson discovery with boosted trees[C]//Proceedings of the 2014 International Conference on High-Energy Physics and Machine Learning. Montreal, Canada, 2014:69-80.
[5] GOLUB T R, SLONIM D K, TAMAYO P, et al. Molecular classification of cancer:class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286(5439):531-537.
[6] POON T C W, CHAN A T C, ZEE B, et al. Application of classification tree and neural network algorithms to the identification of serological liver marker profiles for the diagnosis of hepatocellular carcinoma[J]. Oncology, 2001, 61(4):275-283.
[7] AGARWAL D. Computational advertising:the linkedin way[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. San Francisco, USA, 2013:1585-1586.
[8] MCCULLOCH W S, PITTS W. A logical calculus of the ideas immanent in nervous activity[J]. Bulletin of mathematical biology, 1990, 52(1/2):99-115.
[9] HEBB D O. The organization of behavior:a neuropsychological theory[M]. New York:John Wiley and Sons, 1949:12-55.
[10] ROSENBLATT F. The perceptron:a perceiving and recognizing automaton[R]. Ithaca, NY:Cornell Aeronautical Laboratory, 1957.
[11] MINSKY M L, PAPERT S A. Perceptrons:an introduction to computational geometry[M]. Cambridge:MIT Press, 1969:227-246.
[12] HAUGELAND J. Artificial intelligence:the very idea[M]. Cambridge:MIT Press, 1989:3-11.
[13] MCCORDUCK P. Machines who think:a personal inquiry into the history and prospects of artificial intelligence[M]. 2nd ed. Natick:A. K. Peters/CRC Press, 2004:2-12.
[14] HOPFIELD J J. Neural networks and physical systems with emergent collective computational abilities[J]. Proceedings of the national academy of sciences of the United States of America, 1982, 79(8):2554-2558.
[15] LE CUN Y. Learning process in an asymmetric threshold network[M]//BIENENSTOCK E, SOULIÉ F, WEISBUCH G. Disordered Systems and Biological Organization. Berlin, Heidelberg:Springer, 1986.
[16] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088):533-536.
[17] PARKER D B. Learning-logic[R]. Technical Report TR-47. Cambridge, MA:Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, 1985.
[18] RUMELHART D E, MCCLELLAND J L. Readings in cognitive science[M]. San Francisco:Morgan Kaufmann, 1988:399-421.
[19] KOHONEN T. Self-organization and associative memory[M]. 3rd ed. Berlin Heidelberg:Springer-Verlag, 1989:119-155.
[20] SMOLENSKY P. Information processing in dynamical systems:foundations of harmony theory[M]//RUMELHART D E, MCCLELLAND J L. Parallel Distributed Processing, Vol. 1. Cambridge:MIT Press, 1986:194-281.
[21] CORTES C, VAPNIK V. Support-vector networks[J]. Machine learning, 1995, 20(3):273-297.
[22] BOSER B E, GUYON I M, VAPNIK V N. A training algorithm for optimal margin classifiers[C]//Proceedings of the 5th Annual Workshop on Computational Learning Theory. Pittsburgh, Pennsylvania, USA, 1992:144-152.
[23] ZHANG Xuegong. Introduction to statistical learning theory and support vector machines[J]. Acta automatica sinica, 2000, 26(1):32-42.
[24] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786):504-507.
[25] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6):84-90.
[26] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587):484-489.
[27] TANG Yichuan. Deep learning using linear support vector machines[J]. arXiv:1306.0239, 2015.
[28] BOURLARD H, KAMP Y. Auto-association by multilayer perceptrons and singular value decomposition[J]. Biological cybernetics, 1988, 59(4/5):291-294.
[29] HINTON G E, ZEMEL R S. Autoencoders, minimum description length and Helmholtz free energy[C]//Proceedings of the 6th International Conference on Neural Information Processing Systems. Denver, Colorado, USA, 1993:3-10.
[30] SCHWENK H, MILGRAM M. Transformation invariant autoassociation with application to handwritten character recognition[C]//Proceedings of the 7th International Conference on Neural Information Processing Systems. Denver, Colorado, USA, 1994:991-998.
[31] HINTON G E, MCCLELLAND J L. Learning representations by recirculation[C]//Proceedings of 1987 International Conference on Neural Information Processing Systems. Denver, USA, 1987:3
[32] SCHÖLKOPF B, PLATT J, HOFMANN T. Efficient learning of sparse representations with an energy-based model[C]//Proceedings of 2006 Conference Advances in Neural Information Processing Systems. Vancouver, Canada, 2007:1137-1144.
[33] RANZATO M, HUANG Fujie, BOUREAU Y L, et al. Unsupervised learning of invariant feature hierarchies with applications to object recognition[C]//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA, 2007:1-8.
[34] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]//Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland, 2008:1096-1103.
[35] RIFAI S, VINCENT P, MULLER X, et al. Contractive auto-encoders:explicit invariance during feature extraction[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue, Washington, USA, 2011:833-840.
[36] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks[C]//Proceedings of the 19th International Conference on Neural Information Processing Systems. Vancouver, Canada, 2006:153-160.
[37] SALAKHUTDINOV R, MNIH A, HINTON G. Restricted Boltzmann machines for collaborative filtering[C]//Proceedings of the 24th International Conference on Machine Learning. Corvalis, Oregon, USA, 2007:791-798.
[38] HINTON G E. A practical guide to training restricted Boltzmann machines[M]//MONTAVON G, ORR G B, MÜLLER K R. Neural Networks:Tricks of the Trade. 2nd ed. Berlin, Heidelberg:Springer, 2012:599-619.
[39] LECUN Y, CHOPRA S, HADSELL R, et al. A tutorial on energy-based learning[M]//BAKIR G, HOFMANN T, SCHÖLKOPF B, et al. Predicting Structured Data. Cambridge:MIT Press, 2006:45-49.
[40] LEE H, EKANADHAM C, NG A Y. Sparse deep belief net model for visual area V2[C]//Proceedings of the 20th International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada, 2007:873-880.
[41] HINTON G E. Deep belief networks[J]. Scholarpedia, 2009, 4(5):5947.
[42] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural computation, 2006, 18(7):1527-1554.
[43] LE CUN Y, BOSER B, DENKER J S, et al. Handwritten digit recognition with a back-propagation network[C]//Proceedings of the 2nd International Conference on Neural Information Processing Systems. Denver, USA, 1989:396-404.
[44] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11):2278-2324.
[45] HUBEL D H, WIESEL T N. Receptive fields and functional architecture of monkey striate cortex[J]. The journal of physiology, 1968, 195(1):215-243.
[46] SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity:the all convolutional net[J]. arXiv:1412.6806, 2014.
[47] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[48] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015:1-9.
[49] LIN Min, CHEN Qiang, YAN Shuicheng. Network in network[J]. arXiv:1312.4400, 2013.
[50] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016:770-778.
[51] ZHANG Xiang, ZHAO Junbo, LECUN Y. Character-level convolutional networks for text classification[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada, 2015:649-657.
[52] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[J]. arXiv:1705.03122, 2017.
[53] PHAM N Q, KRUSZEWSKI G, BOLEDA G. Convolutional neural network language models[C]//Proceedings of 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas, USA, 2016:1153-1162.
[54] HARRIS D M, HARRIS S J. Digital design and computer architecture[M]. 2nd ed. San Francisco:Morgan Kaufmann Publishers Inc., 2013:123-125.
[55] MIKOLOV T, CHEN Kai, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv:1301.3781, 2013.
[56] GOLDBERG Y, LEVY O. word2vec Explained:deriving Mikolov et al.’s negative-sampling word-embedding method[J]. arXiv:1402.3722, 2014.
[57] TANG Duyu, QIN Bing, LIU Ting. Document modeling with gated recurrent neural network for sentiment classification[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal, 2015:1422-1432.
[58] SUTSKEVER I, MARTENS J, HINTON G. Generating text with recurrent neural networks[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue, Washington, USA, 2011:1017-1024.
[59] GRAVES A. Generating sequences with recurrent neural networks[J]. arXiv:1308.0850, 2013.
[60] WERBOS P J. Generalization of backpropagation with application to a recurrent gas market model[J]. Neural networks, 1988, 1(4):339-356.
[61] PASCANU R, MIKOLOV T, BENGIO Y. On the difficulty of training recurrent neural networks[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA, 2013:1310-1318.
[62] BENGIO Y, SIMARD P, FRASCONI P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE transactions on neural networks, 1994, 5(2):157-166.
[63] HOCHREITER S, BENGIO Y, FRASCONI P. Gradient flow in recurrent nets:the difficulty of learning long-term dependencies[M]//KOLEN J F, KREMER S C. A Field Guide to Dynamical Recurrent Networks. New York:Wiley-IEEE Press, 2001:6-8.
[64] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural computation, 1997, 9(8):1735-1780.
[65] GERS F A, SCHMIDHUBER J, CUMMINS F. Learning to forget:continual prediction with LSTM[J]. Neural computation, 2000, 12(10):2451-2471.
[66] GERS F A, SCHRAUDOLPH N N, SCHMIDHUBER J. Learning precise timing with LSTM recurrent networks[J]. The journal of machine learning research, 2003, 3:115-143.
[67] GERS F A, SCHMIDHUBER J. Recurrent nets that time and count[C]//Proceedings of 2000 IEEE-INNS-ENNS International Joint Conference on Neural Networks. Como, Italy, 2000:3189.
[68] GREFF K, SRIVASTAVA R K, KOUTNIK J, et al. LSTM:a search space odyssey[J]. IEEE transactions on neural networks and learning systems, 2017, 28(10):2222-2232.
[69] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv:1406.1078, 2014.
[70] JOZEFOWICZ R, ZAREMBA W, SUTSKEVER I. An empirical exploration of recurrent network architectures[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille, France, 2015:2342-2350.
[71] LE Q V, JAITLY N, HINTON G E. A simple way to initialize recurrent networks of rectified linear units[J]. arXiv:1504.00941, 2015.
[72] WU Yonghui, SCHUSTER M, CHEN Zhifeng, et al. Google’s neural machine translation system:bridging the gap between human and machine translation[J]. arXiv:1609.08144, 2016.
[73] YIN Jun, JIANG Xin, LU Zhengdong, et al. Neural generative question answering[J]. arXiv:1512.01337, 2016.
[74] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014:3104-3112.
[75] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014:2204-2212.
[76] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv:1409.0473, 2016.
[77] ANDRYCHOWICZ M, KURACH K. Learning efficient algorithms with hierarchical attentive memory[J]. arXiv:1602.03218, 2016.
[78] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE transactions on signal processing, 1997, 45(11):2673-2681.
[79] PASCANU R, GULCEHRE C, CHO K, et al. How to construct deep recurrent neural networks[J]. arXiv:1312.6026, 2014.
[80] HERMANS M, SCHRAUWEN B. Training and analyzing deep recurrent neural networks[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, USA, 2013:190-198.
[81] LE Q V, NGIAM J, COATES A, et al. On optimization methods for deep learning[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue, Washington, USA, 2011:265-272.
[82] RUDER S. An overview of gradient descent optimization algorithms[J]. arXiv:1609.04747, 2016.
[83] YOUSOFF S N M, BAHARIN A, ABDULLAH A. A review on optimization algorithm for deep learning method in bioinformatics field[C]//Proceedings of 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences. Kuala Lumpur, Malaysia, 2016:707-711.
[84] QIAN Ning. On the momentum term in gradient descent learning algorithms[J]. Neural networks, 1999, 12(1):145-151.
[85] SUTSKEVER I, MARTENS J, DAHL G, et al. On the importance of initialization and momentum in deep learning[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA, 2013:1139-1147.
[86] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. The journal of machine learning research, 2011, 12:2121-2159.
[87] TIELEMAN T, HINTON G E. Lecture 6.5-rmsprop:Divide the gradient by a running average of its recent magnitude[C]//COURSERA:Neural Networks for Machine Learning. 2012.
[88] ZEILER M D. ADADELTA:an adaptive learning rate method[J]. arXiv:1212.5701, 2012.
[89] KINGMA D P, BA J. Adam:a method for stochastic optimization[J]. arXiv:1412.6980, 2014.
[90] FLETCHER R. Practical methods of optimization[M]. New York:John Wiley and Sons, 2013:110-133.
[91] NOCEDAL J. Updating quasi-Newton matrices with limited storage[J]. Mathematics of computation, 1980, 35(151):773-782.
[92] MARTENS J. Deep learning via Hessian-free optimization[C]//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel, 2010:735-742.
[93] KIROS R. Training neural networks with stochastic hessian-free optimization[J]. arXiv:1301.3641, 2013.
[94] ERHAN D, BENGIO Y, COURVILLE A, et al. Why does unsupervised pre-training help deep learning?[J]. The journal of machine learning research, 2010, 11:625-660.
[95] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, 2010, 9:249-256.
[96] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Delving deep into rectifiers:surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile, 2015:1026-1034.
[97] XU Bing, WANG Naiyan, CHEN Tianqi, et al. Empirical evaluation of rectified activations in convolutional network[J]. arXiv:1505.00853, 2015.
[98] GULCEHRE C, MOCZULSKI M, DENIL M, et al. Noisy activation functions[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA, 2016:3059-3068.
[99] LECUN Y, BOTTOU L, ORR G B, et al. Efficient BackProp[M]//ORR G B, MÜLLER K R. Neural Networks:Tricks of the Trade. Berlin, Heidelberg:Springer, 1998:9-50.
[100] AMARI S I. Natural gradient works efficiently in learning[J]. Neural computation, 1998, 10(2):251-276.
[101] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural computation, 1989, 1(4):541-551.
[102] NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]//Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel, 2010:807-814.
[103] CLEVERT D A, UNTERTHINER T, HOCHREITER S. Fast and accurate deep network learning by exponential linear units (ELUs)[J]. arXiv:1511.07289, 2016.
[104] LI Yang, FAN Chunxiao, LI Yong, et al. Improving deep neural network with multiple parametric exponential linear units[J]. Neurocomputing, 2018, 301:11-24.
[105] GOODFELLOW I J, WARDE-FARLEY D, MIRZA M, et al. Maxout networks[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA, 2013:1319-1327.
[106] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J]. arXiv:1207.0580, 2012.
[107] BOUTHILLIER X, KONDA K, VINCENT P, et al. Dropout as data augmentation[J]. arXiv:1506.08700, 2016.
[108] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout:a simple way to prevent neural networks from overfitting[J]. The journal of machine learning research, 2014, 15(1):1929-1958.
[109] WAN Li, ZEILER M, ZHANG Sixin, et al. Regularization of neural networks using DropConnect[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA, 2013:1058-1066.
[110] BA L J, FREY B. Adaptive dropout for training deep neural networks[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, USA, 2013:3084-3092.
[111] IOFFE S, SZEGEDY C. Batch normalization:accelerating deep network training by reducing internal covariate shift[J]. arXiv:1502.03167, 2015.
[112] DANIELY A, LINIAL N, SHALEV-SHWARTZ S. From average case complexity to improper learning complexity[C]//Proceedings of the 46th Annual ACM Symposium on Theory of Computing. New York, USA, 2014:441-448.
[113] DANIELY A, SHALEV-SHWARTZ S. Complexity theoretic limitations on learning DNF’s[C]//JMLR:Workshop and Conference Proceedings. 2016:1-16.
[114] DANIELY A. Complexity theoretic limitations on learning halfspaces[C]//Proceedings of the 48th Annual ACM Symposium on Theory of Computing. Cambridge, USA, 2016:105-117.
[115] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[J]. arXiv:1412.6572, 2015.
[116] ANTHONY M, BARTLETT P L. Neural network learning:theoretical foundations[M]. New York:Cambridge University Press, 2009:286-295.
[117] BARTLETT P L. The sample complexity of pattern classification with neural networks:the size of the weights is more important than the size of the network[J]. IEEE transactions on information theory, 1998, 44(2):525-536.
[118] BAUM E B, HAUSSLER D. What size net gives valid generalization?[J]. Neural computation, 1989, 1(1):151-160.
[119] HARDT M, RECHT B, SINGER Y. Train faster, generalize better:Stability of stochastic gradient descent[J]. arXiv:1509.01240, 2015.
[120] NEYSHABUR B, TOMIOKA R, SREBRO N. Norm-based capacity control in neural networks[C]//Proceedings of the 28th Conference on Learning Theory. Paris, France, 2015, 40:1-26.
[121] PRATT L Y. Discriminability-based transfer between neural networks[C]//Proceedings of the 5th International Conference on Neural Information Processing Systems. Denver, USA, 1992:204-211.
[122] HORNIK K, STINCHCOMBE M, WHITE H. Multilayer feedforward networks are universal approximators[J]. Neural networks, 1989, 2(5):359-366.
[123] BARRON A R. Universal approximation bounds for superpositions of a sigmoidal function[J]. IEEE transactions on information theory, 1993, 39(3):930-945.
[124] DELALLEAU O, BENGIO Y. Shallow vs. deep sum-product networks[C]//Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, Spain, 2011:666-674.
[125] BIANCHINI M, SCARSELLI F. On the complexity of neural network classifiers:a comparison between shallow and deep architectures[J]. IEEE transactions on neural networks and learning systems, 2014, 25(8):1553-1565.
[126] ELDAN R, SHAMIR O. The power of depth for feedforward neural networks[C]//JMLR:Workshop and Conference Proceedings. 2016:1-34.
[127] ANDONI A, PANIGRAHY R, VALIANT G, et al. Learning polynomials with neural networks[C]//Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014:1908-1916.
[128] ARORA S, BHASKARA A, GE Rong, et al. Provable bounds for learning some deep representations[C]//Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014:584-592.
[129] BRUNA J, MALLAT S. Invariant scattering convolution networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(8):1872-1886.
[130] CHOROMANSKA A, HENAFF M, MATHIEU M, et al. The loss surfaces of multilayer networks[C]//Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA, 2015, 38:192-204.
[131] GIRYES R, SAPIRO G, BRONSTEIN A M. Deep neural networks with random Gaussian weights:a universal classification strategy?[J]. IEEE transactions on signal processing, 2016, 64(13):3444-3457.
[132] LIVNI R, SHALEV-SHWARTZ S, SHAMIR O. On the computational efficiency of training neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada, 2014:855-863.
[133] NEYSHABUR B, SALAKHUTDINOV R, SREBRO N. Path-SGD:path-normalized optimization in deep neural networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada, 2015:2422-2430.
[134] SAFRAN I, SHAMIR O. On the quality of the initial basin in overspecified neural networks[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA, 2016:774-782.
[135] SEDGHI H, ANANDKUMAR A. Provable methods for training neural networks with sparse connectivity[J]. arXiv:1412.2693, 2015.
[136] DANIELY A, FROSTIG R, SINGER Y. Toward deeper understanding of neural networks:the power of initialization and a dual view on expressivity[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain, 2016:2253-2261.