Deep Learning Seminar, Lei Huang, 2016-11-29
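Outline
1. Introduction to deep learning
   History of neural networks
   Applications of deep learning
2. Multi-layer perceptron, feedforward neural network
3. Convolutional neural networks
4. Recursive neural networks
5. Modeling specific problems with neural networks
   How to design the loss function
   Whether to use end-to-end learning
6. Practical techniques for training neural networks
   How to train effectively
   How to improve the model's generalization ability
Deep learning: knowledge level
Deep learning: applied practice level
Course goals
1. Knowledge level
   Basic terminology
   Three classic types of neural networks
2. Applied practice level
   Model specific problems with deep learning
   Programming practice, based on the torch platform (a deep learning platform)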
Related resources: deep learning courses
  University of Oxford, Nando de Freitas, https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
  Coursera, Geoffrey Hinton, Neural Networks for Machine Learning
  Stanford University, Fei-Fei Li, CS231n
Related resources: top conferences in the field
  ICLR (International Conference on Learning Representations)
  ICML
  CVPR, ACL, IJCAI, AAAI
Related resources: lab FTP
  fileserver.nlsde.buaa.edu.cn/public/study/deeplearning/
Deep learning introduction Presented by Lei Huang November 29, 2016
Outline
  Basic concepts
    Machine learning
    Neural network
    Deep network
  History of neural networks
    Perceptron
    Backpropagation
    Deep learning
  Applications
Machine learning
  Dataset: $D = \{X, Y\}$
  Input: $X$
  Output: $Y$
  Learning: $Y = F(X)$ or $P(Y \mid X)$
  Goal
    Automatically detect patterns in data
    Use the uncovered patterns to predict future data
  Fitting and generalization
Machine learning
  Types: view of data
    Supervised learning: dataset $D = \{X, Y\}$; learning: $Y = F(X)$ or $P(X, Y)$
    Unsupervised learning: dataset $D = \{X, X\}$; learning: $X = F(g(X))$, use $g(X)$ as the representation
  Types: view of models
    Non-parametric model: $Y = F(X; x_1, x_2, \ldots, x_n)$
    Parametric model: $Y = F(X; \theta)$
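The difference between the two model views can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the toy data, the least-squares line, and the 1-nearest-neighbour rule are not from the slides, and all function names are hypothetical. The parametric model compresses the data into a fixed θ = (w, b); the non-parametric one consults the stored training points at prediction time.

```python
# Illustrative toy data (an assumption): y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def fit_linear(xs, ys):
    """Parametric: compress the data into theta = (w, b) by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict_linear(theta, x):
    """Y = F(X; theta): only theta is needed, the training data is not."""
    w, b = theta
    return w * x + b

def predict_nearest(xs, ys, x):
    """Non-parametric: Y = F(X; x_1, ..., x_n) looks up the closest stored point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]

theta = fit_linear(xs, ys)
print(predict_linear(theta, 10.0))     # extrapolates along the fitted line
print(predict_nearest(xs, ys, 10.0))   # returns the label of the nearest stored point
```

The parametric predictor keeps working after the training set is discarded, while the nearest-neighbour predictor grows with the data it has to keep.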
Neural network
  $Y = F(X) = f_T(f_{T-1}(\cdots f_1(X)))$
  $f_i(x) = g(wx + b)$
  Nonlinear activation $g$: sigmoid, ReLU
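The layer-by-layer composition on this slide can be written out as a short Python sketch. It assumes each layer uses a weight matrix (a list of rows) and a bias vector, which generalizes the scalar form on the slide; the weights below are picked by hand for illustration and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def layer(weights, biases, g):
    """Build one layer f_i(x) = g(W x + b); W is a list of rows."""
    def f(x):
        return [g(sum(w * xj for w, xj in zip(row, x)) + b)
                for row, b in zip(weights, biases)]
    return f

def network(layers):
    """Compose the layers: F(X) = f_T(f_{T-1}(... f_1(X)))."""
    def F(x):
        for f in layers:
            x = f(x)
        return x
    return F

# Hand-picked weights (an assumption, not from the slides).
f1 = layer([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu)   # hidden layer, ReLU
f2 = layer([[1.0, 1.0]], [0.0], sigmoid)                   # output layer, sigmoid
F = network([f1, f2])

print(F([1.0, 1.0]))   # hidden activations are both 0, so the output is sigmoid(0)
print(F([2.0, 1.0]))   # hidden activations are [1, 0], so the output is sigmoid(1)
```

Adding depth here means appending another `layer(...)` to the list; the composition code does not change.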
Deep neural network
  Why deep? Powerful representation capacity (the ability to express functions)
Key properties of deep learning
  End-to-end learning: no distinction between feature extractor and classifier
  Deep architectures: cascade of simpler non-linear modules
The Perceptron
  1957, Frank Rosenblatt: the perceptron
  Source: Intelligence artificielle, Yann LeCun, 2015-2016
AI Winter
  1969, Minsky: a single-layer perceptron cannot compute XOR
  Two layers would be needed, but training them was considered computationally infeasible at the time
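The XOR limitation can be checked with a small Python sketch. It rests on assumptions: the coarse weight grid and the hand-wired two-layer weights are illustrative choices, not from the slides, and a grid search is a demonstration rather than a proof (the proof is that XOR is not linearly separable).

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    return 1 if z > 0 else 0

def unit(w1, w2, b):
    """A single threshold unit: step(w1*x1 + w2*x2 + b)."""
    return lambda x1, x2: step(w1 * x1 + w2 * x2 + b)

def computes_xor(f):
    return all(f(x1, x2) == y for (x1, x2), y in XOR.items())

# One layer: try every (w1, w2, b) on a coarse grid from -4.0 to 4.0.
grid = [k / 2 for k in range(-8, 9)]
single_layer_hits = sum(
    computes_xor(unit(w1, w2, b)) for w1, w2, b in product(grid, repeat=3)
)

# Two layers, wired by hand: XOR(x1, x2) = OR(x1, x2) and not AND(x1, x2).
h_or = unit(1, 1, -0.5)
h_and = unit(1, 1, -1.5)
out = unit(1, -1, -0.5)

def two_layer(x1, x2):
    return out(h_or(x1, x2), h_and(x1, x2))

print(single_layer_hits)          # number of single units on the grid that compute XOR
print(computes_xor(two_layer))    # whether the two-layer network computes XOR
```

The two-layer weights are set by hand here; at the time there was no practical procedure for learning them, which is the gap backpropagation later filled.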
Conclusion for the first phase
  Training by iteration
    Inference: calculate f(x)
    Compare f(x) with y
    Adjust the weights (gradient based)
  Source: Intelligence artificielle, Yann LeCun, 2015-2016
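The three-step loop (inference, comparison, weight adjustment) can be sketched with the classic perceptron update rule. This is a minimal Python illustration under assumptions: the AND dataset, the learning rate, and the function names are hypothetical and not taken from the slides.

```python
def predict(w, b, x):
    """Inference: f(x) = 1 if w . x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(w, b, x)          # compare f(x) with y
            if error != 0:
                mistakes += 1
                # Adjust the weights in the direction that reduces the error.
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:                         # every example is classified correctly
            break
    return w, b

# AND is linearly separable, so the loop is expected to converge.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print(w, b)
print([predict(w, b, x) for x, _ in AND])
```

On data that is not linearly separable, such as XOR, the same loop keeps making mistakes until it runs out of epochs.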
Second: backpropagation
  1986, backpropagation
  Calculates gradients efficiently: O(d^2)
  Training routine
    Forward pass
    Back-propagation
    Update weights based on gradients
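The forward / back-propagation / update routine can be sketched for a tiny network in plain Python. Everything specific here is an assumption made for illustration: one sigmoid hidden layer, a linear output, squared-error loss, a single training example, and the learning rate. It is not the course's torch code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: h = sigmoid(W1 x + b1), y = W2 . h + b2."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backward(params, x, t):
    """Back-propagation: apply the chain rule from the output back to the input."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    dy = y - t                                    # dL/dy
    dW2 = [dy * hi for hi in h]
    db2 = dy
    # Gradient at the hidden pre-activations: dL/dh_j times sigmoid'(z_j).
    dz = [dy * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    dW1 = [[dzj * xi for xi in x] for dzj in dz]
    db1 = dz
    return dW1, db1, dW2, db2

def sgd_step(params, grads, lr=0.1):
    """Update the weights based on the gradients."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    return (
        [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, dW1)],
        [b - lr * g for b, g in zip(b1, db1)],
        [w - lr * g for w, g in zip(W2, dW2)],
        b2 - lr * db2,
    )

random.seed(0)
params = (
    [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)],
    [0.0, 0.0, 0.0],
    [random.uniform(-1, 1) for _ in range(3)],
    0.0,
)
x, t = [0.5, -1.0], 1.0
before = loss(params, x, t)
for _ in range(50):
    params = sgd_step(params, backward(params, x, t))
after = loss(params, x, t)
print(before, after)
```

One backward pass reuses the activations from the forward pass, so all gradients are obtained at a cost comparable to a single forward evaluation instead of one perturbed evaluation per weight.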
Beaten by SVM
  1990s, Vapnik: support vector machines
  $y = f(wx + b)$
  Global optimization
  High efficiency, just one layer
  Loss function: function + loss
  Kernel trick for nonlinearity
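The kernel trick mentioned on this slide can be illustrated with a small numerical check in Python. The degree-2 polynomial kernel and the explicit feature map below are a standard textbook pair chosen for illustration, not something taken from the slides; the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed directly in the 2-D input space."""
    return dot(x, z) ** 2

def feature_map(x):
    """Explicit nonlinear features phi(x) such that phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, -1.0]
implicit = poly_kernel(x, z)                        # never builds phi
explicit = dot(feature_map(x), feature_map(z))      # builds phi, then takes the inner product
print(implicit, explicit)
```

Because a linear model only needs inner products between examples, replacing them with the kernel gives a nonlinear decision function while the model itself stays a single linear layer.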
Third: deep learning
  2006, Geoffrey Hinton: deep belief network
  Pre-training
  Fine-tuning
Third: deep learning
  2011, audio
  The task                            Hours of training data   Deep neural network   Gaussian Mixture Model   GMM with more data
  Switchboard (Microsoft Research)    309                      18.5%                 27.4%                    18.6% (2000 hrs)
  English broadcast news (IBM)        50                       17.5%                 18.8%                    -
  Google voice search (Android 4.1)   5,870                    12.3% (and falling)   -                        16.0% (>>5,870 hrs)
Third: deep learning
  2012, ImageNet
Why has deep learning grown so fast?
  Big data
  More powerful and cheaper machines
  Open source
    Code: GitHub
    Papers: arXiv
  Source: "A Programmer's Introductory Guide to Deep Learning" (程序员的深度学习入门指南), Fei Lianghong, 2016
Application: Object Classification
Application: Object Detection
Application: Scene Parsing
Application: Automatic Image Caption Generation
Application: Artistic Style Learning on Images
Application: Automatically Adding Sounds to Silent Movies
Application: Automatic Handwriting Generation
Application: Automatic Text Generation
  Shakespeare
  Wikipedia articles (including the markup)
  Algebraic Geometry (with LaTeX markup)
  Linux source code
  http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Application: AlphaGo
Q&A