Vol. 42, No. 6    ACTA AUTOMATICA SINICA    June, 2016

Facial Expression Recognition Using ROI-KNN Deep Convolutional Neural Networks

SUN Xiao 1    PAN Ting 1    REN Fu-Ji 1, 2

Abstract  Deep neural networks have been proved to be able to mine distributed representations of data, including image, speech and text. By building two kinds of models, deep convolutional neural networks and deep sparse rectifier neural networks, on a facial expression dataset, we make contrastive evaluations of facial expression recognition systems based on deep neural networks. Additionally, combining regions of interest (ROI) and K-nearest neighbors (KNN), we propose a fast and simple improved method called ROI-KNN for facial expression classification, which relieves the poor generalization of deep neural networks caused by the lack of data and clearly and consistently decreases the testing error rate. The proposed method also improves the robustness of deep learning in facial expression classification.

Key words  Convolutional neural networks, facial expression recognition, model generalization, prior knowledge

DOI  10.16383/j.aas.2016.c150638

Citation  Sun Xiao, Pan Ting, Ren Fu-Ji. Facial expression recognition using ROI-KNN deep convolutional neural networks.
Acta Automatica Sinica, 2016, 42(6): 883–891.

Manuscript received October 12, 2015; accepted April 1, 2016

Supported by Key Program of National Natural Science Foundation of China (61432004), Natural Science Foundation of Anhui Province (1508085QF119), Open Project Program of the National Laboratory of Pattern Recognition (NLPR201407345), China Postdoctoral Science Foundation (2015M580532), and National Training Program of Innovation and Entrepreneurship for HFUT Undergraduates (2015cxcys109)

Recommended by Associate Editor KE Deng-Feng

1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China
2. Department of Information Science and Intelligent Systems, Faculty of Engineering, Tokushima University, Tokushima 7708500, Japan

Traditional facial expression recognition systems are built on hand-crafted features such as principal component analysis (PCA), scale-invariant feature transform (SIFT), Haar features, and local binary patterns (LBP). Krizhevsky et al. [1] won ILSVRC-2012 with a deep convolutional network that clearly outperformed SIFT-based pipelines, and Lopes et al. [2] applied convolutional networks to facial expression recognition on the Extended Cohn-Kanade (CK+) dataset [3]. Two difficulties remain for practical systems:

1) Images collected in the wild differ greatly from the controlled laboratory conditions of CK+; Fig. 1 contrasts the two.

Fig. 1  Samples from CK+ and Wild

2) CK+ contains only 593 image sequences from about 100 subjects. Compared with the roughly 60 k training samples of MNIST or Cifar10, such a small dataset makes deep networks overfit easily; although accuracy on CK+ itself can reach about 95 %, this does not transfer to images in the wild.

The rest of this paper is organized as follows. Section 1 reviews the relevant theory, Section 2 describes our models and the proposed ROI-KNN method, Section 3 reports experiments on CK+ and our Wild data, and Section 4 concludes. Our implementation is based on Theano and is available on Github 1.

1 https://github.com/neopenx/

1 Theoretical background

1.1 Smoothness prior

Bishop [4] gives the predictive distribution of Bayesian linear regression as

p(t \,|\, x, \mathbf{t}, \alpha, \beta) = \mathcal{N}(t \,|\, m_N^{\mathrm{T}} \Phi(x), \sigma_N^2(x))    (1)

m_N^{\mathrm{T}} \Phi(x) = y(x, m_N) = \sum_{n=1}^{N} \mathrm{kernel}(x, x_n)\, t_n    (2)

In (1), t is the predicted target for input x, β is the precision of the observation noise, and α is the precision of the prior over the weights. Equation (2) shows that the prediction is a kernel-weighted (smooth) combination of the training targets: when x is close to a training input x_n, the prediction t is close to the corresponding target t_n. Bengio [5] points out that shallow learners such as support vector machines (SVM) and K-nearest neighbors (KNN) rely on exactly this smoothness prior and on local representations, which limits how well they generalize away from the training data. Hand-crafted features such as SIFT, Haar, LBP and PCA are likewise shallow transformations of the input (Fig. 2) and inherit the same limitation.

1.2 Convolutional neural networks

LeCun et al. [6] applied convolutional networks to handwritten digit recognition as early as 1990 (Fig. 3), building on Fukushima's Neocognitron [7] and back-propagation [8], and later refined them into gradient-based learning for document recognition [9]. Convolutional networks trade the pure smoothness prior for structural priors on images, described below.
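The smoothness prior behind Eq. (2) in Section 1.1 can be sketched numerically. The Gaussian kernel, its bandwidth, and the weight normalization below are illustrative assumptions for a generic kernel smoother, not the exact equivalent kernel of the Bayesian model:

```python
import numpy as np

def kernel(x, xn, h=0.5):
    # Gaussian (RBF) similarity between a query x and training inputs xn;
    # h is an assumed bandwidth
    return np.exp(-((x - xn) ** 2) / (2.0 * h ** 2))

def predict(x, xs, ts, h=0.5):
    # Eq. (2) in spirit: the prediction is a kernel-weighted combination
    # of the training targets t_n (Nadaraya-Watson normalization assumed)
    w = kernel(x, xs, h)
    return np.sum(w * ts) / np.sum(w)

# Smoothness prior at work: a query near a training input x_n gets a
# prediction near the corresponding target t_n.
xs = np.array([0.0, 1.0, 2.0])
ts = np.array([0.0, 1.0, 0.0])
print(predict(1.0, xs, ts))  # close to 1.0
print(predict(0.0, xs, ts))  # close to 0.0
```

This is exactly the locality that Bengio [5] criticizes: the prediction at x is dominated by the training targets of nearby x_n, so nothing useful is said far from the training data.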
Fig. 2  Manifold side of input space

Fig. 3  Local connection and structure of convolutional neural network (CNN)

1.2.1 Local connection

Convolutional layers are locally connected rather than fully (densely) connected. Local connection drastically reduces the number of parameters and matches the local statistics of natural images [5]. Szegedy et al. [10] pushed depth further with the 22-layer GoogLeNet, winner of ILSVRC-2014.

1.2.2 Weight sharing

Units at different spatial positions share the same filter weights, an idea going back to the Neocognitron [7]: a feature detector learned at one location can be reused at every location, so the same pattern is recognized regardless of where it appears.

1.2.3 Pooling

Pooling aggregates neighboring responses, reducing resolution and giving a degree of invariance to small translations. The common variants are max pooling and average pooling (avg pooling).

1.3 Deep sparse rectifier networks

Glorot et al. [11] proposed deep sparse rectifier neural networks, which replace sigmoid-family activations (logistic/tanh) with the rectified linear unit (ReLU).

1.3.1 Approximation and depth

Barron [12] showed that the approximation error of a network with N sigmoidal units decays on the order of 1/N, so wider and deeper networks can represent richer functions. Bengio [5] argues that deep architectures can escape the limits of purely local representations. Biologically, Hubel and Wiesel [13] described the layered organization of the primate visual cortex (V1, V2), which motivates hierarchical feature learning.

1.3.2 ReLU and sparsity

Following Dayan and Abbott [14], real neurons are mostly silent: Attwell and Laughlin [15] estimate that only about 1 %–4 % of neurons fire at the same time, i.e., most outputs are exactly 0. The hard zero of the ReLU mimics this sparsity, while sigmoid-family units are always active. The two rectifier-style activations compared in Fig. 4 are

ReLU: ReLU(x) = max(0, x)
Softplus: Softplus(x) = log(1 + e^x)

Softplus is a smooth approximation of the ReLU. Sigmoid-family activations squash outputs into a bounded range such as [−1, 1] and saturate, causing the gradient-vanishing problem in deep networks.
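The two activations above can be written down directly; a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): exact zeros below the threshold give sparse
    # activations and a non-saturating gradient for x > 0
    return np.maximum(0.0, x)

def softplus(x):
    # Softplus(x) = log(1 + e^x): a smooth approximation of the ReLU
    # that never outputs an exact zero
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))      # [0. 0. 2.]
print(softplus(x))  # strictly positive everywhere
```

For large x, softplus(x) approaches x, matching the linear regime of the ReLU; for negative x it stays slightly above zero, which is why it does not yield truly sparse activations.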
Fig. 4  Graphs for different activation functions (from Glorot et al. [11])

The ReLU, in contrast, keeps gradients alive in deep networks [1]. Because it outputs an exact 0 for negative inputs, it also induces sparsity in the hidden activations, much like L1 regularization [11].

1.4 Dropout

Hinton et al. [16] proposed Dropout:

1) Training: each unit x is set to 0 independently with probability p,

DropoutTrain(x) = RandomZero(p) · x

so a different thinned sub-network is sampled for every training case.

2) Testing: no units are dropped; activations are scaled so that their expectation matches the training phase,

DropoutTest(x) = (1 − p) · x

Dropout prevents co-adaptation of feature detectors and acts as a strong regularizer, and the sparse firing it induces again echoes the energy budget of real neurons [15]. Hinton et al. motivate the random recombination of units with an analogy to evolution [17].

1.5 Weight initialization

1.5.1 Weights

A classical heuristic initializes weights uniformly as

W = Uniform(−1/√N, 1/√N)

where N is the fan-in of the unit. Xavier initialization [18] for sigmoid-family networks uses

W = Uniform(−√6/√(F_in + F_out), √6/√(F_in + F_out))

where F_in and F_out are the fan-in and fan-out of the layer. From the Bayesian viewpoint of Bishop [4], choosing an initialization amounts to choosing a prior P(W) over the weights W. Krizhevsky et al. [1] and Hinton et al. [16] won ILSVRC-2012 with Gaussian-initialized weights instead.

1.5.2 Biases

Krizhevsky et al. [1] and Hinton et al. [16] initialize biases to 1 in some (ReLU) layers and to 0 elsewhere, which gives the rectifier units positive inputs at the start of training.

2 Models

2.1 Convolutional networks

As shown in Fig. 5, the input is a 32 × 32 gray image (1 channel), followed by 3 convolutional layers with max pooling, 1 fully connected layer, and 1 Softmax output layer. The three configurations, named CNN-64, CNN-96 and CNN-128, use layer sizes

CNN-64: [32, 32, 64, 64]
CNN-96: [48, 48, 96, 200]
CNN-128: [64, 64, 128, 300]
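The two Dropout rules of Section 1.4 can be sketched as follows; the Bernoulli mask realizes RandomZero(p), and the vector size and p = 0.5 are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, p):
    # Training phase: each unit is zeroed independently with probability p,
    # sampling a different thinned sub-network per training case
    mask = rng.random(x.shape) >= p
    return x * mask

def dropout_test(x, p):
    # Testing phase: no units are dropped; outputs are scaled by (1 - p)
    # so the expected activation matches the training phase
    return (1.0 - p) * x

x = np.ones(10_000)
print(dropout_train(x, 0.5).mean())  # close to 0.5 on average
print(dropout_test(x, 0.5).mean())   # exactly 0.5
```

The matching means illustrate why the (1 − p) scale is needed at test time: without it, the test-time activations would be about twice as large as those seen during training.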
Fig. 5  Structure of DNN (? represents uncertain parameters with many candidate solutions.)

Dropout with p = 0.5 and L2 regularization are applied during training. The output layer is Softmax, hidden activations are ReLU, and pooling is max pooling. Weights W are drawn from zero-mean Gaussians following Krizhevsky et al. [1], with layer-wise standard deviations (STD)

STD: [0.0001, 0.001, 0.001, 0.01, 0.1]

as in [1].

2.2 Deep sparse rectifier networks

As shown in Fig. 6, the input is the same 32 × 32 gray image (1 channel), followed by 3 fully connected hidden layers and 1 Softmax output layer.

Fig. 6  Structure of deep sparse rectifier net

The two configurations, DNN-1000 and DNN-2000, use hidden sizes

DNN-1000: [1 000, 1 000, 1 000]
DNN-2000: [2 000, 2 000, 2 000]

with Dropout p = 0.2, Softmax output and ReLU activations. Weights are initialized with

STD: [0.1, 0.1, 0.1, 0.1]

and biases are initialized to 1 in the hidden layers and 0 at the output.

2.3 Training

The 32 × 32 input is flattened into a 1 024-dimensional vector. Training uses mini-batches of 128 samples with early stopping. The learning rate lr starts at 0.01 with momentum 0.9; when the validation error stops improving, lr is decayed, down to a minimum of 0.0001.

2.4 ROI-KNN

Inspired by patch-based deep face representations [18−19], we crop nine regions of interest (ROI) from each face image, as shown in Fig. 7.

Fig. 7  Nine ROI regions (cut, flip, cover, center focus)

The ROI sub-images encode prior knowledge about which facial parts carry expression information; training on them as auxiliary samples steers the network toward those parts and enlarges the training set at the same time. At test time we crop the same nine regions from the input face, classify each with the network, and combine the per-ROI outputs with KNN to obtain the final expression label. In the terminology of Bengio [5], this combines the distributed representation learned by the deep network with the local, smoothness-prior nature of KNN: the deep model supplies features that generalize, while the ROI-level KNN needs no retraining and is fast and simple.

2.5 Comparison with rotation-based augmentation

Lopes et al. [2] enlarge the training set with small random rotations of each image. Rotation multiplies the number of samples but adds no prior knowledge about facial structure, whereas ROI cropping does. Moreover, Lopes et al. [2] evaluated only on CK+, not on data in the wild; Section 3 compares both augmentation schemes on our data.
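The ROI cropping of Section 2.4 can be sketched as below. The paper's exact nine regions (cut, flip, cover, center focus in Fig. 7) are not fully specified here, so the corner crops, center crop, and horizontal flips are hypothetical stand-ins chosen only to show the mechanics:

```python
import numpy as np

def roi_regions(img, crop=24):
    # Hypothetical ROI extraction from a 32 x 32 face image: four corner
    # crops, one center-focus crop, and horizontal flips of the corners,
    # giving nine sub-images in total.
    h, w = img.shape
    c = crop
    rois = [
        img[:c, :c],          # top-left
        img[:c, w - c:],      # top-right
        img[h - c:, :c],      # bottom-left
        img[h - c:, w - c:],  # bottom-right
        img[(h - c) // 2:(h + c) // 2, (w - c) // 2:(w + c) // 2],  # center
    ]
    rois += [np.fliplr(r) for r in rois[:4]]  # flipped corner variants
    return rois

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
rois = roi_regions(img)
print(len(rois), rois[0].shape)  # 9 (24, 24)
```

Each training image yields nine auxiliary sub-images this way, multiplying the effective training set by nine.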
Based on the models of Sections 2.1 and 2.2, we evaluate three settings: the baselines, training with ROI auxiliary samples, and ROI-KNN.

3 Experiments

3.1 Datasets

Besides CK+, we collected facial expression images in the wild (the Wild set); mixing the two sources yields about 4 500 training images, including 1 200 extracted from CK+. We use five expression classes; each class contributes 700 training and 200 validation images (900 in total), and 300 test images, giving a test set of 5 × 300 = 1 500 images.

3.2 ROI auxiliary training

Cropping nine ROI regions from each of the 4 500 training images of Section 3.1 yields 4 500 × 9 = 40 500 auxiliary samples. Table 1 reports per-class and average test error rates; models marked * were trained with the ROI auxiliary samples. ROI auxiliary training generally lowers the average error rate by about 4 %–5 %. Note that these error rates are much higher than those reported by Lopes et al. [2] on CK+ alone, because the Wild images are far harder.

Table 1  Test set error rate of ROI auxiliary (%)

Model        Per-class error rate               Avg
CNN-64       4.7   32.7   54.3   33.0   40.3    33.3
CNN-64*      5.6   36.3   59.3   20.0   31.7    30.6
CNN-96*      5.0   36.7   53.3   20.7   24.7    28.6
CNN-128      3.3   32.0   51.0   27.0   37.7    30.2
CNN-128*     3.0   31.0   55.7   18.7   24.3    26.6
DNN-1000     3.0   37.7   65.3   38.3   36.7    36.2
DNN-1000*    2.3   39.0   52.0   30.0   31.7    31.0
DNN-2000*    2.0   43.3   55.0   24.7   32.7    31.5

(* = trained with ROI auxiliary samples)
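Rotation-based augmentation in the style of Lopes et al. [2], as discussed in Section 2.5, can be sketched with scipy. The interpolation order and padding mode are assumptions; only the small-angle Gaussian draw α ~ N(0, 3°) and the 11-versions-per-image count follow the scheme used in the experiments:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def augment_rotations(img, n=10, sigma_deg=3.0):
    # Keep the original image and add n copies, each rotated by a small
    # random angle drawn from N(0, sigma_deg degrees)
    out = [img]
    for _ in range(n):
        angle = rng.normal(0.0, sigma_deg)
        out.append(rotate(img, angle, reshape=False, mode='nearest'))
    return out  # 1 original + n rotated = 11 versions for n = 10

img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
samples = augment_rotations(img)
print(len(samples), samples[1].shape)  # 11 (32, 32)
```

With reshape=False the rotated copies keep the 32 × 32 input size expected by the networks; the tiny angles leave the face essentially aligned while still varying the pixel values.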
3.3 Rotation-based augmentation

Following Section 2.5, we compare two augmented training sets:

1) Set I. Following [2], each training image is rotated by random angles α ~ N(0, 3°), producing 11 versions per image for the 700 training images of each class: 5 × 700 × 11 + 4 500 = 43 000 images in total.

2) Set II. Set I (43 000 images) plus the 40 500 ROI samples of Section 3.2: 83 500 images in total.

Table 2 reports test error rates; * denotes training on Set I, + on Set II, and ˆ on Set II combined with ROI-KNN at test time.

Table 2  Test set error rate of rotating generated samples (%)

Model        Per-class error rate               Avg
CNN-128      3.3   32.0   51.0   27.0   37.7    30.2
CNN-128*     4.7   41.3   52.7   32.7   35.0    33.2
CNN-128+     3.0   37.0   51.7   15.7   24.0    26.3
CNN-128ˆ     0.0   30.0   54.0   13.0   26.7    24.7
DNN-1000     3.0   37.7   65.3   38.3   36.7    36.2
DNN-1000*    1.3   39.7   62.0   37.3   42.0    36.5
DNN-1000+    2.3   41.3   57.0   30.0   35.7    33.3
DNN-1000ˆ    1.3   43.0   67.7   31.0   33.7    35.3

Rotation alone does not help on our data: trained on the 43 000 images of Set I, CNN-128 and DNN-1000 perform no better than with the original 4 500 images, because the 38 500 rotated copies add no knowledge about the harder Wild images; this differs from the CK+-only setting of Lopes et al. [2]. Set II, which includes the ROI samples, does help, and adding ROI-KNN helps CNN-128 further (24.7 %) but not DNN-1000. This result supports our view that ROI-KNN works through the distributed representation: more raw samples alone cannot substitute for prior knowledge of informative facial regions.

3.4 ROI-KNN

Applying the KNN combination of Section 2.4 on top of the models of Section 3.2 gives Table 3; * denotes ROI-KNN at test time.

Table 3  Test set error rate with ROI-KNN (%)

Model        Per-class error rate               Avg
CNN-64       5.6   36.3   59.3   20.0   31.7    30.6
CNN-64*      1.0   29.7   56.0   17.0   30.0    26.7
CNN-96       5.0   36.7   53.3   20.7   24.7    28.6
CNN-96*      0.3   26.0   56.3   16.0   26.7    25.8
CNN-128      3.0   31.0   55.7   18.7   24.3    26.6
CNN-128*     0.6   22.7   57.0   12.0   26.3    23.7
DNN-1000     2.3   39.0   52.0   30.0   31.7    31.0
DNN-1000*    0.3   37.3   61.0   31.7   31.0    32.2
DNN-2000     2.0   43.3   55.0   24.7   32.7    31.5
DNN-2000*    0.3   40.0   68.0   26.3   33.3    33.6

ROI-KNN consistently lowers the convolutional models' average error (CNN-128* reaches 23.7 %), but slightly hurts the fully connected DNNs. KNN profits from the spatially structured, distributed representation that the convolutional models learn; the DNNs lack this structure, so the local smoothness assumption of KNN does not pay off [5].

3.5 Comparison with other methods

To compare with SVM-, PCA- and related classical methods, we also evaluate on the JAFFE dataset, applying CNN-128 with ROI-KNN. As Table 4 shows, the proposed method achieves the lowest error rate.
Table 4  Comparisons on JAFFE (%)

Method                 Features / classifier    Error rate
Kumbhar et al. [20]    Image feature            30–40
Lekshmi et al. [21]    SVM                      13.1
Zhao et al. [22]       PCA and NMF              6.28
Zhi et al. [23]        2D-DLPP                  4.09
Lee et al. [24]        RDA                      3.3
Ours                   ROI-KNN + CNN            2.81

4 Conclusion

We compared deep convolutional networks and deep sparse rectifier networks for facial expression recognition on CK+ and Wild data. For images in the wild, simply enlarging the training set by rotation is not enough; injecting prior knowledge through ROI sub-images, together with regularization such as L2 and Dropout, is what improves generalization. The proposed ROI-KNN combines the distributed representation of deep networks with a simple KNN decision over facial regions, is fast and easy to apply, and consistently reduces the test error of convolutional models. In future work we plan to model the temporal dynamics of expressions with recurrent neural networks (RNN) and to study how hand-crafted features such as SIFT, Haar and LBP relate to learned distributed representations. Our implementation is based on Theano [25].

References
1 Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25. Lake Tahoe, Nevada, USA: Curran Associates, Inc., 2012. 1097–1105
2 Lopes A T, de Aguiar E, Oliveira-Santos T. A facial expression recognition system using convolutional networks. In: Proceedings of the 28th SIBGRAPI Conference on Graphics, Patterns and Images. Salvador: IEEE, 2015. 273–280
3 Lucey P, Cohn J F, Kanade T, Saragih J, Ambadar Z, Matthews I. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). San Francisco, CA: IEEE, 2010. 94–101
4 Bishop C M. Pattern Recognition and Machine Learning. New York: Springer, 2007.
5 Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning. Hanover, MA, USA: Now Publishers Inc., 2009. 1–127
6 LeCun Y, Boser B, Denker J S, Howard R E, Hubbard W, Jackel L D, Henderson D. Handwritten digit recognition with a back-propagation network. In: Proceedings of Advances in Neural Information Processing Systems 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1990. 396–404
7 Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980, 36(4): 193–202
8 Rumelhart D E, Hinton G E, Williams R J.
Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533–536
9 LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324
10 Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA: IEEE, 2015. 1–9
11 Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). Fort Lauderdale, FL, USA, 2011, 15: 315–323
12 Barron A R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 1993, 39(3): 930–945
13 Hubel D H, Wiesel T N, LeVay S. Visual-field representation in layer IV C of monkey striate cortex. In: Proceedings of the 4th Annual Meeting, Society for Neuroscience. St. Louis, US, 1974. 264
14 Dayan P, Abbott L F. Theoretical Neuroscience. Cambridge: MIT Press, 2001.
15 Attwell D, Laughlin S B. An energy budget for signaling in the grey matter of the brain. Journal of Cerebral Blood Flow and Metabolism, 2001, 21(10): 1133–1145
16 Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580, 2012.
17 Darwin C. On the Origin of Species. London: John Murray, 1859.
18 Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010). Chia Laguna Resort, Sardinia, Italy, 2010, 9: 249–256
19 Sun Y, Wang X, Tang X. Deep learning face representation from predicting 10 000 classes. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH: IEEE, 2014. 1891–1898
20 Kumbhar M, Jadhav A, Patil M. Facial expression recognition based on image feature. International Journal of Computer and Communication Engineering, 2012, 1(2): 117–119
21 Lekshmi V P, Sasikumar M. Analysis of facial expression using Gabor and SVM. International Journal of Recent Trends in Engineering, 2009, 1(2): 47–50
22 Zhao L H, Zhuang G B, Xu X H. Facial expression recognition based on PCA and NMF. In: Proceedings of the 7th World Congress on Intelligent Control and Automation. Chongqing, China: IEEE, 2008. 6826–6829
23 Zhi R C, Ruan Q Q. Facial expression recognition based on two-dimensional discriminant locality preserving projections. Neurocomputing, 2008, 71(7–9): 1730–1734
24 Lee C C, Huang S S, Shih C Y. Facial affect recognition using regularized discriminant analysis-based algorithms. EURASIP Journal on Advances in Signal Processing, 2010, Article ID 596842, doi: 10.1155/2010/596842
25 Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I J, Bergeron A, Bouchard N, Warde-Farley D, Bengio Y. Theano: new features and speed improvements. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning. Lake Tahoe, US, 2012.

SUN Xiao  Associate professor at the Institute of Affective Computing, Hefei University of Technology. His research interest covers natural language processing, affective computing, machine learning and human-machine interaction. Corresponding author of this paper. E-mail: sunx@hfut.edu.cn

PAN Ting  Bachelor student at the School of Computer Science and Information, Hefei University of Technology. His research interest covers the theory of deep learning and Bayesian learning, and corresponding applications in computer vision and natural language processing. E-mail: neopenx@mail.hfut.edu.cn

REN Fu-Ji  Professor at the Institute of Affective Computing, Hefei University of Technology and Tokushima University. His research interest covers artificial intelligence, affective computing, natural language processing, machine learning, and human-machine interaction. E-mail: ren@is.tokushima-u.ac.jp