Infrared Technology, Vol.39 No.8, August 2017

CLC number: TP391.41  Document code: A  Article ID: 1001-8891(2017)08-0728-06

Infrared Scene Understanding Algorithm Based on Deep Convolutional Neural Network

WANG Chen 1,2,3, TANG Xinyi 1,3, GAO Sili 1,3

(1. Shanghai Institute of Technical Physics, Shanghai 200083, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China)

Abstract: We adopt a deep learning method for semantic scene understanding of infrared images. First, we build an infrared image dataset for semantic segmentation research, consisting of four foreground object classes and one background class. Second, we build an end-to-end infrared semantic segmentation framework based on a deep convolutional neural network followed by a conditional random field (CRF) refinement model. Then, we train the model. Finally, we evaluate and analyze the outputs of the framework on both visible and infrared datasets. The results indicate that it is feasible to classify infrared images at the pixel level with a deep learning method, and the prediction accuracy is satisfactory. The features, classes, and positions of the objects in an infrared image can thus be obtained, so that the infrared scene is understood semantically.

Key words: infrared images; infrared scene; semantic segmentation; convolutional neural network

0 Introduction

Jonathan Long et al. proposed the fully convolutional network (FCN) at CVPR 2015 [1]. S. Zheng et al. presented CRFasRNN at ICCV 2015 [2], which formulates the conditional random field (CRF) as a recurrent neural network and was evaluated on PASCAL VOC. At ICLR 2015, Liang-Chieh Chen et al. combined an FCN-style network with a fully connected CRF in DeepLab [3].

Received: 2016-10-06; revised: 2016-10-31.
First author: born 1989. E-mail: kame@sna.com
Funding: 2011xcwzk042014216
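Hyeonwoo Noh et al. [4] proposed a deconvolution network for semantic segmentation, which works on bounding-box proposals and was evaluated on PASCAL VOC. Public semantic segmentation datasets such as PASCAL VOC and Cityscapes consist of visible-light images.

1 Infrared image dataset

The dataset contains four foreground object classes and one background class. The foreground classes are person, vehicle, building, and tree (see the legend of Fig. 5).

1.1 Compression from 14 bit to 8 bit

The raw infrared data are 14 bit and are compressed to 8 bit in three steps:

(1) From the histogram of the 14-bit image, determine the effective gray-level window [I_min, I_max], using the thresholds 20 and 10.

(2) Map the window linearly onto [0, 255], saturating values that fall outside it:

$$
I_o = 255 \cdot \frac{\tilde{I} - I_{\min}}{I_{\max} - I_{\min}}, \qquad
\tilde{I} =
\begin{cases}
I_{\min}, & \text{if } I_{in} < I_{\min} \\
I_{in}, & \text{if } I_{\min} \le I_{in} \le I_{\max} \\
I_{\max}, & \text{if } I_{in} > I_{\max}
\end{cases}
\tag{1}
$$

Here I_in is the 14-bit input gray level and I_o is the output gray level. Inputs below I_min map to 0 and inputs above I_max map to 255.

(3) Apply CLAHE [5] to obtain the final 8-bit image. Fig. 2 compares the result with compression based on the global histogram.

1.2 Dataset contents

The dataset holds 10000 images at a resolution of 640×512. The 14-bit raw data are stored as .mat files, the 8-bit images as .jpg files, and the label images as .png files.

Fig.1 Sample of the infrared image dataset: (a) Infrared image; (b) Ground truth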
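The linear window mapping of Eq. (1) can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation: the helper `window_from_histogram` and its `tail_fraction` parameter are hypothetical stand-ins for the paper's rule for choosing [I_min, I_max], and the CLAHE step that follows the mapping is not included.

```python
def window_from_histogram(pixels, tail_fraction=0.01):
    """Pick [i_min, i_max] by discarding a fraction of pixels at each tail.

    Hypothetical rule for illustration only; the paper's own thresholds
    for choosing the window are not reproduced here.
    """
    ordered = sorted(pixels)
    k = int(len(ordered) * tail_fraction)
    return ordered[k], ordered[len(ordered) - 1 - k]


def compress_to_8bit(pixels, i_min, i_max):
    """Map 14-bit gray levels to [0, 255], saturating outside [i_min, i_max]."""
    if i_max <= i_min:
        raise ValueError("i_max must be greater than i_min")
    out = []
    for value in pixels:
        clipped = min(max(value, i_min), i_max)   # saturate outside the window
        out.append(round(255 * (clipped - i_min) / (i_max - i_min)))
    return out


raw = [0, 1000, 2000, 5000, 16383]           # flattened 14-bit samples
print(compress_to_8bit(raw, 1000, 5000))     # [0, 0, 64, 255, 255]
```

A practical version would operate on whole image arrays and then apply a CLAHE implementation to the 8-bit result.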
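Fig.2 Comparison of compression results: (a) Compression algorithm based on global histogram; (b) Our algorithm

2 Semantic segmentation algorithm

2.1 Algorithm flow

Fig. 3 shows the flow of the semantic segmentation algorithm. The network ends in a Softmax layer that yields per-pixel class probabilities.

2.2 VGG-16

VGG-16 [6] is a 16-layer classification network from the ImageNet ILSVRC-2014 competition; its structure is listed in Table 1. Other classification networks trained on ImageNet include AlexNet and GoogleNet.

2.3 From VGG to Hole FCN [3]

2.3.1 Hole algorithm

The original VGG-16 downsamples its input by a factor of 32. Following [3], the network is converted into an FCN and the downsampling factor is reduced to 8. To keep the receptive field of the later layers unchanged, the Hole algorithm (Fig. 4) inserts zeros between the filter taps.

Fig.3 Semantic segmentation algorithm flow

Table 1 VGG-16 framework

VGG-16
conv3-64
conv3-64
conv3-128
conv3-128
conv3-256
conv3-256
conv3-256
fc-4096
fc-4096
fc-1000
soft-max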
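The idea behind the Hole algorithm can be illustrated in one dimension. The sketch below is an assumption-laden toy, not the Caffe implementation used in the paper: it shows that sliding a filter with zeros inserted between its taps gives the same output as sampling the input at a stride equal to the hole rate, which avoids multiplying by the zeros. All function names are hypothetical.

```python
def dilate_kernel(kernel, rate):
    """Insert rate - 1 zeros between neighbouring filter taps."""
    dilated = []
    for idx, weight in enumerate(kernel):
        dilated.append(weight)
        if idx < len(kernel) - 1:
            dilated.extend([0] * (rate - 1))
    return dilated


def correlate_valid(signal, kernel):
    """Plain 1-D sliding dot product without padding."""
    width = len(kernel)
    return [sum(s * w for s, w in zip(signal[i:i + width], kernel))
            for i in range(len(signal) - width + 1)]


def hole_conv1d(signal, kernel, rate):
    """Same output as correlating with the dilated kernel, skipping the zeros."""
    span = (len(kernel) - 1) * rate + 1      # receptive field of the filter
    return [sum(signal[i + j * rate] * w for j, w in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 2, 3]
print(dilate_kernel(kernel, 2))              # [1, 0, 2, 0, 3]
print(hole_conv1d(signal, kernel, 2))        # [22, 28, 34]
```

With rate 2 the three-tap filter covers five input samples, so the receptive field grows without adding weights or reducing the output resolution.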
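Fig.4 Illustration of the hole algorithm

2.3.2 Reducing the fully connected layers

After conversion to convolutional form, the first fully connected layer of VGG-16 has 4096 filters of spatial size 7×7. Following [3], this layer is spatially subsampled to 4×4 (or 3×3), which speeds up computation by a factor of 2 to 3, and the number of channels of the fully connected layers is reduced from 4096 to 1024.

2.4 Fully connected CRF

The coarse network output is refined with the fully connected CRF of [7]. The energy function is

$$
E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j) \tag{2}
$$

where x is the label assignment of the pixels. The unary potential is \theta_i(x_i) = -\log P(x_i), where P(x_i) is the label probability of pixel i computed by the DCNN. The pairwise potential is

$$
\theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \sum_{m=1}^{K} w_m \, k^m(f_i, f_j) \tag{3}
$$

where \mu(x_i, x_j) = 1 if x_i \ne x_j and 0 otherwise, k^m is a Gaussian kernel on the features f_i, f_j of pixels i and j, and w_m is its weight. The kernels are

$$
w_1 \exp\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2}\right)
+ w_2 \exp\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2}\right) \tag{4}
$$

where p denotes pixel position and I denotes pixel intensity. The first kernel depends on both position and intensity; the second depends on position only. The output of the CRF is the refined segmentation.

3 Experiments

3.1 Datasets

Cityscapes [8] provides 2975 training images and 500 validation images at a resolution of 2048×1024, annotated with 30 classes in 8 categories. The experiments here use 2973 Cityscapes images downsampled to 1024×512, with the 8 categories mapped to the 4 foreground classes of the infrared dataset plus background. From the infrared dataset, 1000 labeled images are used: 800 for training and 200 for testing.

3.2 Training

The network is initialized from VGG-16. For Cityscapes, the mini-batch size is 5, the initial learning rate is 0.001 and is multiplied by 0.1 every 2000 iterations, the momentum is 0.9, and the weight decay is 0.0005; training runs for 8000 iterations and yields model 1. For the infrared dataset, the learning rate is multiplied by 0.1 every 4000 iterations, and training runs for 8000 and 16000 iterations. Inference takes 230.33 ms/frame. The platform is Ubuntu 14.04 with Caffe and CUDA on an NVIDIA GM200 GPU with 12 GB of memory.

3.3 Evaluation metrics

Four metrics are used, following [1]. Let n_{ij} be the number of pixels of class i predicted to belong to class j, let n_{cl} be the number of classes, and let t_i = \sum_j n_{ij} be the total number of pixels of class i. Here n_{cl} = 5 (4 foreground classes and 1 background class).

Pixel accuracy:

$$
\sum_i n_{ii} \Big/ \sum_i t_i \tag{5}
$$

Mean accuracy:

$$
\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i} \tag{6}
$$

Mean IU:

$$
\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}} \tag{7}
$$

Frequency weighted IU:

$$
\Big(\sum_k t_k\Big)^{-1} \sum_i \frac{t_i \, n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}} \tag{8}
$$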
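The four metrics of Eqs. (5) to (8) can all be computed from a single confusion matrix. The following is a minimal sketch in pure Python; the function name and the nested-list layout `conf[i][j]` (pixels of true class i predicted as class j) are assumptions made for illustration. Classes with no ground-truth pixels are skipped in the averages, which coincides with the 1/n_cl factor whenever every class is present.

```python
def segmentation_metrics(conf):
    """Compute pixel acc., mean acc., mean IU and f.w. IU from a confusion matrix.

    conf[i][j] is the number of pixels of true class i predicted as class j.
    """
    n_cl = len(conf)
    t = [sum(row) for row in conf]                           # t_i: pixels of class i
    predicted = [sum(conf[i][j] for i in range(n_cl))        # sum_j n_ji per class
                 for j in range(n_cl)]
    diag = [conf[i][i] for i in range(n_cl)]                 # n_ii
    total = sum(t)
    present = [i for i in range(n_cl) if t[i] > 0]           # skip absent classes
    iu = {i: diag[i] / (t[i] + predicted[i] - diag[i]) for i in present}
    return {
        "pixel_acc": sum(diag) / total,                              # Eq. (5)
        "mean_acc": sum(diag[i] / t[i] for i in present) / len(present),  # Eq. (6)
        "mean_iu": sum(iu.values()) / len(present),                  # Eq. (7)
        "fw_iu": sum(t[i] * iu[i] for i in present) / total,         # Eq. (8)
    }


# Toy two-class example: 10 pixels, 8 of them classified correctly.
metrics = segmentation_metrics([[3, 1], [1, 5]])
print(metrics["pixel_acc"])   # 0.8
```

Accumulating one confusion matrix over the whole test set and calling the function once gives dataset-level values; averaging per-image metrics would generally give different numbers.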
3.4 Results

Segmentation results are shown in Fig. 5, and the accuracy metrics are listed in Tables 2 and 3. In the model names, "fc" denotes the coarse network output and "crf" the output after CRF refinement.

Table 2 Comparison of prediction accuracy

Model               Pixel acc.  Mean acc.  Mean IU  F.w. IU
1 fc (Cityscapes)   0.911       0.791      0.670    0.846
1 crf (Cityscapes)  0.903       0.716      0.632    0.831
2 fc                0.766       0.640      0.531    0.673
2 crf               0.776       0.628      0.529    0.681
3 fc                0.887       0.797      0.719    0.824
3 crf               0.895       0.778      0.715    0.831

Table 3 IU results of each class on IR dataset

Model   Class IU (five classes)            Mean IU  Pixel acc.  Mean acc.  F.w. IU
3 fc    0.603  0.753  0.778  0.732  0.637  0.713    0.887       0.797      0.824
3 crf   0.556  0.749  0.690  0.778  0.650  0.711    0.890       0.775      0.828

Fig.5 Semantic segmentation results: (a) Original image; (b) Label image; (c) Coarse prediction results; (d) CRF refined results. Legend: background, person, vehicle, building, tree
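To make the CRF refinement step behind the "crf" rows more concrete, the pairwise term of a fully connected CRF with Gaussian edge potentials, as in [7], can be sketched for a single pair of pixels. This is an illustrative assumption-based example, not the inference code used in the paper: the function names are hypothetical, the intensity is treated as a scalar because infrared images are single-channel, and the weights and bandwidths passed in the usage lines are made-up values.

```python
import math


def pairwise_kernel(p_i, p_j, int_i, int_j, w1, w2,
                    sigma_alpha, sigma_beta, sigma_gamma):
    """Appearance kernel (position + intensity) plus smoothness kernel (position)."""
    d_pos = sum((a - b) ** 2 for a, b in zip(p_i, p_j))    # squared pixel distance
    d_int = (int_i - int_j) ** 2                           # squared intensity difference
    appearance = w1 * math.exp(-d_pos / (2 * sigma_alpha ** 2)
                               - d_int / (2 * sigma_beta ** 2))
    smoothness = w2 * math.exp(-d_pos / (2 * sigma_gamma ** 2))
    return appearance + smoothness


def pairwise_potential(label_i, label_j, kernel_value):
    """Penalise a pair only when the two pixels carry different labels."""
    return kernel_value if label_i != label_j else 0.0


# Hypothetical parameter values, chosen only to exercise the functions.
k_near = pairwise_kernel((10, 10), (10, 11), 120, 122, w1=4.0, w2=3.0,
                         sigma_alpha=50.0, sigma_beta=5.0, sigma_gamma=3.0)
k_far = pairwise_kernel((10, 10), (200, 300), 120, 30, w1=4.0, w2=3.0,
                        sigma_alpha=50.0, sigma_beta=5.0, sigma_gamma=3.0)
print(pairwise_potential(1, 2, k_near) > pairwise_potential(1, 2, k_far))  # True
```

Nearby pixels of similar intensity receive a large penalty for disagreeing labels, so the refinement favors label boundaries that follow intensity edges; thin or low-contrast objects can lose pixels as a side effect.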
After 8000 iterations, models 1 and 2 reach mean IU values of 0.670 and 0.531, respectively (Table 2). After 16000 iterations, model 3 reaches a mean IU of 0.719 (Table 2). CRF refinement raises the pixel accuracy of models 2 and 3 but slightly lowers the mean IU of all three models. For comparison, SegModel reports a mean IU of 0.777 on Cityscapes. In Table 3, the IU of the first class drops from 0.603 to 0.556 after CRF refinement, while the IU of the fourth and fifth classes increases.

4 Conclusion

The framework is implemented with Caffe.

References

[1] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1337-1342.
[2] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//IEEE International Conference on Computer Vision, 2015: 1529-1537.
[3] Chen L C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. Computer Science, 2014(4): 357-361.
[4] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1520-1528.
[5] Pizer S M, Amburn E P, Austin J D, et al. Adaptive histogram equalization and its variations[J]. Computer Vision, Graphics, and Image Processing, 1987, 39(3): 355-368.
[6] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[DB/OL]. arXiv preprint arXiv:1409.1556, 2014.
[7] Krähenbühl P, Koltun V. Efficient inference in fully connected CRFs with Gaussian edge potentials[C]//Advances in Neural Information Processing Systems, 2012: 109-117.
[8] Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3213-3223.