Opto-Electronic Engineering, 2018, Vol. 45, No. 12 (Article 180350)

Real-time airplane detection algorithm in remote-sensing images based on improved YOLOv3

Dai Weicong 1,2*, Jin Longxu 1, Li Guoning 1, Zheng Zhiqiang 3
1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130033, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China;
3 Changchun University of Science and Technology, Changchun, Jilin 130022, China

Abstract: Focusing on airplane targets in remote-sensing images, a real-time detection algorithm based on an improved YOLOv3 is proposed. First, a convolutional neural network consisting of 49 convolutional layers is designed specifically for detecting airplanes in remote-sensing images. Second, densely connected blocks are introduced into the proposed network, and max-pooling is employed to strengthen feature transfer between the dense blocks. Finally, because airplanes in remote-sensing images are mainly small targets, the 3 detection scales of YOLOv3 are increased to 4, and dense connections are used to fuse the feature-map information of the different scales. Trained and tested on the remote-sensing airplane dataset designed in this paper, the algorithm reaches a precision of 96.26% and a recall of 93.81%.

CLC number: TP751; O436.3    Document code: A

Received: 2018-06-28; revised: 2018-08-22
Supported by the National High Technology Research and Development Program ("863" Program) of China (863-2-5-1-13B)
Author: Dai Weicong (b. 1994). *E-mail: daiweicong16@mails.ucas.ac.cn
Keywords: remote-sensing image; airplane target; real-time detection; convolutional neural network
Citation: Dai W C, Jin L X, Li G N, et al. Real-time airplane detection algorithm in remote-sensing images based on improved YOLOv3[J]. Opto-Electronic Engineering, 2018, 45(12): 180350

1 Introduction

The detection of airplanes in remote-sensing images has important applications in many domains, but traditional machine-learning methods struggle to detect them reliably. In recent years, deep convolutional neural networks have come to dominate object detection. Region-based two-stage detectors such as RCNN [1], Fast RCNN [2], Faster RCNN [3] and Mask RCNN [4] reach high accuracy, while regression-based one-stage detectors such as YOLO [5], SSD [6], YOLOv2 [7] and YOLOv3 [8] trade a little accuracy for real-time speed. Convolutional networks have also been applied to related tasks: an improved YOLOv2 for immature mango detection [9], convolutional networks for airplane detection in high-resolution SAR images [10], a LeNet5-style network for aircraft classification in remote-sensing images [11], and random convolution features with ensemble extreme learning machines for fast SAR target recognition [12]. Compared with the RCNN family, the regression-based YOLO family is far faster, and YOLOv3 is its strongest member, so this paper takes YOLOv3 as its starting point. The main contributions are: 1) a 49-layer convolutional backbone that is much lighter than YOLOv3 while more accurate than YOLOv3-tiny; 2) densely connected blocks, with max-pooling to strengthen feature transfer between them; 3) an increase of YOLOv3's 3 detection scales to 4, with dense connections fusing the feature maps of different scales; 4) a remote-sensing airplane dataset on which the improved algorithm is trained and evaluated.

2 YOLOv3

YOLOv3 improves on YOLOv2 chiefly by adopting the deeper Darknet53 backbone and by detecting at multiple scales. Like earlier YOLO versions, it treats detection as regression: the input image is divided into S×S grid cells, and each cell predicts bounding boxes directly. With a 416×416 input, YOLOv3 borrows the feature-pyramid idea and detects on three scales of feature maps, 13×13, 26×26 and 52×52; at each scale, every grid cell predicts 3 bounding boxes from predefined anchor boxes, as illustrated in Fig. 1.

Fig. 1 An illustration of predicted bounding boxes on 13×13 grids of YOLOv3
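To make the grid-cell prediction concrete, here is a minimal NumPy sketch of how a raw output (t_x, t_y, t_w, t_h) is decoded into a box relative to its grid cell and anchor. The function name and argument layout are illustrative, not taken from the Darknet source; the decoding equations themselves are given in the next section.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(t, cell, prior):
    """Decode raw outputs (t_x, t_y, t_w, t_h) into a predicted box.

    t     -- (t_x, t_y, t_w, t_h): raw network outputs for one anchor
    cell  -- (c_x, c_y): offset of the grid cell from the top-left corner
    prior -- (p_w, p_h): width and height of the anchor box

    Returns (b_x, b_y, b_w, b_h) in grid units; multiplying by the
    stride (e.g. 416 / 13 = 32) converts the result to pixels.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx      # box center x, confined to its grid cell
    by = sigmoid(ty) + cy      # box center y
    bw = pw * np.exp(tw)       # box width, scaled from the anchor
    bh = ph * np.exp(th)       # box height
    return bx, by, bw, bh
```

With all offsets zero, the box sits at the center of its cell with exactly the anchor's size, which is why well-chosen anchors speed up convergence.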
Each bounding box is described by its center coordinates (x, y), width w and height h. As illustrated in Fig. 1, the network predicts four offsets t_x, t_y, t_w, t_h for each box; with (c_x, c_y) the offset of the grid cell from the top-left corner of the feature map and (p_w, p_h) the width and height of the anchor box, the predicted box is

    b_x = σ(t_x) + c_x ,
    b_y = σ(t_y) + c_y ,
    b_w = p_w e^(t_w) ,
    b_h = p_h e^(t_h) .      (1)

During training, the coordinate loss is computed between the predicted offsets t_* and the ground-truth offsets t̂_* obtained by inverting Eq. (1). Each ground-truth box is assigned to the single anchor box that overlaps it best, whose objectness target is 1; predictions whose IOU with a ground truth exceeds 0.5 without being the best match are ignored in the objectness loss.

3 Improvements based on YOLOv3

3.1 Improved network structure

The Darknet53 backbone of YOLOv3 was designed by Redmon with shortcut connections borrowed from ResNet; its residual blocks alternate 1×1 and 3×3 convolutions, and downsampling is done by 3×3 convolutions with stride 2. Darknet53 is sized for general multi-class detection and is heavier than a detector for the single airplane class requires. We therefore design a slimmer backbone with 49 convolutional layers, Darknet49 (Table 1), which reduces the number of filters per layer and uses additional 1×1 convolutions to compress channels further.

Table 1 The network structure of Darknet49

Type                Output     Filters   Size
Conv                208×208    16        3×3 conv, stride 2
Residual block(1)              16        1×1 conv, stride 1
                    208×208    32        3×3 conv, stride 1
                               16        1×1 conv, stride 1
Transition module   104×104    32        3×3 conv, stride 2
Residual block(2)              32        1×1 conv, stride 1
                    104×104    64        3×3 conv, stride 1
                               32        1×1 conv, stride 1
Transition module   52×52      64        3×3 conv, stride 2
Residual block(3)              64        1×1 conv, stride 1
                    52×52      128       3×3 conv, stride 1
                               64        1×1 conv, stride 1
Transition module   26×26      128       3×3 conv, stride 2
Residual block(4)              128       1×1 conv, stride 1
                    26×26      256       3×3 conv, stride 1
                               128       1×1 conv, stride 1
Transition module   13×13      256       3×3 conv, stride 2
Residual block(5)              128       1×1 conv, stride 1
                    13×13      256       3×3 conv, stride 1

3.2 Dense connection

Huang et al. [13] proposed DenseNet, in which each layer takes the feature maps of all preceding layers as input; dense connection strengthens feature reuse and gradient flow, and helps to avoid overfitting when training data are scarce. We apply dense connections inside the blocks of Darknet49 to obtain Darknet49-Dense, which keeps the five stages of Darknet49 but connects the convolutional layers within each stage densely.
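The effect of dense connectivity on channel widths can be illustrated with a toy forward pass in NumPy, where each "layer" concatenates all earlier outputs along the channel axis before applying its own transformation. This is a schematic sketch with stand-in convolutions, not the actual Darknet49-Dense code:

```python
import numpy as np

def conv_layer(x, out_ch, seed):
    """Stand-in for a convolution: a fixed random per-pixel channel
    projection, enough to show how channel counts evolve."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[0], out_ch))
    # x: (channels, height, width) -> (out_ch, height, width)
    return np.einsum('chw,co->ohw', x, w)

def dense_block(x, growth, n_layers):
    """Each layer sees the concatenation of all previous feature maps
    (cf. Eq. (2)) and contributes `growth` new channels."""
    features = [x]
    for i in range(n_layers):
        inp = np.concatenate(features, axis=0)   # [x_1, ..., x_{l-1}]
        features.append(conv_layer(inp, growth, seed=i))
    return np.concatenate(features, axis=0)

x = np.zeros((16, 8, 8))                  # 16 input channels
y = dense_block(x, growth=16, n_layers=4)
print(y.shape[0])                         # 16 + 4*16 = 80 channels
```

The linear growth in channels is why DenseNet-style blocks reuse features cheaply instead of relearning them in every layer.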
The transition modules of Darknet49-Dense are shown in Fig. 2: the output of a dense block is downsampled along two parallel branches, a max-pooling layer with stride 2 and a 1×1 convolution (stride 1) followed by a 3×3 convolution with stride 2, and the two branches are concatenated before entering the next dense block. The max-pooling branch lets features cross the downsampling boundary unchanged, strengthening feature transfer between dense blocks.

Fig. 2 An illustration of the transition module

Within a dense block, the l-th layer receives the feature maps of all preceding layers:

    x_l = H_l([x_1, x_2, …, x_{l-1}]) ,      (2)

where [x_1, x_2, …, x_{l-1}] denotes the concatenation of the feature maps produced by layers 1 to l-1, and H_l(·) is the composite transformation (batch normalization, activation and convolution) of layer l.

3.3 Improved multi-scale detection

YOLOv3 borrows the feature pyramid network (FPN) [14] idea and detects on 3 scales. Because airplanes in remote-sensing images are mostly small targets, we increase the number of detection scales from 3 to 4 and use dense connections to fuse the feature maps of the different scales. The anchor boxes are obtained by K-means clustering over the bounding boxes of the training set, with average intersection-over-union (R_IOU) rather than Euclidean distance defining the clustering metric:

    d(B, C) = 1 - R_IOU(B, C) ,      (3)

where B is a bounding box, C a cluster center, and R_IOU(B, C) the intersection-over-union between them. Fig. 3 plots the average IOU against the number of anchor boxes; balancing model complexity against average IOU, we choose 12 anchor boxes, 3 for each of the 4 scales: (12, 16), (16, 24), (21, 32), (24, 41), (24, 51), (33, 51), (28, 62), (39, 64), (35, 74), (44, 87), (53, 105), (64, 135).

With a 416×416 input, the resulting Darknet49-Dense detector requires 14.525 BFLOPS, and the plain Darknet49 version 9.695 BFLOPS, versus 65.86 BFLOPS for Darknet53-based YOLOv3.

Fig. 3 The relationship between the number of anchor boxes and average IOU
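A minimal sketch of this clustering step, assuming the usual convention that every (w, h) pair is aligned at a shared corner so the IOU depends only on widths and heights (the helper names are mine, not from the paper):

```python
import numpy as np

def iou_wh(box, clusters):
    """IOU between one (w, h) pair and k cluster (w, h) pairs,
    with all boxes aligned at a common corner so that only
    width and height matter."""
    w = np.minimum(box[0], clusters[:, 0])
    h = np.minimum(box[1], clusters[:, 1])
    inter = w * h
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means over (w, h) pairs with the distance d = 1 - IOU of Eq. (3)."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    assign = np.full(len(boxes), -1)
    for _ in range(iters):
        dists = np.stack([1.0 - iou_wh(b, clusters) for b in boxes])
        new_assign = dists.argmin(axis=1)   # nearest cluster per box
        if (new_assign == assign).all():    # converged
            break
        assign = new_assign
        for j in range(k):                  # update centers (median is
            if (assign == j).any():         # more robust than the mean)
                clusters[j] = np.median(boxes[assign == j], axis=0)
    return clusters
```

Using 1 - IOU instead of Euclidean distance keeps large and small boxes on an equal footing, so the chosen anchors are not dominated by the largest airplanes.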
Fig. 4 Multi-scale detection with dense connection: the Darknet49-Dense backbone (dense blocks and transition blocks, from 208×208 down to 13×13) with YOLO detection layers at four scales; each detection layer outputs 18 channels, i.e. 3 anchors × (4 box offsets + 1 objectness score + 1 class)

4 Experimental results and analysis

All experiments use the Darknet framework on a machine with an Intel i7-8700 CPU, 16 GB of RAM, an NVIDIA GTX 1070Ti GPU and Windows 10.

Detection accuracy is evaluated by precision P, recall R, the F1 score, average precision A_P and average intersection-over-union R_IOU. Let T_P, F_P and F_N denote the numbers of true positives, false positives and false negatives; then

    P = T_P / (T_P + F_P) ,
    R = T_P / (T_P + F_N) ,
    F1 = 2PR / (P + R) .

A_P is computed with the VOC2007 11-point interpolation: on the precision-recall (PR) curve, precision is interpolated at the 11 recall levels R ∈ {0, 0.1, 0.2, …, 0.9, 1},

    A_P = (1/11) Σ_{R ∈ {0, 0.1, …, 1}} P_interp(R) ,
    P_interp(R) = max_{R′ : R′ ≥ R} p(R′) ,      (4)

where p(R′) is the measured precision at recall R′. Detection speed is reported as a frame rate N in frames per second (f/s), with per-image time t = 1/N in ms; a detector running at 30 f/s or more is considered real-time.
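The 11-point interpolation in Eq. (4) is easy to get subtly wrong, so a direct NumPy transcription may help; `recalls` and `precisions` are assumed to be paired measurements along the PR curve:

```python
import numpy as np

def voc11_ap(recalls, precisions):
    """VOC2007-style 11-point interpolated average precision, Eq. (4):
    average, over R in {0, 0.1, ..., 1}, of the maximum precision
    observed at any recall >= R (0 if that recall is never reached)."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        p_interp = precisions[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap
```

Taking the maximum precision at any recall at or beyond each level makes the interpolated curve monotone, so small wiggles in the raw PR curve do not distort A_P.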
4.1 Dataset

The dataset consists of 990 remote-sensing images of airplanes (Fig. 5), annotated with LabelImg. Of these, 850 images are used for training and 140 for testing, containing 1372 and 941 airplane targets, respectively.

Fig. 5 Three samples of the airplane dataset

4.2 Training

Training uses stochastic gradient descent with momentum 0.9 and initial learning rate η_lr = 0.001. For the first 1000 batches the learning rate is warmed up quadratically,

    η_learning_rate = η_lr (N_batch / 1000)^2 ,

where N_batch is the batch index; thereafter it is held at 10^-3 and reduced by a factor of 10 in the late stage of training. For scale robustness, multi-scale training is adopted: the input size is periodically re-drawn between 320×320 and 608×608 in steps of 32.

4.3 Quantitative evaluation of detection results

Table 1 compares the proposed YOLOv3-air with YOLOv3, YOLOv3-tiny, YOLOv2 and YOLOv2-tiny on the 140 test images, all networks taking 416×416 inputs. YOLOv3-air reaches a precision of 96.26%, a recall of 93.81% and an A_P of 89.31%: relative improvements of about 6%, 13% and 13% over YOLOv3-tiny, and a relative A_P improvement of about 13% over YOLOv3 as well. In speed, YOLOv3 runs at 33.2 f/s and YOLOv3-tiny at 215.2 f/s, while YOLOv3-air reaches 58.3 f/s, still well within real time. The average IOU of YOLOv3-air (72.46%) exceeds that of YOLOv3 and YOLOv3-tiny by roughly 4%. YOLOv2 and YOLOv2-tiny trail their YOLOv3 counterparts on every accuracy metric.

4.4 Qualitative evaluation and error analysis

Fig. 6 compares the detection results of YOLOv3-air, YOLOv3-tiny and YOLOv3 on six test images (Figs. 6(a)-6(f)). Qualitatively, YOLOv3-air is more robust than the other two detectors and achieves especially high recall on small targets, while YOLOv3 and YOLOv3-tiny miss more airplanes and leave a clear gap to YOLOv3-air.

Table 1 Performance comparison of 5 algorithms

Algorithm      P/%     R/%     F1/%    A_P/%   R_IOU/%   Speed/(f/s)   Time/ms
YOLOv3         93.56   78.90   85.61   78.97   68.80     33.2          30.1
YOLOv3-tiny    90.82   83.05   86.76   78.99   67.05     215.2         4.6
YOLOv3-air     96.26   93.81   95.02   89.31   72.46     58.3          17.2
YOLOv2         87.11   62.27   72.62   60.92   60.28     47.5          21.1
YOLOv2-tiny    67.44   54.41   60.23   46.87   45.83     207.5         4.8
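The warm-up and multi-scale schedule of Sec. 4.2 can be sketched as follows; the function names are illustrative, and the later step decays of the learning rate are omitted:

```python
import random

def learning_rate(n_batch, base_lr=0.001):
    """Quadratic warm-up over the first 1000 batches, as in Sec. 4.2,
    then the constant base rate (later step decays omitted)."""
    if n_batch < 1000:
        return base_lr * (n_batch / 1000) ** 2
    return base_lr

def random_input_size(rng=random):
    """Multi-scale training: draw a square input size from
    320 to 608 pixels in steps of 32."""
    return rng.choice(range(320, 609, 32))
```

The gentle ramp keeps early gradients from destabilizing the randomly initialized network, while the random input sizes force the detector to cope with airplanes at many apparent scales.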
In particular, YOLOv3 and YOLOv3-tiny cannot handle airplane targets whose appearance varies drastically. The analysis suggests that simpler convolutional network models generalize better, especially when the dataset is small and the data are complex and variable; YOLOv3 carries so many parameters that it overfits. By reducing the parameter count and extending multi-scale detection, the proposed YOLOv3-air combines the advantages of YOLOv3-tiny and YOLOv3, performing excellently on small-target detection and generalization, and its dense connections reuse features to lessen the impact of insufficient training data.

To further explore the relationship between training-set size and performance, Table 2 reports the performance of YOLOv3-air on the same test set when the training set contains only 500 or only 300 remote-sensing images; YOLOv3-air-500 denotes the model trained with only 500 images in the training set (and similarly YOLOv3-air-300).

Fig. 6 The detection results of YOLOv3-air, YOLOv3-tiny and YOLOv3, from left to right. (a) P883; (b) P902; (c) P903; (d) P909; (e) P866; (f) P867
Table 2 Performance comparison of YOLOv3-air with different numbers of images in the training set

Model            P/%     R/%     F1/%    A_P/%   R_IOU/%
YOLOv3-air       96.26   93.81   95.02   89.31   72.46
YOLOv3-air-500   93.47   87.25   90.25   86.53   70.74
YOLOv3-air-300   92.62   74.49   82.57   78.12   67.15

As Table 2 shows, performance degrades gracefully as the training set shrinks: even YOLOv3-air-300, trained on only 300 images, reaches an A_P (78.12%) close to that of YOLOv3 and YOLOv3-tiny trained on the full training set.

5 Conclusion

Aiming at airplane targets in remote-sensing images, this paper proposed YOLOv3-air, a real-time detection algorithm based on an improved YOLOv3. On the designed test set, YOLOv3-air runs at 58.3 f/s and achieves a precision of 96.26%, a recall of 93.81% and an A_P of 89.31%, clearly outperforming YOLOv3 while remaining real-time.

References
[1] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014: 580-587.
[2] Girshick R. Fast R-CNN[C]//IEEE International Conference on Computer Vision, 2015: 1440-1448.
[3] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 2015: 91-99.
[4] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//IEEE International Conference on Computer Vision, 2017: 2980-2988.
[5] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[6] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]//European Conference on Computer Vision, 2016: 21-37.
[7] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
[8] Redmon J, Farhadi A. YOLOv3: an incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
[9] Xue Y J, Huang N, Tu S Q, et al. Immature mango detection based on improved YOLOv2[J]. Transactions of the Chinese Society of Agricultural Engineering, 2018, 34(7): 173-179.
[10] Wang S Y, Gao X, Sun H, et al. An aircraft detection method based on convolutional neural networks in high-resolution SAR images[J]. Journal of Radars, 2017, 6(2): 195-203.
[11] Zhou M, Shi Z W, Ding H P. Aircraft classification in remote-sensing images using convolutional neural networks[J]. Journal of Image and Graphics, 2017, 22(5): 702-708.
[12] Gu Y, Xu Y. Fast SAR target recognition based on random convolution features and ensemble extreme learning machines[J]. Opto-Electronic Engineering, 2018, 45(1): 170432.
[13] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2261-2269.
[14] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936-944.
Real-time airplane detection algorithm in remote-sensing images based on improved YOLOv3

Dai Weicong 1,2*, Jin Longxu 1, Li Guoning 1, Zheng Zhiqiang 3
1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, Jilin 130033, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China;
3 Changchun University of Science and Technology, Changchun, Jilin 130022, China

[Figure] An illustration of predicted bounding boxes on 13×13 grids of YOLOv3

Overview: The detection of airplanes in remote-sensing images has important applications in many domains. However, limited by the performance of traditional machine-learning methods, airplanes in remote-sensing images are difficult to detect. Recently, deep convolutional neural networks have been applied to object detection and achieve excellent accuracy. YOLO is one of the best-known real-time object detection algorithms based on regression, and it generalizes better than many other algorithms when applied to new domains. Focusing on airplanes in remote-sensing images, a real-time algorithm based on an improved YOLOv3 is proposed. First, a convolutional neural network consisting of 49 convolutional layers is designed specifically for detecting airplanes in remote-sensing images. In the transition blocks of the proposed network, 1×1 convolution kernels are employed to further reduce the number of parameters. Second, dense connections are employed in the proposed network, and max-pooling is used to enhance feature transfer between dense blocks, so that the features of one dense block reach the next after a downsampling convolutional layer. The dense connections enable the network to avoid overfitting and reach high accuracy even though it is trained on relatively little data.
Finally, to deal with the fact that airplanes in remote-sensing images are mainly small targets, the detection scales are increased from 3 to 4, and dense connections are used to merge feature maps across scales. The anchor boxes are obtained by running k-means clustering on the bounding boxes of the training set. The algorithm is trained and tested on the designed airplane dataset, which contains 990 remote-sensing images. The qualitative results show that the algorithm is more robust than existing alternatives and achieves especially high recall on small targets. The quantitative results show that it obtains 96.26% precision, 93.81% recall and 89.31% AP, a relative improvement of 13.1% over YOLOv3 in AP, while running in real time at more than 58.3 frames per second on a 1070Ti GPU. This study demonstrates the effectiveness and accuracy of deep convolutional neural networks in detecting airplanes in remote-sensing images, and also shows that the performance of a convolutional neural network is determined both by its structure and by the amount of training data.

Citation: Dai W C, Jin L X, Li G N, et al. Real-time airplane detection algorithm in remote-sensing images based on improved YOLOv3[J]. Opto-Electronic Engineering, 2018, 45(12): 180350

Supported by the National High Technology Research and Development Program ("863" Program) of China (863-2-5-1-13B)
*E-mail: daiweicong16@mails.ucas.ac.cn