Deep Learning-Based Landmark Detection for Mobile Robot Outdoor Localization


machines
Article

Deep Learning-Based Landmark Detection for Mobile Robot Outdoor Localization

Sivapong Nilwong 1, Delowar Hossain 2, Shin-ichiro Kaneko 3 and Genci Capi 4,*

1 Graduate School of Science and Engineering, Hosei University, 3-7-2 Kajinochō, Koganei, Tokyo 184-8584, Japan; sivapong.nilwong.46@stu.hosei.ac.jp
2 Fairy Devices Inc., Tokyo 113-0034, Japan; hossain@fairydevices.jp
3 Department of Electrical and Control Systems Engineering, National Institute of Technology, Toyama College, 13 Hongo-machi, Toyama 939-8045, Japan; skaneko@nc-toyama.ac.jp
4 Department of Mechanical Engineering, Hosei University, 3-7-2 Kajinochō, Koganei, Tokyo 184-8584, Japan
* Correspondence: capi@hosei.ac.jp; Tel.: +81-42-387-6148

Received: 28 February 2019; Accepted: 16 April 2019; Published: 18 April 2019

Machines 2019, 7, 25; doi:10.3390/machines7020025; www.mdpi.com/journal/machines

Abstract: Outdoor mobile robot applications generally implement Global Positioning Systems (GPS) for localization tasks. However, GPS accuracy in outdoor localization degrades under different environmental conditions. This paper presents two outdoor localization methods based on deep learning and landmark detection. The first localization method is based on Faster Regional-Convolutional Neural Network (Faster R-CNN) landmark detection in the captured image; a feedforward neural network (FFNN) is then trained to determine the robot's location coordinates and compass orientation from the detected landmarks. The second localization method employs a single convolutional neural network (CNN) to determine location and compass orientation from the whole image. A dataset consisting of images, geolocation data and labeled bounding boxes is used to train and test the two proposed localization methods. Results are reported as absolute errors between the localization results and the reference geolocation data in the dataset. The experimental results show both presented localization methods to be promising alternatives to GPS for outdoor localization.

Keywords: outdoor localization; deep learning; landmark detection; Faster R-CNN; CNN

1. Introduction

Mobile robots today operate in various fields of application, such as logistics, medicine, agriculture, health care and housekeeping. Navigation is one of the key elements that mobile robots need in order to accomplish their given tasks. Successful navigation requires success in several factors, including localization, by which robots determine their positions in the environment [1].

Recent findings suggest a significant number of localization methods for both outdoor and indoor environments. For outdoor environments, the Global Positioning System (GPS) has been widely applied in a variety of applications, some of which are: a mobile robot for high-voltage transmission line inspection [2], autonomous position control of multiple aerial vehicles [3], a mobile robot for gas level mapping [4], a navigation system for mobile robots using GPS and an inertial navigation system (INS) [5], and map building with simultaneous localization and mapping (SLAM) for firefighter robots [6]. Despite this large-scale implementation, localization through GPS suffers a decline in accuracy under several environmental conditions. According to official U.S. government information about GPS and related topics [7], common causes of degraded GPS accuracy are: (1) satellite signal blockage by large objects in the environment such as buildings, bridges and trees; (2) indoor or underground use; and (3) signals reflected off buildings or walls.

Due to this decline in GPS accuracy and reliability, a large number of GPS-based approaches also employ other sensors to improve localization accuracy. Examples are the fusion of features from captured camera images with GPS signals [8], the use of a state chi-square test and a simplified fuzzy predictive adaptive resonance theory (predictive ART or ARTMAP) neural network to diagnose the sensors in a GPS/INS system [9], and the combination of GPS, wheel odometry and received signal strength (RSS) from wireless communication nodes to create a precise localization approach for mobile robots [10]. Apart from these improvements, there are also various alternative localization approaches that aim to replace GPS. Such approaches rely on odometry [11,12], visual odometry [13], visual patterns [14] and ultra-wideband networks [15].

Research on deep learning has expanded rapidly in recent years, and deep learning implementations have been spreading through many fields of application. Localization and positioning applications have also adopted deep learning approaches; for instance, a deep learning-based encoder determines locations from low-level features in images [16]. Among the fields in which deep learning has been most extensively implemented are object detection and object recognition. The Convolutional Neural Network (CNN) is one particular instance that has been implemented for object detection and recognition, thanks to a structure that can effectively handle visual data. CNNs have served as the base for various object detectors, including the Faster Regional Convolutional Neural Network (Faster R-CNN) [17]. Faster R-CNN is a state-of-the-art object detector based on region proposals, which surrounds detected objects with bounding boxes. Its region proposal-based approach is the same as that of its predecessor, Fast R-CNN [18]. One major difference between Faster R-CNN and Fast R-CNN is the Region Proposal Network (RPN), which reduces Faster R-CNN's detection time and increases its accuracy. The increased detection speed of Faster R-CNN makes it suitable for real-time applications [17,18]. Implementations of Faster R-CNN span various applications, such as the detection of cyclists in depth images [19], pedestrian detection from security cameras [20], and ship detection in remote sensing images containing foggy scenes [21]. In [19–21] it is shown that Faster R-CNN achieves high accuracy (more than 80%), slightly higher than human volunteers, who achieve approximately 75% accuracy [22].

It is well known that humans and other animals can use landmarks to determine where they are in the world and to generate paths to destinations [23]. For the localization of mobile robots in outdoor environments, signs and landmarks are commonly visible and usually distinct. CNN and the CNN-oriented Faster R-CNN are very useful for handling 2D data such as images. It is therefore advantageous to use deep learning-based object detection approaches for mobile robot localization in outdoor environments. Vision-based mobile robot localization mimics the way humans and animals determine their locations and directions; in addition, conventional sensors such as GPS or a compass can be replaced by vision. This paper therefore proposes and compares two localization methods based on CNN and Faster R-CNN. In the CNN-based method, a CNN analyzes the image captured by the robot and determines the current location and orientation of the robot. In the Faster R-CNN-based method, Faster R-CNN is used to detect landmarks within the image, before sending the detected landmarks to a feedforward neural network (FFNN) that generates the current location and compass orientation.
The amount of data is a challenge in deep learning, since the performance of deep learning approaches relies on large amounts of data [16,21]. Thus, we also aim to develop and test the performance of the proposed methods using a smaller amount of data than other deep learning implementations. The proposed methods have been implemented as follows: first, an image dataset with geolocation data containing 1625 sets of data is created. Second, we develop the localization methods based on CNN and Faster R-CNN using the created dataset. Finally, we evaluate the performance of the developed localization methods on the test set of the dataset created in the first step.

The paper is organized as follows: Section 2 describes the two proposed localization methods and their essential components. Section 3 explains the experimental results, and Section 4 concludes the paper.

2. Localization Methods

Two localization methods based on CNNs are proposed in this paper. We investigated the object detection capabilities of the CNN and one of its successors, Faster R-CNN, for localization based on visual landmarks. The first localization method is a two-step procedure based on the Faster R-CNN object detector: Faster R-CNN is used to detect visible landmarks in a camera image, and the labeled bounding boxes of the detected landmarks are then used as inputs for an FFNN that generates the location coordinates and compass orientation from the landmarks. The second localization method is based on a conventional CNN: the whole camera image is processed through the CNN to directly generate the location coordinates and compass orientation. Further details of the two localization methods and their components are described in the following subsections.

2.1. Faster R-CNN Localization

The overall structure of the Faster R-CNN based localization method is illustrated in Figure 1. The method is a two-step procedure whose two main components are the Faster R-CNN object detector and the FFNN. During robot navigation, Faster R-CNN detects landmarks in the camera-captured image. This landmark detection process generates three outputs for each instance of a detected landmark: a bounding box, a label and a score. The bounding box contains the position and size of the detected landmark in the input image. The label indicates the class name of the detected landmark. The score is an objectness score, which measures membership of the bounding box in the landmark classes or the background [17]. The components of each detected landmark are then sent to the FFNN for localization. The localization part uses the detected landmarks from Faster R-CNN to localize the robot in the real-world environment. The FFNN utilizes the bounding boxes and labels of the detected landmarks to generate geolocation data as the result of the localization system. The generated geolocation data includes the location coordinates, in the form of latitude and longitude angles, and the orientation, in the form of a magnetic-referenced compass orientation.

Figure 1. System flow of the Faster Regional-Convolutional Neural Network (Faster R-CNN) based localization method.

Further details on the two main components of the Faster R-CNN-based localization method are given in the following subsections.
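Read as pseudocode, the two-step flow in Figure 1 amounts to a detection pass followed by a regression pass. The sketch below is illustrative only: `detector`, `ffnn` and `arrange_boxes` are hypothetical stand-ins for the trained components described in Sections 2.1.1 and 2.1.2 (a sketch of `arrange_boxes` follows Section 2.1.2), not the authors' implementation.

```python
# Minimal sketch of the two-step localization flow in Figure 1.
def localize(image, detector, ffnn):
    """Estimate (latitude, longitude, compass) from one camera image."""
    # Step 1: Faster R-CNN returns, per detected landmark, a bounding box
    # (x, y, width, height), a class label and an objectness score.
    detections = detector.detect(image)

    # Step 2: pack the boxes into the fixed-length FFNN input vector and
    # regress the geolocation from the detected landmarks.
    features = arrange_boxes(detections)
    latitude, longitude, compass = ffnn(features)
    return latitude, longitude, compass
```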

2.1.1. Faster R-CNN for Landmark Detection

Landmark detection is the first process in our Faster R-CNN localization method. We employ the standard version of the Faster R-CNN object detector for the landmark detection tasks. Typically, Faster R-CNN comprises two modules, i.e., the Fast R-CNN object detector and the region proposal network (RPN). The Fast R-CNN object detector contains several convolutional and max pooling layers, a region of interest (RoI) pooling layer, and a sequence of fully connected layers. In Fast R-CNN, a set of convolutional and max pooling layers constructs a convolutional feature map from the entire input image. RoI pooling extracts a fixed-length feature vector from the feature map at each region, which is used as the input of Fast R-CNN. In the final stage of Fast R-CNN, fully connected layers estimate the classes of the feature vectors from RoI pooling and refine the resulting bounding boxes from these feature vectors [18]. The RPN is a deep, fully convolutional neural network that shares full-image convolutional features with the detection network, Fast R-CNN. It proposes high-quality regions to the Fast R-CNN module and, in addition, helps guide Fast R-CNN to the locations of objects in the captured image [17].

The architecture of the implemented standard Faster R-CNN is shown in Figure 2. The whole input image is processed through the set of convolutional and max pooling layers of the CNN to generate a convolutional feature map. The feature map is then input to the RPN to generate a set of rectangular region proposals. Since the feature map is shared between the RPN and the detection network, it is also input to the RoI pooling layer, which extracts fixed-length feature vectors with the help of the region proposals generated by the RPN. The extracted feature vectors from the RoI pooling layer are then processed through a series of fully connected layers to estimate the class of each feature vector through classifiers and to refine the resulting region proposals through regression. Thus, refined and classified region proposals, or bounding boxes, are generated by Faster R-CNN.

Figure 2. Faster R-CNN for landmark detection and its components.

The structure of the CNN used in our Faster R-CNN is also shown in Figure 2. As mentioned earlier in this subsection, the set of convolutional layers in the CNN analyzes the whole input image to construct a convolutional feature map. As the size of the smallest landmarks in the dataset is nearly 32 × 32 pixels, the input size is set to 32 × 32 × 3, where the last 3 is for the three color channels: red, green and blue. The set of convolutional layers contains two two-dimensional convolutional layers, with a rectified linear unit (ReLU) attached after each convolutional layer. The set also includes one max pooling layer for down-sampling purposes. Each convolutional layer employs a 3 × 3 filter

with stride settings of 1 pixel for both horizontal and vertical strides. The number of filters in the first convolutional layer is 48, while 96 filters are used in the second convolutional layer. The max pooling layer is placed at the end of the layer set, with a pooling size of 2 × 2 and stride settings of 1 pixel for both horizontal and vertical strides. This small pooling size is applied to prevent premature down-sampling of the input image, which could cause a loss of features in the resulting feature map.

Training of Faster R-CNN consists of the following four steps: Step 1, RPN training; Step 2, Fast R-CNN training using the region proposals from Step 1; Step 3, RPN re-training using weight sharing with Fast R-CNN to fine-tune the RPN; and Step 4, Fast R-CNN re-training using the updated RPN. These steps are the same as in the original Faster R-CNN training [17]. The training uses whole images as input and labeled bounding boxes as targets, and continues for 20 epochs with a 1 × 10⁻⁴ initial learning rate.
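The feature-extraction CNN just described is small enough to write down directly. The PyTorch sketch below follows the stated settings (two 3 × 3 convolutional layers with 48 and 96 filters and stride 1, a ReLU after each, and a final 2 × 2 max pooling layer with stride 1); the 'same' padding is our assumption, since the paper does not state how image borders are handled.

```python
import torch
import torch.nn as nn

# Sketch of the feature-extraction CNN inside the Faster R-CNN detector.
# Padding of 1 keeps the small 32 x 32 input from shrinking at each 3 x 3
# convolution (an assumption; border handling is not specified in the text).
backbone = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=3, stride=1, padding=1),   # 48 filters, 3 x 3
    nn.ReLU(),
    nn.Conv2d(48, 96, kernel_size=3, stride=1, padding=1),  # 96 filters, 3 x 3
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=1),                  # 2 x 2 pool, stride 1
)

# A 32 x 32 x 3 input yields a 96-channel convolutional feature map.
feature_map = backbone(torch.zeros(1, 3, 32, 32))
print(feature_map.shape)  # torch.Size([1, 96, 31, 31])
```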
2.1.2. Feedforward Neural Network for Localization

The localization part generates the robot location using the bounding boxes and labels of the landmarks detected by Faster R-CNN. Since the bounding boxes and labels are generated from the features of the image during the detection process, the bounding boxes of the detected landmarks are arranged according to the labels of the landmark classes without further processing. The localization part of the Faster R-CNN method has a single FFNN as its core component, which uses the arranged bounding boxes as input. The outputs are the location coordinates and the magnetic compass orientation.

The implemented FFNN consists of 72 input neurons, 48 hidden neurons and three output neurons, as shown in Figure 3. The 72 inputs are slots for the bounding box elements of the detected landmarks. Each bounding box contains four elements for the position and size of the box. We limited the number of landmark classes to nine, matching the number of landmark classes in the experiments; further details of the nine landmark classes are given in Section 3. Each class is limited to a maximum of two detected instances. For example, if three "Crossing" landmarks are detected, the two instances with the highest objectness scores are used for localization. This results in a total of 72 (9 × 2 × 4) input neurons. The default value of an input neuron is zero if there is no detected instance for its landmark slot. The three output neurons correspond to latitude, longitude and compass orientation. The number of hidden neurons was determined by trial-and-error tests. The activation function of the FFNN neurons is the symmetric saturating linear transfer function.

Figure 3. Feedforward neural network (FFNN) for localization with detected landmarks from Faster R-CNN.

Training of the FFNN for localization uses the labeled bounding boxes and the geolocation data, with Bayesian regularization as the training algorithm. The bounding boxes are arranged according to the labels attached to the boxes, and the elements of the arranged bounding boxes are used as the input training data. The geolocation data, which includes the latitude angles, longitude angles and magnetic compass orientations, is used as the target training data.
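The fixed input layout (nine classes, at most two instances per class, four box elements each) determines how detections are packed into the 72-element vector. The sketch below illustrates this packing and the 72-48-3 network; `arrange_boxes` and the detection tuple format are our own illustrative choices, and PyTorch's Hardtanh is used as the counterpart of the symmetric saturating linear transfer function (the Bayesian regularization training procedure is not reproduced here).

```python
import torch
import torch.nn as nn

NUM_CLASSES, MAX_PER_CLASS, BOX_ELEMS = 9, 2, 4   # 9 x 2 x 4 = 72 inputs

def arrange_boxes(detections):
    """Pack detections into the fixed 72-element FFNN input vector.

    `detections` is assumed to be a list of (class_id, score, (x, y, w, h))
    tuples with class_id in 0..8; unused slots stay at the default of zero.
    """
    x = torch.zeros(NUM_CLASSES * MAX_PER_CLASS * BOX_ELEMS)
    for class_id in range(NUM_CLASSES):
        # Keep at most the two highest-scoring instances of each class.
        boxes = sorted((d for d in detections if d[0] == class_id),
                       key=lambda d: d[1], reverse=True)[:MAX_PER_CLASS]
        for slot, (_, _, box) in enumerate(boxes):
            start = (class_id * MAX_PER_CLASS + slot) * BOX_ELEMS
            x[start:start + BOX_ELEMS] = torch.tensor(box, dtype=torch.float32)
    return x

# 72 inputs -> 48 hidden -> 3 outputs (latitude, longitude, compass).
# Hardtanh saturates at -1/+1 and is linear in between, matching the shape
# of the symmetric saturating linear transfer function.
ffnn = nn.Sequential(nn.Linear(72, 48), nn.Hardtanh(), nn.Linear(48, 3))
```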

2.2. CNN Localization

The convolutional neural network (CNN) is a type of deep neural network that is well suited to two-dimensional array inputs such as images. Typically, CNNs are applied for object detection and recognition, and their final parts mostly employ classification layers, for instance a softmax classifier. In our CNN localization method, however, the CNN is used to directly determine the geolocation data from the image and its features, so the implemented CNN has its final parts replaced with regressors instead of classifiers. Consequently, our CNN determines the geolocation data through regression outputs instead of classification. As in the Faster R-CNN localization, the camera image is used as the input, and the whole image is processed through all layers of the CNN. The result of our CNN localization method is geolocation data consisting of the latitude angle, longitude angle and magnetic compass orientation, the same as the Faster R-CNN localization results.

The design of the implemented CNN is shown in Figure 4. The best architecture was determined by trial and error and by combining principles from the different CNN examples available [24]. The CNN for localization comprises 37 layers in total. The input layer has a size of 320 × 240 × 3 (320 pixels in width, 240 pixels in height and three color channels: red, green and blue). Ten sets of convolutional layer, batch normalization and ReLU are included among the 37 layers, where each convolutional layer, batch normalization and ReLU triple is displayed together as one green layer in the diagram. The sizes and numbers of filters differ among the implemented convolutional layers. The earliest convolutional layer employs the largest filter size of 5 × 5, and the filter size decreases as the network goes deeper, down to the last convolutional layers, which apply a filter size of 2 × 2. Conversely, the number of filters begins with a small number of 24 filters in the first convolutional layer and increases with each convolutional layer set as the network progresses deeper, up to 64 filters in the last layers. We employ four max pooling layers of sizes 3 × 3, 3 × 3, 2 × 2 and 2 × 2, with stride settings of 2, 2, 1 and 1, respectively. The final parts of the CNN for localization consist of a fully connected layer and a regressor. Since localization is determined by latitude, longitude and compass orientation, only three neurons are employed in the fully connected layer.

Figure 4. Structure of the Convolutional Neural Network (CNN) based localization.

Training of the implemented CNN employs the whole captured images as input and the geolocation data as the training target. The setup for CNN training includes 48 training epochs, a batch size of 64 and a 1 × 10⁻⁶ initial learning rate. The learning rate decreases by a factor of 0.1 every 20 epochs.

3. Experimental Results

Experiments on our localization methods start from dataset construction. A geotagged image dataset was constructed to provide data for both training and testing of our localization methods. In total we created a dataset of 1625 sets of data, which is relatively small compared to most deep learning implementations. From the 1625 sets of data, 1198 sets were randomly selected for the training process, while the remaining 427 sets were used to test performance. Reducing the amount of training data is a very important research issue in the deep learning community. Our developed localization methods were tested in terms of localization accuracy. The Faster R-CNN localization method was also tested in terms of landmark detection accuracy.

3.1. Geotagged Image Dataset

Similar to other deep learning systems, the Faster R-CNN, FFNN and CNN employed in our localization methods require data for both training and testing. The dataset was constructed from 1625 JPEG color images with a size of 320 × 240 pixels. Each image was tagged with corresponding geolocation data, including the location coordinates, in the form of latitude and longitude angles, and the compass orientation. The geotagged images were labeled with bounding boxes of the landmarks in each image. In summary, one set of data consists of an image, a latitude angle, a longitude angle, a magnetic compass orientation, the bounding boxes of the landmarks in the image and the labels of the landmarks for those bounding boxes.

The training set, which includes the randomly selected 1198 sets of data, was employed for training all components of both the Faster R-CNN and CNN localization methods. Faster R-CNN for landmark detection used the whole images and the labeled bounding boxes for training. The FFNN for localization in the Faster R-CNN localization method used the labeled bounding boxes and the corresponding geolocation data for training. The CNN for localization used the whole images and the corresponding geolocation data for training.

Experimental results of the proposed localization methods were generated with the data in the test set as input to both localization methods. The landmark detection test evaluated Faster R-CNN performance with all images in the test set as inputs, and compared the results with the corresponding bounding boxes of the test images. The proposed localization methods were tested using all images in the test set as inputs, and the results from both localization methods were evaluated by comparison with the corresponding geolocation data of each test image.

3.1.1. Data Gathering

The geotagged images in the dataset were taken by a wheelchair robot equipped with a camera, GPS receiver and compass sensor (Figure 5). The wheelchair robot is 55 cm in width, 120 cm in length and 140 cm in height. The sensors for data gathering were attached above the seat. We used a Logitech C920 HD (Logitech, Lausanne, Switzerland) as the robot camera, a BU-353S4 (GlobalSat, Taipei, Taiwan) as the GPS receiver and an Octopus 3-axis digital compass sensor. All images in the dataset were taken in the area near the Koganei campus of Hosei University, Japan. Two areas were selected for robot localization in outdoor environments, as shown in Figure 6. The length and width of area 1 are 70 and 30 m, respectively; area 2 is 75 m long and 30 m wide. The two experimental areas are at a distance of 250 m from each other. Different types of landmarks are available in each area, which distinguishes one experimental area from the other.

During data gathering, the robot was pushed by a human, and images were taken manually. Each time an image was taken, the corresponding geolocation data was tagged to the image automatically. The tagged geolocation data includes the location coordinates and the compass orientation. The location coordinates were received from the GPS receiver in the form of a GGA message; the latitude and longitude information inside the GGA message was extracted and tagged to the image. The compass orientation was received from the compass sensor and converted to a magnetic compass orientation before being tagged to the image. We collected data under different weather conditions in order to increase the robustness of the proposed algorithms.
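Extracting latitude and longitude from a GGA message is a small parsing step. The sketch below assumes a standard NMEA 0183 GGA sentence such as the BU-353S4 emits; the helper names are ours, and error handling is omitted.

```python
def parse_gga(sentence: str):
    """Return (latitude, longitude) in decimal degrees from a $GPGGA sentence."""
    fields = sentence.split(",")
    lat = _dm_to_degrees(fields[2]) * (1 if fields[3] == "N" else -1)
    lon = _dm_to_degrees(fields[4]) * (1 if fields[5] == "E" else -1)
    return lat, lon

def _dm_to_degrees(dm: str) -> float:
    # NMEA encodes angles as (d)ddmm.mmmm: whole degrees followed by minutes.
    value = float(dm)
    degrees = int(value / 100.0)
    minutes = value % 100.0
    return degrees + minutes / 60.0

# Example (synthetic NMEA sentence):
print(parse_gga("$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"))
# -> (48.1173, 11.516666...)
```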

Figure 5. Wheelchair robot equipped with sensors for data gathering: (a) overall view of the wheelchair robot; (b) sensors for data gathering: 1. Camera, 2. Global Positioning Systems (GPS) receiver, and 3. Compass sensor.

Figure 6. Map of the experimental areas (Google map). The areas of the experiments are marked with rectangles.

3.1.2. Image Labeling

All gathered images in the dataset were hand-labeled with bounding boxes of the landmarks in the images. Nine types of landmarks were utilized for robot localization: FamilyMart, CocaCola, BicycleLane, NoTruck, Crossing, Lawson, TimesParking, LawsonParking, and RoadSign1. Figure 7 shows pictures of these nine landmarks in the area of the experiments. Each bounding box is in the form of a vector with four member elements, containing the horizontal and vertical position coordinates of the top-left corner, the width, and the height of the bounding box in the image. The unit of the position coordinates, width, and height of the bounding box is the number of pixels. Horizontal and vertical position coordinates are referenced from the top-left corner of the image.

For example, a bounding box with the vector {10, 20, 56, 72} has its top-left corner at pixel 10 horizontally and 20 vertically, and the width and height of the box are 56 and 72 pixels, respectively.

Figure 7. Landmarks used in the experiments: (a) FamilyMart; (b) CocaCola; (c) BicycleLane; (d) NoTruck; (e) Crossing; (f) Lawson; (g) TimesParking; (h) LawsonParking; (i) RoadSign1.

3.2. Detection Experiments

The goal of the landmark detection experiments was to evaluate the performance of Faster R-CNN, since the localization part depends strongly on landmark detection. All 427 images in the test set were processed through Faster R-CNN and embedded with the bounding boxes and labels of the landmarks it detected. Evaluation of the detection results includes qualitative and quantitative tests.

The qualitative evaluation was done by inspecting the detection results by eye. Some detection results from Faster R-CNN on images in the test set are shown in Figure 8. Most of the generated bounding boxes are placed well on the detected landmarks, with proper positions and sizes, and the labels attached to the boxes correspond to the landmark classes shown in Figure 7. However, for some landmarks, such as the CocaCola landmark in Figure 8c, the bounding box was placed in the area of the actual landmark but its size did not match the landmark size.

Figure 8. Sample images in the test set with Faster R-CNN landmark detection results: (a) Sample 1; (b) Sample 2; (c) Sample 3.

Mean Average Precision (mAP) was used for the quantitative evaluation of the landmark detection experiments. mAP is considered the de facto metric for measuring the accuracy of object detectors. The mAP is the mean value of the average precisions (AP) over all object classes, which in this paper we refer to as landmark classes. AP is the average of the maximum precisions at different recall values, where precision and recall are calculated by the following equations:

P = TP / (TP + FP),   (1)

R = TP / (TP + FN),   (2)

where P is precision, R is recall, TP is the number of correct bounding boxes (comparing the detections with the reference boxes in the dataset), FP is the number of missed or misplaced bounding boxes that appeared in the detection results, and FN is the number of missed bounding boxes that did not appear in the detection results but existed in the reference dataset. Correct bounding boxes were determined by the intersection over union (IoU) ratio, i.e., the ratio between the intersection area and the union area of the bounding boxes, comparing the detection results with the reference data. A higher IoU threshold means less detection error allowance, which can also reduce the resulting AP values. In this paper, IoU thresholds of 0.5 and 0.7 were employed for measuring detection accuracy, similar to [17], which used an IoU of 0.7. The AP values of all landmark classes and their mean values (mAP) at the 0.5 and 0.7 IoU thresholds are displayed in Table 1.

Table 1. Average precision values of the detection results from Faster R-CNN, with 0.5 and 0.7 intersection over union (IoU).

Class               AP 0.5    AP 0.7
1 (FamilyMart)      0.9024    0.8786
2 (CocaCola)        0.8281    0.5823
3 (BicycleLane)     0.8040    0.5466
4 (NoTruck)         0.8573    0.8573
5 (Crossing)        0.8500    0.8500
6 (Lawson)          0.7682    0.7206
7 (TimesParking)    0.6156    0.4966
8 (LawsonParking)   0.8360    0.7815
9 (RoadSign1)       0.9904    0.9235
Mean                0.8280    0.7375

From Table 1, the mAP values are 0.8280 and 0.7375 for 0.5 and 0.7 IoU, respectively. This implies that the landmark detection accuracies of Faster R-CNN are 82.80% at 0.5 IoU and 73.75% at 0.7 IoU. Though the mAP was higher than 80% at 0.5 IoU, it decreased to around 70% as the IoU threshold increased to 0.7. This means that the landmarks could be detected, but not always with high precision. Compared to the well-configured examples presented in [19–21], the accuracy of our Faster R-CNN was moderately lower. This reduced detection accuracy can cause lower localization accuracy, since the landmark detection results are required to generate the localization results in the Faster R-CNN localization method.
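As a concrete illustration of Equations (1) and (2) and of the IoU test that decides whether a detection counts as a true positive, the short sketch below computes IoU for the dataset's (x, y, width, height) box format; the 0.5 and 0.7 thresholds above would be applied to its output. The function names are ours.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, origin at top-left."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    # Equations (1) and (2).
    return tp / (tp + fp), tp / (tp + fn)

assert iou((0, 0, 10, 10), (0, 0, 10, 10)) == 1.0
assert iou((0, 0, 10, 10), (20, 20, 5, 5)) == 0.0
print(precision_recall(tp=8, fp=2, fn=2))  # -> (0.8, 0.8)
```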

3.3. Localization Experiments

The localization methods presented in this paper were implemented and evaluated in several localization experiments. All images in the test set were processed by the Faster R-CNN localization and CNN localization methods to generate localization results. In addition to the two proposed localization methods, we added a CNN localization method based on the well-known classification CNN AlexNet [25]. We replaced the last layers of AlexNet with regression layers, similar to our second localization method, and trained it in the same way as our CNN in the second localization method. We employed AlexNet for localization as a reference for our second localization method, since there was no other baseline for testing our CNN design. The results from the localization methods, including the location coordinates of latitude and longitude and the compass orientations, were then passed on to evaluation.

Evaluation of the localization results was done by calculating the absolute errors, and the distance between two points: the generated result and the reference geolocation data in the test dataset. Three absolute errors were considered in the experiments: the mean, minimum and maximum absolute errors. The mean absolute error is calculated from the following equation:

MAE = (1/n) * sum_{i=1}^{n} |a_i − b_i|,   (3)

where MAE is the mean absolute error, a_i is the result from localization, b_i is the reference value in the test dataset, and n is the amount of data in the test set, which was 427 in the experiments. The minimum and maximum absolute errors are the smallest and largest of the absolute errors.

The distances between the two points were calculated from the location coordinates of the generated and reference data. We used the haversine formula to calculate distances from the latitudes and longitudes of the two points. The haversine formula is widely used in computer programming to determine the distance between two points on a great sphere, commonly referring to the Earth. The implemented haversine formula is as follows:

D = 2r * arcsin( sqrt( sin^2((y_2 − y_1)/2) + cos(y_1) cos(y_2) sin^2((x_2 − x_1)/2) ) ),   (4)

where D is the distance between the two points in kilometers, r is the Earth radius, taken as 6378.1 km [26], y_1 is the latitude of the localization result in radians, y_2 is the reference latitude in radians, x_1 is the longitude of the localization result in radians, and x_2 is the reference longitude in radians.

In addition to the absolute errors and distance errors, we also calculated standard errors of the localization results and distance errors. The standard errors measure the deviation of all results; the equation for the standard error is:

SE = σ / sqrt(n),   (5)

where SE is the standard error, σ is the standard deviation of the results, and n is the amount of data, which was 427 for the test set.

Table 2 shows each localization error and the distances between the real and generated robot locations. The mean, minimum, maximum and standard errors are calculated from the absolute errors; the distance errors are given in meters, calculated from the location coordinates. It can be seen from Table 2 that the Faster R-CNN localization method outperforms both CNNs in terms of location and distance errors. The mean absolute errors of latitude and longitude from the Faster R-CNN method are slightly lower than those of the CNN methods, and the minimum errors are also slightly lower in the case of Faster R-CNN. There are some differences in the maximum errors of latitude and longitude among Faster R-CNN, CNN and AlexNet, with Faster R-CNN yielding the smallest maximum errors. These small errors in latitude and longitude cause significant differences in the distance errors. The average distance error of Faster R-CNN is 28 m, which is less than half the distance error of the CNN method (70 m); the reference AlexNet has an average distance error of around 50 m. For the minimum errors, the distance error from Faster R-CNN is less than 1 m, while both CNNs for localization have distance errors of around 3 m. For the maximum errors, the Faster R-CNN method has a distance error of around 177 m, which is lower than that of the CNN localization method (238 m). The reference AlexNet, however, gave a maximum distance error of 322 m, the highest among the maximum distance errors.
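Equations (3) to (5) translate directly into code. The sketch below uses the Earth radius from above and assumes coordinates given in degrees, converting to radians inside the haversine function; the function names and the example call are ours.

```python
import math

EARTH_RADIUS_KM = 6378.1  # nominal Earth radius [26]

def haversine_m(lat1, lon1, lat2, lon2):
    """Equation (4): great-circle distance in meters between two coordinates
    given in degrees (converted to radians internally)."""
    y1, x1, y2, x2 = map(math.radians, (lat1, lon1, lat2, lon2))
    s = (math.sin((y2 - y1) / 2) ** 2
         + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(s)) * 1000.0

def mean_absolute_error(results, references):
    # Equation (3).
    return sum(abs(a - b) for a, b in zip(results, references)) / len(results)

def standard_error(values):
    # Equation (5), using the population standard deviation; a sample
    # (n - 1) denominator is an equally common convention.
    mean = sum(values) / len(values)
    sigma = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return sigma / len(values) ** 0.5

# Two points roughly 30 m apart (synthetic coordinates near Koganei, Tokyo):
print(haversine_m(35.7006, 139.5233, 35.7008, 139.5235))
```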

Table 2. Localization errors of the proposed methods.

Errors            Quantity        Faster R-CNN     CNN              CNN (AlexNet)
Mean errors       Latitude        2.4367 × 10⁻⁴    5.2166 × 10⁻⁴    3.4441 × 10⁻⁴
                  Longitude       4.0868 × 10⁻⁵    2.7269 × 10⁻⁴    2.2187 × 10⁻⁴
                  Compass (°)     54.9425          32.0381          17.0498
                  Distance (m)    28.4739          70.5796          49.8166
Min errors        Latitude        1.0000 × 10⁻⁶    2.4417 × 10⁻⁶    2.3391 × 10⁻⁶
                  Longitude       1.8654 × 10⁻⁷    3.4527 × 10⁻⁷    2.4863 × 10⁻⁷
                  Compass (°)     0.3826           0.1986           0.0374
                  Distance (m)    0.5396           3.3217           3.3838
Max errors        Latitude        0.0011           0.0021           0.0026
                  Longitude       1.6409 × 10⁻⁴    0.0010           0.0013
                  Compass (°)     179.0098         152.3111         173.2717
                  Distance (m)    176.9496         238.2083         321.9153
Standard errors   Latitude        4.0797 × 10⁻⁵    4.0797 × 10⁻⁵    4.0797 × 10⁻⁵
                  Longitude       2.1783 × 10⁻⁶    2.1783 × 10⁻⁶    2.1783 × 10⁻⁶
                  Compass (°)     6.0188           4.7458           4.9259
                  Distance (m)    1.4299           2.0971           1.5464

However, the performance of the Faster R-CNN localization suffers a decline in compass accuracy. On average, compass orientations from the Faster R-CNN method have errors of around 55°. Compared to the average compass orientation errors from both CNNs, the error from Faster R-CNN is higher: the mean orientation error is only around 32° for our CNN and only 17° for AlexNet. In the best case, with the minimum errors, the CNN methods also gave good results, with a minimum compass error of around 0.2° for our CNN and 0.03° for AlexNet, both smaller than the minimum compass error of 0.38° from the Faster R-CNN method. In the worst case, all localization methods have maximum compass errors near 180°, with our CNN giving an error of 152°, the smallest among the maximum compass errors. The standard errors indicate that the latitude and longitude coordinates resulting from all localization methods share a similar deviation, while the Faster R-CNN method shows larger compass deviations and the CNN methods show larger distance error deviations.

4. Conclusions

This paper proposed and tested two outdoor localization methods for mobile robots, based on CNN and Faster R-CNN. Their performance was evaluated in outdoor localization tasks, and AlexNet was additionally implemented for comparison. The Faster R-CNN localization method was also tested for landmark detection. The landmark detection results showed good performance, with more than 70% detection accuracy. The good detection performance of Faster R-CNN led to good localization performance of the Faster R-CNN based method, with a distance error of less than 1 m and a compass error of less than 1° in the best case. The CNN localization method also performed well in the best case, with a location error of approximately 3 m and a compass error of less than 1°. The results for the average and worst cases indicated that Faster R-CNN performs best among the tested approaches for the localization task, while its performance declines for compass orientation. However, there is still room to improve the localization results. The average location errors of the proposed methods were relatively high compared to GPS, which has an error of approximately 4.9 m [7]. The orientation errors of our proposed methods were also relatively high, resulting in poor performance compared with the reference AlexNet. Some possible causes of the high orientation errors are as follows:

- The development and experiments of the proposed localization methods were done with a small amount of data, compared with other works that use more than half a million data samples.

- There was little environmental variation during data gathering, despite attempts at gathering data under multiple environmental conditions. Small environmental variation can be a cause of poor performance.
- The performance of the Faster R-CNN localization method relied on landmark detection; improving the landmark detection will improve the performance of the Faster R-CNN localization.

Despite the small amount of data, the proposed Faster R-CNN and CNN localization methods performed well in robot localization tasks. In the future we will focus on improving the performance of the localization methods and on implementing them on a real robot. We will focus on continuous learning or transfer learning, in order to improve performance without increasing the amount of data.

Author Contributions: Conceptualization, S.N. and G.C.; methodology, S.N.; software, S.N.; validation, S.N., D.H. and G.C.; formal analysis, S.N.; investigation, S.N.; resources, G.C.; data curation, S.N.; writing (original draft preparation), S.N.; writing (review and editing), S.N., D.H. and G.C.; visualization, S.N.; supervision, S.K. and G.C.; project administration, G.C.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Roland, S.; Illah, R.N. Mobile Robot Localization. In Introduction to Autonomous Mobile Robots, 1st ed.; Bradford Company Scituate: Cambridge, MA, USA, 2004; pp. 181–256.
2. Muhammad, A.G.; Kundan, K.; Muhammad, A.J.; Muhammad, S. High-voltage transmission line inspection robot. In Proceedings of the International Conference on Engineering and Emerging Technologies (ICEET), Lahore, Pakistan, 22–23 February 2018.
3. Seiko, P.Y.; Filip, K.; Takanori, E.; Yukori, K.; Tadeusz, U. Autonomous position control of a multi-unmanned aerial vehicle network designed for long range wireless data transmission. In Proceedings of the 2017 IEEE/SICE International Symposium on System Integration (SII), Taipei, Taiwan, 11–14 December 2017.
4. Richa, W.; Muhammad, R.; Roby, A.W.; Ontoseno, P. Path planning of a mobile robot using waypoints for gas level mapping. In Proceedings of the 2017 International Seminar on Intelligent Technology and Its Applications (ISITIA), Surabaya, Indonesia, 28–29 August 2017.
5. Bruno, V.M.; Sebastián, A.V.; Alejandro, R.; Gerardo, G.A. GPS aided strapdown inertial navigation system for autonomous robotics applications. In Proceedings of the 2017 XVII Workshop on Information Processing and Control (RPIC), Mar del Plata, Argentina, 20–22 September 2017.
6. Shamsudin, A.U.; Ohno, K.; Hamada, R.; Kojima, S.; Westfechtel, T.; Suzuki, T.; Okada, Y.; Tadokoro, S.; Fujita, J.; Amano, H. Consistent map building in petrochemical complexes for firefighter robots using SLAM based on GPS and LIDAR. ROBOMECH J. 2018, 5, 1–13. [CrossRef]
7. GPS.gov: GPS Accuracy. Available online: https://www.gps.gov/systems/gps/performance/accuracy/ (accessed on 26 December 2018).
8. Kumar, V.; Jawahar, C.V.; Visesh, C. Accurate localization by fusing images and GPS signals. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015.
9. Chang, L.; Honglun, W.; Na, L.; Yue, Y. Sensor fault diagnosis of GPS/INS tightly coupled navigation system based on state chi-square test and improved simplified fuzzy ARTMAP neural network. In Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), Macau, China, 5–8 December 2017.
10. Santos, E.R.S.; Azpurua, H.; Rezeck, P.A.F.; Corrěa, M.F.S.; Freitas, G.M.; Macharet, D.G. Global localization of mobile robots using local position estimation in a geo-tagged wireless node sensor network. In Proceedings of the Latin American Robotic Symposium, 2018 Brazilian Symposium on Robotics (SBR) and 2018 Workshop on Robotics in Education (WRE), Joao Pessoa, Brazil, 6–10 November 2018.
11. Nilesh, S.; Peshala, G.J.; Takashi, K. 3D pose tracking for GPS-denied terrain rovers by fast state variable extension and enhanced motion model. In Proceedings of the 2017 17th International Conference on Control, Automation and Systems (ICCAS), Jeju, South Korea, 18–21 October 2017.

12. Zhou, B.; Tang, Z.; Qian, K.; Fang, F.; Ma, X. A LiDAR odometry for outdoor mobile robots using NDT based scan matching in GPS-denied environments. In Proceedings of the IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Honolulu, HI, USA, 31 July–4 August 2017.
13. Kottath, R.; Yalamandala, D.P.; Poddar, S.; Bhondekar, A.P.; Karar, V. Inertia constrained visual odometry for navigational applications. In Proceedings of the 2017 4th International Conference on Image Information Processing (ICIIP), Shimla, India, 21–23 December 2017.
14. Saska, M.; Baca, T.; Thomas, J.; Chudoba, J.; Preucil, L.; Krajnik, T.; Faigl, J.; Loianno, G.; Kumar, V. System for deployment of groups of unmanned micro aerial vehicles in GPS-denied environments using onboard visual relative localization. Auton. Robots 2017, 41, 919–944. [CrossRef]
15. Hannes, S.; Peter, Z.; Frank, H.; Eric, S. GPS-independent localization for off-road vehicles using ultra-wideband (UWB). In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017.
16. Shuhui, J.; Yu, K.; Yun, F. Deep geo-constrained auto-encoder for non-landmark GPS estimation. IEEE Trans. Big Data 2017, in press. [CrossRef]
17. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015.
18. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
19. Saleh, K.; Hossny, M.; Hossny, A.; Nahavandi, S. Cyclist detection in LIDAR scans using faster R-CNN and synthetic depth images. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017.
20. Zhang, H.; Du, Y.; Ning, S.; Zhang, Y.; Yang, S.; Du, C. Pedestrian detection method based on Faster R-CNN. In Proceedings of the 2017 13th International Conference on Computational Intelligence and Security (CIS), Hong Kong, China, 15–18 December 2017.
21. Wang, R.; You, Y.; Zhang, Y.; Zhou, W.; Liu, J. Ship detection in foggy remote sensing images via scene classification R-CNN. In Proceedings of the 6th IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC), Guiyang, China, 22–24 August 2018.
22. Robert, G.; David, H.J.J.; Heiko, H.S.; Jonas, R.; Matthias, B.; Felix, A.W. Comparing deep neural networks against humans: Object recognition when the signal gets weaker. arXiv 2017, arXiv:1706.06969v2.
23. Epstein, R.A.; Vass, L.K. Neural systems for landmark-based wayfinding in humans. Philos. Trans. R. Soc. B Biol. Sci. 2013, 369, 1–7. [CrossRef] [PubMed]
24. Bayar, B.; Stamm, M.C. Design principles of convolutional neural networks for multimedia forensics. Electron. Imaging Media Watermark. Secur. Forensics 2017, 10, 77–86. [CrossRef]
25. Alex, K.; Ilya, S.; Geoffrey, E.H. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS '12), Lake Tahoe, NV, USA, 3–6 December 2012. [CrossRef]
26. Mamajek, E.E.; Prsa, A.; Torres, G.; Harmanec, P.; Asplund, M.; Bennett, P.; Capitaine, N.; Christensen-Dalsgaard, J.; Depagne, É.; Folkner, M.W.; et al. Resolution B3 on recommended nominal conversion constants for selected solar and planetary properties. In Proceedings of the 29th IAU General Assembly (IAU 2015), Honolulu, HI, USA, 3–14 August 2015.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).