entropy Article

Multisensor Estimation Fusion with Gaussian Process for Nonlinear Dynamic Systems

Yiwei Liao 1, Jiangqiong Xie 1, Zhiguo Wang 2,3 and Xiaojing Shen 1,*

1 School of Mathematics, Sichuan University, Chengdu 610064, China; lyw@stu.scu.edu.cn (Y.L.); xiejiangqiong@163.com (J.X.)
2 School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; wangzhiguo@cuhk.edu.cn
3 Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230026, China
* Correspondence: shenxj@scu.edu.cn

Received: 9 October 2019; Accepted: 11 November 2019; Published: 16 November 2019

Abstract: The Gaussian process is gaining increasing importance in areas such as signal processing, machine learning, robotics, control, and aerospace and electronic systems, since it can represent unknown system functions by posterior probability. This paper investigates multisensor fusion in the setting of Gaussian process estimation for nonlinear dynamic systems. To overcome the difficulty caused by the unknown nonlinear system models, we associate the transition and measurement functions with Gaussian process regression models; the advantages of the non-parametric feature of the Gaussian process can then be fully exploited for state estimation. Next, based on the Gaussian process filters, we propose two different fusion methods, centralized estimation fusion and distributed estimation fusion, to utilize the multisensor measurement information. Furthermore, the equivalence of the two proposed fusion methods is established by rigorous analysis. Finally, numerical examples for nonlinear target tracking systems demonstrate the equivalence and show that multisensor estimation fusion performs better than the single sensor. Meanwhile, the proposed fusion methods outperform the convex combination method and the relaxed Chebyshev center covariance intersection fusion algorithm.

Keywords: multisensor estimation fusion; Gaussian process; nonlinear dynamic systems; data-driven modeling; target tracking; information fusion

1. Introduction

Estimation in nonlinear systems is extremely important because almost all practical systems involve nonlinearities of one kind or another [1,2], as in target tracking, vehicle navigation, automatic control and robotics [3,4]. In the presence of nonlinearities, estimates cannot in general be obtained analytically. Some methods based on exact parametric models and the ideas of the Kalman filter (KF) have been developed. The extended Kalman filter (EKF) [5] was the most common approach for nonlinear systems; it simply linearizes all nonlinear functions via Taylor-series expansion and substitutes Jacobian matrices for the linear transformations in the KF equations. The unscented transformation was introduced to address the deficiencies of linearization, yielding the unscented Kalman filter (UKF) [1], which provides a more direct and explicit mechanism for transforming means and covariance matrices. Under the Bayesian framework, the particle filter (PF) was presented in Reference [6], constructing the posterior probability density function of the state based on all available information. However, these methods usually assume that the nonlinear relationships are known. A lack of modeling accuracy, including in the identification of the noise and the model

Entropy 2019, 21, 1126; doi:10.3390/e21111126 www.mdpi.com/journal/entropy

parameters, was inevitable [7]. In many applications, the exact models and system noises of real dynamic systems are difficult to obtain due to the complexity of the target motion environment; therefore, parameterized estimation methods may be invalid. To overcome the limitations of parametric models, researchers have recently employed so-called non-parametric methods like the Gaussian process [8] to learn models for dynamic systems. More specifically, the functional representation needs to be learned from the training data before the filtering prediction and update steps [9]. Gaussian processes have been attracting increasing interest in machine learning, signal processing, robotics, control, and aerospace and electronic systems [10–12]. For example, Gaussian process models are used as surrogate models for complex physics models in Reference [13]. The advantages stem from the fact that Gaussian processes take both the noise in the system and the uncertainty in the model into consideration [14]. In the context of modeling dynamic systems, Gaussian processes can be used as priors over the transition function and measurement function. By analyzing the correlation among given training data, Gaussian process models can provide posterior distributions over functions through the combination of the prior and the data. For cases in which ground truth data are unavailable or can only be determined approximately, Gaussian process latent variable models were developed in References [15,16] and extended to the setting of dynamical robotics systems in Reference [17]. In order to reduce the cubic complexity of Gaussian process training in the number of training points, sparse Gaussian processes were developed (see, e.g., [18–23]). K-optimality was used to improve the stability of Gaussian process prediction in Reference [24]. So far, Gaussian process models have been applied successfully to many nonlinear dynamic systems. Motivated by modeling human motion, learning nonlinear dynamical models with Gaussian processes was investigated in Reference [16]. Many filtering methods, such as the Gaussian process extended Kalman filter (GP-EKF) [25], the Gaussian process unscented Kalman filter (GP-UKF) [26], the Gaussian process particle filter (GP-PF) [27], GP-BayesFilters [14] and the Gaussian process assumed density filter (GP-ADF) [7,10], were derived by incorporating the Gaussian process transition model and Gaussian process measurement model into the classic filters. GP-ADF is an efficient form of assumed density filtering (ADF), introduced in References [28–30], and propagates the full Gaussian density [7]. Although Gaussian processes have been around for decades, they have mainly been applied to single sensor systems. With the rapid development of sensor technology, computer science, communication technology and information technology, multisensor systems are widely used in military and civil fields [31–35]. Benefiting from the application of multiple sensors, multisensor data fusion makes more comprehensive and accurate decisions by integrating the available information from multiple sensors, and has attracted considerable research interest. Generally speaking, multisensor estimation fusion mainly comprises centralized estimation fusion and distributed estimation fusion. In centralized estimation fusion, a central processor receives all measurement data from the sensors without processing and uses them to estimate the state [36,37].
In general, many filtering algorithms for single sensor systems can be applied to multisensor systems, since the measurements can be stacked and regarded as one measurement. On the other hand, distributed estimation fusion has several advantages in terms of reliability, survivability, communication bandwidth and computational burden [38–42], which make it desirable in real applications such as surveillance, tracking and battle management. In the distributed setting, each local sensor processes its own measurement data and transmits the local state estimate to the fusion center for the purpose of more accurate estimation fusion. So far, a variety of distributed fusion methods have been investigated for different settings, such as References [37,40,43–46]. A convex combination method was given to fuse the local estimates in Reference [43]. For the problem of an unavailable cross-correlation matrix, a relaxed Chebyshev center covariance intersection (RCC-CI) algorithm was provided to fuse the local estimates in Reference [37]. A current review of distributed multisensor data fusion under unknown correlation can be found in Reference [47]. However, these multisensor estimation fusion methods are mainly based on

the exact dynamic models or known nonlinear functions. For cases in which accurate parametric models are difficult to obtain, it is worth integrating Gaussian processes with multisensor estimation fusion to improve the system's performance. In this paper, we focus on multisensor estimation fusion, including centralized and distributed fusion methods, with Gaussian processes for nonlinear dynamic systems. Firstly, given the training data, we learn the dynamic models with the Gaussian process and derive multisensor estimation fusion methods based on the Gaussian process models for nonlinear dynamic systems, which can avoid inappropriate parametric models and improve predictive ability. Combining with the single sensor GP-ADF and GP-UKF, respectively, the prediction step and update step of the multisensor estimation fusion are provided. In general, it is hard to analyze the performance of non-parametric fusion methods. Since the Gaussian process fusion methods have analytic mean and covariance, we show that the distributed estimation fusion is equivalent to the centralized estimation fusion when the single sensor cross terms have full column rank. Numerical examples show that the equivalence holds under this condition and that multisensor estimation fusion performs better than the single sensor. We also compare the proposed fusion methods with the RCC-CI algorithm [37] and the convex combination method [43]. In the Gaussian process model setting, the simulation results show that the multisensor estimation fusion methods outperform the RCC-CI algorithm and the convex combination method as far as estimation accuracy is concerned. This article also extends our earlier work [48]. Compared with the conference paper, the main differences are as follows: detailed proofs are provided; the enhancement of the multisensor fusion algorithm with GP-UKF is presented; an additional set of extensive experiments is carried out; the equivalence condition of Proposition 1 is analyzed; and a comparison between GP-ADF fusion and GP-UKF fusion is given. The rest of the paper is organized as follows. In Section 2, the problem formulation and the Gaussian process are introduced. In Section 3, the centralized estimation fusion and distributed estimation fusion methods are presented. In Section 4, simulations are provided to confirm our analysis. Some conclusions are drawn in Section 5.

2. Preliminaries

2.1. Problem Formulation

In this paper, we consider the state estimation problem of a nonlinear dynamic system with additive noise and N (N ≥ 2) sensor measurements. The multisensor nonlinear dynamic system with one state equation and N measurement equations (see Figure 1) is described as follows:

$x_k = h(x_{k-1}) + w_k$,  $k = 0, 1, \ldots$,  (1)

$y_k^m = g^m(x_k) + v_k^m$,  $m = 1, \ldots, N$,  (2)

where $x_k \in \mathbb{R}^D$ is the state of the dynamic system at time $k$ and $y_k^m \in \mathbb{R}^{l_m}$ is the measurement of the $m$-th sensor at time $k$, $m = 1, \ldots, N$. $h(x_{k-1})$ is the transition function of the state $x_{k-1}$ and $g^m(x_k)$ is the nonlinear measurement function of $x_k$ at the $m$-th sensor. $w_k \sim \mathcal{N}(0, Q_k)$ is a Gaussian system noise and $v_k^m \sim \mathcal{N}(0, R_k^m)$ is a Gaussian measurement noise; all noises are independent of each other. Our goal is to estimate the state from all the available sensor measurement information. Gaussian processes are regarded as the priors of the transition function $h(x_{k-1})$ and the sensor measurement functions $g^m(x_k)$, $m = 1, \ldots, N$, respectively.
Then we make inferences about the posterior distributions of the transition function and the sensor measurement functions. The Gaussian process model represents a powerful tool for making Bayesian inference about functions [49].

Figure 1. Graphical structure for multisensor nonlinear dynamic systems.

2.2. Gaussian Processes

A Gaussian process is defined over functions and is a generalization of the Gaussian probability distribution [8]. This means that we consider inference directly in function space. Formally, a Gaussian process is defined in [8] as "a collection of random variables, any finite number of which have a joint Gaussian distribution". As with a Gaussian distribution, knowledge of both the mean and covariance function specifies a Gaussian process. The mean function $m(x)$ and covariance function (also called a kernel) $k(x, x')$ of a Gaussian process $f(x)$ are defined as follows:

$m(x) = \mathbb{E}[f(x)]$,
$k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]$,

where $\mathbb{E}[\cdot]$ denotes the expectation. Thus, we write the Gaussian process as $f(x) \sim \mathcal{GP}(m(x), k(x, x'))$. Unless stated otherwise, the prior mean function is assumed to be 0. The choice of covariance function depends on the application [14]. In this paper, we employ the squared exponential (SE) kernel, the kernel most commonly used in machine learning, which is the prototypical stationary covariance function and is useful for modeling particularly smooth functions. It is defined as

$\mathrm{Cov}(f(x), f(x')) = k(x, x') = \alpha^2 \exp\{-\frac{1}{2}(x - x')^T \Lambda^{-1} (x - x')\}$, (3)

where the parameter $\alpha^2$ represents the variance of the function $f$, which controls the uncertainty of predictions in areas of low training sample density, and the parameter $\Lambda$ is a diagonal matrix of the characteristic length-scales of the SE kernel. Other commonly employed kernel functions can be found in Reference [8]. A Gaussian process implies a distribution over functions based upon the obtained training data. Assume that we have obtained a set of training data $\mathcal{X} = \{X, y\}$. $X$ and $y$ are made up of multiple samples drawn from the following standard Gaussian process regression model:

$y = f(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$, (4)

where $f : \mathbb{R}^D \to \mathbb{R}$ and $f(x) \sim \mathcal{GP}$; $\mathcal{N}$ denotes the normalized Gaussian probability density function.
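To make the kernel concrete, the following minimal NumPy sketch evaluates the SE kernel (3) pairwise over two input sets. It assumes the common convention that the diagonal of $\Lambda$ holds the squared characteristic length-scales; the helper name se_kernel is ours, not from the paper.

```python
import numpy as np

def se_kernel(X1, X2, alpha2, lengthscales):
    """SE kernel (3): k(x, x') = alpha^2 exp(-0.5 (x - x')^T Lambda^{-1} (x - x')),
    assuming Lambda = diag(lengthscales**2). Inputs are stored row-wise."""
    Z1 = X1 / lengthscales            # scale each dimension by its length-scale
    Z2 = X2 / lengthscales
    # pairwise squared distances between the scaled inputs
    d2 = (np.sum(Z1 ** 2, axis=1)[:, None]
          + np.sum(Z2 ** 2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    return alpha2 * np.exp(-0.5 * np.maximum(d2, 0.0))

# toy usage: a 5-point, 2-dimensional training set
X = np.random.default_rng(0).normal(size=(5, 2))
K = se_kernel(X, X, alpha2=1.0, lengthscales=np.array([1.0, 2.0]))
```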

Note that Gaussian process regression uses the fact that any finite set of training data and testing data of a Gaussian process is jointly Gaussian distributed. Let $\theta = \{\alpha^2, \Lambda, \sigma_\varepsilon^2\}$ be the hyper-parameters of the Gaussian process. Using the evidence maximization method [8,50], we obtain a point estimate $\hat\theta = \arg\max_\theta \log p(y \mid X, \theta)$ from the known training data [51]. We can solve this optimization problem with numerical optimization techniques such as conjugate gradient ascent [8,14]. Next, we infer the posterior distribution over the function value $f(x_*)$ from the training data for each input $x_* \in \mathbb{R}^D$. The test input $x_*$ can be categorized into two classes, depending on whether it is deterministic or uncertain. The corresponding predictive distributions over $f(x_*)$ are presented as follows.

(1) Predictive Distribution over a Univariate Function

Suppose that the training data are $\mathcal{X} = (X, y) = \{(x_i, y_i) : i = 1, \ldots, n\}$, where $x_i$ is a D-dimensional input vector, $y_i$ is a scalar output, and $n$ is the number of training data. Two cases, deterministic inputs and uncertain inputs, are considered.

Deterministic Inputs

Conditioned on the training data and the deterministic test input $x_*$, the predictive distribution over $f(x_*)$ is Gaussian with mean

$m_f(x_*) = \mathbb{E}_f[f_*] = k_*^T (K + \sigma_\varepsilon^2 I)^{-1} y = k_*^T \beta$, (5)

and variance

$\sigma_f^2(x_*) = \mathrm{Var}_f[f_*] = k_{**} - k_*^T (K + \sigma_\varepsilon^2 I)^{-1} k_*$, (6)

where $\mathrm{Var}_f[\cdot]$ represents the variance with respect to $f$,

$k_* := k(X, x_*)$, $k_{**} := k(x_*, x_*)$, $\beta := (K + \sigma_\varepsilon^2 I)^{-1} y$,

and $K$ is the kernel matrix with elements $K_{ij} = k(x_i, x_j)$. The uncertainty of the prediction is characterized by the variance $\sigma_f^2(x_*)$. More details can be found in Reference [8].

Uncertain Inputs

When the test input is uncertain, namely $x_*$ has a probability distribution, the prediction problem is relatively difficult. Let us consider the predictive distribution of $f(x_*)$ for the uncertain input $x_* \sim \mathcal{N}(\mu, \Sigma)$. According to the results in Reference [7] (or References [52,53]), the predictive distribution over the function value is

$p(f(x_*) \mid \mu, \Sigma) = \int p(f(x_*) \mid x_*)\, p(x_* \mid \mu, \Sigma)\, dx_*$, (7)

where the mean and variance of the distribution $p(f(x_*) \mid x_*)$ are given by Equations (5) and (6), respectively.
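The deterministic-input moments (5) and (6) that enter Equation (7) amount to one linear solve against $K + \sigma_\varepsilon^2 I$. Below is a hedged sketch, reusing the se_kernel helper above; for the SE kernel, $k_{**} = \alpha^2$.

```python
import numpy as np

def gp_predict(X, y, x_star, alpha2, lengthscales, sigma2_eps):
    """Predictive mean (5) and variance (6) at a deterministic test input x_star."""
    n = len(y)
    K = se_kernel(X, X, alpha2, lengthscales)                          # kernel matrix K
    k_star = se_kernel(X, x_star[None, :], alpha2, lengthscales)[:, 0] # k* = k(X, x*)
    A = K + sigma2_eps * np.eye(n)                                     # K + sigma_eps^2 I
    beta = np.linalg.solve(A, y)                                       # beta = A^{-1} y
    mean = k_star @ beta                                               # m_f(x*) = k*^T beta, Eq. (5)
    var = alpha2 - k_star @ np.linalg.solve(A, k_star)                 # k** - k*^T A^{-1} k*, Eq. (6)
    return mean, var
```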

Based on the conditional expectation and variance formulae, the mean $\mu_*$ and variance $\sigma_*^2$ of the distribution $p(f(x_*) \mid \mu, \Sigma)$ are obtained in closed form. In particular, we have

$\mu_* = \mathbb{E}_{x_*}[\mathbb{E}_f(f(x_*) \mid x_*) \mid \mu, \Sigma] = \mathbb{E}_{x_*}[m_f(x_*) \mid \mu, \Sigma] = \int m_f(x_*)\, \mathcal{N}(x_* \mid \mu, \Sigma)\, dx_* = \beta^T l$, (8)

and

$\sigma_*^2 = \mathbb{E}_{x_*}[m_f(x_*)^2 \mid \mu, \Sigma] + \mathbb{E}_{x_*}[\sigma_f^2(x_*) \mid \mu, \Sigma] - \mathbb{E}_{x_*}[m_f(x_*) \mid \mu, \Sigma]^2 = \beta^T L \beta + \alpha^2 - \mathrm{tr}\big((K + \sigma_\varepsilon^2 I)^{-1} L\big) - \mu_*^2$. (9)

Note that $l$ in Equation (8) is given by $l = [l_1, \ldots, l_n]^T$, with

$l_j = \int k_f(x_j, x_*)\, p(x_*)\, dx_* = \alpha^2 |\Sigma \Lambda^{-1} + I|^{-1/2} \exp\{-\frac{1}{2}(x_j - \mu)^T (\Sigma + \Lambda)^{-1} (x_j - \mu)\}$.

$\mathrm{tr}(\cdot)$ represents the trace in Equation (9), and we have

$L_{ij} = k_f(x_i, \mu)\, k_f(x_j, \mu)\, |2\Sigma\Lambda^{-1} + I|^{-1/2} \exp\{(\bar z_{ij} - \mu)^T (\Sigma + \frac{1}{2}\Lambda)^{-1} \Sigma \Lambda^{-1} (\bar z_{ij} - \mu)\}$,

with

$\bar z_{ij} := \frac{1}{2}(x_i + x_j)$.

In general, the predictive distribution in Equation (7) cannot be calculated analytically, since a Gaussian distribution mapped through a nonlinear function leads to a non-Gaussian distribution. Using the moment-matching method, the distribution $p(f(x_*) \mid \mu, \Sigma)$ can be approximated by the Gaussian distribution $\mathcal{N}(\mu_*, \sigma_*^2)$.
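The mean in Equation (8) only requires the vector $l$, which has the closed form above. A sketch follows, under the same assumption as before that $\Lambda$ is the diagonal matrix of squared length-scales; beta is the weight vector from Equation (5).

```python
import numpy as np

def gp_predict_uncertain_mean(X, beta, alpha2, lengthscales, mu, Sigma):
    """Exact predictive mean (8), mu* = beta^T l, for x* ~ N(mu, Sigma)."""
    D = len(mu)
    Lam = np.diag(lengthscales ** 2)           # Lambda (assumed diag of squared length-scales)
    # |Sigma Lambda^{-1} + I|^{-1/2}
    det_term = np.linalg.det(Sigma @ np.linalg.inv(Lam) + np.eye(D)) ** -0.5
    S_inv = np.linalg.inv(Sigma + Lam)         # (Sigma + Lambda)^{-1}
    diff = X - mu                              # rows are x_j - mu
    l = alpha2 * det_term * np.exp(-0.5 * np.sum(diff @ S_inv * diff, axis=1))
    return beta @ l                            # Eq. (8)
```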

(2) Predictive Distribution over a Multivariate Function

For the model (4), let us turn to the multiple output case, $f : \mathbb{R}^D \to \mathbb{R}^E$, $f \sim \mathcal{GP}$. Following the results in References [7,51], we train $E$ Gaussian process regression models independently. The $a$-th model is learned from the training data $[X, y^a]$, $y^a = [y_1^a, \ldots, y_n^a]^T$, $a = 1, \ldots, E$, where $y_j^a$ is the $a$-th element of $y_j$. This implies that any two target dimensions are conditionally independent given the input. For any deterministic input $x_*$, the mean and variance of each target dimension are obtained by Equations (5) and (6). The predictive mean of $f(x_*)$ is a vector stacked from the $E$ target dimension means, and the corresponding covariance is a diagonal matrix whose diagonal elements are the variances of the target dimensions. With the uncertain input $x_* \sim \mathcal{N}(\mu, \Sigma)$, the predictive mean $\mu_*$ of $f(x_*)$ is likewise stacked from the $E$ predictive means $\mu_*^a$ given by Equation (8), $a = 1, \ldots, E$. Unlike prediction at deterministic inputs, the target dimensions are dependent due to the uncertain input. Denote $f_a^* = f_a(x_*)$; the corresponding predictive covariance matrix is given by

$\Sigma_* = \begin{bmatrix} \mathrm{Var}[f_1^* \mid \mu, \Sigma] & \cdots & \mathrm{Cov}[f_1^*, f_E^* \mid \mu, \Sigma] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[f_E^*, f_1^* \mid \mu, \Sigma] & \cdots & \mathrm{Var}[f_E^* \mid \mu, \Sigma] \end{bmatrix}$. (10)

It is obvious that the predictive covariance is no longer a diagonal matrix. Using Equation (9), we can obtain the diagonal elements of the covariance matrix (10). For each $a, b \in \{1, \ldots, E\}$, the off-diagonal elements satisfy

$\mathrm{Cov}[f_a^*, f_b^* \mid \mu, \Sigma] = \mathbb{E}_{f, x_*}[f_a(x_*) f_b(x_*) \mid \mu, \Sigma] - \mu_*^a \mu_*^b$.

Given $x_*$, $f_a(x_*)$ and $f_b(x_*)$ are independent, so we have

$\mathbb{E}_{f, x_*}[f_a^* f_b^* \mid \mu, \Sigma] = \int\!\!\int f_a^* f_b^*\, p(f_a, f_b \mid x_*)\, p(x_* \mid \mu, \Sigma)\, df\, dx_* = \beta_a^T \Big( \int k_f^a(X, x_*)\, k_f^b(x_*, X)\, p(x_* \mid \mu, \Sigma)\, dx_* \Big) \beta_b$, (11)

where $\beta_a$, $\beta_b$ are obtained analogously to Equation (5). For notational simplicity, define

$L := \int k_f^a(X, x_*)\, k_f^b(x_*, X)\, p(x_* \mid \mu, \Sigma)\, dx_*$,

where

$L_{ij} = \alpha_a^2 \alpha_b^2\, |(\Lambda_a^{-1} + \Lambda_b^{-1}) \Sigma + I|^{-1/2} \exp\{-\frac{1}{2}(x_i - x_j)^T (\Lambda_a + \Lambda_b)^{-1} (x_i - x_j)\} \exp\{-\frac{1}{2}(z_{ij} - \mu)^T R^{-1} (z_{ij} - \mu)\}$,

$R := (\Lambda_a^{-1} + \Lambda_b^{-1})^{-1} + \Sigma$,

$z_{ij} := \Lambda_b (\Lambda_a + \Lambda_b)^{-1} x_i + \Lambda_a (\Lambda_a + \Lambda_b)^{-1} x_j$.

Thus, we again approximate the distribution $p(f(x_*) \mid \mu, \Sigma)$ by the Gaussian distribution $\mathcal{N}(\mu_*, \Sigma_*)$. More details can be found in Reference [51].

3. Multisensor Estimation Fusion

In this section, an essential prerequisite is that the transition function (1) and the measurement functions (2) are either not known or no longer accessible. Thus, we model the latent functions with Gaussian processes. For the N-sensor dynamic systems (1) and (2), suppose that we have obtained some training data of the target state and sensor measurements. In the following, we discuss the centralized and distributed estimation fusion methods for estimating the state from a sequence of noisy sensor measurements.

3.1. Centralized Estimation Fusion

First, we consider the centralized estimation fusion method. To facilitate fusion, we stack the multisensor measurement information as follows:

$y_k = [(y_k^1)^T, \ldots, (y_k^N)^T]^T$, $g(x_k) = [(g^1(x_k))^T, \ldots, (g^N(x_k))^T]^T$, $v_k = [(v_k^1)^T, \ldots, (v_k^N)^T]^T$. (12)

Thus the dynamic system (1) and (2) can be rewritten as

$x_k = h(x_{k-1}) + w_k$, (13)

$y_k = g(x_k) + v_k$, (14)

where the covariance of the noise $v_k$ is given by

$\mathrm{Cov}(v_k) = \Sigma_k = \mathrm{diag}(\Sigma_k^1, \ldots, \Sigma_k^N)$, $\mathrm{Cov}(v_k^m) = \Sigma_k^m$, $m = 1, \ldots, N$.

Considering the Gaussian process dynamic system setup, two Gaussian process models can be trained using evidence maximization. $\mathcal{GP}_h$ models the mapping $x_{k-1} \mapsto x_k$, $\mathbb{R}^D \to \mathbb{R}^D$, and $\mathcal{GP}_g$ models the mapping $x_k \mapsto y_k$, $\mathbb{R}^D \to \mathbb{R}^{NE}$. As we know, the key to the estimation is to recursively infer the posterior distribution over the state $x_k$ of the dynamic system (13)-(14) based on all the sensor measurements $y_{1:k} = \{y_\tau\}_{\tau=1}^k$; this comprises the prediction step and the update step. In particular, the prediction step uses the posterior distribution from the previous time step to produce a predictive distribution of the state at the current time step. In the update step, the current predictive distribution is combined with the current measurement information to refine the posterior distribution at the current time step. Next, the centralized estimation fusion methods with GP-ADF and GP-UKF are presented, respectively.

(1) Fusion with GP-ADF

Note that some approximation methods are required to obtain an analytic posterior distribution for nonlinear dynamic systems. In particular, for the Gaussian process dynamic system, it is easy to make a Gaussian approximation to the posterior distribution. GP-ADF exploits the fact that the true moments of the Gaussian process predictive distribution can be computed in closed form. The predictive distribution is approximated by a Gaussian with the exact predictive mean and covariance [7]. Based on the following lemma, we present the centralized estimation fusion method with GP-ADF.

Lemma 1. For the Gaussian process dynamic system (13)-(14), the posterior distribution of the state $x_k$ can be approximated by the Gaussian distribution $\mathcal{N}(\mu_k^e, C_k^e)$, in which the mean $\mu_k^e$ and covariance $C_k^e$ are given as [7]:

$\mu_k^e = \mu_k^p + C_{x_k y_k}(C_{y_k})^{-1}(y_k - \mu_k^y)$, (15)

$C_k^e = C_k^p - C_{x_k y_k}(C_{y_k})^{-1} C_{x_k y_k}^T$, (16)

where

$\mu_k^p = \mathbb{E}[x_k \mid y_{1:k-1}]$, $C_k^p = \mathrm{Cov}(x_k \mid y_{1:k-1})$, $\mu_k^y = \mathbb{E}[y_k \mid y_{1:k-1}]$, $C_{y_k} = \mathrm{Cov}(y_k \mid y_{1:k-1})$, $C_{x_k y_k} = \mathrm{Cov}(x_k, y_k \mid y_{1:k-1})$. (17)
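Equations (15) and (16) are the standard conditioning of a joint Gaussian on the stacked measurement. A minimal sketch is given below; the explicit inverse of $C_{y_k}$ is for clarity only (a linear solver would be preferred in practice), and the function is illustrative rather than the authors' implementation.

```python
import numpy as np

def gaussian_update(mu_p, C_p, mu_y, C_y, C_xy, y):
    """Moment-matched update (15)-(16): condition the joint Gaussian over
    (x_k, y_k) on the stacked multisensor measurement y_k."""
    G = C_xy @ np.linalg.inv(C_y)      # gain C_{x_k y_k} (C_{y_k})^{-1}
    mu_e = mu_p + G @ (y - mu_y)       # posterior mean, Eq. (15)
    C_e = C_p - G @ C_xy.T             # posterior covariance, Eq. (16)
    return mu_e, C_e
```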

Remark 1. Since the GP-ADF algorithm [7] is a state estimation method for the single sensor Gaussian process dynamic system, it is applicable to the multisensor system once all the sensor measurements are stacked into one vector. Therefore, Equations (15) and (16) are called the centralized estimation fusion method with GP-ADF here.

Then, let us present the prediction step and update step of the above fusion method in detail.

Prediction Step

First, this step uses the previous posterior distribution $p(x_{k-1} \mid y_{1:k-1}) \sim \mathcal{N}(\mu_{k-1}^e, C_{k-1}^e)$ to produce the current predictive distribution $p(x_k \mid y_{1:k-1})$. We write the predictive distribution as

$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1}$,

and $p(x_k \mid x_{k-1})$ is a Gaussian distribution because $h \sim \mathcal{GP}$ and $w_k \sim \mathcal{N}(0, \Sigma_w)$. From this, an analogy with Equation (7) yields the mean $\mu_k^p$ and covariance $C_k^p$ of the state $x_k$ conditioned on the measurements $y_{1:k-1}$. In particular,

$\mu_k^p = \mathbb{E}(x_k \mid y_{1:k-1}) = \mathbb{E}[h(x_{k-1}) \mid y_{1:k-1}] = \mathbb{E}_{x_{k-1}}[\mathbb{E}_h(h(x_{k-1}) \mid x_{k-1}) \mid y_{1:k-1}]$. (18)

$\mu_k^p$ is a D-dimensional mean vector, and the computation of each target dimension can be seen in Equation (8). The corresponding covariance matrix is

$C_k^p = \mathrm{Cov}(h(x_{k-1}) \mid y_{1:k-1}) + \mathrm{Cov}(w_k) = \Sigma_h + \Sigma_w$, (19)

where $\Sigma_w$ is the covariance matrix of the transition noise obtained by the evidence maximization method, and the computation of $\Sigma_h$ can be seen in Equation (10).

Update Step

Next, we approximate the joint distribution $p(x_k, y_k \mid y_{1:k-1})$ with a joint Gaussian $\mathcal{N}(\mu_k, C_k)$, where

$\mu_k = \begin{bmatrix} \mu_k^p \\ \mu_k^y \end{bmatrix}$, $C_k = \begin{bmatrix} C_k^p & C_{x_k y_k} \\ C_{x_k y_k}^T & C_{y_k} \end{bmatrix}$.

Note that we are not aiming to approximate the distribution $p(x_k \mid y_{1:k})$ directly. To obtain the joint Gaussian approximation, we use the Gaussian distribution $\mathcal{N}(\mu_k^y, C_{y_k})$ to approximate the distribution $p(y_k \mid y_{1:k-1})$, which can be done in the same way as the prediction step, since

$p(y_k \mid y_{1:k-1}) = \int p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, dx_k$.

On the other hand, the cross term satisfies

$C_{x_k y_k} = \mathbb{E}_{x_k, g}[x_k (y_k)^T \mid y_{1:k-1}] - \mu_k^p (\mu_k^y)^T$.

For the unknown term $\mathbb{E}_{x_k, g}[x_k (y_k)^T \mid y_{1:k-1}]$, we have, for each output dimension $\tilde a = 1, 2, \ldots, NE$,

$\mathbb{E}_{x_k, g^{\tilde a}}[x_k y_k^{\tilde a} \mid y_{1:k-1}] = \mathbb{E}_{x_k, g^{\tilde a}}[x_k g^{\tilde a}(x_k) \mid y_{1:k-1}] = \int x_k \Big[ \sum_{i=1}^n \beta_{\tilde a i}\, k_{g^{\tilde a}}(x_k, x_i) \Big] p(x_k)\, dx_k = \sum_{i=1}^n \beta_{\tilde a i} \int x_k\, c_1 \mathcal{N}(x_k \mid x_i, \Lambda_{\tilde a})\, \mathcal{N}(x_k \mid \mu_k^p, C_k^p)\, dx_k = c_1 (c_2)^{-1} \sum_{i=1}^n \beta_{\tilde a i}\, \psi(x_i, \mu_k^p)$, (20)

where $c_1^{-1}$ is the normalization constant of the unnormalized SE kernel, $\psi(x_i, \mu_k^p)$ is the mean of the new Gaussian distribution given by the product of the two Gaussian density functions, and $c_2^{-1}$ is the normalization constant of that new Gaussian distribution. Consequently, from the joint distribution $p(x_k, y_k \mid y_{1:k-1})$, we obtain the posterior distribution $p(x_k \mid y_{1:k}) \sim \mathcal{N}(\mu_k^e, C_k^e)$ with the mean and covariance given by Equations (15) and (16), respectively.

Remark 2. From Equations (15) and (16), we can see that the centralized fusion method with GP-ADF is similar to the unified optimal linear estimation fusion [40,54]. The difference lies in the computation of $\mu_k^p$, $C_k^p$, $\mu_k^y$, $C_{y_k}$ and $C_{x_k y_k}$, which here are mainly based on the training data due to the non-parametric dynamic system model.

(2) Fusion with GP-UKF

Following the single sensor GP-UKF (see, e.g., References [14,26]), we present the prediction step and update step of the GP-UKF fusion.

Prediction Step

At time $k-1$, based on the unscented transform [1], we obtain the sigma-point set $\mathcal{X}_{k-1}$ containing $2D+1$ points from the mean $\mu_{k-1}^e$ and covariance $C_{k-1}^e$. Using the GP prediction model, the transformed set is given by

$\bar{\mathcal{X}}_k^{[i]} = GP_h^{[\mu]}(\mathcal{X}_{k-1}^{[i]})$, for $i = 1, \ldots, 2D+1$, (21)

and the process noise is computed as

$Q_k = GP_h^{[\Sigma]}(\mu_{k-1}^e)$, (22)

where $GP_h^{[\mu]}$ and $GP_h^{[\Sigma]}$ are the mean and covariance models with respect to the Gaussian process $\mathcal{GP}_h$. The predictive mean and covariance are

$\mu_k^p = \sum_{i=1}^{2D+1} W^{[i]} \bar{\mathcal{X}}_k^{[i]}$, (23)

$C_k^p = \sum_{i=1}^{2D+1} W^{[i]} (\bar{\mathcal{X}}_k^{[i]} - \mu_k^p)(\bar{\mathcal{X}}_k^{[i]} - \mu_k^p)^T + Q_k$, (24)

where $W^{[i]}$ is the weight generated in the unscented transform. Using the predictive mean $\mu_k^p$ and covariance $C_k^p$, we obtain $\hat{\mathcal{X}}_k^{[i]}$ through the unscented transform. Thus, based on the GP observation model,

$\hat{\mathcal{Y}}_k^{[i]} = GP_g^{[\mu]}(\hat{\mathcal{X}}_k^{[i]})$, for $i = 1, \ldots, 2D+1$, (25)

and the observation noise matrix is determined by

$R_k = GP_g^{[\Sigma]}(\hat{\mathcal{X}}_k^{[i]})$. (26)

Then the predicted observation and innovation covariance are calculated by

$\hat y_k = \sum_{i=1}^{2D+1} W^{[i]} \hat{\mathcal{Y}}_k^{[i]}$, (27)

$S_k = \sum_{i=1}^{2D+1} W^{[i]} (\hat{\mathcal{Y}}_k^{[i]} - \hat y_k)(\hat{\mathcal{Y}}_k^{[i]} - \hat y_k)^T + R_k$. (28)

Meanwhile, the cross covariance is given by

$C_{x_k y_k} = \sum_{i=1}^{2D+1} W^{[i]} (\hat{\mathcal{X}}_k^{[i]} - \mu_k^p)(\hat{\mathcal{Y}}_k^{[i]} - \hat y_k)^T$. (29)

Update Step

Finally, the update is obtained by the following equations:

$\mu_k^e = \mu_k^p + C_{x_k y_k} S_k^{-1} (y_k - \hat y_k)$, (30)

$C_k^e = C_k^p - C_{x_k y_k} S_k^{-1} C_{x_k y_k}^T$. (31)

Remark 3. Note that GP-ADF propagates the full Gaussian density by exploiting specific properties of Gaussian process models [7]. GP-UKF maps the sigma points through the Gaussian process models instead of the parametric functions, and the distributions are described by finite-sample approximations. In contrast to GP-UKF, GP-ADF is consistent and moment preserving [7]. In addition, GP-EKF requires linearization of the nonlinear prediction and observation models in order to propagate the state and observation, respectively [14]. GP-PF needs to perform one Gaussian process mean and variance computation per particle and has a very high computational cost. More details about these single sensor filtering methods can be found in [7,14]. Due to the linearization loss of GP-EKF and the high computational cost of GP-PF, we do not consider them and only introduce multisensor estimation fusion with GP-ADF and GP-UKF in this paper.
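A compact sketch of one GP-UKF cycle (21)–(31) follows. The sigma-point weighting with a scalar kappa is one common choice of unscented transform, not necessarily the one used in the paper; gp_h_mean and gp_g_mean are placeholders for the trained GP mean models $GP_h^{[\mu]}$ and $GP_g^{[\mu]}$, and the noise matrices $Q_k$ and $R_k$, produced by the GP variance models in Equations (22) and (26), are simply passed in as arguments here.

```python
import numpy as np

def sigma_points(mu, C, kappa=1.0):
    """2D+1 sigma points of N(mu, C) with their weights (unscented transform [1])."""
    D = len(mu)
    S = np.linalg.cholesky((D + kappa) * C)          # matrix square root
    pts = np.vstack([mu] + [mu + S[:, i] for i in range(D)]
                         + [mu - S[:, i] for i in range(D)])
    w = np.full(2 * D + 1, 0.5 / (D + kappa))
    w[0] = kappa / (D + kappa)
    return pts, w

def gp_ukf_step(mu_prev, C_prev, y, gp_h_mean, gp_g_mean, Q, R):
    """One GP-UKF prediction/update cycle, Eqs. (21)-(31)."""
    # prediction step, Eqs. (21)-(24)
    Xs, w = sigma_points(mu_prev, C_prev)
    Xs_bar = np.array([gp_h_mean(x) for x in Xs])    # Eq. (21)
    mu_p = w @ Xs_bar                                # Eq. (23)
    dX = Xs_bar - mu_p
    C_p = dX.T @ (w[:, None] * dX) + Q               # Eq. (24)
    # update step, Eqs. (25), (27)-(31)
    Xs_hat, w = sigma_points(mu_p, C_p)
    Ys = np.array([gp_g_mean(x) for x in Xs_hat])    # Eq. (25)
    y_hat = w @ Ys                                   # Eq. (27)
    dY = Ys - y_hat
    S = dY.T @ (w[:, None] * dY) + R                 # innovation covariance, Eq. (28)
    C_xy = (Xs_hat - mu_p).T @ (w[:, None] * dY)     # cross covariance, Eq. (29)
    G = C_xy @ np.linalg.inv(S)
    return mu_p + G @ (y - y_hat), C_p - G @ C_xy.T  # Eqs. (30)-(31)
```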

3.2. Distributed Estimation Fusion

In this subsection, we mainly present the distributed estimation fusion with GP-ADF. For the dynamic systems (1) and (2), assume that there is a prior on the initial state $x_0$ and that each local sensor sends its estimate to the fusion center. The posterior distribution of the $m$-th local state estimate is then a Gaussian distribution $\mathcal{N}(\mu_k^{e,m}, C_k^{e,m})$, where $\mu_k^{e,m}$ and $C_k^{e,m}$ are calculated as follows:

$\mu_k^{e,m} = \mu_k^{p,m} + C_{x_k y_k^m}(C_{y_k^m})^{-1}(y_k^m - \mu_k^{y,m})$, (32)

$C_k^{e,m} = C_k^{p,m} - C_{x_k y_k^m}(C_{y_k^m})^{-1} C_{x_k y_k^m}^T$, (33)

with

$\mu_k^{p,m} = \mathbb{E}[x_k \mid y_{1:k-1}^m]$, $C_k^{p,m} = \mathrm{Cov}(x_k \mid y_{1:k-1}^m)$, $\mu_k^{y,m} = \mathbb{E}[y_k^m \mid y_{1:k-1}^m]$, $C_{y_k^m} = \mathrm{Cov}(y_k^m \mid y_{1:k-1}^m)$, $C_{x_k y_k^m} = \mathrm{Cov}(x_k, y_k^m \mid y_{1:k-1}^m)$. (34)

Then, the local posterior means and covariances are transmitted to the fusion center to yield the posterior distribution of the global state estimate. In general, centralized estimation fusion, which fuses the raw data directly, has better fusion performance than distributed estimation fusion based on processed data. However, in some cases the distributed estimation fusion is equivalent to the centralized estimation fusion. In particular, in what follows we propose a distributed estimation fusion method and prove that it is equivalent to the above centralized estimation fusion method when all cross terms $C_{x_k y_k^m}$, $m = 1, \ldots, N$, have full column rank.

Proposition 1. Assume that all cross terms $C_{x_k y_k^m}$, $m = 1, \ldots, N$, have full column rank. Then the distributed estimation fusion is equivalent to the centralized estimation fusion as follows:

$\mu_k^e = \mu_k^p + \sum_{m=1}^N C_{x_k y_k}(C_{y_k})^{-1}(m)\,(\mu_k^{y,m} - \mu_k^y(m)) + \sum_{m=1}^N C_{x_k y_k}(C_{y_k})^{-1}(m)\, C_{y_k^m} C_{x_k y_k^m}^{+}\,(\mu_k^{e,m} - \mu_k^{p,m})$, (35)

where

$\mu_k^y = (\mu_k^y(1), \ldots, \mu_k^y(N))$, $\mu_k^y(m) = \mathbb{E}(y_k^m \mid y_{1:k-1})$,

and $(C_{y_k})^{-1} = [(C_{y_k})^{-1}(1), (C_{y_k})^{-1}(2), \ldots, (C_{y_k})^{-1}(N)]$ is an appropriate partition of the matrix $(C_{y_k})^{-1}$ such that Equation (35) holds. The superscript "+" stands for the pseudoinverse.

Proof. According to Equation (15), the mean of the centralized fused state is given by

$\mu_k^e = \mu_k^p + C_{x_k y_k}(C_{y_k})^{-1}(y_k - \mu_k^y)$ (36)

$= \mu_k^p - C_{x_k y_k}(C_{y_k})^{-1}\mu_k^y + C_{x_k y_k}(C_{y_k})^{-1} y_k$ (37)

$= \mu_k^p - C_{x_k y_k}(C_{y_k})^{-1}\mu_k^y + C_{x_k y_k}\sum_{m=1}^N (C_{y_k})^{-1}(m)\, y_k^m$. (38)

Based on the assumption that all $C_{x_k y_k^m}$ have full column rank, we obtain

$C_{x_k y_k^m}^{+} C_{x_k y_k^m} = I$, $m = 1, \ldots, N$. (39)

Thus, combining Equations (37) and (38) with Equation (39) yields

$C_{x_k y_k}(C_{y_k})^{-1} y_k = C_{x_k y_k} \sum_{m=1}^N (C_{y_k})^{-1}(m)\, y_k^m = \sum_{m=1}^N C_{x_k y_k}(C_{y_k})^{-1}(m)\, C_{y_k^m} C_{x_k y_k^m}^{+}\, C_{x_k y_k^m} (C_{y_k^m})^{-1} y_k^m$. (40)

In order to obtain the centralized estimation mean, that is, to fuse the local state estimation means at the fusion center, we use Equations (32) and (40) to eliminate $y_k$ from Equation (36). From Equation (32), we have

$C_{x_k y_k^m}(C_{y_k^m})^{-1} y_k^m = \mu_k^{e,m} - \mu_k^{p,m} + C_{x_k y_k^m}(C_{y_k^m})^{-1}\mu_k^{y,m}$. (41)

Thus, substituting Equation (41) into Equation (40) and recalling Equations (36)-(38), we have

$\mu_k^e = \mu_k^p - C_{x_k y_k}(C_{y_k})^{-1}\mu_k^y + \sum_{m=1}^N C_{x_k y_k}(C_{y_k})^{-1}(m)\, C_{y_k^m} C_{x_k y_k^m}^{+} \big(\mu_k^{e,m} - \mu_k^{p,m} + C_{x_k y_k^m}(C_{y_k^m})^{-1}\mu_k^{y,m}\big)$
$= \mu_k^p + \sum_{m=1}^N C_{x_k y_k}(C_{y_k})^{-1}(m)\,(\mu_k^{y,m} - \mu_k^y(m)) + \sum_{m=1}^N C_{x_k y_k}(C_{y_k})^{-1}(m)\, C_{y_k^m} C_{x_k y_k^m}^{+}\,(\mu_k^{e,m} - \mu_k^{p,m})$.

Therefore, we obtain the state mean of the distributed estimation fusion.

Remark 4. The proposed distributed estimation fusion formula is equivalent to the centralized estimation fusion; therefore, the two fusion methods have the same fusion performance. However, distributed estimation fusion is more desirable in real applications in terms of reliability, survivability, communication bandwidth and computational burden. From Equation (35), we can see that $\mu_k^p$, $C_k^p$, $\mu_k^y$, $C_{y_k}$, $C_{x_k y_k}$ can be calculated in advance, and the terms $\mu_k^{p,m}$, $\mu_k^{y,m}$, $\mu_k^{e,m}$, $C_{y_k^m}$, $C_{x_k y_k^m}$ need to be transmitted by the local sensors in real time.

Remark 5. From the update step (30) of the centralized estimation fusion with GP-UKF, the distributed estimation fusion method (35) is also suitable for GP-UKF. Since the detailed description and proof for the distributed GP-UKF fusion are similar to Proposition 1, we omit them. Note that the distributed GP-UKF fusion differs from the distributed GP-ADF fusion in the prediction step.
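The fusion rule (35) can be evaluated at the fusion center from the precomputed global moments and the per-sensor messages listed in Remark 4. Below is a sketch, under the assumption that each $(C_{y_k})^{-1}(m)$ is the column block of $(C_{y_k})^{-1}$ conformal with $y_k^m$; the tuple layout of the per-sensor messages is our illustrative choice.

```python
import numpy as np

def distributed_fusion_mean(mu_p, mu_y, C_xy, C_y, locals_):
    """Fusion-center reconstruction of the centralized mean via Eq. (35).
    mu_p, mu_y, C_xy, C_y are the fusion center's precomputed moments (Remark 4);
    locals_ holds per-sensor tuples (mu_pm, mu_em, mu_ym, C_ym, C_xym) sent in
    real time by the local GP filters."""
    Cy_inv = np.linalg.inv(C_y)
    mu_e = mu_p.astype(float).copy()
    col = 0
    for mu_pm, mu_em, mu_ym, C_ym, C_xym in locals_:
        lm = len(mu_ym)
        W = C_xy @ Cy_inv[:, col:col + lm]         # C_{x_k y_k} (C_{y_k})^{-1}(m)
        mu_e += W @ (mu_ym - mu_y[col:col + lm])   # first sum in Eq. (35)
        mu_e += W @ C_ym @ np.linalg.pinv(C_xym) @ (mu_em - mu_pm)   # second sum
        col += lm
    return mu_e
```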

4. Numerical Examples

In this section, we assess the performance of our fusion methods with two different examples.

4.1. 1D Example

We consider the nonlinear dynamic system with two sensor measurements as follows:

$x_{k+1} = 0.5 x_k + \frac{25 x_k}{1 + x_k^2} + w_k$, $w_k \sim \mathcal{N}(0, \sigma^2)$, (42)

$y_k^m = 5 \sin(2 x_k + z^m) + v_k^m$, $v_k^m \sim \mathcal{N}(0, 0.1^2)$, $m = 1, 2$, (43)

where $z^m$, $m = 1, 2$, are given by $z^1 = 0$ and $z^2 = \frac{\pi}{4}$, respectively. This nonlinear system is widely used (see References [7,10]) to measure the performance of nonlinear filters. We regard the first 200 samples as the training data and use the remaining 50 samples, called testing data, to test the fusion methods. Assume that $x_{200}$ is Gaussian with prior mean $\mu_{200} = 0$ and variance $\sigma_{200}^2 = 1.5^2$. The estimation performance of the single sensor and multisensor fusion is evaluated by the average Mahalanobis distance, which is also used in Reference [7]. The Mahalanobis distance between the ground truth and the filtered mean is defined as follows:

$\mathrm{Maha}_k = (x_k - \hat x_k)^T (C_k^e)^{-1} (x_k - \hat x_k)$, (44)

where $C_k^e$ is the estimated covariance. For the Mahalanobis distance, lower values indicate better performance [7]. Figures 2 and 3 show the average Mahalanobis distances of the single sensor filters with GP-ADF and GP-UKF and of the fusion methods after 1000 independent runs. Figure 4 shows the average Mahalanobis distances for different system noises. According to the average Mahalanobis distances in Figures 2 and 3, the estimation performance of Sensor 2 is better than that of Sensor 1. Furthermore, the multisensor estimation fusion methods perform better than the single sensor filters with GP-ADF and GP-UKF, respectively. Thus the effectiveness and advantages of multisensor estimation fusion with the Gaussian process can be observed. In addition, GP-ADF fusion performs better than GP-UKF fusion for the system noise $\sigma^2 = 1$, and GP-UKF fusion performs better than GP-ADF fusion for the system noise $\sigma^2 = 2$; that is, the two fusions have comparative advantages for different system noises. This suggests that GP-ADF fusion may be a better choice for small system noise and that GP-UKF fusion may be worth considering for relatively large system noise. From Figure 4, the proposed fusion methods enjoy better performance across different system noises, which demonstrates the consistency of our fusion methods.

Figure 2. The average Mahalanobis distances of the single sensor and multisensor fusion with $\sigma^2 = 1$.
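A direct transcription of the performance measure (44) is given below; following the printed form, no square root is taken.

```python
import numpy as np

def mahalanobis(x_true, x_hat, C_e):
    """Performance measure (44): distance between ground truth and filtered mean,
    weighted by the inverse of the estimated covariance C_e."""
    d = np.atleast_1d(x_true - x_hat)
    return float(d @ np.linalg.solve(np.atleast_2d(C_e), d))
```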

Figure 3. The average Mahalanobis distances of the single sensor and multisensor fusion with $\sigma^2 = 2$.

Figure 4. The average Mahalanobis distances for different system noises $\sigma^2$.

4.2. 2D Nonlinear Dynamic System

In this subsection, we evaluate the performance of the fusion methods with GP-ADF and GP-UKF on an example with the constant turn motion model in target tracking. The root mean square error (RMSE) is used as the estimation performance measure. It is defined as follows:

$\mathrm{RMSE}_k = \sqrt{\frac{1}{M} \sum_{j=1}^M \| x_k^j - \hat x_k^j \|_2^2}$, (45)

where $M$ is the total number of simulation runs, $x_k^j$ is the true simulated state for the $j$-th simulation, and $\hat x_k^j$ is the estimated state value for the $j$-th simulation at time $k$, $j = 1, \ldots, M$. The lower the value of the RMSE, the better the performance of the corresponding method.
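Equation (45) is computed per time step over the Monte Carlo runs. A sketch follows; the run/time/state axis layout of the arrays is our assumption about the data organization.

```python
import numpy as np

def rmse_per_step(X_true, X_hat):
    """RMSE_k of Eq. (45) from arrays of shape (M, K, D): M Monte Carlo runs,
    K time steps, D state dimensions; returns one value per time step."""
    err2 = np.sum((X_true - X_hat) ** 2, axis=2)   # ||x_k^j - xhat_k^j||_2^2
    return np.sqrt(np.mean(err2, axis=0))          # average over the M runs, then root
```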

4.2.1. Experimental Setup

We consider the constant turn motion model with two sensors as follows:

$x_{k+1} = F x_k + w_k$, $w_k \sim \mathcal{N}(0, Q_k)$, (46)

$y_k^m = \begin{bmatrix} \sqrt{(x_k(1) - z^m(1))^2 + (x_k(2) - z^m(2))^2} \\ \arctan \frac{x_k(2) - z^m(2)}{x_k(1) - z^m(1)} \end{bmatrix} + v_k^m$, $v_k^m \sim \mathcal{N}(0, R_k)$, $m = 1, 2$, (47)

where the state transition matrix $F$ is given by

$F = \begin{bmatrix} \cos\Omega & -\sin\Omega \\ \sin\Omega & \cos\Omega \end{bmatrix}$

with angular velocity $\Omega$, the covariance matrix of the transition noise satisfies

$Q_k = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}$,

and the covariance matrix of the measurement noise is

$R_k = \begin{bmatrix} 5^2 & 0 \\ 0 & (0.1 \frac{\pi}{180})^2 \end{bmatrix}$.

The sensor positions are given by $z^1 = [-200, 200]^T$ and $z^2 = [200, 200]^T$, respectively. We use the transition function (46) and measurement functions (47) to simulate a group of data. The values of the angular velocity are $\Omega = \frac{2\pi}{90}$ rad/s and $\Omega = 0.5$ rad/s, which correspond to the small turn motion model and the large turn motion model, respectively. The first $\kappa$ samples are regarded as the training data and the remaining 50 samples, called testing data, are used to test the fusion methods. The $\{x_\tau, y_\tau^1, y_\tau^2\}_{\tau=-\kappa}^{-1}$ are the true simulated states and two-sensor observations used to train the models. Assume that $x_{-\kappa}$ is Gaussian with prior mean $\mu_{-\kappa} = [0, 0]^T$ and identity covariance $S_{-\kappa} = I$. The initial value of the filter for each sensor is set as the Cartesian coordinate transformation of the polar coordinate observation, with a random perturbation in each dimension. We use the random perturbation $\mathcal{N}(10, 1)$ for the $\kappa = 300$ training data case and $\mathcal{N}(20, 1)$ for the $\kappa = 30, 60$ training data cases, respectively. The initial value of the fusion filter is the mean of the initial values of all sensors. Based on the available observation information, the outputs of GP-ADF and GP-UKF for each single sensor are the mean and covariance terms $\mu_k^{p,m}$, $C_k^{p,m}$, $\mu_k^{e,m}$, $C_k^{e,m}$, $\mu_k^{y,m}$, $C_{y_k^m}$, $C_{x_k y_k^m}$. The distributed fusion methods, the RCC-CI algorithm [37] and the convex combination method [43], are used as comparisons with our distributed fusion method. The RCC-CI algorithm and the convex combination method directly use the estimated means and covariance matrices to fuse the estimated results. Our methods take full advantage of the results of the single sensor Gaussian process filters, such as the cross term $C_{x_k y_k^m}$, which is easily obtained. Note that all of the fusion methods are based on local estimates distilled from the measurements; the comparison is therefore fair given the available observation information. The simulation results of the RCC-CI algorithm are based on the YALMIP toolbox [55]. We compare the fusion methods with the RCC-CI algorithm and the convex combination method by RMSE after $M = 1000$ independent simulation runs. Meanwhile, we also compare the computation time of the multisensor estimation fusion methods with that of the RCC-CI algorithm and the convex combination method. The ratio of full rank, defined as the percentage of runs in which the cross term $C_{x_k y_k^m}$ of the single sensor filter has full column rank at each time step over $M = 1000$ independent runs, is used to test the condition of equivalence. If the ratio of full rank is less than 100%, the condition is broken. The simulations are carried out in Matlab (MathWorks, Inc., Natick, MA, USA) R2018b on a ThinkPad W540.
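For reproducing training/testing data of this kind, a sketch of a simulator for (46) and (47) is given below. np.arctan2 replaces the printed arctan to keep the bearing in the correct quadrant, and the zero initial state and fixed seed are our assumptions, not the paper's settings.

```python
import numpy as np

def simulate_constant_turn(K, omega, sensors, q=5.0, sigma_r=5.0,
                           sigma_b=0.1 * np.pi / 180.0, x0=None, seed=0):
    """Generate states and range/bearing measurements from the constant turn
    model (46)-(47); q, sigma_r, sigma_b mirror Q_k and R_k above."""
    rng = np.random.default_rng(seed)
    F = np.array([[np.cos(omega), -np.sin(omega)],
                  [np.sin(omega),  np.cos(omega)]])
    x = np.zeros(2) if x0 is None else np.asarray(x0, float)
    X, Y = [], []
    for _ in range(K):
        x = F @ x + rng.normal(0.0, np.sqrt(q), size=2)           # Eq. (46)
        meas = []
        for z in sensors:
            d = x - z
            r = np.hypot(d[0], d[1]) + rng.normal(0.0, sigma_r)   # range
            b = np.arctan2(d[1], d[0]) + rng.normal(0.0, sigma_b) # bearing
            meas.append([r, b])                                   # Eq. (47)
        X.append(x.copy())
        Y.append(meas)
    return np.array(X), np.array(Y)

# e.g., 300 training steps with Omega = 2*pi/90 rad/s and the two sensors above
X, Y = simulate_constant_turn(300, 2.0 * np.pi / 90.0,
                              [np.array([-200.0, 200.0]), np.array([200.0, 200.0])])
```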

Entropy 2019, 21, 1126 17 of 26 ratio of full ran is less than 100%, the condition is broen. The simulations are done under Matlab (MathWors, Inc., Natic, MA, USA) R2018b with ThinPad W540. Figure 5 describes the ratios of full column ran in the single sensor case for testing data with κ = 300 training data and Ω = 2π 2π 90, 0.5 rad/s. The RMSEs for Ω = 90, 0.5 rad/s in the case of κ = 300 training data are depicted in Figures 6 and 7, respectively. The average computation time of these fusion methods for Ω = 2π 90, 0.5 rad/s in the case of κ = 300 training data after 1000 independent simulation runs are depicted in Figures 8 and 9, respectively. The diamond line represents the RMSE of the centralized estimation fusion with GP-ADF (CMF-GP-ADF) and the star line represents the distributed estimation fusion with GP-ADF (DMF-GP-ADF). The circle line represents the RMSE of the centralized estimation fusion with GP-UKF (CMF-GP-UKF) and the x line represents the distributed estimation fusion with GP-UKF (DMF-GP-UKF). The upper triangle line represents the RMSE of the RCC-CI fusion algorithm with GP-ADF (RCC-CI-GP-ADF) and the lower triangle line represents the RMSE of the RCC-CI fusion algorithm with GP-UKF (RCC-CI-GP-UKF). The square line represents the RMSE of the convex combination method, that is, covariance weight, with GP-ADF (CW-GP-ADF) and the + line represents the RMSE of the convex combination method with GP-UKF (CW-GP-UKF). At the same time, solid line is used for the single sensor GP-ADF filter and dash-dotted line is used for the single GP-UKF filter. Similarly, the ratios of full column ran in the single sensor case for testing data with κ = 30, 60 training data and Ω = 2π 90 rad/s are depicted in Figures 10 and 11, respectively. The RMSEs for Ω = 2π 90 rad/s in the case of κ = 30, 60 training data are depicted in Figures 12 and 13, respectively. The average computation time of these fusion methods for testing data with κ = 30, 60 training data and Ω = 2π 90 rad/s after 1000 independent simulation runs are depicted in Figures 14 and 15, respectively. 4.2.2. Experimental Analysis We divide the experimental analysis into two cases, that is, the equivalence condition is satisfied or not. Some phenomena and analysis are described as follows: Case 1: the equivalence condition is satisfied. From Figure 5, we can see that the ratios of full ran are all equal to 100%. It means that the cross terms of the single sensor filters are all full column ran for the κ = 300 training data case and thus the equivalence condition of centralized fusion and distributed fusion is satisfied in Proposition 1. Meanwhile, from Figures 6 and 7, the RMSE of the distributed estimation fusion is the same as that of the centralized estimation fusion based on GP-ADF and GP-UKF, respectively. It demonstrates the equivalence between the centralized estimation fusion and the distributed estimation fusion in Proposition 1 under the condition of full column ran. From Figures 6 and 7, it can be seen that the RMSE of the multisensor estimation fusion method is lower than that of the single sensor filtering. It shows that the multisensor fusion improves the estimation accuracy. In addition, the RMSE of multisensor estimation fusion method is lower than that of the RCC-CI algorithm and the convex combination method. It implies the effectiveness of the fusion methods based on Gaussian processes. 
The possible reason is that our estimation fusion methods extract extra correlation information, whereas the RCC-CI algorithm and the convex combination method only use the local estimates through their means and covariances. Comparing Figure 6 with Figure 7, we find that our methods do well in the different angular velocity cases, that is, the small turn motion model and the large turn motion model. This confirms that our fusion methods have wide applicability with better performance. From Figures 8 and 9, we find that the computation time of the distributed estimation fusion is less than that of the centralized estimation fusion, which demonstrates the superiority of the distributed estimation fusion given the same fusion performance. The computation time of the proposed fusion methods is much less than that of the RCC-CI algorithm. The possible reason is that the RCC-CI algorithm has to solve an optimization problem. The convex combination method takes the least computation time, since it directly uses the covariance-weighted combination.

Figure 5. The ratios of full column rank of the cross term $C_{x_k y_k^m}$ in the single sensor case for testing data with $\kappa = 300$ training data and $\Omega = \frac{2\pi}{90}, 0.5$ rad/s.

Figure 6. The RMSE for testing data with $\kappa = 300$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 7. The RMSE for testing data with $\kappa = 300$ training data and $\Omega = 0.5$ rad/s.

Figure 8. Average computation time of the three fusion methods for testing data with $\kappa = 300$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 9. Average computation time of the three fusion methods for testing data with $\kappa = 300$ training data and $\Omega = 0.5$ rad/s.

Case 2: the equivalence condition is not satisfied.

From Figures 10 and 11, it can be seen that the ratios of full rank are both less than 100% for the single sensor GP-UKF case with $\kappa = 30$ and $\kappa = 60$ training data. Thus, the equivalence between the centralized and distributed estimation fusion with GP-UKF is broken in Figures 12 and 13, respectively. The reason may be that the Gaussian process models are relatively inaccurate with less training data, as can be seen by comparing Figures 6, 12 and 13 for the same methods. At the same time, the finite-sample approximation of GP-UKF depends heavily on the Gaussian process models, and the cross terms of GP-UKF are computed as sums of rank-one matrices. However, the equivalence is still satisfied for the GP-ADF fusion, which implies that GP-ADF is more stable than GP-UKF, as also noted in Reference [7]. We can also see that the performance of GP-UKF fusion is better than that of GP-ADF fusion with $\kappa = 300$ training data in Figure 6, and a little worse with $\kappa = 30, 60$ training data in Figures 12 and 13. Meanwhile, from Figures 8, 14 and 15, the average computation time of GP-UKF fusion is less than that of GP-ADF fusion with $\kappa = 300$ training data, and the opposite holds with $\kappa = 30, 60$ training data. This suggests that GP-UKF fusion is suitable when enough training data are available and that GP-ADF fusion does well with a small number of training data for turn motion systems. In short, distributed estimation fusion with GP-ADF is more stable and performs better with a small number of training data; if enough training data are available, GP-UKF fusion may be the better choice.

Figure 10. The ratios of full column rank of the cross term $C_{x_k y_k^m}$ in the single sensor case for testing data with $\kappa = 30$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 11. The ratios of full column rank of the cross term $C_{x_k y_k^m}$ in the single sensor case for testing data with $\kappa = 60$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 12. The RMSE for testing data with $\kappa = 30$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 13. The RMSE for testing data with $\kappa = 60$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 14. Average computation time of the three fusion methods for testing data with $\kappa = 30$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

Figure 15. Average computation time of the three fusion methods for testing data with $\kappa = 60$ training data and $\Omega = \frac{2\pi}{90}$ rad/s.

5. Conclusions

In the context of this paper, non-parametric modeling matters when the real system does not closely follow idealized models or when a parametric model cannot easily be determined for a nonlinear system. A non-parametric method, the Gaussian process, is introduced to learn models from training data, since it takes both model uncertainty and sensor measurement noise into account. In order to estimate the state of multisensor nonlinear dynamic systems, we have used Gaussian process models as the priors of the transition and measurement functions of the dynamic system. The transition function and measurement function have then been trained with Gaussian processes, respectively. Based on

the Gaussian process models for all available local sensor measurements, we have developed two fusion methods, centralized estimation fusion and distributed estimation fusion, with GP-ADF and GP-UKF, respectively. Taking full advantage of the nature of the Gaussian process, the equivalence between centralized estimation fusion and distributed estimation fusion has been derived under mild conditions. Simulations show that the equivalence holds under the given conditions and that the multisensor estimation fusion performs better than the single sensor filters. Compared with the RCC-CI algorithm, the multisensor estimation fusion methods not only have higher accuracy, but also require less computation time. The estimation performance of the multisensor estimation fusion methods is also better than that of the convex combination method. Future work may involve state initialization, Gaussian process latent variable models without ground truth states, multiple model fusion with different Gaussian processes for maneuvering target tracking, and validating the methods with real sensor data.

Author Contributions: Conceptualization, Y.L. and J.X.; Gaussian process methodology, Z.W. and Y.L.; fusion methodology, X.S. and Y.L.; data curation, J.X. and Y.L.; software, Y.L. and J.X.; validation, Y.L. and Z.W.; formal analysis, Y.L.; writing, original draft preparation, J.X. and Y.L.; writing, review and editing, Y.L.; supervision, X.S.; project administration, X.S.; funding acquisition, X.S.

Funding: This work was supported in part by the NSFC under Grant 61673282, the PCSIRT under Grant PCSIRT16R53 and the Fundamental Research Funds for the Central Universities under Grant No. 2012017yjsy140.

Acknowledgments: We would like to thank Yunmin Zhu for his insightful comments and helpful suggestions that greatly improved the quality of this paper.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Julier, S.J.; Uhlmann, J.K. Unscented filtering and nonlinear estimation. Proc. IEEE 2004, 92, 401–422. [CrossRef]
2. Lan, J.; Li, X.R. Multiple conversions of measurements for nonlinear estimation. IEEE Trans. Signal Process. 2017, 65, 4956–4970. [CrossRef]
3. Bar-Shalom, Y.; Li, X.R.; Kirubarajan, T. Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software; Wiley: New York, NY, USA, 2001.
4. Thrun, S.; Burgard, W.; Fox, D. Probabilistic Robotics; MIT Press: Cambridge, MA, USA, 2005.
5. Simon, D. Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches; Wiley-Interscience: New York, NY, USA, 2006.
6. Arulampalam, M.S.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188. [CrossRef]
7. Deisenroth, M.P.; Huber, M.F.; Hanebeck, U.D. Analytic moment-based Gaussian process filtering. In Proceedings of the International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009.
8. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
9. Huber, M.F. Nonlinear Gaussian Filtering: Theory, Algorithms, and Applications. Ph.D. Thesis, Karlsruhe Institute of Technology, Karlsruhe, Germany, 2015.
10. Deisenroth, M.P.; Turner, R.D.; Huber, M.F.; Hanebeck, U.D.; Rasmussen, C.E. Robust filtering and smoothing with Gaussian processes. IEEE Trans. Automat. Control 2012, 57, 1865–1871. [CrossRef]
11. Jacobs, M.A.; DeLaurentis, D.
Distributed Kalman filter with a Gaussian process for machine learning. In Proceedings of the 2018 IEEE Aerospace Conference, Big Sy, MT, USA, 3 10 March 2018; pp. 1 12. 12. Guo, Y.; Li, Y.; Tharmarasa, R.; Kirubarajan, T.; Efe, M.; Sariaya, B. GP-PDA filter for extended target tracing with measurement origin uncertainty. IEEE Trans. Aerosp. Electron. Syst. 2019, 55, 1725 1742. [CrossRef] 13. Preuss, R.; Von Toussaint, U. Global optimization employing Gaussian process-based Bayesian surrogates. Entropy 2018, 20, 201. [CrossRef] 14. Ko, J.; Fox, D. GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots 2009, 27, 75 90. [CrossRef]

15. Lawrence, N.D. Gaussian process latent variable models for visualisation of high dimensional data. In Proceedings of the NIPS'03 16th International Conference on Neural Information Processing Systems, Whistler, BC, Canada, 9–11 December 2003; pp. 329–336.
16. Wang, J.M.; Fleet, D.J.; Hertzmann, A. Gaussian process dynamical models. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 1441–1448.
17. Ko, J.; Fox, D. Learning GP-BayesFilters via Gaussian process latent variable models. Autonomous Robots 2011, 30, 3–23. [CrossRef]
18. Csató, L.; Opper, M. Sparse on-line Gaussian processes. Neural Comput. 2002, 14, 641–668. [CrossRef]
19. Snelson, E.; Ghahramani, Z. Sparse Gaussian processes using pseudo-inputs. In Proceedings of the 18th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 1257–1264.
20. Seeger, M.; Williams, C.K.I.; Lawrence, N.D. Fast forward selection to speed up sparse Gaussian process regression. In Proceedings of the Workshop on AI and Statistics 9, Key West, FL, USA, 3–6 January 2003.
21. Smola, A.J.; Bartlett, P. Sparse greedy Gaussian process regression. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 27 November–2 December 2000.
22. Quiñonero-Candela, J.; Rasmussen, C. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005, 6, 1939–1959.
23. Velychko, D.; Knopp, B.; Endres, D. Making the coupled Gaussian process dynamical model modular and scalable with variational approximations. Entropy 2018, 20, 724. [CrossRef]
24. Yan, L.; Duan, X.; Liu, B.; Xu, J. Bayesian optimization based on K-optimality. Entropy 2018, 20, 594. [CrossRef]
25. Ko, J.; Klein, D.J.; Fox, D.; Hähnel, D. Gaussian processes and reinforcement learning for identification and control of an autonomous blimp. In Proceedings of the IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007.
26. Ko, J.; Klein, D.J.; Fox, D.; Haehnel, D. GP-UKF: Unscented Kalman filters with Gaussian process prediction and observation models. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, 29 October–2 November 2007.
27. Ferris, B.; Hähnel, D.; Fox, D. Gaussian processes for signal strength-based location estimation. In Proceedings of the Robotics: Science and Systems, Philadelphia, PA, USA, 16–19 August 2006; pp. 1–8.
28. Maybeck, P.S. Stochastic Models, Estimation, and Control; Academic Press, Inc.: New York, NY, USA, 1979.
29. Boyen, X.; Koller, D. Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA, 24–26 July 1998; pp. 33–42.
30. Opper, M. A Bayesian approach to on-line learning. In On-Line Learning in Neural Networks; Cambridge University Press: Cambridge, UK, 1999; pp. 363–378.
31. Liggins, M.E.; Hall, D.L.; Llinas, J. Handbook of Multisensor Data Fusion: Theory and Practice; CRC Press: Boca Raton, FL, USA, 2009.
32. Bar-Shalom, Y.; Willett, P.K.; Tian, X. Tracking and Data Fusion: A Handbook of Algorithms; YBS Publishing: Storrs, CT, USA, 2011.
33. Shen, X.; Zhu, Y.; Song, E.; Luo, Y. Optimal centralized update with multiple local out-of-sequence measurements. IEEE Trans. Signal Process. 2009, 57, 1551–1562. [CrossRef]
34. Wu, D.; Zhou, J.; Hu, A. A new approximate algorithm for the Chebyshev center. Automatica 2013, 49, 2483–2488. [CrossRef]
35. Li, M.; Zhang, X. Information fusion in a multi-source incomplete information system based on information entropy. Entropy 2017, 19, 570. [CrossRef]
36. Gao, X.; Chen, J.; Tao, D.; Liu, W. Multi-sensor centralized fusion without measurement noise covariance by variational Bayesian approximation. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 718–727. [CrossRef]
37. Wang, Y.; Li, X.R. Distributed estimation fusion with unavailable cross-correlation. IEEE Trans. Aerosp. Electron. Syst. 2012, 48, 259–278. [CrossRef]
38. Chong, C.-Y.; Chang, K.; Mori, S. Distributed tracking in distributed sensor networks. In Proceedings of the American Control Conference, Seattle, WA, USA, 18–20 June 1986; pp. 1863–1868.
39. Zhu, Y.; You, Z.; Zhao, J.; Zhang, K.; Li, X.R. The optimality for the distributed Kalman filtering fusion with feedback. Automatica 2001, 37, 1489–1493. [CrossRef]

40. Li, X.R.; Zhu, Y.; Jie, W.; Han, C. Optimal linear estimation fusion - Part I: Unified fusion rules. IEEE Trans. Inf. Theory 2003, 49, 2192–2208. [CrossRef]
41. Duan, Z.; Li, X.R. Lossless linear transformation of sensor data for distributed estimation fusion. IEEE Trans. Signal Process. 2010, 59, 362–372. [CrossRef]
42. Shen, X.J.; Luo, Y.T.; Zhu, Y.M.; Song, E.B. Globally optimal distributed Kalman filtering fusion. Sci. China Inf. Sci. 2012, 55, 512–529. [CrossRef]
43. Chong, C.-Y.; Mori, S. Convex combination and covariance intersection algorithms in distributed fusion. In Proceedings of the 4th International Conference on Information Fusion, Montreal, QC, Canada, 7–10 August 2001.
44. Sun, S.; Deng, Z.-L. Multi-sensor optimal information fusion Kalman filter. Automatica 2004, 40, 1017–1023. [CrossRef]
45. Song, E.; Zhu, Y.; Zhou, J.; You, Z. Optimal Kalman filtering fusion with cross-correlated sensor noises. Automatica 2007, 43, 1450–1456. [CrossRef]
46. Hu, C.; Lin, H.; Li, Z.; He, B.; Liu, G. Kullback–Leibler divergence based distributed cubature Kalman filter and its application in cooperative space object tracking. Entropy 2018, 20, 116. [CrossRef]
47. Bakr, M.A.; Lee, S. Distributed multisensor data fusion under unknown correlation and data inconsistency. Sensors 2017, 17, 2472. [CrossRef]
48. Xie, J.; Shen, X.; Wang, Z.; Zhu, Y. Gaussian process fusion for multisensor nonlinear dynamic systems. In Proceedings of the 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 4124–4129.
49. Osborne, M. Bayesian Gaussian Processes for Sequential Prediction, Optimisation and Quadrature. Ph.D. Thesis, University of Oxford, Oxford, UK, 2010.
50. Nguyen-Tuong, D.; Seeger, M.; Peters, J. Local Gaussian process regression for real time online model learning. In Proceedings of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–10 December 2008.
51. Deisenroth, M. Efficient Reinforcement Learning Using Gaussian Processes; KIT Scientific Publishing: Karlsruhe, Germany, 2010.
52. Ghahramani, Z.; Rasmussen, C.E. Bayesian Monte Carlo. Adv. Neural Inf. Process. Syst. 2003, 15, 489–496.
53. Candela, J.Q.; Girard, A.; Larsen, J.; Rasmussen, C.E. Propagation of uncertainty in Bayesian kernel models - application to multiple-step ahead forecasting. IEEE Int. Conf. Acoust. Speech Signal Process. 2003, 2, 701–704.
54. Zhu, Y. Multisensor Decision and Estimation Fusion; Kluwer Academic Publishers: Boston, MA, USA, 2003.
55. Löfberg, J. YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, China, 2–4 September 2004.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).