
On-line identification and adaptive trajectory tracking for non-linear stochastic continuous-time systems using differential neural networks

Alexander S. Poznyak and Lennart Ljung
Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: ljung@isy.liu.se
October 2, 2001

REGLERTEKNIK
AUTOMATIC CONTROL LINKÖPING

Report no.: LiTH-ISY-R-2364
Submitted to Automatica

Technical reports from the Automatic Control group in Linköping are available by anonymous ftp at the address ftp.control.isy.liu.se. This report is contained in the file 2364.pdf.


On-line Identification and Adaptive Trajectory Tracking for Nonlinear Stochastic Continuous Time Systems Using Differential Neural Networks

Alex S. Poznyak† and Lennart Ljung‡

† CINVESTAV-IPN, Dept. of Automatic Control, AP 14 740, CP 07000, Mexico D.F., Mexico, e-mail: apoznyak@ctrl.cinvestav.mx
‡ Linköping University, Dept. of Electrical Engineering, SE-581 83 Linköping, Sweden, e-mail: ljung@isy.liu.se

Abstract

Identification of nonlinear stochastic processes via differential neural networks is discussed. A new "dead-zone" type learning law for the weight dynamics is suggested. By a stochastic Lyapunov-like analysis the stability conditions for the identification error as well as for the neural network weights are established. The adaptive trajectory tracking using the obtained neural network model is realized for the subclass of stochastic completely controllable processes linearly dependent on control. The upper bounds for the identification and adaptive tracking errors are established.

Keywords: Dynamic Neural Networks, Stochastic Processes, Identification, Adaptive Control.

1. Introduction

Owing to the enthusiasm generated by successful applications, the Neural Network (NN) technique has proved to be a very effective tool for controlling a wide class of complex nonlinear systems in the absence of complete model information, or even when the controlled plant is treated as "a black box" (Hunt et al., 1992). NNs can be classified as static (feedforward) or dynamic (recurrent or differential) nets. Most recent publications (see, for example, Haykin, 1994; Agarwal, 1997; Parisini and Zoppoli, 1994) deal with Static Neural Nets providing an appropriate approximation of the nonlinear operator functions in the right-hand side of the dynamic model equations. Although they have been used successfully, their major disadvantages are a slow learning rate (the weight updates do not utilize the information on the local NN structure) and a high sensitivity of the function approximation to the training


data (they have no memory, so their outputs are uniquely determined by the current inputs and weights). Dynamic Neural Networks can successfully overcome these drawbacks and demonstrate workable behavior in the presence of essential unmodelled dynamics because their structure incorporates feedback. They were introduced by Hopfield, 1984 and then studied in (Sandberg, 1991 (the approximation aspects); Rovithakis and Christodoulou, 1994 (direct adaptive regulation via DNN); Poznyak et al., 1999 (adaptive tracking using DNN)). Several advanced results concerning DNN have recently been obtained in (Narendra and Li, 1998 (identification and control); Lewis and Parisini, 1998 (NN feedback control: the stability analysis)) and in (Rovithakis and Christodoulou, 2000). All of these publications consider NN applications only for an external environment of a deterministic nature, that is, all dynamic trajectories as well as external perturbations and internal uncertainties are assumed to be deterministic functions of time. To our knowledge, there are only a few publications dealing with the identification or control of stochastic processes with unknown dynamics via NN. In (Kosmatopoulos and Christodoulou, 1994) the Boltzmann type recurrent high-order NN (g-RHONN without a hidden layer) was applied for the estimation of the unknown density of a "transmitted" stochastic signal.

This paper is our first attempt to extend the ideas of the Neuro Control approach to the class of continuous time stochastic processes with an incomplete dynamic model description. Dynamic Neural Networks given by differential equations (Differential Neural Networks) are applied for identification and adaptive trajectory tracking purposes. The main goal of the paper is to show that practically the same DNN, designed for deterministic systems with bounded perturbations, is robust with respect to perturbations of a stochastic nature which are unbounded (with probability one). The stochastic calculus (the Itô formula and the martingale technique together with the strong law of large numbers) is shown to be an effective instrument for the investigation of such systems.

2. Study Motivation

State feedback control is a topic of great importance and attention in engineering publications (Isidori, 1995). To realize deterministic trajectory tracking for a nonlinear controlled system given by

$$\dot{x} = f(x,t) + g(x,t)\,u \qquad (1)$$

based on the available information on the reference dynamics $\dot{x}^{*} = \bar{\varphi}(x^{*}, t)$, a successful feedback control $u$ (compensating the arising nonlinear effects) can be designed as the nonlinear feedback satisfying

$$f(x,t) + g(x,t)\,u = \bar{\varphi}(x^{*},t) + A^{*}(x - x^{*}), \qquad A^{*} \ \text{a stable matrix} \qquad (2)$$


The control (3) is realizable if and only if the corresponding "controllability-rank" condition is fulfilled. A possible control action (having the minimal norm) and verifying (2) is as follows¹:

$$u = g^{+}(x,t)\left[\bar{\varphi}(x^{*},t) - f(x,t) + A^{*}(x - x^{*})\right] \qquad (3)$$

It provides the global exponential stability property for the corresponding closed-loop system, since

$$\frac{d}{dt}\|x - x^{*}\|_{P}^{2} = 2(x - x^{*})^{\top}P\left(f(x,t) + g(x,t)u - \bar{\varphi}(x^{*},t)\right) = (x - x^{*})^{\top}\left(PA^{*} + A^{*\top}P\right)(x - x^{*}) = -(x - x^{*})^{\top}Q\,(x - x^{*}), \qquad PA^{*} + A^{*\top}P = -Q < 0$$

and, as a result,

$$\|x_t - x^{*}\|_{P}^{2} \le \|x_0 - x^{*}\|_{P}^{2}\,\exp\!\left(-\lambda_{\min}\!\left(P^{-1/2}QP^{-1/2}\right)t\right).$$

To realize (3), as well as most other feedback controls, only one condition must be fulfilled: complete information on the given dynamics should be available.
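As a concrete illustration of (3), the minimal-norm feedback can be evaluated numerically with a Moore-Penrose pseudoinverse (see footnote 1). The following Python sketch is purely illustrative: the dynamics $f$, $g$, the reference derivative $\bar{\varphi}$ and all numerical values are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def minimal_norm_feedback(x, x_ref, f, g, phi_bar, A_star):
    """Minimal-norm control u = g^+(x,t)[phi_bar(x*,t) - f(x,t) + A*(x - x*)], cf. (3).

    f, g, phi_bar are user-supplied callables (placeholders here);
    A_star must be a stable (Hurwitz) matrix.
    """
    rhs = phi_bar(x_ref) - f(x) + A_star @ (x - x_ref)
    return np.linalg.pinv(g(x)) @ rhs   # Moore-Penrose pseudoinverse

# Toy usage with invented dynamics (not the systems studied later in the paper):
if __name__ == "__main__":
    f = lambda x: np.array([x[1], -np.sin(x[0])])
    g = lambda x: np.eye(2)
    phi_bar = lambda xr: np.zeros(2)          # reference held constant
    A_star = -2.0 * np.eye(2)
    print(minimal_norm_feedback(np.array([0.3, -0.1]), np.zeros(2), f, g, phi_bar, A_star))
```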

The question is: what should be done when we deal with partially or completely unknown dynamics, that is, in the "black-box" or "grey-box" (there is an unmodelled dynamic part) situations, or when the dynamics of a given system is disturbed by an external noise of a stochastic nature? One possible solution of this problem is related to Adaptive Control implementation (Krstić et al., 1995; Duncan et al., 1999) based on Identification Techniques (Ljung, 1999; Ljung and Gunnarsson, 1990) providing the designer with some sort of model estimate $\hat{x}_t$ generated by the model equation

$$\frac{d}{dt}\hat{x} = \hat{f}(\hat{x}, t \mid c_t) + \hat{g}(\hat{x}, t \mid c_t)\,u_t \qquad (4)$$

The right-hand side functions can be used in an applied control law, for example, in (3) instead of the unknown nonlinear functions participating in the dynamic model description, that is,

$$u = \hat{g}^{+}(\hat{x}, t \mid c_t)\left[\bar{\varphi}(x^{*},t) - \hat{f}(\hat{x}, t \mid c_t) + A^{*}(\hat{x} - x^{*})\right] \qquad (5)$$

or the same expression with $x$ used instead of $\hat{x}$ (since $x$ is measurable). The adjusting or learning law $\dot{c}_t = c_t(\hat{x}_t, x_t, u_t)$, to be designed for the parameters $c_t$, realizes the so-called identification process providing the "closeness" of the model states $\hat{x}_t$ to the real system state vector $x_t$.

The specific features of the problem tackled in this paper are as follows: first, the class of unknown dynamic systems is an extension of (1), including systems given by Stochastic Nonlinear Differential Equations with a "smooth enough" regular part; second, the class of model structures (4) used for identification purposes includes the so-called Differential Neural Networks (Dynamic Neural Networks functioning in continuous time), whose weights are adjusted on-line by a special "differential learning procedure" (Poznyak et al., 1999) and used, simultaneously, in the corresponding control law to provide a qualitative trajectory tracking process.

¹ $[\cdot]^{+}$ is the pseudoinverse (in the Moore-Penrose sense) operator acting as $AA^{+}A = A$, $A^{+}AA^{+} = A^{+}$, $0^{+} = 0$, and $c^{+} = c^{\top}/\|c\|^{2}$ for a nonzero column vector $c$.

3. Uncertain Stochastic System

Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathsf{P})$ be a given filtered probability space ($(\Omega, \mathcal{F}, \mathsf{P})$ is complete, $\mathcal{F}_0$ contains all the $\mathsf{P}$-null sets in $\mathcal{F}$, and the filtration $\{\mathcal{F}_t\}_{t\ge 0}$ is right continuous: $\mathcal{F}_{t+} := \cap_{s>t}\mathcal{F}_s = \mathcal{F}_t$). On this probability space define an $m$-dimensional standard Brownian motion $(\bar{W}_t,\ t\ge 0)$ (with $\bar{W}_0 = 0$), that is, an $\{\mathcal{F}_t\}_{t\ge 0}$-adapted $\mathbb{R}^m$-valued process such that

$$\mathsf{E}\{\bar{W}_t - \bar{W}_s \mid \mathcal{F}_s\} = 0 \ \ \mathsf{P}\text{-a.s.}, \qquad \mathsf{P}\{\omega \in \Omega : \bar{W}_0 = 0\} = 1,$$
$$\mathsf{E}\{[\bar{W}_t - \bar{W}_s][\bar{W}_t - \bar{W}_s]^{\top} \mid \mathcal{F}_s\} = (t - s)I \ \ \mathsf{P}\text{-a.s.}$$

Consider the stochastic nonlinear controlled continuous-time system with the dynamics $x_t$:

$$x_t = x_0 + \int_{s=0}^{t} b(s, x_s, u_s)\,ds + \int_{s=0}^{t} c(s, x_s, u_s)\,d\bar{W}_s \qquad (6)$$

or, in the abstract (symbolic) form,

$$dx_t = b(t, x_t, u_t)\,dt + c(t, x_t, u_t)\,d\bar{W}_t, \qquad x_0 = x \ \text{given}, \quad t \in [0, \infty) \qquad (7)$$

The first integral in (6) is a stochastic ordinary integral and the second one is an Itô integral. In the above, $u_t \in U \subseteq \mathbb{R}^p$ is a control at time $t$, $b : [0,\infty)\times\mathbb{R}^n\times U \to \mathbb{R}^n$, and $c : [0,\infty)\times\mathbb{R}^n\times U \to \mathbb{R}^{n\times m}$. It is assumed that

A1: $\{\mathcal{F}_t\}_{t\ge 0}$ is the natural filtration generated by $(\bar{W}_t,\ t\ge 0)$ and augmented by the $\mathsf{P}$-null sets from $\mathcal{F}$.

A2: (U, d) is a separable metric space with a metric d.

Definition 1. The function $f : [0,\infty)\times\mathbb{R}^n\times U \to \mathbb{R}^{n\times m}$ is said to be an $L_{\phi}(C^2)$-mapping if it is Borel measurable, is $C^2$ in $x$ (almost everywhere) for any $t \in [0,\infty)$ and any $u \in U$, and there exist a constant $L$ and a modulus of continuity $\phi : [0,\infty) \to [0,\infty)$ such that for any $t \in [0,\infty)$ and any $(x, u), (\hat{x}, \hat{u}) \in \mathbb{R}^n \times U$

$$\|f(t,x,u) - f(t,\hat{x},\hat{u})\| \le L\|x - \hat{x}\| + \phi(d(u,\hat{u})), \qquad \|f(t,0,u)\| \le L$$
$$\|f_x(t,x,u) - f_x(t,\hat{x},\hat{u})\| \le L\|x - \hat{x}\| + \phi(d(u,\hat{u}))$$
$$\|f_{xx}(t,x,u) - f_{xx}(t,\hat{x},\hat{u})\| \le \phi\left(\|x - \hat{x}\| + d(u,\hat{u})\right)$$


A3: both $b(t,x,u)$ and $c(t,x,u)$ are $L_{\phi}(C^2)$-mappings.

Notice that from the definition given above it follows that $\|f(t,x,u)\| \le L_0 + L_1\|x\|$. So,

$$\|b(t,x,u)\|^2 \le L_{b0} + L_{b1}\|x\|^2, \qquad \|c(t,x,u)\|^2 \le L_{c0} + L_{c1}\|x\|^2 \qquad (8)$$

Definition 2. A stochastic control $u_t$ is called feasible in the stochastic sense (or s-feasible) for the system (7) if

1. $u_t \in U[0,\infty) := \{u : [0,\infty)\times\Omega \to U \mid u_t \ \text{is} \ \{\mathcal{B}_t\}_{t\ge 0}\text{-adapted}\}$, where $\{\mathcal{B}_t\}_{t\ge 0}$ is the filtration generated by $((x_{\tau}, u_{\tau} : \tau\in[0,t]),\ t\ge 0)$ and augmented by the $\mathsf{P}$-null sets from $\mathcal{F}$;

2. $x_t$ is the unique solution of (7) in the sense that for any $x_t$ and $\hat{x}_t$ satisfying (7), $\mathsf{P}\{\omega\in\Omega : x_t = \hat{x}_t\} = 1$.

The set of all s-feasible controls is denoted by $U_s^{feas}[0,\infty)$. The pair $(x_t, u_t)$, where $x_t$ is the solution of (7) corresponding to this $u_t$, is called an s-feasible pair. The assumptions A1-A3 guarantee that any $u_t$ from $U[0,\infty)$ is s-feasible. It is also assumed that the past information about $(x_{\tau}, u_{\tau} : \tau\in[0,t])$ is available to the controller: $u_t := u(t, x_{\tau\in[0,t]}, u_{\tau\in[0,t)})$, implying that $u_t$ becomes $\{\mathcal{F}_t\}_{t\ge 0}$-adapted too.

The only sources of uncertainty in this system description are the system random noise $\bar{W}_t$ and the a priori unknown $L_{\phi}(C^2)$-functions $b(t,x,u)$ and $c(t,x,u)$. The class $\Sigma_{un}$ of uncertain stochastic systems (7) satisfying A1-A3 is considered below as the main object of the investigation.
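Although the analysis below is carried out in continuous time, trajectories of (7) are easy to approximate numerically, which is also how the simulation experiments of Section 7 can be reproduced. A minimal Euler-Maruyama sketch follows; the drift $b$, diffusion $c$ and the input signal are hypothetical callables supplied by the user, and the step size is an assumption.

```python
import numpy as np

def euler_maruyama(b, c, x0, u_of_t, T, dt, rng=None):
    """Approximate a path of dx = b(t,x,u) dt + c(t,x,u) dW on [0, T] with step dt.

    b(t, x, u) must return an (n,) array and c(t, x, u) an (n, m) array.
    """
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    t, path = 0.0, [x.copy()]
    m = c(0.0, x, u_of_t(0.0)).shape[1]                  # Brownian motion dimension
    for _ in range(int(round(T / dt))):
        u = u_of_t(t)
        dW = rng.normal(0.0, np.sqrt(dt), size=m)        # increments ~ N(0, dt I)
        x = x + b(t, x, u) * dt + c(t, x, u) @ dW
        t += dt
        path.append(x.copy())
    return np.array(path)
```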

4. Differential Neural Networks

Consider the following DNN (Poznyak et al., 1999) with a single hidden layer²:

$$d\hat{x}_t = \left[A\hat{x}_t + W_{1,t}\,\sigma(V_{1,t}\hat{x}_t) + W_{2,t}\,\varphi(V_{2,t}\hat{x}_t)\,\gamma(u_t)\right]dt \qquad (9)$$

where $\hat{x}_t \in \mathbb{R}^n$ is its state vector at time $t$, $\sigma : \mathbb{R}^r \to \mathbb{R}^k$ and $\varphi : \mathbb{R}^s \to \mathbb{R}^l$ are given $L_{\phi}(C^2)$-mappings, $V_{1,t} \in \mathbb{R}^{r\times n}$ and $V_{2,t} \in \mathbb{R}^{s\times n}$ are $\{\mathcal{B}_t\}_{t\ge 0}$-adapted adjustable hidden layer weight matrices, $W_{1,t} \in \mathbb{R}^{n\times k}$ and $W_{2,t} \in \mathbb{R}^{n\times l}$ are $\{\mathcal{B}_t\}_{t\ge 0}$-adapted adjustable output layer weight matrices, $\gamma : \mathbb{R}^p \to \mathbb{R}^q$ is a given $L_{\phi}(C^2)$-mapping satisfying $\|\gamma(u)\| \le \gamma_0 + \gamma_1\|u\|$ for any $u \in U$, and $A \in \mathbb{R}^{n\times n}$ is a constant matrix. The initial state vector $\hat{x}_0$ is supposed to be quadratically integrable ($\mathsf{E}\{\|\hat{x}_0\|^2\} < \infty$).

² A multi-layer DNN can be represented in single-layer form by the corresponding extension of the matrix dimensions.
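To make the structure of (9) concrete, the following Python sketch performs one explicit Euler step of the DNN state equation, taking $\varphi$ diagonal as in the simulation section and $\gamma(u) = u$. All dimensions and numerical values are illustrative assumptions, not the paper's.

```python
import numpy as np

def sigma_diag(z, a=1.0, b=1.0, c=0.5):
    """Bounded diagonal sigmoid, sigma_i(z) = a / (1 + exp(-b*z_i)) - c (cf. Remark 1 below)."""
    return a / (1.0 + np.exp(-b * z)) - c

def dnn_euler_step(x_hat, u, A, W1, V1, W2, V2, dt):
    """One Euler step of the identifier (9):
    d x_hat = [A x_hat + W1 sigma(V1 x_hat) + W2 Phi(V2 x_hat) gamma(u)] dt,
    with Phi taken diagonal and gamma(u) = u."""
    Phi = np.diag(sigma_diag(V2 @ x_hat))        # diagonal matrix phi(V2 x_hat)
    drift = A @ x_hat + W1 @ sigma_diag(V1 @ x_hat) + W2 @ (Phi @ u)
    return x_hat + dt * drift

# Tiny illustrative call (all numbers are assumptions):
if __name__ == "__main__":
    A = -3.0 * np.eye(2)
    W1 = V1 = W2 = V2 = np.eye(2)
    print(dnn_euler_step(np.array([0.1, 0.05]), np.array([1.0, 0.0]),
                         A, W1, V1, W2, V2, dt=0.01))
```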


Remark 1. The mappings $\sigma$ and $\varphi$ are usually assumed to have a main-diagonal structure with bounded sigmoidal elements $\sigma_{ii}$ and $\varphi_{ii}$, that is,

$$\sigma_{ii}(\hat{x}) := \frac{a_i}{1 + e^{-b_i^{\top}\hat{x}}} - c_i \quad (i = 1, \dots, \min\{r, k\}), \qquad \bar{\sigma} := \sup_{x\in\mathbb{R}^n}\|\sigma(x)\|, \qquad \bar{\varphi} := \sup_{x\in\mathbb{R}^n}\|\varphi(x)\|$$

which obviously belong to the class of $L_{\phi}(C^2)$-functions.

5. The Learning Law for the Identification Process

The matrix collection $(A, V_{1,t}, V_{2,t}, W_{1,t}, W_{2,t})$ should be selected to provide "a good enough" identification process for any possible uncertain stochastic system from $\Sigma_{un}$.

Accept several additional assumptions concerning the stability property of the process to be identified and the structure of the applied control.

A4: the applied s-feasible control $u_t \in U$ is quadratically integrable (uniformly in $t$), as is the corresponding stochastic system dynamics $x_t$, that is,

$$\limsup_{t\to\infty}\left(\mathsf{E}\{\|u_t\|^2\} + \mathsf{E}\{\|x_t\|^2\}\right) < \infty.$$

In fact, this assumption restricts the consideration to the class of quadratically stable systems.

A5: it is assumed that $\|u_t\| \le \delta_0 + \delta_1\|x_t\|$. This class of controllers includes practically all controllers widely used in different applications (linear (PID-type), sliding mode, HJ, locally optimal and others).

Accept also a technical assumption which will be used in the formulation of the main result.

A6: there exist a stable matrix $A$ and positive definite matrices $\Lambda_{\sigma}, \Lambda_{\eta_\sigma}, \Lambda_f, \Lambda_{\gamma}, Q_0$ such that the matrix algebraic Riccati equation

$$G := PA + A^{\top}P + \left(L_{\sigma}^2 V_{1,0}^{\top}W_{1,0}\Lambda_{\sigma}W_{1,0}^{\top}V_{1,0} + Q_0\right) + P\left(\Lambda_{\sigma}^{-1} + \Lambda_{\eta_\sigma}^{-1} + \Lambda_f^{-1} + 8\bar{\sigma}^2\|\Lambda_{\gamma}^{-1}\|\,W_{2,0}W_{2,0}^{\top}\right)P = 0 \qquad (10)$$

has a positive definite solution $P = P^{\top} > 0$.

A7: the strong law of large numbers (Poznyak, 2000) is valid for the considered processes:

$$T^{-1}\int_{t=0}^{T}\left[s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2 - \mathsf{E}\left\{s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2\right\}\right]dt \xrightarrow{a.s.} 0,$$
$$T^{-1}\int_{t=0}^{T}\left[s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right) - \mathsf{E}\left\{s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\right\}\right]dt \xrightarrow{a.s.} 0 \qquad (11)$$

where $\varepsilon > 0$, the "dead-zone" function $s_{\mu}(z) : \mathbb{R}^n \to \mathbb{R}^1$ is defined as follows:

$$s_{\mu}(z) := \left[1 - \mu/\|z\|\right]_{+} = \begin{cases} 1 - \mu/\|z\|, & \mu/\|z\| \le 1 \\ 0, & \mu/\|z\| > 1 \end{cases} \qquad (12)$$

and $\Delta_t := \hat{x}_t - x_t$. The last assumption is not so restrictive since only a decrease of the autocorrelation functions of the considered processes is required to fulfill (11), which


automatically holds for the accepted constructions (see Poznyak, 2000 and Poznyak et al., 2000).

The weights in (9) are adjusted by the following Differential Learning Law:

$$L_{W1,t} := k_{w,1}^{-1}\dot{W}_{1,t} - S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)P\Delta_t\left[\sigma^{\top}(V_{1,t}\hat{x}_t) - \hat{x}_t^{\top}(V_{1,t}-V_{1,0})^{\top}D_{\sigma}^{\top}(V_{1,t}\hat{x}_t)\right] = 0$$

$$L_{W2,t} := k_{w,2}^{-1}\dot{W}_{2,t} - S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)P\Delta_t\left[\gamma^{\top}(u_t)\,\varphi^{\top}(V_{2,t}\hat{x}_t) - \hat{x}_t^{\top}(V_{2,t}-V_{2,0})^{\top}\left(\sum_{i=1}^{q}\gamma_i(u_t)\,D_{i\varphi}^{\top}(V_{2,t}\hat{x}_t)\right)\right] = 0$$

$$L_{V1,t} := k_{v,1}^{-1}\dot{V}_{1,t} - S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left[D_{\sigma}^{\top}(V_{1,t}\hat{x}_t)\,W_{1,t}^{\top}P\Delta_t + 2L_{\sigma}^2\Lambda_{\eta_\sigma}(V_{1,t}-V_{1,0})\hat{x}_t\right]\hat{x}_t^{\top} = 0$$

$$L_{V2,t} := k_{v,2}^{-1}\dot{V}_{2,t} - S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left[\left(\sum_{i=1}^{q}\gamma_i(u_t)\,D_{i\varphi}^{\top}(V_{2,t}\hat{x}_t)\right)W_{2,t}^{\top}P\Delta_t + 4\|\Lambda_{\gamma}^{-1}\|\,\|W_{2,0}^{\top}P\Delta_t\|^2 L_{\varphi}^2\,(V_{2,t}-V_{2,0})\hat{x}_t\right]\hat{x}_t^{\top} = 0 \qquad (13)$$

with any nonzero initial weights $V_{1,0}$ and $V_{2,0}$, and

$$S_{\mu,\varepsilon}(z) := s_{\mu}^{1+\varepsilon}(z)\left[2s_{\mu}(z) + (2+\varepsilon)\frac{\mu}{\|z\|}\right] \qquad (14)$$

$$\mu^2 := \frac{C_0 + C_1\rho}{(2+\varepsilon)\,\lambda_{\min}\!\left(P^{-1/2}Q_0P^{-1/2}\right)}, \qquad f_1 := L_1 + \|A\| + \|W_{2,0}\|\,\bar{\varphi}\,\gamma_1\delta_1, \qquad f_0 := L_0 + \|W_{1,0}\|\,\bar{\sigma} + \|W_{2,0}\|\,\bar{\varphi}\,(\gamma_0 + \gamma_1\delta_0)$$

$$C_i := 2(2+\varepsilon)f_i^2\left(\|\Lambda_f\| + \|\Lambda_{\gamma}\|\right) + (2+\varepsilon)^2 L_{ci}\|P\| \qquad (15)$$

$$\rho \ge \frac{1}{2}\limsup_{T\to\infty}\left[T^{-1}\int_{t=0}^{T}\mathsf{E}\left\{s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2\right\}dt\right]\left[T^{-1}\int_{t=0}^{T}\mathsf{E}\left\{s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\right\}dt\right]^{-1} \qquad (16)$$
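The dead-zone functions (12) and (14) are simple to implement; a minimal Python sketch (the Cholesky factor is used only to evaluate the norm $\|P^{1/2}\Delta\|$, and the numerical values of $P$, $\Delta$, $\mu$, $\varepsilon$ are placeholders):

```python
import numpy as np

def s_mu(z, mu):
    """Dead-zone function (12): s_mu(z) = [1 - mu/||z||]_+ ."""
    nz = np.linalg.norm(z)
    return max(0.0, 1.0 - mu / nz) if nz > 0.0 else 0.0

def S_mu_eps(z, mu, eps):
    """Dead-zone multiplier (14): s_mu(z)^{1+eps} [2 s_mu(z) + (2+eps) mu/||z||]."""
    nz = np.linalg.norm(z)
    if nz == 0.0:
        return 0.0
    s = s_mu(z, mu)
    return s ** (1.0 + eps) * (2.0 * s + (2.0 + eps) * mu / nz)

# Evaluating the dead-zone argument P^{1/2} Delta via a Cholesky factor of P:
P = 60.0 * np.eye(2)                      # placeholder weighting matrix
delta = np.array([0.05, -0.02])           # placeholder identification error
z = np.linalg.cholesky(P).T @ delta       # ||z|| equals ||P^{1/2} delta||
print(s_mu(z, mu=0.01), S_mu_eps(z, mu=0.01, eps=0.01))
```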

Theorem 1 (Main result on the learning law). If, under the assumptions A1-A7, the learning law (13) is applied, then the following properties of the corresponding identification process are guaranteed:

1) asymptotic convergence with probability one to the dead-zone takes place:

$$s_{\mu}\!\left(P^{1/2}\Delta_t\right) := \left[1 - \mu/\|P^{1/2}\Delta_t\|\right]_{+} \xrightarrow{a.s.} 0 \qquad (17)$$

2) the weight matrices as well as the identification error remain bounded (in the mean square sense) during the whole learning process, that is, for any $t \ge 0$

$$\sum_{i=1}^{2}\left(\mathrm{tr}\,\mathsf{E}\{V_{i,t}^{\top}V_{i,t}\} + \mathrm{tr}\,\mathsf{E}\{W_{i,t}^{\top}W_{i,t}\}\right) + \mathsf{E}\left\{V_0\!\left(P^{1/2}\Delta_t\right)\right\} \le C < \infty, \qquad V_0(z) := \|z\|^2 s_{\mu}^{2+\varepsilon}(z) \qquad (18)$$


The proof of the theorem is given in the Appendix.

6. Adaptive Trajectory Tracking

In this section we consider the adaptive trajectory tracking problem formulated for the subclass of completely controllable stochastic nonlinear processes with a linear forcing control, in order to avoid discussions concerning controllability properties and to simplify the presentation of the main idea. Let the random process $x_t$ belong to the subclass of (7) given by

$$dx_t = \left[f(t, x_t) + u_t\right]dt + c(t, x_t, u_t)\,d\bar{W}_t, \qquad x_0 = x \ \text{given}, \quad t \in [0,\infty) \qquad (19)$$

where $u_t \in \mathbb{R}^n$, and $f : [0,\infty)\times\mathbb{R}^n \to \mathbb{R}^n$ and $c : [0,\infty)\times\mathbb{R}^n\times U \to \mathbb{R}^{n\times m}$ are $L_{\phi}(C^2)$-mappings such that

A3':

$$\|f(t,x)\| \le L_{0,f} + L_{1,f}\|x\|, \qquad \|c(t,x,u)\| \le \bar{c} \qquad (20)$$

The nominal trajectory $x_t^{*}$ to be tracked is assumed to be bounded, which, for functions $\bar{\varphi}(x^{*},t)$ uniformly bounded in $t$ and of bounded variation in $x$, implies

$$\sup_t \|\bar{\varphi}(x_t^{*}, t)\| = \bar{\varphi}_0 < \infty \qquad (21)$$

To simplify the further presentation, select the DNN structure as follows:

$$\frac{d}{dt}\hat{x}_t = \hat{f}_t(\hat{x}_t) + u_t, \qquad \hat{f}_t(x) := Ax + W_{1,t}\,\sigma(V_{1,t}x) \qquad (22)$$

Realize the adaptive trajectory tracking control $u_t$ as

$$u_t = \hat{u}_t := \bar{\varphi}(x^{*},t) - \hat{f}_t(\hat{x}_t) + A^{*}(\hat{x}_t - x_t^{*}), \qquad A^{*} \ \text{a stable matrix} \qquad (23)$$

Remark 2. Since the trajectory $x_t$ is assumed to be completely measurable, its use in (23) instead of $\hat{x}_t$ is preferable, that is, the control $u_t$ can be formed as

$$u_t = \hat{u}_t := \bar{\varphi}(x^{*},t) - \hat{f}_t(x_t) + A^{*}(x_t - x_t^{*}) \qquad (24)$$

In this case the proof of stability for the closed-loop system remains practically the same. Only one thing changes: in the corresponding Lyapunov function (see the proof of Theorem 2), $\Delta_t^{*} := x_t - x_t^{*}$ should be used instead of $\Delta_t^{*} := \hat{x}_t - x_t^{*}$.
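A direct transcription of the certainty-equivalence controller (24), with $\hat{f}_t(x) = Ax + W_{1,t}\sigma(V_{1,t}x)$ taken from (22), can be sketched as follows; the sigmoid, the reference derivative and all numerical values are placeholder assumptions.

```python
import numpy as np

def sigma(z):
    """Placeholder diagonal sigmoid (cf. Remark 1)."""
    return 1.0 / (1.0 + np.exp(-z)) - 0.5

def f_hat(x, A, W1, V1):
    """DNN drift estimate f_hat_t(x) = A x + W1 sigma(V1 x), cf. (22)."""
    return A @ x + W1 @ sigma(V1 @ x)

def adaptive_tracking_control(x, x_ref, phi_bar_ref, A, W1, V1, A_star):
    """Controller (24): u = phi_bar(x*, t) - f_hat_t(x) + A*(x - x*)."""
    return phi_bar_ref - f_hat(x, A, W1, V1) + A_star @ (x - x_ref)

# Illustrative call with assumed values:
u = adaptive_tracking_control(x=np.array([0.2, -0.1]), x_ref=np.array([1.0, 0.0]),
                              phi_bar_ref=np.zeros(2), A=-5.0 * np.eye(2),
                              W1=np.eye(2), V1=np.eye(2), A_star=-75.0 * np.eye(2))
print(u)
```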

Remark 3. We note that in this simplified case ($g(x,t) = I$) a high-gain controller can also be applied to provide the robustness property for the closed-loop system; it corresponds to a negative linear feedback given by

$$u_t = -K\,(x_t - x_t^{*}) \qquad (25)$$


Such a class of robust controllers "ignores" the nonlinearities, which are partially or completely unknown, and as a result (see the simulations below) shows lower tracking quality compared with (24), where the nonlinear effects are partially compensated by the term $\hat{f}_t(x_t)$ obtained by the DNN implementation.

Select the stable matrix $A$ and the positive definite matrices $\Lambda_{\sigma}^{-1}, \Lambda_{\eta_\sigma}^{-1}, \Lambda_{\Delta f}^{-1}$ and $Q_1$ such that the following algebraic matrix Riccati equation

$$\mathrm{Ric} := P_1 A + A^{\top}P_1 + P_1\left[\Lambda_{\sigma}^{-1} + \Lambda_{\eta_\sigma}^{-1} + \Lambda_{\Delta f}^{-1}\right]P_1 + 3\|\Lambda_{\Delta f}\|C_1^2 I + Q_1 + \left[L_{\sigma}^2 V_{1,0}^{\top}W_{1,0}\Lambda_{\sigma}W_{1,0}^{\top}V_{1,0}\right] = 0, \qquad C_1 := L_{1,f} + \|A\|$$

has a positive definite solution $P_1 = P_1^{\top} > 0$. Let $P_2$ satisfy the following Lyapunov matrix equation

$$P_2 A^{*} + A^{*\top}P_2 + 3\|\Lambda_{\Delta f}\|C_1^2 I = -Q_2$$

where $A^{*}$ is the stable matrix participating in (23).

Let the weight matrices of the DNN (22) be adjusted according to the following Differential Learning Law:

$$L_{w1,t} := 2S_{\mu,\varepsilon}(z_t)\left[\sigma(V_{1,t}\hat{x}_t)\Delta_t^{\top}P - D_{\sigma}(V_{1,t}\hat{x}_t)(V_{1,t}-V_{1,0})\hat{x}_t\Delta_t^{\top}P\right]dt + 2(4+\varepsilon)\,k_{w,1}^{-1}\,dW_{1,t}^{\top} = 0$$

$$L_{v1,t} := 2S_{\mu,\varepsilon}(z_t)\left[\hat{x}_t\Delta_t^{\top}P\,W_{1,t}D_{\sigma}(V_{1,t}\hat{x}_t) + 2L_{\sigma}^2\hat{x}_t\hat{x}_t^{\top}(V_{1,t}-V_{1,0})^{\top}\Lambda_{\eta_\sigma}\right]dt + 2(4+\varepsilon)\,k_{v,1}^{-1}\,dV_{1,t}^{\top} = 0 \qquad (26)$$

where the "dead-zone" function $S_{\mu,\varepsilon}(z_t)$ is defined by (14),

$$z_t := \begin{bmatrix} P_1^{1/2} & 0 \\ 0 & P_2^{1/2} \end{bmatrix}\begin{pmatrix} \Delta_t \\ \Delta_t^{*} \end{pmatrix}$$

and the number $\mu$ satisfies

$$\mu^2 \ge \frac{\beta}{2\rho}\,\limsup_{T\to\infty}\left[\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{\varepsilon}(z_t)\}\,dt\right]\left[\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{1+\varepsilon}(z_t)\}\,dt\right]^{-1}$$

with $\beta$ and $\rho$ defined as

$$\beta := \left[(2+\varepsilon)^2\,\mathrm{tr}\{P_1 + P_2\} + (2+\varepsilon)\,3\|\Lambda_{\Delta f}\|C_0^2\right]\bar{c}^{\,2}$$
$$C_0 := L_{0,f} + \|W_{1,0}\|\bar{\sigma} + (L_{1,f} + \|A\|)\sup_t\|x_t^{*}\|$$
$$\rho := \min\left\{\lambda_{\min}\!\left(P_1^{-1/2}Q_1P_1^{-1/2}\right);\ \lambda_{\min}\!\left(P_2^{-1/2}Q_2P_2^{-1/2}\right)\right\}$$

6.4. The main result on adaptive tracking with on-line identification

Theorem 2. The application of the adaptive control (23) together with the learning law (26), under the assumptions A1-A2 and A3' (instead of A3), implies, with probability one and in the mean square sense, the convergence of the vector process $(\Delta_t^{\top}, \Delta_t^{*\top})^{\top}$, $\Delta_t = \hat{x}_t - x_t$, $\Delta_t^{*} = \hat{x}_t - x_t^{*}$, to the "dead-zone" defined by

$$\Delta_t^{\top}P_1\Delta_t + \Delta_t^{*\top}P_2\Delta_t^{*} \le \mu$$

keeping the weights $W_{i,t}, V_{i,t}$ bounded.


The proof of the theorem is given in the Appendix.

Corollary 1. The stability of the unknown stochastic system follows from the boundedness of the nominal trajectory $x_t^{*}$ and the inequality $\|x_t - x_t^{*}\| = \|(x_t - \hat{x}_t) + (\hat{x}_t - x_t^{*})\| \le \|\Delta_t\| + \|\Delta_t^{*}\|$. The tracking upper bound can be estimated (for large enough $t$) as

$$\|x_t - x_t^{*}\|^2 \le \|\Delta_t\|_{P_1}^2 + \left\|P_2^{-1/2}P_1^{-1}P_2^{-1/2}\right\|\,\|\Delta_t^{*}\|_{P_2}^2 \le \max\left\{1;\ \left\|P_2^{-1/2}P_1^{-1}P_2^{-1/2}\right\|\right\}\mu^2$$

7. Simulation Results

7.1. Identification of Unforced Stochastic Systems

Consider the nonlinear stochastic process generated by the Van der Pol oscillator with a noisy component, i.e., $x_{1,0} = 2$, $x_{2,0} = 0$ and

$$dx_{1,t} = x_{2,t}\,dt, \qquad dx_{2,t} = 2.5\left[(1 - x_{1,t}^2)x_{2,t} - x_{1,t}\right]dt + 0.075\,d\bar{W}_t$$

The DNN applied for the identification is as follows:

$$d\hat{x}_t = \left[A\hat{x}_t + W_{1,t}\,\sigma(V_{1,t}\hat{x}_t)\right]dt, \qquad \hat{x}_{1,0} = 0.1, \quad \hat{x}_{2,0} = 0.5$$
$$A = -5I, \qquad \sigma_i(z) = 1/(1 + \exp(-0.2\,z_i)) - 0.5$$

with the weights $W_{1,t}, V_{1,t} \in \mathbb{R}^{2\times 2}$ adjusted by

$$\dot{W}_{1,t} = -k_{w,1}\,s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)P\Delta_t\left[\sigma^{\top}(V_{1,t}\hat{x}_t) - \hat{x}_t^{\top}(V_{1,t}-V_{1,0})^{\top}D_{\sigma}^{\top}(V_{1,t}\hat{x}_t)\right]$$
$$\dot{V}_{1,t} = -k_{v,1}\,s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left[D_{\sigma}^{\top}(V_{1,t}\hat{x}_t)\,W_{1,t}^{\top}P\Delta_t + 2L_{\sigma}^2\Lambda_{\eta_\sigma}(V_{1,t}-V_{1,0})\hat{x}_t\right]\hat{x}_t^{\top}$$

The selected parameter values are $P = 60I$, $W_{1,0} = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.2 \end{pmatrix}$, $V_{1,0} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, $k_{1,w} = 2$, $k_{1,v} = 0.3$, $\varepsilon = \mu = 0.01$, $\Lambda_{\eta_\sigma} = \lambda_{\eta_\sigma}I$, $2L_{\sigma}^2\lambda_{\eta_\sigma} = 0.05$. The corresponding simulation results, realized with the use of only two neurons ($i = 1, 2$), are depicted in Fig. 1-Fig. 4.
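The whole experiment above can be sketched in a few dozen lines of Python (Euler-Maruyama for the plant, explicit Euler for the identifier and the weight dynamics). The step size, the random seed and the explicit-Euler discretization are assumptions on top of the description in the text, so the printed index is only meant to show how the pieces fit together and will not reproduce the reported $I_{t=20} = 0.4609$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)           # assumed seed
dt, T = 1e-3, 20.0                       # assumed integration step and horizon
eps, mu = 0.01, 0.01
k_w, k_v = 2.0, 0.3
P = 60.0 * np.eye(2)
A = -5.0 * np.eye(2)
coef_eta = 0.05                          # stands for 2*L_sigma^2*lambda_eta_sigma

def sigma(z):                            # sigma_i(z) = 1/(1 + exp(-0.2 z_i)) - 0.5
    return 1.0 / (1.0 + np.exp(-0.2 * z)) - 0.5

def D_sigma(z):                          # its (diagonal) Jacobian
    e = np.exp(-0.2 * z)
    return np.diag(0.2 * e / (1.0 + e) ** 2)

def s_mu_eps(z):                         # dead-zone gain s_mu(z)^eps, cf. (12)
    nz = np.linalg.norm(z)
    return max(0.0, 1.0 - mu / nz) ** eps if nz > 0.0 else 0.0

x = np.array([2.0, 0.0])                 # plant initial state
x_hat = np.array([0.1, 0.5])             # identifier initial state
W1 = np.array([[0.5, 0.0], [0.0, 0.2]])
V1 = np.array([[1.0, 1.0], [1.0, 2.0]])
V10 = V1.copy()
P_sqrt = np.sqrt(60.0) * np.eye(2)       # P^{1/2}, exact since P = 60 I
err_int = 0.0

for _ in range(int(round(T / dt))):
    # Van der Pol plant with additive noise in the second channel
    dW = rng.normal(0.0, np.sqrt(dt))
    x = x + dt * np.array([x[1], 2.5 * ((1.0 - x[0] ** 2) * x[1] - x[0])]) \
          + np.array([0.0, 0.075 * dW])

    # unforced DNN identifier, cf. (9)
    x_hat = x_hat + dt * (A @ x_hat + W1 @ sigma(V1 @ x_hat))

    # learning laws in the scalar-gain form quoted above
    delta = x_hat - x
    g = s_mu_eps(P_sqrt @ delta)
    bracket = sigma(V1 @ x_hat) - D_sigma(V1 @ x_hat) @ (V1 - V10) @ x_hat
    W1 = W1 - dt * k_w * g * np.outer(P @ delta, bracket)
    V1 = V1 - dt * k_v * g * np.outer(
        D_sigma(V1 @ x_hat).T @ W1.T @ P @ delta + coef_eta * (V1 - V10) @ x_hat, x_hat)

    err_int += float(delta @ delta) * dt

print("I_{t=20} approx.", err_int / (T + 0.001))   # performance index I_t from the text
```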


Fig. 3: the elements of $W_1$. Fig. 4: the elements of $V_1$.

The value of the performance index $I_t = (t + 0.001)^{-1}\int_{\tau=0}^{t}\|\Delta_{\tau}\|^2\,d\tau$ obtained after $t = 20$ sec is equal to $I_{t=20} = 0.4609$.

7.2. Identification of Controlled Stochastic Systems

The nonlinear stochastic process generated by

$$dx_{1,t} = \left[-5x_{1,t} + 3|x_{2,t}| + u_{1,t}\right]dt + 0.088\,d\bar{W}_{1,t}, \qquad x_{1,0} = 0.1$$
$$dx_{2,t} = \left[-10x_{2,t} + 2|x_{1,t}| + u_{2,t}\right]dt + 0.088\,d\bar{W}_{2,t}, \qquad x_{2,0} = -0.2 \qquad (27)$$

is considered. The programmed controls are the sine-wave and the saw-tooth periodic functions, that is, $u_{1,t} = u_{1,0}\sin(\omega t)$, $u_{1,0} = \omega = 1$, and $u_{2,t} = u_{2,0}(t - k\tau)$, $t \in [k\tau, (k+1)\tau)$, $u_{2,0} = \tau = 0.2$, $k = 0, 1, 2, \dots$ The DNN applied for the identification is

$$d\hat{x}_t = \left[A\hat{x}_t + W_{1,t}\,\sigma(V_{1,t}\hat{x}_t) + W_{2,t}\,\varphi(V_{2,t}\hat{x}_t)\,u_t\right]dt, \qquad \hat{x}_{1,0} = 0.1, \quad \hat{x}_{2,0} = 0.05 \qquad (28)$$

$$\sigma_i(z) = 2/(1 + \exp(-r_s z_i)) - 0.5, \quad r_s = 0.025, \quad r_f = 0.3$$
$$\varphi_{1,2} = \varphi_{2,1} = 0, \qquad \varphi_{i,i}(z) = 0.5/(1 + \exp(-r_f z_i)) - 0.05 \qquad (29)$$

and the weights $W_{1,t}, V_{1,t}, W_{2,t}, V_{2,t} \in \mathbb{R}^{2\times 2}$ are adjusted by the learning law (13) with $q = 2$, $\gamma_i(u) = u_i$ ($i = 1, 2$). The selected parameter values are $P = \begin{pmatrix} 15 & -4 \\ -4 & 15 \end{pmatrix}$, $A = \begin{pmatrix} -3 & -0.01 \\ -0.01 & -2 \end{pmatrix}$, $k_{1,w1} = k_{1,w2} = 0.3$, $k_{1,v1} = 25$, $k_{1,v2} = 2.5$, $\varepsilon = 0.01$, $\mu = 0.001$, $2L_{\sigma}^2\lambda_{\eta_\sigma} = 0.1$, $\Lambda_{\gamma}^{-1} = \lambda_{\gamma}I$, $4L_{\varphi}^2\lambda_{\gamma} = 0.1$, and the initial weight values are

$$W_{1,0} = \begin{pmatrix} 0.5 & 0.1 \\ 0 & 2 \end{pmatrix}, \quad V_{1,0} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad W_{2,0} = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.2 \end{pmatrix}, \quad V_{2,0} = \begin{pmatrix} 1 & 0.1 \\ 0.1 & 2 \end{pmatrix}.$$

The corresponding simulation results, realized with the use of only 4 neurons (2 in the output and 2 in the hidden layer), are depicted in Fig. 7-Fig. 8. The value of the performance index $I_t$, obtained after $t = 20$ sec, is equal to $I_{t=20} = 0.008087$.
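A minimal sketch of the two programmed test inputs (the unit sine wave and the saw-tooth of period $\tau = 0.2$) used in this experiment:

```python
import numpy as np

def u1(t, u10=1.0, omega=1.0):
    """Sine-wave input u_{1,t} = u_{1,0} sin(omega t)."""
    return u10 * np.sin(omega * t)

def u2(t, u20=0.2, tau=0.2):
    """Saw-tooth input u_{2,t} = u_{2,0} (t - k tau) for t in [k tau, (k+1) tau)."""
    return u20 * (t - tau * np.floor(t / tau))

# The input vector applied to the plant (27) at time t:
u = lambda t: np.array([u1(t), u2(t)])
print(u(0.35))
```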


Fig. 7: $x_1$ and $\hat{x}_1$ (dashed). Fig. 8: $x_2$ and $\hat{x}_2$ (dashed).

7.3. Trajectory Tracking Based on the DNN Identifier

The same plant (27) (with very high noise intensity: $\sigma_1 = \sigma_2 = 0.5$) has been selected as the stochastic nonlinear system to be forced to track the nominal dynamics (the saw-tooth and the sine processes) given by

$$x_1^{*} = 1 + (10k - t)/5, \quad k = 0, 1, \dots; \qquad x_2^{*} = \sin(2\pi t/10)$$

The DNN (9) (with $W_{2,t}\varphi(x_t) \equiv I$) has been applied with the following parameters: $P = \begin{pmatrix} 1 & 0.25 \\ 0.25 & 1 \end{pmatrix}$, $A = \begin{pmatrix} -5 & -0.2 \\ -0.2 & -4 \end{pmatrix}$, $r_s = 25$, $r_f = 3$, $k_{1,w1} = 0.5$, $k_{1,v1} = 1$, $\varepsilon = 0.01$, $\mu = 0.001$, $2L_{\sigma}^2\lambda_{\eta_\sigma} = 0.1$, $\Lambda_{\gamma}^{-1} = \lambda_{\gamma}I$. The initial weights were $W_{1,0} = \begin{pmatrix} 0.5 & 0 \\ 0 & 2 \end{pmatrix}$, $V_{1,0} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.

The adaptive control law $u_t$ is given by (24) with $A^{*} = 75\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$. The corresponding adaptive tracking results are given in Figures 9 and 10.


The tracking performance index $I_t$, obtained after $t = 20$ sec, is equal to $I_{t=20}^{\text{ad-track}} = 0.006665$. The index corresponding to the high-gain controller (25) with $K = 75I$ (which "ignores" the nonlinear effects) is equal to $I_{t=20}^{\text{high-gain}} = 0.01155$, that is, almost twice as large as for the adaptive controller (24) using on-line DNN estimates.

8. Conclusion

In this paper a new adaptive neuro controller is proposed to force a given (but unknown) continuous time stochastic process to track a nominal trajectory. It is based on a DNN identifier adjusting the corresponding weight matrices by a special differential learning law. Even though the sigmoidal nonlinearities are bounded and, at first glance, the hidden layer weight variations may seem not to affect the identification quality, the presented theoretical study (as well as the corresponding simulations) shows that updating the hidden layer weights significantly improves the identification process. Stochastic Lyapunov analysis (with a special form of Lyapunov function with "dead-zone" multipliers) is applied to establish the existence of identification and adaptive tracking error bounds which incorporate all characteristics of the uncertainties (including the noise variance).

Appendix

Proof of Theorem 1. Represent (7) in the following form:

$$dx_t = \left[Ax_t + W_{1,0}\,\sigma(V_{1,0}x_t) + W_{2,0}\,\varphi(V_{2,0}x_t)\,\gamma(u_t) + \Delta f_t\right]dt + c(t, x_t, u_t)\,d\bar{W}_t$$
$$\Delta f_t := b(t, x_t, u_t) - \left[Ax_t + W_{1,0}\,\sigma(V_{1,0}x_t) + W_{2,0}\,\varphi(V_{2,0}x_t)\,\gamma(u_t)\right] \qquad (30)$$

By (9) and A1-A4, $\Delta f_t$ verifies the inequality $\|\Delta f_t\| \le f_0 + f_1\|x_t\|$. Then from (7) and (9) it follows that

$$d\Delta_t = \left[A\Delta_t + F_1 + F_2 - \Delta f_t\right]dt - c(t, x_t, u_t)\,d\bar{W}_t$$
$$F_1 := W_{1,t}\,\sigma(V_{1,t}\hat{x}_t) - W_{1,0}\,\sigma(V_{1,0}x_t) = \tilde{W}_{1,t}\,\sigma(V_{1,t}\hat{x}_t) + W_{1,0}\left[\hat{\sigma}_t + \check{\sigma}_t\right]$$
$$F_2 := \left(W_{2,t}\,\varphi(V_{2,t}\hat{x}_t) - W_{2,0}\,\varphi(V_{2,0}x_t)\right)\gamma(u_t) = \left(\tilde{W}_{2,t}\,\varphi(V_{2,t}\hat{x}_t) + W_{2,0}\left[\hat{\varphi}_t + \check{\varphi}_t\right]\right)\gamma(u_t), \qquad \tilde{W}_{i,t} := W_{i,t} - W_{i,0} \qquad (31)$$

$$\hat{\sigma}_t := \sigma(V_{1,t}\hat{x}_t) - \sigma(V_{1,0}\hat{x}_t), \qquad \check{\sigma}_t := \sigma(V_{1,0}\hat{x}_t) - \sigma(V_{1,0}x_t)$$
$$\hat{\varphi}_t := \varphi(V_{2,t}\hat{x}_t) - \varphi(V_{2,0}\hat{x}_t), \qquad \check{\varphi}_t := \varphi(V_{2,0}\hat{x}_t) - \varphi(V_{2,0}x_t)$$

In view of the differentiability and Lipschitz properties A3,

$$\hat{\sigma}_t = D_{\sigma}(V_{1,t}\hat{x}_t)\,\tilde{V}_{1,t}\hat{x}_t + \eta_{\sigma,t}, \qquad \tilde{V}_{i,t} := V_{i,t} - V_{i,0}$$
$$\hat{\varphi}_t\,\gamma(u_t) = \sum_{i=1}^{q}\gamma_i(u_t)\,D_{i\varphi}(V_{2,t}\hat{x}_t)\,\tilde{V}_{2,t}\hat{x}_t + \eta_{\varphi,t}\,\gamma(u_t)$$
$$D_{\sigma}(V_{1,t}\hat{x}_t) := \frac{\partial}{\partial z}\sigma(z)\Big|_{z=V_{1,t}\hat{x}_t}, \qquad D_{i\varphi}(V_{2,t}\hat{x}_t) := \frac{\partial}{\partial z}\varphi_i(z)\Big|_{z=V_{2,t}\hat{x}_t} \qquad (32)$$

$$\|\eta_{\sigma,t}\| \le 2L_{\sigma}\|\tilde{V}_{1,t}\hat{x}_t\|, \qquad \|\check{\sigma}_t\| \le \min\{2\bar{\sigma};\ L_{\sigma}\|V_{1,0}\Delta_t\|\}$$
$$\|\eta_{\varphi,t}\| \le 2L_{\varphi}\|\tilde{V}_{2,t}\hat{x}_t\|, \qquad \|\check{\varphi}_t\| \le \min\{2\bar{\varphi};\ L_{\varphi}\|V_{2,0}\Delta_t\|\}$$

Introduce the following Lyapunov function candidate:

$$V_t := V_0\!\left(P^{1/2}\Delta_t\right) + (2+\varepsilon)\sum_{i=1}^{2}\left(k_{v,i}^{-1}\,\mathrm{tr}\{\tilde{V}_{i,t}^{\top}\tilde{V}_{i,t}\} + k_{w,i}^{-1}\,\mathrm{tr}\{\tilde{W}_{i,t}^{\top}\tilde{W}_{i,t}\}\right) \qquad (33)$$

where $0 < P = P^{\top} \in \mathbb{R}^{n\times n}$ is a matrix to be selected, $\varepsilon, k_{w,i}, k_{v,i}$ are positive scalar parameters and the function $V_0(z)$ is defined in (18). Notice that

$$\frac{\partial}{\partial z}V_0(z) = s_{\mu}^{1+\varepsilon}(z)\left[2s_{\mu}(z) + (2+\varepsilon)\frac{\mu}{\|z\|}\right]z = S_{\mu,\varepsilon}(z)\,z, \qquad \frac{\partial}{\partial \Delta}V_0\!\left(P^{1/2}\Delta\right) = S_{\mu,\varepsilon}(z)\big|_{z=P^{1/2}\Delta}\,P\Delta \qquad (34)$$

$$2s_{\mu}^{1+\varepsilon}(z) \le S_{\mu,\varepsilon}(z) \le (2+\varepsilon)\,s_{\mu}^{1+\varepsilon}(z)$$

$$\frac{\partial^2}{\partial z^2}V_0(z) = \left(2s_{\mu}^{2+\varepsilon}(z) + (2+\varepsilon)\frac{\mu}{\|z\|}s_{\mu}^{1+\varepsilon}(z)\right)I + (2+\varepsilon)\left(\frac{\mu}{\|z\|}s_{\mu}^{1+\varepsilon}(z) + \frac{\mu^2}{\|z\|^2}(1+\varepsilon)s_{\mu}^{\varepsilon}(z)\right)\frac{zz^{\top}}{\|z\|^2} \le (2+\varepsilon)^2 s_{\mu}^{\varepsilon}(z)\,I$$
$$\frac{\partial^2}{\partial \Delta^2}V_0\!\left(P^{1/2}\Delta\right) \le (2+\varepsilon)^2 s_{\mu}^{\varepsilon}(z)\big|_{z=P^{1/2}\Delta}\,P \qquad (35)$$

As $s_{\mu}(z) \in [0,1]$, by (34), (35) and the Itô formula (Gard, 1988), it follows that

$$dV_t \le 2(2+\varepsilon)\sum_{i=1}^{2}\left(k_{v,i}^{-1}\,\mathrm{tr}\{d\tilde{V}_{i,t}^{\top}\tilde{V}_{i,t}\} + k_{w,i}^{-1}\,\mathrm{tr}\{d\tilde{W}_{i,t}^{\top}\tilde{W}_{i,t}\}\right) + S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)2\Delta_t^{\top}P\,d\Delta_t + (2+\varepsilon)^2 s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\mathrm{tr}\{c(t,x_t,u_t)\,P\,c^{\top}(t,x_t,u_t)\}\,dt \qquad (36)$$

The use of (36), (32), (31), the inequality

$$X^{\top}Y + Y^{\top}X \le X^{\top}\Lambda^{-1}X + Y^{\top}\Lambda Y \qquad (37)$$

valid for any $X, Y \in \mathbb{R}^{m\times n}$ and any $0 < \Lambda = \Lambda^{\top} \in \mathbb{R}^{m\times m}$, and $\|A^{\top}A\| \le \mathrm{tr}\{A^{\top}A\}$, implies:

1) $2\Delta_t^{\top}PA\Delta_t = \Delta_t^{\top}(PA + A^{\top}P)\Delta_t$

2) $2\Delta_t^{\top}PF_1 \le \mathrm{tr}\{[2\sigma(V_{1,t}\hat{x}_t)\Delta_t^{\top}P - 2D_{\sigma}(V_{1,t}\hat{x}_t)\tilde{V}_{1,t}\hat{x}_t\Delta_t^{\top}P]\tilde{W}_{1,t}\} + \mathrm{tr}\{[2\hat{x}_t\Delta_t^{\top}PW_{1,t}D_{\sigma}(V_{1,t}\hat{x}_t) + 4L_{\sigma}^2\hat{x}_t\hat{x}_t^{\top}\tilde{V}_{1,t}^{\top}\Lambda_{\eta_\sigma}]\tilde{V}_{1,t}\} + \Delta_t^{\top}\left(P[\Lambda_{\sigma}^{-1} + \Lambda_{\eta_\sigma}^{-1}]P + [L_{\sigma}^2 V_{1,0}^{\top}W_{1,0}\Lambda_{\sigma}W_{1,0}^{\top}V_{1,0}]\right)\Delta_t$

3) $2\Delta_t^{\top}PF_2 \le \mathrm{tr}\left\{\left[2\varphi(V_{2,t}\hat{x}_t)\gamma(u_t)\Delta_t^{\top}P - 2\left(\sum_{i=1}^{q}\gamma_i(u_t)D_{i\varphi}(V_{2,t}\hat{x}_t)\right)\tilde{V}_{2,t}\hat{x}_t\Delta_t^{\top}P\right]\tilde{W}_{2,t}\right\} + \mathrm{tr}\left\{\left[2\hat{x}_t\Delta_t^{\top}PW_{2,t}\left(\sum_{i=1}^{q}\gamma_i(u_t)D_{i\varphi}(V_{2,t}\hat{x}_t)\right)\right]\tilde{V}_{2,t}\right\} + \mathrm{tr}\left\{\left[8\|\Lambda_{\gamma}^{-1}\|\,\|W_{2,0}^{\top}P\Delta_t\|^2 L_{\varphi}^2\,\hat{x}_t\hat{x}_t^{\top}\tilde{V}_{2,t}^{\top}\right]\tilde{V}_{2,t}\right\} + \Delta_t^{\top}P\left(8\bar{\sigma}^2\|\Lambda_{\gamma}^{-1}\|\,W_{2,0}W_{2,0}^{\top}\right)P\Delta_t + 2\|\Lambda_{\gamma}\|\left(f_0^2 + f_1^2\|x_t\|^2\right)$

4) $-2\Delta_t^{\top}P\Delta f_t \le \Delta_t^{\top}P\Lambda_f^{-1}P\Delta_t + 2\|\Lambda_f\|\left(f_0^2 + f_1^2\|x_t\|^2\right)$

Substituting these inequalities into the right-hand side of (36), it finally follows that

$$dV_t \le -(2+\varepsilon)S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}Q_0\Delta_t\,dt - S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t + 2(2+\varepsilon)S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left(\|\Lambda_f\| + \|\Lambda_{\gamma}\|\right)\left(f_0^2 + f_1^2\|x_t\|^2\right)dt + (2+\varepsilon)^2 s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\mathrm{tr}\{c(t,x_t,u_t)\,P\,c^{\top}(t,x_t,u_t)\}\,dt \qquad (38)$$

since, in view of the accepted learning laws (13), we have $G = 0$, $L_{Wi,t} = 0$ and $L_{Vi,t} = 0$. In view of the $L_{\phi}(C^2)$-property (8), from (38) it follows that

$$dV_t \le -(2+\varepsilon)\lambda_{\min}(Q_0)\,S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left(1 - \frac{\mu^2}{\|\Delta_t\|^2}\right)\|\Delta_t\|^2\,dt + C_1\left(s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2 - 2s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\rho\right)dt - S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t \qquad (39)$$

The integration of (39) implies

$$0 \le 2(2+\varepsilon)\lambda_{\min}\!\left(P^{-1/2}Q_0P^{-1/2}\right)T^{-1}\int_{t=0}^{T}s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left(1 - \frac{\mu^2}{\|P^{1/2}\Delta_t\|^2}\right)\|\Delta_t\|^2\,dt \overset{a.s.}{\le} T^{-1}V_0 - T^{-1}V_T + C_1 T^{-1}\int_{t=0}^{T}\left(s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2 - 2s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\rho\right)dt - T^{-1}\int_{t=0}^{T}S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t$$

and, as a result,

$$0 \overset{a.s.}{\le} T^{-1}V_0 + D_n - T^{-1}\int_{t=0}^{T}S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t + C_1\left(2T^{-1}\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{1+\varepsilon}(P^{1/2}\Delta_t)\}\,dt\right)\left[\frac{T^{-1}\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{\varepsilon}(P^{1/2}\Delta_t)\|x_t\|^2\}\,dt}{2T^{-1}\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{1+\varepsilon}(P^{1/2}\Delta_t)\}\,dt} - \rho\right] \overset{a.s.}{\le} T^{-1}V_0 + D_n - T^{-1}\int_{t=0}^{T}S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t \qquad (40)$$


where

$$D_n := C_1 T^{-1}\int_{t=0}^{T}\left[s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2 - \mathsf{E}\{s_{\mu}^{\varepsilon}\!\left(P^{1/2}\Delta_t\right)\|x_t\|^2\}\right]dt - 2C_1\rho\,T^{-1}\int_{t=0}^{T}\left[s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right) - \mathsf{E}\{s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\}\right]dt$$

In view of the law of large numbers for dependent processes (Poznyak, 2000) and Lemma 2 from (Poznyak et al., 2000), it follows that $D_n \xrightarrow{a.s.} 0$ and $T^{-1}\int_{t=0}^{T}S_{\mu,\varepsilon}(P^{1/2}\Delta_t)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t \xrightarrow{a.s.} 0$. So (40) leads to the conclusion that $s_{\mu}(P^{1/2}\Delta_t) \xrightarrow{a.s.} 0$, since

$$\limsup_{T\to\infty} T^{-1}\int_{t=0}^{T}s_{\mu}^{1+\varepsilon}\!\left(P^{1/2}\Delta_t\right)\left(1 - \frac{\mu^2}{\|P^{1/2}\Delta_t\|^2}\right)\|\Delta_t\|^2\,dt \overset{a.s.}{\le} 0$$

Property 1) is proven. To obtain property 2) it is sufficient to apply the mathematical expectation operator to both sides of the inequality

$$V_t \le V_0 + T D_n - \int_{t=0}^{T}S_{\mu,\varepsilon}\!\left(P^{1/2}\Delta_t\right)\Delta_t^{\top}P\,c(t,x_t,u_t)\,d\bar{W}_t$$

which results from (40) and implies $\mathsf{E}\{V_t\} \le \mathsf{E}\{V_0\} < \infty$.

Proof of Theorem 2. 1) Define the "joint" Lyapunov function $\bar{V}_t$ as

$$\bar{V}_t := V_0\!\left(\begin{bmatrix} P_1^{1/2} & 0 \\ 0 & P_2^{1/2} \end{bmatrix}\begin{pmatrix} \Delta_t \\ \Delta_t^{*} \end{pmatrix}\right) + (2+\varepsilon)\left(k_{v,1}^{-1}\,\mathrm{tr}\{\tilde{V}_{1,t}^{\top}\tilde{V}_{1,t}\} + k_{w,1}^{-1}\,\mathrm{tr}\{\tilde{W}_{1,t}^{\top}\tilde{W}_{1,t}\}\right) \qquad (41)$$

The identification and tracking errors $\Delta_t = \hat{x}_t - x_t$ and $\Delta_t^{*} = \hat{x}_t - x_t^{*}$ satisfy

$$d\Delta_t = \left[A\Delta_t + F_1 - \Delta f_t\right]dt - c_t\,d\bar{W}_t, \qquad d\Delta_t^{*} = A^{*}\Delta_t^{*}\,dt + c_t\,d\bar{W}_t, \qquad c_t := c(t, x_t, u_t) \qquad (42)$$

where $\Delta f_t$ verifies

$$\|\Delta f_t\| = \|f(t, x_t) - \hat{f}_0(x_t)\| \le L_{0,f} + \|W_{1,0}\|\bar{\sigma} + (L_{1,f} + \|A\|)\|x_t\| \le C_0 + C_1\|\Delta_t\| + C_1\|\Delta_t^{*}\|$$
$$-2\Delta_t^{\top}P_1\Delta f_t \le \Delta_t^{\top}P_1\Lambda_{\Delta f}^{-1}P_1\Delta_t + 3\|\Lambda_{\Delta f}\|\left(C_0^2 + C_1^2\|\Delta_t\|^2 + C_1^2\|\Delta_t^{*}\|^2\right) \qquad (43)$$

By (34), (35) it follows that

$$d\bar{V}_t = S_{\mu,\varepsilon}(z_t)\left(2\Delta_t^{\top}P_1\,d\Delta_t + 2\Delta_t^{*\top}P_2\,d\Delta_t^{*}\right) + 2(2+\varepsilon)k_{v,1}^{-1}\,\mathrm{tr}\{d\tilde{V}_{1,t}^{\top}\tilde{V}_{1,t}\} + 2(2+\varepsilon)k_{w,1}^{-1}\,\mathrm{tr}\{d\tilde{W}_{1,t}^{\top}\tilde{W}_{1,t}\} + \frac{1}{2}\,\mathrm{tr}\left\{\begin{bmatrix} c_t & 0 \\ 0 & c_t \end{bmatrix}\frac{\partial^2}{\partial z^2}V_0(z_t)\begin{bmatrix} c_t & 0 \\ 0 & c_t \end{bmatrix}^{\top}\right\}dt \qquad (44)$$


2) The next step consists of substituting (42) into (44). The upper estimates for the terms $2\Delta_t^{\top}P_1A\Delta_t$ and $2\Delta_t^{\top}P_1F_1$ remain absolutely the same as in Theorem 1. Substituting them into (44) implies

$$d\bar{V}_t \le S_{\mu,\varepsilon}(z_t)\left(\Delta_t^{\top}(\mathrm{Ric} - Q_1)\Delta_t + 3\|\Lambda_{\Delta f}\|C_0^2 + \mathrm{tr}\{L_{w1,t}\tilde{W}_{1,t}\} + \mathrm{tr}\{L_{v1,t}\tilde{V}_{1,t}\} + \Delta_t^{*\top}\left[P_2A^{*} + A^{*\top}P_2 + 3\|\Lambda_{\Delta f}\|C_1^2 I\right]\Delta_t^{*}\right)dt + 2(2+\varepsilon)^2\bar{c}^{\,2}\,\mathrm{tr}\{P_1 + P_2\}\,s_{\mu}^{\varepsilon}(z_t)\,dt - S_{\mu,\varepsilon}(z_t)\left(\Delta_t^{\top}P_1 c_t - \Delta_t^{*\top}P_2 c_t\right)d\bar{W}_t$$

and by the accepted learning law ($L_{w1,t} = 0$, $L_{v1,t} = 0$) we finally have

$$d\bar{V}_t \le -2\rho\,s_{\mu}^{1+\varepsilon}(z_t)\|z_t\|^2\left(1 - \mu^2/\|z_t\|^2\right)dt - S_{\mu,\varepsilon}(z_t)\left(\Delta_t^{\top}P_1 c_t - \Delta_t^{*\top}P_2 c_t\right)d\bar{W}_t + \left(\left[(2+\varepsilon)^2\,\mathrm{tr}\{P_1+P_2\} + (4+\varepsilon)\,3\|\Lambda_{\Delta f}\|C_0^2\right]s_{\mu}^{\varepsilon}(z_t)\,\bar{c}^{\,2} - 2\rho\,s_{\mu}^{1+\varepsilon}(z_t)\,\mu^2\right)dt$$

The integration of the last inequality implies

$$2\rho\,T^{-1}\int_{t=0}^{T}s_{\mu}^{1+\varepsilon}(z_t)\|z_t\|^2\left(1 - \mu^2/\|z_t\|^2\right)dt \le T^{-1}\bar{V}_0 + D_T + D_T' - D_T'' \qquad (45)$$

where

$$D_T := 2\rho\,T^{-1}\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{1+\varepsilon}(z_t)\}\,dt\left[\frac{\beta\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{\varepsilon}(z_t)\}\,dt}{2\rho\int_{t=0}^{T}\mathsf{E}\{s_{\mu}^{1+\varepsilon}(z_t)\}\,dt} - \mu^2\right]$$
$$D_T' := (2+\varepsilon)\,3\bar{c}^{\,2}\|\Lambda_{\Delta f}\|C_0^2\,T^{-1}\int_{t=0}^{T}\left[s_{\mu}^{\varepsilon}(z_t) - \mathsf{E}\{s_{\mu}^{\varepsilon}(z_t)\}\right]dt - 2\rho\mu^2\,T^{-1}\int_{t=0}^{T}\left[s_{\mu}^{1+\varepsilon}(z_t) - \mathsf{E}\{s_{\mu}^{1+\varepsilon}(z_t)\}\right]dt$$
$$D_T'' := T^{-1}\int_{t=0}^{T}S_{\mu,\varepsilon}(z_t)\left(\Delta_t^{\top}P_1 c_t - \Delta_t^{*\top}P_2 c_t\right)d\bar{W}_t$$

By the definition of $\mu$, the law of large numbers and the properties of the Itô integral,

$$D_T \le 0, \qquad D_T' \to 0, \qquad D_T'' \to 0 \quad (T \to \infty) \ \text{a.s.}$$

so that

$$0 \le 2\rho\,\limsup_{T\to\infty} T^{-1}\int_{t=0}^{T}s_{\mu}^{1+\varepsilon}(z_t)\|z_t\|^2\left(1 - \mu^2/\|z_t\|^2\right)dt \overset{a.s.}{\le} 0,$$

which proves the part of the theorem dealing with convergence with probability one. The mean square stability follows directly from (45).

References

Agarwal M., 1997, A Systematic Classification of Neural-Network-Based Control, IEEE Control Systems Magazine, 17, 75-93.

Gard T.C., 1988, Introduction to Stochastic Differential Equations, Marcel Dekker Inc., NY and Basel.

Duncan T.E., Guo L. and B. Pasik-Duncan, 1999, Adaptive Continuous-Time Linear Quadratic Gaussian Control, IEEE Transactions on Automatic Control, 44 (9), 1653-1662.

Hopfield J.J., 1984, Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons, Proc. Nat. Academy Sci. USA, 81, 3088-3092.

Hunt K.J., Sbarbaro D., Zbikowski R. and P.J. Gawthrop, 1992, Neural Networks for Control Systems - A Survey, Automatica, 28, 1083-1112.

Haykin S., 1994, Neural Networks: A Comprehensive Foundation, IEEE Press, NY.

Isidori A., 1995, Nonlinear Control Systems, 3rd Ed., Springer-Verlag, New York.

Kosmatopoulos E.B. and M.A. Christodoulou, 1994, The Boltzmann g-RHONN: A Learning Machine for Estimating Unknown Probability Distributions, Neural Networks, 7 (2), 271-278.

Krstić M., Kanellakopoulos I. and P. Kokotović, 1995, Nonlinear and Adaptive Control Design, John Wiley & Sons, NY.

Lewis F.L. and T. Parisini, 1998, Neural Network Feedback Control with Guaranteed Stability, Int. J. Control, 70, 337-340.

Ljung L., 1999, System Identification: Theory for the User (Second Edition), Prentice Hall PTR, Upper Saddle River, NJ.

Ljung L. and S. Gunnarsson, 1990, Adaptation and Tracking in System Identification - A Survey, Automatica, 26, 7-21.

Narendra K.S. and S.M. Li, 1998, Control of Nonlinear Time-Varying Systems Using Neural Networks, Proc. of the 10th Yale Workshop on Adaptive and Learning Systems, 2-18.

Parisini T. and R. Zoppoli, 1994, Neural Networks for Feedback Feedforward Nonlinear Control Systems, IEEE Transactions on Neural Networks, 5, 436-449.

Poznyak A.S., E.N. Sanchez, Wen Yu and J.P. Perez, 1999, Nonlinear Adaptive Trajectory Tracking Using Dynamic Neural Networks, IEEE Transactions on Neural Networks, 10 (6), 1402-1411.

Poznyak A.S., Martinez-Guerra R. and A. Osorio-Cordero, 2000, Robust High-Gain Observer for Nonlinear Closed-Loop Stochastic Systems, Mathematical Methods in Engineering Practice, 6, 31-60.

Poznyak A.S., 2000, A New Version of the Strong Law of Large Numbers for Dependent Vector Processes with Decreasing Correlation, Proceedings of CDC-2000, Sydney, 2881-2882.

Rovithakis G.A. and M.A. Christodoulou, 1994, Direct Adaptive Regulation of Unknown Plants Using Dynamic Neural Networks, IEEE Transactions on Systems, Man, and Cybernetics, 24 (3), 400-412.

Rovithakis G.A. and M.A. Christodoulou, 2000, Adaptive Control with Recurrent High-Order Neural Networks: Theory and Applications, Springer.

Sandberg I.W., 1991, Approximation Theorems for Discrete-Time Systems, IEEE Transactions on Automatic Control, 38, 564-566.
