
Differential graphical games for H-infinity control of linear heterogeneous multiagent systems

Farnaz Adib Yaghmaie, Kristian Hengster Movric, Frank L. Lewis and Rong Su

The self-archived postprint version of this journal article is available at Linköping University Institutional Repository (DiVA):
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158823

N.B.: When citing this work, cite the original publication.

Adib Yaghmaie, F., Movric, K. H., Lewis, F. L., Su, R., (2019), Differential graphical games for H-infinity control of linear heterogeneous multiagent systems, International Journal of Robust and Nonlinear Control, 29(10), 2995-3013. https://doi.org/10.1002/rnc.4538

Original publication available at:

https://doi.org/10.1002/rnc.4538

Copyright: Wiley (12 months)


Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/rnc

Differential Graphical Games for H∞ Control of Linear Heterogeneous Multi-agent Systems

Farnaz Adib Yaghmaie 1,4, Kristian Hengster Movric 2, Frank L. Lewis 3, Rong Su 4

1 Department of Electrical Engineering, Linköping University, 58183 Linköping, Sweden,
2 Department of Control Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic,
3 University of Texas at Arlington Research Institute, The University of Texas at Arlington, Texas, USA and Qian Ren Consulting Professor, Northeastern University, Shenyang 110036, China,
4 School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore.

SUMMARY

Differential graphical games have been introduced in the literature to solve the state synchronization problem for linear homogeneous agents. When the agents are heterogeneous, the previous notion of graphical games can no longer be used and a new definition is required. In this paper, we define a novel concept of differential graphical games for linear heterogeneous agents subject to external unmodeled disturbances, which contains the previously introduced graphical game for homogeneous agents as a special case. Using our new formulation, we can solve both the output regulation and H∞ output regulation problems. Our graphical game framework yields coupled Hamilton-Jacobi-Bellman equations which are, in general, impossible to solve analytically. Therefore, we propose a new actor-critic algorithm to solve these coupled equations numerically in real time. Moreover, we find an explicit upper bound for the overall $L_2$-gain of the output synchronization error with respect to disturbance. We demonstrate our developments by a simulation example. Copyright © 2019 John Wiley & Sons, Ltd.


1. INTRODUCTION

The next generation of networked systems is currently emerging from a number of different engineering domains. We already have examples in the Internet of Things (IoT), Industry 4.0, smart cities and various other cyber-physical systems characterized by the requirement to coordinate efforts across large networks of different agents. In the near future, these systems are expected to increase in importance, with an ever-widening range of applications. The main design goals are versatility, flexibility, easy real-time reconfigurability and low communication burden, along with robustness to component failures and resilience to disturbances. These complex interconnected multi-agent systems create the need for novel team decision, distributed control, optimization and online computation methodologies.

A multi-agent system is defined as a group of interconnected dynamical systems interacting to achieve a desired collective behavior like state synchronization [1, 2, 3], output regulation [4, 5, 6], cluster synchronization [7], formation control [8], [9], etc. The canonical distributed control problem of state synchronization is usually defined for homogeneous agents where it is possible to use the local state synchronization error [1, 2, 3]. If the agents are heterogeneous, meaning that they


have different internal dynamics, it is not generally meaningful to achieve state synchronization; instead, one may consider an output regulation problem where a distributed controller is designed such that the outputs of all agents synchronize to a reference trajectory while the effect of modeled disturbances, i.e. disturbances with a known dynamic model, is rejected. In these cases, the dynamics of the reference trajectory and the disturbance model are usually combined into a single dynamic model named the exo-system in the literature [10, 4]. One possible solution to such output regulation problems is by means of the Internal Model Principle (IMP), where the idea is to incorporate an internal model of the exo-system in the dynamic controller of each agent [10, 11, 12, 4, 6]. If the agents are additionally subject to unmodeled disturbances, then the IMP alone cannot be used for disturbance rejection and H∞ control methods are required. A significant part of the research on H∞ control of multi-agent systems concerns H∞ state synchronization of homogeneous agents [13, 14] and, more recently, H∞ output regulation of heterogeneous agents [15, 16, 17].

In the context of multi-agent systems, various optimization methods are utilized to achieve the required design characteristics. One has the conventional centralized optimization [18, 19] as well as more recent distributed approaches [20, 21] aiming at flexibility, reconfigurability, and robustness. The optimal control of a single dynamic system, also called a single-player game [22, 23], is the simplest dynamic optimization problem. However, this approach is usually found lacking in robustness to external disturbances acting on the system and possibly subsystem failures. The optimal control of a dynamic system subject to unmodeled disturbance is termed a two-player zero-sum game, where the control player tries to minimize a cost function while the disturbance player maximizes it [24, 25]. More recently, the idea of a graphical game has been introduced to capture and exploit the locality and mutual influences of dynamic systems. In graphical games, the dynamics and the objective function of each player are influenced by the other players in the neighborhood. A graphical game for homogeneous agents, i.e. agents with identical internal dynamics, is suggested in [20] to achieve state synchronization using the local state synchronization error. This concept is extended to the H∞ graphical game where the agents are subject to external disturbances [21]. A graphical game for cluster synchronization of homogeneous agents is given in [7], where the agents in each cluster are synchronized to the reference trajectory in that cluster while different clusters have different reference trajectories. These graphical games provide a suitable platform for distributed optimal controller designs for identical agents. The choice of individual player objective function, as discussed above, ensures that the multi-agent system achieves the desired collective behavior, e.g. state, H∞-state or cluster synchronization [20, 21, 7].

Reinforcement Learning (RL) is concerned with learning optimal policies from interaction with an environment [26]. In RL, a decision-making system modifies its control policy based on the stimuli received in response to its previous policies so as to optimize the cost. In this sense, RL implies a cause and effect relationship between policies and costs [27], and as such, RL-based frameworks enjoy optimality and adaptivity. Over the past few years, dynamic programming has been utilized to develop RL techniques for adaptive-optimal control of dynamic systems; see [22, 24, 28, 25, 19, 20, 21] to name a few. Hence, RL is by now fairly standard in solving single-player optimal and robust control problems [22, 23]. More recently, RL has also been used to tackle more complicated optimization structures like multiple interconnected dynamic systems, namely a multi-player game, where a single centralized dynamics contains the dynamics of all players [18, 19]. Likewise, RL methods have also been successfully applied in graphical games [20, 21], [7]. Moreover, faced with the complicated coupled design equations arising from such problems, RL remains the only viable method applicable in real time. Such recent approaches are particularly appropriate for problems arising from the domain of multi-agent systems.

In this paper, we aim to develop a novel graphical game framework for linear heterogeneous multi-agent systems. For this purpose, we bring together distributed control of heterogeneous multi-agent systems, graphical games, and reinforcement learning techniques. There are four main contributions in this paper. (1) We define the novel concept of graphical games for heterogeneous agents, as opposed to the homogeneous agents considered in [20, 21]. This allows us to achieve output regulation among heterogeneous agents. A graphical game for heterogeneous agents is also considered in [12]; however, the communication graph in [12] is required to be acyclic (i.e. there is no loop in the graph). This restrictive assumption significantly simplifies the formulation and decouples the controller design of each agent from the others. (2) We assume that the agents are subject to unmodeled disturbances and define an H∞ graphical game for heterogeneous agents. The only reference pertaining to H∞ control of multi-agent systems in the graphical game framework [21] considers only homogeneous agents. (3) The H∞ graphical game for heterogeneous agents results in coupled Hamilton-Jacobi-Bellman equations that are difficult to solve analytically. We use RL and develop new critic networks to obtain solutions to these equations. In contrast, the actor-critic networks in [20, 21] can be used only for homogeneous agents. (4) We obtain an upper bound for the $L_2$-gain of the output synchronization error with respect to unmodeled disturbances; in contrast, [21] does not calculate the upper bound but only asserts its existence.

The rest of the paper is organized as follows. In Section 2, we discuss notation conventions and review preliminaries. In Section 3, we define the H∞ output regulation problem. In Section 4, we define the novel graphical games for linear heterogeneous agents. In Section 5, we present our results regarding H∞ output regulation using graphical games and derive the resulting overall $L_2$-gain of the output synchronization error with respect to disturbances. In Section 6, we suggest a distributed online procedure for solving the H∞ output regulation graphical game. We demonstrate the validity of the theoretical developments with a simulation example in Section 7. We conclude the paper in Section 8.

2. PRELIMINARIES

2.1. Notation

The following notation will be used throughout this paper. Let $\mathbb{R}^{n\times m}$ be the set of $n \times m$ real matrices. $I_n$ denotes the identity matrix of dimension $n \times n$ and $\mathbf{1}_N$ is an $N$-column vector of ones. $0$ denotes a matrix of zeros with compatible dimensions. The Kronecker product of two matrices $A$ and $B$ is denoted by $A \otimes B$. The positive (semi-)definiteness constraint on the matrix $P$ is expressed as $P > 0$ ($P \ge 0$). Let $A_i \in \mathbb{R}^{n_i\times m_i}$ for $i = 1, ..., N$, where $N$ is a positive integer. The operator $\operatorname{Diag}_{1:N}\{A_i\}$ is defined as
$$\operatorname{Diag}_{1:N}\{A_i\} = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_N \end{bmatrix}. \quad (1)$$

The maximum singular value of a matrix $A$ is denoted by $\bar\sigma(A)$, its minimum singular value by $\underline{\sigma}(A)$, and its kernel by $\operatorname{Ker}(A)$. An eigenvalue of a square matrix $A$ is denoted by $\lambda_i(A)$.

2.2. $L_p$-norm

For $p \in [1, +\infty)$, let $L_p = L_p^n[0, +\infty)$ denote the space of functions $a(t) \in \mathbb{R}^n$ such that $t \mapsto |a(t)|^p$ is integrable over $[0, +\infty)$, where $|a(t)|$ is the instantaneous Euclidean norm of the vector $a(t)$. The $L_p$-norm of $a(t) \in L_p^n[0, \infty)$ is defined as
$$\|a(t)\|_{L_p} = \left(\int_0^\infty |a(\tau)|^p \, d\tau\right)^{1/p} < +\infty.$$

2.3. Graph theory

Suppose that the interaction among the followers is represented by an undirected graph $G = (V, E)$ with a finite set of $N$ nodes $V = \{v_1, ..., v_N\}$ and a set of undirected edges $E \subseteq V \times V$. $E = [\alpha_{ij}]$ is the adjacency matrix with $\alpha_{ij} = 1$ if $(v_j, v_i) \in E$ and $\alpha_{ij} = 0$ otherwise. Since the graph is undirected, the adjacency matrix is symmetric. The graph is simple, i.e. $\alpha_{ii} = 0$, $i = 1, ..., N$.

A path from node $v_i$ to node $v_j$ is a sequence of edges joining $v_i$ to $v_j$. If there exists a path from node $v_i$ to node $v_j$, then node $v_i$ is said to be reachable from $v_j$. The set of neighbors of node $v_i$ is $N_i = \{v_j : (v_j, v_i) \in E\}$. For graph $G$, the in-degree matrix $D$ is a diagonal matrix $D = \operatorname{Diag}_{1:N}\{d_i\}$ with $d_i = \sum_{j\in N_i}\alpha_{ij}$. The Laplacian matrix for graph $G$ is defined as $L = D - E$. Suppose that the matrix $G = \operatorname{Diag}_{1:N}\{g_i\}$ shows the pinning from the leader ($v_0$) to the followers ($v_1, ..., v_N$). Then, $g_i = 1$ if there is a link between the leader and follower $i$, and $g_i = 0$ otherwise. Denote the augmented graph by $\bar G = (\bar V, \bar E)$, which is obtained by attaching node $v_0$ and its outgoing edges to $G$. Graph $\bar G$ shows the interaction among the followers and the leader.

A graph $G = (V, E)$ is connected if there exists a path from $v_i$ to $v_j$ for all $v_i, v_j \in V$. If the initial and the terminal nodes of a path are the same, the path is called a cycle. A graph without any cycle is named an acyclic graph. A graph is a connected tree if every node, except one node called the root, has in-degree equal to one. The root node has in-degree equal to zero. A graph has a spanning tree if there exists a tree containing every node in $V$.
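For readers who prefer a computational view, the following minimal sketch assembles the graph matrices defined above in Python/numpy. The edge set and the pinned node are hypothetical (chosen only for illustration, not taken from Fig. 3 of this paper).

```python
# Minimal sketch of the graph matrices of Section 2.3 (hypothetical edges/pin).
import numpy as np

N = 5
E = np.zeros((N, N))                        # adjacency matrix [alpha_ij]
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:  # hypothetical undirected edges
    E[i, j] = E[j, i] = 1.0                 # undirected graph: symmetric weights

D = np.diag(E.sum(axis=1))                  # in-degree matrix, d_i = sum_j alpha_ij
L = D - E                                   # Laplacian matrix L = D - E
G = np.diag([1.0, 0, 0, 0, 0])              # pinning matrix: leader pinned to follower 1

# Under Assumption 1 (spanning tree rooted at the leader), L + G is nonsingular.
assert np.linalg.matrix_rank(L + G) == N
```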

3. H∞ OUTPUT REGULATION PROBLEM

Consider a set of $N+1$ heterogeneous agents with $N$ followers given as LTI systems
$$\dot x_i = A_i x_i + B_i u_i + P_i \omega_i, \quad (2)$$
$$y_i = C_i x_i + E_i u_i, \quad (3)$$
$$z_i = D_i x_i, \quad (4)$$
and a leader given by
$$\dot\xi_0 = X \xi_0, \quad (5)$$
$$y_0 = R_1 \xi_0, \quad (6)$$
$$z_0 = R_2 \xi_0, \quad (7)$$
in which $x_i \in \mathbb{R}^{n_i}$, $y_i \in \mathbb{R}^p$, $z_i \in \mathbb{R}^q$ and $u_i \in \mathbb{R}^{m_i}$ denote the state, the synchronization output, the measured output and the control signal for follower $i = 1, ..., N$. The $\xi_0 \in \mathbb{R}^l$, $y_0 \in \mathbb{R}^p$ and $z_0 \in \mathbb{R}^q$ denote the state, the synchronization output and the measured output of the leader. All followers are subject to external unmodeled disturbances $\omega_i \in L_2$. The motivation behind introducing two outputs $y_i, z_i$ is to achieve synchronization in the outputs $y_i$ by communicating the measured outputs $z_i$.

We suggest a distributed static output-feedback controller of the following form
$$u_i = K_i e_{z_i}, \quad (8)$$
$$e_{z_i} = \sum_{j\in N_i} \alpha_{ij}(z_i - z_j) + g_i(z_i - z_0), \quad (9)$$
where $e_{z_i}$ is the local neighborhood error in $z$-outputs and $K_i \in \mathbb{R}^{m_i\times q}$. Define the $y$-output synchronization error and the $z$-output synchronization error as
$$\delta_{y_i} = y_i - y_0, \quad (10)$$
$$\delta_{z_i} = z_i - z_0. \quad (11)$$
Let $x = [x_1^T, ..., x_N^T]^T$, $y = [y_1^T, ..., y_N^T]^T$, $z = [z_1^T, ..., z_N^T]^T$, $\omega = [\omega_1^T, ..., \omega_N^T]^T$, $\delta_y = [\delta_{y_1}^T, ..., \delta_{y_N}^T]^T$ and $\delta_z = [\delta_{z_1}^T, ..., \delta_{z_N}^T]^T$ denote the overall vectors of $x_i$, $y_i$, $z_i$, $\omega_i$, $\delta_{y_i}$ and $\delta_{z_i}$ respectively. Then, the overall closed-loop system of all followers, their controllers and the leader is given by
$$\dot x = A_{cl} x + B_{cl}\xi_0 + P_{cl}\omega, \quad \dot\xi_0 = X\xi_0, \quad \delta_y = C_{cl}x - E_{cl}\xi_0, \quad \delta_z = D_{cl}x - (\mathbf{1}_N \otimes R_2)\xi_0, \quad (12)$$

where
$$A_{cl} = \operatorname{Diag}_{1:N}\{A_i\} + \operatorname{Diag}_{1:N}\{B_iK_i\}\,((L+G)\otimes I_q)\,\operatorname{Diag}_{1:N}\{D_i\},$$
$$B_{cl} = -\operatorname{Diag}_{1:N}\{B_iK_i\}\,((L+G)\otimes I_q)\,(\mathbf{1}_N\otimes R_2),$$
$$C_{cl} = \operatorname{Diag}_{1:N}\{C_i\} + \operatorname{Diag}_{1:N}\{E_iK_i\}\,((L+G)\otimes I_q)\,\operatorname{Diag}_{1:N}\{D_i\},$$
$$E_{cl} = \operatorname{Diag}_{1:N}\{E_iK_i\}\,((L+G)\otimes I_q)\,(\mathbf{1}_N\otimes R_2) + \mathbf{1}_N\otimes R_1,$$
$$D_{cl} = \operatorname{Diag}_{1:N}\{D_i\}, \quad P_{cl} = \operatorname{Diag}_{1:N}\{P_i\}.$$
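The block structure above is mechanical to assemble numerically. The following sketch does so with numpy/scipy, assuming the per-agent matrices and gains are supplied as lists of arrays; the function name and argument layout are ours, not from the paper (Dm and Em stand for the per-agent $D_i$ and $E_i$ to avoid clashing with the graph matrices).

```python
# Minimal sketch: assemble the closed-loop matrices of (12), assuming lists of
# numpy arrays A, B, C, Dm, Em, P, gains K, graph matrices Lg (Laplacian) and G.
import numpy as np
from scipy.linalg import block_diag

def closed_loop_matrices(A, B, C, Dm, Em, P, K, Lg, G, R1, R2, q):
    N = len(A)
    LG = np.kron(Lg + G, np.eye(q))                       # (L + G) kron I_q
    BK = block_diag(*[B[i] @ K[i] for i in range(N)])     # Diag{B_i K_i}
    EK = block_diag(*[Em[i] @ K[i] for i in range(N)])    # Diag{E_i K_i}
    Dd = block_diag(*Dm)                                  # Diag{D_i}
    ones_R2 = np.kron(np.ones((N, 1)), R2)                # 1_N kron R2
    A_cl = block_diag(*A) + BK @ LG @ Dd
    B_cl = -BK @ LG @ ones_R2
    C_cl = block_diag(*C) + EK @ LG @ Dd
    E_cl = EK @ LG @ ones_R2 + np.kron(np.ones((N, 1)), R1)
    return A_cl, B_cl, C_cl, E_cl, Dd, block_diag(*P)
```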

Now, we define the H∞ output regulation problem.

Problem 1 (H∞ output regulation problem for linear heterogeneous multi-agent systems)

Consider a group of $N+1$ heterogeneous LTI systems defined by (2)-(7). Design the feedback gains $K_i$ such that for all $x_i(0)$, $i = 1, ..., N$, the closed-loop system of (2)-(7) using (8) achieves the following properties:

a. For $\omega \equiv 0$ and $\xi_0 \equiv 0$, the origin of the system $\dot x = A_{cl}x$ is asymptotically stable.
b. For $\omega \equiv 0$, we have $\delta_y \to 0$ and $\delta_z \to 0$ as $t \to +\infty$ for all initial conditions $\xi_0(0)$.
c. For $\omega \in L_2$ and $T > 0$,
$$L_{2\delta_y\omega} = \frac{\int_0^T \|\delta_y\|_2^2 \, d\tau}{\int_0^T \|\omega\|_2^2 \, d\tau} < +\infty. \quad (13)$$

Properties a.-b. define the output regulation problem [29, 30] in the absence of unmodeled disturbance, i.e. $\omega \equiv 0$. Property c. concerns H∞ control in the presence of unmodeled disturbances [31]. We make the following assumptions throughout this paper.

Assumption 1
The augmented graph $\bar G = (\bar V, \bar E)$ contains a spanning tree with the leader as its root node.

Assumption 2
The leader's dynamics $X$ in (5) does not have any strictly stable pole.

Assumption 3
The triple $(A_i, B_i, D_i)$ is output-feedback stabilizable.

According to Lemma 1.4 of [30], the closed-loop system (12) achieves $y$- and $z$-output synchronization to the outputs of the leader for $\omega \equiv 0$ only if there exists an invariant subspace for the closed-loop system (12), where $y_i = y_0$ and $z_i = z_0$ for $i = 1, ..., N$:
$$\begin{bmatrix} A_{cl} & B_{cl} \\ 0 & X \end{bmatrix}\begin{bmatrix}\Pi \\ I_l\end{bmatrix} = \begin{bmatrix}\Pi \\ I_l\end{bmatrix}X, \quad \begin{bmatrix}C_{cl} & 0\end{bmatrix}\begin{bmatrix}\Pi \\ I_l\end{bmatrix} = E_{cl}, \quad \begin{bmatrix}D_{cl} & 0\end{bmatrix}\begin{bmatrix}\Pi \\ I_l\end{bmatrix} = \mathbf{1}_N \otimes R_2, \quad (14)$$
where $\Pi \in \mathbb{R}^{\sum_{i=1}^N n_i \times l}$. Using the last equation in the first and second equations, (14) is simplified to
$$\operatorname{Diag}_{1:N}\{A_i\}\Pi = \Pi X, \quad \operatorname{Diag}_{1:N}\{C_i\}\Pi = \mathbf{1}_N\otimes R_1, \quad \operatorname{Diag}_{1:N}\{D_i\}\Pi = \mathbf{1}_N \otimes R_2.$$
Let $\Pi = [\Pi_1^T \ \Pi_2^T \ \cdots \ \Pi_N^T]^T$, where $\Pi_i \in \mathbb{R}^{n_i\times l}$ has full column rank. Then, the aforementioned necessary condition for $y$- and $z$-output synchronization is summarized in the following assumption.

Assumption 4
For each $i = 1, ..., N$, there exists a matrix $\Pi_i \in \mathbb{R}^{n_i\times l}$ of full column rank such that
$$A_i\Pi_i = \Pi_i X, \quad C_i\Pi_i = R_1, \quad D_i\Pi_i = R_2. \quad (15)$$

Remark 1

We make Assumption 2 without loss of generality (Remark 1.3 in [30]). A leader not satisfying Assumption 2 is asymptotically identical to a simpler leader, one of a smaller order, that satisfies it. As we are mainly interested in the long-term behavior, we can restrict our attention to leaders satisfying Assumption 2. Any difference would only be in the transient, with no long-term effects. In fact, if the output regulation problem is solvable by any controller under Assumption 2, then it is also solvable by the same controller even if this assumption is violated [30].

3.1. Coordinate Transformations

In this subsection, we introduce a coordinate transformation which is useful in formulating the output regulation problem and the graphical game in Section 4. Building on Assumption 4, supplement the columns of $\Pi_i$ in (15) by a set of linearly independent columns $\Psi_i \in \mathbb{R}^{n_i\times(n_i-l)}$ to form a complete basis $T_i = [\Pi_i \ \Psi_i] \in \mathbb{R}^{n_i\times n_i}$ of the single-agent state space $\mathbb{R}^{n_i}$. In such a basis, one has the transformed state $[\xi_i^T, \nu_i^T]^T$:
$$x_i = \begin{bmatrix}\Pi_i & \Psi_i\end{bmatrix}\begin{bmatrix}\xi_i \\ \nu_i\end{bmatrix} = T_i\begin{bmatrix}\xi_i \\ \nu_i\end{bmatrix}. \quad (16)$$

Define the following matrices
$$\hat A_i := T_i^{-1}A_iT_i = \begin{bmatrix}X & F_i \\ 0 & M_i\end{bmatrix}, \quad \hat B_i := T_i^{-1}B_i = \begin{bmatrix}\hat B_{1i} \\ \hat B_{2i}\end{bmatrix}, \quad \hat P_i := T_i^{-1}P_i = \begin{bmatrix}\hat P_{1i} \\ \hat P_{2i}\end{bmatrix}. \quad (17)$$

Define the local neighborhood error in $\xi_i$ as
$$e_{\xi_i} = \sum_{j=1}^N \alpha_{ij}(\xi_i - \xi_j) + g_i(\xi_i - \xi_0). \quad (18)$$
Let $\epsilon_i = [e_{\xi_i}^T, \nu_i^T]^T$. Then, the dynamics of system (2) and the control (8) in the transformed coordinates $\epsilon_i$ read
$$\dot\epsilon_i = \bar A_i\epsilon_i + \bar B_i u_i + \bar P_i\omega_i - \sum_{j=1}^N\alpha_{ij}(\mathbf{B}_j u_j + \mathbf{P}_j\omega_j + \mathbf{F}_j\nu_j), \quad (19)$$
$$u_i = K_iR_2 e_{\xi_i} + (d_i+g_i)K_iD_i\Psi_i\nu_i - K_i\sum_{j=1}^N\alpha_{ij}D_j\Psi_j\nu_j, \quad (20)$$
where
$$\bar A_i := \begin{bmatrix}X & (d_i+g_i)F_i \\ 0 & M_i\end{bmatrix}, \quad \bar B_i := \begin{bmatrix}(d_i+g_i)\hat B_{1i} \\ \hat B_{2i}\end{bmatrix}, \quad \bar P_i := \begin{bmatrix}(d_i+g_i)\hat P_{1i} \\ \hat P_{2i}\end{bmatrix}, \quad \mathbf{B}_j := \begin{bmatrix}\hat B_{1j} \\ 0\end{bmatrix}, \quad \mathbf{P}_j := \begin{bmatrix}\hat P_{1j} \\ 0\end{bmatrix}, \quad \mathbf{F}_j := \begin{bmatrix}F_j \\ 0\end{bmatrix}. \quad (21)$$

Equation (19) describes agent $i$'s dynamics in the new $\epsilon_i = [e_{\xi_i}^T, \nu_i^T]^T$ coordinates. The following technical lemma can be used to simplify the dynamics in (19).

Lemma 1
One can always select $\Psi_i$ in (16) such that $F_i = 0$ in (17).

Proof
From the definition of $\hat A_i$ in (17),
$$A_i\begin{bmatrix}\Pi_i & \Psi_i\end{bmatrix} = \begin{bmatrix}\Pi_i & \Psi_i\end{bmatrix}\begin{bmatrix}X & F_i \\ 0 & M_i\end{bmatrix}.$$
Hence, one has $A_i\Psi_i - \Psi_iM_i = \Pi_iF_i$, which is a Sylvester equation. Using the Kronecker product's property, this is equivalent to $(I_{\tilde n_i}\otimes A_i - M_i^T\otimes I_{n_i})\operatorname{vec}(\Psi_i) = \operatorname{vec}(\Pi_iF_i)$, where $\tilde n_i = n_i - l$. If one wants $F_i = 0$, then one should select
$$\operatorname{vec}(\Psi_i) \in \operatorname{null}(I_{\tilde n_i}\otimes A_i - M_i^T\otimes I_{n_i}). \quad (22)$$
Let $\lambda_k \in \operatorname{spec}(A_i)$, $\tilde\lambda_k \in \operatorname{spec}(M_i)$. Then, $(\lambda_k - \tilde\lambda_k) \in \operatorname{spec}(I_{\tilde n_i}\otimes A_i - M_i^T\otimes I_{n_i})$. Since $\operatorname{spec}(M_i) \subset \operatorname{spec}(A_i)$, we have $0 \in \operatorname{spec}(I_{\tilde n_i}\otimes A_i - M_i^T\otimes I_{n_i})$ and it is always possible to satisfy (22). For a more general result, please see Theorem 4.4.14 of [32].

Using Lemma 1, we hereafter assume that we have selected $\Psi_i$ such that $F_i = 0$ for $i = 1, ..., N$.
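A possible numerical realization of Lemma 1 is sketched below: it forms the Kronecker operator of (22), extracts a null-space vector, and reshapes it column-wise into a candidate $\Psi_i$. It assumes a matrix $M_i$ with $\operatorname{spec}(M_i) \subset \operatorname{spec}(A_i)$ and the regulator solution $\Pi_i$ of (15) are given; the helper name is ours, and the rank assertion flags candidates that fail to complete the basis (16).

```python
# Minimal sketch of Lemma 1 (assumes M_i shares its spectrum with A_i, so the
# null space in (22) is nonempty; a single null vector yields the whole Psi_i).
import numpy as np
from scipy.linalg import null_space

def select_psi(Ai, Pi_i, Mi):
    ni = Ai.shape[0]
    nt = ni - Pi_i.shape[1]                        # n_i - l supplementary columns
    S = np.kron(np.eye(nt), Ai) - np.kron(Mi.T, np.eye(ni))
    Z = null_space(S)                              # basis of null(I kron A - M^T kron I)
    Psi = Z[:, 0].reshape((ni, nt), order="F")     # un-vectorize (column-major)
    Ti = np.hstack([Pi_i, Psi])
    assert np.linalg.matrix_rank(Ti) == ni         # [Pi_i  Psi_i] must be a basis
    return Psi, Ti
```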

3.2. Closed-loop system and node error dynamics for graphical game

In this subsection, we define the node error dynamics which are suitable for graphical games formulation for heterogeneous agents fully developed in Section 4. To do so, we express the closed-loop system of (19) using the control (20)

$$\begin{aligned}
\dot e_{\xi_i} ={}& Xe_{\xi_i} + (d_i+g_i)\hat B_{1i}K_iR_2e_{\xi_i} + (d_i+g_i)^2\hat B_{1i}K_iD_i\Psi_i\nu_i + (d_i+g_i)\hat P_{1i}\omega_i \\
&- (d_i+g_i)\hat B_{1i}K_i\sum_{j=1}^N\alpha_{ij}D_j\Psi_j\nu_j - \sum_{j=1}^N\alpha_{ij}\hat B_{1j}K_jR_2e_{\xi_j} - \sum_{j=1}^N\alpha_{ij}(d_j+g_j)\hat B_{1j}K_jD_j\Psi_j\nu_j \\
&+ \sum_{j=1}^N\alpha_{ij}\hat B_{1j}K_j\sum_{l=1}^N\alpha_{jl}D_l\Psi_l\nu_l - \sum_{j=1}^N\alpha_{ij}\hat P_{1j}\omega_j, \\
\dot\nu_i ={}& M_i\nu_i + \hat B_{2i}K_iR_2e_{\xi_i} + (d_i+g_i)\hat B_{2i}K_iD_i\Psi_i\nu_i + \hat P_{2i}\omega_i - \hat B_{2i}K_i\sum_{j=1}^N\alpha_{ij}D_j\Psi_j\nu_j.
\end{aligned} \quad (23)$$
Define
$$\beta_i := -K_i\sum_{j=1}^N\alpha_{ij}D_j\Psi_j\nu_j + (d_i+g_i)(K_iD_i\Psi_i - \bar K_i)\nu_i, \quad \beta = [\beta_1^T, ..., \beta_N^T]^T, \quad (24)$$
$$u_{op_i} := K_iR_2e_{\xi_i} + (d_i+g_i)\bar K_i\nu_i, \quad (25)$$
so that $u_i = u_{op_i} + \beta_i$. In (24)-(25), $\bar K_i \in \mathbb{R}^{m_i\times(n_i-l)}$ is a gain matrix which has no effect on the design of the controller, but we introduce it to facilitate the following theoretical developments of the optimal design. The signal $\beta_i$ in (24) contains the term $-(d_i+g_i)\bar K_i\nu_i$ and $u_{op_i}$ in (25) contains $(d_i+g_i)\bar K_i\nu_i$. Clearly, this term has no effect on the design of the controller because $u_{op_i}$ and $\beta_i$ always appear added together.

By the definitions in (24)-(25), the closed-loop system (23) can be represented as
$$\dot\epsilon_i = \bar A_i\epsilon_i + \bar B_iu_{op_i} + \bar P_i\omega_i + \bar B_i\beta_i - \sum_{j=1}^N\alpha_{ij}(\mathbf{B}_ju_{op_j} + \mathbf{P}_j\omega_j + \mathbf{B}_j\beta_j). \quad (26)$$

We call (26) the node error dynamics for graphical games for heterogeneous agents. It is important to point out that (26) is in a standard form of the dynamics usually considered in the graphical game framework [21, 20], i.e. it depends on the states, policies and disturbances of other agents in the neighborhood. Note however that while the agents in [21, 20] are homogeneous, the dynamics in (26) are heterogeneous.

Remark 2

Note that the $\nu_i$'s and the related $\beta_i$'s stem from agent heterogeneity, and they pose additional complications for control design that do not arise for identical agents. In this more general setup with heterogeneous agents, we develop the graphical game in Section 4 in the sequel, whereas the existing results along similar lines consider only identical agents. For the special case of identical agents, our formulation indeed reduces to the familiar one in [21, 20].

In Section 4, we use (26) to develop the graphical game and devise the appropriate controls $u_{op_i}$. In Section 5, we then give additional conditions guaranteeing that the results of Section 4 solve the original Problem 1.

4. GRAPHICAL GAME FOR LINEAR HETEROGENEOUS AGENTS

In this section, we use the machinery of graphical games and define an H∞ graphical game for linear heterogeneous agents. Using this graphical game framework, we ultimately solve our H∞ output regulation problem of Section 3.

4.1. H∞ graphical game formulation

As can be seen from (26), the dynamics of each agent depend on the neighboring agents. In a similar way, we define the $L_2$-condition for agent $i$ such that it contains the policies and disturbances of the other agents in its neighborhood. Let $u_{op_{-i}} = \{u_{op_j}, j\in N_i\}$, $\omega_{-i} = \{\omega_j, j\in N_i\}$ and $\beta_{-i} = \{\beta_j, j\in N_i\}$. Define the $L_2$-condition for the dynamics (26) as
$$\int_0^T\Big\{\epsilon_i^TQ_i\epsilon_i + u_{op_i}^TR_{ii}u_{op_i} + \sum_{j=1}^N\alpha_{ij}u_{op_j}^TR_{ij}u_{op_j}\Big\}d\tau \le \gamma_{\xi\nu_i}^2\int_0^T\Big\{\omega_i^TS_{1ii}\omega_i + \sum_{j=1}^N\alpha_{ij}\omega_j^TS_{1ij}\omega_j\Big\}d\tau + \gamma_{\beta_i}^2\int_0^T\Big\{\beta_i^TS_{2ii}\beta_i + \sum_{j=1}^N\alpha_{ij}\beta_j^TS_{2ij}\beta_j\Big\}d\tau, \quad (27)$$

where $Q_i > 0$, $R_{ii} > 0$, $R_{ij} \ge 0$, $S_{1ii} > 0$, $S_{1ij} \ge 0$, $S_{2ii} > 0$, $S_{2ij} \ge 0$, $\gamma_{\xi\nu_i} > 0$, $\gamma_{\beta_i} > 0$ and $T > 0$. The $L_2$-condition in (27) is equivalent to the optimization of the following quadratic performance index:
$$J_i(\epsilon_i(0), u_{op_i}, u_{op_{-i}}, \omega_i, \omega_{-i}, \beta_i, \beta_{-i}) = \int_0^{+\infty} L_i(\epsilon_i, u_{op_i}, u_{op_{-i}}, \omega_i, \omega_{-i}, \beta_i, \beta_{-i})\,d\tau$$
$$= \int_0^{+\infty}\Big\{\epsilon_i^TQ_i\epsilon_i + u_{op_i}^TR_{ii}u_{op_i} + \sum_{j=1}^N\alpha_{ij}u_{op_j}^TR_{ij}u_{op_j} - \gamma_{\xi\nu_i}^2\Big[\omega_i^TS_{1ii}\omega_i + \sum_{j=1}^N\alpha_{ij}\omega_j^TS_{1ij}\omega_j\Big] - \gamma_{\beta_i}^2\Big[\beta_i^TS_{2ii}\beta_i + \sum_{j=1}^N\alpha_{ij}\beta_j^TS_{2ij}\beta_j\Big]\Big\}d\tau. \quad (28)$$

In (28), we have two players: the control player $u_{op_i}$, which tries to minimize the performance index, and the disturbance player $\{\omega_i, \beta_i\}$, which tries to maximize it.

Remark 3
Note that in (27), the $\beta_i$'s are considered as arbitrary $L_2$ disturbances, in contrast to the $\beta_i$'s defined in (24) in Section 3, specified there as part of the feedback (8). This is because the present section primarily seeks to design $u_{op_i}$ with good disturbance suppression properties. Later in Section 5, the actual $\beta_i$ signals in (24) are accounted for by the small-gain theorem.

The optimal value of this zero-sum graphical game is defined as
$$V_i^*(\epsilon_i(0)) = \min_{u_{op_i}}\max_{\omega_i,\beta_i} J_i(\epsilon_i(0), u_{op_i}, u_{op_{-i}}^*, \omega_i, \omega_{-i}^*, \beta_i, \beta_{-i}^*) = \max_{\omega_i,\beta_i}\min_{u_{op_i}} J_i(\epsilon_i(0), u_{op_i}, u_{op_{-i}}^*, \omega_i, \omega_{-i}^*, \beta_i, \beta_{-i}^*), \quad (29)$$
where $u_{op_{-i}}^*$ and $\{\omega_{-i}^*, \beta_{-i}^*\}$ are the optimal control and disturbance policies of the players in the neighborhood of player $i$. For fixed control and disturbance policies $u_{op_i}$ and $\omega_i, \beta_i$, the quadratic value function is defined as
$$V_i(\epsilon_i(t), u_{op_i}, u_{op_{-i}}, \omega_i, \omega_{-i}, \beta_i, \beta_{-i}) = \int_t^{+\infty}\Big\{\epsilon_i^TQ_i\epsilon_i + u_{op_i}^TR_{ii}u_{op_i} + \sum_{j=1}^N\alpha_{ij}u_{op_j}^TR_{ij}u_{op_j} - \gamma_{\xi\nu_i}^2\Big[\omega_i^TS_{1ii}\omega_i + \sum_{j=1}^N\alpha_{ij}\omega_j^TS_{1ij}\omega_j\Big] - \gamma_{\beta_i}^2\Big[\beta_i^TS_{2ii}\beta_i + \sum_{j=1}^N\alpha_{ij}\beta_j^TS_{2ij}\beta_j\Big]\Big\}d\tau. \quad (30)$$

We make the following assumption regarding the performance index.

Assumption 5
The performance index (28) is zero-state observable.

Assumption 5 guarantees that no solution can stay identically at zero cost other than the zero solution [33]. It is a necessary assumption to prove stability of the disturbance-free system under optimal control [31]. Now, we are ready to define the H∞ graphical game for linear heterogeneous multi-agent systems.

Problem 2 (H∞ graphical game for linear heterogeneous multi-agent systems)

Consider a group of $N+1$ heterogeneous LTI systems defined by (2)-(7). Design $u_{op_i}$ to solve the optimization problem (29) with respect to the dynamics (26).

Remark 4
The graphical game in this paper is a $2N$-player game in the sense that we have $N$ followers, each having a disturbance player $(\omega_i, \beta_i)$ and a control player $u_i$. It is also possible to call this game a $3N$-player game if we assume for each follower two separate disturbance players $\omega_i$ and $\beta_i$, and a control player $u_i$.

In the sequel, we solve the H∞ graphical game problem.

4.2. Solution of the H∞ graphical game

When the value function (30) is finite, using Leibniz's formula, a differential equivalent of the value function is given in terms of the Hamiltonian
$$H_i = L_i(\epsilon_i, u_{op_i}, u_{op_{-i}}, \omega_i, \omega_{-i}, \beta_i, \beta_{-i}) + \nabla V_i^T\dot\epsilon_i = 0, \quad (31)$$
where $\nabla V_i = \partial V_i/\partial\epsilon_i$. At the equilibrium, one has the stationarity conditions
$$\frac{\partial H_i}{\partial u_{op_i}} = 0 \ \Rightarrow\ u_{op_i}^* = -\frac{1}{2}R_{ii}^{-1}\bar B_i^T\nabla V_i, \quad (32)$$
$$\frac{\partial H_i}{\partial\omega_i} = 0 \ \Rightarrow\ \omega_i^* = \frac{1}{2}\gamma_{\xi\nu_i}^{-2}S_{1ii}^{-1}\bar P_i^T\nabla V_i, \quad (33)$$
$$\frac{\partial H_i}{\partial\beta_i} = 0 \ \Rightarrow\ \beta_i^* = \frac{1}{2}\gamma_{\beta_i}^{-2}S_{2ii}^{-1}\bar B_i^T\nabla V_i. \quad (34)$$

Substituting the optimal policy (32) and the worst-case disturbances (33)-(34) into (31) yields the coupled Hamilton-Jacobi-Bellman (HJB) equations
$$\begin{aligned}
&\epsilon_i^TQ_i\epsilon_i + \frac{1}{4}\nabla V_i^T\bar B_iR_{ii}^{-1}\bar B_i^T\nabla V_i + \frac{1}{4}\sum_{j=1}^N\alpha_{ij}\nabla V_j^T\bar B_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}\bar B_j^T\nabla V_j \\
&- \frac{\gamma_{\xi\nu_i}^2}{4}\Big[\gamma_{\xi\nu_i}^{-4}\nabla V_i^T\bar P_iS_{1ii}^{-1}\bar P_i^T\nabla V_i + \sum_{j=1}^N\alpha_{ij}\gamma_{\xi\nu_j}^{-4}\nabla V_j^T\bar P_jS_{1jj}^{-1}S_{1ij}S_{1jj}^{-1}\bar P_j^T\nabla V_j\Big] \\
&- \frac{\gamma_{\beta_i}^2}{4}\Big[\gamma_{\beta_i}^{-4}\nabla V_i^T\bar B_iS_{2ii}^{-1}\bar B_i^T\nabla V_i + \sum_{j=1}^N\alpha_{ij}\gamma_{\beta_j}^{-4}\nabla V_j^T\bar B_jS_{2jj}^{-1}S_{2ij}S_{2jj}^{-1}\bar B_j^T\nabla V_j\Big] \\
&+ \nabla V_i^T\Big[\bar A_i\epsilon_i - \frac{1}{2}\bar B_iR_{ii}^{-1}\bar B_i^T\nabla V_i + \frac{1}{2}\gamma_{\xi\nu_i}^{-2}\bar P_iS_{1ii}^{-1}\bar P_i^T\nabla V_i + \frac{1}{2}\gamma_{\beta_i}^{-2}\bar B_iS_{2ii}^{-1}\bar B_i^T\nabla V_i\Big] \\
&- \nabla V_i^T\sum_{j=1}^N\alpha_{ij}\Big\{-\frac{1}{2}\mathbf{B}_jR_{jj}^{-1}\bar B_j^T\nabla V_j + \frac{1}{2}\gamma_{\xi\nu_j}^{-2}\mathbf{P}_jS_{1jj}^{-1}\bar P_j^T\nabla V_j + \frac{1}{2}\gamma_{\beta_j}^{-2}\mathbf{B}_jS_{2jj}^{-1}\bar B_j^T\nabla V_j\Big\} = 0. \quad (35)
\end{aligned}$$

Based on (35), the HJB equation of player $i$ depends on the HJB equations of the other players in its neighborhood. It is in general impossible to solve the coupled HJB equations (35) analytically [31]. Later, in Section 6, we use RL techniques and develop a numerical procedure to obtain the solutions in real time. For the further developments in this section, however, we assume the solutions are available. Let $V_i^*$ be the quadratic optimal solution to (35) and $u_{op_i}^*(V_i^*)$, $\omega_i^*(V_i^*)$ and $\beta_i^*(V_i^*)$ in (32)-(34) be the optimal policy and the worst-case disturbances in terms of $V_i^*$. In the next theorem, we prove that such $V_i^*$ satisfies the $L_2$-condition (27).

Theorem 1
Suppose $V_i^*$ is a quadratic positive semi-definite solution to (35) for $i = 1, ..., N$. Let Assumption 5 hold. Using the optimal policy $u_{op_i}^*(V_i^*)$ in (32),

1. The disturbance-free system (26) ($\omega_i \equiv 0$, $\beta_i \equiv 0$, $i = 1, ..., N$) is asymptotically stable.
2. For all disturbances $\omega_i, \omega_{-i}, \beta_i, \beta_{-i} \in L_2$, the $L_2$-condition (27) is satisfied.

Proof

The proof contains two parts. In the first part, we prove the stability of the disturbance-free system (26), and in the second part, we prove the $L_2$-condition (27).

1. For any smooth value function $V_i$, the Hamiltonian is defined as
$$H_i(\epsilon_i, \nabla V_i, u_{op_i}, u_{op_{-i}}, \omega_i, \omega_{-i}, \beta_i, \beta_{-i}) = \epsilon_i^TQ_i\epsilon_i + u_{op_i}^TR_{ii}u_{op_i} + \sum_{j=1}^N\alpha_{ij}u_{op_j}^TR_{ij}u_{op_j} - \gamma_{\xi\nu_i}^2\Big[\omega_i^TS_{1ii}\omega_i + \sum_{j=1}^N\alpha_{ij}\omega_j^TS_{1ij}\omega_j\Big] - \gamma_{\beta_i}^2\Big[\beta_i^TS_{2ii}\beta_i + \sum_{j=1}^N\alpha_{ij}\beta_j^TS_{2ij}\beta_j\Big] + \frac{dV_i}{dt}. \quad (36)$$
Let $V_i^*$ be a smooth positive semi-definite solution to (35). Completing the squares leads to
$$H_i(\epsilon_i, \nabla V_i^*, u_{op_i}, u_{op_{-i}}^*, \omega_i, \omega_{-i}^*, \beta_i, \beta_{-i}^*) = H_i(\epsilon_i, \nabla V_i^*, u_{op_i}^*, u_{op_{-i}}^*, \omega_i^*, \omega_{-i}^*, \beta_i^*, \beta_{-i}^*) + (u_{op_i}-u_{op_i}^*)^TR_{ii}(u_{op_i}-u_{op_i}^*) - \gamma_{\xi\nu_i}^2(\omega_i-\omega_i^*)^TS_{1ii}(\omega_i-\omega_i^*) - \gamma_{\beta_i}^2(\beta_i-\beta_i^*)^TS_{2ii}(\beta_i-\beta_i^*).$$
Selecting $u_{op_i} = u_{op_i}^*(V_i^*)$, one has
$$\epsilon_i^TQ_i\epsilon_i + u_{op_i}^{*T}R_{ii}u_{op_i}^* + \sum_{j=1}^N\alpha_{ij}u_{op_j}^{*T}R_{ij}u_{op_j}^* - \gamma_{\xi\nu_i}^2\Big[\omega_i^TS_{1ii}\omega_i + \sum_{j=1}^N\alpha_{ij}\omega_j^TS_{1ij}\omega_j\Big] - \gamma_{\beta_i}^2\Big[\beta_i^TS_{2ii}\beta_i + \sum_{j=1}^N\alpha_{ij}\beta_j^TS_{2ij}\beta_j\Big] + \frac{dV_i^*}{dt} \le 0. \quad (37)$$
Set $\omega_i \equiv 0$, $\omega_{-i} \equiv 0$, $\beta_i \equiv 0$, $\beta_{-i} \equiv 0$. According to (37),
$$\frac{dV_i^*}{dt} \le -\Big(\epsilon_i^TQ_i\epsilon_i + u_{op_i}^{*T}R_{ii}u_{op_i}^* + \sum_{j=1}^N\alpha_{ij}u_{op_j}^{*T}R_{ij}u_{op_j}^*\Big) \le 0.$$
Because of Assumption 5, one can conclude from the above inequality that the disturbance-free system (26) is asymptotically stable.

2. Integrating (37),
$$V_i^*(\epsilon_i(T)) - V_i^*(\epsilon_i(0)) + \int_0^T\Big\{\epsilon_i^TQ_i\epsilon_i + u_{op_i}^{*T}R_{ii}u_{op_i}^* + \sum_{j=1}^N\alpha_{ij}u_{op_j}^{*T}R_{ij}u_{op_j}^* - \gamma_{\xi\nu_i}^2\Big[\omega_i^TS_{1ii}\omega_i + \sum_{j=1}^N\alpha_{ij}\omega_j^TS_{1ij}\omega_j\Big] - \gamma_{\beta_i}^2\Big[\beta_i^TS_{2ii}\beta_i + \sum_{j=1}^N\alpha_{ij}\beta_j^TS_{2ij}\beta_j\Big]\Big\}d\tau \le 0.$$
Select $\epsilon_i(0) = 0$. Since $V_i^*(\epsilon_i(0)) = 0$ and $V_i^*(\epsilon_i(T)) \ge 0$, (27) is satisfied.

Remark 5

Assuming a quadratic structure for the value function $V_i^* = 0.5\,\epsilon_i^TP_i^g\epsilon_i$, the control signal (32) reads
$$u_{op_i} = -R_{ii}^{-1}\bar B_i^TP_i^g\epsilon_i = -R_{ii}^{-1}\bar B_i^TP_i^g\begin{bmatrix}e_{\xi_i}\\ \nu_i\end{bmatrix} = K_{i1}e_{\xi_i} + K_{i2}\nu_i.$$
The above equation is useful in finding the controller gain $K_i$. Recall the definition of the policy $u_{op_i}$ in (25):
$$u_{op_i} = K_iR_2e_{\xi_i} + (d_i+g_i)\bar K_i\nu_i.$$
In order to calculate $K_i$ from $K_{i1} = K_iR_2$, the following standard condition is required:
$$\operatorname{Rank}(R_2) = \operatorname{Rank}\begin{bmatrix}R_2 \\ K_{i1}\end{bmatrix}. \quad (38)$$
One can satisfy (38) by, e.g., selecting any invertible $R_2$, similar to [2, 3]. Then, $K_i = K_{i1}R_2^{-1}$ and $\bar K_i = \frac{1}{(d_i+g_i)}K_{i2}$. For convenience we assume,

Assumption 6
$R_2$ is invertible.
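Remark 5's gain recovery is a few lines of linear algebra. The sketch below assumes the quadratic kernel $P_i^g$, a matrix-valued $R_{ii}$, and an invertible $R_2$ (Assumption 6) are available; all names are illustrative, not from the paper.

```python
# Minimal sketch of Remark 5: recover K_i and Kbar_i from the quadratic kernel.
import numpy as np

def recover_gains(Pg_i, Rii, Bbar_i, R2, l, di_gi):
    """Split -Rii^{-1} Bbar_i' Pg_i into [K_i1 | K_i2], then K_i = K_i1 R2^{-1}."""
    Kfull = -np.linalg.solve(Rii, Bbar_i.T @ Pg_i)   # = [K_i1 | K_i2]
    Ki1, Ki2 = Kfull[:, :l], Kfull[:, l:]            # split at the e_xi / nu_i boundary
    Ki = Ki1 @ np.linalg.inv(R2)                     # from K_i1 = K_i R2, cf. (38)
    Kbar_i = Ki2 / di_gi                             # Kbar_i = K_i2 / (d_i + g_i)
    return Ki, Kbar_i
```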

4.3. Nash equilibrium solution

In an H∞ graphical game, we are interested in convergence to a Nash equilibrium (NE). The game has a well-defined NE if
$$J_i(u_{op_i}^*, u_{op_{-i}}^*, \omega_i, \omega_{-i}^*, \beta_i, \beta_{-i}^*) \le J_i(u_{op_i}^*, u_{op_{-i}}^*, \omega_i^*, \omega_{-i}^*, \beta_i^*, \beta_{-i}^*) \le J_i(u_{op_i}, u_{op_{-i}}^*, \omega_i^*, \omega_{-i}^*, \beta_i^*, \beta_{-i}^*). \quad (39)$$
The following theorem shows that a positive solution to (35) also guarantees convergence to an NE.


Theorem 2
Let $V_i^*$ be a quadratic positive semi-definite solution to (35) for $i = 1, ..., N$ such that the closed-loop system
$$\dot\epsilon_i = \bar A_i\epsilon_i + \bar B_iu_{op_i}^* + \bar P_i\omega_i^* + \bar B_i\beta_i^* - \sum_{j=1}^N\alpha_{ij}(\mathbf{B}_ju_{op_j}^* + \mathbf{P}_j\omega_j^* + \mathbf{B}_j\beta_j^*) \quad (40)$$
is asymptotically stable. Let Assumption 5 hold. Then,

1. The control signal and disturbances (32)-(34) form an NE.
2. The optimal value of the game is then $V_i^*(\epsilon_i(0))$.
3. The optimal control (32) solves Problem 2.

Proof
1. First, we show that (32)-(34) form an NE. Rewriting (28) and adding zero,
$$J_i = \int_0^{+\infty}L_i\,d\tau + \int_0^{+\infty}\dot V_i\,d\tau + V_i(\epsilon_i(0)) - V_i(\epsilon_i(+\infty)) = \int_0^{+\infty}H_i(\epsilon_i, \nabla V_i, u_{op_i}, u_{op_{-i}}, \omega_i, \omega_{-i}, \beta_i, \beta_{-i})\,d\tau + V_i(\epsilon_i(0)) - V_i(\epsilon_i(+\infty)).$$
Let $V_i^*$ be a smooth positive semi-definite solution to (35). By completing the squares one has
$$\begin{aligned}
J_i ={}& V_i(\epsilon_i(0)) - V_i(\epsilon_i(+\infty)) + \int_0^{+\infty}\Big\{H_i(\epsilon_i, \nabla V_i^*, u_{op_i}^*, u_{op_{-i}}^*, \omega_i^*, \omega_{-i}^*, \beta_i^*, \beta_{-i}^*) \\
&+ (u_{op_i}-u_{op_i}^*)^TR_{ii}(u_{op_i}-u_{op_i}^*) + \sum_{j=1}^N\alpha_{ij}(u_{op_j}-u_{op_j}^*)^TR_{ij}(u_{op_j}-u_{op_j}^*) \\
&+ 2\sum_{j=1}^N\alpha_{ij}u_{op_j}^{*T}R_{ij}(u_{op_j}-u_{op_j}^*) - \sum_{j=1}^N\alpha_{ij}\nabla V_i^T\mathbf{B}_j(u_{op_j}-u_{op_j}^*) \\
&- \gamma_{\xi\nu_i}^2(\omega_i-\omega_i^*)^TS_{1ii}(\omega_i-\omega_i^*) - \gamma_{\xi\nu_i}^2\sum_{j=1}^N\alpha_{ij}(\omega_j-\omega_j^*)^TS_{1ij}(\omega_j-\omega_j^*) \\
&- 2\gamma_{\xi\nu_i}^2\sum_{j=1}^N\alpha_{ij}\omega_j^{*T}S_{1ij}(\omega_j-\omega_j^*) - \sum_{j=1}^N\alpha_{ij}\nabla V_i^T\mathbf{P}_j(\omega_j-\omega_j^*) \\
&- \gamma_{\beta_i}^2(\beta_i-\beta_i^*)^TS_{2ii}(\beta_i-\beta_i^*) - \gamma_{\beta_i}^2\sum_{j=1}^N\alpha_{ij}(\beta_j-\beta_j^*)^TS_{2ij}(\beta_j-\beta_j^*) \\
&- 2\gamma_{\beta_i}^2\sum_{j=1}^N\alpha_{ij}\beta_j^{*T}S_{2ij}(\beta_j-\beta_j^*) - \sum_{j=1}^N\alpha_{ij}\nabla V_i^T\mathbf{B}_j(\beta_j-\beta_j^*)\Big\}d\tau.
\end{aligned}$$
Since (40) is asymptotically stable, $V_i^*(\epsilon_i(+\infty)) = 0$. Set $u_{op_j} = u_{op_j}^*$, $\omega_j = \omega_j^*$, $\beta_j = \beta_j^*$, $\forall j \in N_i$ in the above equation. Then (39) follows easily.

2. Next, we obtain the optimal value of the game. Set $u_{op_i} = u_{op_i}^*$, $\omega_i = \omega_i^*$, $\beta_i = \beta_i^*$; then, $J_i^* = V_i^*(\epsilon_i(0))$.


Figure 1. The interconnected system

Remark 6

Theorem 2 shows that the controls $u_{op_i}^*$ in (32) lead to the desired NE satisfying the $L_2$-condition (27) and thus solve Problem 2. Note however that Problem 2, and indeed all the developments of this section, consider $u_{op_i}$ and $\beta_i$ as independent signals, whereas those are in fact inseparably linked in the actual control signal (20). This fact requires additional conditions, elaborated further in Section 5, guaranteeing that the control design proposed here solves the original Problem 1 of Section 3.

5. H∞ OUTPUT REGULATION USING GRAPHICAL GAME

In Section 4, we showed that $V_i^*$ in (35) solves Problem 2 (see Theorem 2). Building on the developments of Section 4, in this section we give conditions guaranteeing that the graphical game solution $V_i^*$ and the control designed from it can indeed be used to solve the H∞ output regulation problem (Problem 1). Moreover, we give an upper bound for the overall $L_2$-gain of the output synchronization error with respect to disturbances, i.e. $L_{\delta_y\omega}$.

Theorem 3

Let $H_1$ and $H_2$ be two subsystems, whose inputs are $\theta_1, \omega$ and $\theta_2, \omega$ and whose outputs are $y_1, y_2$, respectively. Let $\omega \in L_2$, and assume that $H_1$ and $H_2$ are interconnected as depicted in Fig. 1 and
$$\|y_1\|_{L_2} \le L_{11}\|\omega\|_{L_2} + L_{12}\|\theta_1\|_{L_2}, \quad \|y_2\|_{L_2} \le L_{22}\|\omega\|_{L_2} + L_{21}\|\theta_2\|_{L_2}. \quad (41)$$
If $L_{12}L_{21} < 1$, then $y_1 \in L_2$ and
$$\|y_1\|_{L_2} \le (1 - L_{12}L_{21})^{-1}(L_{11} + L_{12}L_{22})\|\omega\|_{L_2}. \quad (42)$$

Proof
According to (41),
$$\|y_1\|_{L_2} \le L_{11}\|\omega\|_{L_2} + L_{12}\|\theta_1\|_{L_2} \le L_{11}\|\omega\|_{L_2} + L_{12}\|y_2\|_{L_2} \le L_{11}\|\omega\|_{L_2} + L_{12}(L_{22}\|\omega\|_{L_2} + L_{21}\|y_1\|_{L_2}).$$
Hence
$$(1 - L_{12}L_{21})\|y_1\|_{L_2} \le (L_{11} + L_{12}L_{22})\|\omega\|_{L_2}.$$
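As a quick numeric illustration of Theorem 3, the snippet below evaluates the bound (42) for hypothetical gain values (the numbers are made up, purely to show the computation).

```python
# Minimal sketch: evaluate the small-gain bound (42) for hypothetical gains.
def small_gain_bound(L11, L12, L21, L22):
    if L12 * L21 >= 1.0:
        raise ValueError("small-gain condition L12*L21 < 1 violated")
    return (L11 + L12 * L22) / (1.0 - L12 * L21)

# e.g. L11=0.8, L12=0.3, L21=0.5, L22=0.2 gives ||y1|| <= (approx) 1.01*||omega||
print(small_gain_bound(0.8, 0.3, 0.5, 0.2))
```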


Figure 2. The representation of the nominal and the interconnected systems

The overall system of (24)-(26) can be represented as in Fig. 2. There, we have two interconnected subsystems. We name the upper block the nominal system, whose description is given in (25)-(26), and we call the lower block the interconnected system, whose description is given in (24). This representation is useful as we can use Theorem 3 to prove $L_2$-stability. The following theorem presents one of the main results of this paper; it specifies the set of conditions such that $V_i^*$ in (35) and the control based on it solve the H∞ output regulation problem (Problem 1). Define
$$S_{m_i} := \max_{i,j\in N_i}\{\bar\sigma(\sqrt{\alpha_{ij}S_{1ij}}), \bar\sigma(\sqrt{S_{1ii}})\}, \quad \beta_{m_i} := \max_{i,j\in N_i}\{\bar\sigma(\sqrt{\alpha_{ij}S_{2ij}}), \bar\sigma(\sqrt{S_{2ii}})\},$$
$$L_y = \frac{1}{\underline{\sigma}(T_g)}\Big[\max_i\bar\sigma(C_iT_i) + \max_i\bar\sigma(E_iK_i)\,\bar\sigma(L+G)\,\max_i\bar\sigma(D_iT_i)\Big],$$
$$L_{21} = \max_i\bar\sigma(K_i)\,\bar\sigma(L+G)\,\max_i\bar\sigma(D_i\Psi_i) + \max_i\bar\sigma((d_i+g_i)\bar K_i),$$
$$L_{11} = \sqrt{\sum_{i=1}^N\underline{\sigma}(\sqrt{Q_i})^{-2}\gamma_{\xi\nu_i}^2S_{m_i}^2}, \quad L_{12} = \sqrt{\sum_{i=1}^N\underline{\sigma}(\sqrt{Q_i})^{-2}\gamma_{\beta_i}^2\beta_{m_i}^2},$$
$$T_g = \begin{bmatrix}(L+G)\otimes I_l & 0 \\ 0 & I_{\sum_{i=1}^N n_i - Nl}\end{bmatrix}. \quad (43)$$

Theorem 4

Let Assumptions 1-5 hold. Let $V_i^*$ be a quadratic positive semi-definite solution to (35) and $u_{op_i}^*(V_i^*)$ in (32) be the optimal policy. Assume that the controller gain $K_i$ is obtained according to Remark 5. If either (i) $\bar\sigma(K_i)\bar\sigma(D_i\Psi_i)$ and $\bar\sigma(\bar K_i)$ are sufficiently small or (ii) $\gamma_{\beta_i}$ is sufficiently small for $i = 1, ..., N$, then,

1. An upper bound for the $L_2$-gain of the output synchronization error $\delta_y$ with respect to $\omega$ is given by
$$L_{\delta_y\omega} = L_y(1 - L_{12}L_{21})^{-1}L_{11}. \quad (44)$$
2. The distributed control (8) solves the H∞ output regulation problem (Problem 1).

Proof

1. Consider the subsystem $H_1$ from Theorem 3 as the overall system of (26). This subsystem has two inputs $\omega$ and $\beta$, where $\omega$ is the disturbance input and $\beta$ is related to the subsystem $H_2$, and one output. First note that we always have
$$\int_0^T\underline{\sigma}(\sqrt{Q_i})^2\|\epsilon_i\|_2^2\,d\tau \le \int_0^T\epsilon_i^TQ_i\epsilon_i + u_{op_i}^TR_{ii}u_{op_i} + \sum_{j=1}^N\alpha_{ij}u_{op_j}^TR_{ij}u_{op_j}\,d\tau.$$
Using the above inequality in accordance with (27),
$$\int_0^T\|\epsilon\|_2^2\,d\tau = \int_0^T\sum_{i=1}^N\|\epsilon_i\|_2^2\,d\tau \le \sum_{i=1}^N\underline{\sigma}(\sqrt{Q_i})^{-2}\gamma_{\xi\nu_i}^2S_{m_i}^2\int_0^T\|\omega\|_2^2\,d\tau + \sum_{i=1}^N\underline{\sigma}(\sqrt{Q_i})^{-2}\gamma_{\beta_i}^2\beta_{m_i}^2\int_0^T\|\beta\|_2^2\,d\tau.$$
Hence, the $L_2$-gain of $\epsilon$ with respect to $\omega$ and $\beta$ reads
$$\|\epsilon\|_{L_2} \le \sqrt{\sum_{i=1}^N\underline{\sigma}(\sqrt{Q_i})^{-2}\gamma_{\xi\nu_i}^2S_{m_i}^2}\,\|\omega\|_{L_2} + \sqrt{\sum_{i=1}^N\underline{\sigma}(\sqrt{Q_i})^{-2}\gamma_{\beta_i}^2\beta_{m_i}^2}\,\|\beta\|_{L_2}. \quad (45)$$
Next, we consider the subsystem $H_2$ from Theorem 3 as (24), with input $\epsilon$ and output $\beta$. According to the definition of $\beta$ in (24), the $L_2$-gain of $\beta$ with respect to $\epsilon$ reads
$$\|\beta\|_{L_2} \le \Big(\max_i\bar\sigma((d_i+g_i)\bar K_i) + \max_i\bar\sigma(K_i)\,\bar\sigma(L+G)\,\max_i\bar\sigma(D_i\Psi_i)\Big)\|\epsilon\|_{L_2}. \quad (46)$$
By substituting (46) in (45),
$$\|\epsilon\|_{L_2} \le (1 - L_{12}L_{21})^{-1}L_{11}\|\omega\|_{L_2}. \quad (47)$$
By Theorem 3, $\epsilon \in L_2$ if $L_{12}L_{21} < 1$, which is satisfied by either of the conditions (i)-(ii) in the body of the theorem. Let $\bar x_i = [\delta_{\xi_i}^T, \nu_i^T]^T$, where $\delta_{\xi_i} = \xi_i - \xi_0$. Let $\epsilon$ and $\bar x$ be the overall vectors of $\epsilon_i$ and $\bar x_i$ respectively. Then,
$$\epsilon = T_g\bar x, \quad (48)$$
where $T_g$ is given in (43). The output synchronization error $\delta_{y_i}$ in the coordinates $\bar x_i$ reads
$$\delta_{y_i} = C_ix_i + E_iu_i - R_1\xi_0 = C_iT_i\bar x_i + E_iK_i\Big[\sum_{j=1}^N\alpha_{ij}(D_iT_i\bar x_i - D_jT_j\bar x_j) + g_iD_iT_i\bar x_i\Big].$$
Using (47)-(48) and the above equation, the $L_2$-gain of the output synchronization error $\delta_y$ reads
$$\|\delta_y\|_{L_2} \le L_y(1 - L_{12}L_{21})^{-1}L_{11}\|\omega\|_{L_2}, \quad (49)$$
which gives the upper bound in (44).

2. To prove that (8) solves the H∞ output regulation problem, we need to show that the three properties in Problem 1 hold.

a. By (47)-(48), $\bar x \in L_2$. Because of the linearity of the system and by setting $\omega \equiv 0$, one can conclude that $\bar x \to 0$. Using the definition $\bar x_i^T = [(\xi_i - \xi_0)^T, \nu_i^T]$ and setting $\xi_0 \equiv 0$, we have $x_i \to 0$.
b. By (49), $\delta_y \in L_2$. By the linearity of the system and $\omega \equiv 0$, we conclude that $\delta_y \to 0$. Similarly, we can show that $\delta_z \to 0$.
c. The $L_2$-gain of $\delta_y$ with respect to $\omega$ is given in (49) and its finiteness is guaranteed by the conditions (i) or (ii) in the body of the theorem.

Remark 7

In Section 4, the designed control $u_{op_i}$ and the signal $\beta_i$ were considered separately, while they are linked in the actual control signal (20) owing to the heterogeneity of the agents. Theorem 4 reconciles the development of graphical games in Section 4 with the original problem and takes the linkage of $u_{op_i}$ and $\beta_i$ in the actual control signal (20) into consideration. This requires additional conditions (given in the body of Theorem 4) that do not appear for homogeneous agents. Note that the conclusions of Theorem 4 indeed reduce, for identical agents, to the cases familiar from the literature [20, 21]. Note also that here the graph is not required to be acyclic [34].

6. ONLINE SOLUTION TO H∞ GRAPHICAL GAME

As we discussed in Section 4, one needs to obtain solutions to the coupled partial differential HJB equations (35) to solve the graphical game problem. It is in general impossible to solve these equations analytically. However, RL has shown promising results in solving such complicated coupled equations numerically and is the only viable method applicable in real time. In this section, we propose a numerical RL procedure to design the controller gain $K_i$ and to obtain solutions to the coupled HJB equations (35) in real time. Note that because of the heterogeneity of the agents in our paper, the RL frameworks in [20, 21] cannot be used.

Our online learning structure uses four adaptive networks. The first network approximates the value function and is named the critic network. The second one approximates the control policy $u_{op_i}$ in (32) and is named the actor network. The third and fourth networks approximate the disturbances $\omega_i$ and $\beta_i$ in (33)-(34).

Assume that the value function $V_i(\epsilon_i(t))$ is smooth. Then, according to the Weierstrass higher-order approximation theorem, one can approximate $V_i(\epsilon_i(t))$ by
$$V_i(\epsilon_i) = W_i^T\Phi_i(\epsilon_i) + \varepsilon_i,$$
in which $\Phi_i$ is a basis function vector with $\mu_{n_i}$ neurons, $W_i$ is the optimal weight and $\varepsilon_i$ is the approximation error. The weights of the critic network, which provide the best approximation to (35), are unknown. Let $\hat W_i$ denote the current estimate of the critic weights. Then, the approximated value function is given by
$$\hat V_i(\epsilon_i) = \hat W_i^T\Phi_i(\epsilon_i), \quad (50)$$
and the approximation error of the Bellman equation is
$$H_i = L_i + \hat W_i^T\nabla\Phi_i\dot\epsilon_i = \varepsilon_{H_i}. \quad (51)$$

Next, we define an actor network to approximate the optimal policy (32):
$$\hat u_{op_i} = -\frac{1}{2}R_{ii}^{-1}\bar B_i^T\nabla\Phi_i^T\hat W_{i+N}, \quad (52)$$
where $\hat W_{i+N}$ denotes the current estimate of the actor weights. Also, we use two additional networks to approximate the worst-case disturbances (33)-(34):
$$\hat\omega_i = \frac{1}{2}\gamma_{\xi\nu_i}^{-2}S_{1ii}^{-1}\bar P_i^T\nabla\Phi_i^T\hat W_{i+2N}, \quad (53)$$
$$\hat\beta_i = \frac{1}{2}\gamma_{\beta_i}^{-2}S_{2ii}^{-1}\bar B_i^T\nabla\Phi_i^T\hat W_{i+3N}, \quad (54)$$
where $\hat W_{i+2N}$ and $\hat W_{i+3N}$ denote the current estimates of the weights of the disturbances $\omega_i$ and $\beta_i$ respectively.

The following theorem presents another main result of this paper; it gives the tuning laws for the adaptive network weights such that the HJB equation (35) is solved numerically, and the closed-loop system (26) and the weight estimation errors $\tilde W_i = W_i - \hat W_i$, $\tilde W_{i+N} = W_i - \hat W_{i+N}$, $\tilde W_{i+2N} = W_i - \hat W_{i+2N}$, $\tilde W_{i+3N} = W_i - \hat W_{i+3N}$ are locally Uniformly Ultimately Bounded (UUB) [20].

Theorem 5

Consider Problem 2 and let the conditions in Theorem 4 hold. Assume that the value function, the policy and the disturbances are estimated by (50) and (52)-(54) respectively. Assume that
$$\sigma_{i+N} = \nabla\Phi_i\Big\{\bar A_i\epsilon_i + \bar B_i\hat u_{op_i} + \bar P_i\hat\omega_i + \bar B_i\hat\beta_i - \sum_{j=1}^N\alpha_{ij}(\mathbf{B}_j\hat u_{op_j} + \mathbf{P}_j\hat\omega_j + \mathbf{B}_j\hat\beta_j)\Big\} \quad (55)$$
is Persistently Exciting (PE). Tune the weights of the critic network as
$$\dot{\hat W}_i = -a_i\frac{\sigma_{i+N}}{(1+\sigma_{i+N}^T\sigma_{i+N})^2}\,\varepsilon_{H_i}^{op}, \quad (56)$$
where $\varepsilon_{H_i}^{op}$ is obtained by inserting (52)-(54) in (51). Tune the weights of the actor and disturbance networks as
$$\dot{\hat W}_{i+N} = -a_{i+N}\Big\{(\bar S_{1i}\hat W_{i+N} - \bar T_{1i}\hat W_i) - \frac{1}{4}\bar D_i\hat W_{i+N}\frac{\bar\sigma_{i+N}^T}{m_{s_i}}\hat W_i - \frac{1}{4}\sum_{j=1}^N\alpha_{ji}\nabla\Phi_i\bar B_iR_{ii}^{-1}R_{ji}R_{ii}^{-1}\bar B_i^T\nabla\Phi_i^T\hat W_{i+N}\frac{\bar\sigma_{j+N}^T}{m_{s_j}}\hat W_j\Big\}, \quad (57)$$
$$\dot{\hat W}_{i+2N} = -a_{i+2N}\Big\{(\bar S_{2i}\hat W_{i+2N} - \bar T_{2i}\hat W_i) + \frac{1}{4}\gamma_{\xi\nu_i}^{-2}\bar E_{2i}\hat W_{i+2N}\frac{\bar\sigma_{i+N}^T}{m_{s_i}}\hat W_i + \frac{1}{4}\gamma_{\xi\nu_i}^{-2}\sum_{j=1}^N\alpha_{ji}\nabla\Phi_i\bar P_iS_{1ii}^{-1}S_{1ji}S_{1ii}^{-1}\bar P_i^T\nabla\Phi_i^T\hat W_{i+2N}\frac{\bar\sigma_{j+N}^T}{m_{s_j}}\hat W_j\Big\}, \quad (58)$$
$$\dot{\hat W}_{i+3N} = -a_{i+3N}\Big\{(\bar S_{3i}\hat W_{i+3N} - \bar T_{3i}\hat W_i) + \frac{1}{4}\gamma_{\beta_i}^{-2}\bar E_{3i}\hat W_{i+3N}\frac{\bar\sigma_{i+N}^T}{m_{s_i}}\hat W_i + \frac{1}{4}\gamma_{\beta_i}^{-2}\sum_{j=1}^N\alpha_{ji}\nabla\Phi_i\bar B_iS_{2ii}^{-1}S_{2ji}S_{2ii}^{-1}\bar B_i^T\nabla\Phi_i^T\hat W_{i+3N}\frac{\bar\sigma_{j+N}^T}{m_{s_j}}\hat W_j\Big\}, \quad (59)$$
where
$$\bar\sigma_{i+N} = \frac{\sigma_{i+N}}{1+\sigma_{i+N}^T\sigma_{i+N}}, \quad m_{s_i} = 1 + \sigma_{i+N}^T\sigma_{i+N}, \quad \bar D_i = \nabla\Phi_i\bar B_iR_{ii}^{-1}\bar B_i^T\nabla\Phi_i^T, \quad \bar E_{2i} = \nabla\Phi_i\bar P_iS_{1ii}^{-1}\bar P_i^T\nabla\Phi_i^T, \quad \bar E_{3i} = \nabla\Phi_i\bar B_iS_{2ii}^{-1}\bar B_i^T\nabla\Phi_i^T, \quad (60)$$
and $a_i > 0$, $a_{i+N} > 0$, $a_{i+2N} > 0$, $a_{i+3N} > 0$, $\bar T_{1i} > 0$, $\bar S_{1i} > 0$, $\bar T_{2i} > 0$, $\bar S_{2i} > 0$, $\bar T_{3i} > 0$, $\bar S_{3i} > 0$, $i = 1, ..., N$ are the tuning parameters. Then,

1. The closed-loop system (26) and the weight estimation errors $\tilde W_i$, $\tilde W_{i+N}$, $\tilde W_{i+2N}$ and $\tilde W_{i+3N}$ are locally UUB.
2. $\varepsilon_{H_i}$ is UUB and $\hat W_i$ converges to the approximated solution of the HJB equation (35) for $i = 1, ..., N$.
3. $\hat u_{op_i}$, $\hat\omega_i$, $\hat\beta_i$ converge to the approximated NE.

Proof


Figure 3. The communication graph

Remark 8

The tuning laws (56)-(59) in Theorem 5 are fully distributed and depend only on local information available to each single agent. Hence, they are indeed applicable on undirected graphs satisfying Assumption 1. The tuning laws (56)-(57) yield solutions to the coupled HJB equations (35) and the applicable controls (20) in real time for heterogeneous agents. The additional adaptive networks (54) are used to estimate the $\beta_i$ signals stemming from agent heterogeneity. These networks are absent for homogeneous agents, see [20, 21].
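To make the critic tuning law (56) concrete, here is a minimal Euler-discretized sketch for a single agent. It assumes $\nabla\Phi_i$, $\dot\epsilon_i$ (computed from (26) with the current network outputs, i.e. $\sigma_{i+N}$ of (55)) and the local cost $L_i$ are supplied externally; the function and its signature are ours, not the paper's implementation.

```python
# Minimal sketch of one Euler step of the critic update (56), per agent.
import numpy as np

def critic_step(W_hat, grad_Phi, eps_dot, L_i, a_i, dt):
    """W_hat_dot = -a_i * sigma/(1+sigma'sigma)^2 * eps_H, cf. (51), (55), (56)."""
    sigma = grad_Phi @ eps_dot          # sigma_{i+N} in (55)
    ms = 1.0 + sigma @ sigma            # normalization m_si of (60)
    eps_H = L_i + W_hat @ sigma         # Bellman residual (51) with current weights
    return W_hat - a_i * dt * (sigma / ms**2) * eps_H
```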

7. SIMULATION RESULTS

Consider a group of five followers and one leader communicating with each other according to the graph shown in Fig. 3. The edge weights in the communication graph are all set to one. Consider the followers’ dynamics as

                   ˙ xi=      0 1 0 0 −1 0 0 0 0 0 −7 2 0 0 0 −4      xi+      0 1 −0.2 0      ui+      0 1 0 0      ωi yi= h 1 1 1 1 i xi, zi= " 1 0 0 0 0 1 0.005 0 # xi, for i = 1, 3, 5,                ˙ xi=    0 1 0 −1 0 0 0 0 −2   xi+    0.1 1 0.2   ui+    0 1 1   ωi, yi= h 1 1 −1ixi, zi= " 1 0 0.001 0 1 −0.003 # xi, for i = 2, 4.

Consider the leader's dynamics as
$$\dot\xi_0 = \begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}\xi_0, \quad y_0 = \begin{bmatrix}1 & 1\end{bmatrix}\xi_0, \quad z_0 = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}\xi_0.$$

According to the followers' and the leader's dynamics, one can find that
$$\Pi_i = \begin{bmatrix}1 & 0\\ 0 & 1\\ 0 & 0\\ 0 & 0\end{bmatrix}, \quad \Psi_i = \begin{bmatrix}0 & 0\\ 0 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix}, \quad \text{for } i = 1, 3, 5, \quad \text{and} \quad \Pi_i = \begin{bmatrix}1 & 0\\ 0 & 1\\ 0 & 0\end{bmatrix}, \quad \Psi_i = \begin{bmatrix}0\\ 0\\ 1\end{bmatrix}, \quad \text{for } i = 2, 4.$$


Figure 4. H∞ output regulation results: (a) the worst-case disturbances $\omega_i$, $i = 1, ..., 5$; (b) the $y$-output synchronization errors $\delta_{y_i}$, $i = 1, ..., 5$.

We use four adaptive networks, as detailed in Section 6, to solve the differential graphical game for H∞ output regulation in real time. The weights in the $L_2$-condition (27) are selected as
$$Q_i = 10I, \quad S_{1ii} = I, \quad S_{2ii} = I, \quad R_{ii} = 0.5, \quad S_{1ij} = 0.05I, \quad S_{2ij} = 0.05I, \quad R_{ij} = 0.05, \quad \gamma_{\xi\nu_i} = 1.5, \quad \gamma_{\beta_i} = 0.8, \quad i = 1, ..., 5, \ j \in N_i.$$
We select the tuning parameters as
$$a_i = 0.5, \quad a_{i+N} = a_{i+2N} = a_{i+3N} = 0.05, \quad \bar S_{1i} = \bar S_{2i} = \bar S_{3i} = I, \quad \bar T_{1i} = \bar T_{2i} = \bar T_{3i} = I, \quad i = 1, ..., 5.$$
The graphical game is implemented according to Theorem 5 and the gain $K_i$ is obtained as shown in Remark 5. The worst-case disturbances in (53) are applied to the agents and are shown in Fig. 4a. The $y$-output synchronization errors for followers 1-5 are shown in Fig. 4b. Our simulation example illustrates the feasibility and efficiency of adaptive networks for solving the coupled HJB equations (35); those networks perform sufficiently fast to be run online for reasonably large systems.

8. CONCLUSION

In this paper, we have defined H∞ graphical games for linear heterogeneous agents. Because of the heterogeneity of the agents, we have used output regulation theory to define the node error dynamics properly. Our graphical games formulation has two important properties: firstly, the agents are heterogeneous and, secondly, unmodeled disturbances are present. This allows us to solve the output regulation and H∞ output regulation problems. We have obtained an upper bound for the $L_2$-gain of the output synchronization error with respect to disturbances, which has not hitherto been derived in the graphical game framework. We believe that the current developments shed some light on the problem of graphical games with heterogeneous agents.

ACKNOWLEDGEMENT

Farnaz Adib Yaghmaie is supported by the Vinnova Competence Center LINK-SIC and by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP). Kristian Hengster Movric is supported by the GACR grant 16-25493Y. Frank L. Lewis is supported by ONR grant N00014-17-1-2239, ONR grant N00014-18-1-2221, NSF grant ECCS-1839804, and China NSFC grant #61633007. Rong Su is supported by NRF BCA GBIC grant on Scalable and Smart Building Energy Management (NRF2015ENC-GBICRD001-057) and MoE Academic Research Grant on Secure and Privacy Preserving Multi-Agent Cooperation (RG94/17-(S)-SU RONG (VP)).

REFERENCES

1. K. Hengster-Movric, F. L. Lewis, and M. Sebek, "Distributed static output-feedback control for state synchronization in networks of identical LTI systems," Automatica, vol. 53, pp. 282-290, 2015.
2. C.-Q. Ma and J.-F. Zhang, "Necessary and sufficient conditions for consensusability of linear multi-agent systems," IEEE Transactions on Automatic Control, vol. 55, no. 5, pp. 1263-1268, 2010.
3. L. Scardovi and R. Sepulchre, "Synchronization in networks of identical linear systems," Automatica, vol. 45, no. 11, pp. 2557-2562, 2009.
4. F. Adib Yaghmaie, F. L. Lewis, and R. Su, "Output regulation of linear heterogeneous multi-agent systems via output and state feedback," Automatica, vol. 67, pp. 157-164, 2016.
5. P. Wieland, R. Sepulchre, and F. Allgöwer, "An internal model principle is necessary and sufficient for linear output synchronization," Automatica, vol. 47, no. 5, pp. 1068-1074, 2011.
6. C. Huang and X. Ye, "Cooperative output regulation of heterogeneous multi-agent systems: An H∞ criterion," IEEE Transactions on Automatic Control, vol. 59, pp. 267-273, 2014.
7. N. Yang, J.-W. Xiao, L. Xiao, and Y.-W. Wang, "Non-zero sum differential graphical game: cluster synchronisation for multi-agents with partially unknown dynamics," International Journal of Control, pp. 1-12, 2018.
8. F. Adib Yaghmaie, R. Su, F. L. Lewis, and L. Xie, "Multi-party consensus of linear heterogeneous multi-agent systems," IEEE Transactions on Automatic Control, vol. 62, no. 11, pp. 5578-5589, 2017.
9. Y.-W. Wang, X.-K. Liu, J.-W. Xiao, and Y. Shen, "Output formation-containment of interacted heterogeneous linear systems by distributed hybrid active control," Automatica, vol. 93, pp. 26-32, 2018.
10. X. Wang, Y. Hong, J. Huang, and Z.-P. Jiang, "A distributed control approach to a robust output regulation problem for multi-agent linear systems," IEEE Transactions on Automatic Control, vol. 55, no. 12, pp. 2891-2895, 2010.
11. J. Lunze, "Synchronization of heterogeneous agents," IEEE Transactions on Automatic Control, vol. 57, no. 11, pp. 2885-2890, 2012.
12. F. Adib Yaghmaie, F. L. Lewis, and R. Su, "Output regulation of heterogeneous linear multi-agent systems with differential graphical game," International Journal of Robust and Nonlinear Control, vol. 26, pp. 2256-2278, 2016.
13. Y. Liu and Y. Jia, "Robust H∞ consensus control of uncertain multi-agent systems with time delays," International Journal of Control, Automation and Systems, vol. 9, no. 6, pp. 1086-1094, 2011.
14. X. Yang and J. Wang, "Finite-gain Lp consensus of multi-agent systems," International Journal of Control, Automation and Systems, vol. 11, no. 4, pp. 666-674, 2013.
15. F. Adib Yaghmaie, K. Hengster Movric, F. L. Lewis, R. Su, and M. Sebek, "H∞-output regulation of linear heterogeneous multiagent systems over switching graphs," International Journal of Robust and Nonlinear Control, vol. 28, no. 13, pp. 3852-3870, 2018.
16. Q. Jiao, H. Modares, F. L. Lewis, S. Xu, and L. Xie, "Distributed L2-gain output-feedback control of homogeneous and heterogeneous systems," Automatica, vol. 71, pp. 361-368, 2016.
17. M. Zhang, A. Saberi, H. F. Grip, and A. A. Stoorvogel, "H∞ almost output synchronization for heterogeneous networks without exchange of controller states," IEEE Transactions on Control of Network Systems, vol. 2, no. 4, pp. 348-357, 2015.
18. D. Liu, H. Li, and D. Wang, "Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 8, pp. 1015-1027, 2014.
19. K. G. Vamvoudakis and F. L. Lewis, "Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations," Automatica, vol. 47, no. 8, pp. 1556-1569, 2011.
20. K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, "Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality," Automatica, vol. 48, no. 8, pp. 1598-1611, 2012.
21. Q. Jiao, H. Modares, S. Xu, F. L. Lewis, and K. G. Vamvoudakis, "Multi-agent zero-sum differential graphical games for disturbance rejection in distributed control," Automatica, vol. 69, pp. 24-34, 2016.
22. T. Bian and Z.-P. Jiang, "Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design," Automatica, vol. 71, pp. 348-360, 2016.
23. D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, "Adaptive optimal control for continuous-time linear systems based on policy iteration," Automatica, vol. 45, no. 2, pp. 477-484, 2009.
24. D. Liu, H. Li, and D. Wang, "Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm," Neurocomputing, vol. 110, pp. 92-100, 2013.
25. D. Vrabie and F. Lewis, "Adaptive dynamic programming for online solution of a zero-sum differential game," Journal of Control Theory and Applications, vol. 9, no. 3, pp. 353-360, 2011.
26. F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, 2009.
27. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
28. F. A. Yaghmaie and D. J. Braun, "Reinforcement learning for a class of continuous-time input constrained optimal control problems," Automatica, vol. 99, pp. 221-227, 2019.
29. A. Isidori, Nonlinear Control Systems. Springer Science & Business Media, 2013.
30. J. Huang, Nonlinear Output Regulation: Theory and Applications. SIAM, 2004.
31. F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
32. R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge University Press, 1991.
33. H. K. Khalil, Nonlinear Systems. Prentice-Hall, New Jersey, 1996.
34. F. Adib Yaghmaie, F. L. Lewis, and R. Su, "Output regulation of heterogeneous multi-agent systems: A graphical game approach," in American Control Conference (ACC), 2015, pp. 2272-2277.
