Multi-agent Approach to Community Detection in Complex Networks


Ibrahim Said & Alexander Johansson

Supervisor: Xiaoming Hu

Bachelor Thesis Department of Mathematics

Division of Optimization and Systems Theory KTH

May 2015


Abstract

A multi-agent approach to community detection is studied. The thesis has three objectives. The first is to investigate how the parameters of the model affect the community structure. To do this, the parameters are swept one at a time and the results are compared. The second objective is to study how the initial values of the agents affect the community structure. This is done by fixing all parameters and varying the initial values. The third objective is to study how robust the model is to networks with negative links and to networks with missing links. This is done by fixing all parameters, replacing some of the positive links with negative links and comparing the outcome with the original communities. Some links are then removed in a similar way and the result is compared with that of the original network. The study leads to the following conclusions. The outcome is sensitive to the parameters ρ and α, and a well-chosen set of initial values increases the convergence speed. Finally, opinion dynamics with decaying confidence is a suitable model for networks that contain negative links, while its robustness to missing links depends on the accuracy demanded by the application.


Sammanfattning (Summary)

Multi-agent systems are used to detect clusters in complex networks. The thesis has three goals. The first is to investigate how the parameters of the mathematical model affect the cluster detection. This is investigated by varying the parameters of the model and then comparing the results. The second goal is to study how the initial values of the agents affect the cluster detection. This is done by fixing all parameters and varying the initial values. The third goal is to determine whether the model is compatible with networks that contain negative links and to study its robustness to missing links. This is investigated by fixing all parameters, replacing some positive links with negative links and then comparing the results. The problem of missing links is investigated in a similar way. The study has led to the following conclusions. The parameters ρ and α are sensitive. Setting the initial values well can increase the convergence speed. The final conclusion is that the model is compatible with negative links and that, depending on the application, the model is also robust to missing links in the network.


List of Figures

1 Two isomorphic graphs
2 Zachary’s karate club
3 College football network
4 Zachary’s karate club with ρ = 0.95, M = 0.97 and α = 0.5
5 Zachary’s karate club with ρ = 0.89, M = 0.97 and α = 0.5
6 Zachary’s karate club with ρ = 0.82, M = 0.97 and α = 0.5
7 Zachary’s karate club with ρ = 0.7885, M = 0.97 and α = 0.5
8 Zachary’s karate club with ρ = 0.3, M = 0.97 and α = 0.5
9 College football network with ρ = 1, M = 0.99 and α = 0.5
10 College football network with ρ = 0.896, M = 0.99 and α = 0.5
11 College football network with ρ = 0.895, M = 0.99 and α = 0.5
12 College football network with ρ = 0.85, M = 0.99 and α = 0.5
13 Zachary’s karate club with ρ = 0.7885, M = 9.7 and α = 0.5
14 Zachary’s karate club with ρ = 0.7885, M = 0.49 and α = 0.5
15 Zachary’s karate club with ρ = 0.7885, M = 0.97 and α = 0.25
16 Zachary’s karate club with ρ = 0.7885, M = 0.97 and α = 0.95
17 College football network with ρ = 0.85, M = 0.99 and α = 0.5
18 College football network with ρ = 0.783, M = 9.9 and α = 0.5
19 College football network with the alternative method of choosing initial options. Parameters are ρ = 0.783, M = 9.9 and α = 0.5
20 Family network with ρ = 0.5, M = 0.86 and α = 0.5
21 Family network with a negative link and parameters ρ = 0.5, M = 0.86 and α = 0.5
22 Zachary’s karate club with two negative links and parameters ρ = 0.7885, M = 0.97 and α = 0.5
23 Zachary’s karate club with one negative link and parameters ρ = 0.7885, M = 0.97 and α = 0.5
24 Zachary’s karate club with two missing links and parameters ρ = 0.7885, M = 0.97 and α = 0.5
25 Zachary’s karate club with one missing link and parameters ρ = 0.7885, M = 0.97 and α = 0.5
26 State plot of Zachary’s karate club with parameters ρ = 0.89, M = 0.97 and α = 0.5
27 State plot of Zachary’s karate club with ρ = 0.82, M = 0.97 and α = 0.5
28 Two communities detected with modularity optimization
29 Four communities detected with modularity optimization


1 Introduction

This thesis has three objectives. The first is to detect communities using a multi-agent systems approach and to study how the parameters of the model affect the community detection. The second is to suggest a method for setting the agents’ initial states and to study whether the results change. The last is to investigate whether the multi-agent approach is robust to negative links in networks and to small errors in the topology.

Since real-world networks are often modelled as graphs, graph theory plays a central role when analyzing networks and detecting communities. The relevant graph theory needed for community detection is reviewed in section 1.1. A general background to community detection is then given in section 1.2, where different approaches, works and applications are presented to give further insight into the field.

1.1 Graph theory

A graph is here defined as an ordered pair of sets G = (E, V) where E is the set of edges and V is the set of vertices. The vertices are labelled so that v_i ∈ V is vertex i. An edge e = (i, j) ∈ E is a pair of vertices that have a connection between them. The vertices v_i and v_j are called adjacent, or neighbours, if there exists an edge that connects them. A graph G is called undirected if the elements of E are unordered pairs and directed if they are ordered pairs. The order of a graph is the number of vertices |V| and the size of a graph is the number of edges |E|. The degree d_i of a vertex v_i is defined as the number of edges connected to v_i, where a loop counts as two edges. A graph G = (E, V) is called simple if it is undirected and has no loops. The neighbour set N_i = {j | (i, j) ∈ E} consists of all vertices neighbouring vertex i. The number of neighbours of vertex v_i is denoted |N_i|.

Two graphs can have the same graph structure but not have the same labelling or visualization since V is an unordered set. Graphs that have this property are isomorphic to each other. Figure 1 shows two different isomorphic graphs.

Figure 1: Two isomorphic graphs

A graph G can also be expressed in matrices. The adjacency matrix, degree matrix and Laplacian will be used to represent graphs. The adjacency matrix A is defined as follows:

$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

The matrix A represents the adjacency relations of the vertices in G. The element a_ij = 1 if the vertices i and j are adjacent, and a_ij = a_ji if the graph is undirected. The adjacency matrix for the graphs in figure 1 is given below.

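$$A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

The degree matrix D is defined as follows:

$$d_{ij} = \begin{cases} d_i & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \tag{2}$$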

It represents the degree of each node and d_i = |N_i| for an undirected graph.

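The Laplacian L of a graph G is defined as follows:

$$L = D - A \tag{3}$$

The Laplacian encodes structural properties of the graph, such as its connectivity. Its elements are

$$l_{ij} = \begin{cases} |N_i| & \text{if } i = j \\ -1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases} \tag{4}$$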

Both row and column sums are zero since

$$|N_i| = \sum_j a_{ij} \quad \text{and} \quad d_i = |N_i| \tag{5}$$

so

$$\sum_j l_{ij} = d_i - \sum_j a_{ij} = |N_i| - |N_i| = 0 \tag{6}$$

For an undirected graph the Laplacian L is symmetric because D is a diagonal matrix and A is symmetric. L is also positive semi-definite, meaning that all eigenvalues λ_i ≥ 0. Since the row sums are zero, the all-ones vector 1 is an eigenvector of L with eigenvalue λ_1 = 0, which is the smallest eigenvalue. The algebraic multiplicity of this eigenvalue is the number of connected components of the graph. The second-smallest eigenvalue λ_2 is called the algebraic connectivity and is nonzero exactly when the graph is connected. The Laplacian for the graphs in figure 1 is given below.


$$L = \begin{pmatrix} 3 & -1 & -1 & -1 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ -1 & -1 & 3 & -1 & 0 \\ -1 & 0 & -1 & 3 & -1 \\ 0 & 0 & 0 & -1 & 1 \end{pmatrix}$$
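The matrices of this section can be checked numerically. The following is a minimal Python sketch (the thesis itself uses Matlab); the edge list is read off the adjacency matrix of the Figure 1 graph, and the function name `graph_matrices` is chosen here for illustration only.

```python
# Edge list of the Figure 1 graph, read off its adjacency matrix (1-based labels).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5)]


def graph_matrices(n, edges):
    """Return (A, D, L) as nested lists for an undirected graph on vertices 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1  # undirected graph: a_ij = a_ji
    degrees = [sum(row) for row in A]  # d_i = |N_i|
    D = [[degrees[i] if i == j else 0 for j in range(n)] for i in range(n)]
    L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]  # L = D - A
    return A, D, L


A, D, L = graph_matrices(5, EDGES)
row_sums = [sum(row) for row in L]
col_sums = [sum(L[i][j] for i in range(5)) for j in range(5)]
```

Both `row_sums` and `col_sums` contain only zeros, in agreement with the statement that the row and column sums of the Laplacian vanish.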







1.2 Community detection

A complex network is a network whose graph representation has an irregular structure and a non-trivial topology. Many real-world complex systems may be modelled as complex networks where the vertices are components and the edges are the relations between them. Graph theory is therefore a powerful tool for analyzing such systems, which may be of very different character. Complex networks can often be partitioned into smaller, densely connected groups of nodes that have few connections to the rest of the network. These groups are called communities. Networks that can be partitioned into communities are said to have a community structure. An area of interest is organizing, structuring and detecting the communities of these systems. There are applications in a variety of disciplines such as biology, sociology, engineering, physics and economics [1]. Because of this broad range of applications, community detection algorithms take many different approaches. Quantifying and mathematically defining good community structure is a hard task, and several definitions are in use.

Modularity is a quality measure of how well a network is divided into communities and is often used to quantify community structure. It is defined as follows. Consider a graph G with m edges and two vertices v_i and v_j of G with degrees d_i and d_j. If the edges of G are randomly distributed, then the expected number of edges connecting v_i and v_j is d_i d_j / (2m). The community to which vertex v_i belongs is denoted s_i. The modularity Q is then given by [2]

$$Q = \frac{1}{2m} \sum_{ij} \left[ a_{ij} - \frac{d_i d_j}{2m} \right] \delta_{s_i, s_j} \tag{7}$$

where δ_{s_i,s_j} is the Kronecker delta, which takes the value 1 if the vertices v_i and v_j belong to the same community and 0 otherwise. The term a_ij is an element of the adjacency matrix A.
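As an illustration of (7), the sketch below evaluates the modularity of the Figure 1 graph in Python. The two partitions are hypothetical examples chosen here, not results from the thesis, and vertices are indexed from 0.

```python
def modularity(n, edges, community):
    """Modularity Q of equation (7).

    edges: undirected edges as (i, j) pairs with 0-based vertex indices.
    community: community[i] is the label s_i of vertex i.
    """
    m = len(edges)
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    d = [sum(row) for row in A]  # vertex degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:  # Kronecker delta
                q += A[i][j] - d[i] * d[j] / (2 * m)
    return q / (2 * m)


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]  # Figure 1 graph
q_split = modularity(5, edges, [0, 0, 0, 1, 1])  # hypothetical split {1,2,3}, {4,5}
q_all = modularity(5, edges, [0, 0, 0, 0, 0])    # all vertices in one community
```

For this small graph the split gives Q = 1/9, while placing all vertices in one community gives Q = 0, so the split scores higher under this measure.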

One of the most widely used approaches to community detection is optimization. Algorithms of this kind have in common that they try to optimize a quality measure of the community structure in a network, for example modularity.

Another common algorithm is the Girvan-Newman algorithm [3], which uses betweenness to detect communities. The betweenness of an edge indicates how central it is: it measures how many of the shortest paths between all possible pairs of start and end nodes pass through the edge. The algorithm is iterative. It removes the edge with the highest betweenness and then recalculates the betweenness in the remaining network, until no edges remain. Removing the edges with the highest betweenness successively splits the network into connected components that represent communities, until no further division is possible, so the result shows the hierarchical structure of the network.


2 Mathematical model

This thesis uses a multi-agent approach to detect communities in networks. Multi-agent systems can be represented by a graph G = (E, V) where the vertices are agents and the edges represent connections between the agents. There is no single agreed definition of an agent, but the agents used in the model are autonomous agents, which do not need any human input [4]. The information related to an agent i is the variable x_i. Depending on the application, it can represent many different quantities such as position, human opinion or price. The agents exchange information with each other and then use the shared information to update their own. This process is repeated. The dynamics of a multi-agent system can be either continuous or discrete. One can model the dynamics so that agents with edges between them attract each other and form communities.

This section gives a theoretical framework for finding communities using opinion dynamics. In section 2.1 the concept of consensus is introduced. It is fundamental to opinion dynamics because consensus can be interpreted as finding one community consisting of all agents. Opinion dynamics with decaying confidence is then explained in sections 2.2-2.3. The objective of the model is to find multiple communities. This is further explained in section 2.4.

2.1 Consensus

Consensus is reached when all the agents of a system agree on a variable, meaning that the chosen variable of each agent converges to one common value. Consider a simple graph G = (E, V) and assign an initial value x_i to each agent i = 1, ..., |V|. One kind of continuous dynamics that achieves consensus is then [5]

$$\dot{x}_i(t) = \sum_{j \in N_i} a_{ij} \left( x_j(t) - x_i(t) \right) \tag{8}$$

where a_ij is an element of the adjacency matrix A in (1), so that a_ij = a_ji = 1 if there is a link between vertices i and j and a_ij = a_ji = 0 if (i, j) ∉ E. In matrix form the dynamics is written

$$\dot{x}(t) = -L x(t) \tag{9}$$

where L is the graph Laplacian (4) and x is a |V| × 1 vector consisting of the values x_i for i = 1, ..., |V|. A slightly more complicated recursive consensus dynamics is [5]

$$x_i(t + 1) = x_i(t) + \alpha \sum_{j \in N_i} a_{ij} \left( x_j(t) - x_i(t) \right) \tag{10}$$

where the introduced parameter α affects how fast the agents change value. If α is close to zero the convergence is slow, since the agents barely move even if they are far from other agents. In matrix form this dynamics is written

$$x(t + 1) = P x(t) \tag{11}$$

where P = I − αL.

The only possible equilibrium for both the continuous model and the discrete model is a state where all agents have the same value x_1 = ... = x_i = ... = x_{|V|}. It is shown in the literature that if G is a connected and undirected graph then all agents converge to the mean of the agents’ initial values

$$\lim_{t \to \infty} x_i(t) = \frac{1}{|V|} \sum_{j \in V} x_j(0) = x^*, \quad i = 1, \dots, |V| \tag{12}$$
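The convergence to the mean can be observed with a few lines of Python. This is a minimal sketch of the discrete dynamics (10) on the Figure 1 graph; the initial values and the step size α = 0.2 are assumptions made for the example (α is taken below one over the largest degree, which is sufficient for convergence on this graph).

```python
def consensus_step(x, neighbours, alpha):
    """One synchronous update of equation (10) with a_ij = 1 for neighbours."""
    return [
        xi + alpha * sum(x[j] - xi for j in neighbours[i])
        for i, xi in enumerate(x)
    ]


# Figure 1 graph with 0-based vertex indices.
neighbours = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
x0 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical initial values with mean 3.0
alpha = 0.2                     # assumed step size, below 1 / (largest degree)

x = list(x0)
for _ in range(500):
    x = consensus_step(x, neighbours, alpha)

mean0 = sum(x0) / len(x0)
```

After the iterations every entry of `x` agrees with `mean0` to within rounding error, as stated in (12).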

2.2 Opinion dynamics

Consider an undirected graph G = (E, V) where |V| = n is the number of vertices and the set N_i = {j | (i, j) ∈ E} consists of the indices of all vertices connected to vertex i by an edge. Every vertex i = 1, ..., |V| is assigned an initial value x_i(t = 0). Each vertex v_i ∈ V communicates with its neighbourhood, that is, with all vertices j ∈ N_i, and the values of a node’s neighbours determine x_i(t + 1). The model mimics how an opinion spreads among humans over time. The vertices represent humans and the edges represent connections between two humans in a network. The model is discrete in time and the vertices are updated synchronously. The model is described mathematically as follows [6]

$$x_i(t + 1) = \sum_{j=1}^{n} p_{ij} x_j(t) \tag{13}$$

Here p_ij is a weight that describes how much the value of vertex j affects the value of vertex i in the next time step. Typically p_ij = p_ji if the impact between two vertices is mutual. Notice that p_ii tells how much the previous value of vertex i affects its own value in the next time step. In matrix form the model is written

$$X(t + 1) = P X(t) \tag{14}$$

where X is a |V| × 1 vector consisting of all the values x_i,

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \tag{15}$$

and P is an n × n matrix consisting of the weights p_ij,

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \tag{16}$$


The dynamics above is general and there is no restriction on P. To make the model more tractable some restrictions are introduced. If a vertex has many neighbours then the contribution from every single neighbour is small, so it is reasonable to assume that the contribution from each neighbour is inversely proportional to the number of neighbours. It is also assumed that the impact of a vertex’s previous opinion is large and that the opinion of a vertex without neighbours does not change. This results in the following dynamics [6]

$$x_i(t + 1) = \begin{cases} x_i(t) + \dfrac{\alpha}{|N_i|} \sum_{j \in N_i} \left( x_j(t) - x_i(t) \right) & \text{if } N_i \neq \emptyset \\ x_i(t) & \text{if } N_i = \emptyset \end{cases} \tag{17}$$

where α ∈ (0, 0.5). To write this dynamics in matrix form the following matrix Q is introduced

$$Q_{ij} = \begin{cases} 1 & \text{if } i = j \\ -\dfrac{1}{|N_i|} & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

and P is therefore P = I − αQ.
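A single step of this update can be sketched in Python as follows. The sketch is written so that it agrees with P = I − αQ, that is, each agent moves a fraction α of the way towards the mean opinion of its neighbours. The four-vertex graph, the initial opinions and α = 0.4 are hypothetical values chosen for the example.

```python
def opinion_step(x, neighbours, alpha):
    """One synchronous update of the neighbour-averaging dynamics.

    An agent with neighbours moves a fraction alpha towards their mean
    opinion; an agent without neighbours keeps its opinion.
    """
    new_x = []
    for i, xi in enumerate(x):
        nbrs = neighbours[i]
        if nbrs:
            new_x.append(xi + alpha / len(nbrs) * sum(x[j] - xi for j in nbrs))
        else:
            new_x.append(xi)
    return new_x


# Hypothetical example: the path 0-1-2 plus the isolated vertex 3.
neighbours = {0: [1], 1: [0, 2], 2: [1], 3: []}
x0 = [0.0, 0.5, 1.0, 0.3]
x1 = opinion_step(x0, neighbours, alpha=0.4)
```

In this example the two end vertices move towards the middle one, the middle vertex stays where it is because its neighbours are symmetric around it, and the isolated vertex is unchanged.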

2.3 Confidence boundary and decaying confidence

If G is a connected graph and every vertex can communicate with all of its neighbours then the only equilibrium is complete consensus, as in section 2.1. Two assumptions are made to make the model useful in a community detection context. The first assumption is that if the distance between the opinions of two vertices is larger than an assigned confidence boundary R, then these vertices have no impact on each other even if there is an edge between them. This can be thought of as two humans needing some mutual understanding in order to affect each other. Mathematically, N_i = {j | (i, j) ∈ E ∧ |x_i − x_j| ≤ R}. The second assumption is that the confidence boundary decays. This simulates that human opinions become more static as time passes, so that only humans with small disagreements have an impact on each other. This is written

$$R = M \rho^t \tag{19}$$

where M is the initial boundary and ρ ∈ (0, 1) is the confidence decay [6].
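The two assumptions can be combined into one update step. The Python sketch below filters the neighbours by the confidence boundary R = Mρ^t before averaging; the function names and the three-vertex example are illustrative assumptions, not taken from the thesis implementation.

```python
def active_neighbours(i, x, adjacency, radius):
    """N_i at the current time: neighbours of i within the confidence boundary."""
    return [j for j in adjacency[i] if abs(x[i] - x[j]) <= radius]


def decaying_confidence_step(x, adjacency, t, alpha, M, rho):
    """One synchronous update with confidence boundary R = M * rho**t."""
    radius = M * rho ** t
    new_x = []
    for i, xi in enumerate(x):
        nbrs = active_neighbours(i, x, adjacency, radius)
        if nbrs:
            new_x.append(xi + alpha / len(nbrs) * sum(x[j] - xi for j in nbrs))
        else:
            new_x.append(xi)  # nobody within reach: the opinion is kept
    return new_x


# Hypothetical example: path 0-1-2 where vertex 2 starts far from the others.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
x0 = [0.0, 0.1, 1.0]
x1 = decaying_confidence_step(x0, adjacency, t=0, alpha=0.4, M=0.5, rho=0.5)
```

With M = 0.5 the edge between vertices 1 and 2 is inactive from the start, since their opinions differ by 0.9, so only vertices 0 and 1 move towards each other.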

2.4 Applied to Community detection

In the literature [6] the vertices v_i and v_j are defined to be in the same community if

$$j \in N_i \text{ as } t \to \infty \tag{20}$$

Due to (19) this implies that every community has a value to which its members have converged:

$$\lim_{t \to \infty} x_i = \lim_{t \to \infty} x_j = x^* \tag{21}$$
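In a simulation the limit is replaced by a final state after finitely many steps. One way to read off the communities, sketched below in Python, is to keep only the edges whose end points are still within the confidence boundary and take the connected components of what remains. Using connected components and the tolerance 1e-6 are assumptions of this sketch, and the final opinions are hypothetical.

```python
def communities(x, edges, radius):
    """Connected components of the graph that keeps only the edges (i, j)
    with |x_i - x_j| <= radius. Vertices are indexed from 0."""
    n = len(x)
    adjacency = {i: [] for i in range(n)}
    for i, j in edges:
        if abs(x[i] - x[j]) <= radius:
            adjacency[i].append(j)
            adjacency[j].append(i)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first search over the remaining edges
            v = stack.pop()
            members.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(members))
    return groups


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]  # Figure 1 graph
x_final = [0.2, 0.2, 0.2, 0.7, 0.7]  # hypothetical converged opinions
found = communities(x_final, edges, radius=1e-6)
```

Here the vertices with final opinion 0.2 form one community and those with final opinion 0.7 form another, in line with (21).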

The parameters α, M and ρ have been introduced, and the choice of these parameters affects how the vertices form communities with each other.

One factor that is not studied in the literature but is important for the speed of the algorithm and the quality of the result is the initial vector X(t = 0). In previous literature the initial vector is either randomized or its elements are equally spaced. Here another method is suggested instead. The intuitive assumption is that if two vertices have many common neighbours then the probability that they are in the same community is high. It is also assumed that vertices with high degree are central to their community.

Let Γ be the set of the g nodes with the highest degree. For a node Γ_i ∈ Γ, the set of the p nodes that have the most neighbours in common with Γ_i is denoted Ψ_i = {Ψ_ij | j = 1, ..., p}. Thus Ψ_i ⊂ V and Γ_i ∈ V. The collection Ψ is defined as Ψ = {Ψ_i | i = 1, ..., g}. The nodes Γ_i are placed at a clear distance from one another and from all other nodes. The nodes in Ψ_i are then placed close to the node Γ_i.
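The description leaves several details open, so the following Python sketch should be read as one possible interpretation rather than the implementation used in the thesis. It assumes that the hubs Γ_i are spread evenly over [0, 1], that ties are broken by vertex index, that each Ψ_i is chosen among vertices not yet placed, that the members of Ψ_i are placed within a small hypothetical distance `spread` of their hub, and that all remaining vertices keep evenly spaced default values.

```python
def suggested_initial_values(adjacency, g, p, spread=0.01):
    """Sketch of the initial-value method: hubs far apart, Psi_i near hub i."""
    n = len(adjacency)
    nbr = [set(adjacency[v]) for v in range(n)]
    # Assumed fallback: unplaced vertices keep evenly spaced values on [0, 1].
    x = [v / (n - 1) for v in range(n)]
    # Gamma: the g vertices of highest degree (ties broken by index).
    gamma = sorted(range(n), key=lambda v: (-len(nbr[v]), v))[:g]
    placed = set(gamma)
    for k, hub in enumerate(gamma):
        centre = k / (g - 1) if g > 1 else 0.5  # hubs evenly spread (assumed)
        x[hub] = centre
        inward = 1.0 if centre <= 0.5 else -1.0  # keep values inside [0, 1]
        # Psi_k: the p unplaced vertices sharing most neighbours with the hub.
        candidates = sorted(
            (v for v in range(n) if v not in placed),
            key=lambda v: (-len(nbr[v] & nbr[hub]), v),
        )
        for rank, v in enumerate(candidates[:p]):
            x[v] = centre + inward * spread * (rank + 1) / p
            placed.add(v)
    return x, gamma


# Hypothetical test graph: two 4-cliques joined by the edge 3-4.
adjacency = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
x0, gamma = suggested_initial_values(adjacency, g=2, p=3)
```

On this toy graph the two bridge vertices are selected as hubs and each clique starts clustered around its own hub. For the college football network the thesis uses g = 12 and p = 6.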


3 Simulation

In section 3.1 the networks on which opinion dynamics is used are introduced. Some background information about these real-world networks is also presented to give a sense of what community structure to expect. The tools used for implementing the model and visualizing the networks are presented in section 3.2.

3.1 Networks

We will detect communities in two real-world networks. The first is Zachary’s karate club, where the nodes are members of the club and an edge represents a friendship between two members. There are 34 vertices and 78 edges in Zachary’s karate club. It is a classical network for testing community detection algorithms. There exist two natural communities due to different opinions about increasing membership fees.

Figure 2: Zachary’s karate club

The second network in which we are going to detect communities is a network of American football teams, each of which belongs to one of twelve different divisions. It consists of 115 nodes and 613 edges. The nodes represent teams and an edge indicates that two teams have played a game against each other. A natural community structure would be the real-world divisions.

Figure 3: College football network

3.2 Implementation

We implement the model described in section 2 in Matlab. The graphs are visualized with the built-in Matlab function biograph. When we visualize communities in Zachary’s karate club we color the communities and keep both intra-connections and inter-connections. When we visualize communities in the American football network we color the communities and keep only the intra-connections. The visualizations differ because the community structure of the football network is hard to see clearly when all edges are drawn, due to its large size. The inter-connections in the karate club are not removed because the network is small enough that the communities can still be seen, and keeping all connections gives a better understanding of the community structure.


4 Results

In section 4.1 opinion dynamics with decaying confidence is used to detect communities. The parameters ρ, M and α are varied to see how they affect the community structure. How the initial opinions affect the community structure is investigated in section 4.2. This is done by comparing the results when the initial opinions are set in different ways. In section 4.3 it is studied how suitable the approach is for networks with negative links and how robust the model is to missing edges in the network.

4.1 Study of parameters

In the following sections 4.1.1-4.1.3 we study how the outcome of the algorithm is affected by changes in the parameters ρ, M and α. The parameter ρ is tested on both Zachary’s karate club network and the college football network.

4.1.1 Convergence speed ρ

As seen in (19), the parameter ρ decides how fast the confidence boundary decays and thereby how fast the agents have to converge. The figures below demonstrate how this parameter affects the community detection. This is done by varying ρ while all other parameters are fixed. Initially the agents are evenly distributed over a distance of 1.

Figure 4: Zachary’s karate club with ρ = 0.95, M = 0.97 and α = 0.5

In figure 4 it can be seen that the agents have converged and reached consensus. This is expected to happen when ρ is high enough. If ρ is chosen close to one then the dynamics is close to a consensus protocol and the communities are therefore few. The network is now studied for lower values of ρ.


From figures 5, 6 and 7 it can be seen that successively lowering ρ leads to more detected communities. If ρ is chosen low then the decay is faster, which means that communities form faster and are smaller than for higher values of ρ. The lowest values lead to every agent forming its own community. Figure 8 shows that no significant communities are detected for ρ = 0.3.

The study of the parameter ρ above showed that a high ρ leads to consensus and that lowering ρ leads to more detected communities, down to a certain value. Below this value the agents form groups so small that they can no longer be considered communities. The algorithm is now studied on the college football network to get a better understanding of the results. This also shows that the algorithm is compatible with larger networks.
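A sweep over ρ of the kind described here can be sketched in Python as follows. The sketch is not the Matlab code behind the figures: the test graph (two 4-cliques joined by one edge), the stopping rule (stop once the boundary falls below `tol`, or after `max_steps`) and the use of connected components to read off communities are all assumptions made for illustration.

```python
def detect_communities(adjacency, x0, alpha, M, rho, tol=1e-9, max_steps=2000):
    """Run the decaying-confidence dynamics and return one label per vertex."""
    x = list(x0)
    radius = M
    for t in range(max_steps):
        r = M * rho ** t
        if r < tol:  # assumed stopping rule: stay above rounding error
            break
        radius = r
        new_x = []
        for i, xi in enumerate(x):
            nbrs = [j for j in adjacency[i] if abs(xi - x[j]) <= radius]
            if nbrs:
                xi = xi + alpha / len(nbrs) * sum(x[j] - x[i] for j in nbrs)
            new_x.append(xi)
        x = new_x
    # Communities: connected components of the edges still within the boundary.
    label = {}
    for start in range(len(x)):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adjacency[v]:
                if w not in label and abs(x[v] - x[w]) <= radius:
                    label[w] = start
                    stack.append(w)
    return [label[i] for i in range(len(x))]


# Hypothetical test graph: two 4-cliques joined by the edge 3-4.
adjacency = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
x0 = [i / 7 for i in range(8)]  # evenly distributed over a distance of 1
counts = {
    rho: len(set(detect_communities(adjacency, x0, alpha=0.4, M=1.0, rho=rho)))
    for rho in (1.0, 0.9, 0.7, 0.5, 0.3)
}
```

The dictionary `counts` maps each tested ρ to the number of detected communities. With ρ = 1 the boundary never shrinks, so the connected graph is reported as a single community, which corresponds to the consensus case.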


Figure 9: College football network with ρ = 1, M = 0.99 and α = 0.5

Figure 10: College football network with ρ = 0.896, M = 0.99 and α = 0.5


Figure 9 shows that the agents have reached consensus for ρ = 1. Figures 10, 11 and 12 show that more communities are detected when ρ is lowered. This corresponds to the results shown for Zachary’s karate club.

4.1.2 Initial confidence boundary M

The parameter M in (19) sets the initial boundary, that is, how large the distance between the initial opinions of two agents can be while they are still able to communicate. To investigate this, community detection results are shown below for different values of M with all other parameters fixed. Initially the agents are equally distributed over a distance d = 1.


The community structures in figure 13 and figure 7 are identical even though M is ten times higher in figure 13. The model is therefore not very sensitive to variations of the parameter M. In figure 7, M is chosen just high enough that all agents can initially communicate with their neighbours. In figure 14 the value of M is decreased, and it is clear that choosing M so low that some agents cannot initially communicate with their neighbours radically affects the community structure.
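The smallest M that lets every agent communicate with all of its neighbours at t = 0 is the largest initial opinion distance over the edges. A minimal Python sketch, using the small Figure 1 graph as a stand-in for the real networks and helper names chosen here:

```python
def evenly_spaced(n, width=1.0):
    """Initial opinions equally distributed over a distance `width`."""
    return [width * i / (n - 1) for i in range(n)]


def smallest_full_boundary(x0, edges):
    """Smallest M for which every pair of neighbours is within reach at t = 0."""
    return max(abs(x0[i] - x0[j]) for i, j in edges)


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]  # Figure 1 graph, 0-based
x0 = evenly_spaced(5)
M_min = smallest_full_boundary(x0, edges)
```

For this graph the edge between the first and fourth vertex is the longest, which gives `M_min` = 0.75. Any M at or above this value leaves all edges active at the first step.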

4.1.3 Responsivity α

The parameter α in (17) decides how much the agents respond to each other. Figures are shown below for different values of α while all other parameters are fixed. Initially the agents are equally distributed over a distance d = 1.


Comparing figures 7, 15 and 16 shows that decreasing α gives smaller communities and increasing α gives bigger communities. The parameters α and ρ therefore have to be set with respect to each other.

4.2 Study of initial opinions

In this section it is demonstrated how the initial values of the agents affect the community structure. In section 2.4 a method for setting the initial state was suggested. It is investigated here whether this method is more effective than evenly distributed initial states. In the previous sections the initial values are equally distributed over a distance d = 1. Below a graph is shown where the initial values are slightly changed from the evenly distributed setup.


When figure 17 is compared with figure 12 it is seen that the results differ: the detected communities are not identical in the two figures. This motivates studying how to set the initial values in an effective way.

Two figures that illustrate community detection on the college football network are shown below, where ρ and α are fixed and M is set so that all agents can initially communicate with each other. In the first figure the initial values are equally distributed over a distance d = 1. In the second figure the method suggested in section 2.4 is used to distribute the initial values. Two parameters g and p were introduced in section 2.4. The parameter g is set to 12 and p is set to 6.

Figure 19: College football network with the alternative method of choosing initial options. Parameters are ρ = 0.783, M = 9.9 and α = 0.5


Figure 18 and figure 19 clearly show that the method introduced in section 2.4 can be an effective way of setting the initial values. In figure 18 no community structure can be extracted, while in figure 19 the communities are equivalent to the real-world communities in the American football network.

4.3 Robustness

In this section we study how robust the approach is to networks that contain negative links and missing links. In section 4.3.1 we test the multi-agent approach on a small artificial network and look at how a negative link affects the community detection. In section 4.3.2 some negative links are added to Zachary’s karate club network and we study whether the multi-agent approach finds reasonable communities. In section 4.3.3 some links are removed from Zachary’s karate club network.

4.3.1 Artificial network

Two figures with identical parameters and initial states are shown below. The only difference between them is that in the first network all links are positive, while in the second network the red link is negative.

Figure 20: Family network with ρ = 0.5, M = 0.86 and α = 0.5


Figure 21: Family network with a negative link and parameters ρ = 0.5, M = 0.86 and α = 0.5

The negative link between node 5 and node 6 splits the network into two communities. This kind of network could simulate a family conflict and the community structure is reasonable.

4.3.2 Modified Zachary’s karate club, negative link

As seen in section 4.3.1, replacing a positive link with a negative one may split a community. A negative link gives a repulsive relation between two agents, so instead of trying to reach consensus they do the opposite. We now replace some of the positive links in Zachary’s karate club with negative links to study whether the algorithm handles negative links as expected and how robust it is to such changes. The results are compared to the community structure in figure 7.
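The text does not spell out how a negative link enters the update rule. One possible encoding, shown in the Python sketch below, is to give every link a sign and let a negative sign reverse the direction of the pull. The signed weights, the function name and the two-agent example are assumptions of this sketch.

```python
def signed_step(x, signed_neighbours, alpha):
    """One update where each link carries a sign: +1 attracts, -1 repels.

    signed_neighbours[i] is a list of (j, sign) pairs. The signed weighting
    is an assumed way of modelling negative links.
    """
    new_x = []
    for i, xi in enumerate(x):
        links = signed_neighbours[i]
        if links:
            pull = sum(sign * (x[j] - xi) for j, sign in links)
            new_x.append(xi + alpha / len(links) * pull)
        else:
            new_x.append(xi)
    return new_x


x0 = [0.4, 0.6]
friends = {0: [(1, +1)], 1: [(0, +1)]}  # positive link between the two agents
rivals = {0: [(1, -1)], 1: [(0, -1)]}   # the same link made negative

after_friends = signed_step(x0, friends, alpha=0.25)
after_rivals = signed_step(x0, rivals, alpha=0.25)
gap_friends = abs(after_friends[1] - after_friends[0])
gap_rivals = abs(after_rivals[1] - after_rivals[0])
```

With a positive link the gap between the two opinions shrinks from 0.2 to 0.1, and with a negative link it grows to 0.3, so the two agents soon fall outside each other's confidence boundary.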


Figure 22: Zachary’s karate club with two negative links and parameters ρ = 0.7885, M = 0.97 and α = 0.5

It can be seen in figure 22 that the pink community of figure 7, which is now in conflict because of the two negative links, has merged with a larger community, the light blue one.

Figure 23: Zachary’s karate club with one negative link and parameters ρ = 0.7885, M = 0.97 and α = 0.5

In figure 23 the negative link between agent 5 and agent 7 causes the yellow community of figure 7 to split into two new communities.

4.3.3 Modified Zachary’s karate club, missing links

When creating a network from a real-world situation it may not be obvious which nodes should have an edge between them, since there are multiple ways of defining the criterion for an edge. For example, in a financial network there is a threshold for how large the correlation between two stocks must be for them to have an edge between them [8]. Edges may also be missing due to incomplete information. How such missing links affect the community structure is studied in this section.

Two graphs of Zachary’s karate club are shown below. In the first graph, the edges between node 25 and node 26 and between node 25 and node 32 are removed. In the second graph, the edge between node 5 and node 11 is removed.
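The thesis compares the resulting communities visually. As a complement, the Python sketch below shows how links could be removed from an edge list and how two partitions could be compared numerically by the fraction of vertex pairs on which they agree (the Rand index). The choice of this measure, the helper names and the example labels are assumptions, not part of the thesis.

```python
from itertools import combinations


def remove_links(edges, missing):
    """Return the edge list without the given undirected links."""
    drop = {frozenset(e) for e in missing}
    return [e for e in edges if frozenset(e) not in drop]


def pair_agreement(labels_a, labels_b):
    """Fraction of vertex pairs that both partitions treat the same way,
    either together in both or apart in both (the Rand index)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return same / len(pairs)


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]  # Figure 1 graph, 0-based
reduced = remove_links(edges, [(4, 3)])  # the order within a pair does not matter

original_labels = [0, 0, 0, 1, 1]        # hypothetical communities before removal
perturbed_labels = [0, 0, 0, 1, 2]       # hypothetical communities after removal
score = pair_agreement(original_labels, perturbed_labels)
```

In this made-up example nine of the ten vertex pairs are treated the same way by both partitions, so `score` is 0.9. A score of 1 would mean that removing the links did not change the communities at all.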

Figure 24: Zachary’s karate club with two missing links and parameters ρ = 0.7885, M = 0.97 and α = 0.5

Figure 25: Zachary’s karate club with one missing link and parameters ρ = 0.7885, M = 0.97 and α = 0.5


5 Discussion

In this section the objectives of the thesis and the results are connected and discussed. Which parameters of the model are important, sensitive or robust is discussed in section 5.1. Section 5.2 describes how the initial values of the agents affect the community detection. Section 5.3 discusses whether the model can find reasonable communities in networks with negative links, and whether the model is robust to missing links in networks.

5.1 Parameters

In section 4.1 the parameters ρ, M and α of the model were investigated. It was seen that ρ plays an important role in how many communities the network is partitioned into. The parameter ρ decides the convergence speed: an increase in ρ leads to a decrease in convergence speed and vice versa. When ρ is high the agents have plenty of time to reach consensus because the confidence boundary decreases slowly, meaning that the dynamics behaves as in section 2.1. This is equivalent to detecting one community consisting of all agents.

When ρ is decreased below a certain value, the agents are forced to converge at a speed faster than O(ρ^t). We study networks that are initially connected, and edges are removed as R decreases. In general the graph is no longer connected after some time steps. When a component consists of agents that converge faster than O(ρ^t), this component becomes a consensus problem and forms a community. This is illustrated in the two figures below, where the y-axis is opinion and the x-axis is time.

Figure 26: State plot of Zachary’s karate club with parameters ρ = 0.89, M = 0.97 and α = 0.5

Figure 27: State plot of Zachary’s karate club with ρ = 0.82, M = 0.97 and α = 0.5

In figure 26 the agents have converged into two communities and in figure 27 the agents have converged into three communities. In both figures all agents have reached consensus inside their communities.

When using opinion dynamics for community detection, ρ is a very sensitive parameter, as seen in section 4.1.1, and there is no universal value of this parameter that gives a reasonable result for all possible networks. ρ must therefore be adjusted to each network.

The parameter M represents the initial confidence boundary. It is the largest communication distance, meaning that two neighbouring agents separated by a distance larger than M will never be able to communicate. It is therefore very important to set M such that all agents can communicate with each other. This means that M should be at least as big as the largest initial distance between two agents. As long as this is satisfied, the model is robust to changes in M, as can be seen in section 4.1.2.
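The condition on M can be checked directly from the initial opinions. The sketch below is only an illustration: the helper names and the example values are hypothetical, and it assumes that opinions are scalars and that links are given as pairs of agent indices.

```python
def largest_initial_distance(x0):
    """Largest distance between any two initial opinions."""
    return max(x0) - min(x0)


def all_links_usable(edges, x0, M):
    """True if every linked pair of agents starts within distance M."""
    return all(abs(x0[a] - x0[b]) <= M for a, b in edges)


x0 = [0.0, 0.25, 1.0]     # hypothetical initial opinions
edges = [(0, 1), (1, 2)]  # hypothetical links
M_safe = largest_initial_distance(x0)
print(M_safe, all_links_usable(edges, x0, M_safe), all_links_usable(edges, x0, 0.5))
```

Choosing M as the largest distance between any two agents is sufficient but slightly conservative, since only linked agents ever communicate.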

α is a measure of how responsive an agent is to attraction. Increasing α leads to agents responding more to their environment, meaning that |x_i(t + 1) − x_i(t)| will not decrease but may increase. The slowing effect of decreasing α can be compensated by lowering ρ and vice versa.
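The role of α can be seen in a single update step. The sketch below assumes, as an illustration only, that an agent moves a fraction α towards the mean opinion of the neighbours it currently communicates with; under that assumption the step size is proportional to α. The function name and the numbers are hypothetical.

```python
def step_size(x_i, neighbour_opinions, alpha):
    """Size of one update |x_i(t + 1) - x_i(t)| under the assumed averaging rule."""
    mean_gap = sum(x_j - x_i for x_j in neighbour_opinions) / len(neighbour_opinions)
    return abs(alpha * mean_gap)


# Same agent and neighbours, two different responsiveness values.
for alpha in (0.25, 0.5):
    print(alpha, step_size(0.0, [0.2, 0.4], alpha))
```

Doubling α doubles the step for a fixed set of neighbours, and the step stays zero when the agent already sits at the mean of its neighbours, which matches the statement that the step does not decrease but may increase.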

5.2 Initial state

In section 4.2 the results of two different methods for setting the initial state of the agents were presented. The first is to spread the agents equidistantly over an interval. The second is the method suggested in section 2.4, which is based on two assumptions. The first is that nodes with high degree are more important for the community structure than nodes with low degree. The second is that if two nodes have many common neighbours, then the probability that these two nodes are in the same community is high. The way of placing the nodes Γ_i is motivated by the fact that if the nodes of Γ are placed too close to each other, communities may merge into each other. Placing the nodes this way leads to communities being found faster, since the nodes probably start closer to their fellow community nodes. It is reasonable to assume that the method improves the results when the assumptions are satisfied. The method should work best on networks where the paths between nodes with high degree are long. In particular, if the path between any two nodes with high degree is longer than four edges, then Ψ_i ∩ Ψ_j = ∅ for all i ≠ j. It should be considered that if two or more of the nodes in Γ are members of the same community, this method may be problematic, since there is a risk that the community is partitioned into two or more parts. It should also be noted that the values of the parameters p and k introduced in section 2.4 are chosen depending on the network; there are no universally optimal p and k. In section 4.2 it is seen that the algorithm could not find reasonable communities when using equidistant initial values because ρ was too low, but it could find reasonable communities with the method introduced in section 2.4 and the same ρ. So the suggested method did improve the convergence speed.
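The remark about paths longer than four edges can be checked on a small example. The sketch below assumes, purely for illustration, that Ψ_i is the set of nodes within two edges of the high-degree node Γ_i; the precise definition is the one given in section 2.4 and is not restated here. The helper name and the path graph are hypothetical.

```python
from collections import deque


def within_hops(neighbours, source, hops):
    """Nodes reachable from source in at most `hops` edges (breadth-first search)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if distance[u] == hops:
            continue  # do not expand beyond the hop limit
        for v in neighbours[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return set(distance)


# Path graph 0 - 1 - 2 - 3 - 4 - 5 as a toy network.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Nodes 0 and 5 are five edges apart: their two-hop sets do not overlap.
print(within_hops(path, 0, 2) & within_hops(path, 5, 2))
# Nodes 0 and 4 are exactly four edges apart: node 2 lies in both sets.
print(within_hops(path, 0, 2) & within_hops(path, 4, 2))
```

Under this assumed reading, seeds that are more than four edges apart cannot compete for the same nodes, which is the situation in which the method is expected to work best.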

5.3 Robustness

There is no unique way of modelling a real-world network. When modelling a real-world situation as a graph there is a span of different property setups. For instance, the college football network could have been modelled as a weighted graph where the weight of a link represents how many games two teams have played against each other. Apart from the choice of setup, missing data or inaccuracy of the model might lead to some connections or nodes missing. The robustness of opinion dynamics to negative and missing links is therefore tested in section 4.3.

In section 4.3.1 it is seen that a negative link splits the network in a reasonable and expected way. Zachary's karate club with some positive links replaced by negative links is shown in section 4.3.2. It can be seen in figure 22 that the negative links in the former community lead to its absorption by a larger community. In figure 23 one community splits into two because of the negative link. Both figures show reasonable outcomes. This shows that the model is compatible with negative links.

In section 4.3.3 it is studied how the community structure is affected by missing links in the network. In figure 24 the absence of two links leads to a split in a small community, which may still be a reasonable result depending on the application.

5.4 Comparison with modularity optimization

To see if opinion dynamics is an effective way of detecting communities, it has to be compared to other approaches. Opinion dynamics is compared to modularity optimization since it is one of the most common approaches and very different from the multi-agent approach. The two figures below show communities detected with a modularity optimization approach [7].

Figure 28: Two communities detected with modularity optimization

Figure 29: Four communities detected with modularity optimization

When detecting two communities in Zachary's karate club, the result in figure 28, which uses modularity optimization, is identical to figure 5. Figures 29 and 7, which show the detection of four communities, differ by only one node. So both approaches give similar results when detecting communities in Zachary's karate club.

To verify the similarity of the results from Zachary's karate club, another network is used to compare the two approaches. The real-world community structure of the football network is a division into 12 communities. When using opinion dynamics on the network, 12 communities were found, as seen in figure 11. Using modularity optimization, the same number of communities is found. This means that both approaches are still valid at this network size.

While opinion dynamics only needs information about the adjacency matrix of the network, modularity optimization needs a pre-assigned number of communities that one wants the optimization algorithm to find. The problem that can arise is that in many real-world networks the number of communities is unknown. This resembles the problem of finding the optimal value for the parameter ρ. When optimizing modularity using, for instance, mathematical programming, the only parameter that can be varied is the pre-assigned number of communities. This means that there is only one way of finding, for example, five communities. It is a hard task to pre-assign a reasonable number of communities on massive networks.
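For reference, the quantity being optimized can be evaluated for any candidate partition. The sketch below computes the standard Newman-Girvan modularity of an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/(2m))²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. This is the basic definition rather than the extended one of [7], and the function name and toy graph are illustrative assumptions.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (a, b) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = {}      # L_c: edges with both ends in community c
    total_degree = {}  # d_c: sum of node degrees in community c
    for a, b in edges:
        for node in (a, b):
            c = community[node]
            total_degree[c] = total_degree.get(c, 0) + 1
        if community[a] == community[b]:
            c = community[a]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Toy network (assumption): two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the toy graph into its two triangles gives Q = 5/14, while putting all nodes in one community gives Q = 0, so a modularity-based method would prefer the split.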

6 Conclusions

In this report a multi-agent approach to community detection is studied. The parameters of opinion dynamics with decaying confidence are analyzed. It is concluded that the model is sensitive to changes in ρ and α while robust to changes in M.

The impact of how initial values are set is studied. It is shown that communities can often be detected when setting the values randomly. However, the choice of initial values affects the convergence speed, so the initial values have to be set in a good way to detect communities effectively. This report has suggested one method for setting initial values.

Robustness has been discussed and the conclusion is that this algorithm is suitable for detecting communities in networks with negative links. The compatibility with negative links means that a broader span of networks can be analyzed with this algorithm. Most community detection algorithms do not support negative links. A study of how the removal of some links affects the community structure is made. This is done to test whether the model is robust to errors in the creation of real-world networks. The conclusion is that, depending on the application, opinion dynamics can be used on networks with missing information.

We detected communities using opinion dynamics with decaying confidence and studied the model in line with the three main objectives. Even so, there are still many aspects and problems to study within the field of community detection that are outside the scope of this thesis.

References

[1] Fortunato S. Community detection in graphs. 2010.

[2] Newman M.E.J. Communities, modules and large-scale structure in networks. 2014.

[3] Girvan M. and Newman M.E.J. Community structure in social and biological networks. 2002.

[4] Schumacher M. Objective coordination in multi-agent system engineering. 2001.

[5] Olfati-Saber R., Fax J.A. and Murray R.M. Consensus and Cooperation in Networked Multi-Agent Systems. 2007.

[6] Morarescu I. and Girard A. Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. 2011.

[7] Nicosia V., Mangioni G., Carchiolo V. and Malgeri M. Extending the definition of modularity to directed graphs with overlapping communities. 2009.

[8] Sörensen K. Clustering in Financial Markets - A Network Theory Approach. 2014.
