Network analysis of the share ownership structure on the Swedish stock market


Ludvig Bohlin


Department of Physics, Linnaeus väg 20, 901 87 Umeå, Sweden

www.physics.umu.se


Integrated Science Lab, Department of Physics

Umeå University

June 21, 2012


Network analysis of the share ownership structure on the Swedish stock market Master’s thesis, Master of Science in Engineering Physics, Umeå University.

Ludvig Bohlin, ludvig.bohlin@gmail.com.

Supervisor: Martin Rosvall, Department of Physics, Umeå University.

Examiner: Ludvig Lizana, Department of Physics, Umeå University.

Presented: Umeå University, June 14, 2012.

Approved for print: June 19, 2012.


Abstract

The stock market is an example of a complex system, i.e. it consists of a number of traders, interacting in such a way that their collective behaviour, the behaviour of the market, is not a simple combination of their individual behaviour. One of the most important tasks in modern finance is finding efficient ways of summarizing and visualizing stock market data to obtain useful information about the behaviour of the market.

In this thesis we investigate the possibility of finding a way to summarize and cluster share ownership data from the Swedish stock market. This is done by using a network approach to analyze the structure of the share ownership in order to find significant patterns in the data. The analysis of the network is performed with the community detection algorithm InfoMap, which turns the problem of finding clusters into the problem of optimally compressing the flow of information on the structure of the network.

The results of the analysis indicate that it is possible to find significant patterns in the ownership data when looking at the holdings of individuals using a binary approach. By using the clusters with the largest information flow, a majority of the analyzed individuals are categorized into clusters that account for different properties of the ownership of the included individuals. The clustering results are visualized using alluvial diagrams, which are also used to display changes that occur in the ownership structure between two dates.

Sammanfattning (Summary)

The stock market is an example of a complex system, i.e. a system consisting of a number of actors that all interact in such a way that their collective behaviour, the behaviour of the market, is not merely a combination of their individual behaviour. One of the most important tasks in today's financial market is to find efficient ways of summarizing and visualizing stock market data that could give useful information about the behaviour of the market.

In this thesis we investigate whether it is possible to find a way to summarize and cluster share ownership data from the Swedish stock market. This is done with a network-based method that is used to analyze the share ownership structure and find important patterns in the data. The network is analyzed with the clustering algorithm InfoMap, which turns the problem of finding clusters in a network into the problem of finding an optimal compression of the information flow on the structure of the network.

The results of the analysis point to the possibility of finding patterns in the share ownership data when looking at the holdings of individuals using a binary approach. By using the clusters with the largest information flow, a majority of the analyzed individuals can be categorized into clusters with different properties regarding the ownership of the included individuals. The clustering result can be visualized using alluvial diagrams, which can also be used to show changes that occur in the ownership structure between two dates.


Preface

This is my Master’s thesis for the degree of Master of Science in Engineering Physics at Umeå University. The thesis has been written at the Integrated Science Lab (IceLab) at Umeå University during the spring of 2012.

I would like to thank my supervisor Martin Rosvall for providing valuable input and guidance throughout the project and Krister Modin at Euroclear Sweden AB who made this work possible. Gratitude is also directed towards the people at IceLab for welcoming me as a member of the team, and to Patrik Törmänen for helpful comments and numerous Board Hockey games that helped me clear my mind. I also would like to thank my girlfriend Emelie for believing in me and always supporting me.

Ludvig Bohlin, Umeå,

June 21, 2012.


Chapter 1 Introduction

In this chapter the main topic of the thesis and the motivation behind it are introduced.

The chapter starts with a presentation of the problem background, then the problem formulation is given and finally a presentation of the outline of the thesis is provided.

1.1 Background

The financial market plays a huge role in our daily lives. The current Euro crisis, as well as historical stock market crashes like the dot-com bubble in the late 1990s and the recent global financial crisis, are testaments to the influence of financial markets on society. Whether we trade shares or not, nowadays almost everyone is affected by the stock market. At the end of 2011 about 1.5 million people, or 16%, in Sweden owned shares on the stock market, and the proportion has decreased over the last ten years [37]. However, this does not mean that the Swedes have abandoned shares as a form of investment; instead the ownership is managed indirectly through funds. About 74% of the Swedish population (18–74 years) own shares in mutual funds, and if the savings for the premium pension are included almost the whole population is covered [28]. The stock market is an example of a complex system, i.e. it consists of a number of traders, interacting in such a way that their collective behaviour, the behaviour of the market, is not a simple combination of their individual behaviour [22]. One of the most important problems in modern finance is finding efficient ways of summarizing and visualizing stock market data to obtain useful information about the behaviour of the market [7].

1.2 Problem formulation

In this thesis we aim to investigate the possibility of finding a way to summarize and cluster share ownership data from the Swedish stock market. This is done by using a network approach to analyze the structure of the share ownership in order to find significant patterns in the data. At present, mainly the largest shareholders of each company attract interest on the market, but since most holders have small holdings, a lot of potentially important information about the market remains unused. The goal of this thesis is therefore to cluster the owners with small holdings in order to highlight and understand how the clustering can be used to obtain useful information about the stock market. A desirable feature of the analysis is that it can be displayed in a clear and accessible way, and thus the clustering result should not be too extensive. It would also be preferable if the result could be visualized. Another goal is that the clustering should be sustainable in the longer term, so that it is possible to follow different clusters and their behaviour over time.

1.3 Outline of thesis

The thesis is structured as follows:

• Chapter 2 presents the field of networks and introduces how to detect communities in networks.

• Chapter 3 goes through the algorithm InfoMap that is used to find structures in the data.

• Chapter 4 provides a review of the Swedish stock market.

• Chapter 5 presents the dataset and explains the method that is used in the analysis.

• Chapter 6 contains the results obtained from the analysis.

• Chapter 7 gives a discussion of the results.

• Chapter 8 summarizes the outcome of the thesis in the conclusions.


Chapter 2 Networks

This chapter introduces the network methodology and gives examples of different kinds of networks and quantities to evaluate their properties. The chapter also treats community detection in networks and presents similarity measures.

2.1 Preliminaries

Our everyday lives are filled with different kinds of networks. The Internet, the World Wide Web and subway systems are all concrete examples, but networks can also be more abstract, such as networks of acquaintances, chemical reactions or chains of historical events [25]. Networks are everywhere in the world; they always have been and they always will be. Despite this fact, the multidisciplinary field of networks and the modeling of systems as networks is a relatively new approach in science, which started to catch scientists' interest in the last decade of the 20th century. This growth of network awareness and the increasing popularity of network analysis in the scientific literature [8] is not only a result of the computational advances in data gathering, storage and processing technology of recent decades. It has also come to our understanding that real-world systems are made up of a large number of entities, interacting in such a way that their collective behaviour is not only the sum of their individual behaviour, i.e. nature is a complex system [22]. Network science is well suited to analyzing and understanding such systems.

An important feature of networks that describe complex systems is that they possess a significant amount of similar statistical and topological properties—regardless of the application domain. Although networks can be very different, many of their properties are common to networks of a wide variety of types [23]. Similar to the way a landscape can be simplified by a map, the topology of a real-world system can be described as a network by focusing on the connectivity pattern of its individual components [32]. So whether one wants to understand a technological, biological or social system, modeling it as a network could be a good way to start.

Figure 2.1 shows an example of a real-world network. The nodes in the network are the 43 unique winners of the prestigious soccer award Ballon d’Or, often referred to as the World Player of the Year award [2]. A link has been established between two players if they have represented the same club during the same season.

[Figure 2.1: network diagram drawn with Pajek. Node labels: Stanley Matthews, Alfredo Di Stefano, Raymond Kopa, Luis Suarez, Omar Sivori, Josef Masopust, Lev Yashin, Denis Law, Eusebio, Bobby Charlton, Florian Albert, George Best, Gianni Rivera, Gerd Muller, Johan Cruyff, Franz Beckenbauer, Oleg Blokhin, Alan Simonsen, Kevin Keegan, Karl-Heinz Rummenigge, Paolo Rossi, Michel Platini, Igor Belanov, Ruud Gullit, Marco Van Basten, Lothar Matthaus, Jean-Pierre Papin, Roberto Baggio, Hristo Stoichkov, George Weah, Matthias Sammer, Ronaldo, Zinedine Zidane, Rivaldo, Luis Figo, Michael Owen, Pavel Nedved, Andrei Shevchenko, Ronaldinho, Fabio Cannavaro, Kaka, Cristiano Ronaldo, Lionel Messi.]

Figure 2.1: A network with the 43 unique winners of the soccer award Ballon d’Or as nodes, connected with a link if they have played together in the same club during the same season. The network has been visualized with the program PAJEK.

2.2 Network structure

Mathematical representation of networks originates from graph theory, which dates back to the mathematician Leonhard Euler and his solution of the Königsberg bridge problem in 1736 [17]. The term graph is usually used to present basic properties of networks in a more mathematical sense, but we will consistently use the term network, since it captures both the mathematical representation and the actual system. A network is a structure that consists of a set of objects, called nodes or vertices, connected by links or edges. The links between nodes can be either directed or undirected. In a directed network the links are also called arcs. For example, if the nodes represent persons with e-mail addresses, a directed link (arc) can be established from one person to another if the first sends an e-mail to the second. If the person receiving the e-mail replies, the link is then considered undirected. Links can also be weighted, representing for example the connection speed between the nodes or the distance between them. Normally a pair of nodes is not allowed to have more than one link between them, and links from a node to itself are not permitted either. An example of a network consisting of 6 nodes and 8 undirected unweighted links can be seen in figure 2.2.

Figure 2.2: An example network with 6 nodes and 8 undirected and unweighted links.

A measure of how many neighbours a node has is the degree, i.e. the number of links attached to a node. In the case of a directed network one has to discriminate between in-degree, the number of incoming arcs, and out-degree, the number of outgoing arcs. The probability distribution of these degrees over the whole network is called the degree distribution, P(k). Accordingly it is defined to be the fraction of nodes in the network with degree k.
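As a minimal sketch of the degree and the degree distribution P(k), the following Python snippet counts links per node in a small undirected network. The link list is a hypothetical 6-node, 8-link example chosen for illustration; it is not read off figure 2.2, and the function names are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical undirected, unweighted network with 6 nodes and 8 links.
# The link list is an illustrative assumption, not the network of figure 2.2.
links = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5), (4, 6), (5, 6)]


def degrees(links):
    """Return a dict mapping each node to its degree (number of attached links)."""
    degree = Counter()
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def degree_distribution(links):
    """Return P(k): the fraction of nodes that have degree k."""
    degree = degrees(links)
    counts = Counter(degree.values())
    n = len(degree)
    return {k: counts[k] / n for k in sorted(counts)}


node_degrees = degrees(links)
p_k = degree_distribution(links)
print(node_degrees)
print(p_k)
```

Every link contributes to the degree of both its end nodes, so the degrees sum to twice the number of links. Isolated nodes would have to be listed separately, since this sketch only sees nodes that appear in at least one link.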

2.3 Complex networks

A complex system is commonly defined as a system that consists of interacting components whose collective behaviour cannot be explained from the behaviour of the individual units alone [22]. The components may act according to rules that may change over time and that may not be easily understood. A common characteristic of complex systems is that they display organization without any external organizing principle being applied, and a central characteristic is adaptability [5]. An example of a complex system is intelligent life in general and the human brain in particular: we have knowledge about the structure and the composition of the brain, but the thoughts and actions of its holder can seldom be predicted. Other examples of complex systems include families, societies, ecological systems, the weather, the economy, information systems and financial markets [25].

It is useful to consider the difference between a complex system and a merely complicated one. One can for example compare a modern smartphone with a flock of birds. Superficially the birds are all similar and the flock has far fewer members than the smartphone has parts, and it is therefore tempting to think that the smartphone is more complex than the flock of birds. However, the flock of migrating birds is an adaptable system, unlike the smartphone. The flock responds to changes in the environment, and when it is flying the rules of the flock are fluid, since the bird at the head of the formation often changes. The smartphone, on the other hand, is not a complex system, since all its parts have strictly defined roles and prescribed interactions [5].

One way of handling and analyzing complex systems is to use network theory. By studying the network one can study the underlying complex system itself. The study of complex networks has become more and more important because it helps in understanding indirect effects in such systems.


2.3.1 Real-world networks

In order to understand the properties of real-world complex networks, different mathematical models of networks have been suggested. One of the first models was the uniform random graph model presented in 1959 by Erdős and Rényi [14]. The model starts with a fixed number of nodes and places links between pairs of nodes, each pair being equally probable, until the desired number of links is reached. Although the uniform random graph model captured some properties of real-world networks, empirical measurements showed significant differences in the structures when the number of nodes and links were the same. For instance, the uniform random graph model could not reproduce the high clustering coefficient that is often observed in real-world networks [41, 40]. The clustering coefficient measures the extent to which nodes in a network tend to cluster together, i.e. form groups. In other words, a high clustering coefficient means that the probability that two given nodes are connected by a link is higher if these nodes have a common neighbour.
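To make the clustering coefficient concrete, the Python sketch below uses one common definition, which is an assumption here since the text does not fix a formula: for each node, the fraction of pairs of its neighbours that are themselves linked. The network `links` and the helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected network used only for illustration.
links = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]


def adjacency(links):
    """Build a dict mapping each node to the set of its neighbours."""
    neighbours = {}
    for u, v in links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def local_clustering(neighbours, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    k = len(neighbours[node])
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than two neighbours
    linked_pairs = sum(1 for a, b in combinations(neighbours[node], 2)
                       if b in neighbours[a])
    return 2.0 * linked_pairs / (k * (k - 1))


neighbours = adjacency(links)
coefficients = {node: local_clustering(neighbours, node) for node in sorted(neighbours)}
average_clustering = sum(coefficients.values()) / len(coefficients)
print(coefficients)
print(average_clustering)
```

Nodes 1 and 2 sit in a triangle with node 3 and get the value 1, while node 4 has two neighbours that are not linked to each other and gets the value 0.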

Furthermore, the distribution of edges is not only globally, but also locally inhomogeneous, with high concentrations of edges within special groups of nodes, and low concentrations between these groups. This feature of real networks is called community structure [15]. However, the most important drawback of the uniform random graph model is the difference in the degree distribution compared to real-world networks, where the degree distribution often follows a power law. A power-law degree distribution implies that most of the nodes in the network have a relatively low degree, while a few nodes¹ have a substantially higher degree, i.e. there is no typical scale in the network. Such networks are often called scale-free networks. The power-law distribution of degrees k is given by

\[ P(k) \propto k^{-\gamma}, \]

where γ is a constant. Power laws appear in for example the cumulative distribution functions of the number of citations to papers and the number of hits received by web pages, with γ around 3 and 2 respectively [24].

The general result is that real-world networks are not random graphs, neither are they very regular. Their main properties include small average distance² between nodes, high clustering, power-law degree distributions [23] and in many cases they also reveal a complex behaviour [32].

2.4 Bipartite networks

A bipartite network, or a two-mode network, is a network whose nodes can be divided into two disjoint sets of top and bottom nodes. An example of a bipartite network is the Movie-Actor network, which consists of movies as top nodes, actors as bottom nodes and links representing whether the actor appeared in the film. Links between two actors or two films are not permitted, since nodes in the same set can’t have direct links in the bipartite case, in contrast to classical, unipartite, networks. A result of having two node sets is that bipartite networks are associated with two degree distributions; one for the top nodes and one for the bottom nodes.

Many real-world networks are naturally bipartite. Examples of bipartite representations include social networks like the scientific collaboration network, where the two node sets consist of papers and authors; metabolic networks in biology, where the two types of nodes are reactions and metabolites; and information networks such as a word-document network, where documents (web pages, e-mails, etc.) link to the words they contain [21].

¹ Referred to as hubs.

² The average number of edges traversed along the shortest paths between all possible pairs of network nodes.

Given a bipartite network it is possible to transform it into two unipartite networks, one containing the bottom nodes and one containing the top nodes. These one-mode projections are obtained by connecting bottom nodes that are connected to the same top node, or vice versa. Figure 2.3 shows an example of a bipartite network and its two unipartite one-mode projections. It is important to note, however, that the one-mode projection discards information about the structure of the original network [21]. One way of capturing more of the original structure is to make the projection weighted, i.e. giving each link between two bottom nodes in the projected network a weight equal to the number of top nodes they have in common, or vice versa. However, this method still does not capture all of the information in the original network [25].


Figure 2.3: A bipartite network with 4 top and 5 bottom nodes and the two corresponding unipartite one-mode projections.
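The weighted one-mode projection described above can be sketched in a few lines of Python. The bipartite network `top_to_bottom` is a hypothetical example with 4 top and 5 bottom nodes; its links are assumptions for illustration and are not taken from figure 2.3. The function names are likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite network: each top node maps to the bottom nodes it links to.
# The links are illustrative and are not read off figure 2.3.
top_to_bottom = {
    1: ["A", "B"],
    2: ["B", "C", "D"],
    3: ["C", "D"],
    4: ["D", "E"],
}


def project(memberships):
    """Weighted one-mode projection onto the nodes listed in the dict values.

    Two nodes are linked if they share at least one key; the weight of the
    link is the number of keys they have in common.
    """
    weights = defaultdict(int)
    for members in memberships.values():
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def invert(memberships):
    """Swap the roles of the two node sets."""
    inverted = defaultdict(list)
    for key, members in memberships.items():
        for member in members:
            inverted[member].append(key)
    return dict(inverted)


bottom_projection = project(top_to_bottom)
top_projection = project(invert(top_to_bottom))
print(bottom_projection)
print(top_projection)
```

In this example bottom nodes C and D share the two top nodes 2 and 3, so their link gets weight 2. The projection no longer records which top nodes were shared, which illustrates the loss of information mentioned in the text.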

2.5 Community detection in networks

A property that seems to be common to many networks is community structure. Community structure can be seen as the division of network nodes into groups within which the network connections are dense, but between which connections are sparser [16]. Social networks are classical examples of networks with communities, and the word community itself refers to a social context. People naturally tend to form groups within their work environment, family and friends. An example of a simple network with three communities can be seen in figure 2.4.

Figure 2.4: A small network with three communities indicated by dashed circles. The links within communities are denser than the links between communities.

An important task in analyzing networks and understanding their structure is to be able to detect the communities. The aim of community detection in networks is to identify the communities and, if possible, their hierarchical organization, using only the information encoded in the network topology. Community detection in large networks can provide valuable information, as nodes belonging to a tight-knit community are more than likely to have other properties in common [15]. For example, the communities in the World Wide Web correspond to topics of interest, and community information is being considered for improving search engines in order to provide better and more personalized results [11]. Moreover, the information diffusion and spreading mechanism in a network can be affected and determined by the community structures. Identifying the communities is hence a fundamental step not only for discovering what makes entities come together, but also for understanding the overall structural and functional properties of the whole network [12].

Detecting communities in networks has become a fundamental problem in network science. The human eye is an excellent tool for detecting community patterns in small networks, but for analyzing large networks another method is needed, and therefore numerous algorithms have been developed. In Ref. [20] a number of algorithms for detecting community structure are evaluated using a recently introduced class of benchmark graphs. The result of the analysis shows that the method called InfoMap, see Refs. [33, 34], displays the best performance in detecting the communities for both directed and undirected graphs, with the additional advantage of a relatively low computational complexity, which enables studies of large systems. For this reason InfoMap and its theoretical background will be further explored in the next chapter of this thesis.

A more mathematical term for methods in which large sets of data are grouped into communities of smaller sets of similar data is clustering or cluster analysis. Clustering is the task of assigning a set of objects to groups, called clusters, so that the objects in the same cluster are more similar to each other than to those in other clusters, based on a predefined similarity measure. From now on the two concepts community detection and clustering will be used interchangeably.

2.5.1 Hierarchical clustering

The traditional method for detecting community structure in networks is hierarchical clustering [16]. Hierarchical clustering seeks to build a hierarchy of groups, and the method is commonly divided into two variants, agglomerative and divisive. Agglomerative hierarchical clustering iteratively merges the two most similar objects, or clusters, until only one cluster containing all the objects remains, while divisive hierarchical clustering starts with one single cluster and works the opposite way. To determine which objects are most similar, a similarity measure based on the attributes of the objects must be defined, see section 2.6. An example of a tree diagram, called a dendrogram, that is commonly used to illustrate the arrangement of the clusters produced by hierarchical clustering can be seen in figure 2.5.

Figure 2.5: Dendrogram showing an example of a hierarchical clustering of 30 objects. At the bottom of the figure all objects are assigned to their own cluster. Moving upwards, similar clusters are merged together until only one cluster remains, containing all of the objects. The tree can be cut at a certain height to obtain the partition of the objects into a specified number of clusters.

One concern about agglomerative methods is that they tend to fail with some frequency to find the correct communities in networks where the community structure is known. This makes it difficult to place much trust in their performance in other cases [26].

2.6 Similarity measures

To detect communities and cluster nodes in a network, a similarity or distance³ measure can be used. To find an appropriate measure it is important to clarify when two nodes are considered similar, since different measures account for different properties of the network structure. In general the measure finds the degree of closeness, or separation, between two nodes and represents it as a single numeric value. The properties of a node can be represented as a vector, which for example could contain the node's connections to other nodes in the network, or the node's position in space.

2.6.1 Euclidean distance

The Euclidean distance measures the distance between two points in any dimension of space. The distance is a standard metric for geometrical problems. The Euclidean distance between two points represented by vectors x and y is given by

\[ \mathrm{Euc}(x, y) = \sqrt{\sum_{k} (x_k - y_k)^2}. \tag{2.1} \]

Euclidean distance is actually a dissimilarity measure since it is larger for vectors that differ more, and zero if the vectors are identical.
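A minimal Python sketch of equation (2.1) is given below. The two vectors are hypothetical binary holding vectors, where entry k is 1 if an owner holds share k; they are assumptions for illustration and not data from the thesis.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equally long vectors, as in equation (2.1)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


# Hypothetical binary holding vectors: entry k is 1 if the owner holds share k.
owner_a = [1, 1, 0, 0]
owner_b = [1, 0, 1, 0]

distance_ab = euclidean_distance(owner_a, owner_b)
distance_aa = euclidean_distance(owner_a, owner_a)
print(distance_ab, distance_aa)
```

The two owners differ in two entries, so their distance is the square root of 2, while the distance from a vector to itself is zero.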

2.6.2 Cosine similarity

Cosine similarity is a measure of similarity between two vectors obtained by measuring the cosine of the angle between them [38]. The cosine similarity of vectors x and y is given by

\[ \mathrm{CosSim}(x, y) = \frac{\langle x, y \rangle}{\|x\| \cdot \|y\|}. \tag{2.2} \]

³ Note the difference between distance and similarity. A normalized value of 1 indicates perfect similarity when read as a similarity, but maximum separation when read as a distance. A value of 0 indicates no similarity, but minimum separation in distance.

The cosine similarity is bounded between 0 and 1 if the elements of vectors x and y are non-negative. A value of 0 indicates that the vectors have no non-zero elements in common, and a value of 1 indicates that the vectors point in the same direction, which for binary vectors means that they have all non-zero entries in common.

An important property of the cosine similarity is its independence of length. This means that vectors of the same composition but different totals are treated identically, e.g. CosSim(x, y) = CosSim(x, 2y), since it is the direction of the vectors that is of importance. The measure is however not invariant to shifts. If vector x were shifted to x + 1, the cosine similarity of x and y would change.
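The two properties just mentioned, invariance to scaling and sensitivity to shifts, can be checked numerically. The Python sketch below implements equation (2.2); the vectors `x` and `y` are arbitrary non-negative examples chosen for illustration.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors, as in equation (2.2)."""
    dot = sum(xk * yk for xk, yk in zip(x, y))
    norm_x = math.sqrt(sum(xk * xk for xk in x))
    norm_y = math.sqrt(sum(yk * yk for yk in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Arbitrary non-negative example vectors.
x = [1.0, 2.0, 0.0]
y = [2.0, 1.0, 1.0]

base = cosine_similarity(x, y)
scaled = cosine_similarity(x, [2 * yk for yk in y])   # y replaced by 2y
shifted = cosine_similarity([xk + 1 for xk in x], y)  # x shifted to x + 1
print(base, scaled, shifted)
```

Scaling y leaves the value unchanged up to rounding, while shifting x changes the direction of the vector and therefore the similarity.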


Chapter 3 InfoMap

InfoMap, see Ref. [34, 33], is a community-detection algorithm that makes use of the duality between the problem of compressing a dataset, and the problem of detecting and extracting significant patterns or structures within those data. This duality is explored in the statistical field of minimum description length statistics, or MDL. The basic idea of MDL is that the more regularities in the data, the more we can compress it [18].

For a network we can think of communities as regularities—so by finding these, the representation of the network can be compressed.

To analyze a given network we want to use the information concealed in the network representation. In order to capture this information and thereby better understand the network, InfoMap focuses on how the structure of the network constrains the flow of information occurring on it. The goal is therefore to set up a system that measures how much we can compress the flow on the network depending on how we partition the network. The best partition is the one that compresses the flow on the network the most. To set up this system we start by reviewing basic concepts of information theory before we explain in more detail how InfoMap works.

3.1 Information theory

Information theory involves the quantification of information and therefore has applications in many different areas. A central task in the theory is to develop a usable measure of the information acquired when observing the occurrence of an event having probability p. The first simplification will be to ignore any particular features of the event, and only observe whether or not it happened. Thus we will think of an event as the observation of a symbol whose probability of occurring is p. The information will therefore be defined in terms of the probability p [9].

We want our information measure I(p) to have several properties. First of all, information should be a non-negative quantity, i.e. I(p) ≥ 0. Secondly, if an event always occurs (p = 1), we get no information from the occurrence of the event, i.e. I(1) = 0. Thirdly, if two independent events occur, then the information we get from observing the events is the sum of the two individual amounts of information, i.e. I(p_1 · p_2) = I(p_1) + I(p_2). Finally, we want the information measure to be a continuous function of the probability [36].

From the given axioms, the following relations can be derived:

• \( I(p^n) = I(\underbrace{p \cdot p \cdots p}_{n}) = I(p) + I(p) + \cdots + I(p) = n \cdot I(p) \)

• \( I(p) = I((p^{1/m})^m) = m \cdot I(p^{1/m}) \implies I(p^{1/m}) = \frac{1}{m} \cdot I(p) \)

In general we thus get that \( I(p^{n/m}) = \frac{n}{m} \cdot I(p) \). By continuity, for 0 < p ≤ 1 and a real number a > 0, the expression becomes

\[ I(p^a) = a \cdot I(p). \]

The only continuous functions satisfying the equality above are multiples of the logarithm, and hence the information acquired when observing the occurrence of an event having probability p is

\[ I(p) = -\log_b(p) = \log_b \frac{1}{p}, \tag{3.1} \]

where the base b > 1 is a constant. The base b determines the units that are used, and it is commonly chosen to be 2, resulting in bits as units for the information measure. Unless we want to emphasize the units, we do not have to specify the base of the logarithm, and only write log(p). Typically, and from now on, we will think in terms of log_2(p).
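As a small numerical check of equation (3.1) and the properties required of I(p), the following Python sketch evaluates the information measure for a few probabilities. The function name `information` and the example events are assumptions made for illustration.

```python
import math


def information(p, base=2):
    """Information I(p) = -log_b(p) gained from an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log(p) / math.log(base)


certain = information(1.0)           # an event that always occurs
fair_coin = information(0.5)         # one flip of a fair coin
two_coins = information(0.5 * 0.5)   # two independent flips of a fair coin
print(fair_coin, two_coins)
```

With base 2, a fair coin flip carries one bit, a certain event carries no information, and two independent flips carry the sum of the individual amounts.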

3.1.1 Entropy

In information theory, the entropy is a measure of the uncertainty associated with a random variable. Suppose that we have k symbols (a_1, a_2, . . . , a_k) and a source providing us with a stream of these symbols with respective probabilities (p_1, p_2, . . . , p_k). Assuming that the symbols are emitted independently by the source, it is interesting to find the average amount of information received from each symbol in the stream. If symbol a_i is observed, then I(p_i) = log(1/p_i) information is gathered from that particular observation. In a long run of observations, say N, approximately N · p_i occurrences of symbol a_i will occur. Thus, in the N observations, we will get the total information

\[ I_N = \sum_{i=1}^{k} (N \cdot p_i) \cdot \log \frac{1}{p_i}. \tag{3.2} \]

The average information received per symbol hence becomes

\[ \frac{I_N}{N} = \frac{1}{N} \sum_{i=1}^{k} (N \cdot p_i) \cdot \log \frac{1}{p_i} = \sum_{i=1}^{k} p_i \cdot \log \frac{1}{p_i}. \]

As we have observed, the information has strictly been defined in terms of the probabilities of the events. Looking at the provided symbol as a random variable X, with sample space (a_1, a_2, . . . , a_k) and probability distribution P = (p_1, p_2, . . . , p_k), we define the entropy H(X) of the random variable X as [10]

\[ H(X) = \sum_{i=1}^{k} p_i \cdot \log \frac{1}{p_i}, \tag{3.3} \]

where each p_i ≥ 0 and \( \sum_{i=1}^{k} p_i = 1 \). An important property of entropy is that it is maximized when all the symbols are equally probable.
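The entropy in equation (3.3) is straightforward to compute. The Python sketch below uses base 2 and three hypothetical distributions over four symbols to illustrate that the uniform distribution gives the largest value; the function name and the example probabilities are assumptions.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits, H(X) = sum_i p_i * log2(1 / p_i), as in equation (3.3)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 are skipped, using the usual convention 0 * log(1/0) = 0.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])   # all four symbols equally probable
skewed = entropy([0.7, 0.1, 0.1, 0.1])        # one symbol dominates
certain = entropy([1.0, 0.0, 0.0, 0.0])       # no uncertainty at all
print(uniform, skewed, certain)
```

For four equally probable symbols the entropy is 2 bits, the skewed distribution gives less, and a source that always emits the same symbol has zero entropy.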


3.1.2 Huffman coding

In order to model a trajectory on a network with an information-theoretic approach we need a smart way to name nodes. Consider for example a random walk on a network consisting of 5 nodes. As a first approach we could assign a binary code of fixed length to each node, requiring ⌈log_2(5)⌉ = 3 bits in each name to uniquely label the nodes. A 39-step walk on the network can therefore be described in 3 · 39 = 117 bits. Suppose now that we know the visiting frequencies of the nodes in our presumptive walk. In this case a straightforward method of naming the nodes is instead to use Huffman coding [19]. Huffman coding is an entropy encoding algorithm that assigns short codewords to common events with high probability and longer codewords to rare ones. Suppose for example that the node visit frequencies of the five nodes, which we denote A–E, are those given in the table in figure 3.1a. The visit frequency, i.e. the probability, of node A is much higher than that of the other nodes, and therefore a shorter code is used for A. A Huffman code based on the frequencies is computed by building a tree. First, the two nodes with the lowest frequencies, called child nodes, are joined under a so-called parent node whose frequency is the sum of the children's frequencies. The two branches are assigned the code digits 0 and 1, and the procedure is repeated with the parent node included and the children removed. When all nodes have been considered and the tree is complete, the code of a node is found by starting at the top of the tree and following the branches down to the node while collecting the binary digits along the way. The procedure is illustrated in figure 3.1b; for node E, for example, following the tree from the top down to the node gives the Huffman code 111.

The Huffman coding is also prefix-free, which means that no code is a prefix to any other code. In this way the codewords can be uniquely decoded even if codes are sent after each other as a long signal, as long as the coding table is sent before the data.

To sum up—if prior statistics are known about the system that we want to code—then Huffman coding is a good method of compressing data.

Node     Count   Code   # of bits
A        15      0      15
B        7       100    21
C        6       101    18
D        6       110    18
E        5       111    15
TOTAL                   87

(a) Node visit frequencies, the nodes’ unique code numbers and the total number of bits for their occurrences using the Huffman code.

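[Figure 3.1b, Huffman tree: the root P4(39) branches into A(15) on branch 0 and P3(24) on branch 1; P3 branches into P2(13) on branch 0 and P1(11) on branch 1; P2 has the leaves B(7) on branch 0 and C(6) on branch 1; P1 has the leaves D(6) on branch 0 and E(5) on branch 1.]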

(b) A Huffman tree generated from the frequencies in the table to the left. The value in brackets indicates the total frequency count of a node.

Figure 3.1: Example of Huffman coding.
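The tree construction described above can be sketched in a few lines of Python. This is a minimal illustration and not code from the thesis: the function name `huffman_codes` and the tie-breaking rule are assumptions, so the individual codewords may differ from those in figure 3.1a, while the codeword lengths and the total of 87 bits are the same.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Return a dict mapping each symbol to a prefix-free binary codeword.

    Assumes at least two symbols. Ties between equal frequencies are broken
    by insertion order, which is an arbitrary choice.
    """
    tiebreak = count()  # unique counter so heap entries never compare dicts
    # Each heap entry: (total frequency, tiebreak, {symbol: partial codeword})
    heap = [(freq, next(tiebreak), {symbol: ""})
            for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two subtrees with the lowest total frequencies.
        freq0, _, branch0 = heapq.heappop(heap)
        freq1, _, branch1 = heapq.heappop(heap)
        parent = {s: "0" + code for s, code in branch0.items()}
        parent.update({s: "1" + code for s, code in branch1.items()})
        heapq.heappush(heap, (freq0 + freq1, next(tiebreak), parent))
    return heap[0][2]


# Visit counts from figure 3.1a.
visits = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(visits)
total_bits = sum(visits[node] * len(codes[node]) for node in visits)
print(codes, total_bits)
```

Compared with the 117 bits of the fixed-length naming, the walk is described with the same total number of bits as in the table, because only the codeword lengths enter the total.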

3.1.3 Shannon’s source coding theorem

Shannon's source coding theorem determines the limits of possible data compression, i.e. it gives a lower limit for the length of the codewords describing the data. The requirement is that the code should be uniquely decodable: it should be possible to parse any sequence of codewords unambiguously into the corresponding data¹. The theorem states that the average codeword length $L$ for a source of entropy $H(X)$ is bounded as

$$L \geq H(X), \tag{3.4}$$

which means that when $N$ codewords are used to describe the $N$ states of a random variable $X$ that occur with frequencies $p_i$, the average length of a codeword can be no less than the entropy $H(X)$ of the random variable itself [36].
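The bound can be checked numerically for the Huffman example in figure 3.1. The sketch below takes the visit counts and codeword lengths from the table in figure 3.1a and assumes base-2 logarithms so that the entropy is measured in bits; the variable names are illustrative.

```python
from math import log2

# Visit counts and Huffman codeword lengths from figure 3.1a.
visits = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
code_lengths = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 3}

steps = sum(visits.values())
probabilities = {node: n / steps for node, n in visits.items()}

# Entropy of the visit distribution and average codeword length per step.
entropy = -sum(p * log2(p) for p in probabilities.values())
average_length = sum(probabilities[node] * code_lengths[node] for node in visits)

print(f"H(X) = {entropy:.3f} bits, L = {average_length:.3f} bits")
```

The average length equals 87/39 bits per step, and the entropy comes out slightly below it, as the theorem requires.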

3.2 Random walks on networks

To model the flow on a network, InfoMap makes use of a random walker and follows its trajectory on the network structure. The walker starts at a randomly chosen node and moves in the next time step through one of the node's links to a neighbouring node. The probability of choosing a link is given by the relative weight of the link, i.e. its weight divided by the total weight of the node's outgoing links. If the network is directed there is a risk that the random walker gets stuck, for instance when a node lacks outgoing links. For this reason a small teleportation probability, τ, is introduced, meaning that at every step the walker jumps with probability τ to a randomly chosen node anywhere in the network [34]. In each time step the procedure is repeated and the walker moves on the network, creating a trajectory of visited nodes. In order to save this trajectory as a coherent data stream, nodes must first be given unique code names.

The trajectory of the random walker provides important information about the structure of the network. The ergodic node visit frequencies of the walker specify the probabilities of being at each node. This information is used to match the length of codewords to the frequencies of their use by giving frequently visited nodes shorter names according to Huffman coding, explained in section 3.1.2. In this way we have compressed the data of the trajectory by finding regularities in the network, and a first step in finding communities with InfoMap is accomplished.
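The ergodic visit frequencies can be approximated by repeatedly applying one step of the teleporting random walk to a probability vector. The following is a minimal sketch under stated assumptions, not the InfoMap implementation: the function name, the default τ = 0.15, the fixed number of iterations and the rule that a node without outgoing links always teleports are all choices made for illustration.

```python
def visit_frequencies(weights, tau=0.15, iterations=200):
    """Approximate the stationary visit frequencies by power iteration.

    weights[a][b] is the weight of the link a -> b (0 if there is no link);
    nodes are numbered 0..n-1.
    """
    n = len(weights)
    p = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        new_p = [0.0] * n
        for a in range(n):
            out_weight = sum(weights[a])
            for b in range(n):
                if out_weight > 0:
                    # Follow a link with probability (1 - tau),
                    # teleport to a uniformly chosen node with probability tau.
                    step = (1 - tau) * weights[a][b] / out_weight + tau / n
                else:
                    # Assumption: a node without outgoing links always teleports.
                    step = 1.0 / n
                new_p[b] += p[a] * step
        p = new_p
    return p


# Small directed example: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
example_weights = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
frequencies = visit_frequencies(example_weights)
print(frequencies)
```

In this example node 1 receives flow only from node 0 and through teleportation, so it ends up with the lowest visit frequency and would be given the longest codeword.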

3.3 Two-level description

Real-world networks often display community structure. In the sense of a random walker on a network, this can be seen as a set of regions in which the walker tends to spend much time and between which movements are more rare. This regional structure can be used to minimize the trajectory code of visited nodes by giving each region, called a module, its own codebook². In this way the network is divided into two levels of description, one module level and one node level within each module. A distinction between within-module movements and between-module movements therefore has to be made.

To describe the network, unique Huffman code names are maintained for the modules, and within these modules names are reused for the individual nodes. In this way two kinds of codebooks are needed: a module codebook for each module, which specifies the node names within that module, and an index codebook, which specifies which module codebook is to be used. A special codeword, the exit code, is chosen as part of the within-module coding and indicates that the walk is leaving the current module. The exit code is therefore always followed by the code of the new module into which the walk is moving. Hence the two-level description introduces extra codewords both when the random walker exits and when it enters a module.

1 For instance, coding nodes c1, c2, c3 and c4 as c1 = 0, c2 = 01, c3 = 11 and c4 = 00 is not a uniquely decodable code. The codeword 0011 could be either c4c3 or c1c1c3.

2 This approach can for instance be compared to the dialing codes used in Sweden. People in a certain region are more likely to call each other and hence only have to dial the telephone number without the dialing code. In this way the same telephone numbers can be reused in different dialing code areas and the average length of telephone numbers dialed becomes shorter than if everyone in Sweden had a unique one.

The two-level description poses the problem of finding a balance in which the modules are small enough to reduce the average node codeword length, but large enough, and divided in such a way, that the random walker statistically stays in a module for a long time before leaving, so that the cost of the codewords for entering and exiting modules is not too high. This is the optimization problem facing InfoMap. To understand how it is solved, we now go into a little more detail about the underlying theory, dividing it into the two levels of nodes and modules.

3.3.1 Node level

Let $p_\alpha$ denote the probability of the random walker being at a certain node $\alpha$. For an outgoing link from $\alpha$ to node $\beta$, having weight $w_{\alpha,\beta}$, we can calculate the probability that the random walker follows this link in a given step. For this to happen the walker must not teleport, which has probability $(1 - \tau)$, where $\tau$ is the probability of teleportation. The total probability of a given step, $q_{\alpha \curvearrowright \beta}$, between node $\alpha$ and $\beta$ in a module hence becomes

$$q_{\alpha \curvearrowright \beta} = p_\alpha \cdot w_{\alpha,\beta} \cdot (1 - \tau). \tag{3.5}$$

The node visit frequency of node $\alpha$, with the contribution from random teleportation excluded, is then the sum of the probabilities of moving to the node,

$$p_\alpha = \sum_{\beta} q_{\beta \curvearrowright \alpha}. \tag{3.6}$$

The total probability of within-module movements in module $i$ then becomes the sum of the probabilities over all nodes in the module, i.e. $\sum_{\alpha \in i} p_\alpha$.

3.3.2 Module level

With an initial partitioning of a network containing $n$ nodes with probabilities $p_\alpha$, for $\alpha = 1, \ldots, n$, it is straightforward to calculate the module visit frequency of module $i$. This is the sum $\sum_{\alpha \in i} p_\alpha$ of the probabilities for all nodes within the module.

Exiting a module can happen in two ways: either by teleportation to another module³ or by following a link to another module. The probability of exiting a specific module $i$, $q_{i \curvearrowright}$, having $n_i$ nodes is therefore given by

$$q_{i \curvearrowright} = \tau \cdot \frac{n - n_i}{n} \cdot \sum_{\alpha \in i} p_\alpha + (1 - \tau) \cdot \sum_{\alpha \in i} \sum_{\beta \notin i} p_\alpha w_{\alpha,\beta}, \tag{3.7}$$

where accordingly the first term describes the probability of teleportation to a node outside module $i$ from every node $\alpha$ in $i$, and the second term describes the total probability of not teleporting but instead moving from module $i$ through a link from node $\alpha$ in module $i$ to node $\beta$ in another module.

3 Note that the probability of randomly choosing a node outside module $i$, having $n_i$ nodes, is $\frac{n - n_i}{n}$.


The per-step probability that the random walker switches modules, $q_{\curvearrowright}$, then becomes the sum of the exit probabilities for all modules,

$$q_{\curvearrowright} = \sum_{i=1}^{m} q_{i \curvearrowright}. \tag{3.8}$$
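As a minimal sketch, Eqs. (3.7) and (3.8) can be evaluated directly once the node visit frequencies and the relative link weights are known. The function name, the data layout and the small example network (two triangles joined by a single link) are assumptions made for illustration. For this undirected example the visit frequencies are taken proportional to the node degrees, which corresponds to a walk without teleportation.

```python
def exit_probabilities(p, w, modules, tau):
    """Evaluate Eq. (3.7) for every module and Eq. (3.8) for their sum.

    p[a]     visit frequency of node a
    w[a][b]  relative weight of the link a -> b (each row sums to 1)
    modules  list of sets of node indices that partition the nodes
    tau      teleportation probability
    """
    n = len(p)
    q_exit = []
    for module in modules:
        inside = sum(p[a] for a in module)
        # First term of Eq. (3.7): teleportation to a node outside the module.
        teleport_out = tau * (n - len(module)) / n * inside
        # Second term of Eq. (3.7): following a link that leaves the module.
        link_out = (1 - tau) * sum(
            p[a] * w[a][b] for a in module for b in range(n) if b not in module
        )
        q_exit.append(teleport_out + link_out)
    return q_exit, sum(q_exit)


# Hypothetical example: two triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n_nodes = 6
adjacency = [[0.0] * n_nodes for _ in range(n_nodes)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0
degree = [sum(row) for row in adjacency]
w = [[adjacency[a][b] / degree[a] for b in range(n_nodes)] for a in range(n_nodes)]
p = [d / sum(degree) for d in degree]

q_exit, q_switch = exit_probabilities(p, w, [{0, 1, 2}, {3, 4, 5}], tau=0.0)
print(q_exit, q_switch)
```

With τ = 0 each module is left only through the single connecting link, so both exit probabilities equal 1/14 and the walker switches modules with probability 1/7 per step.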

3.3.3 Summary

We have now derived expressions both for the probability of movements between nodes in modules and the probability of movements between the modules themselves, and hence it is possible to compute the entropies for the respective movements. Before applying Shannon’s source coding theorem to the derived probabilities there is however one more thing to consider—the exit codewords.

In order to also adjust the length of the exit codewords to the frequency of their use, these codewords are encoded together with the within-module codewords. Since the exit codewords are necessary to separate within-module movements from between-module movements, the exit probability of module $i$, $q_{i \curvearrowright}$, is included in the probability of the within-module movements. By doing this we can compute the total within-module movement probability of module $i$, $p^{i}$, which is the probability of exiting the module, $q_{i \curvearrowright}$, according to Eq. (3.7), plus the module visit frequency of module $i$,

$$p^{i} = q_{i \curvearrowright} + \sum_{\alpha \in i} p_\alpha. \tag{3.9}$$

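Using this probability and calculating the entropy according to Eq. (3.3), Shannon's source coding theorem now gives the limit of possible data compression for the movements within module $i$, namely their entropy $H(P^{i})$:

$$H(P^{i}) = -\frac{q_{i \curvearrowright}}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta} \log\left(\frac{q_{i \curvearrowright}}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta}\right) - \sum_{\alpha \in i} \frac{p_\alpha}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta} \log\left(\frac{p_\alpha}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta}\right). \tag{3.10}$$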

In the same way $H(Q)$, the entropy of the movements between modules, can be calculated,

$$H(Q) = -\sum_{i=1}^{m} \frac{q_{i \curvearrowright}}{\sum_{j=1}^{m} q_{j \curvearrowright}} \log\left(\frac{q_{i \curvearrowright}}{\sum_{j=1}^{m} q_{j \curvearrowright}}\right), \tag{3.11}$$

which is the lower limit of the average length of a codeword used to name a module. In analogy with the within-module movements we have here used Shannon's source coding theorem and treated the modules as $m$ states of a random variable $X$ that occur with frequencies $q_{i \curvearrowright} / \sum_{j=1}^{m} q_{j \curvearrowright}$.

3.4 Map equation

We have now derived the expressions needed to present the core of the InfoMap algorithm: the map equation. For a network partition $M$ of $n$ nodes into $m$ modules the map equation, $L(M)$, gives the average number of bits per step that it takes to describe an infinite random walk on the network. By collecting the terms from both the within- and between-module movements the map equation reads

$$L(M) = q_{\curvearrowright} H(Q) + \sum_{i=1}^{m} p^{i} H(P^{i}), \tag{3.12}$$

where the first term gives the average number of bits necessary to describe movements between modules, and the second term gives the average number of bits necessary to describe movements within modules. In the first term $q_{\curvearrowright}$ is the probability that the random walker switches modules on any given step and $H(Q)$ is the entropy of the module names. In the second term $H(P^{i})$ is the entropy of the within-module movements including the exit code, and $p^{i}$ is the fraction of within-module movements that occur in module $i$ plus the probability of exiting module $i$. To find the network partition that minimizes the map equation InfoMap uses a combination of two methods⁴.
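To make the map equation concrete, the sketch below evaluates Eq. (3.12) for a given partition from the node visit frequencies and module exit probabilities. It only evaluates the description length and does not perform InfoMap's search for the best partition. The function names and the example numbers are assumptions: the example is a network of two triangles joined by one link, with visit frequencies proportional to node degree and no teleportation, and logarithms are taken to base 2.

```python
from math import log2


def entropy(values):
    """Entropy in bits of the distribution obtained by normalizing `values`."""
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum(x / total * log2(x / total) for x in values if x > 0)


def map_equation(module_p, q_exit):
    """Evaluate Eq. (3.12).

    module_p[i]  list of visit frequencies of the nodes in module i
    q_exit[i]    exit probability of module i, Eq. (3.7)
    """
    q_switch = sum(q_exit)                       # Eq. (3.8)
    index_term = q_switch * entropy(q_exit)      # q H(Q), with H(Q) from Eq. (3.11)
    module_term = 0.0
    for node_p, q_i in zip(module_p, q_exit):
        p_i = q_i + sum(node_p)                  # Eq. (3.9)
        module_term += p_i * entropy([q_i] + list(node_p))  # Eq. (3.10)
    return index_term + module_term


# Hypothetical example: degrees 2, 2, 3 | 3, 2, 2 out of a total degree of 14.
one_module = map_equation([[2/14, 2/14, 3/14, 3/14, 2/14, 2/14]], [0.0])
two_modules = map_equation([[2/14, 2/14, 3/14], [3/14, 2/14, 2/14]], [1/14, 1/14])
print(one_module, two_modules)
```

For this example the two-module partition needs roughly 2.32 bits per step against roughly 2.56 bits for a single module, so splitting the network along the single connecting link gives the shorter description.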

3.4.1 Hierarchical map equation

The map equation finds community structure by considering a two-level description of the flow on the network. The organization of real-world networks is however rarely limited to only two levels; social and biological systems, for example, are often characterized by hierarchical organization [29]. In order to account for these hierarchical structures in networks, a generalized coding structure based on the two-level map equation has been developed, called the hierarchical map equation, see Ref. [35] for a detailed description. In the hierarchical map equation the constraint of a two-level description is relaxed and an arbitrary number of nested submodule levels is permitted.

4 Greedy search and simulated annealing.


Chapter 4 The Swedish stock market

Financial markets are examples of complex systems, i.e. they generally consist of a number of so-called agents (traders), interacting in such a way that their collective behaviour is not a simple combination of their individual behaviour [22]. Although every single agent acts with the aim of realizing the highest possible profit, which is pursued by interacting with other agents through the selling and buying of financial assets at different times, the response of the market is often not predictable. This chapter presents a review of the financial market concerned in this thesis: the Swedish stock market system.

4.1 Stock companies and shares

A joint stock company, referred to as a stock company hereinafter, is a business entity where the owners themselves are not financially responsible for the company. Swedish stock companies can be divided into private stock companies and public stock companies¹. A private stock company must have at least 50 000 SEK in equity and a public company at least 500 000 SEK. It shall be stated in the stock company's statutes whether the company is public, so it is not sufficient for the company to have more than 500 000 SEK in capital to be public. In Sweden there are about one thousand public stock companies, and these are entitled to public offering of shares and listing on the stock exchange [4]. In order to identify shares on the market, each company's share is assigned a unique ISIN code (International Securities Identification Number). Only public stock companies may trade shares on a Swedish or foreign stock exchange or another organized marketplace. In Sweden and other Scandinavian countries, there is a regime with different voting rights on shares, called A- and B-shares, where an A-share entitles the holder to more² votes in the company than a B-share. The trading in the two different shares is conducted separately and hence both shares have their own ISIN code. Other share types, such as preference shares³, also exist but not nearly to the same extent as A- and B-shares.

All stock companies have an obligation to keep a share register which shows who the owners of the shares in the company are. Keeping a register of shareholders in a listed company can mean considerable work since the changes in the share register are sometimes extensive. On average, approximately 80 percent of all shares change owner during one year, but the changes among the major owners are usually smaller [42].

In the register, shares are generally registered in the owner's name. The only exception is for nominee-registered shares⁴, which are registered in the name of the nominee. The share register is public and anyone who requests it is, upon payment of administrative costs, entitled to a printout of the current register of shareholders or parts of it. This printout shall include the shareholders with more than 500 shares in the company [4].

1 Also called publicly traded companies.

2 A B-share often carries one-tenth of the votes of an A-share, although an older ratio of 1/1000 (one thousandth of the number of votes) also occurs.

3 A share type that may have priority over other shares when it comes to dividends and liquidation.

4.1.1 Securities institutions and nominees

A securities institution⁵, also called a stockbroker, can be either a bank or a firm (such as an Internet broker) that has a license from the Swedish Financial Supervisory Authority to conduct trading on behalf of customers [6]. A securities institution acts as the intermediary between buyers and sellers, i.e. it trades securities in its own name on behalf of clients. Nothing prevents individuals and businesses from buying and selling shares with each other without the intervention of a stockbroker. The difficulty, however, is often to find a counterpart on the stock market, which the securities institutions are in a position to do.

Another type of exchange trading on behalf of a customer is conducted through funds. A fund is a major investment in shares, bonds, options or other securities. Instead of trading in individual stocks, there is an option to purchase fund shares from a fund nominee. The fund nominee invests the fund's money in shares of different classes and manages the administration of the fund, which means that the nominee is registered as shareholder in the company's share register. On the Swedish market there are a little over 80 fund management companies which, together with foreign fund management companies, offer savings in more than 5 000 funds.

4.1.2 Outsourcing share registration

A stock company can outsource the share registration by leaving the management of the share register to a so-called central securities depository. There is currently only one central securities depository in Sweden, namely Euroclear Sweden AB [31]. It is primarily public companies, such as listed companies and other companies that already have or plan to have many shareholders, that have opted to be Euroclear-registered. For the companies that are not Euroclear-registered, it is the board that is responsible for managing the share register.

4.2 Euroclear

Euroclear Sweden AB is a central securities depository which keeps a record of the vast majority of equity and debt securities traded on the financial markets in Sweden. The company also carries out clearing and settlement of transactions in Swedish shares and interest-bearing securities, and the business is based entirely on automatic processing. The company is a member of the Euroclear Group, the world's largest provider of domestic and cross-border transactions of shares, bonds, derivatives and funds [39].

4 Förvaltarregistrerad aktie in Swedish.

5 Värdepappersinstitut in Swedish.


Each private shareholder in Sweden has a personal account at Euroclear and when there has been a change in the account—when shares are purchased or sold—the shareholder receives a compilation of the holdings on their securities account [42].

A central part of Euroclear's work with the registration process is to manage the share registers of the registered companies. To become a Euroclear-registered company, the company has to reach an agreement with Euroclear and a so-called depository agent institute, i.e. a specially approved bank or brokerage [1]. Euroclear charges for its services and the registered companies must also in some cases provide guarantees to Euroclear. In December 2011, the number of Euroclear-registered companies in Sweden was around one thousand [39].

4.2.1 Brief history

The Swedish Securities Register Centre, Värdepapperscentralen (VPC), was founded in 1971 with the task of handling the share registers of Swedish companies, executing instructions on dividends and issuing stock certificates. In 1989 the processing of securities changed significantly when physical share certificates ceased to exist in Sweden. VPC was therefore instead given the responsibility for the new account-based system of securities and the settlement of security transactions. In 2008 the Belgian Euroclear Group acquired all shares in NCSD (Nordic Central Securities Depository, Scandinavia's securities depository), which in turn owned all shares in VPC. The Swedish central securities depository had thus acquired a new, foreign owner, and in 2009 VPC's name was changed to Euroclear Sweden. The company continues to be a Swedish-registered company governed by Swedish law and thereby under the supervision of Finansinspektionen, the Swedish Financial Supervisory Authority [31].

4.3 Trading and the stock exchange

Stock companies can list their shares on a stock exchange to make them tradeable on a regulated market. A stock exchange is a company which is licensed by Finansinspektionen to run one or more so-called regulated markets for securities trading. In order to be listed on a stock exchange, a stock company must often pass a comprehensive review. In Sweden, stocks are traded on the regulated marketplaces at NASDAQ OMX Stockholm AB (NASDAQ OMX) and Nordic Growth Market NGM AB (NGM). Most of the trading, both in number of trades and in turnover, takes place on NASDAQ OMX, where most of the companies are listed [3].

4.3.1 Stockholm Stock Exchange

Nasdaq OMX Nordic Stockholm, often called the Stockholm Stock Exchange, is a marketplace for trading securities. In addition to shares in various Swedish companies, other types of securities, including bonds, warrants and options, are also traded. All listed companies sign an agreement with the Stockholm Stock Exchange and thereby agree to follow certain rules regarding for example accounting and information. There are currently more than 500 Swedish companies whose shares are listed on the stock market and about half of these are listed on the Stockholm Stock Exchange [30].


4.3.2 Other listings

Companies can seek new share capital without this being arranged through a listing on the Stockholm Stock Exchange. These companies may want to seek equity or venture capital through other channels. There is therefore a need to create a trade in the shares of these companies even if they do not meet the requirements of the Stockholm Stock Exchange [42]. For this reason an exchange company or a securities company may be authorized to operate a so-called trading platform. The companies whose shares are traded on a trading platform have simplified regulations to follow, and thus smaller companies can also be included. In Sweden, share trading is conducted on the trading platforms First North⁶, Nordic MTF⁷, Burgundy and AktieTorget. Swedish shares are also traded on other European trading platforms [3].

6 Operated by NASDAQ OMX.

7 Powered by NGM.


Chapter 5 Method and dataset

This chapter presents the structure of the data considered in this thesis. The chapter also explains the method that is used to analyze the dataset.

5.1 Dataset

The dataset consists primarily of the share amounts for the owners in all the Euroclear-registered companies in Sweden. Thus, the dataset describes the ownership on the Swedish stock market, and the owners may be individuals, legal persons and corporations from both Sweden and other countries. The dataset is provided by Euroclear Sweden AB and extracted from the quarterly share register reports between 2009 and 2011, with a total of 13 dates. Data about the share amount of companies and their corresponding share ISIN codes are given separately for all dates. The ownership dataset is formatted as shown in table 5.1.

Table 5.1: Example of the ownership dataset appearance.

98602188552D190337SESE0000123456000000000001230
89906222188D252169SESE0000215736000000000000020
00202205876F467534SESE0000198675000000034248978
...

The meaning of the digits on each row in the data is explained below, using the first row as an example:

9                  Class figure
8602188552         Personal ID number
D                  Share registration type
1                  Account type
90337SE            Zip code & country ID
SE0000123456       Share ISIN number
000000000001230    Number of shares

The class figure designates if the holder is a company, e.g. a stockbroker, or another type of corporation, and in these cases the personal identification number is replaced by a corporate identification number. Owners lacking a Swedish personal or corporate identification number can also be differentiated by the century figure.
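A row in this format can be split into its fields by position. The sketch below is an illustration only: the field widths are inferred from the example rows in table 5.1, the split of the "Zip code & country ID" field into a five-digit zip code and a two-letter country code is an assumption, and the names `OwnershipRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class OwnershipRecord(NamedTuple):
    class_figure: str
    id_number: str
    registration_type: str
    account_type: str
    zip_code: str
    country: str
    isin: str
    shares: int


def parse_record(line: str) -> OwnershipRecord:
    """Split one fixed-width ownership row into named fields."""
    line = line.strip()
    if len(line) != 47:  # total width inferred from the rows in table 5.1
        raise ValueError(f"expected 47 characters, got {len(line)}")
    return OwnershipRecord(
        class_figure=line[0],
        id_number=line[1:11],
        registration_type=line[11],
        account_type=line[12],
        zip_code=line[13:18],   # assumption: five-digit zip code
        country=line[18:20],    # assumption: two-letter country code
        isin=line[20:32],
        shares=int(line[32:47]),
    )


record = parse_record("98602188552D190337SESE0000123456000000000001230")
print(record)
```

Parsing the number of shares as an integer removes the zero padding, so the first row of table 5.1 yields a holding of 1230 shares in ISIN SE0000123456.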


5.1.1 Share prices

Additional data includes a list of share prices for companies listed on the Stockholm stock exchange, from the second and last quarter of 2011. This dataset was obtained from the Swedish Central Statistics Office, Statistiska Centralbyrån (SCB). The list states the share prices at closing time, i.e. the price of the latest sold share on the last trading day¹ of the quarter. Altogether the list contains share prices for over 500 companies.

5.1.2 Data for analysis

In this thesis we consider the holdings of individuals having a Swedish personal identification number². The reason is that this category of holders, unlike corporations, is hard to divide naturally and categorize into groups, although in number they constitute a major part of the market. Most of these individuals hold small share amounts and are therefore individually of little interest to the companies; in addition, the group contains a large number of individuals, which makes it hard to understand and to extract useful information from. Furthermore, we only consider the holdings in companies with price information. This means that it is primarily the companies listed on the Stockholm stock exchange that are included. This is desirable since more information is obtained, and also because only a minor part of the holders are found among the remaining companies. Overall this means that the data concerned in the analysis is taken from the second and last quarter of 2011, at both dates containing around 1.7 million individuals holding shares in roughly 500 different ISIN codes³.

5.2 Aim of analysis

The primary aim of the data analysis is to investigate whether it is possible to cluster owners based on their holdings. The clustering result should be used to summarize and highlight patterns in the overall share ownership structure. A desirable feature of the clustering is that it should be stable over time, i.e. a cluster occurring at one date should not disappear at the following date unless there are strong reasons for this. Another goal is that the clustering should be displayed in a clear and accessible way and thus not be too extensive. The secondary aim of the analysis, arising if the primary clustering is possible, is to find out how the obtained clusters change their holdings in companies over time, in order to understand their trading behaviour.

5.3 A network approach

To analyze the dataset we want to model it as a network with individuals as nodes. In this section we present three different network approaches and how the links are created in each case. For all approaches, as a starting point we look at the collection of share holdings for each individual; these holdings will be referred to as the individual's portfolio. In all approaches we consider the portfolio of an individual as a vector where a nonzero element at index i represents the individual's possession value⁴ in share i. Each share thereby corresponds to a dimension in the resulting vector space. For instance, looking at a total of 4 shares, the portfolio vector p of an individual having 5 shares in share 1 and 10 shares in share 3 can be expressed as p = (5, 0, 10, 0).

1 Nasdaq OMX Nordic Stockholm is usually open during business hours and closed on holidays.

2 This corresponds to individuals having century figure 8, 9 or 0.

3 Note that a company can have more than one ISIN code associated with it. This is due to the fact that A- and B-shares, and other potential share types in a company, must have different ISIN codes.

The portfolio vectors are used to compute a cosine similarity measure between individuals. If the similarity value of two individuals is nonzero, an undirected link with weight equal to the calculated similarity value is established between them. By the properties of the cosine similarity measure, if two individuals do not have a common holding in at least one share, the similarity value becomes zero, independent of possession type, and in these cases no link is established between them. If two individuals each hold only a single share, and it is the same share, their similarity value becomes one, independent of possession type. In all other cases the similarity measure is affected by the possession values of the portfolios, and these possession values are created differently for the three approaches.
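The link weights described above can be illustrated with a small cosine similarity function. This is a minimal sketch: the function name and the example portfolios other than p = (5, 0, 10, 0) are hypothetical, and a real computation over roughly 1.7 million individuals would need sparse data structures rather than plain lists.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two portfolio vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty portfolio is treated as having no similarity
    return dot / (norm_u * norm_v)


# No common holding: similarity zero, so no link is created.
no_overlap = cosine_similarity([5, 0, 10, 0], [0, 3, 0, 7])

# Both individuals hold only share 1: similarity one, whatever the amounts.
same_single_share = cosine_similarity([5, 0, 0, 0], [120, 0, 0, 0])

# Partial overlap: the value depends on the possession values.
partial_overlap = cosine_similarity([5, 0, 10, 0], [5, 0, 0, 0])

print(no_overlap, same_single_share, partial_overlap)
```

Only the third case depends on how the possession values are defined, which is where the three approaches differ.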

5.3.1 Approach 1: Binary ownership

In this first approach the portfolio vector is considered binary, meaning that a holding in share i is represented by a value of 1 at element i, regardless of the holding size. A zero at element i in the vector indicates that the individual has no possession in share i.

5.3.2 Approach 2: Percentage of private shares

In this approach a nonzero element in the portfolio vector at index i represents the individual’s percentage of private shares in share i. If an individual for example holds 50 shares in share i with a total of 100 private shares, element i of the portfolio vector will be 0.5. Possession values in the vector will hence range between 0 and 1, where 0 at element i means that the individual holds no private shares in share i, and 1 means that the individual owns all the private shares in share i.

5.3.3 Approach 3: Holding value

In the third approach the elements of the portfolio vector represent the holding values of the individuals, i.e. the number of shares times the price of one share. This means that the possession values for different elements can vary depending on share amount and share price. As before, a zero at element i indicates that the individual holds no shares in share i.
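The three ways of forming possession values can be compared side by side for one individual. The sketch below uses the holdings p = (5, 0, 10, 0) from the example in section 5.3; the totals of private shares and the share prices are made-up numbers, and the function name is hypothetical.

```python
def portfolio_vectors(holdings, private_totals, prices):
    """Return the binary, percentage and holding-value vectors of one individual.

    holdings[i]        number of shares the individual holds in share i
    private_totals[i]  total number of privately held shares in share i
    prices[i]          price of one share of share i
    """
    binary = [1 if h > 0 else 0 for h in holdings]             # approach 1
    fraction = [h / total if total > 0 else 0.0
                for h, total in zip(holdings, private_totals)]  # approach 2
    value = [h * price for h, price in zip(holdings, prices)]  # approach 3
    return binary, fraction, value


holdings = [5, 0, 10, 0]
private_totals = [100, 50, 40, 80]   # hypothetical totals of private shares
prices = [20.0, 5.0, 100.0, 1.0]     # hypothetical share prices in SEK

binary, fraction, value = portfolio_vectors(holdings, private_totals, prices)
print(binary, fraction, value)
```

The binary vector ignores holding size altogether, while the other two weight the same holdings differently: by the individual's part of the privately held shares, or by the market value of the holding.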

5.4 Analysis execution

In order to analyze the network structure and search for communities we use the InfoMap algorithm, described in Chapter 3. The reason for choosing this analysis method is its impressive results in a comparative analysis of different clustering algorithms, where it is the method recommended by the authors, see Ref. [20]. To find the best representation of the network structure we use the algorithm with the hierarchical method. The hierarchical method compares the result with the ordinary two-level

4 This can for example be the number of shares, the percentage in the company, etc.

