Mapping flows on bipartite networks


Christopher Blöcker* and Martin Rosvall†

Integrated Science Laboratory, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden

(Received 3 July 2020; accepted 10 October 2020; published 11 November 2020)

Mapping network flows provides insight into the organization of networks, but even though many real networks are bipartite, no method for mapping flows takes advantage of the bipartite structure. What do we miss by discarding this information and how can we use it to understand the structure of bipartite networks better? The map equation models network flows with a random walk and exploits the information-theoretic duality between compression and finding regularities to detect communities in networks. However, it does not use the fact that random walks in bipartite networks alternate between node types, information worth 1 bit. To make some or all of this information available to the map equation, we developed a coding scheme that remembers node types at different rates. We explored the community landscape of bipartite real-world networks from no node-type information to full node-type information and found that using node types at a higher rate generally leads to deeper community hierarchies and a higher resolution. The corresponding compression of network flows exceeds the amount of extra information provided. Consequently, taking advantage of the bipartite structure increases the resolution and reveals more network regularities.

DOI:10.1103/PhysRevE.102.052305

I. INTRODUCTION

Many networks are bipartite [1–3]. They model interactions between entities of different types, such as users watching movies, documents containing words, and animals eating plants. Bipartite networks can also represent many-body interactions in hypergraphs, such as authors writing papers, proteins forming complexes, and people attending meetings. Studying these networks with the naked eye is often infeasible because of their size and complexity. Therefore, to carry out further analysis, we must simplify them. We need to find coarse-grained descriptions that highlight their community structure [4].

Most community-detection methods are developed for unipartite networks but can be used for bipartite networks as they are, either by running them on unipartite projections or by applying them directly to bipartite networks [5,6]. However, both these approaches have limitations. First, unipartite projections of bipartite networks cannot preserve all the information that is encoded in the bipartite network such that significant structure is lost [2]. Second, applying unipartite methods directly to bipartite networks ignores the regularities of bipartite networks and does not take into account the fact that links only connect nodes of different types [7]. What do we miss by discarding this node-type information? How can we use it to understand the structure of bipartite networks better?

*christopher.blocker@umu.se

†martin.rosvall@umu.se

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI. Funded by Bibsam.

To explore the value of using bipartite information in community detection, we study the flow-based community-detection method Infomap [8], which uses an information-theoretic objective function, known as the map equation [9], to exploit the duality between compression and finding regularities in data. The map equation models network flows with random walks and relates the quality of a network partition to how well it compresses a modular description of the random walks. Modules with long flow persistence, such as cliques or clique-like groups, achieve the best compression. To derive a coding scheme, the map equation uses a hierarchical code that reflects the structure of the network partition. However, this coding scheme is designed for unipartite networks and assumes that any pair of nodes can be connected and visited one after the other; it does not take advantage of the structural constraints in bipartite networks where links only connect nodes of different types and random walks must alternate between them. Consequently, the map equation disregards bipartite information and provides suboptimal compression.

To address these issues, we developed a coding scheme that uses node-type information at different and adjustable rates. For a node-type remembering rate of zero, we recover the standard map equation; a remembering rate of one leads to a fully bipartite map equation and higher compression. Through intermediate rates, we can analyze how the community landscape changes with available node-type information. We implemented the bipartite coding scheme in Infomap [10] and explored the community landscape of real-world networks from different domains.

In networks with community structure, we can compress flows beyond the extra information we make available through the coding scheme. When we describe a network with all its nodes in one module, our coding scheme improves the compression by an amount equal to the entropy of the rate at which node types are used. In hierarchical partitions, the compression improves proportionally to the available node-type information. Generally, exploiting node types at higher rates increases the resolution and leads to deeper community structures with more and smaller modules, thus revealing more network regularities.

II. THE MAP EQUATION FRAMEWORK

To illustrate the duality between compression and finding regularities in network data, consider a communication game where the sender uses code words to update the receiver about the position of a random walker in a network. We assume that the sender and receiver remember the current module but not the current node of the random walker. The question is as follows: How can we devise a modular coding scheme to minimize the average per-step description length, which we refer to as the code length?

We start with all the nodes in one module and assign unique code words to the nodes based on their ergodic visit rates. With this one-level approach, the sender needs to communicate exactly one code word per random-walker step to the receiver. According to Shannon’s source-coding theorem [11], the lower bound for the code length is the entropy of the node visit rates.
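As a minimal sketch of the one-level bound, the following Python snippet computes the entropy of the node visit rates for a small undirected network. The edge list and all names are hypothetical; the only fact used is that, on an undirected network, ergodic visit rates are proportional to node degree.

```python
from math import log2

def entropy(rates):
    """Shannon entropy in bits; rates are normalized internally."""
    total = sum(rates)
    return -sum(r / total * log2(r / total) for r in rates if r > 0)

# Hypothetical undirected, unweighted network given as an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

# On an undirected network, the ergodic visit rate of a node is
# proportional to its degree.
degree = {}
for s, t in edges:
    degree[s] = degree.get(s, 0) + 1
    degree[t] = degree.get(t, 0) + 1

# With all nodes in one module, the code length is bounded from below
# by the entropy of the visit rates.
one_level_code_length = entropy(list(degree.values()))
print(f"one-level code length: {one_level_code_length:.3f} bits")
```

For these four nodes the bound stays below log2(4) = 2 bits because the visit rates are not uniform.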

If the network has a community structure, we can achieve a lower code length with a two-level coding scheme: We partition the nodes into modules and define a separate code book for each module. This coding scheme uses unique code words within modules, allowing nodes in different modules to reuse short code words. To describe transitions between modules for a uniquely decodable code, we introduce an index level code book that assigns code words to modules and add exit code words to each module code book. We can generalize this approach and reduce the code length further with a recursive code structure in multiple levels.

With a two-level approach, the sender communicates either one or three code words per random-walker step. For steps within a module, the sender uses one code word from the current module code book. For transitions between modules, the sender communicates three code words from three different code books:

(i) the exit code word of the current module code book,

(ii) the entry code word of the new module from the index level code book, and

(iii) a node visit code word from the new module code book.

For a small example network [Fig. 1(a)], we illustrate the code-book structure for a two-level partition according to the map equation [Fig. 1(b)].

The map equation calculates the code length L for a given partition M as the average of the module and index level code lengths, weighted by the fraction of time a random walker uses each of the corresponding code books in the limit,

$$L(M) = q\,H(Q) + \sum_{m \in M} p_m\,H(P_m). \tag{1}$$

Here, $p_m = q_m + \sum_{n \in m} p_n$ is the fraction of time the random walker uses the code book for module m, where n ∈ m are the nodes in m, $p_n$ is the ergodic visit rate of node n, and $q_m$ is the entry and exit rate of m; $q = \sum_{m \in M} q_m$ is the rate at which the index level code book is used; $Q = \{q_m \mid m \in M\}$ is the set of module entry rates; $P_m = \{q_m\} \cup \{p_n \mid n \in m\}$ is the set of node visit rates in module m, including module exit; and H is the Shannon entropy. We assume undirected networks and therefore entry and exit rates are the same.

FIG. 1. Graphical representation of the code books for the standard map equation and the bipartite map equation with α = 0.1 in an unweighted example network where colors indicate modules. Block width corresponds to code word usage rate and block height to code-book entropy; a block’s contribution to the map equation is its area. Letters in the blocks indicate which nodes they refer to, and e stands for module exits. The horizontal gray bars show the contributions at index and module level. (a) The example network with color-coded modules. (b) The standard map equation calculates the code length as 2.61 bits. (c) Using node-type information worth I = 0.47 bits, the bipartite map equation with mixed node-type memory improves the compression by 0.65 bits to 1.96 bits.

[Panel values. (a) Nodes A, B, C, D and 1, 2, 3, 4. (b) Index level: q = 1/9, H(Q) = 1; module level: p_1 = p_2 = 5/9, H(P_1) = H(P_2) = 2.25; L(M) = 2.61. (c) Index level: q = (1/18, 1/18), H(Q^0.1) = (0.47, 0.47); module level: p_1 = (5/18, 5/18), H(P_1^0.1) = (1.50, 1.94); p_2 = (5/18, 5/18), H(P_2^0.1) = (1.94, 1.50); L^0.1(M) = 1.96.]
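The following Python sketch evaluates Eq. (1) for a two-module partition of a small network. The edge list is an assumption: the text does not list the links of Fig. 1(a), so we use two 2-by-2 bicliques joined by a single link, which reproduces the 2.61 bits quoted in the caption of Fig. 1. Function and variable names are illustrative only.

```python
from math import log2

def entropy(rates):
    """Shannon entropy in bits; rates are normalized internally."""
    total = sum(rates)
    return -sum(r / total * log2(r / total) for r in rates if r > 0)

# Assumed edge list (not given in the text): two 2x2 bicliques and one bridge.
edges = [("A", "1"), ("A", "2"), ("B", "1"), ("B", "2"),
         ("C", "3"), ("C", "4"), ("D", "3"), ("D", "4"),
         ("B", "3")]
module = {"A": 1, "B": 1, "1": 1, "2": 1, "C": 2, "D": 2, "3": 2, "4": 2}

def map_equation(edges, module):
    """Two-level map equation, Eq. (1), for an undirected, unweighted network."""
    flow = 1 / (2 * len(edges))          # each link carries flow in both directions
    visit = {n: 0.0 for n in module}     # p_n
    exit_rate = {m: 0.0 for m in set(module.values())}  # q_m
    for s, t in edges:
        visit[s] += flow
        visit[t] += flow
        if module[s] != module[t]:
            exit_rate[module[s]] += flow
            exit_rate[module[t]] += flow
    q = sum(exit_rate.values())
    # Index level term q H(Q); it vanishes when there is a single module.
    code_length = q * entropy(list(exit_rate.values())) if q > 0 else 0.0
    # Module level terms p_m H(P_m), with the exit rate included in P_m.
    for m, q_m in exit_rate.items():
        rates = [q_m] + [p for n, p in visit.items() if module[n] == m]
        code_length += sum(rates) * entropy(rates)
    return code_length

L_two_level = map_equation(edges, module)
L_one_level = map_equation(edges, {n: 1 for n in module})
print(f"two-level: {L_two_level:.2f} bits, one-level: {L_one_level:.2f} bits")
```

Under these assumptions the two-level partition needs fewer bits than the one-level description, which is the sense in which the partition captures a regularity of the network.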

To minimize the map equation, we need to make a tradeoff. On the one hand, we want to keep modules small for short code words within modules. On the other hand, we want to limit the number of modules for short code words at the index level. Further, modules should have long flow persistence and cannot be too small; otherwise, a random walker changes modules at a high rate and the sender is required to use the index level code book frequently. Under these restrictions, partitions with many links within modules and few links between modules give the best compression.

III. THE BIPARTITE MAP EQUATION

Since the map equation was developed for unipartite networks, its coding scheme can describe transitions between any pair of nodes. However, directly applying the map equation to bipartite networks leads to higher than necessary code lengths because transitions only happen between nodes of different types in bipartite networks. For a more efficient coding scheme in bipartite networks, we consider the communication game again. As before, the sender updates the receiver about the position of a random walker, but now both are aware of the bipartite network structure.

In a food web, for example, where herbivores are connected to plant species, random walks alternate between animal and plant nodes. If the current node is an animal node, the random walker must step to a plant node next, and vice versa. Therefore, we can use a bipartite coding scheme with two types of code books per module: one for animal-to-plant and one for plant-to-animal transitions. Since both these code books only address half of the nodes on average, code words can be shorter.

To derive the code length of a bipartite coding scheme, we apply Bayes’ rule to the standard map equation and obtain the bipartite map equation. Let $M_1$ be a partition with all nodes in one module and $P_1$ be the set of ergodic node visit rates over two steps, that is, the visit rates we would obtain assuming a unipartite network. The standard map equation calculates the entropy of the random process X: current node from $P_1$. However, random walks on bipartite networks also provide information about a second process, namely, Y: current node type. In the bipartite map equation, we combine these two processes into one, X|Y: current node, given current node type, and determine its entropy with Bayes’ rule, $H(X \mid Y) = H(X) - H(Y) + H(Y \mid X)$. We know that H(Y) = 1 bit because the random walk alternates between nodes of different types and H(Y|X) = 0 bits since the node fully determines the node type. Let $P^L$ and $P^R$ be the sets of visit rates for left and right nodes, respectively, that is, the two types of nodes in the bipartite network, given that the current node type is known. Then, we can express $L(M_1)$ in terms of $P^L$ and $P^R$,

$$L(M_1) = \underbrace{H(P_1)}_{H(X)} = \underbrace{1}_{H(Y)} + \underbrace{\tfrac{1}{2} H(P^L) + \tfrac{1}{2} H(P^R)}_{H(X \mid Y)}, \tag{2}$$

to show that providing the node type reduces the description of one-level partitions by 1 bit.
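Eq. (2) can be checked numerically. The sketch below uses a hypothetical weighted plant-pollinator web (all node names and weights are made up for illustration) and compares the entropy of the unipartite visit rates with 1 bit plus the average of the left and right entropies.

```python
from math import log2

def entropy(rates):
    """Shannon entropy in bits; rates are normalized internally."""
    total = sum(rates)
    return -sum(r / total * log2(r / total) for r in rates if r > 0)

# Hypothetical weighted links: (left node, right node, weight).
links = [("bee", "clover", 2), ("bee", "thistle", 1), ("moth", "thistle", 3),
         ("fly", "clover", 1), ("fly", "daisy", 1)]

# Node strengths; visit rates on an undirected network are proportional to them.
left, right = {}, {}
for u, v, w in links:
    left[u] = left.get(u, 0) + w
    right[v] = right.get(v, 0) + w

# H(X): entropy of the visit rates when node types are ignored.
h_unipartite = entropy(list(left.values()) + list(right.values()))

# 1 + H(X|Y): one bit for the node type plus the average of the entropies
# of the left and right visit rates.
h_bipartite = 1 + 0.5 * entropy(list(left.values())) + 0.5 * entropy(list(right.values()))

print(f"H(P_1) = {h_unipartite:.6f}, 1 + H(X|Y) = {h_bipartite:.6f}")
```

The two numbers agree because left and right nodes each carry exactly half of the flow, so the node type is worth exactly 1 bit.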

To generalize to two-level partitions, we plug this equation into Eq. (1) and obtain the code length

$$L(M) = q \left[ 1 + \tfrac{1}{2} H(Q^L) + \tfrac{1}{2} H(Q^R) \right] + \sum_{m \in M} p_m \left[ 1 + \tfrac{1}{2} H(P_m^L) + \tfrac{1}{2} H(P_m^R) \right], \tag{3}$$

where $Q^L = \{q_m^L \mid m \in M\}$ and $Q^R = \{q_m^R \mid m \in M\}$ are the sets of left and right module entry rates; $P_m^L = \{q_m^L\} \cup \{p_u \mid u \in m^L\}$ and $P_m^R = \{q_m^R\} \cup \{p_v \mid v \in m^R\}$ are the sets of left and right node visit rates in module m, including module exits; $m^L$ and $m^R$ are the subsets of left and right nodes in m; and $p_u \in P^L$ and $p_v \in P^R$ are the visit rates for left nodes u and right nodes v, respectively.

By separating the left and right visit rates in Eq. (3), we define the bipartite map equation:

$$L_B(M) = q^L H(Q^L) + \sum_{m \in M} p_m^L H(P_m^L) + q^R H(Q^R) + \sum_{m \in M} p_m^R H(P_m^R), \tag{4}$$

where $q^L = \sum_{m \in M} q_m^L$ and $q^R = \sum_{m \in M} q_m^R$ are the usage rates for left-to-right and right-to-left code books at index level and $p_m^L = q_m^L + \sum_{u \in m^L} p_u$ and $p_m^R = q_m^R + \sum_{v \in m^R} p_v$ are the usage rates for left-to-right and right-to-left code books at the module level. Thus, the bipartite map equation calculates the code length for a given partition that describes a joint clustering of left and right nodes in a bipartite network (detailed derivations in Appendix A).

The bipartite map equation changes the communication game. As before, the sender uses one code word to encode transitions within modules and three code words for transitions between modules. But now, both sender and receiver keep track of the current node type to choose the correct code book—left to right or right to left—for their communication.

IV. THE BIPARTITE MAP EQUATION WITH VARYING NODE-TYPE MEMORY

The map equation is about compression with constraints: Compression is not the only goal. As we use the regularities in a network more, we can increasingly compress its description, but higher compression does not necessarily mean that we find network structures that allow us to understand the network better.

For example, consider a version of the coding game where sender and receiver remember the location of the random walker. In this case, we would use a coding scheme with separate code books for each node with code words only for neighboring nodes. This would allow us to encode the walker’s path at the entropy rate of the corresponding Markov process [11] and provide a better compression than the map equation, but then nodes would not have unique code words anymore and, even though the code is efficient, it would not capture the modular structure of the network.
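To make the comparison concrete, the following sketch contrasts the two extremes on a small unweighted network: one code book for all nodes (the entropy of the visit rates) and one code book per node (the entropy rate of the random walk). The edge list is hypothetical. On an unweighted, undirected network, a walker at a node of degree d chooses uniformly among d neighbors, so the entropy rate is the visit-rate-weighted average of log2(d).

```python
from math import log2

# Hypothetical undirected, unweighted bipartite network.
edges = [("A", "1"), ("A", "2"), ("B", "1"), ("B", "2"),
         ("C", "3"), ("C", "4"), ("D", "3"), ("D", "4"),
         ("B", "3")]

degree = {}
for s, t in edges:
    degree[s] = degree.get(s, 0) + 1
    degree[t] = degree.get(t, 0) + 1
total = sum(degree.values())
visit = {n: d / total for n, d in degree.items()}

# One shared code book: one unique code word per node.
node_entropy = -sum(p * log2(p) for p in visit.values())

# One code book per node with code words only for its neighbors:
# the entropy rate of the Markov process.
entropy_rate = sum(visit[n] * log2(degree[n]) for n in degree)

print(f"shared code book: {node_entropy:.3f} bits per step")
print(f"per-node code books: {entropy_rate:.3f} bits per step")
```

The per-node scheme is much cheaper, but its code words are only meaningful relative to the current node, so it says nothing about modules.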

The key is that the map equation forgets at which exact node a random walker is and only remembers the current module. With the bipartite map equation, we relax this constraint by remembering node types. However, in sparse bipartite networks, this comes close to remembering nodes and moves us toward encoding at the entropy rate of the Markov process without identifying modular structure. Therefore, it is useful to look at using node-type information at intermediate rates.

In the bipartite map equation with varying node-type memory, node types are fuzzy. While each node has a true type, either left or right, and the random walker alternates between types, we assume that we cannot determine types reliably. We model this uncertainty by introducing a node-type flipping rate α. When we inspect a node, we observe its true type with probability 1 − α, and the opposite type with probability α. Then, on average, nodes appear both left and right to a degree determined by α. Node-visit rates change accordingly and become mixed; we describe them as pairs of left and right flow: Left nodes u with visit rate $p_u$ have a mixed visit rate $p_u^\alpha = [(1 - \alpha) p_u, \alpha p_u]$ and right nodes v with visit rate $p_v$ have a mixed visit rate $p_v^\alpha = [\alpha p_v, (1 - \alpha) p_v]$.

Using Bayes’ rule again, we calculate the level of compression we can achieve when node types are fuzzy. Let $M_1$ be a partition with all the nodes in one module, $P_1$ be the set of ergodic node visit rates, and $P_1^\alpha = \{p_n^\alpha \mid n \in M_1\}$ be the set of mixed node visit rates. The entropy of Y: current node type is, as before, 1 bit because we observe left and right nodes with probability 1/2 each. However, the entropy of Y|X: node type, given node is now the entropy of the node-type flipping rate, $H(Y \mid X) = H_\alpha = H(1 - \alpha, \alpha)$. Overall, compared with the standard map equation, we can improve the compression by 1 bit, but node-type fuzziness increases the code length by $H_\alpha$, the entropy of the flipping rate,

$$L(M_1) = \underbrace{H(P_1)}_{H(X)} = \underbrace{1}_{H(Y)} - \underbrace{H_\alpha}_{H(Y \mid X)} + \underbrace{H(P_1^\alpha)}_{H(X \mid Y)}, \tag{5}$$

where $H(P_1^\alpha)$ is shorthand for the average component-wise entropies of the mixed node visit rates.
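Eq. (5) can be verified numerically for several flipping rates. The sketch below reuses a hypothetical weighted plant-pollinator web (names and weights are illustrative) and reads "average component-wise entropies" as the flow-weighted average of the entropies of the left and right components of the mixed visit rates; this reading is our assumption.

```python
from math import log2

def entropy(rates):
    """Shannon entropy in bits; rates are normalized internally."""
    total = sum(rates)
    return -sum(r / total * log2(r / total) for r in rates if r > 0)

# Hypothetical weighted links: (left node, right node, weight).
links = [("bee", "clover", 2), ("bee", "thistle", 1), ("moth", "thistle", 3),
         ("fly", "clover", 1), ("fly", "daisy", 1)]

left, right = {}, {}
for u, v, w in links:
    left[u] = left.get(u, 0) + w
    right[v] = right.get(v, 0) + w
total = 2 * sum(w for _, _, w in links)
p_left = [s / total for s in left.values()]    # sums to 1/2
p_right = [s / total for s in right.values()]  # sums to 1/2

def one_level_terms(alpha):
    """Return (H_alpha, H(P_1^alpha)) for flipping rate alpha."""
    h_alpha = entropy([1 - alpha, alpha])
    # Mixed rates: a left node contributes (1 - alpha) p to the left
    # component and alpha p to the right component, and vice versa.
    left_component = [(1 - alpha) * p for p in p_left] + [alpha * p for p in p_right]
    right_component = [alpha * p for p in p_left] + [(1 - alpha) * p for p in p_right]
    h_mixed = (sum(left_component) * entropy(left_component)
               + sum(right_component) * entropy(right_component))
    return h_alpha, h_mixed

h_x = entropy(p_left + p_right)
for alpha in (0.0, 0.1, 0.25, 0.5):
    h_alpha, h_mixed = one_level_terms(alpha)
    print(alpha, round(h_x, 6), round(1 - h_alpha + h_mixed, 6))
```

For every α the two printed values coincide: the fuzzier the types, the larger $H(P_1^\alpha)$ becomes, and the term $1 - H_\alpha$ shrinks by the same amount.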

Plugging Eq. (5) into the standard map equation gives us the generalization to two-level partitions,

$$L(M) = q \left( 1 - H_\alpha + H(Q^\alpha) \right) + \sum_{m \in M} p_m \left( 1 - H_\alpha + H(P_m^\alpha) \right). \tag{6}$$

We define the bipartite map equation with varying node-type memory,

$$L^\alpha(M) = q^\alpha H(Q^\alpha) + \sum_{m \in M} p_m^\alpha H(P_m^\alpha), \tag{7}$$

which measures the code length for a partition M and node-type flipping rate α. Figure 1(c) illustrates how the code-book structure changes compared to the standard map equation [Fig. 1(b)] for a fixed value α in the same example network as before [Fig. 1(a)]. We can generalize the bipartite map equation with varying node-type memory to more than two levels by recursively expanding the code-book structure within modules. Then, each module within modules receives its own set of entry, node visit, and exit code words.
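The following Python sketch evaluates Eq. (7) component-wise. It rests on two assumptions that are not stated in the text: the edge list (two 2-by-2 bicliques joined by the link B-3) and the convention that every code word, including exit and entry code words, is typed by the type of the node the walker steps to. With these choices the sketch reproduces the 1.96 bits quoted for α = 0.1 in the caption of Fig. 1; all names are illustrative.

```python
from math import log2

LEFT, RIGHT = 0, 1

def entropy(rates):
    """Shannon entropy in bits; rates are normalized internally."""
    total = sum(rates)
    if total == 0:
        return 0.0
    return -sum(r / total * log2(r / total) for r in rates if r > 0)

# Assumed edge list as (left node, right node); not given in the text.
edges = [("A", "1"), ("A", "2"), ("B", "1"), ("B", "2"),
         ("C", "3"), ("C", "4"), ("D", "3"), ("D", "4"),
         ("B", "3")]
module = {"A": 1, "B": 1, "1": 1, "2": 1, "C": 2, "D": 2, "3": 2, "4": 2}

def bipartite_map_equation(edges, module, alpha):
    """Two-level code length L^alpha(M), Eq. (7), for an unweighted network."""
    flow = 1 / (2 * len(edges))
    modules = sorted(set(module.values()))
    # Typed rates stored as [left flow, right flow].
    node_rate = {n: [0.0, 0.0] for n in module}
    leave = {m: [0.0, 0.0] for m in modules}
    enter = {m: [0.0, 0.0] for m in modules}
    for u, v in edges:
        # Each link is walked in both directions; code words are typed
        # by the type of the target node (assumed convention).
        for source, target, kind in ((u, v, RIGHT), (v, u, LEFT)):
            node_rate[target][kind] += flow
            if module[source] != module[target]:
                leave[module[source]][kind] += flow
                enter[module[target]][kind] += flow

    def mix(pair):
        # Flip types at rate alpha.
        left, right = pair
        return [(1 - alpha) * left + alpha * right,
                alpha * left + (1 - alpha) * right]

    code_length = 0.0
    for side in (LEFT, RIGHT):
        index_rates = [mix(enter[m])[side] for m in modules]
        code_length += sum(index_rates) * entropy(index_rates)
        for m in modules:
            rates = [mix(leave[m])[side]]
            rates += [mix(node_rate[n])[side] for n in module if module[n] == m]
            code_length += sum(rates) * entropy(rates)
    return code_length

for alpha in (0.5, 0.1, 0.0):
    print(alpha, round(bipartite_map_equation(edges, module, alpha), 2))
```

At α = 0.5 the two components are identical copies of the unipartite rates at half the flow, so the sketch returns the standard code length of 2.61 bits for this network.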

When node types are flipped at a rate of α = 1/2, nodes become left and right in equal parts. With $H_\alpha$ = 1 bit, this means that there is maximum uncertainty about node types. Ignoring node types in this way is equivalent to using the standard map equation. The bipartite map equation is recovered for α = 0 and α = 1 because both values lead to $H_\alpha$ = 0. However, they have different interpretations. For α = 0, node types never flip and we can determine the true type of the nodes. Under a flipping rate of α = 1, node types always flip and we determine the opposite of the true node type. This has no effect on the code length because it simply swaps the left and right entropy terms of the bipartite map equation.

Using the bipartite map equation with varying node-type memory, we are ready to answer the initial question: What more can we learn about a network by using node types in whole or in part? Because it is more intuitive to think about how much we know about node types than the probability of flipping them, we use entropy to connect these two quantities. Flipping node types at rate α leads to an uncertainty of $H_\alpha$ about them. Consequently, $I(\alpha) = 1 - H_\alpha$ is the available amount of information about node types, given that they are flipped at rate α. This formulation suggests an alternative interpretation of Eq. (5): we can reduce the code length of one-level partitions exactly by the amount of information that we have about node types. To investigate by how much we can reduce the code length of two-level and hierarchical partitions, we have applied the bipartite map equation to real-world networks.

V. APPLYING THE BIPARTITE MAP EQUATION TO REAL-WORLD NETWORKS

We have implemented the bipartite map equation for two-level and hierarchical partitions in Infomap [10]. The time complexity is the same as for standard Infomap, whose core algorithm is linear in the number of links.

We used the bipartite implementation to analyze the community landscape of 21 bipartite networks from different domains. Our results show that the bipartite map equation uses node-type information effectively and improves the compression beyond the provided information. The improved compression increases the resolution and lets us discover more regularities.

A. Networks

We selected 21 bipartite networks from different domains from the KONECT [12] and ICON [13] databases and other sources [14,15]. We preprocessed the networks with the Python package NetworkX [16] and only kept their largest connected components. The resulting networks ranged from a few dozen to millions of nodes and edges in size; their domain, number of left nodes $n_L$, number of right nodes $n_R$, and number of edges m are listed in Table I. In weighted networks, marked with the superscript W, the rate at which the random walker uses edges is proportional to their weight.

In all networks, left nodes represent subjects, such as users, documents, and animals, while right nodes represent objects that are acted upon, such as movies, words, and plants.


TABLE I. Properties of 21 bipartite test networks and their community landscape. The networks are sorted by number of edges, and weighted networks are marked with the superscript W. For each network and amount of node-type information, we ran Infomap 100 times and selected the hierarchical partitions with the best code length. Code lengths are given in bits and effective module sizes in nodes.

| Name | Ref. | Domain | n_L | n_R | m | Code length, I = 0 | Code length, I = 0.5 | Code length, I = 1 | Effective module size, I = 0 | Effective module size, I = 0.5 | Effective module size, I = 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Wiktionary (en)^W | [17] | Authorship | 26 719 | 2 091 461 | 5 569 967 | 12.14 | 11.53 | 10.67 | 27 934 | 15 520 | 36 |
| Last.fm user-song^W | [12] | Interaction | 992 | 1 084 620 | 4 413 834 | 12.33 | 11.70 | 10.94 | 2 102 | 2 037 | 91 |
| Wikipedia excellent | [17] | Text | 2 780 | 273 959 | 2 941 902 | 13.64 | 13.15 | 11.80 | 68 944 | 145 | 24 |
| IMDb actor-movie | [18] | Affiliation | 124 414 | 374 511 | 1 460 791 | 11.99 | 11.33 | 10.37 | 23 | 15 | 3.5 |
| Stack Overflow user-post | [19] | Rating | 524 670 | 80 492 | 1 280 982 | 11.83 | 11.04 | 9.99 | 28 | 24 | 7.3 |
| Reuters story-word | [20] | Text | 19 757 | 38 677 | 978 446 | 13.44 | 12.89 | 11.94 | 9 029 | 1 564 | 2.0 |
| Wiktionary (de)^W | [17] | Authorship | 5 354 | 144 710 | 686 661 | 11.23 | 10.65 | 9.82 | 3 690 | 2 490 | 3.0 |
| Linux kernel mailing list^W | [21] | Interaction | 34 490 | 330 155 | 591 199 | 9.61 | 9.02 | 8.22 | 419 | 384 | 46 |
| GitHub user-project | [22] | Authorship | 39 845 | 99 907 | 417 361 | 11.24 | 10.52 | 9.21 | 23 | 10 | 3.2 |
| YouTube user-group | [23] | Affiliation | 88 490 | 25 007 | 286 913 | 10.25 | 9.58 | 8.55 | 48 | 29 | 5.4 |
| APSMM conference | [15] | Social | 93 023 | 21 | 240 342 | 10.79 | 10.06 | 9.09 | 6 742 | 3 663 | 108 |
| LVHK Meetup | [14] | Social | 6 061 | 5 096 | 127 033 | 11.58 | 11.06 | 10.09 | 1 011 | 141 | 1.5 |
| PGHF Meetup | [14] | Social | 4 989 | 4 611 | 39 501 | 10.90 | 10.26 | 9.26 | 52 | 14 | 1.8 |
| SIAM conference | [15] | Social | 10 018 | 19 | 15 533 | 7.94 | 7.29 | 6.59 | 525 | 427 | 89 |
| NIPS conference | [15] | Social | 6 902 | 27 | 12 595 | 8.14 | 7.38 | 6.74 | 288 | 227 | 38 |
| UC Irvine forum^W | [24] | Social | 897 | 520 | 7 087 | 8.16 | 7.61 | 6.93 | 23 | 18 | 1.8 |
| Norwegian directors | [25] | Economic | 212 | 854 | 1 148 | 3.83 | 3.07 | 2.03 | 5.6 | 4.7 | 3.4 |
| Virus-host interactome | [26] | Biological | 41 | 288 | 433 | 4.89 | 4.21 | 3.24 | 17 | 13 | 5.7 |
| Scottish directors | [27] | Economic | 86 | 131 | 348 | 5.18 | 4.54 | 3.60 | 6.8 | 5.1 | 2.1 |
| Arroyo Goye^W | [28] | Ecological | 27 | 8 | 41 | 2.70 | 2.17 | 1.49 | 11 | 8.2 | 3.5 |
| Fonseca Ganade^W | [31] | Ecological | 19 | 10 | 38 | 2.24 | 1.68 | 1.06 | 5.2 | 4.4 | 1.7 |

B. Setup

We explored the community landscape of our test networks from no information at I = 0 bits to full information at I = 1 bit with a step size of 0.05 bits. For node-type information I, we calculated the corresponding node-type flipping rate α numerically. Because of its stochasticity, we ran Infomap 100 times for each network and value of α, both with the flag --two-level, to search for two-level partitions, and without the flag to search for hierarchical partitions. Finally, for each α, we selected the partitions with the best code length for further analysis.

C. Structure and compression

We measured the extra compression provided by a partition M by using the corresponding one-level partition $M_1$ as a baseline. The one-level code length decreases by the amount of node-type information that is available [Eq. (5)], specifically $L^\alpha(M_1) = L^{0.5}(M_1) - I(\alpha)$, where $I(\alpha) = 1 - H_\alpha$ is the node-type information when node types are flipped at rate α. We define the extra compression of M as $L^\alpha(M_1) - L^\alpha(M) \geq 0$; it is always at least 0 because Infomap returns the one-level partition when it does not find any partition with lower code length. In partitions with more than one level, the extra compression depends on the code-book use rate, the total coding rate $q + \sum_{m \in M} p_m$, and the amount of node-type information [Eq. (6)].

To measure the resolution of the community detection, we use the effective module size as a proxy. By only considering leaf modules, that is, modules that contain nodes but have no submodules, we can use the same measure for two-level and hierarchical solutions. Let S be the set of leaf module sizes in partition M where size refers to the number of nodes in a module. Then the perplexity of the module sizes, $2^{H(S)}$ with

$$H(S) = -\sum_{s \in S} \frac{s}{\sum_{s' \in S} s'} \log_2 \frac{s}{\sum_{s' \in S} s'},$$

tells us the effective number of leaf modules, similar to how it can be used to calculate the effective number of sides of a (loaded) coin or die. Combining the effective number of modules with the number of nodes N in the network, we calculate the effective module size as $N / 2^{H(S)}$.
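The effective module size can be computed directly from a list of leaf module sizes. The sketch below assumes that the leaf modules partition the nodes, so that N equals the sum of the sizes; the example sizes are hypothetical.

```python
from math import log2

def effective_module_size(leaf_module_sizes):
    """Number of nodes divided by the perplexity of the leaf module sizes."""
    n = sum(leaf_module_sizes)
    h = -sum(s / n * log2(s / n) for s in leaf_module_sizes if s > 0)
    effective_number_of_modules = 2 ** h
    return n / effective_number_of_modules

# Four equally sized modules: the effective size equals the actual size.
print(effective_module_size([5, 5, 5, 5]))
# One dominant module and three singletons: the effective size lies
# between the mean size (5) and the largest size (17).
print(effective_module_size([17, 1, 1, 1]))
```

Unlike the mean module size, this measure is dominated by the modules that hold most of the nodes, so a few singleton modules barely change it.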

The effective community size and extra compression capture two significant patterns in the analyzed networks.

First, the resolution increases and we detect more communities on different scales when we use node types. At lower levels in the community hierarchy, modules become more fine-grained, while on higher levels, they become more coarse. For example, in the weighted Fonseca-Ganade plant-ant web, the bipartite map equation with I = 0.65 bits reveals hierarchically nested modules with smaller modules at the finest level (Fig. 2). With more node-type information, some nodes are assigned into singleton modules that form bridges between other modules. The flow-persistence time is not long enough to include them in either of the other modules and, therefore, it is better to assign them to their own modules (Fig. 2). When we approach full node-type information at I = 1 bit, it can lead to so many small modules that no useful structure is detected anymore. For example, leaf modules in the Las Vegas Hikers network (LVHK) contain only 1.5 nodes on average [Fig. 3(a)]. In the IMDb actor-movie network, the effective module size decreases approximately linearly from 23 at I = 0 bits to 3.5 at I = 1 bit [Fig. 3(b)]. In the Last.fm user-song network, the effective module size is around 2 000 between I = 0 bits and I = 0.85 bits but then drops sharply and is 91 for I = 1 bit [Fig. 3(c)]. We see a similar behavior in all the networks we analyzed (Table I), both for hierarchical and two-level partitions, with the difference being that sharp drops in module size are less common in two-level partitions (Fig. 3 and Fig. 4, Appendix B). However, as leaf modules become smaller, the community hierarchy becomes deeper such that higher levels still contain significant structures.

FIG. 2. Community structure at different scales in the weighted Fonseca-Ganade plant-ant web [31]. By providing more node-type information, we increase the resolution and detect finer modules on lower and coarser modules on higher levels in the community hierarchy. (a) Community structure for I = 0 bits (α = 1/2) with code length 2.24 bits and effective module size 5.27. (b) Community structure for I = 0.65 bits (α = 1/6) with code length 1.5 bits and effective module size 4.35.

Second, the compression improves by more than the amount of node-type information we provide. With the duality between compression and finding regularities in data, the bipartite map equation detects more structure in the bipartite networks. Because the entropy function is nonlinear, the extra compression generally increases faster with more available information. For example, in the IMDb actor-movie network, when the code length decreases from 11.99 bits at I = 0 bits to 10.37 bits at I = 1 bit, the extra compression improves from 5.7 to 6.3 bits, and the rate of improvement increases closer to full node-type information [Table I, Fig. 3(b)]. In the Arroyo Goye pollinator-plant web and the LVHK network, the compression improves slowly at first, but faster once more regularities can be detected above I = 0.5 bits [Figs. 3(a) and 3(d)]. However, since we ran Infomap independently for each α, the extra compression sometimes decreased with more information [Figs. 3(a) and 3(b), and Fig. 4, Appendix B]. In these cases, Infomap’s stochastic search algorithm did not find partitions that would have led to an increase in extra compression. For example, using the same partition over the whole range of α guarantees a monotonic increase in all networks. Nevertheless, by providing more node-type information, the regularizing effect of the standard map equation decreases and the compression generally increases.

FIG. 3. Compression and community resolution increase with available node-type information. The solid and dashed blue lines show the extra compression of the best hierarchical and two-level partitions, respectively. The solid and dashed orange lines show the effective module size of the best hierarchical and two-level partitions, respectively. (a) Las Vegas Hikers (LVHK) Meetup attendance. (b) IMDb actor-movie network. (c) Last.fm user-song network. (d) Arroyo Goye pollinator-plant web. [Axes: available information (bits) on the horizontal axis; effective module size and extra compression (bits) on the vertical axes.]

Higher resolution and further compression also result from using shorter Markov times [29,30], but the map equation for varying Markov times [5] and the bipartite map equation work in different ways. Short Markov times correspond to a lazy random walker on a modified network with strong self-links. With fewer steps between nodes, cheaper transitions between communities shift the optimal solution to smaller communities with shorter average code lengths. Instead, the bipartite map equation transforms node-type information into compression with smaller node-type-specific code books. Cheaper transitions between communities then shift the optimal solution to smaller communities and result in extra compression.

VI. CONCLUSION

We have extended the map equation framework for finding modules in network flows to use node-type information encoded in bipartite networks. Applied to 21 real-world networks, the bipartite map equation implemented in the search algorithm Infomap detects more, smaller communities at lower levels of the community hierarchy and fewer, larger modules at higher levels. The community-detection resolution increases because the bipartite map equation’s coding scheme exploits the alternating trajectories of random walks and compresses the description of network flows beyond the provided node-type information. In between ignoring and making full use of the node-type information, the bipartite map equation can use the node-type information at intermediate rates, offering a principled way to explore communities at higher resolution in bipartite networks.

ACKNOWLEDGMENTS

This work was partially supported by the Wallenberg AI, Autonomous Systems, and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. We would like to thank Leto Peel, Vincenzo Nicosia, and Jelena Smiljanić for discussions that helped to improve this paper. M.R. was supported by the Swedish Research Council, Grant No. 2016-00796.

FIG. 4. (Continued) [Panels (a)–(p); axes: available information (bits), effective module size, and extra compression (bits); curves for hierarchical and two-level partitions.]

APPENDIX A: DERIVATION OF THE BIPARTITE MAP EQUATION

Consider an undirected, weighted bipartite graph $G = (N_L, N_R, E, \delta)$ with left nodes $N_L$, right nodes $N_R$, edges $E \subseteq N_L \times N_R$, and edge weights $\delta : E \to \mathbb{R}$. Let $P^L = \{p_u \mid u \in N_L\}$ and $P^R = \{p_v \mid v \in N_R\}$ be the left and right node visit rates. Since the graph is undirected, we can calculate the visit rates directly by $p_u = \frac{\sum_{v \in N_R} \delta((u,v))}{\Delta(G)}$ for left nodes $u$ and $p_v = \frac{\sum_{u \in N_L} \delta((u,v))}{\Delta(G)}$ for right nodes $v$, where $\Delta(G) = \sum_{e \in E} \delta(e)$ is the total edge weight in $G$. The visit rate of a disconnected node is 0, but we exclude such nodes from our considerations because they could be assigned to any module without affecting the code length. Since the graph is bipartite, both $P^L$ and $P^R$ sum to 1, that is, $\sum_{p_u \in P^L} p_u = 1$ and $\sum_{p_v \in P^R} p_v = 1$.
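The visit-rate calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is given as a list of (left node, right node, weight) triples; the function name `bipartite_visit_rates` and the toy edge list are hypothetical and are not taken from the paper or from Infomap.

```python
from collections import defaultdict


def bipartite_visit_rates(edges):
    """Return (left, right) visit-rate dicts for weighted edges (u, v, w).

    Each node's rate is its incident edge weight divided by the total
    edge weight, so left and right rates each sum to 1.
    """
    total = sum(w for _, _, w in edges)  # total edge weight in G
    left = defaultdict(float)
    right = defaultdict(float)
    for u, v, w in edges:
        left[u] += w / total
        right[v] += w / total
    return dict(left), dict(right)


# Hypothetical toy network with two left and two right nodes.
edges = [("u1", "v1", 1.0), ("u1", "v2", 2.0), ("u2", "v2", 1.0)]
p_left, p_right = bipartite_visit_rates(edges)
print(p_left, p_right)
```

Nodes without edges never appear in the edge list, which matches the convention of excluding disconnected nodes.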

Let $N = N_L \cup N_R$ be the set of all nodes and $P$ be the set of ergodic visit rates over two steps, that is, the visit rates we would obtain when we assume a unipartite network. To distinguish between node types, we use $u$ to refer to left nodes, $v$ to refer to right nodes, and $n$ when we talk about both types in combination. We denote left and right visit rates by $p_u$ and $p_v$, respectively, and ergodic visit rates over two steps by $p_n$. Since the graph is bipartite, the total weight of edges incident to left nodes is equal to the total weight of edges incident to right nodes and, therefore, the ergodic visit rate over two steps for a node $n$ is half its type-specific visit rate, that is, $p_n = p_u/2$ for a left node $n = u$ and $p_n = p_v/2$ for a right node $n = v$. Then the set of ergodic visit rates over two steps is connected to the left and right visit rates by

$$
P = \left\{ \frac{p_u}{2} \,\middle|\, p_u \in P^L \right\} \cup \left\{ \frac{p_v}{2} \,\middle|\, p_v \in P^R \right\}. \tag{A1}
$$

[FIG. 4, panels (q)-(u); axis label: effective module size.]

FIG. 4. (a) Wiktionary (en): Left nodes represent authors, and right nodes represent articles on English Wiktionary. An edge connects authors to the articles they have authored. (b) Last.fm user-song: Left nodes represent users, and right nodes represent songs. Edges connect users to the songs they have listened to. (c) Wikipedia excellent: Left nodes represent excellent articles on Wikipedia, and right nodes represent words. An edge connects an article to a word if it contains it. (d) IMDb actor-movie: Left nodes represent actors, and right nodes represent movies. Edges connect actors to those movies they have played in. (e) Stack Overflow user-post: Left nodes represent users, and right nodes represent posts. An edge connects users to those posts they have marked as a favorite. (f) Reuters story-word: Left nodes represent stories in the Reuters Corpus, Volume 1, and right nodes represent words. An edge connects a story to a word if it contains it. (g) Wiktionary (de): Left nodes represent authors, and right nodes represent articles on German Wiktionary. An edge connects authors to the articles they have authored. (h) Linux kernel mailing list: Left nodes represent users, and right nodes represent threads in the Linux kernel mailing list. An edge connects users to those threads where they contribute. (i) GitHub user-project: Left nodes represent users, and right nodes represent projects. An edge connects users to those projects where they are a member. (j) YouTube user-group: Left nodes represent users, and right nodes represent groups. An edge connects users to the groups where they are a member. (k) APSMM conference: Left nodes represent scientists, and right nodes represent editions of the APSMM conference. Edges connect scientists to the editions of the conference they have attended. (l) LVHK Meetup: Left nodes represent persons, and right nodes represent events of the VegasHikers group on Meetup. Edges connect persons to those events they have attended. (m) PGHF Meetup: Left nodes represent persons, and right nodes represent events of the Pittsburgh-free group on Meetup. Edges connect persons to those events they have attended. (n) SIAM conference: Left nodes represent scientists, and right nodes represent editions of the SIAM conference. Edges connect scientists to the editions of the conference they have attended. (o) NIPS conference: Left nodes represent scientists, and right nodes represent editions of the NIPS conference. Edges connect scientists to the editions of the conference they have attended. (p) UC Irvine forum: Left nodes represent users, and right nodes represent topics in the UC Irvine online forum. An edge connects users to those topics where they have made a post. (q) Norwegian directors: Left nodes represent directors, and right nodes represent Norwegian companies. Edges connect directors to the companies where they are a member of the board of directors. (r) Virus-host interactome: Left nodes represent virus proteins, and right nodes represent host proteins. An edge connects virus proteins to those host proteins they interact with. (s) Scottish directors: Left nodes represent directors, and right nodes represent Scottish companies. Edges connect directors to the companies where they are a member of the board of directors. (t) Arroyo Goye pollinator-plant: Left nodes represent pollinators, and right nodes represent plant species. An edge connects pollinators to the plants they pollinate. (u) Fonseca Ganade ant-plant: Left nodes represent ant species, and right nodes represent plant species. Edges connect ant species to those plant species that they use as a source of food or housing.

Let $M$ be a partition of the nodes into modules. The standard map equation calculates the code length of $M$ as the average of the module and index level code lengths, weighted by the fraction of time a random walker uses each of the code books,

$$
L(M) = q H(Q) + \sum_{m \in M} p_m H(P_m). \tag{A2}
$$

Here, $p_m = q_m + \sum_{n \in m} p_n$ is the fraction of time the random walker uses the code book for module $m$, where $n \in m$ are the nodes in $m$ and $q_m$ is the entry and exit rate of $m$; $q = \sum_{m \in M} q_m$ is the rate at which the index level code book is used. $Q = \{q_m \mid m \in M\}$ is the set of module entry rates, $P_m = \{q_m\} \cup \{p_n \mid n \in m\}$ is the set of node visit rates in module $m$, including module exit, and $H$ is the Shannon entropy. Note that entry and exit rates are identical since the network is undirected.
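To make Eq. (A2) concrete, the following Python sketch evaluates the two-level code length for an undirected weighted network. It rests on assumptions that are not spelled out in the text above: each entropy is computed on rates normalized by their sum, node visit rates are node strengths divided by twice the total edge weight, and a module's entry and exit rate is the weight of its boundary edges divided by twice the total edge weight. The names `entropy` and `map_equation` and the toy network are hypothetical; this is not Infomap's implementation.

```python
import math
from collections import defaultdict


def entropy(rates):
    """Shannon entropy in bits of the rates after normalizing them to sum to 1."""
    total = sum(rates)
    if total <= 0:
        return 0.0
    return -sum(r / total * math.log2(r / total) for r in rates if r > 0)


def map_equation(edges, module_of):
    """Two-level code length L(M) for undirected weighted edges (a, b, w)."""
    two_w = 2.0 * sum(w for _, _, w in edges)
    p = defaultdict(float)  # node visit rates p_n
    q = defaultdict(float)  # module entry/exit rates q_m
    for a, b, w in edges:
        p[a] += w / two_w
        p[b] += w / two_w
        if module_of[a] != module_of[b]:  # edge crosses a module boundary
            q[module_of[a]] += w / two_w
            q[module_of[b]] += w / two_w
    modules = set(module_of.values())
    # Index-level term q * H(Q).
    length = sum(q.values()) * entropy([q[m] for m in modules])
    # Module-level terms p_m * H(P_m), where p_m is the sum of the rates.
    for m in modules:
        rates = [q[m]] + [p[n] for n in module_of if module_of[n] == m]
        length += sum(rates) * entropy(rates)
    return length


# Hypothetical toy network: two triangles joined by one bridge edge.
edges = [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
         (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0), (3, 4, 1.0)]
one_module = {n: "A" for n in range(1, 7)}
two_modules = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(map_equation(edges, one_module), map_equation(edges, two_modules))
```

In this toy network, splitting the two triangles into separate modules gives a shorter code length than keeping all nodes in one module, which is the kind of regularity the map equation rewards.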

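Let $M_1$ be a one-level partition with all nodes in the same module. Then the code length according to the standard map equation is

$$
\begin{aligned}
L(M_1) = H(P) &= -\sum_{p_n \in P} p_n \log_2 p_n \\
&= -\sum_{p_n \in P} p_n \left( \log_2 2p_n - 1 \right) \\
&= -\sum_{p_n \in P} p_n \log_2 2p_n + \sum_{p_n \in P} p_n \\
&= 1 - \frac{1}{2} \sum_{p_n \in P} 2p_n \log_2 2p_n \\
&\overset{\text{(A1)}}{=} 1 - \frac{1}{2} \sum_{p_u \in P^L} p_u \log_2 p_u - \frac{1}{2} \sum_{p_v \in P^R} p_v \log_2 p_v \\
&= 1 + \frac{1}{2} H(P^L) + \frac{1}{2} H(P^R).
\end{aligned} \tag{A3}
$$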
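Equation (A3) can be checked numerically. The sketch below, using hypothetical left and right visit rates that each sum to 1, builds the two-step ergodic rates by halving them as in Eq. (A1) and compares both sides of the identity; the variable names are illustrative only.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical left and right visit rates; each list sums to 1.
p_left = [0.75, 0.25]
p_right = [0.25, 0.75]

# Ergodic visit rates over two steps, following Eq. (A1).
p_ergodic = [p / 2 for p in p_left + p_right]

lhs = entropy(p_ergodic)  # H(P)
rhs = 1 + 0.5 * entropy(p_left) + 0.5 * entropy(p_right)
print(lhs, rhs)
```

The constant 1 on the right-hand side is the one bit needed to say whether the walker is on a left or a right node when node types are not remembered.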

To generalize, we plug Eq. (A3) into Eq. (A2),

$$
L(M) = q \left[ 1 + \frac{1}{2} H(Q^L) + \frac{1}{2} H(Q^R) \right] + \sum_{m \in M} p_m \left[ 1 + \frac{1}{2} H\!\left(P_m^L\right) + \frac{1}{2} H\!\left(P_m^R\right) \right]. \tag{A4}
$$

Here, $Q^L = \{q_m^L \mid m \in M\}$ and $Q^R = \{q_m^R \mid m \in M\}$ are the sets of left and right module entry rates; $P_m^L = \{q_m^L\} \cup \{p_u \mid u \in m^L\}$ and $P_m^R = \{q_m^R\} \cup \{p_v \mid v \in m^R\}$ are the sets of left and right node visit rates in module $m$, including module exits. Further, $m^L$ and $m^R$ are the subsets of left and right nodes in $m$.

Based on Eq. (A4), we define the bipartite map equation,

$$
L_B(M) = q^L H(Q^L) + \sum_{m \in M} p_m^L H\!\left(P_m^L\right) + q^R H(Q^R) + \sum_{m \in M} p_m^R H\!\left(P_m^R\right). \tag{A5}
$$

Here, $q^L = \sum_{m \in M} q_m^L$ and $q^R = \sum_{m \in M} q_m^R$ are the usage rates for left-to-right and right-to-left code books at index level; and $p_m^L = q_m^L + \sum_{u \in m^L} p_u$ and $p_m^R = q_m^R + \sum_{v \in m^R} p_v$ are the usage rates for left-to-right and right-to-left code books at module level, respectively. As the total weight of edges incident to left nodes is equal to the total weight of edges incident to right nodes, we have $q^L = q^R = \frac{q}{2}$ and $p_m^L = p_m^R = \frac{p_m}{2}$ for all $m$.

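Consider again $P$, the set of ergodic node visit rates over two steps, and let $\alpha \in [0, 1] \subset \mathbb{R}$. For better readability and because specific nodes are not important, we refer to the visit rates over two steps simply as $p$ in the following. Further, we use $H_\alpha = H(1 - \alpha, \alpha)$ as shorthand for the entropy of $\alpha$. We can then rewrite $H(P)$,

$$
\begin{aligned}
H(P) &= -\sum_{p \in P} p \log_2 p \\
&= [(1 - \alpha) + \alpha] \left( -\sum_{p \in P} p \log_2 p \right) \\
&= (1 - \alpha) \left( -\sum_{p \in P} p \log_2 p \right) + \alpha \left( -\sum_{p \in P} p \log_2 p \right) \\
&= -\sum_{p \in P} [(1 - \alpha) p] \log_2 p - \sum_{p \in P} \alpha p \log_2 p \\
&= -\sum_{p \in P} [(1 - \alpha) p] \log_2 \frac{(1 - \alpha) p}{1 - \alpha} - \sum_{p \in P} \alpha p \log_2 \frac{\alpha p}{\alpha} \\
&= -\sum_{p \in P} \Big\{ [(1 - \alpha) p] \log_2 [(1 - \alpha) p] - [(1 - \alpha) p] \log_2 (1 - \alpha) \Big\} - \sum_{p \in P} \Big( \alpha p \log_2 \alpha p - \alpha p \log_2 \alpha \Big) \\
&= -\sum_{p \in P} [(1 - \alpha) p] \log_2 [(1 - \alpha) p] + \sum_{p \in P} [(1 - \alpha) p] \log_2 (1 - \alpha) - \sum_{p \in P} \alpha p \log_2 \alpha p + \sum_{p \in P} \alpha p \log_2 \alpha \\
&= (1 - \alpha) \log_2 (1 - \alpha) + \alpha \log_2 \alpha - \sum_{p \in P} [(1 - \alpha) p] \log_2 [(1 - \alpha) p] - \sum_{p \in P} \alpha p \log_2 \alpha p \\
&= -H_\alpha - \sum_{p \in P} [(1 - \alpha) p] \log_2 [(1 - \alpha) p] - \sum_{p \in P} \alpha p \log_2 \alpha p.
\end{aligned} \tag{A6}
$$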
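The rewriting in Eq. (A6) is an identity for any probability distribution, which a short numerical check can illustrate. The sketch below assumes $0 < \alpha < 1$ so that both logarithms of $\alpha$ and $1 - \alpha$ are defined; the function names and the example distribution are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def split_entropy(probs, alpha):
    """Right-hand side of Eq. (A6), valid for 0 < alpha < 1."""
    h_alpha = -(1 - alpha) * math.log2(1 - alpha) - alpha * math.log2(alpha)
    first = -sum((1 - alpha) * p * math.log2((1 - alpha) * p)
                 for p in probs if p > 0)
    second = -sum(alpha * p * math.log2(alpha * p) for p in probs if p > 0)
    return -h_alpha + first + second


# Hypothetical ergodic visit rates over two steps; they sum to 1.
probs = [0.375, 0.125, 0.125, 0.375]
for alpha in (0.1, 0.5, 0.9):
    print(alpha, entropy(probs), split_entropy(probs, alpha))
```

For every choice of `alpha`, the two printed values agree up to floating-point error, because the two scaled sums together exceed the original entropy by exactly $H_\alpha$.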