Dynamics in Boolean Networks

Master's thesis (Examensarbete) LITH-ITN-ED-EX--05/012--SE

Dynamics in Boolean Networks

Fredrik Karlsson
2005-04-28

Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköpings Universitet, SE-601 74 Norrköping, Sweden

LITH-ITN-ED-EX--05/012--SE

Dynamics in Boolean Networks

Master's thesis carried out in Electronics Design (elektronikdesign) at Linköping Institute of Technology, Campus Norrköping.

Fredrik Karlsson
Supervisor: Michael Hörnquist
Examiner: Michael Hörnquist
Norrköping, 2005-04-28

Date: 2005-04-28
Division, Department: Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Language: English
Report category: Master's thesis (D-uppsats)
ISRN: LITH-ITN-ED-EX--05/012--SE
URL for electronic version: http://www.ep.liu.se/exjobb/itn/2005/ed/012/
Title: Dynamics in Boolean Networks
Author: Fredrik Karlsson
Keywords: Random Boolean Networks, Derrida plots, Genetic regulatory networks

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Fredrik Karlsson


Abstract

In this thesis several random Boolean networks are simulated. Both completely computer-generated networks and models of biological networks are simulated. Several different tools are used to gain knowledge about their robustness: Derrida plots, a noise analysis and the mean probability for canalizing rules. Some simulations on how well entropy works as an indicator of whether a network is robust are also included. The noise analysis works by measuring the Hamming distance between the state of the network when noise is applied and when no noise is applied. For many of the simulated networks two types of rules are applied: nested canalizing rules and flat distributed rules. The computer-generated networks consist of two types: scale-free networks and ER-networks. One of the conclusions of this report is that nested canalizing rules are often more robust than flat distributed rules. Another conclusion is that the mean probability for canalizing rules has, for flat distributed rules, a dominating effect on whether the network is robust or not. Yet another conclusion is that when flat distributed rules are applied, the probability distribution of indegrees has a strong effect on whether a network is robust, owing to the connection between the indegree distribution and the mean probability for canalizing rules.


Acknowledgment

First of all, I want to express my gratitude towards my supervisor and examiner Michael Hörnquist, who has been a great support in the work on my master's thesis by providing me with interesting ideas and articles. I would also like to thank my parents for their love and support.


Contents

Aim and purpose
Method
Language comments
Introduction
Theory
    Graphs and networks
    Boolean rules
        Flat distributed rules
        Canalizing and nested canalizing rules
    Entropy
    Hamming distance
    Derrida plots
    Noise analysis tool
    Standard error
Implementation
    Implementation of the network representation
    Implementation of the updating of states
    Implementation of the creation and updating of the Boolean rules
    Implementation of the rewiring
    Implementation of the ER-networks generating procedure
    Implementation of the scale-free networks generating procedure
    Random number generator
    The implementation of the calculation of entropy
    The implementation of the noise analysis tool
    Implementation of the Derrida plot
Simulations
    Simulations on entropy versus chaos
        Simulation settings
        Simulation results
        Analysis of the result
    Calculations for mean probability for canalizing rules for different distributions
        Description of the calculations
        Results of the calculations
        Discussion and analysis
    Simulations on ER-networks
        Probability distribution
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Discussion and analysis
    Simulations on the Fang-net
        Probability distribution
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
    Simulations on the Lee-net
        Probability distribution
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Analysis and discussion
    Simulations on the Milo-net
        Probability distributions
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Analysis and discussion
    Simulations on rewired versions of the Milo-net
        Motif detection
        Derrida plots
        The noise analysis tool
        Analysis and discussion
    Simulations on the Lasso-net
        Probability distributions
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Analysis and discussion
Summarizing discussion and analysis
Suggestions on further research
Figure list
Equation list
Table list
References

Aim and purpose

The aim of this thesis is to study the robustness of different Boolean networks, which are used as models for biological networks. The purpose is both to examine which networks are robust and to try to determine what factors govern whether a network is robust or not.

Method

The method for examining the properties of the robustness is to simulate different types of Boolean networks. The programming needed to simulate the Boolean networks is done in C++.

Language comments

Sometimes several different words are used to designate the same thing, and sometimes one word has several meanings. The usage of some of these words is explained here. The word input is sometimes used as a synonym for indegree: in the phrase "input to a vertex" it is synonymous with indegree. A similar connection exists between output and outdegree. The terms ordered regime and chaotic regime describe a network as robust and not robust, respectively. Instead of "not robust", the phrase "sensitive to initial perturbations" can be used. The word robust is also used in phrases such as "robust against noise", meaning that the network is insensitive to noise. On some rare occasions, in connection with the article [23], the word stable is used as a synonym for robust.

Introduction

In 1969 Stuart Kauffman introduced random Boolean networks as a model for genetic regulatory networks in the article [1]. The main idea is to approximate the expression of each gene as on or off. This approximation is quite crude, since genes can occupy levels other than on and off, but random Boolean networks have gained some acceptance because they have shown results similar to networks with multilevel rules [2]. From now on, the word expression will be replaced with the word state in the context of Boolean networks.

In a random Boolean network the state of one gene is determined by the states of the other genes. This is done with the help of logical rules, also known as Boolean rules. The network determines which genes influence a certain gene, and the Boolean rules determine how they influence that gene [3]. Examples of typical Boolean rules are AND, OR and XOR. One can thus view a random Boolean network as a digital circuit in which the genes are approximated by D flip-flops and the rules by combinational circuits.

Besides the Boolean rules, another approximation is utilized in random Boolean networks: synchronous updating, meaning that all genes are updated at the same time. This is an approximation because genes in the real world are updated asynchronously, that is, sequentially: the state of one gene is determined before the next gene is updated, and the order of the updating is often arbitrary. Random Boolean networks can also utilize an asynchronous updating scheme, in which case they are often denoted asynchronous random Boolean networks. Why, then, is synchronous updating used? Because it is simple. If one uses asynchronous updating, one must decide the order in which the genes are updated. One possible solution is to update the genes in a random order, but then the system becomes non-deterministic, and a non-deterministic behaviour destroys the cyclic behaviour that is typical for synchronous updating [2]. This cyclic behaviour also exists in real cells [4].

Which genes affect which gene is decided randomly, and the Boolean rules are also determined randomly. Is this type of model relevant? Yes, but not for obtaining detailed information. The model can give information about generic properties, such as the probable number of connections between the genes. Random Boolean networks cannot disclose any properties of the real genetic regulatory network without a very central postulate: a biological network must be robust. Robust means, in this case, that a biological network must be insensitive to disturbances. The postulate is essential because a living organism depends, for its survival, on certain tasks being performed in a certain order at a certain time. The cell cycle, described below, is a good example of tasks that must be performed in an ordered way.

The cell cycle of the eucaryotic cell, which is a cell with a distinct nucleus and cytoplasm (the contents within the plasma membrane but outside the nucleus), consists of four phases: the M-phase, the G1-phase, the S-phase and the G2-phase. The M-phase consists of two stages, mitosis and cytokinesis. Mitosis is the division of the nucleus and cytokinesis is the division of the cytoplasm. Mitosis occurs before cytokinesis, and after cytokinesis two new cells with two different nuclei have been formed. During the G1-phase (G stands for gap) the cell grows, and if the appropriate conditions exist the cell enters the S-phase. The S in S-phase stands for synthesis: during this phase the cell copies the DNA in the nucleus. The copying of the DNA is a

necessary step before mitosis can start. After the S-phase the chromosomes and their copies are tightly bound together, and they do not separate until mitosis has been performed. The G2-phase lies between the S-phase and the M-phase. During the G2-phase the cell grows, and when the appropriate conditions exist the cell enters the M-phase.

The phases must be executed in a certain order, shown in Fig 1 by the circular arrow: for example, the M-phase is executed before the G1-phase, and so on. To make sure that the phases are executed in the right order, the cell utilizes the cell cycle control system. Besides the execution order, the cell cycle control system checks whether the cell cycle shall continue, with the help of feedback from the cell. Different things are tested, in so-called checkpoints, before the different phases. Two things must be checked before the M-phase can start: the cell size and whether the entire DNA has been copied. At the checkpoint before the S-phase three things must be tested: the cell size, the environment and whether the DNA has been damaged. All the phases in the cell cycle are started and performed by different proteins.

Fig 1: The figure shows the control of the cell cycle (the phases M, G1, S and G2 arranged in a circle).

The importance of proteins for the cell cycle demands a short description of how they are manufactured. They are manufactured in the ribosome, but the ribosome cannot manufacture a protein without a blueprint. The blueprint exists in the genes in the DNA. The information in the DNA is copied to a molecule called messenger RNA, abbreviated mRNA. The entire DNA is not copied into the mRNA molecules; only small parts of it are copied, and these parts contain the information necessary to manufacture the protein needed. The process between DNA and protein consists of many steps and contains many complex chemical processes. mRNA is not enough: one also needs something called tRNA, which stands for transfer RNA. Transfer RNA helps bind the amino acids to components in the mRNA molecule, and it is only one example of the many components involved in the process of creating protein from the description in DNA.

With the knowledge of how the cell cycle works and how the manufacturing of protein works, it is easy to see that the genes cannot order the production of the same amount of proteins at all times. In a multicellular organism the DNA contains information for the whole organism, but a single cell does not need proteins that are used in other cell types. From the two previous sentences it is clear that all genes cannot be active all the time in every cell type. Therefore it is necessary to regulate how much protein different genes shall "produce". A cell can control the expression of a gene at different levels, so the random Boolean network can be seen as a projection onto the gene level. In other words, the model does not contain any information about the level at which the control is performed; it only contains information about the genes that caused the controlling mechanism.

A short comparison between the genetic regulation of the eucaryotic cell and (random) Boolean networks is given below. The word random is put in parentheses because the common features with the genetic regulation of the eucaryotic cell do not depend on the network being random. The first common feature is that a gene can affect any other gene, no matter its position in the DNA; this is also true for (random) Boolean networks. The second common feature is that genes are regulated by combinations of proteins, which are expressed from several genes; this is consistent with (random) Boolean networks, which allow several genes to affect a single gene. [4]

The networks simulated in this report are not all random in the sense of random Boolean networks. Some have a fixed network, which has been proposed as a model for different regulatory networks. In one sense these networks are also random, since the Boolean rules are assigned randomly.

Theory

Graphs and networks

The networks are represented by graphs, or more precisely by so-called digraphs. Before the concept of digraphs is described, some properties of graphs are presented. The main components of a graph are edges and vertices (see Fig 2). The dots in Fig 2 represent the vertices and the arrows represent the edges. Arrows are used because the graph in Fig 2 is a digraph, which means that the edges are directed; in an ordinary graph simple lines represent the edges. The reason for using a digraph instead of an ordinary graph is that it is not always desirable for two vertices connected by an edge to have a mutual influence on each other.

Fig 2: The figure shows an example of a digraph.

A graph G is defined by Eq1, where V(G) and E(G) are the set of vertices and the set of edges, respectively. One way of representing a graph defined by Eq1 is to use an adjacency matrix. A slightly modified version of an adjacency matrix is used in the implementation; for more information about the adjacency matrix and its implementation, see the topic Implementation of the network representation. [5]

G = { V(G), E(G) }   (Eq1)

Eq1 is not enough to describe the type of networks simulated in this report. The essential part that is missing is the Boolean rules, which have been added in Eq2, giving a complete definition of the networks used.

G = { V(G), E(G), B(G) }   (Eq2)

B(G) in Eq2 is the set of Boolean rules, which together with V(G) and E(G) defines a network. The definition in Eq2 opens up a vast number of possible networks, and the formula in Eq3 gives a hint of their number.

NoNet = ( 2^(2^k) · N! / (N − k)! )^N   (Eq3)

N in Eq3 denotes the number of vertices in the network and k denotes the indegree of every vertex, i.e. the number of inputs to a vertex. The formula in Eq3

does not give the number of every possible network, since it assumes that every vertex has the same number of indegrees. To get a picture of how large the number of possible networks is, consider a network with N = 10 and k = 2: Eq3 gives NoNet ≈ 3.83·10^31. The possible number of networks is even larger if k represents the mean number of inputs, for an arbitrary probability distribution, instead of the actual number of inputs. [2]

One type of graph is the Erdős–Rényi graph (ER-graph); from now on the term ER-network will be used instead. In an ER-network the edges are chosen from all possible edges with equal probability. In a digraph the number of possible edges is N(N−1), where N is the number of vertices, so the probability for one edge to be chosen is 1/(N(N−1)). For large N the ER-network has both Poisson distributed indegrees and Poisson distributed outdegrees. The Poisson distribution is given in Eq4.

p(k) = exp(−m) · m^k / k!   (Eq4)

The letter k in Eq4 denotes a non-negative integer, which represents either the number of indegrees or the number of outdegrees, and m represents the corresponding mean: for example, when k represents the number of outdegrees, m represents the mean number of outdegrees. Fig 3 shows the Poisson distribution for some different values of m. [6]

Fig 3: The figure shows the Poisson distribution p(k) for three different values of m (m = 0.5, m = 1 and m = 2).

One interesting property of ER-networks with Boolean rules drawn from a flat distribution is that, above a certain value for the mean number of indegrees, the network will

be placed in the chaotic regime. The value for the mean number of indegrees is obtained with the help of a simplified model, called the annealed model, which makes it possible to get an analytic result for how the distance between two states evolves with time. In the annealed model new Boolean rules are assigned after every time step. In the networks simulated in this thesis the quenched model is used, which means that the Boolean rules are kept intact for every time step. According to the annealed model, the ER-network is in the chaotic regime when the mean number of indegrees lies above two, and the same value is obtained when numerical simulations of the quenched model are performed. The numerical results for the quenched model and the analytical result for the annealed model can be found in the articles [7] and [8]. The edge between the chaotic and ordered regimes does not always lie at the value two for the mean number of indegrees; it lies there only when the distribution of Boolean rules is flat. Eq5 gives the real edge between the chaotic and ordered regions. The constant K is the average number of indegrees for a vertex and p is the bias, a parameter that affects the distribution of Boolean rules.

2p(1 − p)K = 1   (Eq5)

Another interesting type of network is one with a power-law distribution. These networks are called scale-free networks, and the power-law distribution is given in Eq6.

p(k) = [ ζ(γ) · k^γ ]^(−1)   (Eq6)

In Eq6, k is the number of indegrees or outdegrees and γ is a constant. The normalizing constant ζ(γ) is given in Eq7.

ζ(γ) = Σ_{k=1}^{∞} k^(−γ)   (Eq7)

The constants in Eq7 have the same meaning as in Eq6. [2] The constant ζ(γ) cannot always be estimated by summing a large number of terms. For example, if γ = 1, then ζ(γ) is the harmonic series, which diverges, and it is therefore impossible to find a finite number for ζ(γ).
For values of \gamma slightly larger than one the series converges, but so slowly that summing many terms is not a suitable method. A better method is integral approximation, which works by summing the terms in the beginning and then integrating from the last summed term to infinity. A still better integral approximation is to take the mean of two integrals whose starting points differ by one term (see Eq8).

s \approx s_n^* = s_n + \frac{A_{n+1} + A_n}{2}    (Eq8)

The s in Eq8 is the actual value of the sum over the whole series. The symbol s_n^* denotes the integral approximation in which n terms have been summed. The summed terms are denoted s_n and are given in Eq9.

s_n = \sum_{k=1}^{n} f(k)    (Eq9)

In the case of \zeta(\gamma), f(k) in Eq9 is f(k) = k^{-\gamma}. A_n is described in Eq10.

A_n = \int_{n}^{\infty} f(x)\,dx    (Eq10)

The functions f(x) in Eq10 and f(k) in Eq9 are the same function, but k takes discrete values and x takes continuous values. The expression for A_{n+1} is similar to the expression for A_n, but the lower bound is n+1 instead of n. Now to the question: how many terms should one sum over? To answer this question one needs a way to approximate the error of the approximation, and an expression for the error is given in Eq11.

\left| s - s_n^* \right| \le \frac{A_n - A_{n+1}}{2}    (Eq11)

The expression in Eq11 guarantees that the real error is equal to or less than the result of the expression. The result of applying Eq8 to Eq7 is shown in Eq12, and it is the approximation for \zeta(\gamma). [9]

s \approx \sum_{k=1}^{n} k^{-\gamma} + \frac{n^{1-\gamma} + (n+1)^{1-\gamma}}{2(\gamma - 1)}    (Eq12)

In a similar way it is possible to derive an expression for the error by applying Eq11 to Eq7. This expression is shown in Eq13.

\left| s - s_n^* \right| \le \frac{n^{1-\gamma} - (n+1)^{1-\gamma}}{2(\gamma - 1)}    (Eq13)

An alternative to the function in Eq7 is Eq14, which allows the value k to equal zero.

p(k) = \left[ g(\gamma) \cdot (k+1)^{\gamma} \right]^{-1}, where g(\gamma) = \sum_{k=0}^{\infty} (k+1)^{-\gamma}    (Eq14)

If one applies Eq8 to Eq14 the result is the expression shown in Eq15, which is the formula for approximating the sum.

s \approx \sum_{k=0}^{n} (k+1)^{-\gamma} + \frac{(n+1)^{1-\gamma} + (n+2)^{1-\gamma}}{2(\gamma - 1)}    (Eq15)

The error approximation for Eq14 is shown in Eq16.

\left| s - s_n^* \right| \le \frac{(n+1)^{1-\gamma} - (n+2)^{1-\gamma}}{2(\gamma - 1)}    (Eq16)

There are at least two interesting things about scale-free networks. The first is that they occur in many real networks, such as the world-wide web, the internet and the science

collaboration network [6]. The other interesting thing is that the chaotic regime does not dominate the parameter space, in contrast to the case of ER-networks. The mean connectivity is not a relevant parameter for describing the topology of scale-free networks, so the parameter \gamma is used instead. In the article [10] an expression for determining the critical value \gamma_c is found, which follows from simulations with the Derrida plot. The critical value \gamma_c is the value that lies on the limit between the chaotic and ordered regimes. The expression is given in Eq17.

2p(1-p) \cdot \frac{\zeta(\gamma_c - 1)}{\zeta(\gamma_c)} = 1    (Eq17)

The constant p in Eq17 is called bias and determines the probability for true or false to occur (for more information see the topic Boolean rules). The function \zeta is given in Eq7.

A property of the networks that can be interesting to study, besides the probability distributions for outdegrees and indegrees, is so-called motifs. Motifs are small networks within the network. There are many different motifs; one example is the feed-forward loop [11]. The interesting thing is that in real networks, biological networks and electronic circuits, the occurrence of some motifs is higher than in completely randomly generated networks [12].

Boolean rules

Two types of Boolean rules are described under this topic: nested canalizing rules and rules drawn from a flat distribution.

Flat distributed rules

Flat distributed rules are Boolean rules drawn from the set of all possible Boolean rules. The word flat refers to the fact that all Boolean rules have the same probability of being drawn. The number of Boolean rules grows as the number of inputs is increased; already at a relatively small number of inputs the number of possible Boolean rules is quite large. An expression for calculating the number of possible rules is given in Eq18. [2]

N = 2^{2^K}    (Eq18)

N is the number of Boolean rules and K is the number of inputs. See Tab 2 for a picture of how large the number of possible Boolean rules is for different numbers of inputs. How are the rules chosen with equal probability? To answer this question, consider for example a rule with two inputs A and B. Applying Eq18 shows that the number of possible Boolean rules is sixteen. All possible rules are listed in Tab 1. One interesting property to look at is the number of ones and zeros: taking the outputs of R1-R16 together, half of all entries are ones. From this one can draw the conclusion that assigning either one or zero with probability one half achieves the goal of drawing rules from a flat distribution. It is the values in the output column of a truth table that are assigned one or zero with fifty percent probability.

Tab 1: The table shows all possible rules for two inputs.

A B | R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 R16
0 0 |  0  0  0  0  0  0  0  0  1   1   1   1   1   1   1   1
0 1 |  0  0  0  0  1  1  1  1  0   0   0   0   1   1   1   1
1 0 |  0  0  1  1  0  0  1  1  0   0   1   1   0   0   1   1
1 1 |  0  1  0  1  0  1  0  1  0   1   0   1   0   1   0   1

The probability for assigning either one or zero is often referred to as bias.
The term no bias can be used when the probability for true is fifty percent. Depending on the bias, some rules have a higher probability of being drawn. The flat distribution is, as mentioned before, achieved with a fifty percent probability of obtaining a one; if the probability is less than fifty percent, rules with fewer ones will have a higher probability of being chosen. In Tab 1 it can be seen that, for example, R1, R2, R3, R5 and R9 would then have a higher probability of being chosen, since they contain few ones.

Canalizing and nested canalizing rules

The concept of canalizing rules is described before the nested canalizing rules are explained. Canalizing rules are a subset of all possible Boolean rules and they have a special feature: at least one input can determine the output regardless of the values on the other inputs. An example of a canalizing rule is a+bc, where a, b and c each have the value 0 (false) or 1 (true). The rule is canalizing because if a is one, the output becomes one regardless of the values of b and c. An example of a non-canalizing rule is exclusive or (XOR). XOR is non-canalizing because it is impossible to determine the output value without knowing the values of both inputs. The feature that makes canalizing rules interesting is their ability to repress chaotic behaviour; in other words, a network with a high degree of canalizing rules should lie in the ordered regime. Canalizing rules are present among the rules drawn from a flat distribution of all possible Boolean rules. The probability for canalizing rules depends on the bias and on the number of inputs. The exact dependence is given in Eq19. [13]

Pr_p(C) = (-1)^n \left( p^{2^n} + (1-p)^{2^n} \right) - 2n\,p^{2^{n-1}} (1-p)^{2^{n-1}} + \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^k \left( p^{2^n - 2^{n-k}} + (1-p)^{2^n - 2^{n-k}} \right)    (Eq19)

Eq19 will not be derived here, but a good derivation exists in the source article [13]. The bias is denoted by p and the number of inputs by n; C denotes that the equation gives the probability for canalizing rules. In Tab 2 the number of possible canalizing rules is given for different numbers of inputs. The data in the table is taken from [13].
The conclusion that can be drawn from the data in Tab 2 is that the fraction of all possible Boolean rules that are canalizing decreases as the number of inputs increases.

Tab 2: The table shows the number of possible canalizing rules for different numbers of inputs.

n  | |C|                                        | Total number of possible Boolean rules
1  | 4                                          | 2^{2^1} = 4
2  | 14                                         | 2^{2^2} = 16
3  | 120                                        | 2^{2^3} = 256
4  | 3514                                       | 2^{2^4} = 65536
5  | 1292276                                    | 2^{2^5} = 4294967296
6  | 103071426294                               | 2^{2^6} ≈ 1.8e19
7  | 516508833342349371                         | 2^{2^7} ≈ 3.4e38
8  | 10889035741470030826695916769153787968498  | 2^{2^8} ≈ 1.2e77
9  | ≈ 4.168515213e78                           | 2^{2^9} ≈ 1.3e154
10 | ≈ 5.363123172e155                          | 2^{2^10} ≈ 1.8e308

Now that the properties of canalizing rules have been described, it is time to give a description of nested canalizing rules, which are a subset of canalizing rules. Nested canalizing rules have the advantage of being easy to generate, and by construction they are guaranteed to be canalizing. The main components of nested canalizing rules are two lookup tables, one containing the output values and one containing values that are compared with the input values (see Fig 4). A capital O followed by an index denotes a value in the lookup table with output values. A lower-case o denotes the actual output value, which takes one of the values from the lookup table with output values. A lower-case i followed by an index is an input value, which is compared with the value denoted by I followed by the same index. The first thing that happens is a check whether I1 equals i1, and if it does the value O1 is chosen as output. If I1 does not equal i1 the next position is compared, and this goes on until two equal values are found or all positions in the lookup table have been tried. When no match is found, the output is chosen to equal the value Od.

Fig 4: The figure shows a sketch of how the nested canalizing rules work.

The part that is left to describe about nested canalizing rules is how the output values, and the values compared with the input values, are generated. They are generated randomly, but not from a flat distribution. The distribution is given in Eq20.

P(I_m = true) = P(O_m = true) = \frac{\exp(-2^{-m}\alpha)}{1 + \exp(-2^{-m}\alpha)}    (Eq20)
The letter m in Eq20 denotes the index and is defined as m = 1, 2, ..., k. The symbol \alpha is a constant and is set to seven if nothing else is stated. The distribution in Eq20 should, according to the authors of [14], result in rules that are biologically relevant. Od is not assigned with the help of the distribution given in Eq20; it is assigned the inverted value of Ok. [14]

Entropy

Under this topic the subject of entropy is dealt with: not the entropy defined in thermodynamics, but the closely related information-theoretic definition of entropy. The first task is to define what information is. Given a discrete random variable X = {x_k | k = 0, 1, ..., K}, which can take on the value X = x_k with probability p_k, the information we would get by observing the event X = x_k is defined by Eq21.

I(x_k) = \log\left(\frac{1}{p_k}\right) = -\log(p_k)    (Eq21)

It is possible to use a logarithmic function of an arbitrary base, but in this report and in all simulations the logarithm with base two is used. For the definition in Eq21 to make sense some restrictions on p_k are required, and these restrictions are given in Eq22.

p_k \in [0, 1] \quad and \quad \sum_{k=0}^{K} p_k = 1    (Eq22)

The unit of the information I(x_k) depends on the base of the logarithm; since base two is used here, the unit is bits. Now that information has been defined it is time to give entropy its information-theoretic definition. Entropy is "a measure of the average amount of information conveyed per message" [14]. What a "message" is, in the context of simulating Boolean networks, is discussed further down in the text. A formal definition of entropy is given in Eq23.

H(X) = \langle I(x_k) \rangle = -\sum_{k=0}^{K} p_k \cdot \log(p_k)    (Eq23)

In this report and in all simulations the unit of the entropy is bits because, as mentioned earlier, the logarithm has base two. The entropy H(X) has a lower and an upper bound, given in Eq24.

H(X) \in [0, \log(K+1)]    (Eq24)

The lower bound 0 comes from the fact that if all states but one have zero probability of occurring, Eq23 gives a zero result, because the state with the nonzero probability must have probability one.
The states with zero probability also contribute zero, in accordance with Eq25.

p_k \log p_k \to 0 \quad when \quad p_k \to 0^+    (Eq25)

The upper bound in Eq24 is based on the fact that all K+1 states have the same, non-zero, probability of occurring. [14] Now that entropy and some of its properties have been defined, it is time to describe how entropy can be used for determining whether a Boolean network is in the ordered or chaotic regime. In information theory one talks about the uncertainty of a random variable [15]. The entropy

in thermodynamics is a measurement of the disorder in the system; a higher degree of disorder results in higher entropy [16]. The uncertainty of a variable is analogous to the disorder of a thermodynamic system, so the entropy is a measurement of the degree of disorder in the "message". Higher entropy means a lower degree of order, that is, the "message" is more disordered. What is a "message" in the context of Boolean networks? The answer depends on whether one means the ideal "message", which would contain information about all possible states generated by the network of interest, or the "message" that is practically possible to obtain. To start with, the ideal "message" will be defined. The ideal "message" S(G) consists of the probabilities of all possible states generated by a graph with a certain set of vertices, set of edges and set of Boolean rules (see Eq26). All possible states means the states the network will generate for every possible initial state after an infinite number of time steps.

\{ V(G), E(G), B(G) \} \Rightarrow S(G)    (Eq26)

From the description above it follows that it would be impossible to obtain the ideal "message" for calculating the entropy, because the number of states would be infinite. The first step towards a practical "message" is therefore to limit the number of time steps. What the limit should be is hard to say, because it probably depends on the graph. For most graphs it is also impossible to gather states from all initial states, because the number of initial states grows large even for relatively small networks. For example, a network with 100 vertices has 2^100 ≈ 1.27e30 initial states. It would be too time consuming to gather information from all initial states, so the solution is to choose a few initial states.
The estimation of the entropy, which is based on the practical "message", will be denoted He and is calculated by taking the mean over a number of subsets of S(G) (see Eq27). Each subset P_n(G) consists of the probabilities of every state that has occurred for a single initial state and a definite number of time steps.

P_n(G) \subseteq S(G) \Rightarrow H_n(P_n(G)), \quad n = 1, \ldots, N \quad \Rightarrow \quad H_e = \frac{1}{N} \sum_{n=1}^{N} H_n(P_n(G))    (Eq27)

There is no guarantee that this estimation of the entropy gives a good picture of the degree of order in the network, because the entropy is only calculated from a limited number of subsets, and these subsets might not give a representative picture of S(G). Although this method of estimating the entropy suffers from some disadvantages, it can still be justified, due to a phenomenon typical of chaotic systems: strange attractors. In the discrete systems simulated in this thesis one cannot talk about true strange attractors, because the systems have a finite number of states. The term strange attractor will nevertheless be used to denote systems with a behaviour that seems to be random. Before strange attractors are described, attractors will be described. In ordered systems there are so-called attractors, which are states that the systems are "drawn" to. An attractor is often an equilibrium state or some kind of periodic behaviour. An example of a system with an attractor is a bowl with a marble. As long as the marble is placed within the bowl one can be

sure that no matter where the marble is placed it will roll down to the bottom of the bowl. The position at the bottom of the bowl is an attractor. Now that a description of attractors has been given it is time to describe strange attractors, which are in one way the opposite of attractors. A strange attractor is associated with unpredictability of the motion of the state when there is a small uncertainty about the initial state. Very few strange attractors are non-chaotic according to the definition based on Lyapunov exponents. [17] The question is: how can strange attractors and attractors justify the estimation of the entropy He? The answer is that in an ordered network there will be attractors, which "attract" the initial states to either one state or a cycle of states, and this leads to relatively low entropy. In a system with strange attractors the network will not converge to a state or a cycle of states; the states of the network will appear to be completely random, which leads to very high entropy. The biggest disadvantage of this method is that it does not give an accurate measurement of the order of the system when the system lies in the ordered regime, because depending on the initial state the network can converge to a single state or to a cycle of states. A cycle will of course result in higher entropy than a single state, and the length of the cycle also affects the entropy. It is impossible to know whether the chosen initial states will result in a representative mixture of cycles and single states, and it is therefore not suitable to use this method as an exact measurement of the order in the system. Another disadvantage is that if a cycle that the system has converged to exceeds the number of measured states, the entropy will take on its highest value. The highest practical entropy depends on the number of measured states.
It is given by the logarithm of the number of measured states and has nothing to do with the real maximum entropy for the system. According to [18], numerical simulations of networks in the ordered regime have shown that the cycle length often is of the same order as the square root of the number of vertices; later research has shown that this statement is false [2]. The disadvantages mentioned above make it important to use the method with caution. It is not recommendable to use this method as the sole indicator of whether the system is in the ordered or chaotic regime.

Hamming distance

Hamming distance is a measurement of the difference between two bit streams: it counts the number of positions in which the bit streams differ [19]. Consider two vectors with values either one (true) or zero (false), A = (a_1 a_2 ... a_n) and B = (b_1 b_2 ... b_n). With these vectors the definition of Hamming distance takes the form shown in Eq28 [2].

d(A, B) = \sum_{i=1}^{n} \left| a_i - b_i \right|    (Eq28)

To exemplify the use of Eq28, consider two vectors of size 4, (1 0 0 0) and (1 1 1 0). The Hamming distance for the two vectors is d((1 0 0 0), (1 1 1 0)) = |1-1| + |0-1| + |0-1| + |0-0| = 2.

Derrida plots

The Derrida plot is a tool for determining whether a network lies in the chaotic or ordered regime; in other words, it is a tool to determine if the network is robust. The robustness can be determined by examining how a small perturbation of the initial state affects the dynamics of the network. In practice this is done by measuring the distance between the initial state and a perturbed version of the initial state, then updating the two states and measuring the new distance. The distance in question is the Hamming distance. Given an initial state and a perturbed version of it, SA = (1 0 0 0) and SB = (0 0 0 0), and a network defined by a set of vertices V(G), a set of edges E(G) and a set of Boolean rules B(G), one can derive two new states, SA' and SB', by updating the network one time step. So for example let SA = (1 0 0 0) → SA' = (1 0 0 1) and SB = (0 0 0 0) → SB' = (0 1 0 0). This gives the Hamming distances d(SA, SB) = 1 and d(SA', SB') = 3. These two Hamming distances will from now on be denoted d(T) and d(T+1), where d(T) is the Hamming distance before updating and d(T+1) is the Hamming distance after updating. The value d(T) is plotted along the horizontal axis and d(T+1) along the vertical axis. So far only one point of the Derrida plot has been derived; to get more points, new initial states with new Hamming distances must be sampled. [18]

According to the article [18] the Derrida plot is the binary discrete counterpart of the Lyapunov exponents. The Lyapunov exponent is a well-accepted tool for diagnosing whether a system lies in the ordered or chaotic regime. Lyapunov exponents measure the sensitivity to perturbations of the initial state: a small distance between two initial states is chosen, and when a small amount of time has elapsed the new distance is measured. The new distance can be written as the initial distance multiplied by an exponential with an arbitrary base.
If the average exponent is larger than zero the system is in the chaotic regime, and if it is smaller the system lies in the ordered regime. [17] To evaluate whether the network is in the chaotic or ordered regime, one compares the curve derived from the network with the diagonal of the graph. The diagonal corresponds to the situation where the Hamming distance between the two initial states is equal to the Hamming distance after one time step; in such a situation the system is considered to be in the ordered regime. The network is also considered to be in the ordered regime if the curve derived from the network lies under the diagonal. When the curve lies above the diagonal the network is considered to be in the chaotic regime [18]. To justify the statements made in the previous three sentences it is interesting to compare with the Lyapunov exponents. The network is in the chaotic regime when the Lyapunov exponent is larger than zero, which means that the distance will grow larger than the initial distance as time elapses. This behaviour of growing distances shows up in the Derrida plot as a curve lying above the diagonal. A Lyapunov exponent of zero corresponds to a curve on the diagonal, and a curve below the diagonal corresponds to a negative Lyapunov exponent. No rigorous proof exists that Derrida plots actually measure the presence of chaos, but the similarity to the Lyapunov exponents makes them a credible tool for measuring whether a network lies in the ordered or chaotic regime. Both Derrida plots and Lyapunov exponents measure the sensitivity to perturbations of the initial state.

Noise analysis tool

Entropy and Derrida plots are in one sense noise analysis tools, because they indicate the sensitivity to perturbations of initial states, but to get a picture of how the network is affected by noise over longer periods one needs another tool. One suitable method is a slightly modified version of the method presented in [7]. The method in [7] is based on choosing two initial states with a certain Hamming distance and then updating them over several time steps, calculating the Hamming distance between the states at every time step. To analyze the consequences of noise, only one initial state is chosen here. The state is updated in the same way as in [7], but instead of calculating the Hamming distance between states that originate from two different initial states, it is calculated between states that have been updated with noise and states that have been updated without noise (see Fig 5).

Fig 5: The figure shows an example of how the noise analysis tool works. The number of time steps taken is four.

The idea of the noise analysis tool is to look at a plot of the Hamming distances against the time steps. To determine if one type of network is better than another, one simply plots the Hamming distances for both networks in the same plot. Features to compare are the level of the curves and their slopes. The number of time steps can be chosen arbitrarily. Now that the noise analysis tool has been described it is time to describe the different types of noise that will be used for testing the network.
Basically there are two main categories of noise: structural noise and dynamical noise. Structural noise affects the topology of the network, for example by randomly adding and removing edges. The simulations performed in this report do not include tests for structural noise. Dynamical noise has to do with the state, or with the transfer of the state from one vertex to the input of another vertex. The noise analysis tool has the ability to test three types of dynamical noise: freezing of a state (Delay noise), randomly altering a state (State noise) and randomly disturbing the transfer of states (Transfer noise). Freezing of a state will be described first. The freezing of a state means that for every vertex there is a

probability that its state is not updated (see Fig 6). In Fig 6 the probability of freezing a state is denoted p and the states are denoted by the letters a to d; the index on the letters indicates which time step they come from. It is possible for a state to stay frozen for more than one time step. In Fig 6, ct is an example of a state that is frozen over two time steps.

Fig 6: The figure shows an example of how the Delay noise works.

The second type of noise is when the actual state changes with a certain probability p (see Fig 7). In Fig 7 the probability for a single state to be inverted is denoted p, and an inverted state is denoted by the state followed by an apostrophe. For every single state there is a probability that the state will be inverted. The updating of a state is performed after the noise has been applied.

Fig 7: The figure shows an example of how the noise that affects the actual states works.

The first two noise types are applied directly to the states. The third noise type is not applied directly to the states but to the transfer of states between vertices. Transfer of states only occurs between vertices that are connected by an edge. The transfer of states is

performed before the updating of the states. The behaviour of Transfer noise for a transfer of a state between two vertices is shown in Fig 8. The probability of sending the state without any change is 1-p, and p is the probability that the transfer goes wrong.

Fig 8: The figure shows a schematic sketch of how Transfer noise works.

One question that arises is: why is this third type of noise interesting? The answer is that in the other two types of noise the states are affected directly, but in the third type they are only affected indirectly. If the Boolean rules are arranged in a certain way, the third type of noise is not necessarily visible.

Standard error

In all analysis methods described in this report it can be interesting to take means and standard errors over several Boolean rules and initial states. A formula for estimating the standard error is given by Eq29.

s = \sqrt{ \frac{1}{n-1} \sum_{j=1}^{n} (x_j - \bar{x})^2 }    (Eq29)

It is easier to use another formula for estimating the standard error, and it is possible to derive it from Eq29. The derivation is not given here, but the result is shown in Eq30.

s = \sqrt{ \frac{1}{n-1} \left( \sum_{j=1}^{n} x_j^2 - \frac{1}{n} \left( \sum_{j=1}^{n} x_j \right)^2 \right) }    (Eq30)

Eq30 is more practical to use because one does not have to calculate the mean before calculating the standard error. [20]

Implementation

This section describes how the essential parts of the program have been implemented, that is, the parts that can be considered necessary for judging the validity of the results. The programming was done in C++ using Dev-C++ 4.0.

Implementation of the network representation

The network structure is stored in a slightly modified version of the adjacency matrix. In the adjacency matrix every position represents a possible edge; the positions in the column and row vectors determine which vertices the edge is drawn between. A number at a position could represent the number of edges, but this feature is not implemented [5]. In this implementation the row vectors represent the start vertices and the column vectors represent the end vertices (see Fig 9).

(Graph with the edges a→c, b→a and c→b.)

A(G):
        a  b  c
    a   0  0  1
    b   1  0  0
    c   0  1  0

Fig 9: The figure shows a graph and the corresponding adjacency matrix.

Not every column and row vector needs to be stored, because an adjacency list representation, consisting of several singly linked lists, is used. It saves memory by storing only the positions in the matrix that actually represent an edge. Every vertex is represented by a head node, and nodes with a number that refers to a position in the row of the adjacency matrix are connected to the head nodes (see Fig 10). Head nodes for vertices without indegrees have no attached nodes. [21]

Head nodes:  0 → [1]
             1 → [2]
             2 → [0]

Fig 10: The figure shows the adjacency list representation for the graph in Fig 9.

The head nodes correspond to the end vertices of the edges, and the singly linked list connected to each head node corresponds to the start vertices of every edge that has that end vertex. In Fig 10 the first head node has the number zero, and this corresponds to the first position in the column (see the adjacency matrix in Fig 9).
The node connected to the first head node in Fig 10 has the number one, which corresponds to the second position in the row. The adjacency matrix is represented in this way because it is not necessary to store all the zeros.

Implementation of the updating of states

The updating of states is synchronous, and to get a synchronous updating, input buffers are used. An input buffer is placed "in front" of every vertex. For example, vertex (head node) zero in Fig 10 has a buffer that contains the state from vertex one. All the input buffers are updated before the states (see Fig 11). The updating of the input buffers is performed by a loop that steps through the vector of vertices (head nodes). For every vertex a loop steps through the singly linked list that contains the positions of the vertices whose states are used for updating the input buffer. The input buffer is updated after the singly linked list has been stepped through.

Fig 11: The figure shows a simplified flow chart for how the states are updated.

The procedure for updating the states differs slightly depending on whether nested canalizing rules or flat distributed rules are used. For flat distributed rules the output values are stored in a vector, and the input buffer decides which value is used as output by selecting the position in the vector of output values. The input buffer can be seen as a binary number, and this number can be transformed into a decimal integer, which constitutes the position in the output vector. The first position in the input buffer is defined as the least significant bit and contains the state from the vertex that is first in the singly linked list. A higher position in the input buffer means a more significant bit, and from this it follows that the last position is the most significant bit.
In the case of nested canalizing rules the output is also given by a position in the vector of output values, but the position is not determined in the same way as for flat distributed rules. Instead, the position is determined by comparing a predefined vector with the input buffer (for more details see under the topic Boolean rules in the Theory part).
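One way this comparison can work is sketched below. This is an interpretation of the description above (the first buffer position that matches the predefined canalizing value determines the output, and the extra last output value covers the case where nothing matches); the exact convention is given under Boolean rules in the Theory part.

```python
def nested_canalizing(canalizing_values, output_values, buffer):
    """Evaluate a nested canalizing rule: the buffer is compared position
    by position with the predefined vector of canalizing values, the
    first match determines the output, and the extra last output value
    is used when no position matches."""
    # len(output_values) == len(canalizing_values) + 1
    for i, value in enumerate(canalizing_values):
        if buffer[i] == value:
            return output_values[i]
    return output_values[-1]

print(nested_canalizing([1, 0], [1, 0, 1], [1, 1]))  # 1 (first input canalizes)
```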

Implementation of the creation and updating of the Boolean rules

The first subject to be dealt with under this topic is the creation of flat distributed rules. The first task when creating a flat distributed rule is to determine the number of indegrees (inputs) of the vertex, which is given by the size of the input buffer. If the number of inputs is higher than fifteen, no Boolean rule is created and the program is terminated, since the vector of output values, of size 2^(number of inputs), would otherwise grow very large and cause memory problems (see Fig 12). After the vector of output values has been created, a value, either true or false, is assigned to every position in the vector. To draw a Boolean rule from a flat distribution over all possible Boolean rules, one assigns true or false with fifty percent probability to every position in the vector of output values (for more details see under the topic Boolean rules in the Theory part).

Fig 12: The figure shows a flow chart for the process of creating Boolean rules from a flat distribution: the number of indegrees is determined, the vector of output values with size 2^(number of inputs) is created if the number of inputs is not too high, and random values from a flat distribution are assigned to it.

The creation process for nested canalizing rules is somewhat similar to that for flat distributed rules (see Fig 13). It starts in the same way, by determining the number of inputs, but there exists no upper limit on the number of inputs. Instead of one vector, two vectors are created: one with the same size as the input buffer, and another that is one position bigger. The first vector contains values that are to be compared with the values in the input buffer. The second vector contains the output values. Both vectors are assigned values, either true or false, drawn randomly from a special probability distribution given in Eq20.
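The creation of a flat distributed rule can be sketched as follows; the size limit follows the prose above (more than fifteen inputs is rejected), and the names are illustrative:

```python
import random

def create_flat_rule(n_inputs):
    """Draw a Boolean rule from a flat distribution over all possible
    rules: each of the 2**n_inputs output values is true or false with
    fifty percent probability."""
    if n_inputs > 15:
        # The output vector would need 2**n_inputs entries and cause
        # memory problems, so no rule is created.
        raise MemoryError("too many inputs")
    return [random.random() < 0.5 for _ in range(2 ** n_inputs)]
```

A rule with three inputs thus gives an output vector with eight positions, one per possible input buffer.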
After the values have been assigned, the last position in the vector of output values is set to the inverse of the value at the second to last position.
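A sketch of this creation process, with a plain probability parameter standing in for the special distribution of Eq20 (which is not reproduced here):

```python
import random

def create_nested_canalizing_rule(n_inputs, prob_true=0.5):
    """Create the two vectors of a nested canalizing rule: one with the
    same size as the input buffer (the values compared with the buffer)
    and one that is one position bigger (the output values). prob_true
    is a placeholder for the distribution given in Eq20."""
    canalizing_values = [random.random() < prob_true for _ in range(n_inputs)]
    output_values = [random.random() < prob_true for _ in range(n_inputs + 1)]
    # The last output value is the inverse of the second to last.
    output_values[-1] = not output_values[-2]
    return canalizing_values, output_values
```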

Fig 13: The figure shows a flow chart for the process of creating nested canalizing rules: the number of edges that have the current vertex as end vertex (the number of inputs) is determined, two vectors are created (one with a size equal to the number of inputs and another that is one position bigger), random values are assigned to both, and the last value of the bigger vector is set to the inverse of its second to last value.

When loading or creating a network, the Boolean rules are not created instantly; they evolve as new edges are added to the network. The two methods described earlier under this topic are used for simulations that need to draw statistics from more Boolean rules than the original ones. It is the updating process that creates the original Boolean rules. The first step is to determine the number of inputs of the end vertex of the newly added edge (see Fig 14), and this is done in the same way as in the processes for creating Boolean rules. Depending on whether nested canalizing rules or Boolean rules drawn from a flat distribution are used, one of two different courses of events is executed (see Fig 14). Memory problems can occur if rules from a flat distribution are used. A temporary vector of size 2^x is created and used to store the output values that existed before the new edge was added, where x equals the number of edges (inputs) before the new edge was added. When the output values have been stored in the temporary vector, the vector that originally contained the output values is deleted and a new vector of size 2^y is created, where y equals the number of edges after the new edge has been added. After the new vector of output values has been created, the values from the temporary vector are copied to the first positions in the new vector, and the positions that are left are assigned either true or false with fifty percent probability.
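The growth of a flat distributed rule when an edge is added can be sketched like this (the temporary copy mirrors the description above, although in Python the copy and the new vector can be combined):

```python
import random

def grow_flat_rule(output_values):
    """Grow the output vector of a flat distributed rule from 2**x to
    2**(x + 1) positions when a new edge is added: the old values are
    copied to the first positions and the remaining positions are
    assigned true or false with fifty percent probability."""
    temporary = list(output_values)  # the old output values
    new_values = [random.random() < 0.5 for _ in range(len(temporary))]
    return temporary + new_values
```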
The process for nested canalizing rules performs no check on the number of inputs, since the lengths of the vectors grow only linearly with the number of inputs and will therefore cause no memory problems. Two vectors, one with the output values and one with the values that are to be compared with the input, are copied to two temporary vectors. The two original vectors are then deleted, new ones that are one position bigger are created, and the values from the temporary vectors are copied to the new vectors. New values for the last positions are created in the same way as in the process of creating nested canalizing rules.
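The corresponding growth of a nested canalizing rule, following the description in Fig 14 (all output values except the last are kept, a new value is drawn, and the final position becomes its inverse), might look like this, again with a plain probability standing in for Eq20:

```python
import random

def grow_nested_canalizing_rule(canalizing_values, output_values, prob_true=0.5):
    """Grow both vectors of a nested canalizing rule by one position:
    the canalizing vector keeps its old values and gets one new random
    value; the output vector keeps all values except its last, gets one
    new random value, and ends with the inverse of that new value.
    prob_true is a placeholder for the distribution in Eq20."""
    new_canalizing = canalizing_values + [random.random() < prob_true]
    new_outputs = output_values[:-1] + [random.random() < prob_true]
    new_outputs.append(not new_outputs[-1])
    return new_canalizing, new_outputs
```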

Fig 14: The figure shows the flow chart for the process of updating the Boolean rules after an edge has been added. For nested canalizing rules, the two vectors grow by one position (or are created with sizes one and two if no input existed before), with new values assigned according to the distribution in Eq20 and the last output value set to the inverse of the second to last. For flat distributed rules, the vector of output values grows from size 2^(number of edges - 1) to size 2^(number of edges) via a temporary copy (or is created with two positions if no input existed before), with the new positions assigned true or false with fifty percent probability; if the number of inputs is not below fifteen, the program reports too many inputs.
