Peer-to-peer supercomputing:

de-centralized data tracking and processing.

MARTIN ANDERSSON

Master of Science Thesis in Computer Science Stockholm, Sweden 2011


Thesis in computer science for a Master’s degree in Computer Science.

Examensarbete inom ämnet datateknik för avläggande av civilingenjörsexamen inom utbildningsprogrammet Datateknik.

Martin Andersson, March 2012
Erik Lindahl, Sander Pronk

Typeset in LaTeX


Abstract

This project focuses on several workflow optimizations, as well as a message delivery system guaranteeing transactional message delivery with persistence and in-order server-side handling, for Copernicus, an application for parallel adaptive molecular dynamics developed by the Department of Theoretical Physics at the Royal Institute of Technology in Stockholm in 2011. We introduce and briefly explain the Copernicus workflow and show how the optimizations increase performance. We also conclude that the message delivery system performs with negligible overhead in worst-case scenarios.


Acknowledgments

I would like to thank Erik Lindahl, my supervisor Sander Pronk, and Iman Pouya for all the help during development and for providing knowledge about molecular dynamics and the mathematical background behind it.


Contents

Acknowledgments
1 Introduction
  1.1 Copernicus
  1.2 What is molecular dynamics?
  1.3 Markov state modeling
  1.4 Adaptive sampling
  1.5 Copernicus technologies
    1.5.1 Postprocessing
    1.5.2 Copernicus network
  1.6 Related Technologies
    1.6.1 Folding@Home
    1.6.2 BitTorrent
    1.6.3 Apache Hadoop
  1.7 Motivation and goals
2 Optimizations
  2.1 System design and Implementation details
    2.1.1 Command completion optimization
    2.1.2 Asset tracking
3 The Message Delivery System (MDS)
  3.1 System design and Implementation
    3.1.1 Entities
    3.1.2 Variables and Constants
    3.1.3 Message headers
    3.1.4 Sending and receiving procedure
  3.2 Results
    3.2.1 Implementation Details
    3.2.2 Performance evaluation
4 Future Work
5 Conclusion
Bibliography

1 Introduction

1.1 Copernicus

Copernicus is a platform for supercomputers that allows for distributed simulations of molecular dynamics with kinetic clustering and statistical monitoring.[7] Users can submit jobs (termed projects), and a controller automatically allocates resources to manage the projects and the simulations derived from them.

A Copernicus network consists of servers and clients. Servers form the topology in the Copernicus network and act as routers and storage. Projects are submitted to a server, which becomes the controller server for that specific project.

There are two types of clients: workers and regular clients. Workers can range from a single node to a cluster of hundreds of thousands of nodes in a supercomputer. It is the workers' responsibility to handle all simulations as delegated by a controller. Workers usually have at least one low-latency connection to a server, whereas connections between two servers can be of any quality/latency.

Regular clients are used by users to submit projects to a server via a command interface.

An example of a Copernicus network is shown in Figure 1:

Figure 1: An example of a small Copernicus network architecture with six servers, two of which act as controllers. The overlay network topology is shown in the center with dark solid lines.

The network is encrypted using SSL, and its de-centralized nature makes it straightforward to add your own workers and servers.


A brief description of how a project is handled follows:[7]

1. A project is submitted to a server either from the web interface or from a command-line client. Depending on the project type, a relevant controller plugin is called. The controller partitions the project into several subprojects and simulation jobs (termed commands).

2. As workers are ready they broadcast their availability to the servers in the network. A controller receives such a broadcast and replies by sending a command to each worker.

3. The workers start the job described in the command received. Upon finishing the simulation, each worker sends the result of the simulation (command output data) to the controller server.

4. The controller processes the data and decides whether more simulations are needed or if the project is considered finished.

1.2 What is molecular dynamics?

Proteins, the workhorses of the cells in the body, are assembled from amino acids by ribosomes, following an RNA template. When created, a protein initially resides in an unstructured state called a random coil. For the protein to have a function, for example as an enzyme or antibody, it must first take on a particular three-dimensional structure[8]. This self-assembly is called folding. A protein can fold into a huge range of conformations, many of which are only meta-stable and exist only transiently while the protein moves between states. This is where molecular dynamics comes in; it allows us to predict the physical movements of single atoms by simulating the interactions between the atoms in very short time steps (on the order of magnitude of 10⁻¹⁵ seconds). A molecular dynamics algorithm consists of the following steps (a minimal code sketch follows the list):

1. Give the atoms initial positions and choose a short time step.

2. For each atom, calculate the forces acting upon it and the resulting acceleration.

3. Move each atom according to its acceleration and initial velocity.

4. Move time forward one time step.

5. Repeat the process from step 2 for as long as needed.
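To make the loop structure concrete, here is a minimal, self-contained Python sketch of the steps above (using velocity Verlet integration with NumPy, which averages the old and new accelerations when updating velocities); the toy force_fn stands in for a real force field and is not how Copernicus or any production MD code computes forces:

    import numpy as np

    def md_run(positions, velocities, masses, force_fn, dt=1e-15, n_steps=1000):
        # positions, velocities: (N, 3) arrays; masses: (N,) array.
        # force_fn(positions) -> (N, 3) forces; a real force field (bonds,
        # electrostatics, Lennard-Jones, ...) is application specific.
        forces = force_fn(positions)                  # step 2: forces and accelerations
        for _ in range(n_steps):                      # step 5: repeat as long as needed
            accel = forces / masses[:, None]
            positions = positions + velocities * dt + 0.5 * accel * dt**2   # step 3
            new_forces = force_fn(positions)          # step 2 again, at the new positions
            velocities = velocities + 0.5 * (accel + new_forces / masses[:, None]) * dt
            forces = new_forces                       # step 4: time has advanced by dt
        return positions, velocities

The force evaluation dominates the cost of each iteration, which is why the multiple-time-step idea mentioned in the next paragraph pays off.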

The interactions between the atoms are simplified, and which forces are used and how they are calculated depends greatly on the application of the simulation. It is also possible to let some forces use a longer time step than others to reduce the total simulation time.

Individual simulations of the smallest biomolecules usually have 10,000 to 500,000 atoms, and simulating a time lapse of 1 µs requires on the order of magnitude of 10⁹ simulation steps.[7]

1.3 Markov state modeling

By simulating how proteins fold and misfold, scientists hope to achieve a better understanding of numerous diseases such as Alzheimer's, cystic fibrosis and some forms of cancer, as these diseases and more are believed to be caused by misfolded proteins.[1] The computational requirements of using molecular dynamics to simulate protein folding introduce huge difficulties, which Copernicus addresses by creating a model where geometrically similar conformations are grouped together into micro-states. This process is called clustering. The distances between conformations inside a micro-state are on the order of magnitude of one Ångström. The close proximity of the conformations inside a micro-state implies a kinetic similarity, meaning that moving between these conformations happens rapidly and within only a few molecular dynamics steps.[7]

In the early phases of the simulation, the main task is to identify the micro-states. As it is by no means sufficient to run just a single simulation from a state to discern its trajectories, you instead sample the state by running several such simulations. This way you can gain a statistical estimate of how often and at what rate a given state converts to another.

The sampling is done randomly over the clusters to limit the uncertainties of the state definitions themselves.

1.4 Adaptive sampling

As the states stabilize, the sampling efficiency can be improved by using adaptive sampling. In adaptive sampling, new simulations are started by cherry-picking conformation transitions. You can either sample evenly or, more efficiently, distribute the sampling over the most uncertain transitions. This makes it possible to converge to a set of states together with the kinetic properties of the transitions between them (that is, at what rate a certain conformation moves to another conformation).
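As an illustration only, the following sketch shows the idea of weighting new simulations towards the most uncertain states; the function name and the uncertainty measure are hypothetical and are not Copernicus's actual selection criterion:

    import random

    def pick_start_states(uncertainty, n_new_sims):
        # uncertainty: dict mapping state id -> estimated uncertainty of that
        # state's outgoing transition rates (illustrative measure only).
        # States with higher uncertainty are revisited more often.
        states = list(uncertainty)
        weights = [uncertainty[s] for s in states]
        return random.choices(states, weights=weights, k=n_new_sims)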

1.5 Copernicus technologies

1.5.1 Postprocessing

After each batch of sampling, the new data must be merged with the model so that the controller can properly evaluate where to continue the sampling. This process is referred to as post-processing. In accordance with adaptive sampling, the next batch of simulations is chosen to improve knowledge of the rates of transition between states. Post-processing poses a bottleneck in molecular dynamics, as all the data from previous simulations needs to be present at one location, and the post-processing itself is very difficult to parallelize.

1.5.2 Copernicus network

Copernicus servers communicate using HTTPS and a connection pool, allowing the same connection to be reused across several projects. When sending a message through the network, each Copernicus server routes the message to the next server on the network path from source to destination. A server will most likely not be able to connect to all other servers in the network, so the routing functionality provided by Copernicus servers works in a way similar to the network layer in the OSI model: a server is not responsible for ensuring that the message reaches its destination, only that the message is sent to the next server in the path.

It might seem like Copernicus duplicates the network traffic routing functionality provided by the network layer, but there are several reasons why this has been necessary. Most importantly, routing in a typical network layer protocol, such as IPv4, only uses knowledge of the network topology and the addresses of the network nodes when deciding where to route a packet. In comparison, a Copernicus server uses application domain data to decide where to route a message, making it impossible to acquire the same functionality with a regular network layer protocol without introducing added complexity in the form of a lookup service that could translate abstract Copernicus data to a network address.

1.6 Related Technologies

1.6.1 Folding@Home

It takes approximately 10,000 CPU days to simulate a protein folding, even though the upper timescale of a protein fold is one tenth of a microsecond[1]. As a consequence, the developers behind Folding@Home designed a new way to simulate protein folding by dividing the work between multiple processing nodes and creating an application where normal PCs are linked together to form one of the largest supercomputers in the world; the Folding@Home network can at any time field half a million cores[2].

Copernicus can be thought of as a branch of the idea of distributing your simulations: instead of distributing simulations over PCs, you distribute them over supercomputers and use Copernicus as a platform for your simulations. As supercomputers have grown more common, it becomes easier to duplicate results.

1.6.2 BitTorrent

The BitTorrent protocol is a peer-to-peer file-sharing protocol used for distributing large sets of data. It was first released by programmer Bram Cohen in July 2001[3].

The protocol aims to solve the problem of distributing data from a single source server. It does this by splitting the file to be distributed into several fixed-size pieces and allowing each BitTorrent node to share pieces, as they become available locally, with the rest of the BitTorrent network (a node holding a complete copy is called a seed).

The file also has an associated torrent descriptor file that contains the cryptographic hash of each piece of the file[4]. This ensures that accidental or malicious modifications of files are easily detected and not propagated.

A tracker server must also be available to coordinate the network and maintain a list of seeds for a given torrent.

1.6.3 Apache Hadoop

Apache Hadoop is a project that aims to offer an open-source software framework for reliable, scalable and distributed computing over large sets of data[5]. It is designed to scale up to several thousand servers with petabytes of data and aims to run work in the form of map/reduce jobs near the actual data[6].

Hadoop includes the following subprojects:

• Hadoop Common: The common utilities that support the other Hadoop subprojects.

• Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data and runs on top of the file systems of the underlying operating systems.

• Hadoop MapReduce: A software framework for distributed processing of large data sets on clusters.


1.7 Motivation and goals

In Copernicus, the command output data was previously sent directly from the worker to the controller server whenever a command finished. This was not optimal for several reasons:

• It introduced inconveniences if post-processing were to be de-centralized, as the command output data would have to be re-sent from the controller to the server designated to handle the post-processing.

Also, the aggregated output data from simulations using Markov state models often measures several terabytes in size within a few days. Having all workers send their command output data to a single server could saturate the network and create a bottleneck.

• Because workers and a controller server can be separated by one or several low-bandwidth connections, sending command output data several gigabytes in size can take quite some time. As a consequence, the workers were occupied with a trivial task for a longer period of time (as sending the data is a blocking call).

It is also imperative that no important communication is lost in the network. More specifically, communication needs to be persistent and transactional. That is, on a given network, a message should eventually always reach its destination and should be served exactly once.

On top of that, all messages should be handled in the order they were sent. To enforce this it is necessary to define, on the sender side, a set of previously sent messages that the message depends on. This means that the handling of messages received out of order will have to be stalled until all the relevant messages have been received and handled.

The mechanics responsible for this will need to effectively handle chains containing on the order of 100 messages that depend on each other in a scalable manner.

2 Optimizations

2.1 System design and Implementation details

2.1.1 Command completion optimization

The solution to the problem of how to increase the performance of completing a command becomes apparent when recalling that workers usually have a low-latency connection to the nearest server. Sending command output data to this server is an operation up to several orders of magnitude faster than sending the same data over a high-latency connection directly to the controller server. By not having to send the data over a high-latency network, the worker can continue with another command earlier than it otherwise would be able to.

The scheme for sending the command output data was therefore changed from an "eager" transmission protocol to a "lazy" one (a sketch of the worker side follows the numbered steps below):

Figure 2: A graph of the command completion procedure.

1. Upon completion of a command, the worker sends the command output data directly to its associated server. The worker then immediately broadcasts its availability to the Copernicus network.

2. The worker's associated server sends a data availability notification to the command's controller server, indicating that the command has finished and that the command's output data is available.

3. The project server notes the command as completed. The project server then requests the data from the server holding it, as needed.
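The following sketch illustrates the worker-side half of this lazy scheme; all object and method names (associated_server, store_as_local_asset, broadcast_availability, send_message) are assumed for illustration and are not the actual Copernicus API:

    def complete_command(worker, command, output_data):
        # Worker-side view of the "lazy" scheme; all names are illustrative.
        local_server = worker.associated_server
        local_server.store_as_local_asset(command.id, output_data)   # step 1: data stays nearby
        worker.broadcast_availability()                               # worker is free again
        local_server.send_message(                                    # step 2: tell the controller
            dest=command.controller,
            headers={"type": "data-availability",
                     "command-id": command.id,
                     "asset-location": local_server.hostname})
        # Step 3 happens on the controller: it marks the command as completed
        # and pulls the asset from asset-location only when actually needed.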

2.1.2 Asset tracking

Any server should at all times know what relevant data is stored at remote locations (as in the example of the controller server needing to know the location of all command output data) and what data is stored locally. To facilitate this I introduced a new concept, the asset, with the following properties:


1. An asset should be generic. At the moment we are only interested in tracking command output data, but this will probably change in the future.

2. Assets can be either remote or local.

3. Assets should be accessible by an ID that, together with a combination of its associated project and controller server hostname, should be unique.

4. Copernicus should provide reliable means to pull any asset from a remote location.

5. Both local and remote assets should be stored persistently so that they are recoverable in case of a server shutdown.

6. It should be possible to, in the future, enforce ownership so that a controller for a given project can’t remove or modify project data belonging to another controller without permission.

When a server receives command output data from a worker it stores the data as a local asset. Similarly, when a server receives a data availability notification from another server, it stores an entry for that asset, including the asset location, in a remote assets list and pulls the data from the relevant server when the data is needed.

As mentioned earlier, this opens up the possibility of distributing or relocating post-processing from the controller server to a location that is closer to the center of the command output data needed for the analysis.
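A minimal sketch of what an asset record and tracker along these lines could look like is given below; the class and field names are assumptions, and a real implementation would persist both dictionaries to disk as required by property 5:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Asset:
        # Generic asset record (property 1); unique per (id, project, controller).
        asset_id: str
        project: str
        controller_host: str
        kind: str                         # e.g. "command-output-data"
        location: str                     # hostname of the server holding the data
        local_path: Optional[str] = None  # set only for local assets (property 2)

    class AssetTracker:
        # Keeps track of local and remote assets; a real implementation would
        # persist both dictionaries so they survive a server restart (property 5).
        def __init__(self):
            self.local = {}               # key -> Asset stored on this server
            self.remote = {}              # key -> Asset known to live elsewhere

        @staticmethod
        def key(asset):
            return (asset.asset_id, asset.project, asset.controller_host)  # property 3

        def add_local(self, asset):
            self.local[self.key(asset)] = asset

        def note_remote(self, asset):
            self.remote[self.key(asset)] = asset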

3 The Message Delivery System (MDS)

3.1 System design and Implementation

Communication in Copernicus can be divided into two main groups:

• Project-to-Project messages. These are project/controller-level asynchronous messages sent exclusively between servers. There is no data associated with a response to these messages. The sender need only know that the message has been received, upon which it will denote the message as successfully sent. The asynchronous nature of these messages means that handling the message on the receiving end should be deferred until any resources needed to handle the message are available.

The command output data availability notifications and asset communication described in section 2.1 belong in this group.

• Client-to-Server requests. These are lower-level synchronous requests that need to be handled in real time, and can also return data.

An example of a client-to-server request is the broadcast done by worker-clients to poll the network for a new command.

These two fundamentally different message types introduce different requirements and constraints on the network; while Copernicus is designed to handle some messages arriving out of order or not at all, project-to-project messages should always reach the destination exactly once, and must always be handled in the correct order.

For example, in the case of a server with an attached worker that needs to notify the controller server that command output data is available at a certain location, we don't want the routine responsible for sending the notification message to also take into account all possible scenarios where the message was corrupted in some way. An efficient and guaranteed-to-work message delivery system for sending these kinds of messages in a transactional and persistent manner was therefore needed. With this system, the routine mentioned above can pass the message to the MDS and continue execution under the assumption that the message will eventually reach its destination (provided that both the sending and the receiving server are part of the same Copernicus network), and that the messages are handled at the destination in the order they were sent.

3.1.1 Entities

Following are some conventions used to present information about the MDS design specifications in an unambiguous fashion.


• message: A project-to-project message as described above.

• reply: A reply to a message, sent from the receiving end to the sender over the same connection.

• message input queue: A first-in-first-out queue; part of the message delivery system, used to temporarily store incoming messages.

• message output queue: Each Copernicus server, as part of the message delivery system, has a message output queue used for sending messages.

• Queue handler (QH): A designated thread that acts as a helper in the message delivery system, responsible for sending or receiving messages.

• inqueue history: The IDs of all messages received are stored in this data structure.

• served message history: Stores the IDs of all messages that have been successfully passed to the application layer.

• unsatisfied dependencies graph: A directed graph of all received messages that have at least one unsatisfied dependency, as described by the mds-dependencies header (see below). An edge from node A to B denotes that message A depends on B and implies that B has not been served (the QH responsible for handling B is also responsible for removing B and the edge from A to B from the graph; see the subsection Message headers below).

3.1.2 Variables and Constants

These variables are used by the message queues.

• sequence#: A local counter of the number of messages sent to each destination. Each time a message is passed to the MDS, the value associated with the destination is incremented by 1.

• hostname: Each server has a constant, globally unique name.

• resend-lag: The minimum time between two attempts to send a message. The value of this constant is implementation specific.

3.1.3 Message headers

A number of headers are needed to determine appropriate courses of action when receiving and sending messages and replies. These headers are added to the already existing set of headers in any message and reply. All headers are compulsory unless otherwise specified. A short sketch of assembling these headers in code follows the list.


• mds-id: The message ID, in the form of the sender's port number concatenated with the sender's hostname and the sequence# associated with the destination. Having IDs in this form guarantees that, for each pair of source and destination servers, each message will have a unique ID (as a server's hostname paired with its port is globally unique).

• mds-active: A header with only one possible value, 'true', indicating that a message or reply should be handled by the MDS at the destination and the source. Although this header is redundant, it is used to minimize constraints for future protocol changes.

• mds-dependencies: A comma-separated list of the message IDs of previous messages that need to be handled at the destination before this message can be handled. Using this header guarantees that messages are processed in an arbitrarily specified order. This header is not compulsory and should not be used for replies.

• mds-transaction: Used in replies to denote that a message was received properly. Has only one possible string value, 'true'. This header is always (and only) used for replies. Although this header is redundant, as merely receiving a reply is proof in itself that the message was received successfully, it is used to minimize constraints for future protocol changes, as with the mds-active header.
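The sketch below shows one way the headers above might be assembled on the sending side; the helper name and the ':' separator inside mds-id are assumptions, not part of the specification:

    def build_mds_headers(hostname, port, sequence_no, depends_on=None):
        # mds-id: sender port, hostname and per-destination sequence# concatenated;
        # the ':' separator is an assumption, the components follow the spec above.
        headers = {
            "mds-active": "true",
            "mds-id": "%d:%s:%d" % (port, hostname, sequence_no),
        }
        if depends_on:                              # optional, never used for replies
            headers["mds-dependencies"] = ",".join(depends_on)
        return headers

    # A reply acknowledging receipt would carry only:
    #   {"mds-transaction": "true", "mds-id": <id of the acknowledged message>}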

3.1.4 Sending and receiving procedure

Sending a message always starts with the message being put in the server's message output queue, which acts as a list of messages that should be sent. The message is then picked up by the MDS and assigned an ID and a time-to-live (TTL) value. The sending server's sequence# is also incremented by one. A signal is then sent to notify a vacant queue handler (QH) that a new message is available in the queue, upon which the following steps take place (this is also done periodically to ensure that previously failed messages are re-sent). Code sketches of the sending and receiving loops follow Figures 3 and 4, respectively.

Source:

1. The QH (just notified) removes a message from the top of the queue and checks when the message was last sent.

2. If the QH failed to pop a message (the queue is empty), it suspends (becomes vacant). Else, if the message should not yet be re-sent, as indicated by the resend-lag value, and the message's TTL is above zero, the QH pushes the message back to the bottom of the queue. (The TTL value for a message can be set to a negative value, indicating that this message should always be re-sent until properly received.)

3. Else the QH reduces the message's TTL value by one and sends it to its destination with the additional headers mds-active, mds-id and mds-dependencies.

4. The QH waits for a reply acknowledging that the message was successfully received. If the connection times out, the message is pushed back to the output queue.

5. The QH waits for a notification from the MDS, then repeats from step 1.

Figure 3: A graph of the MDS sending procedure
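The following is a loose Python sketch of the sending loop (steps 1-5); the message attributes, the send_fn callback and the exact re-send policy details are assumptions, not Copernicus's actual code:

    import queue
    import time

    def sender_qh_loop(out_queue, wakeup, send_fn, resend_lag=5.0):
        # out_queue: queue.Queue of messages with .last_sent (initialized to 0.0),
        #            .ttl and .headers
        # wakeup: threading.Condition signalled when a new message is queued
        # send_fn(message) -> True if an acknowledging reply was received
        while True:
            try:
                msg = out_queue.get_nowait()                    # step 1
            except queue.Empty:
                with wakeup:                                    # step 2: queue empty, suspend
                    wakeup.wait(timeout=resend_lag)             # until notified (or timeout)
                continue
            if time.time() - msg.last_sent < resend_lag and msg.ttl > 0:
                out_queue.put(msg)                              # step 2: too early to re-send
                continue
            msg.ttl -= 1                                        # step 3
            msg.last_sent = time.time()
            if not send_fn(msg):                                # step 4: no reply / timed out
                out_queue.put(msg)                              # re-queue for a later attempt
            # step 5: loop around and wait for the next notification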

Destination:

6. At the destination, if the message received has the header mds-active with value 'true', an acknowledgment is returned on the same connection with the headers mds-transaction and mds-id.

7. The inqueue history is checked for a message with the same ID. If no message with the same ID is found, the message received is put in the message input queue and the message’s ID and hash are saved in the inqueue history.

8. An available QH pops a message from the message input queue or waits until one is available.

9. If the message has an mds-dependencies header, the served message history is checked to see whether each message the original message depends on has been served. If at least one has not, the message is stored as a node in the unsatisfied dependencies graph and the QH returns to step 8.


10. Else the QH passes the message on to whatever entity is responsible for serving the message and marks the message as served in the served message history (in that order).

11. The QH looks up any messages in the unsatisfied dependencies graph that depend on the current message (referred to as dependees) and deletes its own node and any edges leading to it. Then, for each dependee, if it has no unsatisfied dependencies (no outgoing edges), it is pushed to the back of the message input queue.

12. The QH returns to step 8.

Figure 4: A graph of the MDS receiving procedure
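Correspondingly, a loose sketch of the receiving loop (steps 8-12); the served_history, dep_graph and serve_fn objects are assumed interfaces mirroring the entities in section 3.1.1, not the actual implementation:

    def receiver_qh_loop(in_queue, served_history, dep_graph, serve_fn):
        # in_queue: queue.Queue of messages already acknowledged and de-duplicated
        # served_history: set of mds-ids already passed to the application layer
        # dep_graph: unsatisfied-dependencies graph offering add() and resolve()
        # serve_fn(message): hands the message over to the application layer
        while True:
            msg = in_queue.get()                                       # step 8 (blocks)
            deps = msg.headers.get("mds-dependencies", "")
            pending = [d for d in deps.split(",") if d and d not in served_history]
            if pending:
                dep_graph.add(msg, pending)                            # step 9: stall it
                continue
            serve_fn(msg)                                              # step 10: serve, then record
            served_history.add(msg.headers["mds-id"])
            for dependee in dep_graph.resolve(msg.headers["mds-id"]):  # step 11
                in_queue.put(dependee)                                 # now unblocked, re-queue
            # step 12: loop around for the next message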

As the inqueue history, served message history and unsatisfied dependencies graph are all shared by the QHs, the methods of these objects need to be thread-safe. Also, any implementation should ensure that all manipulation of the unsatisfied dependencies graph only locks the part of the graph needed for the operation. This ensures that disjoint pairs of messages with unsatisfied dependencies can be handled concurrently without excessive blocking.

Ensuring messages are sent in a timely manner can be done by using a condition variable. A thread waiting on the condition variable is suspended until the thread pushing messages to the output FIFO queue wakes it. A timeout should be used on the wait operation to ensure that the QHs periodically check the output queue for messages that need to be re-sent.
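A minimal sketch of this notification mechanism using Python's threading.Condition, with a timeout on wait() to realize the periodic re-send check, could look as follows (names are illustrative):

    import threading

    send_cond = threading.Condition()

    def enqueue_message(out_queue, msg):
        # Producer side: queue the message, then wake one vacant queue handler.
        out_queue.put(msg)
        with send_cond:
            send_cond.notify()

    def wait_for_work(resend_lag):
        # QH side: sleep until notified, or at most resend_lag seconds, so the
        # output queue still gets checked periodically for messages to re-send.
        with send_cond:
            send_cond.wait(timeout=resend_lag)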

The protocol relies on some book-keeping to function properly over long periods of uptime and after a server restart. Old entries in the inqueue history and served message history should be removed periodically to ensure efficiency and limit memory usage over long periods of uptime.

3.2 Results

3.2.1 Implementation Details

The message delivery system was implemented in Python using the threading library for synchronization primitives such as condition variables and locks. The queue handler threads were also implemented using this module. Due to the Python global interpreter lock (GIL), only one Python thread can execute code at any time[10], and this limits any possible performance gains from having several threads handle incoming and outgoing messages. The multiprocessing module can be used to create subprocesses instead of threads, effectively side-stepping the GIL[11]. However, the multiprocessing module places some constraints on the host operating system. As Copernicus is meant to be completely OS independent, and the queue handlers have only minor performance demands, the threading module was chosen over the multiprocessing alternative.

Both the input and output sides of the MDS were implemented as singleton classes. When instantiated, state and synchronization objects are created and the queue handler threads are spawned with these objects as arguments. All state and synchronization objects had to be made thread-safe, as they would be accessed by several threads simultaneously.
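One way to express such a singleton in Python is sketched below; the attribute names and the number of handler threads are assumptions, and the handler body is left as a stub:

    import queue
    import threading

    class MessageDeliveryOutput:
        # Output side of the MDS expressed as a process-wide singleton (sketch).
        _instance = None
        _instance_lock = threading.Lock()

        def __new__(cls, n_handlers=4):
            with cls._instance_lock:
                if cls._instance is None:
                    inst = super().__new__(cls)
                    inst.out_queue = queue.Queue()        # shared, thread-safe state
                    inst.wakeup = threading.Condition()
                    inst.handlers = [
                        threading.Thread(target=inst._handler_loop, daemon=True)
                        for _ in range(n_handlers)
                    ]
                    for t in inst.handlers:
                        t.start()
                    cls._instance = inst
            return cls._instance

        def _handler_loop(self):
            # A real handler would run the sending procedure from section 3.1.4;
            # this stub just waits so the sketch stays self-contained.
            while True:
                with self.wakeup:
                    self.wakeup.wait(timeout=5.0)

Every call to MessageDeliveryOutput() then returns the same object, so all parts of the server share one output queue and one set of handler threads.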

The unsatisfied dependency list was implemented using a directed graph. Nodes and edges in the graph represent a message and an unsatisfied dependency, respectively. For any message, it had to be possible to list all messages that depended on it, and the other way around: for any message it had to be possible to list all messages that it depended on. To implement these operations efficiently, an edge had to be known from both nodes it was connected to.

A hash table was used to acquire constant-time lookup of nodes in the graph. The graph itself was stored as an adjacency list: a representation of a graph where each node is represented by an element in an array. In its simplest form, each node/element in the array points to a list of all nodes it has an edge to.

The alternative would be to use an adjacency matrix, but that would require the graph size to be constant, which would introduce a maximum on the number of unsatisfied messages that could be kept at any one time. It would also be quite memory-inefficient, as resizing a matrix is O(n) times more demanding than resizing an array, where n is the number of nodes in the graph. The edge lists in an adjacency list can also vary in size depending on the number of edges in the graph, making it more memory efficient in this case than the adjacency matrix.

The unsatisfied dependency graph also had to be thread-safe; the trivial way of doing this was to use a lock. However, this would lead to the whole graph being locked by one thread at a time, which could severely limit performance. A better way of making the graph thread-safe was to lock only the part of the graph needed for the more complex algorithms. That way, several threads can work on disjoint sets of the graph concurrently without blocking each other.
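A sketch of such a graph, stored as adjacency sets behind hash tables, is shown below. For brevity it uses a single lock around the dictionaries rather than the per-node locking argued for above; the add/resolve interface matches the receiving-loop sketch in section 3.1.4 and is an assumption, not the thesis code:

    import threading
    from collections import defaultdict

    class DependencyGraph:
        # Node A -> B means "message A depends on message B" (B not yet served).
        # Nodes live in hash tables for constant-time lookup; every edge is kept
        # in both directions so dependencies and dependees can be listed either way.
        def __init__(self):
            self._lock = threading.Lock()     # coarse lock; see the note above
            self._out = defaultdict(set)      # msg id -> ids it still depends on
            self._in = defaultdict(set)       # msg id -> ids that depend on it
            self._stalled = {}                # msg id -> the stalled message object

        def add(self, msg, pending_deps):
            mid = msg.headers["mds-id"]
            with self._lock:
                self._stalled[mid] = msg
                for dep in pending_deps:
                    self._out[mid].add(dep)
                    self._in[dep].add(mid)

        def resolve(self, served_id):
            # Delete served_id's node and the edges leading to it, and return any
            # stalled messages that now have no unsatisfied dependencies (step 11).
            ready = []
            with self._lock:
                for dependee in self._in.pop(served_id, set()):
                    self._out[dependee].discard(served_id)
                    if not self._out[dependee]:
                        del self._out[dependee]
                        ready.append(self._stalled.pop(dependee))
                self._out.pop(served_id, None)
                self._stalled.pop(served_id, None)
            return ready

A production version would replace the single lock with per-node locking, as discussed above, so that disjoint chains of messages can be resolved concurrently.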

3.2.2 Performance evaluation

With the implementation described above, looking up a node's dependencies/dependees in the unsatisfied dependency list (u.d.l.) is an O(n) operation, where n is the number of dependencies and dependees of that node. The presence of any outgoing edge from a node means that there is at least one unsatisfied dependency associated with that node. Thus, traversing the u.d.l. looking for messages with resolved dependencies has a worst-case time complexity of O(m), where m is the number of nodes in the u.d.l. This means that the worst-case time complexity of resolving the unsatisfied dependencies for all messages in a chain of messages that depend on each other is O(m²), where m is the number of messages in the chain.

A test was designed to evaluate the overhead and efficiency of the message delivery system: it would send a number of non-persistent messages and compare this with sending the same messages in a persistent manner using the MDS. For further comparison, the same messages were sent persistently with the order scrambled by 5%, 10%, 20%, 50% and 100%. Each burst consisted of 100 messages and was sent 100 times, for a total of 10⁵ messages. The bursts did not overlap on the sender and/or receiver side, and each consecutive burst was sent without a restart of any server:


Figure 5: Sampling of time taken to send a burst of 100 messages using the MDS and sending the messages in order.

Figure 6: Sampling of time taken to send a burst of 100 messages using the MDS and sending 10% of the messages out of order.


Figure 7: Sampling of time taken to send a burst of 100 messages using the MDS and sending 20% of the messages out of order.

Figure 8: Sampling of time taken to send a burst of 100 messages using the MDS and sending 50% of the messages out of order.


Figure 9: Sampling of time taken to send a burst of 100 messages using the MDS and sending 100% of the messages out of order.

We notice that there is no apparent increase in the time taken to send each burst as the number of bursts previously sent increases. This is because the MDS is stateless between disjoint groups of messages; that is, the MDS will not take proportionally longer to receive messages as the number of messages received increases. Also, the stability is apparent, as the difference between the lowest and highest time elapsed is a mere 0.03 seconds, which is only 0.5% of the average time taken to send a burst.

Another way of measuring the performance of the message delivery system is to compare the average time taken to send one burst. The following figure shows this comparison:


Figure 10: A comparison of the average time taken to send one burst without the MDS and with the MDS at different re-order scramble levels.

We can see that although there is no definite time increase between, for example, 10% and 20% re-ordering, the trend is that the more messages arrive out of order, the longer it takes for the server to process the burst. This is to be expected; messages received out of order cannot be handled right away, but are stored in the unsatisfied dependency list until the dependency is received, upon which the u.d.l. has to be traversed. It should also be noted that both the absolute and relative differences between the average time to send a burst without the MDS and the time to send a burst with the MDS at 100% scramble are low: 4.6 ms and 0.8%, respectively.

4 Future Work

Albeit simple, the idea of a controller server tracking project resources leaves little or no room for sharing resources at the network level, between independent projects. A distributed hash table (DHT) could prove useful for sharing data in a distributed and redundant fashion, and could be used to store anything from worker-client executables to the locations of project assets. There are several implementations of DHTs available, Chord being one of them. Chord is useful in the sense that it is fairly straightforward to understand and implement while also providing logarithmic scaling with the number of nodes in the network[9].

Making project resources available over the network introduces new possibilities for how post-processing is managed. As of now, post-processing is always done by the controller server, when in fact it can be more efficient to do it wherever the command output data is located. This way you would save time and network resources by reducing the amount of data sent between servers.

This can be improved further by taking advantage of the fact that some servers reside on the same machine or at least share a virtual file system. Additional time and network resources can be saved by making Copernicus servers take advantage of this.

5 Conclusion

The optimizations and improvements presented in this paper augment Copernicus in several ways. The improvements made to how a worker finishes commands increase the throughput of simulations over time by moving the burden of sending output data from the worker to a nearby server.

Asset tracking allows project controllers to "lazy-load" resources. This has the potential to reduce peak stress on the network by letting controllers pull data when it is actually needed. It also enables future improvements such as decentralized/distributed post-processing.

The message delivery system enables transactional support for single messages or entire chains of messages, and guarantees that messages are handled server-side in an arbitrary order specified by the entity responsible for sending the messages. In this way, message sending can be done in a more abstract manner, with no worries about message corruption or network reliability.

We have also shown that the message delivery system has only a minute performance impact, with a mere 0.8% increase in the time taken to send message chains sent in reverse order compared to sending the same messages without using the MDS.


Bibliography

[1] Folding.stanford.edu (2000) Folding@home - Science. [online] Available at: http://folding.stanford.edu/English/Science [Accessed: 11 Apr 2012].

[2] Folding.stanford.edu (2000) Folding@home - Papers. [online] Available at: http://folding.stanford.edu/English/Papers [Accessed: 11 Apr 2012].

[3] Finance.groups.yahoo.com (2001) BitTorrent - a new P2P app. [online] Available at: http://finance.groups.yahoo.com/group/decentralization/message/3160 [Accessed: 11 Apr 2012].

[4] Bittorrent.org (2008) The BitTorrent Protocol Specification. [online] Available at: http://bittorrent.org/beps/bep_0003.html [Accessed: 11 Apr 2012].

[5] Hadoop.apache.org (2012) What Is Apache Hadoop? [online] Available at: http://hadoop.apache.org/ [Accessed: 11 Apr 2012].

[6] Wiki.apache.org (2009) Project Description - Hadoop Wiki. [online] Available at: http://wiki.apache.org/hadoop/ProjectDescription [Accessed: 11 Apr 2012].

[7] S. Pronk, P. Larsson, I. Pouya, E. Lindahl, P. M. Kasson, V. S. Pande, B. Hess, K. Beauchamp, I. S. Haque and G. R. Bowman. Copernicus: A new paradigm for parallel adaptive molecular dynamics. Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, Washington, 2011.

[8] B. Alberts et al. (2002) Molecular Biology of the Cell. 4th ed. New York: Garland Science.

[9] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, New York, NY, 2001.

[10] Docs.python.org (2012) Python threading module. [online] Available at: http://docs.python.org/library/threading.html [Accessed: 11 Apr 2012].

[11] Docs.python.org (2012) Python multiprocessing module. [online] Available at: http://docs.python.org/library/multiprocessing.html [Accessed: 11 Apr 2012].
