Availability of Smart Contracts that Rely on External Data


Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Computer Science

2020 | LIU-IDA/LITH-EX-A--20/054--SE


Tillgänglighet av Smarta Kontrakt Beroende av Extern Data

Tjelvar Guo

Daniel Han Herzegh

Supervisor: Felipe Boeira

Examiner: Mikael Asplund



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Smart contracts are digital agreements stored and executed on a blockchain. With smart contracts, multiple parties can enter into an agreement whose correct execution is guaranteed by the underlying blockchain. However, there is no inherent way for smart contracts to access data APIs external to the blockchain they reside on, which is needed in order to expand their usefulness. This thesis investigates two approaches to feeding data to a smart contract on the Ethereum blockchain, with regard to the performance metrics gas cost, block delay, and network latency. The investigation is set in the context of train travel, where the time of arrival of a train at its destination is delivered to the smart contract. Receiving and aggregating submissions from varying numbers of passengers is compared to retrieving this data from the dedicated open API provided by Trafikverket using the Chainlink framework. Through experiments that involved issuing transactions and monitoring them, it was found that in the case of passenger submissions with aggregation the gas costs had a linear relationship to the number of passengers submitting data. Compared to lower numbers, the block delay for passenger submissions started to increase at 100 to 300 submissions, whereas it was more or less constant for aggregations. Furthermore, there was no noticeable trend of increasing network latency with an increasing number of submissions. In conclusion, with regard to all performance metrics, it was found that in all cases involving more than five passengers it was cheaper to use Chainlink to fetch data from Trafikverket.


Acknowledgments

Conducting a master's thesis in the midst of a global pandemic has been a unique experience, one which has tested our ability to work independently. However, like any other major project, it could not have been carried out without the help of others. First of all, we would like to thank our supervisor Felipe Boeira and our examiner Mikael Asplund for taking us on and assisting us during this project by giving us valuable feedback. In a similar vein, we would like to extend our gratitude to our opponents Samuel Blomqvist and Björn Detterfelt for helping us improve the thesis with their feedback.

We would also like to thank Kevin Söderberg at HiQ ACE for sharing his time and for being available for discussions as well as for providing us with educational content on smart contract development.

Appreciation is also extended to the various online communities in which we have partaken. We would like to thank two communities in particular: the Ethereum Stack Exchange community, for always being available to answer questions and for all the interesting discussions, and the Chainlink Discord community and the Chainlink developer team, for being generous with their time and answering questions regarding the framework and how it is used. Lastly, we would like to thank our friends and family for always being there and checking up on us.

Tjelvar Guo

Daniel Han Herzegh

September 2020


List of Figures

2.1 The proof-of-work process

2.2 Chainlink flow

3.1 Structural representation of the system with additional components used for the evaluation

3.2 Retrieving data from passengers

3.3 Retrieving data from Trafikverket

4.1 DeRail contract structure

4.2 Gas usage for submissions part of retrieving data from passengers

4.3 Gas usage for aggregation part of retrieving data from passengers

4.4 Submission block delay

4.5 Submission network latency

4.6 Aggregation block delay

4.7 Aggregation network latency

5.1 Retrieving data using multiple oracles

5.2 Retrieving data using multiple oracles with one oracle contract

C.1 Total gas usage for retrieving data from passengers

C.2 Total block delay for retrieving data from passengers

C.3 Total network latency for retrieving data from passengers


List of Tables

4.1 Gas cost - frequency pairs

4.2 Block delay

4.3 Network latency

A.1 Machine configuration

A.2 Boot disk settings

A.3 General firewall rules

A.4 Custom firewall rule settings


Acronyms

EOA Externally Owned Account

EVM Ethereum Virtual Machine

TAL Time At Location


1 Introduction

This chapter introduces the concepts of smart contracts and blockchain technology. These concepts serve as context leading into the presented problem and research questions of this thesis.

1.1 Motivation

The term smart contract was coined by Nick Szabo in his 1994 essay Smart Contracts [37]. Szabo describes a smart contract as a “computerised transaction protocol that executes the terms of a contract” in a way that “minimises the need for trusted intermediaries”. In recent years this concept has seen a rise in interest due to the invention of blockchain technology. A blockchain is a form of distributed ledger technology: a digital ledger whose contents are calculated and stored by multiple nodes in a network, where a node is a server or computer running a specific program. In other words, every node in the network has its own local copy of the ledger. The agreed-upon state of the blockchain is determined by the majority of nodes reaching consensus through some mechanism. This decentralisation is one of the key components that give the blockchain its high tamper resistance and assurance of uptime, since it minimises the trust needed in any single node.

The first blockchain use case, Bitcoin, is used to store a shared record of transactions of the native currency with the same name [31]. In 2014, the capabilities of a blockchain were further explored by Vitalik Buterin with the introduction of the Ethereum blockchain [4, 42]. His proposition was a blockchain which allowed for storage and execution of programs written in the native Turing complete programming language upon which the blockchain itself is built. A program whose design and execution are reliable and tamper resistant, and which minimises the need for trusted intermediaries, is essentially a smart contract as described by Nick Szabo. Henceforth, in this thesis, the term smart contract is used to describe such a program which is stored and executed on a blockchain.

While the notion of a smart contract does sound like an attractive proposition, currently available solutions are still not without their imperfections. For many of the potential use cases there are still significant problems to consider and hurdles to overcome. A problem which has until recently garnered relatively little attention in the academic literature is the validity and availability of decentralised data inputs.

Assuming code execution on a blockchain is trustworthy enough for certain use cases, these use cases might still rely on the values of external data. For example, consider a smart contract which depends on some input. If the value of the input is X it will execute A, and if instead the value of the input is Y it will execute B. In such a case it does not matter how secure the code execution of a smart contract is if the data it receives is faulty or if the data is not received at all. If much is at stake in the outcome of the smart contract execution, it is important that the correct execution is guaranteed. Thus, it is important that the availability and security of the data source and the data retrieval are sufficient to resist possible malicious behaviour or other vulnerabilities which could otherwise cause the resulting data to be unavailable or faulty.

In the academic literature, there are multiple proposals of how to securely relay data between a data source and a smart contract [28, 33, 44]. Less consideration is given to the data sources themselves within the context of smart contracts and decentralised systems. It is not unreasonable to assume that the data source is treated as a trusted third party, even in a decentralised context. On the other hand, it could be argued that having a single data source conflicts with the security logic of the blockchain and smart contract space, where a major component is to minimise the trust needed in any single third party. However, in many cases there are few traditional sources, in some cases just one, which provide the desired data. Nevertheless, a lack of traditional data sources does not necessarily imply a lack of potential data sources. This is especially true in cases where there exist independent entities which are able to report their own respective values. These entities create a new decentralised data source of sorts which is comparable to the already established trusted data source. Thus, it would be interesting to investigate a case which is dependent on some piece of data, where there is both an established single trusted data source accessible via an API and an alternative, decentralised approach to obtaining this data, and to compare the two.

While it can be discussed whether one data retrieval solution is better than the other for ensuring the validity of external data, an equally important aspect is performance. The component of decentralisation, which contributes to the security of the blockchain, also has a drawback. Because all operations are carried out by a network of nodes, a much larger computational cost is generated in a blockchain environment than in traditional systems. Therefore, a computationally expensive data retrieval method can result in the data being unavailable for an extended period of time. In turn, this would affect the availability of a smart contract system dependent on that data. Measuring the performance costs is thus crucial when evaluating and comparing two data retrieval methods.

In this thesis the investigation is set within the context of train travel and train ticket systems. As per Swedish law, the government-owned railway company, SJ, has a set of conditions, using the time at location (TAL), for determining refunds in case of delays [24]. The TAL is simply the time at which a train arrives at its destination. There are several reasons why a train ticket system is suitable for implementation as a smart contract that automatically issues correct refunds based on a TAL. It could reduce the administrative time that passengers and SJ spend on claiming and processing refunds, respectively. Passengers would not need to actively apply for refunds and SJ would not need to handle them. A smart contract on the Ethereum blockchain acts autonomously on a transparent platform, which decreases the amount of trust passengers need to place in SJ to correctly receive the refunds they are entitled to. In essence, the smart contract acts as an independent escrow between SJ and its paying customers.


1.2 Aim

A system such as the one described in Section 1.1 relies on receiving the correct TAL in order to function correctly. Currently, the Swedish Transport Administration, Trafikverket, is the only entity openly providing this data. However, passengers carrying internet-connected mobile phones also have the ability to provide this information and thus act as a decentralised data source.

The aim of this thesis is to investigate and compare two approaches of retrieving data to a smart contract. In this case this entails retrieving the time, at which a train arrives at its destination, to a smart contract system with regard to system design, availability and performance costs. The first approach uses a trusted centralised data source to feed a smart contract with data and the second approach uses a decentralised source in the form of direct submissions of data from passengers of a trip to the smart contract.

1.3 Research Questions

This thesis is an empirical study utilising experiments to conduct exploratory research where the following research question is investigated.

What are the differences in a smart contract system, in terms of performance costs, between having the members of a decentralised data source individually submit data and fetching data from a conventional data source?

1.4 Method Overview

In this study, a simplified train ticket system was implemented as a smart contract on the Ethereum blockchain. Functionality was implemented in order to retrieve external data according to the two approaches specified in the research question, where the external data is the arrival time of a train at its destination. The first approach with the decentralised source consisted of simulated passengers submitting this data which was then aggregated to a single value by the contract. The other approach consisted of a system which fetched the data from Trafikverket’s API. Using these implementations, multiple tests were run in which data was retrieved to the smart contract. Based on the results of these tests, the two approaches were then compared with regard to the performance metrics gas cost, block delay, and network latency.

1.5 Delimitations

Due to monetary and time related constraints, delimitations have been made to the project.

• Research conducted within this thesis is constrained within the Ethereum ecosystem.

• Investigating the correlation between the user specified gas limit and the performance of blockchain transactions is outside the scope of this project.

• Effects on perceived levels of trust between parties involved, as a consequence of implementing the smart contract system, are not taken into account or measured.

• Considerations have mainly been taken with regard to the context area of the thesis, i.e. train travel.

• The smart contract system built does not verify the legitimacy or validity of data received regarding the TAL.


2 Theory

This chapter presents background theory about the context areas of blockchain, smart contracts, external data retrieval, and train travel in Sweden. Furthermore, domain specific development tools applied in this thesis are described. The last section consists of a presentation of related work.

2.1 Blockchain

In 2008, the first blockchain was invented by one or several people using the pseudonym Satoshi Nakamoto [31]. Its purpose was to serve as the public transaction ledger for the cryptocurrency Bitcoin and its network of participating nodes. Since the creation of Bitcoin, multiple blockchains and blockchain networks have been created, with varying degrees of adoption. In this thesis, however, the majority of the focus lies on the public blockchain Ethereum.

A blockchain can be thought of as a distributed, public ledger which stores all transactions that have been conducted by participants in the network [39]. The name blockchain originates from the design of this ledger, which can be viewed as a growing linked list of blocks, where each block contains a group of permanently stored transactions. Identical copies of the ledger are stored by participants in that blockchain network. Henceforth, these participants will be referred to as nodes.

2.2 Transactions

A transaction can be thought of as an externally triggered event that changes the current state of the blockchain. It is “externally triggered” in the sense that an entity “outside” the blockchain, off-chain, must initiate it using an externally owned account (EOA) [14]. In many cases transactions are about the movement of digital tokens between two accounts. For example, consider that Lisa wants to send 2 ether, which is the native token of the Ethereum blockchain [25], from her EOA to Michael’s EOA. An EOA can be thought of as a pair consisting of a public key and a private key. For Michael to receive a transaction, Lisa has to know the public key of his account (from which she can derive an address). In order for Lisa to submit the transaction she has to sign it with her private key. She then propagates this transaction to the nodes of the Ethereum network. Assuming that her signature shows that she has the valid private key, and that she possesses the amount of ether which is to be moved, the nodes of the network further relay the valid transaction to other nodes. It is, however, not stored on the blockchain yet, as all nodes must first reach consensus on the content and order of the blocks. When this is done, Michael receives the ether.

In order to ensure that all nodes store identical copies of the ledger a consensus mechanism is used. In the cases of both Bitcoin and Ethereum, this is currently achieved through a process called proof-of-work [23].

2.3 Proof-of-Work and Mining

While the work of most nodes in the Ethereum network consists of storing a copy of the ledger and validating and relaying newly created blocks to other nodes, there are also some nodes known as miners [15]. Miners are nodes that compete with each other in order to create the next block of the blockchain. This is where the concept of proof-of-work comes into play. Each block in the blockchain has an associated hash sum which has been computed by the miner of that block, using the transactions in the block, the hash sum of the previous block, and an arbitrary number, referred to as the nonce. Before a new block is created and added to the blockchain, each miner groups pending valid transactions into a block. The miner then computes a hash sum using the previously mentioned parameters. In order to create an accepted block, the computed hash sum, which is interpreted as an integer, must be smaller than some target value set by the underlying blockchain protocol. Most of the mining process therefore consists of generating new hash sums by trying different nonces until one miner eventually finds a valid hash sum. That miner signs the block and propagates it to the network of nodes, which verify the validity of the solution and the contents of the block. If the verification succeeds, each node adds the block to its copy of the ledger and relays it further to other nodes [45]. The transactions included in the newly mined block, which is now part of the blockchain, have thus received a block inclusion. The miner of the block receives a block reward, which in Ethereum consists of newly created ether, as well as the transaction fees associated with the transactions of the block.

The proof-of-work process is what makes the blockchain tamper resistant. Changing a block in the past by, for example, altering the transactions within it will cause the hash sum of that block to change. This will consequently cause all subsequent blocks after it to become invalid since they store the hash of the previous block. The process is illustrated in Figure 2.1.

Figure 2.1: The proof-of-work process
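The nonce search and the tamper resistance described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not Ethereum's actual mining algorithm: SHA-256 stands in for the real hash function, transactions are plain strings, and the target is a hypothetical, very easy value so that the search finishes quickly. The names `block_hash`, `mine` and `chain_is_valid` are illustrative only.

```python
import hashlib

# Hypothetical, very easy target: a valid hash, read as an integer,
# must be below 2**244 (roughly 12 leading zero bits out of 256).
TARGET = 2 ** 244


def block_hash(prev_hash, transactions, nonce):
    """Hash the previous block's hash, the transactions and the nonce.

    SHA-256 is an assumption made for this sketch.
    """
    payload = prev_hash + "|" + ",".join(transactions) + "|" + str(nonce)
    return hashlib.sha256(payload.encode()).hexdigest()


def mine(prev_hash, transactions):
    """Try nonces 0, 1, 2, ... until the hash is below the target."""
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if int(digest, 16) < TARGET:
            return {
                "prev_hash": prev_hash,
                "transactions": transactions,
                "nonce": nonce,
                "hash": digest,
            }
        nonce += 1


def chain_is_valid(chain):
    """Check every block's hash and its link to the previous block."""
    for i, block in enumerate(chain):
        recomputed = block_hash(
            block["prev_hash"], block["transactions"], block["nonce"]
        )
        if recomputed != block["hash"] or int(recomputed, 16) >= TARGET:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Mining a first block with `mine("0" * 64, ["Lisa->Michael:2"])` and a second block on top of its hash yields a chain that `chain_is_valid` accepts. Altering the transactions of the first block makes its stored hash wrong, and even if that block is mined again, the second block still points at the old hash, so the chain is rejected.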

2.4 Blockchain Forks

There are occasions when multiple miners each find a valid hash sum at the same time and propagate their respective block to the network [25]. Depending on the positions of the miners in the network their proposed blocks reach different nodes at different times, but since both blocks are valid they are temporarily accepted by all nodes. These blocks share the same predecessor, or parent block, and thus the situation results in a blockchain fork. This is a problem since the consensus and determinism of the blockchain are based on there being one uniform chain, which all nodes agree upon. To account for and solve this problem, the blockchain protocol has the rule that the longest blockchain prevails. This is where the notion of block confirmations arises. In order to determine the main chain in the occurrence of a fork, nodes simply wait a number of blocks before considering a block of transactions to be valid. The more blocks added since the block in question, the more certain a node can be that the block is part of the main chain. Accordingly, the fork which first becomes longer than the other “wins” and becomes the main chain. The discarded blocks in the shorter chain are called uncles. If the transactions in an uncle have not already been included in the main chain, they are considered pending transactions, which are waiting to be included in the main chain. This kind of forking is not a rare occurrence. At the time of writing this report, the uncle rate, which is the rate at which uncles are produced, for the Ethereum mainnet is between 6 and 7%¹.
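The fork resolution and confirmation counting described in this section can be sketched as follows. The sketch follows the simplified "longest chain" rule as stated in the text, represents the block tree as a hypothetical dictionary that maps each block id to its parent id, and counts confirmations as the number of blocks built on top of a block. All names are illustrative.

```python
def chain_to(block_id, parents):
    """Return the block ids from the genesis block up to block_id."""
    chain = []
    while block_id is not None:
        chain.append(block_id)
        block_id = parents[block_id]
    return list(reversed(chain))


def main_chain(parents):
    """Pick the longest chain; on a tie, the block seen first wins."""
    return max((chain_to(b, parents) for b in parents), key=len)


def confirmations(block_id, parents):
    """Number of main-chain blocks added after block_id, or None if discarded."""
    chain = main_chain(parents)
    if block_id not in chain:
        return None
    return len(chain) - 1 - chain.index(block_id)


def uncles(parents):
    """Blocks that are not part of the main chain."""
    chain = main_chain(parents)
    return [b for b in parents if b not in chain]


# Blocks B1 and B2 share the parent A, which creates a fork.
tree = {"G": None, "A": "G", "B1": "A", "B2": "A"}
# A new block C2 is mined on top of B2, so that fork becomes the longest.
tree["C2"] = "B2"
```

With the tree above, `main_chain(tree)` follows G, A, B2 and C2, block B2 has one confirmation, and B1 is reported as an uncle.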

2.5 Block Limits and Gas Usage

Conducting operations on a blockchain network, which is what constitutes a transaction, is both inefficient and resource expensive. This is the case since all nodes in the network must compute the operations of the transaction and also store all transactions that have ever been conducted on the blockchain. Thus, in order to avoid congesting the network and exceeding the storage capacities of nodes, there is both a target time interval at which new blocks are generated and a set limit on the number of operations that can be included in each block. The target time interval, often referred to as block time, is 12 seconds [15, 40]. In order to express the finite resource of available computations in each block, there exists a unit called gas. For every operation there is a corresponding value of gas which represents the computational cost of performing that operation. These values are set by the Ethereum developers. The amount of gas a transaction requires is referred to as gas cost or gas usage. The earlier mentioned transaction fee is the gas cost multiplied by the gas price, which is the amount of ether the initiator is ready to pay per unit of gas. The limit of computations that can be included in each block is expressed as the block gas limit. Sometimes the total accumulated gas cost for pending transactions, i.e. transactions that have not yet been included in the blockchain, exceeds the block gas limit. In these cases it is up to the miner of the block to decide which transactions are included. Since the miner is rewarded with the fees, they have an incentive to choose the transactions with the highest gas prices. Thus, instead of being “first come, first served”, the throughput of transactions on the blockchain at times of high traffic often depends on economic incentives.
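The relationship between gas cost, gas price and the block gas limit can be illustrated with a short sketch. It assumes a purely greedy miner that fills a block in order of descending gas price, which is only one possible strategy since the choice is left to the miner. The block gas limit, the pending transactions and the function names are hypothetical; gas prices are expressed in gwei (1 ether = 10^9 gwei).

```python
BLOCK_GAS_LIMIT = 8_000_000  # illustrative value


def transaction_fee(gas_used, gas_price):
    """Fee paid by the initiator: gas used multiplied by the price per unit of gas."""
    return gas_used * gas_price


def select_transactions(pending, block_gas_limit):
    """Greedily pick the highest-priced transactions that still fit in the block."""
    selected = []
    gas_total = 0
    for tx in sorted(pending, key=lambda t: t["gas_price"], reverse=True):
        if gas_total + tx["gas"] <= block_gas_limit:
            selected.append(tx)
            gas_total += tx["gas"]
    return selected


# A plain ether transfer costs 21,000 gas; at 20 gwei per unit of gas
# the fee is 420,000 gwei, i.e. 0.00042 ether.
fee_in_gwei = transaction_fee(21_000, 20)

pending = [
    {"id": "a", "gas": 5_000_000, "gas_price": 1},
    {"id": "b", "gas": 4_000_000, "gas_price": 5},
    {"id": "c", "gas": 3_000_000, "gas_price": 3},
]
included = select_transactions(pending, BLOCK_GAS_LIMIT)
```

In this example transaction a was submitted first but offers the lowest gas price, so it is left pending while b and c fill the block.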

2.6 Smart Contracts

While many transactions consist of moving digital assets between two accounts, there are multiple other types of operations which can constitute a transaction. Ethereum was the first blockchain which allowed for Turing complete computing using the Ethereum Virtual Machine (EVM) [42]. Programs (i.e. smart contracts) are stored on the blockchain as EVM bytecode [20] and then executed by nodes, using the EVM. To write smart contracts, it is most common to use the programming language Solidity², which is a high level language being developed for this purpose. Similarly to an EOA, a smart contract has a contract account which can receive, store and send digital tokens, such as ether. Transactions involving a smart contract must be initiated by an EOA.

¹ https://ycharts.com/indicators/ethereum_uncle_rate (visited on 12/05/2020)

² https://solidity.readthedocs.io/


2.7 Oracles

In order for all nodes to maintain consensus, the execution of the smart contract code in the EVM “must be totally deterministic and based only on the shared context of the Ethereum state and signed transactions” [1]. This property has a significant effect on the possibilities of bringing external data on-chain. Among other things, it means that a smart contract cannot make calls which request data from an external API. Consider what effects such a feature would have on the network consensus. For two identical requests, the same API can at two different times yield two different results. Thus two nodes, computing transactions in the Ethereum network, run the risk of getting different results when executing the code retrieving the data. Their end states will consequently differ from each other despite having run the same code under the same on-chain context. This will cause an inability for the network to reach consensus.

Since all nodes can agree on and reach consensus on the contents of transactions signed by an EOA, external data can be introduced to the Ethereum blockchain in such a way. An entity which uses this method to bring data to smart contracts can be considered an oracle [1]. Using this broad definition, oracles can be human, software, or hardware.
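The difference between the two situations can be illustrated with a toy simulation. It is a sketch under stated assumptions: contract state is modelled as a dictionary, the external API is a hypothetical function whose answer changes with every call, and signing of the transaction is omitted.

```python
import itertools


def make_api():
    """Hypothetical external API whose answer changes between calls."""
    counter = itertools.count(start=100)
    return lambda: next(counter)


def execute_with_api_call(state, api):
    """Not deterministic: every node queries the API itself during execution."""
    return {**state, "arrival": api()}


def execute_with_transaction(state, tx):
    """Deterministic: every node reads the same value from the transaction."""
    return {**state, "arrival": tx["data"]}


api = make_api()

# Two nodes execute the same code but query the API at different times.
node_a = execute_with_api_call({}, api)
node_b = execute_with_api_call({}, api)
states_diverge = node_a != node_b

# An oracle queries the API once, off-chain, and submits the result in a
# transaction that both nodes then replay.
tx = {"from": "oracle", "data": api()}
states_agree = execute_with_transaction({}, tx) == execute_with_transaction({}, tx)
```

Here `states_diverge` and `states_agree` both evaluate to `True`: the nodes end up in different states when they call the API themselves, and in identical states when the value arrives as transaction data.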

Within the blockchain space, a known issue is the oracle problem. Ideally, an oracle will bring required data to a smart contract trustlessly, i.e. data should always be delivered on time and be valid. There are several proposals in the literature as to how this can be achieved. These are presented in Section 2.10.2.

2.8 Development Tools

In this section, domain specific development tools applied for the work conducted in this thesis are presented.

2.8.1 Solidity

As mentioned in Section 2.6, smart contracts are stored as EVM bytecode. However, they are seldom written directly as such, but rather through the use of some high level language. In this project the programming language Solidity was chosen for this task. While there are alternatives³, Solidity was chosen for multiple reasons. It is developed by the core maintainers of Ethereum, it is the most mature language currently available, it has large community support, and it is the only language that has a pinned repository in the official Ethereum GitHub. At the time of writing the latest version of Solidity is v0.6.3⁴. However, since the vast majority of tutorials and online material targeted v0.5.x, the version of Solidity used in this project was v0.5.1.

Solidity is referred to as a contract-oriented language [30]. This means that contract instances are represented similarly to classes in object oriented programming. Contracts, just as classes, specify data members and methods pertaining to an instance. A differentiating feature of Solidity compared to other object oriented languages is the inclusion of additional primitive types that are used to access information about transactions and blocks. For example, msg.sender is used to determine the address of the account which invoked the transaction. Solidity also supports the concept of events. Events are transaction logs associated with the address of the contract. These logs can be subscribed to and can be used to determine the occurrence of smart contract function invocations and state changes.

³ https://vyper.readthedocs.io/, https://github.com/CornellBlockchain/bamboo

⁴ https://github.com/ethereum/solidity/releases (visited on 20/02/2020)


2.8.2 Test Networks

Ethereum has a main network, also called mainnet, which is currently secured by over 6000 nodes⁵. It is on this network that all transactions containing real economic value take place. However, there are also multiple test networks, also called testnets, which allow developers to test their smart contracts in an environment similar to the main network without having to spend real money on transactions. Currently four different Ethereum test networks exist, namely Rinkeby, Kovan, Görli and Ropsten. The first three of these are proof-of-authority networks. This means that there exist pre-approved participants in the network that are allowed to mine transactions and add blocks to the blockchain. This has the benefit of increased throughput at the cost of decentralisation [35]. These networks were disregarded as testing networks for this project because they do not mimic the behaviour of the Ethereum mainnet, which uses a proof-of-work consensus mechanism. Ropsten, on the other hand, is a public live test network which reproduces the behaviours and conditions of the main network. It similarly uses proof-of-work as the consensus mechanism.

It should be noted that while the block gas limit is to some extent dynamic, on Ropsten it was close to 8,000,000⁶ during this project. Meanwhile, the mainnet had a block gas limit closer to 10,000,000⁷ at the time of this research. Also note that due to weaker incentives, this network has fewer nodes and miners securing its history and state than the main network. Even so, the target block time is equal to that of the main Ethereum network.

2.8.3 Remix

Remix is an open source tool that enables developers to write, test, debug and deploy Solidity contracts through supported web browsers. The reason for using Remix during this project is its streamlined facilitation of deploying smart contracts to any Ethereum test network and its fast setup time. With a few clicks a smart contract, written in Solidity, can be deployed to a live test network, such as the Ethereum test network Ropsten. Additionally, all Solidity compilers are available and no download or installation of software is required which allows for an easy setup process.

During the course of this project a similar online Solidity development tool, called Ethereum Studio, was released⁸. However, the workflow for this project had already been established and Remix fulfilled all the needs of development. Consequently, Ethereum Studio was not further reviewed.

2.8.4 Chainlink

Chainlink is an open source framework for implementing oracles which are meant to be part of a decentralised network⁹. It is the oracle solution that has been implemented and used in this thesis project. There are multiple concepts and components involved when requesting and receiving data from a Chainlink node, for which some explanation is justified. For a visual representation of the general flow, please refer to Figure 2.2. First off, the smart contract which requests data is commonly referred to as the requester contract. A Chainlink node is an off-chain component which observes the blockchain using a dedicated blockchain node. In particular, the blockchain node observes the specific events in a dedicated on-chain smart contract, commonly referred to as the oracle contract. Thus, when the requester contract wants to make a Chainlink request, it sends the request to the oracle contract, which in turn emits an event. Once the Chainlink node notes that it has received a request, it starts working on fulfilling it.

⁵ https://ethernodes.org/sync (visited on 17/04/2020)
⁶ https://ropsten.etherscan.io//blocks (visited on 27/05/2020)
⁷ https://etherscan.io/blocks (visited on 27/05/2020)
⁸ https://studio.ethereum.org/

Figure 2.2: Chainlink flow

To fulfill requests, also referred to as jobs, the Chainlink node uses job specifications. A job specification can be seen as a set of premade instructions, also referred to as tasks, that will execute when the id of the relevant job specification is specified in the request. The available job specifications are defined in the Chainlink node. A job specification consists of at least one initiator and at least one adapter. Initiators are simply the entry points which trigger the execution of the job specification. An adapter, on the other hand, is a piece of software with specific functionality to execute some task. A Chainlink node has multiple built-in adapters which can be used out-of-the-box [6]. Among others, these include HttpGet, HttpPost and JsonParse. Some adapters expect a parameter. For example, the HttpGet adapter expects to receive a URL with the request. Consequently, when sending a request, the requester contract must specify the job id for the desired job specification, and if the job specification includes adapters requiring parameters, these parameters must also be submitted along with the request.
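The relationship between a request, a job specification and its adapters can be sketched in a few lines of Python. This is only an illustrative model, not Chainlink's actual implementation: the names `JobSpec`, `Node`, `http_get` and `json_parse`, the job id `job-1` and the example URL are all hypothetical, and the HTTP call is replaced by a canned response so that the sketch runs offline.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# An adapter receives the previous task's output and the request parameters.
Adapter = Callable[[Any, Dict[str, Any]], Any]


def http_get(_previous: Any, params: Dict[str, Any]) -> str:
    # Stand-in for an HttpGet-style adapter. A real node would fetch
    # params["url"]; here a canned body is returned (assumption).
    canned = {
        "https://example.org/train":
            '{"trip": {"timeAtLocation": "2020-05-27T14:03:00"}}'
    }
    return canned[params["url"]]


def json_parse(previous: str, params: Dict[str, Any]) -> Any:
    # Stand-in for a JsonParse-style adapter: follow a path of keys.
    value = json.loads(previous)
    for key in params["path"]:
        value = value[key]
    return value


@dataclass
class JobSpec:
    job_id: str
    tasks: List[Adapter] = field(default_factory=list)

    def run(self, params: Dict[str, Any]) -> Any:
        # Execute the tasks in order, piping each output into the next task.
        result: Any = None
        for task in self.tasks:
            result = task(result, params)
        return result


class Node:
    def __init__(self) -> None:
        self.jobs: Dict[str, JobSpec] = {}

    def add_job(self, spec: JobSpec) -> None:
        self.jobs[spec.job_id] = spec

    def fulfill(self, job_id: str, params: Dict[str, Any]) -> Any:
        # Plays the role of the initiator: a request names the job to run.
        return self.jobs[job_id].run(params)


node = Node()
node.add_job(JobSpec("job-1", [http_get, json_parse]))
answer = node.fulfill(
    "job-1",
    {"url": "https://example.org/train", "path": ["trip", "timeAtLocation"]},
)
```

The request carries both the job id and the parameters that the adapters in that job need, which mirrors the requirement stated above; omitting the `url` parameter makes the first task fail.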

For requests which involve tasks that are not handled by the built-in adapters, Chainlink offers the ability to configure the node and add functionality through external adapters [18]. For example, a developer can create specific functionality and deploy it as a serverless function. The Chainlink node then connects to this external adapter using a built-in functionality called a bridge [17]. As a node operator, you have to provide a unique name for the bridge and the URL address to the deployed external adapter.

Requesting data from a Chainlink node often incurs a cost. For conducting work, Chainlink nodes accept payment in tokens called LINK. Exactly how much LINK is to be paid for a certain piece of off-chain data is up to the node operator and the requester to agree upon, but it is nevertheless a cost that will present itself unless the node operator for some reason wishes to provide its services for free. Looking at one of the largest node operators in the space at the time of writing, a payment of 0.16 LINK, which at the current exchange rate for LINK equals $2.496¹⁰, seems to be the standard amount¹¹.
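As a quick check of these figures, the exchange rate they imply follows directly from the two stated amounts. The symbol p, denoting the USD price of one LINK, is introduced here only for illustration.

```latex
0.16\ \text{LINK} \times p = 2.496\ \text{USD}
\quad\Longrightarrow\quad
p = \frac{2.496}{0.16} = 15.60\ \text{USD per LINK}
```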

2.8.5 Web3

Web3.js is a software library facilitating communication between web front ends and smart contracts [3]. It allows calling smart contract functions, subscribing to events, access to various Ethereum related utility functions as well as querying blockchain information such as the latest mined block. In the case of this project it is the software library with which all tests have been written and performed. As with all other development tools chosen in this project there are alternatives to web3.js. The biggest contender to web3.js is ethers.js¹². Similarly to web3.js it provides all the functionality listed above. Since the focus of this report is not to compare software libraries that interface traditional web services with the Ethereum blockchain, ethers.js was not further investigated, as the authors had previous experience working with web3.js and all needs were met using it.

¹⁰ https://coinmarketcap.com/currencies/chainlink/ (visited on 24/08/2020)

2.8.6 Truffle Suite

The Truffle suite is a collection of open source Ethereum development tools¹³. It consists of Truffle, Ganache and Drizzle. In this project Truffle is used as a development and testing framework as well as an asset pipeline. Ganache is an Ethereum blockchain emulator designed for testing and to be run on a local machine. Drizzle is a collection of front-end libraries aimed at simplifying writing decentralised application front-ends. However, since the focus of this project is not working with front-ends, Drizzle was excluded from use. Ultimately, Ganache was also excluded from use since it was determined that the smart contract written in this project was to be deployed on the Ropsten network.

The reason for working with a development framework, rather than without one, was that it automates tasks such as deployment and testing which otherwise need to be set up manually. Using a development framework thus saves time and energy.

There are similar frameworks to Truffle such as Embark, Dapp and Builder. However, Truffle was chosen because of the authors' previous familiarity with the framework as well as its more extensive community and GitHub repository size. Additionally, Chainlink provides a so-called Truffle box [21] which can be used with Truffle to quickly get up and running with Chainlink inside Truffle. Since this is not a comparative report on Ethereum development frameworks, no further investigation as to which development framework to use was made. This could be interesting, but is left for future research.

2.8.7 Ethereum Clients

In order to communicate with the Ethereum blockchain an Ethereum client must be utilised. The Ethereum client or Ethereum node is used as the entry point to the blockchain and is what allows for contract deployment and transaction invocations. The Ethereum client connects with web3 and handles all blockchain related requests.

Geth & Parity

Geth and Parity are the two most popular Ethereum clients released so far [26]. They both support the notion of a full, fast and light sync. A full sync entails downloading all the blocks in the blockchain and iteratively processing all the blocks one by one verifying each transaction. A fast sync similarly downloads all the blocks, but does not manually verify each transaction. Instead it uses the transaction receipts and block hashes to do this [19]. A full sync is an extremely time consuming and disk usage intensive task and so is a fast sync despite its name. A benchmark test performed by Péter Szilágyi for the official Ethereum blog showed that using geth, depending on the version, a fast sync could take anywhere between 4 and 11 hours and require 130-180 GiB of disk space, while a full sync could take anywhere from 6 days and 8 hours to 6 days and 15 hours and require 300-340 GiB of disk space [38].

¹² https://docs.ethers.io/
¹³ https://www.trufflesuite.com/


Third Party Clients

Due to the work and friction involved in setting up your own node, there are multiple companies providing Ethereum clients as a service. Some examples of companies which offer these services include Infura¹⁴ and Linkpool¹⁵. These services allow developers and other users of blockchain applications to interact with the blockchain without having to maintain their own Ethereum client. Since these services may be used by multiple entities, it should be noted that a rate limit on the number of requests that can be made, per some time unit, is often imposed.

2.8.8 Metamask

Metamask is a web browser plugin which allows users to create Ethereum wallets as well as interact with smart contracts directly through the web browser¹⁶. Metamask uses Infura as the underlying Ethereum client node, through which interactions with the blockchain occur [13]. Metamask is also able to connect with Remix and act as the EOA that initiates contract deployments and function calls.

2.9 SJ & The Swedish Transport Administration

SJ is the leading passenger train operator in Sweden. The organisation is government owned. In 2018, 31.8 million journeys were made with them. SJ has a reputation of not always being very punctual, which is to some degree represented in the statistics. In their annual report of 2018 SJ reported a punctuality of 77% and 88% for long and medium distance travels, respectively [36]. In the case of delays the passenger is, in certain cases according to Swedish law, entitled to a refund [24]. These cases are clearly stated on the SJ website¹⁷ and are the requirements which are used by the smart contract system built in this project. The issuance of refunds based on these cases is indeed implemented in the system, but is not a part of the research conducted in this thesis. SJ mainly serves as the provider of the real world context for this project.

The Swedish Transport Administration, or Trafikverket, as it is known in Sweden, is a governmental institution in charge of long term infrastructure planning¹⁸. Trafikverket maintains a public open API that can be used to query various traffic information¹⁹. Of interest to this project is the information regarding train trips and specifically the data field in the API responses called timeAtLocation, which can be used to determine the time at which a train used for a trip arrived at its destination²⁰. This is the information that the Chainlink node, set up for this project, has retrieved from Trafikverket and fed to the smart contract.
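To make the role of timeAtLocation concrete, the sketch below shows how such a value could be turned into a delay and into an integer timestamp suitable for a smart contract. It assumes, purely for illustration, that the value arrives as an ISO 8601 string with a UTC offset; the function names and the `scheduled_arrival` parameter are hypothetical and do not come from the Trafikverket API or from the thesis implementation.

```python
from datetime import datetime


def delay_minutes(scheduled_arrival: str, time_at_location: str) -> float:
    # Difference between the reported and the scheduled arrival, in minutes.
    # Both arguments are assumed to be ISO 8601 strings (assumption).
    scheduled = datetime.fromisoformat(scheduled_arrival)
    actual = datetime.fromisoformat(time_at_location)
    return (actual - scheduled).total_seconds() / 60.0


def to_unix_seconds(time_at_location: str) -> int:
    # Contracts usually handle time as integer seconds, so the oracle
    # would deliver the timestamp in a form like this (assumption).
    return int(datetime.fromisoformat(time_at_location).timestamp())


example_delay = delay_minutes(
    "2020-05-27T13:45:00+02:00", "2020-05-27T14:03:00+02:00"
)
```

Which delays entitle a passenger to a refund is defined by the SJ terms referenced above and is deliberately not encoded here.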

2.10 Related Work

The related work explores blockchain and smart contract research, the field of oracles and how to bring external data into smart contracts, as well as the area of Sybil attacks.

¹⁴ https://infura.io/
¹⁵ https://linkpool.io/
¹⁶ https://metamask.io/
¹⁷ https://www.sj.se/sv/reseinfo/resevillkor/ersattning-vid-forsening.html (visited on 26/05/2020)
¹⁸ https://www.trafikverket.se/om-oss/var-verksamhet/trafikverkets-uppdrag/
¹⁹ https://api.trafikinfo.trafikverket.se/Console/
²⁰ https://api.trafikinfo.trafikverket.se/API/Model/


2.10.1 Smart Contracts and Performance

Weber et al. [41] investigate and identify a few availability limitations of both the Bitcoin and Ethereum blockchains. The authors looked at the ability to write to the blockchain, i.e. conducting a transaction which results in a state change, and observed the time it takes for such a transaction to commit, how some transactions never commit, and how some transactions stay pending for an unknown duration of time. In addition, they suggest a mitigation strategy, consisting of a method to abort a pending transaction. Their work relates to ours thematically as both deal with the issue of availability. Their work is, however, more related to scalability issues and on-chain transactions, while our work is focused on off-chain retrieval of data and how that aspect affects the availability of the system. Weber et al. use 12 block confirmations as the criterion for a transaction having been permanently committed to the blockchain, a criterion which is also used in this thesis. However, the studies also differ in some ways. Weber et al. conducted a two-part study, with one part purely observational and the other experimental. Over 6 million transactions were observed as the basis for the first part and, as previously mentioned, these transactions were used to derive results regarding the overall availability of the blockchain. The second part tests the ability to abort transactions. The results for those experiments were derived from 202 transactions, which is far fewer than in the first part and also fewer than the number of transactions issued in this study.
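The 12-confirmation criterion can be expressed as a small check on block numbers. The sketch below is a minimal illustration rather than the measurement code of either study; in particular, it assumes that the block which includes the transaction counts as the first confirmation, a convention that is not stated in the text, and the function names are hypothetical.

```python
REQUIRED_CONFIRMATIONS = 12  # threshold named in the text


def confirmations(inclusion_block: int, latest_block: int) -> int:
    # Assumption: the including block itself is confirmation number one.
    if latest_block < inclusion_block:
        return 0  # the transaction's block is not yet part of the chain view
    return latest_block - inclusion_block + 1


def is_committed(inclusion_block: int, latest_block: int,
                 required: int = REQUIRED_CONFIRMATIONS) -> bool:
    # A transaction is treated as permanently committed once enough
    # blocks have been built on top of the block that includes it.
    return confirmations(inclusion_block, latest_block) >= required
```

Under this convention, a transaction included in block 100 is treated as committed once block 111 has been mined. Latency to commitment is then the time between submitting the transaction and observing that block.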

Yao-Chieh Hu et al. [22] propose and implement a hierarchical architecture composed of two types of smart contracts, and measure the performance of their proposal. Their proposed architecture consists of a custodian and their client(s), with their relationship being that the custodian contract can deploy, access data and call methods of their client contracts. Much of the methodology described in their study has also been applied in this thesis, with some differences. Yao-Chieh Hu et al. measure the performance of their proposal using latency and gas usage as metrics. With the exclusion of block delay, these are the same metrics as those used in this thesis. However, the authors of the article do not mention block confirmations, thus it is presumed that they have only measured the latency between transaction initiation and the first block inclusion of the transaction. Furthermore, they have conducted their performance tests on three test networks, Ropsten, Rinkeby, and Kovan, while the tests of this thesis are only conducted on Ropsten. Finally, their study consists in large part of measuring the performance of deploying contracts. The measurements conducted in this thesis, on the other hand, concern the use of the smart contract’s functionality, and in particular functionality concerned with retrieving external data.

McCorry et al. [29] implement a voting protocol as smart contracts on the Ethereum blockchain with the purpose of evaluating the process as well as the final result of the implementation. While investigating another use case, McCorry et al. have applied a process which is similar to ours. Both projects involve the development of a smart contract system and a performance evaluation which takes gas costs into consideration. Additionally, their design and implementation for voting is reminiscent of the implementation of this thesis for retrieving data from passengers, as both involve the system receiving submissions from multiple EOAs, which are then computed into a single result. Furthermore, McCorry et al. bring up limitations of implementing smart contracts, such as the limited amount of code which can be deployed in an Ethereum block, due to the block gas limit. This is a relevant topic as it may force more complex smart contract implementations to be divided up into several smart contracts. While limitations such as these are relevant to the work involved in this master’s thesis, where the implementation of a relatively complex smart contract has taken place, investigating them is not the purpose of the thesis and they were never a hindrance during the development. It should also be noted that had their study taken place today, they could most likely have used one smart contract for all their code, as the block gas limit has increased since their study took place. Still, the block gas limit is something that must be considered when implementing a complex smart contract system.

Casado-Vara et al. [5] propose a model for solving certain problems in the area of logistics, and in particular the problem of delays in deliveries of products between parties. The model consists of three elements: a blockchain in which all transactions are stored, smart contracts that manage commercial transactions between the different parties, and a multi-agent system enabling the execution of all these operations. According to the authors, their model allows for an efficient logistics system, due to automation. It additionally allows for solid security features, due to the incorporation of a blockchain, such as tracking of shipments and proof of all transactions being stored and unmanipulated. The smart contracts’ terms are used by agents to verify that all parties abide by them. While logistics is not the subject area of this thesis, this article similarly explores a solution which involves different actors and is dependent on external data inputs. However, the article does not examine or evaluate in depth how this data is retrieved and what that process entails. This is a relevant problem in the subject area, and is an area to which this thesis hopes to contribute more data.

2.10.2 Oracles

The challenge of retrieving external data from multiple sources is explored by Laan, Ersoy and Erkin [28]. As they present it, at the time of the release of the article it was not possible to retrieve data from multiple sources without issuing one separate request per data source and thereby retrieving data from each source separately. This requires the oracle to perform one transaction per data source to feed the requesting smart contract with its data. The authors present the concept of a MUlti-Source oraCLE (MUSCLE), wherein they explore the retrieval of data from multiple sources and the subsequent feeding of said data in one transaction. Through the implementation of MUSCLE using five different aggregate signature schemes, they compare the performance of the schemes with each other and with oracle solutions based on TLS-N, which represents the current state of the art. They found that the MUSCLE variants based on the aggregate signature schemes ECDSA and BGLS had the lowest total gas expenditure and the lowest transaction and storage costs, respectively.

H. Ritzdorf et al. [33] propose TLS-N as a so-called non-repudiation solution which can be integrated with the existing TLS 1.3 protocol, adding the ability for data providers to cryptographically sign their data with minimal overhead. While the authors acknowledge several use cases for this, they especially acknowledge blockchain-based smart contracts as being a significant beneficiary of this solution. Since the signature would allow the smart contract to verify the authenticity of the provided data, i.e. that it really came from a certain source, there is no need to trust the transmitter of that data, i.e. the oracle. This solution, however, requires that the data sources modify their infrastructure to include the ability to generate signatures in accordance with the protocol.

The articles [28, 33] are mainly focused on the problem of bringing authentic data to the blockchain and propose solutions based on cryptography and signatures to ensure the validity of the data provided. Both articles investigate approaches which require modifications to the data source. Such an approach was not investigated in this thesis, as there was no possibility to modify the data source. However, both articles evaluate the performance of their proposals. Similarly to our evaluation, both assess performance by measuring gas costs, with [33] also taking time into account.

Zhang et al. [44] tackle the problem of connecting smart contracts to off-chain data feeds by introducing their authenticated data feed system called Town Crier (TC). The system acts as a bridge between smart contracts and websites, and it utilises special hardware in the form of Intel’s Software Guard Extensions (SGX). TC consists of an on-chain contract (CTC) and an off-chain server (STC). The on-chain contract serves as the interface through which smart contracts request external data. STC monitors CTC and consists of a relay and an enclave. The enclave is a part of the SGX and allows for code to be run in a trusted execution environment. The hardware is designed such that neither the OS nor other processes on the host machine can tamper with the secure execution of code inside it. However, the enclave lacks network access, thus a relay is used which provides network connectivity to and from the blockchain, clients and data sources. The relay is a regular user-space application and does not share the benefits of the enclave, thus it is susceptible to tampering. Due to the cryptographic signing of output data from the enclave, the relay cannot tamper with results, although it can perform denial of service attacks. Nevertheless, assuming an honest relay, TC can be trusted to fetch external data in a trustworthy way.

The Town Crier article, which both TLS-N and MUSCLE refer to, also focuses on the problem of bringing authentic data to the blockchain and similarly to this thesis project makes the assumption that since the data source is trusted by the requester/client/smart contract, that data can be trusted. However, instead of verifiable signatures from the data source, the Town Crier-solution uses a hardware-based approach to ensure that the oracles behave honestly.

2.10.3 Sybil Attacks

Douceur [12] coins the term Sybil attack in his article of the same name. He explains that entities within a distributed system do not necessarily know about each other in a physical sense and therefore they use distinct identities to distinguish between themselves. In an ideally and honestly created distributed system where all entities act truthfully, one identity corresponds to one entity. The Sybil attack denotes an attack in a distributed system wherein a malicious entity presents multiple identities so as to appear as multiple entities, thereby gaining a disproportionate amount of influence. There exist three sources that an entity can use to verify the authenticity of the identities of other entities, namely a trusted centralised authority, the entity itself or other entities. Without a trusted centralised authority, which is the case in permissionless blockchains such as the Ethereum blockchain, there are two ways for entities to verify the identity of other entities.

• The entity itself has by some means directly identified another entity. This is known as Direct Validation

• The entity can trust the validity of another entity’s identity as vouched for by other entities which it has already verified. This is known as Indirect Validation

In the article it is found that, in the case of direct validation, even when dishonest nodes are severely resource constrained they are able to forge a constant number of multiple identities. Furthermore, if an honest entity does not simultaneously validate all identities presented to it, a dishonest entity can forge an unlimited number of identities. Such simultaneous validation is unachievable in a large-scale distributed system. As for indirect validation, if the set of dishonest nodes is large enough, an unlimited number of identities can be forged by them. Additionally, a dishonest entity can forge a fixed number of multiple identities unless all entities in the system perform their identity validations concurrently.

Compared to the other areas of research covered in this thesis, the Sybil attack, and vulnerabilities in general, are not the focus of this project. However, as it is such a fundamental attack on decentralised, distributed systems, researching it helped guide some of the design choices made in this study. Smart contracts are not only vulnerable to Sybil attacks launched on the blockchain they reside on; they become even more so when their execution depends on the input of independent identities, such as the passengers of a train. While Douceur's article does not specifically address smart contract systems, it does present a real threat which affects the implementation of smart contracts and thus has an effect on their performance.


3 Method

As revealed by the related literature, an appropriate way of measuring the availability of a blockchain is monitoring transactions and recording the time it takes for them to permanently commit to the blockchain. A similar method was used in this thesis with some modifications. However, the goal of this thesis is not to measure the blockchain availability as a whole, but rather the availability of a smart contract system with regard to fetching and receiving off-chain data. Therefore, transactions to the smart contract, rather than arbitrary transactions taking place on the blockchain, were recorded and measured. As the context of this work is that of train travel, and more specifically retrieving the time at which a train reaches its destination, it was necessary to build a smart contract system capable of doing so.

Furthermore, the goal of this thesis is also to compare two approaches: retrieving this piece of information as reported by a conventional data source, and having passengers of the train directly submit their data to the contract. Therefore, there was a need to implement connectivity between the smart contract system and the off-chain data sources. In the literature, there are multiple proposals with regard to making conventional data sources, such as APIs, accessible to smart contracts. There was, however, only one approach which was feasible considering the data source accessibility levels and resources available to the authors of this thesis. With no access to modify the data source and no intention of acquiring specific hardware, the Chainlink framework was selected as the oracle solution with which to connect the smart contract system to the API provided by Trafikverket.

This chapter describes and motivates the methods used in this thesis project. The chapter is divided into two main sections, each presenting one segment of the method used to conduct the research. The first section gives an overview of the smart contract system built in this thesis and then presents its development, including the implementations for retrieving off-chain data to the smart contract, while the second section describes how the evaluation was conducted. Adhering to the definition provided by B.A. Kitchenham et al. [27], this study classifies as an empirical study utilizing formal experiments. Accordingly, the research conducted in this project largely followed the guidelines laid out in that paper.


3.1 Building a Smart Contract System

In order to perform practical tests, a simplified train ticket system was constructed as a smart contract. Additionally, functionality was implemented to retrieve data according to the two approaches described in the research question. This meant retrieving the time at which a train arrives at a station as reported by the passengers and by Trafikverket. Additionally, to compare the performance of the two, a test suite was built.

The components that comprise the system built in this thesis are illustrated in Figure 3.1. The yellow box contains components that exist on the blockchain, while the blue box contains those that exist outside. Furthermore, the green components have been constructed by the authors, while the red components are out-of-the-box components that were used.

Figure 3.1: Structural representation of the system with additional components used for the evaluation.

The data retrieval from passengers is represented by the vertical flow at the bottom of Figure 3.1. In order to interact with the smart contract, a Geth node was set up. This allowed the test suite to interact with the smart contract. The Test Suite contains the functionality needed to trigger the tests and perform the measurements. Furthermore, it contains the boxes “Passengers” and “Contract manager”. The passengers were simulated as EOAs generated and handled by the test suite. Similarly, a contract manager account was simulated and used to trigger administrative functionality. The Web3.js component houses the code that allows the test suite to format transactions and interact with the Geth node.

The data retrieval from Trafikverket is the other flow at the top of the diagram. As smart contracts do not natively support connectivity to conventional data sources, some oracle solution had to be used. While there are multiple propositions in the literature, they were not feasible for this investigation as they either require modifications of the data source, or access to special hardware. Therefore, the open-source framework Chainlink, which required neither, was used.

In order to use Chainlink to retrieve data to a smart contract, you have to add at least four components.

• A Chainlink node has to be deployed on some server.

• An Ethereum node has to be set up.

• An oracle contract has to be deployed on-chain.

• The smart contract has to be extended with the ChainlinkClient library.

The Chainlink node itself is a program which was deployed on Google Cloud and is the entity responsible for querying the Trafikverket API. Similar to the Geth node, LinkPool is an Ethereum node; it is provided by a third-party service and allows the Chainlink node to interact with on-chain entities. The oracle contract is a contract monitored by the Chainlink node. When the oracle contract receives a new request from our smart contract, the Chainlink node starts its work on fulfilling it. Lastly, the smart contract needs to be extended with the ChainlinkClient library in order to handle Chainlink requests.

Additionally, the external adapter is a component that was added in order to extend the Chainlink node’s functionality, enabling it to create authenticated XML-formatted requests. This was done as Trafikverket’s API only handles requests of that type.

It is important to point out that the system built is not a fully fledged train ticket system. It is rather a scaled-down version of what such a system could look like, developed only to the extent that would allow experiments to be conducted and selected metrics to be measured. Building the system not only provided a realistic environment for testing, but also highlighted the limitations of the smart contract platform. These might not have been discovered had this been a purely theoretical study. In order to construct this system, the authors attempted to identify the minimum key requirements the system had to fulfill to be able to handle tickets for different trips and set correct refunds based on a train’s TAL, where the TAL should be retrievable from both the passengers and from Trafikverket. For this, the smart contract system needed to be able to handle creation of trips, bookings made by passengers and retrieving TALs from both passengers and Trafikverket. These high level features helped guide the development of the system and were concretised into the following requirements:

• A trip must have a representation in the smart contract system.

• It must be possible, with an EOA, to interact with the contract and get added as a passenger to a trip.

• The contract must be able to store identifiers of all passengers in each trip.

• It must be possible to retrieve the TAL(s) from passengers to the smart contract.

• It must be possible to retrieve the TAL from Trafikverket to the smart contract.

• It must be possible, using the retrieved TAL(s), to store a single value representing the TAL for the trip.

When developing a smart contract, it becomes apparent that the normal considerations a programmer has when attempting to make a program as efficient as possible are magnified. As every operation and calculation, during code execution, takes place on all nodes in the Ethereum network and costs gas to perform, there is an incentive to minimise the number of operations needed to fulfill the functionality of the smart contract. Furthermore, storing large quantities of data is also discouraged as every node has to store a copy of it which can be expensive.

A trip was represented as a struct which the contract had access to. An alternative way to accomplish this would have been to use the factory design pattern [43]. This entails deploying a so-called factory contract to the blockchain from which new contracts can be instantiated and deployed. In the case of this project those new contracts would be the trips, i.e. every trip is represented as its own contract. Implementing it this way has the main benefit that money is divided between the contracts. In this way a hypothetical attacker cannot reach all funds by attacking a single contract; rather, the attacker’s reward for a successful attack is proportional to the number of contracts they manage to compromise [22]. However, at the time of testing, the cost of deploying a new contract was considerably higher than creating a new struct instance. As this system would require creating multiple trips, this design pattern was ultimately deemed infeasible under the current conditions. Thus, the trips are represented as structs and are handled using a dictionary.
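The chosen design, with trips as structs held in a dictionary inside a single contract, can be modelled off-chain as follows. This is a plain Python analogy rather than the Solidity code of the thesis; the class, field and method names are hypothetical, and TAL is stored as an optional integer only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trip:
    # Corresponds to the struct: passenger identifiers (EOA addresses)
    # and the single final TAL for the trip, unset until it is known.
    passengers: List[str] = field(default_factory=list)
    tal: Optional[int] = None


class TicketSystem:
    """One 'contract' holding all trips in a dictionary keyed by trip id."""

    def __init__(self) -> None:
        self.trips: Dict[str, Trip] = {}

    def create_trip(self, trip_id: str) -> None:
        if trip_id in self.trips:
            raise ValueError("trip already exists")
        # Creating a struct entry; no new contract is deployed.
        self.trips[trip_id] = Trip()

    def book(self, trip_id: str, passenger: str) -> None:
        trip = self.trips[trip_id]  # raises KeyError for unknown trips
        if passenger in trip.passengers:
            raise ValueError("passenger already booked")
        trip.passengers.append(passenger)


system = TicketSystem()
system.create_trip("trip-1")
system.book("trip-1", "0xPassengerA")
```

In the factory pattern, by contrast, `create_trip` would deploy a separate contract per trip, which isolates funds but costs more per trip, as discussed above.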

3.1.1 Retrieving Data from Passengers

Conceptually, it is assumed that the passengers have access to and can communicate with the Ethereum network, for example through a solution such as Metamask¹. In order for them to submit data to the smart contract, contract functions that allow this need to be implemented. Furthermore, since only a single TAL should be stored for the trip in the end, an essential part of retrieving data from multiple clients is an aggregation method which takes all valid submissions and outputs a single resulting value. With these two aspects in mind, the following requirements were defined and implemented in the smart contract.

• There must be functionality allowing each passenger on a trip to submit a TAL once, and only once, and for the contract to store all submissions.

• The trip must have functionality to aggregate all submitted TALs and store the result as the final TAL for the trip.
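The two requirements above can be sketched as a small Python model. This is an illustration under stated assumptions, not the contract code: the class `TripSubmissions`, the methods `submit_tal` and `finalise`, and the use of `PermissionError` in place of a reverted transaction are all hypothetical. The aggregation method is passed in as a function so that the sketch stays independent of which method is chosen.

```python
class TripSubmissions:
    """Off-chain model of the per-trip submission bookkeeping (names are hypothetical)."""

    def __init__(self, passengers):
        self.passengers = set(passengers)  # identifiers allowed to submit
        self.submitted = set()             # identifiers that have already submitted
        self.submissions = []              # every accepted TAL value
        self.final_tal = None              # single aggregated TAL for the trip

    def submit_tal(self, sender, tal):
        # Only passengers on the trip may submit.
        if sender not in self.passengers:
            raise PermissionError("sender is not a passenger on this trip")
        # "Once, and only once": a second submission from the same sender is rejected.
        if sender in self.submitted:
            raise PermissionError("sender has already submitted a TAL")
        self.submitted.add(sender)
        self.submissions.append(tal)

    def finalise(self, aggregate):
        # Reduce all stored submissions to one value and store it as the trip's TAL.
        if not self.submissions:
            raise ValueError("no submissions to aggregate")
        self.final_tal = aggregate(self.submissions)
        return self.final_tal
```

Tracking submitters in a separate set keeps the duplicate check to a single lookup, which matters on-chain where every operation costs gas.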

Aggregation

The aggregation method in this implementation was median aggregation, computed using the quickselect algorithm [32]. Conceptually, it was assumed that there may be dishonest submitters. While there are multiple measures that can be taken against such behaviour, most of these were outside the scope of this report. However, since aggregation is essential in this case and takes place on-chain, it must be considered as it affects the performance of the system. The choice of median aggregation is motivated by the fact that if strictly more than 50% of submissions are correct, then the end result will be correct as well. Consider the alternative of average aggregation, which would also provide a single value as the result. It would, however, allow a single dishonest user to significantly skew the result with a submission that differs greatly from the rest. Using the median also makes Sybil attacks more expensive. A malicious actor who performs a Sybil attack by buying several train tickets using different EOAs must purchase strictly more than 50% of all the tickets in order to be sure of controlling the outcome. Therefore, in order to minimise the number of correct submissions needed to generate a correct result, median aggregation was viewed as the most appropriate choice for the implementation of this thesis.
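The robustness argument can be made concrete with a short Python sketch of median aggregation via quickselect. It is a minimal illustration, not the on-chain implementation: the function names, the middle-element pivot, the list-based partitioning, and the choice of the lower median for an even number of submissions are all assumptions, and the TAL values are made up.

```python
def quickselect(values, k):
    """Return the k-th smallest element (0-indexed) without fully sorting."""
    items = list(values)
    if not 0 <= k < len(items):
        raise ValueError("k is out of range")
    while True:
        pivot = items[len(items) // 2]
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        larger = [v for v in items if v > pivot]
        if k < len(smaller):
            items = smaller                   # answer lies among the smaller values
        elif k < len(smaller) + len(equal):
            return pivot                      # answer is the pivot itself
        else:
            k -= len(smaller) + len(equal)    # skip everything up to the pivot
            items = larger


def median(values):
    """Lower median of the submissions (assumed tie rule for even counts)."""
    return quickselect(values, (len(values) - 1) // 2)


honest = [600, 600, 600, 600]            # hypothetical TAL values
skewed = honest + [10**9]                # one dishonest submission
print(median(skewed))                    # 600: the outlier does not move the median
print(sum(skewed) / len(skewed))         # the average is dragged far away from 600
```

In the same model, an attacker holding two of five submissions cannot change the median, while an attacker holding three of five can, which is the strict majority threshold described above.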

3.1.2 Retrieving Data from Trafikverket

Since a smart contract cannot directly query off-chain resources, and most off-chain resources, such as Trafikverket, do not have the infrastructure required to provide data on-chain to a smart contract, the main concern with regard to retrieving TAL data from Trafikverket is selecting and implementing an oracle system which can provide this service.

Choosing the Oracle Solution

As described in Section 2.10.2, there are several proposals for retrieving data from trusted conventional off-chain data sources [28, 33, 44]. However, these did not provide any practically available solutions. Instead, in this project, the oracle solution used for retrieving external data was Chainlink². Unlike other solutions and proposals, Chainlink does not require the use of specific hardware or modifications to the data source. Furthermore, the documentation is thorough and provides a clear explanation of the architecture, the different components of the solution and how they are used. It further provides clear instructions for setting up your own oracle node, i.e. a Chainlink node. During this project, the authors participated in the official Discord channel and found both the core development team and community members actively responding to questions regarding implementation and how to use Chainlink within a smart contract system.

Developing and testing a smart contract which requires the services of a Chainlink node can currently be done without having to set up your own Chainlink node, assuming one only requires the out-of-the-box adapter functionality. At the time of writing, the Chainlink documentation provides addresses to eight Chainlink nodes on the Ropsten test network which can be used to request off-chain jobs [10]. However, in this project the off-chain capabilities required went beyond the out-of-the-box adapter functionality of a Chainlink node, for two reasons. Firstly, accessing the API provided by Trafikverket requires authentication. Secondly, the API requires that HTTP requests be provided in an XML format. Neither of these can be accomplished without added functionality. For this reason, as well as to have control over the configuration of the system, a custom Chainlink system was set up in order to retrieve data from Trafikverket.
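To illustrate why these two aspects call for custom functionality, the following Python sketch builds an authenticated XML request body of the kind such an API expects. It is a sketch under assumptions: the element and attribute names (`REQUEST`, `LOGIN`, `authenticationkey`, `QUERY`, `objecttype`, `FILTER`, `EQ`, `INCLUDE`), the object type `TrainAnnouncement` and the field names are modelled on Trafikverket's public API documentation and should be checked against it. The function `build_tal_request` and its parameters are hypothetical, and the schema version is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_tal_request(api_key, train_ident, location):
    """Build an XML request body (assumed format) asking for a train's TAL."""
    root = ET.Element("REQUEST")
    # Authentication: the key travels inside the XML body (assumed element name).
    ET.SubElement(root, "LOGIN", {"authenticationkey": api_key})
    query = ET.SubElement(root, "QUERY", {"objecttype": "TrainAnnouncement"})
    # Restrict the query to one train at one location (assumed field names).
    flt = ET.SubElement(query, "FILTER")
    ET.SubElement(flt, "EQ", {"name": "AdvertisedTrainIdent", "value": train_ident})
    ET.SubElement(flt, "EQ", {"name": "LocationSignature", "value": location})
    # Ask only for the field that is needed on-chain.
    include = ET.SubElement(query, "INCLUDE")
    include.text = "TimeAtLocation"
    return ET.tostring(root, encoding="unicode")


# Placeholder values; no request is sent in this sketch.
print(build_tal_request("YOUR-API-KEY", "1234", "Cst"))
```

A component sitting between the Chainlink node and the API would send such a body in an HTTP POST and pass the returned value back to the node; that step is omitted here.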

Setting Up the Chainlink Node

The Chainlink node was set up by following the guide provided in the Chainlink documentation [34]. For this project, a Chainlink node, of version 0.7.4, was set up on a virtual machine
