
What are the challenges of developing a marketplace for real-time data?

Vilka är de stora utmaningarna i att utveckla en marknadsplats för realtidsdata?

Isak Allanson, Gustav Eglénius Nilsson

Degree: Bachelor's degree, 180 credits. Main field: Computer Science. Programme: Systems Development.

Date of final seminar: 2018-08-28. Supervisor: Radu-Casian Mihailescu. Examiner: Aleksander Fabijan.


Abstract

In this paper the difficulties and challenges of developing a marketplace for real-time data are presented, and some solutions to these challenges are suggested. The term real-time data, as used in this paper, means live data from different kinds of sensors, for example thermometers and humidity sensors. These can be placed at different locations where someone can make use of them after purchasing a dataset from the marketplace. The sensors can be set up, and their data sold, by individual sellers on the marketplace.

This subject is important because of the rising popularity of the smart home. A marketplace for real-time data is used in a “smart city”, a natural progression of the smart home. The field is in its early stages, however, and as such no major research has been done in the area. The marketplace is used in a smart city to help users gain access to, and make use of, the different sensors and applications in the network.

The study follows a case study research methodology at a company working with the development of Internet of Things products, which is in the planning phase of developing a marketplace for real-time data. Following this case study approach, two methods are used to investigate the topic: interviews and a literature review. Four interviews are performed by the researchers with some of the company's employees, covering their aspirations for the upcoming marketplace. The literature review is performed to gain background and additional information on how a marketplace for real-time data can be developed and what is needed to make it happen. These two are then compared and analysed together to form the conclusion.

Several recurring topics were found in both the interviews and the literature review, including trust, privacy, interoperability and user usability. These are presented in the conclusion of this paper.

Summary

This thesis presents the problems and challenges that arise when developing a marketplace for real-time data, and suggests some solutions to these challenges. Real-time data here means data from different kinds of sensors, such as thermometers and humidity sensors. These can be placed at different locations where someone can find a use for them, after purchasing a dataset from the marketplace. The sensors can be set up, and the data sold, by individual sellers on the marketplace.

This subject is important because of the rising popularity of the smart home. A marketplace for real-time data is used for a smart city, which is a natural continuation of the smart home. It is currently at an early stage, however, and no thorough research has been carried out in the area. The marketplace is used in the smart city to help users gain access to, and use, the different sensors and applications in the network.

This thesis is a case study at a company involved in the development of Internet of Things products that is in the planning phase of developing a marketplace for real-time data. Following the case study approach, two methods are used to investigate the topic: interviews and a systematic literature review. Four interviews are conducted at the company, with some of its employees, about the different goals of the upcoming marketplace. The literature review is carried out to gain background and complementary information on how a marketplace for real-time data can be developed. The two are then compared to form a conclusion.

Several different areas were found in both the interviews and the literature review, including trust, privacy, interoperability and usability. These are presented in the conclusion of this thesis.


1 Introduction
1.1 Background
1.1.1 Internet of Things
1.1.2 Internet of Things Data
1.1.3 Digital Marketplace
1.1.4 Digital Marketplace for Real-Time Items
1.1.5 Privacy
1.2 Research Question
2 Method
2.1 Case Study Research
2.2 Case Company
2.3 Data Collection
2.3.1 Interviews
2.3.2 Interview discussion
2.4 Data Analysis
2.5 Threats to Validity
2.6 Literature review
2.7 Other methods
2.7.1 Design and Creation
2.7.2 Observation
2.7.3 Quantitative data analysis
3 Results
3.1 Findings and Challenges identified through Case Study
3.1.1 Privacy
3.1.2 User usability
3.1.3 Interoperability
3.2 Findings and Challenges identified through Literature review
3.2.1 Trust
3.2.2 Internet of Things Platform Requirements
3.2.3 Interoperability
3.2.4 Privacy
4 Analysis
5 Discussion
5.1 After first interview
5.2 After second interview
6 Conclusion and future work
References
Appendix
Interview template (Intervjumall)
Interview 1 (Intervju 1)


1 Introduction

With the rise of the Internet of Things and its use in smart cities and homes, the need for technologies to support it grows[1]. One of these technologies is LoRa LPWAN, which stands for Long Range Low Power Wide Area Network. This technology allows devices to connect to a network from up to ten kilometers away and send data to a central unit[2]. The idea is to let millions of Internet of Things devices connect to such a network and then share data with each other. To make this process of discovering and sharing data between devices easier, a marketplace for all of the devices is worth developing and keeping in use.

The new EU regulation GDPR (General Data Protection Regulation), which governs the handling of personal data, could also pose a problem here[3].

A digital marketplace is a common way to make any kind of sale in today's society, whether in the form of physical or digital goods. Physical stores are moving to digital platforms, and it is only natural that digital goods are sold on them. Digital sales of services have also become increasingly popular in recent years[4], with services such as streaming movies and music. The difference there, however, is that the files are already stored and are simply being streamed and shared over the internet. What has not been done, and what this thesis sets out to investigate, is creating a digital marketplace for live data: a marketplace for real-time data from sensors, Internet of Things devices and the like. This data can then be used by different programs and devices around the world.

This is the research gap that we’re trying to fill. What are the differences between creating a digital marketplace for real time data and a regular digital marketplace? What techniques and strategies can be used to help solve this problem?


1.1 Background

In the following section, a few of the subjects that are brought up in this paper are described.

1.1.1 Internet of Things

Internet of Things is a phrase describing a network of physical devices that can communicate with each other. These could be home appliances, mobile devices, sensors, vehicles or any other physical objects with embedded electronics, sensors and connectivity that allow them to send and receive data. This enables machine-to-machine communication, which in turn could automate a lot of mundane tasks and provide information about the environment around you.

In turn, this could allow for an easier way of living. A common use of Internet of Things devices is monitoring, which is what a lot of IoT devices do. A humidity sensor in a bathroom could, for example, automatically control the ventilation in the room if it senses that it is too humid. The benefits and value of IoT do not stop here, and there is a lot of potential to be explored as more and more IoT devices emerge[5].

1.1.2 Internet of Things Data

As a result of many devices connected to a network communicating with each other, a lot of data is generated. This data can be stored and used to analyze many different things, depending on what kind of data the device generates. The data is in many cases personal, since IoT devices are often personal wearables, and this data might not be something the person wants to share. An IoT device can, however, also be a sensor placed out in a city, whose data is generally more suitable to share, for example a weather sensor. Sought-after data generated by a sensor has the potential to be marketed and sold, for example on a digital marketplace platform.

1.1.3 Digital Marketplace

A digital marketplace, or online marketplace, is fundamentally no different from a traditional marketplace where people buy and sell products. A common use of a digital marketplace is as an extension of a company's physical store. It allows the company to sell the same products to more people because of the extended reach that the internet provides. However, some of the most famous and most profitable online marketplaces have few or no physical stores; companies like Amazon and Alibaba conduct the majority of their business online.

There are several reasons why an online store may be preferred over a physical store. One, as mentioned earlier, is the extended reach that the internet provides. It also allows customers to shop without any need to travel. One downside of an online store has traditionally been customer service, as there is no seller present to provide helpful information. This issue is increasingly less critical as different methods emerge to solve it, such as a live chat with customer support.

1.1.4 Digital Marketplace for Real-Time Items

A digital marketplace for real-time items is what happens when you integrate Internet of Things with a digital marketplace. With real-time data we mean data that is continuously being updated.


The interval can differ depending on what kind of data is gathered. For instance, weather data could usefully be gathered at 10-15 minute intervals to catch sudden changes. On the other hand, energy consumption in a home does not necessarily need to be updated as often to be useful; hourly or even less frequent updates might be enough.

The goal is to create a platform where individuals can connect their sensors and sell the data that is generated. The data can also be stored in the platform and grouped together so it can be sold as a package, e.g. the previous day's, week's or month's worth of data. The marketplace does not need a specific target group. Initially the users, as has been the case with “smart” devices, will most likely be companies and technically skilled individuals. It is of course easier for a big company to place a large number of sensors and to use that kind of data. But as the user base increases and new services are provided, it will reach a larger proportion of the population. This is, however, still a new concept and very few examples have been built, although there are certain visions for how such a marketplace could function.

If a large number of weather sensors are placed on balconies around a city, a more local and exact prediction of where in the city it rains becomes possible; it might rain on the opposite side of the city but not where you are positioned. Another scenario is to combine street lights with proximity sensors, which could save energy by only turning on the lights when there are people close by. These are by no means all the scenarios where a digital marketplace for real-time data could be useful, just two examples.
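To make the packaging idea mentioned above concrete, the sketch below groups stored sensor readings into per-day packages that could be listed as units. It is a minimal illustration under our own assumptions about the reading format; the thesis does not prescribe any particular implementation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical reading format: (timestamp, value) pairs from one sensor.
readings = [
    (datetime(2018, 5, 1, 9, 15), 14.2),
    (datetime(2018, 5, 1, 18, 40), 17.8),
    (datetime(2018, 5, 2, 9, 15), 12.9),
]

def package_by_day(readings):
    """Group a sensor's stored readings into per-day packages that
    could be sold as units on the marketplace."""
    packages = defaultdict(list)
    for ts, value in readings:
        packages[ts.date()].append((ts, value))
    return dict(packages)

for day, rows in package_by_day(readings).items():
    print(day, len(rows), "readings")
```

The same grouping works for weekly or monthly packages by keying on, for example, the ISO week number instead of the date.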

1.1.5 Privacy

Regarding privacy, different regulations about personal data have applied over time. One such regulation in Sweden is called PUL (Personuppgiftslagen), a law that fulfills the EU's directive regarding the free flow of personal information. The law states a number of rules for how personal information is to be handled. It is based on consent and on informing people about the data that is handled about them; there are, for example, rules about security and the correction of incorrect information. The purpose of this law is to protect people's personal integrity from misuse in the handling of personal information[6]. With the new regulation, GDPR, personuppgiftslagen becomes obsolete and is replaced.

GDPR stands for General Data Protection Regulation and is an EU regulation governing the handling of personal data and information. Its purpose can be summarized as strengthening and protecting people's privacy and personal integrity, mainly regarding how companies handle this data and what they can do with it. A single regulation also makes it easier for companies outside of Europe to comply, instead of following individual laws in every country. If a company does not comply with GDPR, a fine of up to 4% of its worldwide turnover or 20 million euros can be issued, whichever is higher[7]. For example, with GDPR you have the right to see all the information a company has about you, as well as to request that they delete it.
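As a worked example of the fine ceiling just described, the following snippet (our own illustration, not part of the regulation text) computes the maximum possible fine:

```python
def max_gdpr_fine(worldwide_turnover_eur: float) -> float:
    """Upper bound of a GDPR fine: 4% of worldwide annual turnover
    or 20 million euros, whichever is higher."""
    return max(0.04 * worldwide_turnover_eur, 20_000_000)

# A company with 1 billion euros in turnover risks up to 40 million euros;
# a small company with 10 million in turnover still risks up to 20 million.
assert max_gdpr_fine(1_000_000_000) == 40_000_000
assert max_gdpr_fine(10_000_000) == 20_000_000
```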

1.2 Research Question

The research question originates from the challenges that could occur during the development of a marketplace for real-time data. The purpose of the research is to ease the process of development and possibly find solutions to the different problems that could arise. Research regarding digital marketplaces has been done to some extent, but not with a focus on real-time data. Even just a marketplace where real-time data is collected and sold (disregarding the research) is something we have not found. With the increasing number of Internet of Things devices and machine-to-machine communication, this is definitely something that is interesting and useful to research[1]. The reason this is important right now is the increasing number of sensors and devices in a smart city. Everyone should be able to see and make use of this data, which would otherwise just sit there. This is why a marketplace, where all of these sensors and devices are collected and ready to be used, is needed.

The research is inspired and requested by a company that is planning to develop a marketplace for real-time data in the near future. The company has previous experience working with Internet of Things devices, the cloud and different smart home and city solutions. The next step for them is to be able to sell data collected from a smart city or home. Users should also be able to list a sensor on the marketplace and sell the data that this sensor generates, for example a temperature or humidity sensor at a specific place in a city.

The research question is thus “what are the challenges of developing a marketplace for real-time data?”.


2 Method

2.1 Case Study Research

The research approach and strategy used for this thesis is case study research. The reasoning for this is to do a study with the necessary depth for the given research field. A case study is good for complex environments where it can be difficult to study one specific factor, and it also works well for presenting new theories, which this marketplace builds on[8], as well as for keeping the study in the natural environment that the marketplace will be developed in and for. A case study can, however, be a complex task, since there is really no limit to how deep and detailed it can be. The data and research that is collected is also more open to misinterpretation, due to a more analytical approach compared to, for example, an experiment, and this will be kept in mind during the research. Some other research strategies and methods were considered, and these are discussed later in the 'Other methods' section, but no other research strategy was seen as fitting as a case study. The case study will follow a standardized template for how to construct case study research[9].

2.2 Case Company

The company that the case study is performed at is named Sensative, a tech startup offering different IoT and smart home and city solutions[10]. They focus on pioneering in the IoT world, developing and marketing solutions that are simple-to-use and valuable tools for people. The company develops practical IoT products that can be used in people's everyday lives.

One example of a product the company has developed is called “Strips”. It is a wireless magnetic sensor that can be mounted on any window or door to detect whether it is open. “Strips” also comes in versions where one senses temperature and light and the other senses water leakage. These have a battery life of up to 10 years of constant use.

The company also has a second business solution in focus, called “Yggio”. This is an open connectivity platform for managing properties that unites IoT devices with service providers[11]. It is simply a platform for connecting a lot of different IoT devices, with the goal of letting services focus on application development instead of the connectivity part of the application. This is achieved with an API that developers and devices can connect to.

2.3 Data Collection

The methods used in this paper are a literature review and interviews. The reasoning behind using two different methods is to get a more complete view of the current research and of what has not been done, as well as to get information about the current research situation. The different methods of the case study go into depth about different aspects of the research.

The literature review serves as a tool to see what strategies have been used in the past, how a marketplace is built, what different parts are needed, how user interaction with the system works, and so on. The literature review serves to inspire solutions to the challenges found in the research. By performing a literature review, not only are potential solutions to different challenges found, but different pitfalls can be identified and prevented, not to mention dealt with more easily if they occur. One of the more important things to accomplish with a literature review is also to establish the current state of the art, to verify that the research gap we are trying to fill has not already been researched[12].

The interviews are used to establish and evaluate what the challenges and solutions related to the research question could be. The interviews, as the main method for the research, allow for the in-depth answers necessary to base the research on. They allow the researchers to get an understanding of the company's current situation, how it plans to create the marketplace, and the requirements for it. The questions in the interview might reveal additional problems that were not thought of beforehand. The second round of interviews is then used to reflect on and analyze the results, confirming them as possible challenges and solutions.

2.3.1 Interviews

The main method used in this paper is interviews[13]. These interviews are performed at a company active in the topic, namely Internet of Things, LoRa networks and the development of a digital marketplace for real-time data. The respondents are three persons who ideally have different roles within the company; the important part is that at least one of them has knowledge about the vision of the marketplace and the idea of how it is going to be designed. The same respondents from the first interviews are used again in the next interview.

The interviews are performed in two parts, meaning a total of four interviews take place: three individual ones, one with each respondent, and one group interview for analysis. The first part is a pre-interview, to get information about the company's current situation and the idea behind their marketplace: what problems they are expecting and how they are planning on solving these, if they have planned for it. The structure of this interview is semi-structured. The reason for this is to keep it organized and to lead the interview towards the answers needed for the preparation of the work done in this paper, while still keeping it loose and open enough to allow the respondent to give their input on the technologies and subject presented, even when not explicitly asked.

The second interview goes over the results and solutions that have been found about the topic with the company. Their answers are then used to analyze whether the given solutions can be deemed worthwhile and useful or not. This interview is more open than the previous one, to spark a conversation and discussion about the results that are found.

The interviews are taking place in a conference room at the company where the respondents are working and are performed by the researchers of this paper. Documentation of the interview is done in the form of an audio recording with additional note taking. This audio recording is then transcribed and summarized with the results of the research.

During the interview the recordings are made with a smartphone. The researchers have decided to print out the questions that are asked, instead of using a laptop, to reduce distractions and give a more professional impression. One of the researchers is in charge and leads the interview, while the other researcher takes notes and helps if needed. The second researcher can also ask follow-up questions if necessary. The important thing is that the first researcher leads the interview, keeping it somewhat structured and not confusing.

2.3.2 Interview discussion

The interview is structured around five main questions, to keep it pointed towards the information we are trying to extract. We also have a few follow-up questions for every main question. At the start of the interview the basic structure is explained: why we are doing the interview, what research we are doing and how it comes into play, as well as what happens with the information given in the interview, where the research paper will be published and who might see it. After this explanation, a few preliminary questions are asked: whether the respondent would like to be anonymous, and whether they are okay with the interview being audio recorded to make the work of putting the information on paper easier. The questions are described in further detail below, and the complete template for the interview can be found in the appendix.

The first question asked once the interview begins is who the person is, what responsibilities they have at the company and what work they do. This is partly to warm the person up for the interview, to get them talking and comfortable. But it is also to legitimize the respondent's role and the value of the answers that are given. This answer gives us a reason as to why we are talking to this person and not someone else: if this person explains their experiences and what they are working with, it also gives us more reason to trust them regarding the research we are doing in that field, provided, of course, that the field is the same one we are researching.

The second thing we ask of the respondent is to explain how the marketplace they would like to develop would function and look. This is one of the more open, and also more important, questions we ask. It gives the respondent a chance to explain exactly what the marketplace is to them personally, and the answers can vary a lot between respondents because the development process is at such an early stage. Since it is such an open-ended question, a lot of information can be extracted from it. It also gives us the background information about the marketplace, so that we can understand what they are talking about and perhaps why they answer the upcoming questions the way they do. Since our research is about finding the difficulties and challenges of developing a marketplace for real-time data, this question is also very important for understanding and analyzing what challenges could occur, in order to find solutions for them.

After this main question we follow up with some questions depending on the answer and explanation: whether they have any previous experience working on digital marketplaces; who their target audience is; what the architecture is going to look like (if the planning has even started); and a request to describe a user scenario. It could be argued that these questions are big enough to be asked as separate main questions, but at such an early stage of development many of them do not really have a definite answer. The answers are also likely to be brought up in the explanation of the marketplace, which is why they are used as add-ons to the main question, asked only if their topics were not already covered.


The third question is about the difficulties of the development process of the digital marketplace, and is simply asked as: “what do you think the difficulties and challenges of developing a digital marketplace for real time data could be?”. This is to see if the respondent might already have thought about and found some of the difficulties regarding the development of the marketplace. The answer to this question is used later, in question five.

The fourth part consists of questions about some of the difficulties we thought of when researching the marketplace that the company is trying to develop. These are brought up if the respondent has not mentioned them when answering the previous questions. The questions are: “will the LoRa-network affect these challenges for you? (with regard to the challenges mentioned in the last question)”, “will you save any of the data from the users, and how might this be affected by the upcoming regulation GDPR?” and lastly “how are you planning to solve the security issue with the marketplace?”. Our impression is that the marketplace will be centered around a LoRa network; this might not be the case, but it is important to establish what the challenges are if so. With the rising discussion about privacy, including GDPR, we thought a question regarding this regulation would be appropriate. The handling of personal data could be one of the major challenges for the marketplace at this time, so the question and research regarding this topic is very important. With data traveling from sensor to sensor, to database and to different clients, security is also an important topic to address; data in transport could be very vulnerable, and the last question asks whether they have planned some kind of solution for this issue.

The fifth and last question is whether the respondent and team have thought of any solutions to the problems brought up in questions three and four. The answer could thus depend heavily on how far the company has gotten in planning the marketplace, and on what the respondent has thought of personally. This answer could also be an essential part of the research and work done in this thesis, and could give us insight into some of the solutions for the research question.

Since the interview is semi-structured, these questions serve only as a guideline, and the order and depth of the questions may vary. Some questions might already have been answered by the respondent under a previous question; in that case we go into more detail there and do not ask them again.

2.4 Data Analysis

The initial interviews were transcribed to help analyze what was discussed. If an area was discussed at some length in the interviews, it was noted as important to research in the literature review. For example, interoperability was discussed and is therefore researched through the literature review.

As mentioned above, a second interview is performed as a way of verifying the data. But this interview also functions as a way of analyzing the results and the collected data, by having a conversation and asking the respondents whether the conclusions drawn are correct and could be of use for the development of the marketplace for real-time data. Additionally, the collected data is compared and analysed together with the results from the literature review.

2.5 Threats to Validity

Whenever you gather information there is always a risk that the information is not completely accurate. No matter how hard an author tries, it is difficult to be completely unbiased. That does not necessarily mean that the information is wrong; most of the time it is simply a matter of how the information is angled. In the context of the interviews being performed, that could mean that some information is left out of the answer.

The interviews in this paper are performed with a few employees at an IT company. These persons are chosen because they are in the planning phase of constructing a digital marketplace for real-time data, which means they have some information about the challenges in its development. However, it also means that they have an interest in presenting their future product in a positive way.

A simple example is a question about data collection. The respondents might be tempted to mention only the positive sides of this subject, for example the increased ability to personalize the service provided. That is of course a good thing, but as with many things there is a downside: with data collection come the issues of privacy and personal integrity. Maybe the company collects more data than it needs for the service, or it might sell the data to a third party.

Because of this threat to validity, it is important to corroborate all information with different sources, which is why this paper gathers information in a literature review as a complement to the interviews. If we find information in articles that confirms what is mentioned in the interviews, then that information can reasonably be considered accurate.

2.6 Literature review

To get a view of the current state of the art in the given subject, a literature review is performed. In the literature review, a number of articles and research papers on the topic of digital marketplace technologies are looked over, analysed and compiled. Since this is a new field, it is important to research what has already been done and to see whether any of this information can be applied to the situation of real-time data in a digital marketplace.

The literature review is performed in a systematic way. A few different searches in academic databases are made, and the suitable papers are chosen from the results. The results are then filtered further by reading the abstracts and deciding whether the papers are of value or not. This is done one more time with the full text of each paper, instead of just the abstract. This is to ensure that as much as possible of the previous work is covered. A systematic literature review requires considerably more effort than a regular review; the advantage, however, is that it is more likely to guarantee a thorough review, since it is performed in a systematic and structured way. These articles are then summarized and used for the research in this paper[14].


The research sites used to find articles and papers are primarily the ACM Digital Library[15] and IEEE[16]. To filter out outdated papers that might not be relevant for today's digital markets and the research about digital marketplaces for real-time data, the earliest publication year for papers used in the search is 2000. The main keywords used in the systematic literature review are: challenges, develop, marketplace, real-time data. These are used in different combinations to get as broad a result as possible while still being relevant to the subject. The keywords are supplemented by synonyms, to further broaden the spectrum of potentially relevant papers and articles. The synonyms are, for challenges: problems, difficulties, risks; for develop: create, software development, build; for marketplace: digital marketplace, digital market, e-marketplace; for real-time data: real-time stream, data stream. Additionally, a few keywords are used separately to give a more complete picture of the research field: Internet of Things, IoT, security, privacy, LoRa and LPWAN.
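As a small illustration of how such keyword combinations can be enumerated, the sketch below generates one search string per combination of a term from each chosen group. The grouping mirrors the keywords and synonyms above; the code itself is our own illustration, not part of the review protocol.

```python
from itertools import product

# Keyword groups from the review protocol: each main keyword
# plus its synonyms forms one group.
groups = {
    "challenges":     ["challenges", "problems", "difficulties", "risks"],
    "develop":        ["develop", "create", "software development", "build"],
    "marketplace":    ["marketplace", "digital marketplace",
                       "digital market", "e-marketplace"],
    "real-time data": ["real-time data", "real-time stream", "data stream"],
}

def search_strings(*group_names: str) -> list[str]:
    """Build every combination of one term per chosen group, quoted
    so that multi-word terms match as phrases."""
    chosen = [groups[name] for name in group_names]
    return [" + ".join(f'"{term}"' for term in combo)
            for combo in product(*chosen)]

# e.g. all marketplace/develop combinations, as used in Fig. 2
for query in search_strings("marketplace", "develop"):
    print(query)
```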

As mentioned above, the results are then filtered to remove papers and articles that are not relevant. This is done in five steps. The first step is to decide on the relevant databases to search in; in this step the keywords and publication year are also part of the filtering process. Our subject is specific to marketplaces for real-time data: if a paper does not match our keywords, it is most likely not relevant. To ensure that the sources are trustworthy, we remove any that are not of the correct type in the second step. The correct types are peer-reviewed articles and conference and journal papers.

In the third step the articles are filtered with regard to their abstract content. The most essential content of an article is summarized in the abstract, and if it seems irrelevant the article is removed. The fourth step is duplicate removal, which simply removes duplicates that might have come up across the different combined keyword searches.

The fifth and final step is to filter the papers and articles based on the content of the entire paper, keeping only those that seem relevant. If an article does not seem relevant enough, and/or another article covers the same issue and content, it is removed from the literature review.

2.7 Other methods

2.7.1 Design and Creation

Design and creation can be implemented in several ways[17]. One way is to create a type of product that did not exist before or that in some way contributes new information that was not previously known. This paper focuses on the development process and is therefore not suited to this use.

Some problems with design and creation could be technical or related to the development process. The problems, and the way they are solved, could be considered the answer to the research question. However, some research should be done to verify that these problems are not unique to this project. Sometimes it might not be necessary, or even reasonable, to create the whole product to get the same results; if no new information is created in developing the complete product, then there is no value in doing so. In that case a smaller product should be created, either one part of the whole product or a prototype. For this paper that would mean a marketplace where not all features have been implemented. However, a smaller product carries a bigger risk of missing some of the challenges, so it is important to make sure that no information is lost because of the chosen scope of the product.

There are some advantages to this strategy, but also some risks. One advantage lies in finding the challenges. This subject is relatively new and therefore not much research has been done; in creating the marketplace you can be quite certain that the challenges that arise are relevant to the research question. In the same way, the solutions to the challenges are also straightforward, since whatever solution works in developing the marketplace can be considered a correct answer. The risk is the limited source pool: with only your own development process to work with, there is a lack of credibility. For this paper there is also a deadline, and since creating a marketplace is relatively time-consuming there would be limited time left for the writing process.

2.7.2 Observation

Using observation for this paper would mean observing a development team that is creating a digital marketplace. In a sense it is the same as design and creation, but observing instead of developing the marketplace yourself. The developers will discover challenges along the way and work to solve them. If the observers are hidden, so the developers do not know they are being observed, the result might be a more accurate answer than what could come up in an interview, because it is hard to consider all variables in an interview; depending on the situation, different solutions might be preferable. However, this might also give too specific a result, or one that does not follow a known strategy: a solution that works in one situation might not be applicable in another.

There is also the problem of time. The development of a digital marketplace is a relatively long process and might change over time, depending on how much previous knowledge the developers have and how iteratively they work. It might be possible to get enough information by observing only part of the process, but that could still be several weeks or months of development.

One way around this problem would be to indirectly observe the work that has been done, by analyzing the documentation on, for example, GitHub after the marketplace has finished development. In that way it is possible to gather information on the challenges that arose during development without observing the whole process. This comes with other problems, however, because even in the best of situations not all aspects of the development are documented. And even if all challenges have been documented, the strategies to solve them might not be.

2.7.3 Quantitative data analysis

A quantitative data analysis is an alternative to the chosen qualitative data analysis. It is not a specific method or strategy; instead it differs from the qualitative version in what kind of data is generated. The data generated is generally large in volume and highly statistical in nature[18], meaning that it is amenable to mathematical analysis. It is a well-known data gathering approach and works well when the data needed is short and simple. The data is often generated with surveys sent out to a large number of people. The questions can vary in complexity but are in general simple, sometimes even with pre-made multiple-choice answers. This makes it easier to compile the answers, but has the downside of not being able to represent all possible answers. Since there is limited possibility to explain the questions, this method depends on the respondents having sufficient knowledge of the subject[19].

There are several reasons why this is not ideal for this paper. The development of a digital marketplace specifically for real-time data is overall a new area, which means that a limited number of people have sufficient knowledge to answer questions without an explanation. It is therefore not ideal to send out a survey to a large number of people, since the recipients might not have enough knowledge to give credible answers. The research question in this paper is also more in-depth and relatively complex, and therefore not well suited to quantitative data gathering.


3 Results

Our case company is in the process of creating a digital marketplace for real-time data. This follows the natural evolution of the regular Internet of Things, which has steadily grown and moved into people's homes. This digital marketplace is supposed to provide services to a city using data gathered from connected sensors placed around the city. By doing this you can create what is called a “smart city”.

In this section of the paper we present our findings on the challenges of creating such a product. Most of the challenges are collected during the interviews and then validated through the literature review. We also present potential solutions to the challenges, which are discussed and validated through the second phase of interviews.

3.1 Findings and Challenges identified through Case Study

Since work on the marketplace is at such an early stage, there are a lot of question marks as to how it is supposed to look and function. The interviewed company has some clients that handle the renting of houses and apartments with a lot of smart home solutions. With these smart home solutions comes a lot of data passed between the homes and the client, and it is this data that is interesting to look at, to see what is possible to do with it.

One kind of data that is collected a lot from the homes falls under individual metering and billing (Individuell mätning och debitering, IMD). This is simplified by the respondent as an EU directive saying that you should only pay for the hot water and heating that you actually use, and not more, which is easy to say but harder to implement. In Sweden there is a principle stating that if two apartments are seemingly equivalent, then their rents should be the same. However, if one of the apartments faces north and the other south, the cost and use of heating in the two apartments will differ. SABO (Sveriges Allmännyttiga Bostadsföretag), the organisation that handles the guidelines for this, wants to publish guidelines for such cases; for example, an apartment facing north should have a rent lower by some factor than an equivalent apartment facing south. The conclusion, in the end, is that the money earned from this is too low and the cost of implementing and administering it too high. This is why this mass of data is just sitting there, ready to be used.

The results from the first couple of interviews show that a problem that is very current and keeps arising is the issue of privacy and the use of personal data. What the interviewed company is trying to do is collect different kinds of data that a user generates, in their home for example. This data can then be sold for a profit on the marketplace. Additionally, you are supposed to be able to list and market your own personal sensors or streaming data. The data that is collected from just living in your home is not necessarily collected by the person who lives there, but instead by another company that has implemented a lot of smart home solutions, for example the metering of electricity or hot water use.


3.1.1 Privacy

This data would probably be relatively uninteresting to an ordinary person, but several different companies could definitely make use of this kind of information. The data would then be sold in an anonymized version, but even then it could be problematic under regulations such as GDPR. With electricity and hot water metering data, for example, you could see when the typical person goes to bed and when they wake up in the morning. This could potentially be too privacy-invading, even though you would not be able to see information about one particular person or home. One example a respondent brought up was the issue of being able to see this data yourself in a detailed form (through a web interface). The respondent explained how seeing that a device is in use (for example the shower, by looking at hot water use), or has been used during a specific time when a person should not be home, could breach a privacy barrier. The respondent then suggested that to solve this you could downsample the data source, for example from updates every thirty seconds to only the total hot water use over a period of six hours or a day.
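A minimal sketch of the downsampling idea the respondent describes, assuming readings arrive as (timestamp, litres) pairs; the bucket size and data format are our own illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def downsample(readings, bucket: timedelta):
    """Aggregate fine-grained readings into per-bucket totals, e.g.
    thirty-second hot-water readings into six-hour sums, so a buyer
    sees overall consumption but not individual shower times."""
    size = bucket.total_seconds()
    totals = defaultdict(float)
    for ts, litres in readings:
        bucket_start = datetime.fromtimestamp((ts.timestamp() // size) * size)
        totals[bucket_start] += litres
    return dict(sorted(totals.items()))

# one hour of thirty-second readings collapses into a single six-hour bucket
start = datetime(2018, 5, 1, 7, 0, 0)
raw = [(start + timedelta(seconds=30 * i), 0.4) for i in range(120)]
print(downsample(raw, timedelta(hours=6)))
```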

An anonymized version of the acquired personal data might not be enough to comply with GDPR, and additional approval from the subjects may be needed. Another aspect of the marketplace is data that is not as personal; one example brought up is water levels in a city.

3.1.2 User usability

One important part of the marketplace is usability and how the user interacts with the website. A respondent imagines a web interface where you can specify different search criteria, either very strict, where you get few but more accurate results, or very open, where you get a much wider range of results that might not be as accurate. The respondent continues with a scenario where a customer wants to purchase a specific dataset. Since the dataset is something the customer is charged for, it is difficult to preview it without giving away all of the information in it. To handle this, the respondent thinks the search and preview functionality has to be limited somewhat: the client gets to look at a preview of the dataset they are interested in through a web interface for the marketplace. One site the respondent takes inspiration from, for how to preview and visualize the dataset to be purchased, is a website and company called citinetworks[20], a site for renting servers for company use. The site uses different sliders for RAM, CPU etc., with a price list that updates in real time as the specifications of the server are changed with the sliders. This is one idea of how the preview of the datasets in the marketplace could function and look. The user can make some test searches to see whether the data matches their needs, regulate the size of the dataset and then lock in the purchase, which in turn gives them an id for their dataset that they can then retrieve using an API.

One way of doing this preview is to use a “fake” dataset, or a dataset that is old and no longer in use. To show that a dataset actually is legitimate, how good it is, whether there are any gaps in it, as well as the uptime for the streaming data, a rating system is envisioned. This allows other users to give feedback and helps a buyer judge whether the dataset might be worth purchasing.
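Pulling the pieces of this scenario together, the sketch below shows one possible shape of the flow: a stale sample for previewing, a purchase that returns a dataset id, an API-style retrieval by id, and user ratings. All class and method names are hypothetical; the interviews describe the idea, not an implementation.

```python
import uuid
from statistics import mean

class Marketplace:
    def __init__(self):
        self.purchases = {}   # dataset id -> purchased slice
        self.ratings = {}     # listing name -> list of scores

    def preview(self, listing, max_rows=10):
        """Show a stale or synthetic sample so the buyer can judge the
        data without receiving the full, paid-for dataset."""
        return listing["sample"][:max_rows]

    def purchase(self, listing, start, end):
        """Lock in a purchase of a time slice; the buyer gets an id."""
        dataset_id = str(uuid.uuid4())
        self.purchases[dataset_id] = [
            row for row in listing["rows"] if start <= row["ts"] <= end]
        return dataset_id

    def retrieve(self, dataset_id):
        """The API call the buyer uses to fetch the purchased slice."""
        return self.purchases[dataset_id]

    def rate(self, listing_name, score):
        """Buyer feedback on legitimacy, gaps and uptime."""
        self.ratings.setdefault(listing_name, []).append(score)

    def average_rating(self, listing_name):
        return mean(self.ratings.get(listing_name, [0]))
```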


3.1.3 Interoperability

The challenge when it comes to interoperability is to connect a multitude of different devices and computers. These can all use different protocols and ways to communicate with each other and with the other devices they are connected to. The challenge is to unite them and have them understand each other. This is important because they all need to live under one single marketplace: if a person buys data from multiple different sources and sellers, it is preferable to have it all in the same format.

The respondents explain and compare a lot of the interoperability aspects of the marketplace with how Google and Android have built their architecture. In Yggio there is an abstraction layer, or protocol translations. This means that the services connected to Yggio do not necessarily need to know what type of protocol a device uses. Whether it is LoRa, Sigfox or NB-IoT does not matter, because it is converted to abstract data. Once the information is in Yggio it is all in the same format and can be used in the same way, thus solving a lot of the interoperability issues.
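A minimal sketch of such an abstraction layer, in the spirit of what the respondents describe: per-protocol translators map incoming payloads into one common reading format. The payload field names are invented for illustration and are not taken from Yggio.

```python
# Each translator turns one protocol's payload into a common format,
# so downstream services never see protocol-specific fields.
def from_lora(payload: dict) -> dict:
    return {"device": payload["devEUI"], "value": payload["data"]}

def from_sigfox(payload: dict) -> dict:
    return {"device": payload["deviceId"], "value": payload["payload"]}

def from_nbiot(payload: dict) -> dict:
    return {"device": payload["imei"], "value": payload["body"]}

TRANSLATORS = {"lora": from_lora, "sigfox": from_sigfox, "nbiot": from_nbiot}

def normalize(protocol: str, payload: dict) -> dict:
    """Route a raw payload through the right translator; supporting a
    new protocol only requires registering one more translator."""
    return TRANSLATORS[protocol](payload)

print(normalize("lora", {"devEUI": "a1b2", "data": 21.5}))
# {'device': 'a1b2', 'value': 21.5}
```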

3.2 Findings and Challenges identified through Literature review

The literature review is a major part of this paper and is used in combination with the interviews as the chosen method. Here we present challenges, and appropriate solutions, for a digital marketplace that were discussed in the interviews. We could not find a good example of a deployed digital marketplace for real-time data. However, the challenges we find are not unique to this kind of digital marketplace, but can be found in regular digital marketplaces and other products. We consider it reasonable that a solution to the same challenge in a different type of product could still be appropriate in a digital marketplace for real-time data. The number of relevant articles remaining at each step of the filtering is listed in the table below (Fig. 1), as are the searches that were made (Fig. 2).

Inclusion/exclusion criteria                 Items left
Relevant db, keywords and publication year   540
Publication form                             540
Abstract content                             42
Duplicate removal                            38
Full text content                            8

Fig. 1, The number of items left after each inclusion/exclusion step


Search word                                 Search in   Results   Abstract relevant
marketplace + development                   abstract    197       7
marketplace + development + challenges      abstract    53        7
marketplace + "software development"        abstract    14        8
marketplace + development + strategy        abstract    26        2
development + "real time data"              abstract    112       8
"e-marketplace" + development               abstract    5         2
LPWAN                                       abstract    9         3
challenges + development + "data stream"    abstract    98        1
LoRa                                        abstract    26        4
Total                                                   540       42

Fig. 2, The different combinations of keywords used and the number of results

3.2.1 Trust

The main purpose of a digital marketplace is to sell products or services to its customers. Traditionally, when a customer buys a product, he or she does so in person and exchanges the currency for the product in full sight. That cannot be done in a digital marketplace, since the purchase is not made in person. Therefore, one of the first challenges a digital marketplace needs to solve is trust, since buyers need to interact with unknown sellers. Myoung-Soo Kim and Jae-Hyeon Ahn found that trust in a marketplace essentially boils down to trust in the transaction[21]: if there is no trust in the transaction, the customer will not use that marketplace. For there to be trust in the transaction, there has to be trust in both the market-maker and the seller. The trust in these two parties originates from several categories according to Kim and Ahn, e.g. the website's usability. If a company's website is difficult to navigate or poorly designed, it might leave an unprofessional impression, which in turn lowers the trust in the marketplace. Another example is the market-maker's reputation: a well-known company that launches a new digital marketplace will be trusted more by customers than a digital marketplace from an unknown company.

This has been prevalent throughout the history of the digital marketplace. There is also never 100% certainty that the product marketed is the same product sold, or whether there is a product at all. With more and more transactions being done online, an increasing number of people are subjected to fraud[4].

Myoung-Soo Kim and Jae-Hyeon Ahn list a few ways that a company can increase its customers' trust[21]. It is important to understand that in a digital marketplace where private sellers operate, there needs to be trust in both the marketplace and the seller separately. Trust in the marketplace can be improved through the website's reputation, usefulness and security. A website's reputation comes directly from the company's reputation and can be improved by advertising. Its usefulness is improved by good design and usability. The website's security can also be improved by obtaining a security certificate.


Information overload is an issue that prevents good usability and, if solved, can generate trust from the user. It occurs when undesirable or irrelevant information is shown to the user, which often results in a confused or irritated user. This can be somewhat mitigated with information filtering tools, but these are limited in that they are not personalized to a specific user. Recommender systems could be used to overcome this issue; that is the argument that Yan Zheng Wei, Luc Moreau and Nicholas R. Jennings make in their paper about recommender systems[22]. This requires the system to gather some information about the user, which needs to be approved by the user, given the importance privacy has gained in recent times, specifically in the EU with the new GDPR legislation.
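As a toy example of the kind of recommender argued for above, the sketch below suggests datasets bought by users with similar purchase histories; the data and the overlap-based similarity measure are purely illustrative.

```python
# Recommend datasets that users with similar purchase histories bought.
purchases = {
    "alice": {"air-quality", "traffic", "rainfall"},
    "bob":   {"air-quality", "rainfall", "river-level"},
    "carol": {"parking", "traffic"},
}

def recommend(user: str) -> list[str]:
    mine = purchases[user]
    scores = {}
    for other, theirs in purchases.items():
        if other == user:
            continue
        overlap = len(mine & theirs)          # crude similarity measure
        for item in theirs - mine:
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))   # ['river-level', 'parking']; river-level ranks first
```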

Trust in the sellers can be a lot trickier for the company to influence; however, it can promote those sellers that have proved trustworthy. First of all, there needs to be sufficient information about the product communicated to the buyer: what the data is, how often it is generated and what the buyer is allowed to do with it. That last point is something the company should enforce because of the new European regulation, GDPR. A good way for the company to promote trustworthy sellers is through a rating system where buyers can rate their purchases. It might also be a good idea to reward sellers with a high rating, for instance with a smaller fee.

The trust between the two parties is not one-directional, even if the most important part lies in the buyer trusting the seller. There must also be some trust in the buyer: trust that the data is not going to be misused or resold. For a physical product, reselling is not an issue and is perfectly legal; the difference is that a digital product can be copied. Because of this, the market-maker needs to be able to prevent copying, otherwise the seller might not use the platform. There are two ways this can be achieved. The first is to prevent the data from being copied, meaning the user can sell it but then loses their only copy. The second is to bind the data to a specific machine. A suggested solution from Hongxia Jin and Vladimir Zbarsky hinders illegal reselling and enables the content to be stored encrypted[23]. It works by storing the content encrypted together with the encryption key, which is itself stored encrypted; to prevent reselling, the title key is bound to a specific machine.
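A rough sketch of this style of scheme, using the third-party Python `cryptography` package. The key derivation from a machine identifier is our own simplification (a real scheme would use hardware-backed secrets), and the function names are hypothetical.

```python
import base64
import hashlib
from cryptography.fernet import Fernet

def machine_key(machine_id: str) -> bytes:
    """Derive a Fernet key from a machine identifier (illustrative only)."""
    digest = hashlib.sha256(machine_id.encode()).digest()
    return base64.urlsafe_b64encode(digest)

def package(content: bytes, machine_id: str) -> tuple[bytes, bytes]:
    """Encrypt the content under a fresh title key, then encrypt the
    title key bound to one machine, mirroring the scheme above."""
    title_key = Fernet.generate_key()
    encrypted_content = Fernet(title_key).encrypt(content)
    bound_title_key = Fernet(machine_key(machine_id)).encrypt(title_key)
    return encrypted_content, bound_title_key

def unpack(encrypted_content: bytes, bound_title_key: bytes,
           machine_id: str) -> bytes:
    # Only the machine whose id bound the title key can recover it.
    title_key = Fernet(machine_key(machine_id)).decrypt(bound_title_key)
    return Fernet(title_key).decrypt(encrypted_content)

blob, key = package(b"sensor readings", "machine-42")
assert unpack(blob, key, "machine-42") == b"sensor readings"
```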

3.2.2 Internet of Things Platform Requirements

There are many challenges in building an IoT system. One way to handle the complexity is to create a middleware platform that fulfills the usual non-functional requirements[24]. Flavia C. Delicato, along with other researchers, writes that some of these requirements include:

1. Scale. The system needs to be able to connect to a large number of devices and to increase that number quickly if demand suddenly requires it.

2. Heterogeneity. The devices that connect to the system are often heterogeneous, and the system needs to handle the interoperability between these devices and the system.

3. Uncertainties. The system needs to handle the level of uncertainty that exists in an IoT environment. The uncertainty comes from the highly dynamic nature of the communication infrastructure in an IoT environment.

4. Conflict resolution. The system needs to handle conflicts, for example when different applications try to control the same device (see the sketch after this list).
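A minimal sketch of requirement 4, conflict resolution: a single arbiter decides which application may control a device, here by a simple priority rule. The class and the rule are our own illustration, not taken from the cited paper.

```python
class DeviceArbiter:
    def __init__(self):
        self.holders = {}   # device id -> (app, priority)

    def request(self, device: str, app: str, priority: int) -> bool:
        """Grant control if the device is free or the request outranks
        the current holder; otherwise reject the conflicting command."""
        holder = self.holders.get(device)
        if holder is None or priority > holder[1]:
            self.holders[device] = (app, priority)
            return True
        return False

arbiter = DeviceArbiter()
print(arbiter.request("vent-1", "humidity-app", priority=1))   # True
print(arbiter.request("vent-1", "schedule-app", priority=0))   # False: conflict
```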


Riccardo Petrolo, Valeria Loscrì and Nathalie Mitton discuss in their paper that, when it comes to smart cities, we are moving from an Internet of Things to a Cloud of Things (CoT) platform, meaning IoT integrated with cloud computing[25]. This is needed because of the sheer amount of data that the network of sensors in a smart city generates. In their paper they propose several requirements for a smart city, which they split into two categories: citizen-centric and operational.

Citizen-centric puts the end-user in focus and requires the network to be a) user-centric, conforming to the user's preferences; b) ubiquitous, reachable from everywhere and from any device; and c) highly-integrated, as in based on the social cooperation of multiple users. Operational focuses more on administration and requires the network to be a) highly-interconnected, able to connect a lot of heterogeneous devices; b) cost-efficient, as automatic as possible in both deployment and organisation; c) energy-efficient, to meet the requirements of green applications; and d) reliable, in that connectivity in the network should be guaranteed.

3.2.3 Interoperability

The main difference between a regular digital marketplace and one for real-time data is the product that is sold. A digital marketplace for real-time data sells data generated by a large number of sensors. These sensors often use different protocols and send their data in different formats, but it is very important that the buyer can get the data in one specific format, even when it comes from different sensors. For that reason the marketplace needs some sort of abstraction to handle the different protocols; it needs to handle interoperability. Sofoklis Kyriazakos, along with several other researchers, touches on this issue when presenting their suggested Things as a Service platform, BETaaS, which stands for Building the Environment for Things as a Service[26]. One of its main features is a specific layer that adapts to different technologies and protocols. It is also based on a framework that allows plug-ins to be added dynamically. One of the issues BETaaS is meant to solve is providing a general reference model that could be used by different applications. Even though BETaaS isn't made specifically for a digital marketplace, it could still prove useful in our case since it is meant to handle different applications with different protocols.
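A minimal sketch of such an abstraction layer is shown below, in Python. The two protocols and all field names are hypothetical; the point is only that each protocol gets an adapter that maps its payload onto one standardized format, and that adapters can be registered dynamically, in the spirit of the plug-in framework BETaaS uses.

```python
import json
from typing import Callable

# Adapters normalize raw payloads into one standard reading format:
# {"sensor_id", "type", "value", "unit"}.
def from_semicolon_csv(raw: str) -> dict:
    sensor_id, value = raw.split(";")
    return {"sensor_id": sensor_id, "type": "temperature",
            "value": float(value), "unit": "celsius"}

def from_vendor_json(raw: str) -> dict:
    msg = json.loads(raw)
    return {"sensor_id": msg["dev"], "type": msg["kind"],
            "value": float(msg["val"]), "unit": msg["unit"]}

# New adapters can be registered at runtime, mimicking dynamic plug-ins.
ADAPTERS: dict[str, Callable[[str], dict]] = {
    "semicolon-csv": from_semicolon_csv,
    "vendor-json": from_vendor_json,
}

def normalize(protocol: str, raw: str) -> dict:
    return ADAPTERS[protocol](raw)

print(normalize("semicolon-csv", "t-17;21.4"))
print(normalize("vendor-json",
                '{"dev": "h-3", "kind": "humidity", "val": 55, "unit": "percent"}'))
```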

3.2.4 Privacy

Privacy is something that has been, and continues to be, very relevant in the IT environment. Privacy can be described as "the right to be let alone"[27], but in the world of Facebook, Google and other large IT companies it is not always prioritized. Personal information has proven to be very profitable, mainly as a way to generate personalized advertising. In Sweden, personuppgiftslagen (PUL) has been the law that regulates how personal information may be handled. However, that law is limited: it only regulates how data is handled and does not specifically cover information on the internet. This has made it hard for the government to regulate what kind of data IT companies gather from their users. That is about to change, at least for citizens within the EU and EEA: on the 25th of May 2018 a new regulation comes into force, called the General Data Protection Regulation (GDPR).


There are several differences between PUL and GDPR[28]. First of all, GDPR covers many more areas where personal information is handled, e.g. on websites and in email. Earlier it was also possible for companies to get around the law by operating outside of the EU. This will no longer be possible: if the targeted audience is citizens of the EU, then GDPR applies. For example, if a digital marketplace has the option to show its prices in euro, that indicates that the targeted audience is citizens of the EU. Citizens will also get a lot more power over their data. A person will have the right to have their data deleted, with a few exceptions. GDPR also specifies that the user agreement needs to be simpler and clearer about what information will be gathered and why.

This shift from PUL to GDPR will force a digital marketplace to adapt how it operates. The market-maker has a responsibility to ensure that the data that is gathered is safe. If there appears to be a risk to the data, then the market-maker needs to ensure its security, e.g. by encrypting the data. The user agreement, which for a long time often has been very long and incomprehensible, has to be made a lot easier and clearer about what information will be gathered and what rights the user has. It is also important that a contract is signed between buyer and seller for each transaction. The contract should state what the buyer is allowed to do with the data and that the company is not responsible for how the buyer uses it.

A lot of digital marketplaces apply automated directed advertising, but under GDPR that needs to be approved by the user. Because of this, the digital marketplace needs two separate systems for advertising: one that handles directed advertising, used only if the user approves it, and one that handles regular advertising, used otherwise. GDPR also includes the right to have your data deleted, which means that the user at any time can require the company to delete all data that has been collected about them. This could include all contact information and transaction history. Of course, this cannot be done in some cases, for example when a user's invoice hasn't been paid or when the information is used in a legal case.
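The following Python sketch illustrates these two obligations: consent-gated advertising and the right to erasure with retention exceptions. It is a toy model with hypothetical names, not legal guidance.

```python
from dataclasses import dataclass

@dataclass
class User:
    user_id: str
    consented_to_profiling: bool = False
    has_unpaid_invoice: bool = False

def select_ad_system(user: User) -> str:
    # Directed advertising only with explicit opt-in; regular ads otherwise.
    return "directed" if user.consented_to_profiling else "regular"

def handle_erasure_request(user: User, personal_data: dict) -> bool:
    # The right to erasure has exceptions, e.g. an unpaid invoice or an
    # ongoing legal case; here only the invoice case is modelled.
    if user.has_unpaid_invoice:
        return False
    personal_data.pop(user.user_id, None)  # contact info, transaction history, ...
    return True

store = {"alice": {"email": "alice@example.com"}}
assert select_ad_system(User("alice")) == "regular"
assert handle_erasure_request(User("alice"), store) is True
assert "alice" not in store
```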

IoT networks deal with huge amounts of data, and often that data can be sensitive to individuals. Because of this, privacy is a big issue and needs to be addressed[29]. The large number of devices with different protocols complicates things, and the devices need to have low energy consumption, which limits the ability to perform demanding cryptographic operations. For anonymization in a WAN there are two parts. First, the owner must be anonymous, which is straightforward. Secondly, the location of the device needs to be hidden. This is more problematic, especially in a LoRaWAN, since that would require a change in the protocol. In a PAN, privacy could be ensured by the public gateways, but since these gateways are managed by a third party they might not be trusted.
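As a small illustration of the first, straightforward part, owner anonymity, the Python sketch below pseudonymizes the owner with a keyed hash. The salt handling and names are our own assumptions; hiding the device's location, which the text above identifies as the hard part, is not addressed here.

```python
import hashlib
import os

# The salt is a secret held by the market-maker; without it the hash
# cannot be recomputed or reversed by an outside observer.
SALT = os.urandom(16)

def pseudonymize_owner(owner_id: str) -> str:
    return hashlib.sha256(SALT + owner_id.encode("utf-8")).hexdigest()[:16]

# The same owner always maps to the same pseudonym, so data can still be
# grouped per owner without revealing who the owner is.
assert pseudonymize_owner("alice") == pseudonymize_owner("alice")
```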

Since some of the data in an IoT network is time-sensitive, it might not be possible to use cloud computing. An alternative is to use fog computing, which can be described as a mini-cloud on the edge of the network[30]. Abduljaleel Al-Hasnawi and Leszek Lilien propose PEFM, a Policy Enforcement Fog Module, which runs on a single fog node and enforces privacy for all sensitive data from the sources connected to it. The key feature of this solution is, of course, that it uses fog computing. PEFM produces two different data streams in two different components: local and remote. LPEM, the Local Policy Enforcement Module, enforces privacy directly on local IoT applications; it both sets up and executes policy enforcement on the data in the fog node. RPEM, the Remote Policy Enforcement Module, enforces privacy indirectly on remote IoT applications; it only sets up policy enforcements, which are then either enforced via execution on the local node or sealed within an Active Data Bundle, to be executed by non-real-time applications.
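A toy Python sketch of this local/remote split is shown below. It is not Al-Hasnawi and Lilien's actual module; the policies, names and the "pending policies" stand-in for an Active Data Bundle are all simplifications for illustration.

```python
def strip_location(record: dict) -> dict:
    record = dict(record)
    record.pop("location", None)
    return record

def round_timestamp(record: dict) -> dict:
    record = dict(record)
    record["timestamp"] -= record["timestamp"] % 3600  # round down to the hour
    return record

POLICIES = [strip_location, round_timestamp]

def enforce_privacy(record: dict, consumer_is_local: bool) -> dict:
    if consumer_is_local:
        # LPEM-style: set up and execute the policies directly in the fog node.
        for policy in POLICIES:
            record = policy(record)
        return record
    # RPEM-style: only set up enforcement; the policies travel with the data
    # (cf. an Active Data Bundle) and are executed later by the remote side.
    return {"payload": record, "pending_policies": [p.__name__ for p in POLICIES]}

reading = {"sensor_id": "t-17", "value": 21.4,
           "timestamp": 1527249725, "location": "Malmö"}
print(enforce_privacy(reading, consumer_is_local=True))
print(enforce_privacy(reading, consumer_is_local=False))
```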


4 Analysis

Many of the issues raised in the interviews also appear in the literature review. This validates the problems and challenges stated in the interviews and, in several cases, the literature offers possible solutions to the challenges posed when developing a marketplace for real-time data.

One of the topics covered in the literature review is trust: the trust a seller or marketplace needs in order to make a sale and have the buyer feel comfortable with the purchase, or even make the purchase to begin with. Trust can be split into two parts: trust in the company that runs the marketplace, and trust in the seller who offers their item (or data) on the marketplace. Trust in the company is built in a more general manner, by earning a reputation of being a good and trustworthy company.

Another way to build trust in the company is to have good user interface design and usability, which is something the company in question is planning to focus a lot on. One example is to use sliders that change the pricing dynamically while visualising the costs, making it easier to grasp what the end cost will be and exactly what you are paying for.
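A sketch of the pricing function that such a slider could drive is shown below; all rates are made-up placeholders, since the company's actual pricing model is not known.

```python
def preview_price(sensors: int, updates_per_hour: int) -> float:
    # All rates are hypothetical; a real model would come from the
    # seller's pricing configuration.
    base_fee = 5.0          # fixed monthly fee
    per_sensor = 0.80       # monthly cost per sensor
    per_reading = 0.0005    # cost per delivered reading
    monthly_readings = sensors * updates_per_hour * 24 * 30
    return base_fee + sensors * per_sensor + monthly_readings * per_reading

# Called on every slider change so the displayed end cost updates live.
print(f"{preview_price(sensors=10, updates_per_hour=6):.2f}")  # 34.60
```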

Additionally, security is very important for gaining the trust of a buyer. This is planned to be achieved by using trustworthy certificates that, for instance, encrypt the data in transit. It is also very important to keep the data encrypted not only in transit but also while stored on the internal platform. At the moment, the company mentions that the data inside the internal Yggio platform is stored in plain text, which could very well be deemed a security issue. If the data is to be stored this way, the customer should perhaps be informed. Depending on what kind of information is stored in plain text, this could clash with the GDPR regulation: according to Article 35 of the GDPR, an impact assessment must be carried out if the party responsible for the personal information plans an operation that could involve high risks[28][31]. That is why it is very important to be transparent and clear about how the data is handled.
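As a small, hedged example of encrypting data in transit, the sketch below serves content over TLS using Python's standard library. The certificate and key paths are placeholders for a certificate issued by a trusted authority.

```python
import http.server
import ssl

# "server.crt" and "server.key" are placeholder paths to a certificate
# from a trusted authority and its private key.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.crt", keyfile="server.key")

server = http.server.HTTPServer(("0.0.0.0", 8443),
                                http.server.SimpleHTTPRequestHandler)
server.socket = context.wrap_socket(server.socket, server_side=True)
# server.serve_forever()  # left commented so the sketch does not block
```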

Trustworthy sellers are indeed incredibly important when creating a marketplace that allows different users to sell their data to others, and developing this trust can be a very difficult task. The company needs a way to regulate who can sell data and to remove those that might abuse or misuse the system. One way the company plans to handle this is by implementing a rating system for each dataset, where buyers can rate the dataset with, for example, 1 to 5 stars. The seller could also be rated in a similar way, making them more trustworthy. In addition to a rating system, the company mentions comments as a way of giving feedback to other customers regarding the dataset. Sellers that perform well could then be part of some kind of reward system, to motivate people to be good sellers. One example is to offer a smaller fee for using the marketplace (assuming there is one in the first place); another is to promote well-performing and trustworthy sellers on the marketplace, which could in turn benefit the marketplace as well. A minimal sketch of such a scoring and fee-reward scheme is shown below.
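The sketch, in Python with made-up thresholds, averages the star ratings of a seller's datasets and grants a reduced fee above a threshold.

```python
def seller_score(ratings: list[int]) -> float:
    # Average star rating (1-5) across all of a seller's datasets.
    return sum(ratings) / len(ratings) if ratings else 0.0

def marketplace_fee(base_fee: float, ratings: list[int],
                    threshold: float = 4.5, discount: float = 0.5) -> float:
    # Sellers above the threshold pay only half the usual fee.
    return base_fee * discount if seller_score(ratings) >= threshold else base_fee

assert marketplace_fee(10.0, [5, 5, 4]) == 5.0   # trusted seller: reduced fee
assert marketplace_fee(10.0, [3, 4, 2]) == 10.0  # regular fee
```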


In both the literature review and the interviews, the issue of interoperability is brought up. The main thing that both of these mention is the importance of uniting the different devices and having them speak the same language and format. This is why it is important to have some kind of abstraction layer that receives the data, which can be in different formats, and returns it in a standardized format for the system. The company studied in this paper implements this abstraction layer inside their connectivity platform. This means that it (for the most part) doesn't matter what protocol the devices that connect to the marketplace use, since the data will pass through the abstraction layer and follow a standardized format.

Privacy is also mentioned in the interviews, more precisely how GDPR will be handled. There is an understanding that GDPR will be a lot tougher on personal information, but it isn't known how it will affect the major IT companies or the IT community in general. Only when a lawsuit has been completed and a fine has been set will it be known how costly it is to break the new law. It is clear that it has the potential to be extremely costly, with a maximum fine of 4% of the company's annual revenue. For Google, who collect a lot of information and might break GDPR, 4% of Alphabet's yearly revenue of roughly 110 billion dollars would amount to a fine of around 4.4 billion dollars. Therefore it might be best to try to follow GDPR.

After closer inspection of the GDPR document, it seems there are a few but important changes from PUL, which is the current law in Sweden. Mainly, GDPR covers more areas, such as email and websites. It also applies to all companies that target EU citizens, even if they operate outside the EU. It seems the best way to comply with GDPR is to inform the users and ask for their permission before personal information is gathered. There is some discussion in the interviews about whether the data needs to be anonymized under GDPR. However, this might not be enough, since it is sometimes possible to identify an individual from anonymized data, depending on the context.

References
