DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2017

Improving Back-End Service Data Collection

ISABEL GHOURCHIAN CHARLOTTA SPIK

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

isabelgh@kth.se spik@kth.se

Bachelor thesis, 15hp

Supervisor: Johan Montelius

Examiner: Thomas Sjöland

Abstract

This project was done for a company called Anchr, which develops a location-based mobile application for listing nearby hangouts in a specified area. To do this, the company integrates a number of services and sends requests to them to find out whether they list any nearby locations. One of these services is Meetup, an application where users can create social events and gatherings.

The problem this project aims to solve is that a large number of requests are sent to Meetup’s service in order to get information about the events, so that they then can be displayed in the application. This is a problem since only a limited number of requests can be sent within a specified time period before the service is locked. This means that Meetup’s service cannot be integrated into the application as it is now implemented, as the feature will become useless if no requests can be sent. The purpose of this project is therefore to find an alternative way of collecting the events from the service without it locking. This would enable the service to be integrated into the application.

The hypothesis is that, instead of using the current method of sending requests to get events, a listener can be implemented that listens for incoming events from Meetup’s stream, so that updates are received directly whenever an event is created or updated.

The result of the project is that there now exists a system which listens for events instead of repeatedly sending requests. The issue with the locking of the service does not exist anymore since no requests are sent to Meetup’s service.

Keywords

Streaming, API, Java, caching, mobile application

Referat

This project was carried out for a company called Anchr, which develops a location-based mobile application for listing nearby social venues within a specified area. For this, the company integrated a number of services to which it sends requests to see whether there are any nearby places listed for these services. One of these services is Meetup, an application where users can create social events.

The problem this degree project aims to solve is that a large number of requests are sent to Meetup’s service to get information about the events so that they can be displayed in the application. This is a problem because only a limited number of requests can be sent to the service within a certain time interval before the service is locked. This means that Meetup’s service cannot be integrated into the application as it is currently implemented, since the feature will become unusable if no requests can be sent. The purpose of this project is therefore to find an alternative way of collecting events from the service without it being locked. This would allow the service to be integrated into the application.

The hypothesis is that, instead of using the current method of sending requests to get new events, a listener can be implemented that listens for incoming events from Meetup’s stream, in order to receive updates directly when an event is created or updated.

The result is that there is now a system that listens for events instead of repeatedly sending requests. The problem with the locking of the service no longer exists, since no requests are sent to Meetup’s service.

Keywords

Streaming, API, Java, caching, mobile application

Contents

Introduction
1.1. Background
1.2. Problem
1.3. Purpose
1.4. Goal
1.5. Method
1.6. Delimitations
1.7. Outline

Background
2.1. Difference between REST API and streaming API
2.2. Streaming
2.3. Previous work in the area

Method
3.1. Presentation of different methods
3.2. Methods used in this project

The implemented system
4.1. Software development tools
4.2. JavaScript Object Notation (JSON)
4.3. Dropwizard
4.4. Retrofit, Okio and OkHttp
4.5. RxJava

Result

Discussion and conclusion
6.1. Sources of error
6.2. Choice of methods
6.3. Future work

References

Terminology

API: Set of data and functions for interaction between computer programs
Binary search: A search algorithm for finding a target value within a sorted list or array
Dropwizard: Java framework for web service development
JSON: Human-friendly data-interchange format
Maven: Software project management and comprehension tool
OkHttp: Used for sending network requests
Okio: Used for buffering data
REST: Architectural style used for modern web services
REST API: Listens and responds to requests from a client
Retrofit: HTTP client for Java
RxJava: API for asynchronous programming with observable streams
Streaming: Making data available as soon as the client needs it
Streaming API: Persistent connection with the client

Chapter 1

Introduction

In this section the project is described briefly. This includes a description of the background, purpose, goal and method, to give a short explanation of why and how the project was carried out. Furthermore, delimitations as well as an outline of the report are presented in this section.

1.1. Background

The work was done for a startup company called Anchr, a small Swedish company run by only three people. The company’s business concept is to develop a location-based mobile application. The idea behind the application is to show all so-called “hangouts” within a specified radius of the user's current position. These hangouts could for example be cafés, restaurants, bars and monuments. The application should be user-friendly and make everyday life easier by helping users find locations of interest without having to perform extensive online searches. Information about these places, such as opening hours, reviews and descriptions, is displayed in the application. This information is collected from a number of different APIs, for example Yelp and Wikipedia. With this application a user can quickly see all nearby hangouts just by starting the application.

The user can also search for, share and save events as well as build a network of user connections and chat with other users.

One API that the company wants to integrate into their application is Meetup. Meetup is a service that allows people from all over the world to arrange events, so-called “meetups”, that anyone can join. These events could be anything from restaurant visits to programming events[1]. The service allows people to meet and socialize with others who have similar interests.

Meetup provides a REST API and a streaming API that can be used to get information from their server (see section 2.1 for an explanation of the difference between these two kinds of APIs). There are different API methods for the different types of information a user of the API could be interested in receiving. For example, if the user wants the events themselves, they would use the “events” API method, but if the user is only interested in getting photos from Meetup, they would use the “photos” API method. The API method used for this project is “events”. By integrating Meetup’s API with the company’s application, all Meetup events near the user’s location are displayed in the application, so that users can easily find and search for Meetup events.

1.2. Problem

To get the data from Meetup, Anchr has implemented a client that sends requests to Meetup’s REST API. These requests contain the location of the user, a radius and optionally a search term. The API is queried to see whether there are any meetups within the specified radius of the user’s location that match the search term, if one is provided. Meetup’s server then sends a response to the client with the meetups matching the query, if any exist.

The problem with collecting data this way is that only a limited number of requests can be sent within a certain amount of time, as Meetup has specified a limit in order to not overload their server. If this limit is exceeded within the specified time period the API key is locked. Since an API key is needed in order to use the API, this means that the API can no longer be used and no more requests can be sent. This is problematic because the user could miss nearby meetup events: only a few of them can be displayed, and no new ones appear if the user changes position. Therefore the current client code for Meetup’s API cannot be integrated into the application, as it does not provide a stable and working system. The question this project aims to answer is therefore: is there an alternative way to collect the data that avoids the limitations of the REST API?
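The locking behaviour described above can be illustrated with a minimal sketch of a fixed-window request quota. This is only a model under stated assumptions: the class name `RequestQuota`, the limit of three requests and the one-second window are hypothetical, and Meetup's actual limits and locking rules are not specified in this report.

```java
/** Hypothetical model of a request limit: at most maxRequests per time window. */
final class RequestQuota {
    private final int maxRequests;   // allowed requests per window (assumed value)
    private final long windowMillis; // length of the window in milliseconds (assumed value)
    private long windowStart;        // start time of the current window
    private int used;                // requests already counted in the current window

    RequestQuota(int maxRequests, long windowMillis, long nowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.windowStart = nowMillis;
    }

    /** Returns true if a request at time nowMillis is still within the quota. */
    boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // A new window has begun, so the counter starts over.
            windowStart = nowMillis;
            used = 0;
        }
        if (used >= maxRequests) {
            return false; // the limit is reached: further requests are refused
        }
        used++;
        return true;
    }

    public static void main(String[] args) {
        RequestQuota quota = new RequestQuota(3, 1000L, 0L);
        // Five polling requests sent 100 ms apart: only the first three fit the quota.
        for (int i = 0; i < 5; i++) {
            System.out.println("request " + i + " allowed: " + quota.tryAcquire(i * 100L));
        }
    }
}
```

A client that polls once per user action reaches such a limit quickly when many users are active, which is the situation the project sets out to avoid.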

1.3. Purpose

The purpose of this report is to explain how a system can be developed to gather data from Meetup’s API without the problem described in section 1.2. Furthermore, the research methods used to do this are presented and discussed.

The purpose of the project is to solve the problem presented in section 1.2, so that a working integration of the Meetup API into the application can be implemented.

1.4. Goal

The goal of this project is to develop a system that is able to receive events from Meetup’s API without the API key being locked, since a locked key stops events from being provided. It should be possible to integrate the system into the existing application, and it should give users quick access to events near their current position.

The system should be a microservice that collects the desired data from Meetup’s service without the locking problem. The hypothesis is that this can be achieved by implementing a listener that listens for incoming updates from the server’s streaming API, instead of sending repeated requests to the server’s REST API. If this succeeds, the number of requests sent will be reduced to zero, as there would no longer be a need to send requests to the REST API to get information about the events, and the problem will no longer remain.

The goal is also to save the data that is received from the streaming API in a database. This is necessary in order to collect the information in an easily accessible place. The stream provides events indefinitely, delivering one every time an event is created or updated. These events must be saved somewhere, because if they were only displayed directly in the application as they were received, they would disappear every time the user restarted the application.

Furthermore, the system developed must run for a couple of months before it can be released with the application. If the application were released as soon as the system was implemented, no events created before the system started running would be displayed in the application, which would lead to the user missing a number of events. For example, if a user created an event in May that takes place in July, and the system started running in June, the event would not appear in the application, as it was created before the system started running and therefore would not have been saved in the database. If however the system is allowed to run for a couple of months to gather events, the events created before the system was implemented would have passed, and only the upcoming events would be displayed. This should result in the database containing all upcoming events from the server, which would allow the microservice that has been developed to send requests to the database instead of the server’s API. The hypothesis is that after a few months the database should be in sync with the information held by the server.
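The storage requirement described above can be sketched as follows. This is a minimal illustration, not the implemented system: an in-memory map stands in for the database, and the class `EventStore` and the fields `id`, `name` and `startMillis` are hypothetical names that do not reflect Meetup's actual event format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the event database (illustrative only). */
final class EventStore {
    /** Minimal event record; the field names are assumptions for this sketch. */
    static final class Event {
        final String id;
        final String name;
        final long startMillis; // time at which the event takes place

        Event(String id, String name, long startMillis) {
            this.id = id;
            this.name = name;
            this.startMillis = startMillis;
        }
    }

    private final Map<String, Event> byId = new HashMap<>();

    /** Stores a newly created event, or overwrites the stored copy when an update arrives. */
    void upsert(Event event) {
        byId.put(event.id, event);
    }

    /** Returns the events that have not yet started, ordered by start time. */
    List<Event> upcoming(long nowMillis) {
        List<Event> result = new ArrayList<>();
        for (Event event : byId.values()) {
            if (event.startMillis > nowMillis) {
                result.add(event);
            }
        }
        result.sort(Comparator.comparingLong((Event e) -> e.startMillis));
        return result;
    }

    int size() {
        return byId.size();
    }
}
```

Because created and updated events share the same identifier, an update replaces the earlier copy instead of producing a duplicate, and the application only needs to ask the store for upcoming events.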

The goal is also that the system should be able to handle the rate at which the server sends events. There is no guarantee that the server will send events at an even rate, so our system must be able to handle several events per second as well as irregular bursts of events.
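One common way to absorb irregular bursts is to place a bounded buffer between the part that reads the stream and the part that stores events. The sketch below shows this idea under stated assumptions; the class `BurstBuffer` and its capacity are hypothetical, and it is not a description of how the implemented system handles bursts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Bounded buffer between a stream reader and a storage worker (illustrative only). */
final class BurstBuffer {
    private final BlockingQueue<String> queue;

    BurstBuffer(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Called by the stream reader; waits when the buffer is full so that no event is dropped. */
    void accept(String eventJson) {
        try {
            queue.put(eventJson);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while buffering an event", e);
        }
    }

    /** Called by the storage worker; removes up to max buffered events in arrival order. */
    List<String> drainBatch(int max) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, max);
        return batch;
    }

    /** Number of events waiting to be stored. */
    int pending() {
        return queue.size();
    }
}
```

During a burst the buffer fills up while the storage worker keeps draining batches at its own pace, and during quiet periods it empties again.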

1.4.1. Benefits, ethics and sustainability

One ethical aspect of this project is the risk of violating people's privacy, as the application has access to the user’s location. This information is sent to the system via a TCP connection. It is therefore important that the connection is secure so that it cannot be intercepted. Otherwise anyone could get access to the locations of the application's users, which would be a violation of their privacy.

This project may contribute to a more sustainable environment. Since the resulting application is meant to display nearby events, it is not necessary for the user to travel long distances. If a user is in a particular place, is looking for a social event or hangout, and is not familiar with the surrounding area, it is likely that they will use some means of transport to reach a place they are already familiar with. Traveling long distances in this way can affect the environment negatively, since the emissions from vehicles such as trains, subways, cars and other means of travel pollute the air.

Instead of traveling in that manner, a user can with the finished application pick up their phone and quickly get access to all the nearby events in the area where they currently are. Hopefully the presented events are within walking distance, or only a limited amount of transportation is needed. Because of this, it is likely that the developed application will benefit the environment by reducing travel time for the users.

Furthermore, people in their everyday life would benefit from this system when they use the application since it helps them to reduce their travelling time and cost, and in addition makes it easier for them to find places without having to search online for information.

1.5. Method

For this project the engineering design process was used to develop the system. This method involves defining the problem, performing background research and creating a solution on this basis before the product is developed and tested. The method is therefore about working from an identified problem, either generally in society or more specifically for a company or customer, and from this developing an idea to solve the problem. It is designed specifically for engineers developing new products and is the most common method used by engineers. For a more detailed description of this method and the steps involved, as well as a presentation of the different versions of the engineering design process, see section 3.

Various other methods were evaluated for this project before choosing the most appropriate one. For a description of these methods, see section 3.1. For a comparison between the different methods and a discussion about how the most appropriate one for this project was chosen, see section 6.2.

1.6. Delimitations

In this project only one solution, namely listening to a streaming API, is considered for collecting the desired data from Meetup without the API key being locked. This is not a feasible method if a service does not provide a streaming API. Therefore, this work is only relevant in situations where a streaming API is available. It is likely that there exist other methods, not considered in this report, that might reduce the number of requests sent to the API and prevent the locking. The main reason for this delimitation is the limited time available for the project.

Furthermore, only one streaming API is investigated in this project, which means that all conclusions drawn will be based solely on the results from this streaming API. However, the purpose with streaming APIs is generally the same, which means that the conclusions drawn can be expected to be relevant for other streaming APIs as well.

1.7. Outline

Chapter 2 explains the concepts needed to understand the built system, as well as giving a description of previous work in the same area. Chapter 3 presents different existing methods considered for this project and the method chosen to be used for the project. Chapter 4 presents the system architecture and the different steps included to build the system. Chapter 5 describes the results of the project and finally chapter 6 discusses error sources, choice of methods and future work.

Chapter 2

Background

In this section the background for the project is presented. This includes the concepts the reader might need in order to understand the project, namely the difference between a REST API and a streaming API and how streaming works, as well as a presentation of previous similar work done in this area.

2.1. Difference between REST API and streaming API

An application programming interface (API) is a set of data and functions that allows computer programs to interact and exchange information with each other. APIs are used by client programs to communicate with web services. [2]

Representational state transfer (REST) is an architectural style that is commonly used for modern web services. It is built on the following constraints[2]:

1. Identification of resources: An identifier for a Web-based concept, for example a URI.

2. Manipulation of resources through representations: Allows a resource to be represented in different formats without changing its identifier.

3. Self-descriptive messages: Metadata for additional info of the resource.

4. Hypermedia as the engine of application state: Links to related resources.

A Web API listens and responds to requests that come from a client. If the API uses the REST architectural style it is called a REST API. A REST API makes a web service “RESTful” [2].

Unlike a REST API, a streaming API has a “persistent connection”, which means that it keeps the request open indefinitely, that is, it never closes the connection. There are a few steps involved when using a streaming API[3]:

1. The client makes an initial request.

2. The server defers the request until an update is available or a timeout has occurred.

3. The server sends an update to the client when it becomes available.

4. The connection is not terminated by the sent update, and the server returns to step 2 to wait for the next update.

This way of using a streaming API relies on the server's ability to send several pieces of data on the same response without terminating the request. It reduces network latency since the client and the server do not need to close and open the connection repeatedly.[3]
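From the client's point of view, the steps above can be sketched as a single request followed by a loop that reads updates as they arrive. The sketch is a simplified illustration that requires Java 11 or later. It assumes that the server sends one update per line, which is an assumption made here rather than something stated in this report, and the names `StreamListener`, `consume`, `listen` and `streamUrl` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Client side of a persistent streaming connection (illustrative only). */
final class StreamListener {
    /**
     * Steps 2 to 4: wait for the next update, pass it on, and keep reading on the
     * same connection. Returns the number of updates received before the stream ended.
     */
    static int consume(BufferedReader reader, Consumer<String> onUpdate) {
        int count = 0;
        try {
            String line;
            // readLine() blocks until data arrives and returns null only when the stream ends.
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines between updates
                }
                onUpdate.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Step 1: one initial request; streamUrl is a placeholder supplied by the caller. */
    static void listen(String streamUrl, Consumer<String> onUpdate)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(streamUrl)).GET().build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(response.body(), StandardCharsets.UTF_8))) {
            consume(reader, onUpdate);
        }
    }
}
```

Only one request is made regardless of how many updates follow, which is the property that distinguishes this approach from repeated requests to a REST API.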

2.2. Streaming

Streaming is about making data available as soon as a client application needs it. This is useful when a user needs immediate access to data that is constantly updated. If delays in data transfer are critical and the data needs to be analyzed as soon as it becomes available, streaming is the way to go.[4]

Streaming should be separated from downloading. Consider the following example to illustrate the difference: Imagine you want a drink of water. If you fill up a glass of water and then drink it, it can be viewed as downloading. If you instead drink the water directly from the bottle, it can be viewed as streaming[5]. The difference between these two scenarios is that in the first case you save your water in a temporary location before consuming it. In the second case you drink the water directly from the source.

A streaming data system is a system that delivers data immediately when the client requests it. These kinds of streaming systems are in-the-moment systems.[4]

The streaming technology is capable of transmitting data such as video and audio events in real-time over the internet, when they are happening. The main features of streaming are the following[5]:

● Delivers live content, for example a football match, concert or political speech, at the same time as it happens.

● Provides random access to movies. The streaming server can act as a remote video player and perform functions such as skipping back and forth and playing only a portion of a media production.

● Occupies no space on the user’s hard disk. The user saves the URL of the stream instead of the actual media.

● Uses no more bandwidth than what is needed.

● Allows streaming for tracks.

● Can use broadcast and multicast approaches, so that a stream can be sent to several users.

A streaming server has the requirement that the data must be delivered in real time as soon as it becomes available, although some level of transmission errors can be tolerated. This is unlike regular web servers, which only serve data for download and do not have the capability of streaming it. Streaming servers use excess bandwidth to buffer data ahead faster than real time. When packets are lost, the server retransmits only the lost packets and thereby reduces network traffic.[5]

2.3. Previous work in the area

In a previous study, Bifet, Holmes and Pfahringer examine a streaming API in order to detect changes in tweets and in the frequency of the terms they contain. The API is provided by Twitter, a microblogging service where users post messages known as tweets. The tweets are often short and are generated constantly. The Twitter streaming API provides real-time access to the tweets in a filtered form, including replies and mentions that are created by public accounts. The API requires a valid Twitter account and uses basic HTTP authentication; the generated data can then be retrieved in JSON format.[6]

Bifet, Holmes and Pfahringer worked with a method called the MOA-TweetReader that reads tweets in real time by using the Twitter streaming API. It also detects changes and finds terms whose frequency changes. Figure 1 displays the architecture of the method.[6]

Figure 1: Architecture of the MOA-TweetReader Model

MOA-TweetReader takes tweets as input and converts them to machine learning instances. Standard machine learning methods are then used by the TweetReader. It contains an item store that keeps the frequencies of the most frequent terms, as well as a change detector that detects changes in the frequencies of the items. Each tweet is a list of words that is transformed by the adapted Twitter filter, which retrieves the most relevant features. This filter is based on a space saving algorithm that is reported to have the best performance results. The algorithm works in the following manner: every time an item that has been monitored before arrives, its count is incremented by one. To detect changes the authors used ADWIN, a change detector that keeps a variable-length window of recently seen items. If the older part of the window differs from the rest of the window, it is dropped.[6]
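The counting rule described above can be illustrated with a small sketch of the Space-Saving algorithm as it is commonly described in the literature. Only the increment rule for already monitored items is taken from the text above. The handling of new items (add a counter while there is room, otherwise replace the item with the smallest count and continue from that count plus one) follows the standard algorithm and is an assumption about what the filter in [6] does. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Approximate term frequencies with a fixed number of counters (Space-Saving sketch). */
final class SpaceSaving {
    private final int capacity; // maximum number of monitored items, must be at least 1
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSaving(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Processes one incoming item, for example one word from a tweet. */
    void offer(String item) {
        Long current = counts.get(item);
        if (current != null) {
            // Rule from the text: a monitored item has its count incremented by one.
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            counts.put(item, 1L); // free counter available
        } else {
            // Standard Space-Saving rule (assumed here): evict the item with the
            // smallest count and let the new item continue from that count plus one.
            String minItem = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> entry : counts.entrySet()) {
                if (entry.getValue() < min) {
                    min = entry.getValue();
                    minItem = entry.getKey();
                }
            }
            counts.remove(minItem);
            counts.put(item, min + 1);
        }
    }

    /** Estimated count of an item, or 0 if it is not currently monitored. */
    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }
}
```

The memory use is bounded by the number of counters, at the cost of counts that can overestimate the true frequency of items that replaced an evicted one.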

Streaming data is useful when one wants to discover events that are happening all over the world at any given time. The system the authors introduce, the MOA-TweetReader, works well on data streams where the Twitter streaming API delivers a large quantity of tweets in real time.[6]

Furthermore, Bommaiah, Guo, Hofmann and Paul wrote an article in which they present the design and implementation of a caching system for streaming media. They describe the core elements of their streaming cache as helpers. These helpers are caching proxies inside the network. Each client is associated with one helper, and the helpers service requests for streaming objects by sharing common resources. If a client wants to get hold of a streaming object it sends a request to the server, which is then redirected to the client’s helper.[7]

The caching system consists of helpers. Each of these helpers has a limited amount of disk, memory, network and computational resources. When a helper receives a request, it must decide how the request should be handled with these limited resources. A streaming request that arrives at a cache can be a partial hit, since parts of the object could be stored in the cache and other parts somewhere else.[7]
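The notion of a partial hit can be made concrete with a small sketch. It assumes that a streaming object is divided into numbered segments and that a request asks for a range of segments. Both the segment model and the names `SegmentCache`, `store` and `lookup` are assumptions made for illustration and are not taken from [7].

```java
import java.util.HashSet;
import java.util.Set;

/** Classifies requests against a cache that holds individual segments (illustrative only). */
final class SegmentCache {
    enum Result { HIT, PARTIAL_HIT, MISS }

    // Keys have the form "objectId#segmentNumber".
    private final Set<String> cached = new HashSet<>();

    /** Records that one segment of an object is stored in this cache. */
    void store(String objectId, int segment) {
        cached.add(objectId + "#" + segment);
    }

    /** Checks how much of the inclusive range firstSegment..lastSegment is cached. */
    Result lookup(String objectId, int firstSegment, int lastSegment) {
        int wanted = lastSegment - firstSegment + 1;
        int found = 0;
        for (int segment = firstSegment; segment <= lastSegment; segment++) {
            if (cached.contains(objectId + "#" + segment)) {
                found++;
            }
        }
        if (found == wanted) {
            return Result.HIT;
        }
        return found == 0 ? Result.MISS : Result.PARTIAL_HIT;
    }
}
```

On a partial hit, a helper in such a design would serve the segments it holds and fetch the remaining ones from elsewhere.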

The main modules of a helper are an RTSP client and server, which receive and process requests from the server, a buffer manager for the available memory, a cache manager for the disk space allocated for caching, and finally a scheduler that manages the global queue of events. With this kind of caching the helper improves the perceived quality of multimedia streams.[7]

Chapter 3

Method

In this section different methods considered in this project are presented, as well as the methods that were ultimately used. The methods considered are mostly focused around engineering and technology, as these were considered more in line with the project, but some more general methods were considered as well. This is by no means a complete presentation of all available development methods.

3.1. Presentation of different methods

This section presents the different methods considered for this project. These are the following:

● Deductive method

● Inductive method

● Hypothetico-deductive method

● Quantitative method

● Qualitative method

● The design science research methodology process

A discussion about the different aspects of the methods and why they were not chosen for this project is presented in section 6.2.

3.1.1. Inductive and deductive methods

A number of methods have been researched and considered for this project. For example, Blomkvist and Hallin write in the book “Metoder för teknologer” (Methods for technologists)[8] about deductive versus inductive methods. A deductive method involves initial research in the area as a way to form theories, which are then tested with an empirical study. An inductive method, in contrast, involves making an empirical study based on the identified problem and then drawing conclusions from, and gaining an understanding of, the results.[8]

In “The qualitative content analysis process”[9] the authors write that whether an inductive or a deductive method is used depends on the purpose of the study. They recommend that an inductive method is used when there are no previous studies dealing with the phenomenon or when knowledge is fragmented. The inductive method is described as moving from the specific to the general, as particular instances are observed and then combined into a general statement. A deductive method, however, is recommended when the aim is to test an earlier theory in a different situation or to compare categories at different time periods. In contrast to the inductive method, the deductive method is described as moving from the general to the specific, as it is based on an earlier theory or a model.

According to the same source, both the deductive and the inductive method consist of three main phases: preparation, organizing and reporting.[9]

3.1.2. Hypothetico-deductive method

According to Andersson and Ekholm, the hypothetico-deductive method is most often used within scientific and technological research areas. This method involves an initial theory or idea as a starting point for the research that is then followed by experimentation and finally evaluation of the result.[10] The following three criteria need to be met for the method to be scientific[10]:

1. Objective, that is, it gives basically the same result independent of who performs the research.

2. Controllable, that is, the method can be controlled with alternative methods.

3. Theoretically rooted, that is, there are hypotheses or theories that can explain how the method works.

According to the author of “Hermeneutics and the hypothetico-deductive method”[11], the hypothetico-deductive method is about formulating a hypothesis and deducing consequences from it to arrive at conclusions. These conclusions are well supported through the way their deductive consequences fit in with well-supported beliefs. [11]

The scientific method follows a number of steps. The list below shows the steps of the scientific method adapted for technological research[11]:

1. How can the problem formulation be solved?

2. How can a product be developed to solve this problem effectively?

3. What information is available and required to develop the product?

4. Develop the product from the information from step 3. If the product is shown to be complete, continue to step 6.

5. Try again with a new product

6. Create a model/simulation of the suggested product.

7. What are the consequences of the model/simulation?

8. Test the application of the model/simulation. If the outcome is not satisfactory, continue to step 9, otherwise skip to step 10.

9. Identify and correct shortcomings in the model/simulation.

10. Evaluate the result relative to existing knowledge and practice, and identify new problem areas for future research.

3.1.3. Quantitative and qualitative method

The terms “quantitative” and “qualitative” refer to the type of data generated in the research process. The main difference is that quantitative research produces data in the form of numbers while qualitative research produces data in the form of text or prose. A qualitative method can for example use surveys to gather information about opinions from different target groups, while a quantitative method can be about measuring some numerical value. Because they produce different kinds of results, the research methods are typically different. Qualitative research is mostly linked with non-economic social science disciplines, while quantitative research is strongly associated with economics and natural science.[12] Table 1 summarizes the difference between qualitative and quantitative methods.

Table 1: Comparison Between Qualitative and Quantitative Methods[12]

Data collected through quantitative methods are often believed to give more objective information as they were collected using standardized methods and because they can be replicated. Because of this qualitative research is considered most suitable for formative evaluations, whereas quantitative research can be used for summative evaluations as they require quantitative measures to judge the ultimate value of the project.[13]

3.1.4. The design science research methodology process

Design science research focuses on the development of artifacts with the intention of improving the functional performance of the artifact. Design science research is typically applied to different aspects of engineering and computer science, such as algorithms and human/computer interfaces.[14]

A typical design science research method proceeds as follows[14]:

1. Awareness of problem: the output of this phase is a proposal for a new research effort.

2. Suggestion: This phase is described in “Design Science Research in Information Systems” as: “Suggestion is essentially a creative step wherein new functionality is envisioned based on a novel configuration of either existing or new and existing elements”[14].

3. Development: in this step the artifact is developed and implemented. The techniques for how this is done depend on which artifact is developed.

4. Evaluation: in this step the artifact is evaluated according to criteria specified in the proposal developed in the first step. Deviations from the hypothesis are noted, analyzed and tentatively explained.

5. Conclusion: the results are consolidated and the knowledge gained is categorized either as “firm”, meaning facts that have been learned and can be repeatedly applied or behavior that can be repeatedly invoked, or as “loose ends”, meaning anomalous behavior that defies explanation and may very well serve as the subject of further research.

Figure 2 shows the research process model for the design science research methodology.


CHAPTER 3: METHOD


Figure 2: Design Science Research Process Model[14]

In “A Design Science Research Methodology for Information Systems Research” the design science research process is described as:

[...] a rigorous process to design artifacts to solve observed problems, to make research contributions, to evaluate the designs, and to communicate the result to appropriate audiences. Such artifacts may include constructs, models, methods, and instantiations. [15]

Hevner describes design science as a problem-solving process and writes that the fundamental principle of design science research is that “knowledge and understanding of a design problem and its solution are acquired in the building and application of an artifact”[16]. Hevner presents seven guidelines whose purpose is to assist researchers in understanding the requirements for effective design science research, and writes that each of the seven guidelines should be addressed in some manner for design science research to be complete.

Hevner lists the guidelines and summarizes their descriptions in Table 2.[16]


Table 2: Design Science Research Guidelines[16]

3.2. Methods used in this project

In this work a deductive method has been used. This is because a literature study was performed initially and a hypothesis (see section 3.1.1.) was formed based on the information gathered. After this, a system was implemented (see section 4) to test and verify the hypothesis. From the results conclusions were drawn and the question was answered.

The method used to develop the system was also quantitative, as it was about gathering data from a system and it gave a numerical result, as opposed to a qualitative method that produces data in the form of text or prose. The method used was also experimental, as several experiments were performed to decrease the number of requests sent, which further points to it being a quantitative method (see section 3.1.3.).

Furthermore, the engineering design process has been used to design and develop the product. This method aims to identify a need and, from it, create solutions and develop a product. A formal definition of engineering design is found in the curriculum guidelines of the Accreditation Board for Engineering and Technology (ABET). ABET states that:

Engineering design is the process of devising a system, component, or process to meet desired needs. It is a decision-making process (often iterative), in which the basic sciences, mathematics, and the engineering sciences are applied to convert resources optimally to meet these stated needs.[17]

Tayal describes the engineering design process as a “formulation of a plan or scheme to assist an engineer in creating a product”[18]. It is further described as an often iterative decision-making process to meet desired needs. Below is a series of steps that engineers follow to solve problems[18]:

1. Define the problem

2. Do background research

3. Specify requirements

4. Create alternative solutions

5. Choose best solution

6. Do development work

7. Build a prototype

8. Test and redesign

It is common for engineers to jump back and forth between the steps and to repeat earlier steps[18]. This is called an iterative process. The steps are thus not followed in strict order.

The engineering design process is a so-called “open-ended” design process, as the best solution to meet the requirements of the problem is not known in advance. Previous knowledge together with information gathered from research is used to explore possible solutions and compare different ideas, in order to select the solution that best uses the available resources and best meets the product’s requirements.

Yousef Haik describes the engineering design process as

[...] a sequence of events and a set of guidelines that helps define a clear starting point that takes the designer from visualizing a product in his/her imagination to realizing it in real life in a systematic manner, without hindering their creative process.[19]

This definition also makes it clear that the engineering design process is about starting with an idea, developing a design and finally developing the product. The author describes two different ways to design a device or system[19]:

1. Evolutionary change: Here the product is allowed to evolve over time with only slight improvement.

2. Innovation: technological discoveries have placed a great deal of emphasis on new products, which draw heavily on innovation.

Haik uses the telephone as an example when describing the difference between these two points. The telephone was an innovative design, as it was a new product made possible by technological discoveries. The telephone then evolved slowly for many decades, with only minor improvements, until the next innovation and technological jump occurred with the mobile phone. This in turn evolved with small improvements being added until the next innovative design, and so on. Haik also adds that “although the emphasis is on innovation, designers must test their ideas against prior design. Engineers can design for the future but must base results on the past”.[19]

There are, however, multiple versions of the engineering design process, as presented by T.J. Howard, S.J. Culley and E. Dekoninck in their article “Describing the creative design process by the integration of engineering design and cognitive psychology literature”[20], where they compare different versions of the method. The version used for this project is the one described above. The list below is described as the generally agreed-upon phases of the process[20]:

1. Establishing a need phase

2. Analysis of task phase

3. Conceptual design phase

4. Embodiment design phase

5. Detailed design phase

6. Implementation phase

The most noticeable difference between the version used in this project and the phases listed above is that the listed phases describe the work in terms of design phases, while the version used is a step-by-step guide to developing the product that puts more focus on implementation and testing.


Chapter 4

The implemented system

The developed system’s basic idea is to listen to a streaming API and save the results in a database, thus eliminating the need to send requests to the server. The process of building this system includes a number of different steps that were performed in order to achieve the desired working system:

1. Explore Meetup’s streaming API

2. Set up a client

3. Make the client listen to the streaming API

4. Set up a database with tables corresponding to the different information included in the API

5. Store the generated results from the listening stream in the database

6. Retrieve the information from the database and convert it into a Java object

7. Test the system and collect data

The process where the information travels from the streaming API to the user is shown in figure 3 below:


Figure 3: Illustration of the Process of Sending Data from Meetup's Server to the User

Exploring the streaming API was done to learn what kind of response the client would receive, so that the client could be built to handle what the server sent. The required information was collected from the streaming API’s documentation[21]. The documentation showed that the response from the API is in JSON format (see section 4.2. for a description of JSON) and that Meetup keeps a persistent connection with the client that is only terminated for server maintenance. This was the desired behavior for this project, as the application requires an indefinite stream to receive all the wanted information. The documentation also specified what host (base URL) the client should connect to and contained a detailed figure of the information included in the response.

The next step in the development process was setting up a client. This was done with the help of the Retrofit library, which uses OkHttp to send network requests (see section 4.4.) and made the process simple. The client is the system that connects to the server and receives the response.

To make the client listen to the streaming API the Retrofit framework (see section 4.4.) was used. This framework allows the user to make a client listen to a specified endpoint, which in the case of this project is Meetup’s streaming endpoint.
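The listening step can be sketched without any third-party library. The sketch below is not the thesis implementation, which uses Retrofit, OkHttp and Okio; it substitutes plain java.io and assumes that the stream delivers one JSON object per line. The names StreamListener and listen are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper: reads a line-delimited stream and hands each item to a handler. */
final class StreamListener {

    /**
     * Blocks until the stream ends and passes every non-empty line to the handler.
     * Assumption: each line holds exactly one JSON object.
     *
     * @return the number of items handed to the handler
     */
    static long listen(InputStream stream, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines between items
                }
                handler.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

In a real client the InputStream would come from the open HTTP connection. Because the method accepts any InputStream, the loop can also be exercised with canned data.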

Step 4 was setting up a database with the tables needed to store all the information from the API. This was done using PostgreSQL, which is an open source object-relational database system[22]. What tables to create was determined from the information in the API’s documentation, where a detailed description of what was sent from the server was available.


When all the tables were set up in the database the code was written for caching the response from the server in the database. This mostly involved writing SQL statements for inserting information in the database.
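As an illustration of the kind of SQL statement involved, the sketch below builds a parameterized INSERT statement for one table. The table and column names (fee, event_id, amount, currency) are assumptions based on the fields described in section 5, and the class InsertSql is hypothetical. With JDBC, the resulting string would typically be passed to Connection.prepareStatement and the values bound with setString or setDouble.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that builds parameterized INSERT statements. */
final class InsertSql {

    /**
     * Builds "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)".
     * The table and column names are concatenated into the statement, so they
     * must come from trusted constants, never from user input. The values
     * themselves are bound later through the "?" placeholders.
     */
    static String insertInto(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String names = String.join(", ", columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + placeholders + ")";
    }
}
```

Using placeholders instead of pasting event data into the statement keeps descriptions that contain quotes or other special characters from breaking the SQL.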

The next step was retrieving the information in the database and creating a Java object with the information. This object is what is returned when a GET request is sent to the application with a user’s location and a possible search term. Postman was used to send requests to the application to test the response. This meant sending a request to one of the endpoints set up for the application. For example, one endpoint takes a location (longitude and latitude) and a radius, and the response from the application is an array of the objects created for all the Meetup events within the specified radius of the location. (An example of the response from the application can be seen in figure 5.)
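The radius check behind such an endpoint can be sketched as follows. The thesis does not state how the distance between the user and an event is computed; the haversine formula used here is one common choice and is an assumption, as are the class and method names.

```java
/** Hypothetical helper for deciding whether an event lies within a search radius. */
final class GeoFilter {

    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in kilometres between two points given in degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so that rounding errors can never push asin out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if the event is at most radiusKm away from the user. */
    static boolean withinRadius(double userLat, double userLon,
                                double eventLat, double eventLon, double radiusKm) {
        return distanceKm(userLat, userLon, eventLat, eventLon) <= radiusKm;
    }
}
```

An endpoint could apply withinRadius to each stored event and return only the matches.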

Finally, extensive tests were done to make sure everything was working as intended. This was done by running the program for a full week on one of KTH’s servers (sky2.it.kth.se). It is of course not possible to run the application indefinitely, but it was decided that a week of streaming should be sufficient to determine that the stream could be trusted to be endless, as well as making sure that the system could handle bursts of events being received in a small amount of time. During this time, the number of received items from the server was counted to calculate the average number of items received per minute. The result of this is presented in section 5.

4.1. Software development tools

The system was developed in Java using the IDE “IntelliJ IDEA”. A Maven project was set up in IntelliJ to manage the libraries used. Maven is a software project management and comprehension tool[23]. Maven uses the Project Object Model (POM) to manage a project’s build, reporting and documentation[23].

The database used was PostgreSQL, with pgAdmin as the administration tool for the database. However, most of the database-related work was done in the Windows command prompt (CMD).

4.2. JavaScript Object Notation (JSON)

JSON is a human friendly lightweight data-interchange format. It is easy to parse and generate and is also language independent. It has the following structures[24]:

● A collection of name and value pairs, which is realized as an object, record, struct, dictionary, hash table, keyed list or associative array.

● An ordered list of values, which is realized as an array, vector, list or sequence.

These data structures are supported in some form by virtually all modern programming languages.

A JSON object is a set of name and value pairs. It begins with a left brace, “{”, and ends with a right brace, “}”. Each name is followed by a colon, and the name and value pairs are separated by commas[24]. An example of a JSON object can be seen in figure 4.


Figure 4: Example of a JSON Object

The figure shows a JSON object, “widget”, with other JSON objects within it: “window”, “image” and “text”. On the left side of the colon is the name of the object, for example “window”, and on the right is the content of the object.

4.3. Dropwizard

Dropwizard is a Java framework that gathers different libraries for the development of REST web services. It provides functions for building web applications with the help of Maven. Since a web application needs HTTP in order to work, Dropwizard uses the Jetty HTTP library to embed an HTTP server into the project. The project has a main method that starts the HTTP server, and the application is run as a simple process.[25]

Building web applications requires performance as well as clean and testable classes that map HTTP requests to objects. The Jersey framework provides these features and is integrated in the Dropwizard framework. Among other things, Jersey also supports streaming output and GET requests, two features that are both used in the developed application.[25]

The created Dropwizard application includes an application class which gathers all the different bundles and commands to provide basic functionality. It is from this class that the whole application is started in a run method. This run method is called from a main method, that works as an entry point for the application.[25]


4.4. Retrofit, Okio and OkHttp

Retrofit is a type-safe HTTP client for Java that is used to connect to a REST web service. It is a framework for authenticating and interacting with APIs, where the API interfaces are turned into callable objects. Retrofit also makes it possible to download JSON data from a web API with the help of HTTP annotations[26]. Retrofit uses OkHttp to send network requests. OkHttp is an HTTP client for sending and receiving HTTP-based network requests[27,28].

OkHttp was also used in this project to deactivate the client’s default timeout, in other words to keep the connection to the server open indefinitely. Without deactivating this timeout, the connection to the server was closed if no event was sent to the client within the default amount of time, resulting in an exception from the client.
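The idea of an unlimited read timeout can be illustrated with the JDK’s own java.net classes. This is a substitute for the OkHttp configuration used in the thesis, not a reproduction of it: in OkHttp the corresponding setting is readTimeout(0, TimeUnit.MILLISECONDS) on OkHttpClient.Builder. The class name EndlessConnection and the 10 second connect timeout are assumptions, and the URL passed in is up to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper that prepares a connection for an endless stream. */
final class EndlessConnection {

    /**
     * Prepares, but does not yet open, a connection whose reads never time out.
     * A read timeout of 0 is interpreted by java.net as an infinite timeout.
     */
    static URLConnection open(String url) {
        try {
            URLConnection connection = new URL(url).openConnection();
            connection.setConnectTimeout(10_000); // still fail fast if the host is unreachable
            connection.setReadTimeout(0);         // wait indefinitely for the next event
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping a finite connect timeout while removing the read timeout separates two cases: a server that cannot be reached at all, and a healthy stream that is simply quiet for a while.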

The Okio library was used to buffer the data received from the server, which made it possible to easily access, store and process data. This was done with Okio’s BufferedSource, which contains an internal buffer for storing of bytes. A buffer is a sequence of bytes where the size does not have to be defined in advance and there is no obligation to handle positions, limits or capacities. The readings and writings were buffered as a queue. In other words, BufferedSource was used to buffer the events received from the server. Since the stream is endless, a fixed size could not be given and this buffer was therefore used for storing the JSON objects that came out as the result.[29]

4.5. RxJava

RxJava is an API for asynchronous programming with observable streams. An observable is an object that represents a source of data and emits items as they become available in the system. A subscriber is used to listen to the observable; the connection between the two is called a subscription. The subscriber listens until the observable signals that it has completed or, as in this case, continues indefinitely.[30]


Chapter 5

Result

The previous system, which sent multiple requests, exceeded the limit of 200 requests that could be sent before the API key locked itself. With the current system the number of requests has been reduced to zero. Instead of sending several requests to the Meetup API to receive the events, the system now listens to Meetup’s streaming API, so that the client receives all newly created events and all event updates directly.

The objects received from the stream sometimes arrive several at a time in a burst, and sometimes several minutes go by without anything being sent from the server. Because of this variation in arrival rate, tests had to be carried out to make sure the client could handle it, including several events arriving in a small amount of time.

The program was therefore run for a week to see if any complications would occur. The number of events received during this week was documented in order to calculate the average number of events received per minute (see section 4). In total, 129 540 events were received during the week. One week is 10 080 minutes, so the average number of events received per minute is approximately 13.
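The average can be checked with a few lines of arithmetic: 129 540 events divided by 7 × 24 × 60 = 10 080 minutes gives roughly 12.85 events per minute, which rounds to 13. The class and method names below are hypothetical.

```java
/** Hypothetical helper for the average arrival rate of events. */
final class EventRate {

    /** Average number of events per minute over a run lasting the given number of days. */
    static double perMinute(long events, long days) {
        long minutes = days * 24 * 60; // 7 days = 10 080 minutes
        return (double) events / minutes;
    }
}
```

Calling EventRate.perMinute(129540, 7) reproduces the figure reported above when rounded to the nearest whole number.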

The events sent from the streaming API are sent as JSON objects (see section 4.2. for a description of JSON). Each object contains all the information that is sent from the API, including the id, time and description of the event. The JSON object for an event contains other JSON objects for fee, venue and group, which contain information about the price of the event, the venue where it takes place and the group that created the event, respectively. Group in turn contains two JSON objects, for category and group photo. Each of these objects has different fields for the information the object contains. For example, fee contains fields for amount and currency. If any of the objects does not contain any information, for example if the event is free and therefore has no fee, the object is left out of the event. Figure 5 shows a pretty-printed version (the objects are structured instead of being displayed as a single line) of one JSON object sent from the server. As seen in the figure, it is a free event, as it contains no fee object.


Figure 5: Example of an Event Sent from Meetup's server

These JSON objects were then saved in a database, where each object was saved in a separate table. That is, there was one table for each of the objects stated above, in which the information from that object was stored. The tables had columns corresponding to the names of the fields, and each row in a table corresponds to an event received from the server. Furthermore, each table contains a column for the event id, so that it can be identified which event the information belongs to. Table 3 shows an example of the content of the table “fee”. For example, the first row is from an event with the ID “skkjcnywjbtb” that costs 15 GBP (British pounds) per person to attend. If a numeric value does not exist it is set to -1, as shown in the second column for the fee_api_version.


Table 3: Table "fee" in the Database


Chapter 6

Discussion and conclusion

From the result it is evident that the goal of finding an alternative way of collecting data from Meetup’s server without the locking problem of the REST API has been achieved. The number of requests is minimized and the data is collected and stored without any problems.

This means that the hypothesis presented in section 1.4., implementing a listener that listens to a stream of events from the streaming API, achieved the desired result. It is now possible to receive all the events from Meetup’s API without the API key locking itself.

From running the system for a week (see sections 4 and 5) without the system crashing or any other complications, the conclusion is drawn that the system can run indefinitely and can handle a large amount of data arriving in a small amount of time.

6.1. Sources of error

A possible source of error in this project is that at irregular times, after several hours of streaming, an EOFException[31] occurs. This exception signals that the end of a file or stream has been reached unexpectedly during input.

Extensive testing has been done to determine the cause of the error and find a solution; however, no pattern has been identified for when or why the exception occurs. From the research process it has been determined that the most likely reason is a malformed response sent from the server. This cannot be confirmed, however, since the event causing the exception has never been identified. The testing done to determine the cause involved running the stream directly in a browser. It was discovered that the stream stopped in the browser at the same time as it stopped in the system developed in this project, and no more events were received until the page was refreshed. From this it is likely that the fault is not caused by the developed system, but rather by the data sent from the server.

The problem described above is temporarily solved by performing a so-called “retry” on the connection to the stream. This automatically restarts the connection every time the exception occurs, so that the client can continue listening to the stream. In the worst case, restarting the connection could cause a loss of data if any data is sent during the short time it takes for the system to restart. Such events would never be displayed in the application, and the user could miss Meetup events. However, since the stream can go on for several hours before the exception occurs and the time it takes to restart the connection is short, at most half a second, the impact is not considered severe. The odds of receiving an event at precisely this time are quite small, and since the application is currently only released in Sweden and the large majority of Meetup events take place in other countries, the effect of this error source is considered small.
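The retry behavior can be sketched as a loop that restarts the stream session whenever it ends with an EOFException. This is only an illustration of the idea under assumed names (RetryingListener, StreamSession, runWithRetry); the thesis does not show its retry code, and the upper bound on restarts is an addition made here so that the loop cannot spin forever.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper that restarts a stream session after an unexpected end of stream. */
final class RetryingListener {

    /** One attempt at connecting to the stream and reading from it. */
    interface StreamSession {
        void run() throws IOException;
    }

    /**
     * Runs the session and restarts it each time it ends with an EOFException.
     *
     * @return the number of restarts that were needed before the session ended normally
     * @throws UncheckedIOException if maxRestarts is exceeded or another I/O error occurs
     */
    static int runWithRetry(StreamSession session, int maxRestarts) {
        int restarts = 0;
        while (true) {
            try {
                session.run();
                return restarts;
            } catch (EOFException e) {
                if (restarts >= maxRestarts) {
                    throw new UncheckedIOException("stream kept ending unexpectedly", e);
                }
                restarts++; // reconnect immediately to keep the gap as short as possible
            } catch (IOException e) {
                throw new UncheckedIOException(e); // other I/O errors are not retried
            }
        }
    }
}
```

Only EOFException triggers a restart in this sketch, which mirrors the single failure mode described above; other I/O errors are surfaced to the caller.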

6.2. Choice of methods

A method can, as described in section 3.1, be either deductive or inductive. A deductive method was chosen rather than an inductive method, as a hypothesis for how the problem could be solved was formed early in the process and the system was then implemented to test the hypothesis and solve the problem, which follows the deductive method (see section 3.1.1.). As described in section 3.1.1., an inductive method is, unlike the deductive method, about making an empirical study based on an identified problem and drawing conclusions from the results. This is the opposite approach compared to the deductive method.

The engineering design process was chosen as the main method for this project, as it is a well-established method for engineers developing new products. It seemed suitable for the kind of work that was to be done in this project. The method had the advantage of giving concrete steps to follow through the development process, which provided a clear structure for the work and made it easier to perform. Following the steps was especially useful for specifying the requirements of the system. This not only made it clear what needed to be done but also helped to figure out how to do it.

The hypothetico-deductive method is, as stated in section 3.1.2, about starting with an initial theory and then performing experiments and evaluating the result. Though this is basically what has been done during this project, the engineering design process was chosen in favor of this method, as it was deemed more specifically formed for engineering development and therefore more suitable for this project. The steps of the hypothetico-deductive method were therefore not intentionally followed, though some of them have been performed because the method is quite similar to the engineering design process, only broader and more general.

Lastly, the design science research method was considered for this project. This method is quite similar to the engineering design process, as it is about the development of artifacts to solve observed problems, something that is central to the engineering design process as well. Furthermore, it is often applied to engineering and computer science. However, the guidelines (see section 3.1.4) that according to Hevner have to be addressed for the method to be classified as design science research have not been fully addressed, as some of them were deemed irrelevant or too extensive for this project. For example, guideline three states that “the utility, quality, and efficacy of a design artifact must be rigorously demonstrated via well-executed evaluation methods”[16]. This level of evaluation was not done, mainly due to time limits. Because these guidelines were not followed, the design science research method has not been used.

6.3. Future work

Continued work on this project could involve solving the recurring EOFException (see section 6.1.) that occurs after several hours of continuous streaming. As mentioned in section 6.1., this exception usually signals that the end of a file or stream is reached unexpectedly. Since the stream is supposed to be endless according to its documentation[21], the program should not receive this kind of exception. To solve this, further research must be done to determine the cause of the error. Solving it would mean that the potential loss of events during the time it takes to restart the connection with the server (see section 6.1.) would no longer be an issue.

In this project only one solution to the problem of sending too many requests to the server has been considered, namely to use a streaming API instead of sending requests to a REST API (see section 1.6.). This makes the solution quite specific, only working in the case a streaming API is provided for the service. Future work could therefore involve finding alternative solutions to reduce the number of requests sent to the server so the locking does not occur.

The system that has been developed for this project is currently run separately from the system that Anchr has developed to form the mobile application. The next step would therefore be to integrate the created system with their existing system, so that the events received from Meetup’s streaming API can be displayed in the application.

The current system is built in such a way that when a request is received, all the rows in all the tables in the database are searched through in order to find the events that match the request. As the system is continuously running, the database will fill up with millions of rows of data, which will increase the search time when querying the database. This will affect the performance of the system and the application when listing nearby events. Therefore, the system could be developed further by implementing a more efficient way of searching the database. One possible solution would be sorting the events when they are inserted into the database, so that binary search can be used to find events in a shorter amount of time.
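The binary search idea can be sketched on an in-memory array. The thesis does not say which key the events would be sorted by; the sketch below assumes sorting by start time, and the class and method names are hypothetical. In a relational database, an index on the sorted column usually plays the same role.

```java
/** Hypothetical helper for searching events that are kept sorted by start time. */
final class EventIndex {

    /**
     * Returns the index of the first start time that is greater than or equal to
     * the given time, or the array length if there is none.
     * The array must be sorted in ascending order.
     */
    static int firstAtOrAfter(long[] sortedStartTimes, long from) {
        int low = 0;
        int high = sortedStartTimes.length;
        while (low < high) {
            int mid = (low + high) >>> 1; // midpoint without int overflow
            if (sortedStartTimes[mid] < from) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    /** Number of events whose start time lies in the half-open interval [from, to). */
    static int countInInterval(long[] sortedStartTimes, long from, long to) {
        return firstAtOrAfter(sortedStartTimes, to) - firstAtOrAfter(sortedStartTimes, from);
    }
}
```

Each lookup takes a number of steps proportional to the logarithm of the number of events, instead of visiting every row.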

Furthermore, a system could be developed to delete events that have passed, and therefore are no longer relevant, from the database, thereby reducing the number of rows that must be searched for each request.

A future feature that could be added to the application is allowing the user to search for events within a specific time interval. This could for example be useful if a user is visiting a particular place for a limited amount of time. The user could then choose to only display events that take place within this specified time interval.


References

1. "About Meetup." https://www.meetup.com/about/ 2017-05-26

2. Masse, M. REST API Design Rulebook: Designing Consistent RESTful Web Service Interfaces: O'Reilly Media, 2011.

3. Loreto, S.; Saint-Andre, P.; Salsano, S.; and Wilkins, G. "Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP." 2011.

4. Psaltis, Andrew G. "Streaming Data: Understanding the real-time pipeline." Manning Publications Co., 2016.

5. Kozamernik, Franc. "Media Streaming over the Internet: an overview of delivery technologies." EBU Technical Department, 2002.

6. Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard; and Gavalda, Ricard. "Detecting Sentiment Change in Twitter Streaming Data." Proceedings of the Second Workshop on Applications of Pattern Analysis, vol 17 (Diethe, Tom; Balcazar, Jose; Shawe-Taylor, John; and Tirnauca, Cristina, eds). Proceedings of Machine Learning Research: PMLR, 2011; 5-11.

7. Bommaiah, E.; Guo, K.; Hofmann, M.; and Paul, S. "Design and implementation of a caching system for streaming media over the Internet." Proceedings Sixth IEEE Real-Time Technology and Applications Symposium, RTAS 2000, 2000: 111-121.

8. Blomkvist, Pär, and Hallin, Anette. Metod för teknologer: Studentlitteratur AB, 2014.

9. Elo, Satu, and Kyngäs, Helvi. "The Qualitative Content Analysis Process." Journal of Advanced Nursing 62 (2008): 107-115.

10. Andersson, Niclas, and Ekholm, Anders. "Vetenskaplighet – Utvärdering av tre implementeringsprojekt inom IT Bygg och Fastighet 2002." Lunds Tekniska Högskola, Institutionen för Byggande och Arkitektur, 2002.

11. Martin, M., and McIntyre, L.C. "Hermeneutics and the hypothetico-deductive method." In Readings in the Philosophy of Social Science. United States of America: The MIT Press, 1994.

12. Garbarino, Sabine, and Holland, Jeremy. "Quantitative and Qualitative Methods in Impact Evaluation and Measuring Results." Discussion Paper. Birmingham, UK: University of Birmingham, 2009.

13. Frechtling, Joy. "An Overview of Quantitative and Qualitative Data Collection Methods." In The 2002 User-Friendly Handbook for Project Evaluation, 2002.

14. Vaishnavi, V, and Kuechler, W. "Design Science Research in Information Systems." 2004.

15. Peffers, Ken; Tuunanen, Tuure; Rothenberger, Marcus; and Chatterjee, Samir. "A Design Science Research Methodology for Information Systems Research." J. Manage. Inf. Syst. 24 (2007): 45-77.


16. Hevner, Alan R.; March, Salvatore T.; Park, Jinsoo; and Ram, Sudha. "Design science in information systems research." MIS Q. 28 (2004): 75-105.

17. ABET. "Criteria for Accrediting Engineering Programs." Effective for Reviews During the 2016-2017 Accreditation Cycle. United States of America: ABET, 2015.

18. Tayal, S.P. "Engineering Design Process." International Journal of Computer Science and Communication Engineering (2013).

19. Haik, Yousef, and Shahin, Tamer. "Engineering Design Process." United States of America: Global Engineering: Christopher M. Shortt, 2011.

20. Howard, T. J.; Culley, S. J.; and Dekoninck, E. "Describing the creative design process by the integration of engineering design and cognitive psychology literature." Design Studies 29 (2008): 160-180.

21. "OpenEvents Stream." https://www.meetup.com/meetup_api/docs/stream/2/open_events/ 2017-05-26

22. www.postgresql.org. "About." https://www.postgresql.org/about/ 2017-05-26

23. maven.apache.org. "Welcome to Apache Maven." https://maven.apache.org/# 2017-05-29

24. www.json.org. "Introducing JSON." http://www.json.org/

25. www.dropwizard.io. "Getting Started." http://www.dropwizard.io/1.1.0/docs/getting-started.html 2017-05-29

26. square.github.io. "Retrofit: A type-safe HTTP client for Android and Java." http://square.github.io/retrofit/ 2017-05-26

27. square.github.io. "OkHttp: An HTTP & HTTP/2 client for Android and Java applications." http://square.github.io/okhttp/

28. guides.codepath.com. "Using OkHttp." https://guides.codepath.com/android/Using-OkHttp 2017-05-26

29. github.com. "Okio." https://github.com/square/okio 2017-05-26

30. www.vogella.com. "RxJava 2.0 - Tutorial." http://www.vogella.com/tutorials/RxJava/article.html 2017-05-26

31. docs.oracle.com. "Class EOFException." http://docs.oracle.com/javase/7/docs/api/java/io/EOFException.html?is-external=true 2017-05-26


TRITA TRITA-ICT-EX-2017:89

www.kth.se
