
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Application of Amazon Web Services in software development

EMELIE THAM

MARCUS WERLINDER

KTH ROYAL INSTITUTE OF TECHNOLOGY

Abstract

During these last recent years, cloud computing and cloud services have started to gain traction, most notably among companies. Amazon has proven to be one of the powerhouses in providing scalable and flexible cloud computing services. However, cloud computing is still a relatively new area. From an outsider’s point of view, the overwhelming amount of information and available services might prove difficult to familiarize with. The aim of this thesis is to explore how Amazon Web Services can be applied during software development and to observe how difficult it might be to use these services.

Three test applications that utilized different Amazon Web Services were implemented to get an insight into how Amazon Web Services can be applied from a cloud computing beginner’s point of view.

These applications were developed in an iterative manner, where a case study was performed on each application. At the start of each new iteration a literature study was conducted, where sources were reviewed to see if they provided essential information. In total, nine different Amazon Web Services were used to implement and test the three respective test applications. The results of the case study were interpreted and evaluated with regard to the learnability and application of Amazon Web Services. Issues that were identified during the development process showed that Amazon Web Services were not user-friendly for users that have little to no experience with cloud computing services.

Further research on other Amazon Web Services, such as Elastic Compute Cloud, as well as other cloud computing platforms like Google or IBM, may provide a deeper and more accurate insight into the applications of cloud computing.

Keywords: Cloud computing; Amazon Alexa; Learnability; Amazon Web Services; Software; Cloud computing services


Abstract (Swedish)

In recent years, cloud services have become an increasingly popular area, especially within companies. One of the largest providers in the cloud services industry is Amazon, which offers scalable and flexible cloud services. Cloud services are, however, a relatively new area, which means that someone who is not familiar with the subject may find all the available information overwhelming and difficult to get acquainted with. The goal of this thesis is to explore different Amazon Web Services that can be used in software development and to observe problems that may arise with these services.

Three test applications that made use of Amazon Web Services were created to gain deeper knowledge of how these services work and what possibilities they offer. These applications were developed iteratively, and a case study was carried out for each application. At the beginning of each new iteration a literature study was conducted, in which the sources were critically reviewed to determine whether they contained information essential to the thesis. In total, nine different Amazon Web Services were used to implement and test the three different test applications.

The results from the case study were interpreted and evaluated with regard to the learnability and applicability of Amazon Web Services. Problems that were collected during the development process showed that Amazon Web Services were not particularly user-friendly for developers with little or no experience of Amazon Web Services.

Further research into other Amazon Web Services, such as Elastic Compute Cloud, and research examining other cloud platforms such as Google Cloud, could contribute a deeper understanding and a more exact insight into the application of cloud services.

Keywords: Cloud computing; Amazon Alexa; Learnability; Amazon Web Services; Cloud computing services


Acknowledgements

We would like to thank our advisors at KTH Royal Institute of Technology, Anders Sjögren and Fadil Galjic, for providing us with their valuable insights and feedback throughout the course of this whole thesis work.


Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goal
1.5 Methodology/Methods
1.6 Delimitations
1.7 Disposition

2 Theoretical Background
2.1 Cloud Computing
2.2 Amazon Web Services
2.2.1 AWS Cloud computing infrastructure
2.2.2 AWS Lambda
2.2.3 Alexa & Alexa Skill
2.2.4 AWS Identity and Access Management
2.2.5 AWS Relational Database Service
2.2.6 AWS Virtual Private Cloud
2.2.7 AWS Elastic Compute Cloud
2.2.8 AWS Simple Storage Service
2.2.9 AWS CloudWatch
2.2.10 AWS Simple Queue Service
2.2.11 AWS Transcribe
2.2.12 AWS Comprehend
2.3 Amazon Web Services Tools and Libraries
2.3.1 Amazon Web Services Software Development Kit
2.3.2 Node.js
2.3.3 Node Package Manager
2.3.4 JSON Objects
2.3.5 Amazon Web Services Command Line Interface
2.3.6 MySQL
2.3.7 Alexa Skills Kit Software Development Kit
2.3.8 Alexa Skill Kit Command Line Interface
2.4 Definition and Metrics of Software Learnability
2.4.1 Task Metrics: Metrics based on task performance
2.4.2 Command Metrics: Metrics based on command usage
2.4.3 Mental Metrics: Metrics based on cognitive processes
2.4.4 Subjective Metrics: Metrics based on user feedback
2.4.5 Documentation Metrics: Metrics based on documentation usage
2.4.6 Usability Metrics: Metrics based on change in usability
2.4.7 Rule Metrics: Metrics based on specific rules
2.5 Related Works
2.5.1 Reports on Cloud Services
2.5.2 Projects using Amazon Web Services

3 Method
3.1 Research Methodologies
3.1.1 Inductive & Deductive Approach
3.1.2 Quantitative & Qualitative Research
3.1.3 Case Study
3.1.4 Bunge’s method
3.1.5 Applied Research Methodology
3.2 Sub-questions of the research question
3.3 Research Method
3.3.1 Understanding
3.3.2 Research
3.3.3 Developing and Testing
3.3.4 Evaluation, Discussion and Conclusion
3.4 Evaluation method
3.5 Development Tools
3.5.1 Development Environments
3.5.2 Programming Languages
3.5.3 Testing and Debugging Tools
3.5.4 Project Model
3.6 Documentation Model

4 Prerequisites for applying AWS Services and Tools
4.1 Prerequisites of applying AWS Services
4.1.1 Lambda Prerequisites
4.1.2 IAM Prerequisites
4.1.3 S3 Prerequisites
4.1.4 RDS Prerequisites
4.1.5 SQS Prerequisites
4.1.6 Amazon Transcribe and Comprehend Prerequisites
4.2 Prerequisites of applying AWS Tools
4.2.1 ASK CLI Prerequisites
4.2.2 AWS SDK Prerequisites
4.2.3 AWS CLI Prerequisites

5 Case Study: Creating Test applications with AWS
5.1 Case Study: First Test Application
5.1.1 Structure of the First Test Application
5.1.2 Creating an Alexa Skill and integrating it with AWS Lambda
5.1.3 Integrating the skill with iCal
5.1.4 Integrating the Alexa Skill with AWS RDS
5.1.5 Testing and Running the Application
5.2 Case Study: Second Test Application
5.2.1 Structure of the Second Test Application
5.2.2 Implementation and Configuration of Amazon S3 Buckets
5.2.3 Deployment Package of AWS Lambda Function
5.2.4 Implementation and Configuration of IAM Execution Role
5.2.5 Testing and Invoking the Application
5.2.6 Configuration of Amazon S3 to Publish Events
5.3 Case Study: Third Test Application
5.3.1 Structure of the Third Application
5.3.2 Uploading to S3 and sending messages to SQS
5.3.3 Retrieve message from SQS and execute Transcribe and Comprehend

6 Case Study: Observations and Measures
6.1 Feedback and Thoughts
6.1.1 Feedback and Thoughts: First Test Application
6.1.2 Feedback and Thoughts: The second Test Application
6.1.3 Feedback and Thoughts: The third Test Application
6.2 Lines of Code
6.3 Work Hours
6.4 Costs

7 Answering Research Questions
7.1 The prerequisites needed to use Amazon Web Services
7.2 The Amazon Web Services that were applied and how they were used
7.3 Issues that occurred during development
7.4 The Cost of using Amazon Web Services
7.5 Summary of Results

8 Discussion
8.1 Validity and Reliability of the Research Method
8.2 Looking back at the results
8.3 Benefits, Ethics and Sustainability

9 Summary
9.1 Future Research


Abbreviations

S3 Simple Storage Service

EC2 Elastic Compute Cloud

SQS Simple Queue Service

RDS Relational Database Service

AWS CLI Amazon Web Services Command Line Interface

ASK CLI Alexa Skill Kit Command Line Interface

AWS SDK Amazon Web Services Software Development Kit

ASK SDK Alexa Skill Kit Software Development Kit

npm Node Package Manager


1 Introduction

Cloud computing is a topic that has gained increasing popularity in recent years. A recent report by Nasdaq [73] has shown that in the last year 123 billion US dollars were invested in cloud services and cloud computing. It is estimated that by 2020 this figure will be close to 205 billion US dollars.

Adobe, Sony and Spotify [14] are some of the many companies that have incorporated cloud computing services into their systems. A prominent example of a cloud computing user is Netflix, which has moved most of its computing to Amazon’s cloud and is aiming to migrate to the cloud completely [61].

As companies continue to grow larger, it becomes increasingly important to cut down on costs and stimulate productivity. Cloud providers such as IBM Cloud, Google Cloud Platform and Amazon Web Services offer well-working and globally accessible infrastructures that can handle hardware requirements, software updates as well as security measures. These aspects of the cloud platform have made it a popular resource for companies to invest in, because it allows companies to focus on managing and deploying their products without needing to spend resources on creating and maintaining a data center infrastructure.

Another attractive aspect of cloud computing is its low learning curve: in contrast to grid computing [79], users are not required to have any prior operational knowledge [45]. Although cloud computing services have existed since the 2000s [1], it was not until recent years that they really gained traction.

1.1 Background

This report will focus on Amazon Web Services (AWS), a popular cloud provider that first started in 2006 [1] and has continually grown since then. Currently, there are 90 different global cloud-based products offered for public use, ranging from computing, storage, databases, machine learning, game development, analytics and development tools to many more. Developers are not exposed to the code behind the services, but are instead offered APIs and developer tools. For example, when using an Amazon Alexa Skill, the developer only has to specify some specific phrases and words that Alexa should respond to, instead of building up the whole structure needed for natural language processing. In this thesis, we are interested in exploring the different Amazon Web Services and finding the possibilities in using these services.

1.2 Problem

Cloud computing services have great potential in terms of programming flexibility and efficiency, as well as creating new opportunities for developers. On the other hand, cloud computing is a relatively new area that continues to grow to this day. From an outsider’s point of view, the abundant information and services available might seem overwhelming and prove difficult to familiarize with. For example, it could be hard to know which services are useful for a specific task or how to integrate these features with the rest of a product. There are not many reports that have tackled this problem or described the working process of using cloud computing services and the learnability problems that might occur. Thus, this thesis tries to answer the following questions:

RQ1. How can Amazon Web Services be used when developing modern applications?

RQ2. What can be said about the learnability of using Amazon Web Services?

1.3 Purpose

The purpose of this thesis is to get a clearer view of the possibilities with Amazon Web Services and how they work, as well as touching upon the problems that can occur when using them. This thesis will also give an insight for those who wish to work with AWS services or other web services in the future.

1.4 Goal

The main goal of this degree project is to provide well-structured documentation on the AWS cloud infrastructure and an evaluation of its usability from a computer engineering point of view. In addition, the goal of this thesis is to encourage further research on AWS as well as to aid studies conducted on other cloud computing services. Another goal of this project is to create test applications that utilize some of the AWS services.


1.5 Methodology/Methods

Before all else, a literature study was conducted with the aim of answering the research questions of this thesis. Literature covering cloud computing, software evaluation metrics or AWS was searched for in different documentation platforms. Each source of interest was reviewed to see if it was relevant for the thesis.

The procedure for data collection was approached in a qualitative manner, in which a case study was built around test applications developed using AWS. Any information or framework that needed to be learned during this process was acquired through additional research studies. The results were then evaluated accordingly to answer the stated research questions.

1.6 Delimitations

There are several web services that provide functions such as automatic speech recognition and natural language processing. Examples of such services are Google Cloud Platform, Amazon Web Services and IBM Cloud.

In this thesis the primary focus will be on Amazon Web Services, to limit the scope of the study. However, not all of the web services provided by Amazon will be analyzed. Metrics that evaluate the performance of the cloud services, such as latency, memory and bandwidth, are not taken into consideration as they are outside the scope of this project. This project instead aims to evaluate the learnability of using web services and cloud computing services.


1.7 Disposition

This thesis is structured as follows:

• Chapter 2 goes through the theoretical background necessary to follow the rest of the thesis.

• Chapter 3 describes the method used throughout the whole project.

• Chapter 4 presents the information acquired from the literature study and the prerequisites for the different AWS services.

• Chapter 5 presents the case studies on the different test applications and how they were implemented as well as integrated.

• Chapter 6 presents the subjective data and the measured data acquired during the development of the test applications.

• Chapter 7 summarizes the results from the previous three chapters to answer the research questions.

• Chapter 8 discusses the methods used and the achieved results.

• Chapter 9 presents the summary of the whole study as well as a discussion on future research.


2 Theoretical Background

In this chapter, a detailed description of the theoretical background of the degree project is presented together with related work. The chapter begins with an introduction to cloud computing. This is then followed by the technical background of Amazon Alexa and the AWS services. Following this is an introduction to the different tools and libraries that were used. Lastly, the theory behind the definition and metrics of software learnability is presented.

2.1 Cloud Computing

For the purpose of this thesis, the National Institute of Standards and Technology’s (NIST) [67] definition of cloud computing will be used: “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” NIST has identified five characteristics of cloud computing: (1) On-demand self-service (a consumer can provision computing power without the need for human interaction), (2) Broad network access (consumers can access the network through a broad range of devices), (3) Resource pooling (a provider can automate provisioning and scalability of their services for multiple customers without it being apparent to the end user), (4) Rapid elasticity (the ability to automatically scale provisions and services) and (5) Measured service (the ability to measure resource usage).

There are three different cloud computing service models, where each model provides different degrees of control, management and flexibility. Out of these three models, Infrastructure-as-a-Service (IaaS) offers developers the most flexibility, control and management over their resources. It allows developers to, as described by NIST [67], “...provision processing, storage, networks and other fundamental computing resources, where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications”. However, developers do not have the capability to alter or control the underlying cloud infrastructure.

The second model, Platform-as-a-Service (PaaS), offers developers the ability to manage and deploy applications that they have acquired or created onto the cloud, using programming languages, libraries, services and tools provided by the cloud platform [67]. This is also the service model that this thesis will primarily focus on.

The third model, Software-as-a-Service (SaaS), allows users to utilize completed cloud applications that are managed by the cloud service provider [67]. However, users have limited capabilities in what they can control and change in the cloud application.

2.2 Amazon Web Services

Amazon Web Services is a platform that offers cloud computing services, ranging from compute power, databases, storage, machine learning, analytics and development tools to many more. The following section will introduce the AWS Cloud’s infrastructure, followed by the Amazon Web Services that were used in this project.

2.2.1 AWS Cloud computing infrastructure

The AWS Cloud computing structure consists of regions, each of which consists of two or more availability zones. Each region is a separate geographic area, for example ‘us-east-1’, which is located in North Virginia, or ‘ap-northeast-1’, which is located in Tokyo. All the available regions can be seen on Amazon’s page [5]. An availability zone is a collection of data centers. Availability zones are physically separated from each other within a region, but are connected through low-latency links, as can be seen in figure 1. Lastly, each region has multiple edge locations to lower latency for developers/customers requesting resources stored in those regions.


Figure 1: The structure of a region, where availability zones are separated physically but connected through low-latency links [5].

2.2.2 AWS Lambda

AWS Lambda is a service that allows the developer to run code or applications without the need for provisioning and server management [7]. The code and applications deployed or implemented in AWS Lambda are collectively called Lambda functions. The service was launched in November 2014 and works by being triggered by different events. Such an event can, for example, be Amazon Alexa or a website sending a request to the AWS Lambda function [23]. Lambda can also interact with other AWS resources (see figure 2) such as CloudWatch. When a Lambda function is executed, AWS Lambda takes care of the amount of resources needed to execute the Lambda function on their servers.

Lambda can handle a large number of incoming requests, because it can make copies of itself that take care of the additional requests. Lambda is one of many Amazon Web Services and is mostly used when building smaller applications, compared to for example AWS EC2. Lambda functions can be written in the following languages: Node.js, Python, Java, Go and C#. [7]

Lambda allows users to focus only on the coding aspect when developing a product, without having to worry about other parts, such as infrastructure and computing power, when building a good application [7].
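To illustrate, the following is a minimal sketch of a Lambda function written in Node.js, one of the languages listed above. The handler shape follows the AWS Lambda Node.js runtime, while the logged message and the returned object are this sketch’s own examples.

    // Minimal AWS Lambda handler in Node.js.
    // The event object carries the trigger's data, for example an Alexa
    // request or an S3 notification; the return value here is illustrative.
    exports.handler = async (event) => {
        console.log('Received event:', JSON.stringify(event));
        return { statusCode: 200, body: 'Hello from Lambda' };
    };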

Figure 2: The Lambda triggers and resources; to the left are triggers that activate the Lambda function and to the right are resources that the function interacts with.

2.2.3 Alexa & Alexa Skill

Alexa is Amazon’s voice service that can comprehend and extract words from the following languages: English, French, German and Japanese. Amazon lets developers use Alexa to build different skills. A skill can, for example, be a program that allows a user to book a taxi.

Alexa is not always on and listening, because of legal and privacy measures. Instead, Alexa uses a wake-up word, which is “Alexa”. To activate a skill, the following utterance has to be said: “Alexa, open specific-skill-name”.

Lambda is usually the back-end logic of the Alexa Skill. Alexa’s job is to understand what the user is saying and extract the necessary information, so that AWS Lambda can perform its task.

Generally, a skill consists of intents, slots and different utterances. An intent is related to specific utterances that the user can say, and is used to help other services know how to interact with the Alexa Skill. A slot represents a variable that could potentially be a collection of different words, to avoid having to create many utterances. Amazon offers a wide range of already available slots, such as dates, places, celebrities and much more. An example of this can be seen in figure 3, where utterances such as "what happens today" or "what happens tomorrow" will activate the intent "DateSearch".


Figure 3: The utterance and intent interface for an Alexa Skill.

2.2.4 AWS Identity and Access Management

AWS Identity and Access Management (IAM) [33] is a web service that manages and secures access to AWS resources. The aim of IAM is to control the authentication of users and which of the resources they are authorized to use. This includes features such as allowing other users to have shared access to your AWS account, distributing different permissions to different users for different resources, adding two-factor authentication to your AWS account as well as to other individual users, and many more. For further information see [33].

To understand the IAM infrastructure (seen in figure 4) it is necessary to introduce the following concepts: "principal", "action", "resource" and "policy". A principal is an entity that can take action on the resources through the AWS Management Console, the AWS Application Programming Interface (AWS API), or AWS CLI. Examples of principals are users, roles and applications. An action is an operation defined by an AWS service, and a resource is an entity that exists within a service, such as S3 buckets and IAM users.

When a principal wants to take action, it will send a request to AWS containing a "request context": request information assembled from different sources. This will prompt IAM to evaluate whether or not the request context is authenticated and authorized to take a specific action on the requested resource. During the authorization, IAM takes the request context’s values and checks if there are any policies that match them. A policy is an entity that can be attached to a principal or resource to define what permissions they have; for more information see [21]. In IAM, the policies are stored as JSON objects, which contain specifications of what permissions are allowed or denied for resources or principals.

Once the authentication and authorization of the request is done, the action is performed.
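For illustration, a minimal IAM policy document looks roughly as follows; the bucket name is a placeholder, while the "Version", "Effect", "Action" and "Resource" fields follow the standard IAM policy grammar.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": "arn:aws:s3:::example-bucket/*"
        }
      ]
    }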

Figure 4: The IAM infrastructure [33].

When an AWS account is created for the first time, an AWS account root user identity is also created [33]. The root user has complete access to all AWS services and is accessed by signing in with your email address and custom password. It is not recommended to use the root user for simple tasks, as it has the highest authorization. As such, the root user should only be used to create the first IAM users. These are users within the AWS account that the root user has distributed specific permissions to. Each of the created IAM users has their own unique credentials, which by default consist of the access key ID and secret access key. To sign in to the AWS console using an IAM user account, the following sign-in URL is used [17]: https://account-ID-or-alias.signin.aws.amazon.com/console. If you want to delegate access to your resources to other accounts, users, applications or services, without having to uniquely associate a credential with a specific identity, then an option would be to use IAM roles. Unlike IAM users, IAM roles dynamically create temporary security credentials that are provided to the user.

2.2.5 AWS Relational Database Service

Amazon Relational Database Service (RDS) is a web service that makes it possible for developers to set up and operate a scalable relational database. A relational database is a database built around storing data in separate tables, instead of storing everything in the same place. Amazon’s focus is to take care of and manage the database, while the user only has to think about the data, schema, tables etc. [29]

RDS’s basic building block is the database instance, which is located in an isolated environment in the cloud (see figure 5). The instance can be accessed the same way a normal database would be, depending on what type it is. RDS is compatible with MySQL, MariaDB, PostgreSQL, Oracle and Microsoft SQL Server. This allows developers to keep working with an engine they are already familiar with, instead of having to relearn. [29]

It is also possible to specify the instance’s specifications when creating an RDS database, such as the amount of CPU power, storage and many other things. Users can also decide how often the RDS database should be backed up. [29]

An RDS database instance created in one region will not be available in other regions. However, it is possible to copy an RDS database instance to other regions. [29]
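Since an RDS instance is accessed like an ordinary database, a MySQL-backed instance can be queried from Node.js with the mysql npm package, as in the sketch below; the endpoint, credentials, database and table names are all placeholders.

    // Query a MySQL database on RDS from Node.js ("npm install mysql").
    const mysql = require('mysql');

    const connection = mysql.createConnection({
        host: 'mydb.example.us-east-1.rds.amazonaws.com', // RDS endpoint (placeholder)
        user: 'admin',
        password: 'secret',
        database: 'calendar'
    });

    connection.connect();
    connection.query(
        'SELECT * FROM events WHERE event_date = ?', ['2018-05-01'],
        (error, results) => {
            if (error) throw error;
            console.log(results);
        }
    );
    connection.end();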


Figure 5: The general structure of a VPC, where the instance could potentially be an RDS instance [29].

2.2.6 AWS Virtual Private Cloud

To be able to launch some of Amazon’s resources, it is necessary to use a VPC to carry out certain functions, for example when launching an EC2 instance or an RDS database instance (see figure 5). A Virtual Private Cloud (VPC) [32] resembles a normal network that would be operated at a data center, but with the additional benefit of a scalable infrastructure.

It is possible to configure the VPC by modifying the IP address range, creating subnets and other general settings that would be possible with a traditional network. A subnet is a range of the VPC’s IP addresses. Different AWS resources can be launched in different subnets. [32]


2.2.7 AWS Elastic Compute Cloud

The point of Elastic Compute Cloud (EC2) is to offer scalable computing power that can expand or scale down within minutes. This helps companies avoid the problem of buying enough computing power to handle peak traffic that may last only a short period, leaving the capacity useless during the rest of the time when traffic is low. [28]

A simplification of EC2 is to see it as a virtual equivalent of a computer, or rather a server computer. These are called instances. An instance can for example be used for web hosting or an RDS database. In other words, instances are used when there is a need for a lot of computing power, which is one of the reasons why EC2 is a popular product amongst companies like Netflix. [28]

2.2.8 AWS Simple Storage Service

AWS Simple Storage Service (S3) [20] is a web service that provides virtual storage, which users can use to store and retrieve unlimited amounts of data at any time and from any location on the web. Some of the advantages of S3 are that it supports almost all of the AWS services [20] and lets the user choose different types of storage classes depending on their needs [4]. Further information regarding the benefits of AWS S3 can be found in [20].

Buckets are resources of S3 that act as containers for storing objects [37]. Each bucket needs to have a globally unique name. An object is also a resource of S3, which is composed of a key (the assigned name of the object), version ID (a unique string used to identify the version of an object in one bucket), value (the data that is stored), metadata (a set of name-value pairs), subresources (additional information specifically associated with an object or a bucket) and access control information (information about the attached access policy; by default only the resource owner has access) [38].

2.2.9 AWS CloudWatch

Amazon CloudWatch [26] is a monitoring service used for looking over other Amazon Web Services, such as S3, EC2 and RDS. It is possible to look up multiple different metrics from different services; for example, it is possible to see the number of database connections made to an RDS instance or to check the write latency of the RDS instance. Another function of CloudWatch is to view logs that different applications create during execution. For example, it can be used to debug Lambda functions.

It is also possible to set up alarms in CloudWatch that trigger on certain events, like when a certain amount of traffic is coming in. These alarms can then trigger auto scaling so that the services that receive all the traffic do not get overloaded (as seen in figure 6).
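A sketch of creating such an alarm through the AWS SDK for Node.js follows; the alarm name, function name and threshold are hypothetical, while "AWS/Lambda" and "Errors" are the standard namespace and metric that Lambda publishes.

    // Create an alarm that fires when a Lambda function reports errors.
    const AWS = require('aws-sdk');
    const cloudwatch = new AWS.CloudWatch({ region: 'us-east-1' });

    cloudwatch.putMetricAlarm({
        AlarmName: 'lambda-error-alarm',
        Namespace: 'AWS/Lambda',
        MetricName: 'Errors',
        Dimensions: [{ Name: 'FunctionName', Value: 'myFunction' }],
        Statistic: 'Sum',
        Period: 300,              // evaluate over five-minute windows
        EvaluationPeriods: 1,
        Threshold: 1,
        ComparisonOperator: 'GreaterThanOrEqualToThreshold'
    }, (err) => {
        if (err) console.error(err);
    });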

Figure 6: The AWS CloudWatch infrastructure [26].

2.2.10 AWS Simple Queue Service

When sending messages between different AWS services, Simple Queue Service (SQS) is a handy tool. SQS is a distributed message queuing service that supports both standard and FIFO (first in, first out) queues. For example, it is possible to set up an event in S3 that gets triggered when a file is uploaded. S3 will then send a message to an SQS queue linked to the bucket. This message can then be read and used by another Amazon service. [30]

SQS has an at-least-once delivery policy (a message is guaranteed to be re-sent until it is received on the other end), and the messages are stored redundantly on different servers to provide availability. [30]
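The following Node.js sketch sends and receives a message through the AWS SDK; the queue URL and message body are placeholders, and the explicit delete reflects the at-least-once delivery described above.

    // Send and receive a message on an SQS queue.
    const AWS = require('aws-sdk');
    const sqs = new AWS.SQS({ region: 'us-east-1' });

    const queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue';

    sqs.sendMessage({ QueueUrl: queueUrl, MessageBody: 'file uploaded' }, (err) => {
        if (err) console.error(err);
    });

    sqs.receiveMessage({ QueueUrl: queueUrl, MaxNumberOfMessages: 1 }, (err, data) => {
        if (err || !data.Messages) return;
        console.log(data.Messages[0].Body);
        // Delete the message so it is not re-delivered (at-least-once delivery).
        sqs.deleteMessage({
            QueueUrl: queueUrl,
            ReceiptHandle: data.Messages[0].ReceiptHandle
        }, (err) => { if (err) console.error(err); });
    });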


2.2.11 AWS Transcribe

Amazon Transcribe is an automatic speech recognition (ASR) service that was introduced in late 2017. Amazon Transcribe can analyze files that are uploaded to S3. It is possible to transcribe audio files that are stored in common formats such as MP3 and WAV. [31]

When Amazon Transcribe is done analyzing the audio file, it returns a JSON file that contains the complete transcribed text as well as timestamps for every word spoken in the audio file. Amazon Transcribe can also recognize multiple speakers and splits up the transcribed text between the different speakers. So far, Transcribe supports the languages US English and Spanish, and it can transcribe audio files that are at most two hours long. [31]
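Starting a transcription job for a file already in S3 looks roughly as follows with the AWS SDK for Node.js; the job name, bucket and file URI are placeholders.

    // Start an asynchronous transcription job for an audio file in S3.
    const AWS = require('aws-sdk');
    const transcribe = new AWS.TranscribeService({ region: 'us-east-1' });

    transcribe.startTranscriptionJob({
        TranscriptionJobName: 'example-job',
        LanguageCode: 'en-US',
        MediaFormat: 'mp3',
        Media: {
            MediaFileUri: 'https://s3.amazonaws.com/my-example-bucket/recordings/audio.mp3'
        }
    }, (err, data) => {
        if (err) console.error(err);
        else console.log(data.TranscriptionJob.TranscriptionJobStatus); // e.g. IN_PROGRESS
    });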

2.2.12 AWS Comprehend

Amazon Comprehend uses natural language processing (NLP) to extract information within four different categories: "entities", "key phrases", "language" and "sentiment". An entity can be something such as a place, a date, a company etc. Key phrases can recognize terms such as “books” and “customers”, which provide a better understanding of what the text is about. It is also possible to analyze which language the text is written in, as over 100 languages can be detected. The other categories, however, can only be analyzed in English or Spanish. Sentiment tells whether there is a positive, negative or neutral statement behind the text. [27]

All of the results from the different categories also come with a confidence percentage. This tells the user the level of certainty with which Comprehend has understood the text. As such, the user can filter out results that do not meet a high enough confidence percentage. This also makes it possible to examine a larger range of documents and analyze a common theme between them. [27]
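A sentiment analysis call through the AWS SDK for Node.js can be sketched as below; the input text is an arbitrary example.

    // Detect the sentiment of a piece of English text.
    const AWS = require('aws-sdk');
    const comprehend = new AWS.Comprehend({ region: 'us-east-1' });

    comprehend.detectSentiment({
        Text: 'The new service works surprisingly well.',
        LanguageCode: 'en'
    }, (err, data) => {
        if (err) console.error(err);
        // data.Sentiment is e.g. POSITIVE; data.SentimentScore holds the
        // confidence values for each possible sentiment.
        else console.log(data.Sentiment, data.SentimentScore);
    });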

2.3 Amazon Web Services Tools and Libraries

Amazon Web Services provides developers with development tools and libraries. The theory behind the tools and libraries used in this project is introduced in the following sub-sections.


2.3.1 Amazon Web Services Software Development Kit

The AWS Software Development Kit (AWS SDK) is an API used to help build libraries or applications that make use of AWS services, and is currently available for Java, .NET, PHP, Python, Ruby, C++, Go and Node.js, as well as in the browser [24]. It should be noted that not all AWS services are supported by the AWS SDK. However, most of the services that were used in this project were used through the AWS SDK for Node.js. For more information on what services the AWS SDK for Node.js supports, see [36].
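As a minimal sketch, loading and configuring the SDK in Node.js and calling a service looks as follows; the region is an arbitrary choice, and credentials are assumed to be configured separately (for example through AWS CLI, see section 2.3.5).

    // Load the AWS SDK and list the account's S3 buckets.
    const AWS = require('aws-sdk');

    // Credentials are read from the environment or ~/.aws/credentials;
    // only the region is set explicitly here.
    AWS.config.update({ region: 'us-east-1' });

    const s3 = new AWS.S3();
    s3.listBuckets((err, data) => {
        if (err) console.error(err);
        else data.Buckets.forEach((bucket) => console.log(bucket.Name));
    });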

2.3.2 Node.js

All browsers have a JavaScript engine that runs the incoming JavaScript from different web pages. Different web browsers have different JavaScript engines: Firefox has SpiderMonkey, Safari has JavaScriptCore and Chrome has V8 [18].

Node.js [2] is a runtime system (it decides how code should be executed) based on Chrome’s V8 JavaScript engine, with some additional libraries. This allows users to run JavaScript on their desktop and host simple servers that can handle incoming I/O requests, instead of using the web browser and uploading code to a server.
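For example, a few lines of Node.js are enough to host a simple server that answers incoming requests; the port number is arbitrary.

    // A minimal HTTP server in Node.js.
    const http = require('http');

    const server = http.createServer((request, response) => {
        response.writeHead(200, { 'Content-Type': 'text/plain' });
        response.end('Hello from Node.js\n');
    });

    server.listen(3000, () => console.log('Listening on port 3000'));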

2.3.3 Node Package Manager

Node Package Manager (NPM) [42] is a package manager used for downloading different Node.js packages. NPM hosts over 600 000 different packages created by standalone developers, which can easily be accessed and downloaded.

It also allows users to upload their own packages, which lets developers easily share their solutions to specific problems and lets other developers reuse them. This helps developers avoid “reinventing the wheel”.

All NPM packages contain a JSON file (package.json), which keeps track of all the different packages and libraries a certain npm package uses. This allows the user to avoid manually installing all the different dependencies a certain package has; instead, the user only has to run “npm install package-name” and the package is ready to be used. [42]


2.3.4 JSON Objects

JSON stands for JavaScript Object Notation, and is commonly used to transfer data asynchronously between user and server. It can transfer attribute-value pairs and array data types. JSON has its origin in JavaScript, but is a language-independent data format that many other programming languages can use. JSON objects have a simple structure that one can easily add to and extract objects from. [19]
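A small Node.js sketch of attribute-value pairs and arrays in JSON, and of converting between text and objects; the field names are arbitrary examples.

    // Serialize a JavaScript object to JSON text and parse it back.
    const event = {
        name: 'Exam',
        attendees: ['Emelie', 'Marcus'],  // an array value
        credits: 15                       // a numeric attribute-value pair
    };

    const text = JSON.stringify(event);   // '{"name":"Exam","attendees":...}'
    const parsed = JSON.parse(text);      // back to a JavaScript object

    console.log(parsed.attendees[0]);     // "Emelie"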

2.3.5 Amazon Web Services Command Line Interface

The AWS Command Line Interface (AWS CLI) is an open source tool built upon the AWS SDK for Python (Boto). It provides the developer, through commands, the ability to efficiently interact with the AWS services. Using AWS CLI, the developer can easily execute actions such as creating an S3 bucket, configuring administrative rights and credentials, etc. For more information on the commands available for AWS CLI, see [35].
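A few representative commands are sketched below; the bucket name and file are placeholders, while "aws configure", "aws s3 mb" and "aws s3 cp" are standard AWS CLI commands.

    # Store credentials and a default region for subsequent commands.
    aws configure

    # Create an S3 bucket (the name must be globally unique).
    aws s3 mb s3://my-example-bucket

    # Upload a local file to the bucket.
    aws s3 cp audio.mp3 s3://my-example-bucket/recordings/audio.mp3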

2.3.6 MySQL

MySQL is a management system used for handling relational databases. MySQL uses the language SQL to handle the data that is stored in the database. SQL is the standard language for interacting with relational databases. MySQL is open source, which allows standalone developers to build on the existing code for MySQL. This has led to MySQL being one of the most popular choices when building relational databases. [34]

2.3.7 Alexa Skills Kit Software Development Kit

The Alexa Skills Kit Software Development Kit (ASK SDK) [81] is an open-source development kit that is constantly being developed. It provides functions that can be used in Lambda to ease the interaction between Lambda and the Alexa Skill. This allows programmers to focus more on the program’s logic than on setting up the necessary code to communicate between Lambda and the Alexa Skill.
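A minimal skill handler using the ASK SDK v2 for Node.js can be sketched as follows; the speech text is a hypothetical example, while the handler structure (canHandle/handle, SkillBuilders) follows the SDK.

    // A Lambda-hosted Alexa Skill with one handler ("npm install ask-sdk-core").
    const Alexa = require('ask-sdk-core');

    // Responds when the user opens the skill ("Alexa, open ...").
    const LaunchRequestHandler = {
        canHandle(handlerInput) {
            return handlerInput.requestEnvelope.request.type === 'LaunchRequest';
        },
        handle(handlerInput) {
            return handlerInput.responseBuilder
                .speak('Welcome, what would you like to do?')
                .getResponse();
        }
    };

    exports.handler = Alexa.SkillBuilders.custom()
        .addRequestHandlers(LaunchRequestHandler)
        .lambda();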

2.3.8 Alexa Skill Kit Command Line Interface

When working on bigger Alexa Skills, the Alexa Skill Kit Command Line Interface (ASK CLI) is a handy command line tool, which is used to manage different skills and the related Lambda functions. Using ASK CLI, developers can easily download, upload, create and copy already existing skills. ASK CLI also requires AWS CLI to set administrative rights, so that the skill can be launched. [3]
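A typical workflow with the ASK CLI is sketched below; "ask init", "ask new" and "ask deploy" are standard commands of the tool, and the project contents are whatever the chosen template generates.

    # Link the CLI to an Amazon developer account and AWS credentials.
    ask init

    # Create a new skill project from a template.
    ask new

    # Deploy the skill's interaction model and its Lambda function.
    ask deploy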

2.4 Definition and Metrics of Software Learnability

As systems become more complex, it becomes increasingly important that software has good learnability. There is no commonly agreed-upon definition of the term learnability, as the results of Grossman’s literature review [59] on definitions of learnability have shown. The reason is that the definition can vary depending on the nature of the study: for example, whether or not the user has experience with similar software, what level of experience with computers or interfaces they have, etc. To avoid locking in on one specific definition at the risk of disregarding relevant results, Grossman proposes a taxonomy of learnability, which is shown in figure 7.

Figure 7: A taxonomy of learnability proposed by Grossman [59].

Additionally, Grossman divides the set of definitions according to two main categories, cited below:

• Initial learnability: "Initial performance with the system."

• Extended learnability: "Change in the performance over time."

It should also be noted that the definition of performance can have different meanings depending on what is measured. For example, the most basic measures are based upon the user’s success rate, completion time, error rate and their subjective satisfaction [70]. In other words, it depends on what the purpose and set-up of the study is. This will be further discussed later.

With consideration of both Grossman’s taxonomy and the purpose of this thesis, the definition of learnability that best fits this study is “the user’s ability to eventually achieve a specific performance, for a user with no experience with the interface”. This falls into the category of initial learnability, as this study aims to see how easy it is for novice users to use a certain service by completing a series of tasks, as opposed to having an expert user find the optimal solutions for solving the tasks.

As previously stated, performance can be measured in varying ways, and thus there exist several different evaluation metrics. Grossman [70] has identified and categorised the metrics found in their literature research into seven types: "task metrics", "command metrics", "mental metrics", "subjective metrics", "documentation metrics", "usability metrics" and "rule metrics". Each of these metrics is explored in further detail below.

2.4.1 Task Metrics: Metrics based on task performance

Task metrics aim to evaluate the participant’s task performance and are a popular method for gathering quantitative data using the following metrics:

1. Percentage of users who complete a task optimally [66].

2. Percentage of users who complete a task without any help [66].

3. Ability to complete task optimally after certain time frame [52].

4. Decrease in task errors made over certain time interval [68].

5. Time until user completes a certain task successfully [69].

6. Time until user completes a set of tasks within a time frame [69].

7. Quality of work performed during a task, as scored by judges [54].

Nielsen [71] recommends that at least 20 participants should be used for the acquired results to be significant and statistically valid. Therefore, this type of metric will not be used in our study, as there are not enough participants, and the data that the study aims to gather is of a qualitative nature rather than quantitative.

2.4.2 Command Metrics: Metrics based on command usage

Command metrics aim to evaluate the participant’s command usage and may include the following metrics:

1. Success rate of commands after being trained [53].

2. Increase in commands used over certain time interval [68].

3. Increase in complexity of commands over time interval [68].

4. Percent of commands known to user [48].

5. Percent of commands used by users [48].

Measurements of command usage are most helpful when done over an extended time period, because the usage of commands usually changes with time. In other words, this type of metric fits studies that revolve around extended learnability. Therefore this metric is not suitable for our study, as we are more interested in the initial learnability.

2.4.3 Mental Metrics: Metrics based on cognitive processes

Mental metrics aim to evaluate the participant’s cognitive processes and may include:

1. Decrease in average think times over certain time interval [68].

2. Alpha vs beta waves in EEG patterns during usage [76].

3. Change in chunk size over time [75].

4. Mental Model questionnaire pretest and post test results [72].

This type of metric would not fit our study, as it cannot be applied at a low cost and would require prior knowledge of the science behind cognitive processes to analyze the acquired data.


2.4.4 Subjective Metrics: Metrics based on user feedback

Subjective metrics are based on user feedback, which produces subjective data. The metric might include the following:

1. Number of learnability related user comments [68].

2. Learnability questionnaire responses [56, 65].

3. Twenty-six Likert statements [56].

The aim of this metric is to evaluate whether or not each individual user is satisfied with using the product. As such, some of the metrics of this type will be used in this study, as they would help gather qualitative data on the learnability issues present in the product.

2.4.5 Documentation Metrics: Metrics based on documentation usage

Documentation metrics aim to evaluate the participant’s documentation usage and may include:

1. Decrease in help commands used over certain time interval [68].

2. Time taken to review documentation until starting a task [68].

3. Time to complete a task after reviewing documentation [68].

This type of metric is of a quantitative nature and needs extended use of the system, which is not suitable for our type of study.

2.4.6 Usability Metrics: Metrics based on change in usability

Usability metrics aim to evaluate the change in usability and include:

1. Comparing “quality of use” over time [49].

2. Comparing “usability” for novice and expert users [49].

Conducting a study that uses this kind of metric would require extended use of the system and a large test group to provide continuous feedback. As such, this metric is not suitable for our study.

(34)

2.4.7 Rule Metrics: Metrics based on specific rules

Rule metrics aim to find learnability issues that present themselves during the early stages of the design process. A prototype is usually created and generalized into a system model, which the designer uses to test the consistency of the product’s interface and thereby its degree of learnability.

This type of metric is based on:

1. The number of rules required to describe the system [60, 63].

This metric would not fit our study, as we are not interested in optimizing any model or system.

2.5 Related Works

This section goes through similar work that was found during the literature study.

2.5.1 Reports on Cloud Services

There were multiple reports regarding cloud computing and cloud services. However, only two reports were deemed to have covered important aspects of cloud computing and comparisons between different cloud computing providers.

The first report, from 2014 [44], reviews and summarizes different reports that cover the subject of cloud computing. In this report, some of the most important aspects of cloud computing are documented.

The second report, from 2016 [47], looks instead into important aspects startup companies should consider when deciding on a cloud service provider. In this report, three major cloud service providers, Amazon AWS, Microsoft Azure and Google Cloud, were examined and compared.

2.5.2 Projects using Amazon Web Services

A study conducted back in 2010 [64] looked into the possibilities of using some of Amazon’s Web Services, such as Elastic Compute Cloud (EC2) and Simple Storage Service (S3), together with their own environmental sensors. The report goes through how each step of the application was integrated and the architecture of the application.

Another study, conducted in 2011 [57], researched how mass amounts of biomedical data could be processed and stored through the use of Amazon Web Services like EC2 and S3. The study also gives a cost estimation of the different steps of implementing the application.


3 Method

This chapter describes the methodology and research methods used in the degree project. The first section introduces the choice of methodologies and how they were implemented with regard to the project. This is followed by a section that presents the sub-questions. Lastly, a description of the research approach is presented.

3.1 Research Methodologies

This section gives an overview of the research methodologies that were applied in this project and why certain approaches were chosen over others.

3.1.1 Inductive & Deductive Approach

An inductive approach revolves around searching for patterns in observations and explaining/developing theories from these patterns with hypotheses. More specifically, inductive reasoning follows a bottom-up approach, meaning that empirical data is collected before developing a hypothesis; for more information see [50]. Deductive reasoning follows a top-down approach, where available information and facts are directly applied when developing a hypothesis, which is then verified through extensive testing. In other words, this type of approach is more focused on testing or verifying an existing theory. For more information see [50].

3.1.2 Quantitative & Qualitative Research

Quantitative research makes use of results that are numerical, such as percentages and distances. To analyze this data, mathematical and statistical methods are used, in an attempt to produce unbiased results and data. Qualitative research focuses instead on interviews, observations, focus groups, etc. This results in rich data that gives a more in-depth picture of why and how things occur. [50]

3.1.3 Case Study

A case study is a process of research into the development of a certain area over a period of time. This involves doing in-depth research about a phenomenon or subject. The different cases in the phenomenon are closely inspected to get rich and deep data about those specific cases [50]. It is not possible to draw any statistical generalizations from case studies, but well-performed case studies can be compared to each other.

3.1.4 Bunge’s method

Bunge’s method [51] was created in 1985 and revolves around making educated guesses based on intuition and common sense, resulting in trial and error until reasonable results are discovered. This project will use a generalisation of Bunge’s method created by Andersson [46], which is used when developing technical products. The 10 steps of the generalisation are described in detail below:

1. How can the current issue be solved?

2. How can a technique/product be developed to solve the problem in the most efficient way?

3. What basis/information is available and what are the requirements to develop a technique/product?

4. Develop the technique/product based on the bases/information gathered in step 3. If the technique/product is satisfactory, go to step 6.

5. Try creating a new technique/product.

6. Create a model/simulation with the suggested technique/product.

7. What are the consequences of the model/simulation from step 6?

8. Test the implementation of the model/simulation. If the results are not satisfactory, go to step 9; otherwise, continue to step 10.

9. Identify and correct the flaws with the model/simulation.

10. Evaluate the results in relation to existing knowledge and practice, as well as identify new problem areas for further research.

3.1.5 Applied Research Methodology

When collecting data in our literature research, we will mostly look at cloud services/computing documentation and related work that has used cloud services/computing. This means that we will not be able to measure numerical data or use mathematical/statistical methods. As such, we chose to use a qualitative approach when gathering data rather than a quantitative approach.

Our research does not have an originating hypothesis that we are trying to prove. During this research we will instead try to draw conclusions and hypotheses based on the data and experiences that we gather. As such, the inductive method was deemed a more appropriate approach than the deductive method.

To answer our research questions, we chose to work with case studies. The case study was conducted on the development process of test applications, which were incrementally built using Amazon Web Services.

Andersson’s generalisation of Bunge’s method works well for our project, as it revolves around developing a technological product in iterations. Bunge’s method also works well because we are not completely sure how, and which, AWS services to use; by following Bunge’s method through trial and error, reasonable results will be gathered.

3.2 Sub-questions of the research question

The main research questions have been divided into sub-questions in an effort to further narrow down and identify issues that need to be considered in this thesis. The following paragraph presents the identified sub-questions:

SRQ1. What different prerequisites are needed for using Amazon Web Services?

SRQ2. Which Amazon Web Services are important and how can they be used?

SRQ3. What problems arise when using different services during application development?

SRQ4. What are the costs of using Amazon Web Services as a developer?


3.3 Research Method

This section presents the research method followed in this project (see figure 8), with regard to the applied research methodologies mentioned in section 3.1.5.

Figure 8: The research method which was followed in this degree project.

3.3.1 Understanding

Bunge’s first step asks, “How can the current issue be solved?”. To answer this question, a literature study around the research questions was conducted to get a better understanding of the area as a whole and the possibilities within it. Part of understanding the research question was to create sub-questions that needed to be answered and goals that the project had to fulfill by the end of the project.

3.3.2 Research

Bunge’s second step, “How can a technique/product be developed to solve the problem in the most efficient way?”, was also reflected on when creating the sub-questions and goals. With a clearer understanding of the goals and the research questions, research was conducted with the aim of answering the third step of Bunge’s method, “What basis/information is available and what are the requirements to develop a technique/product?”. This was done by researching different Amazon Web Services that could potentially be used towards creating a technique/test application, where a few of the services were looked into more thoroughly. This step was revisited each time a new iteration of the technique/test application was done.

3.3.3 Developing and Testing

Thereafter, the different theories, methods and services that were found during the literature study were used to start developing test applications. At the end of each day, it was noted down how much time was spent on (1) finding resources, (2) implementing functionalities and (3) testing and debugging the test application. This step was done iteratively, where after each iteration it was evaluated whether or not additional services could be added to the application. If it was not possible to add further services, and if it could be developed within the given time constraint, a new test application was created. This step of the research method loosely follows steps 4-9 of Bunge’s method.

3.3.4 Evaluation, Discussion and Conclusion

Finally, to apply Bunge’s last step, “Evaluate the results in relation to existing knowledge and practice, as well as identify new problem areas for further research”, an evaluation was made of the data gathered during the literature research and the test application development. The acquired data was based on (1) the amount of time spent on the different activities and (2) the subjective observations made during the test application development, which were gathered using a feedback-based metric (described in the next section). The evaluation aims to answer the research questions and subsequently find out how well the project goals were met. From this a conclusion is drawn and further research areas are suggested.

3.4 Evaluation method

There are in general two types of evaluation methods that are used when evaluating usability, namely formative and summative evaluation. The aim of formative evaluation is to find and fix issues during the initial phase of the application development and thereby make the application more usable. On the other hand, a summative evaluation is made later in the development with the aim of assessing the interface’s overall degree of usability. These two evaluations can easily be applied to learnability as well [78].

A summative evaluation is more fitting for our project, as we are interested in making an assessment of how easy it is to learn the Amazon Web Services, as opposed to a formative evaluation, which would be more suitable for studies interested in finding and fixing problems during the initial phase of the development process. As such, feedback-based metrics were used to evaluate the degree of learnability in using Amazon Web Services during development. Seeing that the collected data is subjective and of a qualitative nature, it was deemed that metrics based on feedback would be the most fitting to use. The following factors were taken into consideration when evaluating the results and choosing the metrics suitable for this project: (1) the user’s level of experience with computers, (2) the user’s level of experience with the interface and (3) the user’s experience with similar software. The following set of questionnaire items and Likert scales was used to evaluate the difficulty of implementation, research and testing after each iteration:

• On a scale of 1-5, how difficult was it to implement this action? Where 1 is very easy; 2 is somewhat easy; 3 is neither easy nor difficult; 4 is somewhat difficult; 5 is very difficult.

• What made it difficult/easy to implement this action?

• On a scale of 1-5, how difficult was the research for implementing this action? Where 1 is very easy; 2 is somewhat easy; 3 is neither easy nor difficult; 4 is somewhat difficult; 5 is very difficult.

• Why was the research difficult/easy?

• On a scale of 1 - 5, how difficult was it to debug and/or test this action? Where 1 is very easy; 2 is somewhat easy; 3 is neither easy nor difficult; 4 is somewhat difficult; 5 is very difficult.

• What made it difficult/easy to debug and/or test this action?

3.5 Development Tools

This section provides a brief overview of the tools, development environments and programming languages used in the project.


3.5.1 Development Environments

The development environments used in this project were AWS CLI, ASK CLI, the Lambda Console Editor and IntelliJ. ASK CLI is a command line interface that lets developers manage their Alexa skills and other related Lambda functions. The Lambda Console Editor and IntelliJ are both tools used for editing code, although the Console Editor can only be used when working with smaller projects. AWS CLI is a command line interface that is used to efficiently manage the AWS services. The operating systems used during development were Ubuntu 17.10 and 18.04, as they were compatible with the tools provided by Amazon.

3.5.2 Programming Languages

The code in this project was written mainly in JavaScript, but some code was also written in Python. JavaScript was used because it is one of the more common languages for writing Lambda functions, and Python was used when writing code that used SQS, Transcribe and Comprehend.

3.5.3 Testing and Debugging Tools

In this project the following testing tools were used: Echosim.io, the Amazon Alexa Skill Kit test, ASK CLI and Amazon CloudWatch. Echosim.io and the Alexa Skill Kit test were simpler tools used for testing the Alexa skill as a whole, while ASK CLI together with Amazon CloudWatch was used when debugging the code and checking that a request gave the correct response.

3.5.4 Project Model

To keep a project of this size under control, the following project models were used. A Gantt chart was used when planning the workload ahead, which helped to keep track of the project's schedule and the goals/deadlines that needed to be fulfilled before the project was finished. A simple graph was also created to give an overview of how many hours were going to be spent on the project each week.

To create flexibility and avoid setting the project up for failure, the method part of the triangle was made more flexible by using the MoSCoW method in combination with Eklund's triangle [55]. The other aspects of Eklund's triangle, "time" and "cost", stayed the same during the project: the time was fixed because the project had to be presented by a set date, and the cost stayed the same because there were barely any expenses during the project. The method prioritization can be seen in the list below:

Must have:

• A report that fulfills the course goals.

• Answers to the research questions.

Should have:

• Use two or more AWS resources either by expanding on the current test application or building multiple test applications.

Could have:

• A fully functional product that could be used for Amazon Alexa.

Won’t have:

• Implementations using other web services besides AWS.

3.6 Documentation Model

For the thesis, the report template provided by KTH was used to create a good structure. UML was used to create models that show the development process. Google Sheets was used to keep track of the time spent, and Google Slides was used for some of the models. The GitHub repository https://github.com/Muckemannen/Bachelors_degree_work was used to store and share the code written during the project. This repository can also be used by other developers and gives transparency on how everything was implemented.


4 Prerequisites for applying AWS Services and Tools

A big part of working with AWS services is knowing how to set everything up and knowing the prerequisites for using each service. This chapter presents the results from the literature study conducted iteratively throughout the development of the test applications. It can also serve as a quick overview of the different services that were used in this project, as well as of the development tools needed to develop with the AWS services.

4.1 Prerequisites of applying AWS Services

This section describes the prerequisites that were needed when applying some of the AWS services, as found during the literature study.

4.1.1 Lambda Prerequisites

Creating a Lambda function is simple. All that is required is to go to Amazon's website for Lambda [8], log into one's Amazon account and create a new function. It is then possible to either start a new Lambda function from scratch or use a preexisting blueprint for the Lambda function. For example, there are multiple blueprints for Lambda functions that can communicate with Alexa skills.

When creating a Lambda function it is also important to set up an execution role (see figure 9). The execution role decides which other AWS resources the Lambda function is allowed to interact with. For example, to allow a Lambda function to access the Relational Database Service (RDS), the Lambda function's execution role has to have the following two policies: AmazonRDSFullAccess and AWSLambdaVPCAccessExecutionRole. Otherwise, the Lambda function will not be able to interact with RDS.


Figure 9: Setting the Lambda configurations.
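
As a minimal sketch of an alternative to the console, the same policies could also be attached to the execution role with AWS CLI, assuming a placeholder role name <LambdaExecutionRole>:

aws iam attach-role-policy --role-name <LambdaExecutionRole> --policy-arn arn:aws:iam::aws:policy/AmazonRDSFullAccess

aws iam attach-role-policy --role-name <LambdaExecutionRole> --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole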

There are two ways of editing Lambda code: either using the online editor that the Lambda website provides, or downloading the Lambda function, editing it in any code editor and then uploading it to the Lambda website again.

When starting out, the online editor is quite a handy tool, but it falls short when working on bigger projects, mostly because it is not possible to edit code in the online editor if the project is too large. Another issue with the online editor is that it is more difficult to import different libraries.

When working with a downloaded version of the Lambda function, it is important to zip the project correctly, otherwise Lambda will not be able to execute the code. To check if the project was zipped correctly, the following command can be used, where index.js contains the main function:

zipinfo zip-map-name.zip | grep index.js | more

If done correctly, the following output should be seen:

-rw-rw-r-- ... index.js

It should be noted that the index.js file should not be located in a subfolder, otherwise the project has been zipped incorrectly.
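
For reference, a minimal sketch of an index.js with a generic Node.js handler is shown below; this is not the project's actual skill code, and the exported handler name must match the handler field in the Lambda configuration:

// index.js - minimal sketch of a Node.js Lambda handler.
// The exported name (index.handler) must match the handler
// field in the Lambda configuration.
exports.handler = async (event) => {
    // The event object carries the incoming request, for
    // example a request from an Alexa skill.
    return {
        statusCode: 200,
        body: JSON.stringify({ message: 'Hello from Lambda' })
    };
};

The project can then be zipped from inside the project folder, for example with zip -r zip-map-name.zip ., so that index.js ends up at the root of the archive rather than in a subfolder.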


4.1.2 IAM Prerequisites

Before it is possible to access Amazon Simple Storage Service (S3) or Relational Database Service (RDS), the developer needs to create an IAM user at the IAM Management Console [6] and set up the user authorization and authentication for the stated AWS services. It is necessary to select the access type "Programmatic Access" to enable an Access Key ID and Secret Access Key for the development tools (see figure 10). The "Management Console Access" option was also selected to create a custom password for the IAM user. As such, we could avoid using the root user.

Figure 10: Setting the IAM user details.

The next step handles which policies to attach to the user, in other words to what extent the user can access the services. Here, the policies are chosen depending on the AWS service and resources (see figure 11).


Figure 11: Setting permissions for the IAM User.

Once the user permission is set, Amazon IAM will provide credentials (Access Key ID and Secret Access Key) that are unique to this specific user.

It should be noted that the Access Key ID and Secret Access Key will only be shown once, but it is possible to create new credentials for the user. At most two keys can be created for each user at no cost.
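
For reference, once the keys have been configured for the development tools (see section 4.1.3), they are stored locally in the ~/.aws/credentials file, which has the following format (with placeholder values):

[InsertProfileName]
aws_access_key_id = <Access Key ID>
aws_secret_access_key = <Secret Access Key>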

4.1.3 S3 Prerequisites

There are several ways to create Amazon S3 buckets; however, in this project Node.js modules were used to create buckets and upload files. The Node.js modules utilize the AWS SDK for JavaScript, and it was thereby necessary to install the AWS SDK with the following command:

npm install aws-sdk

Note that before it is possible to use Amazon S3 resources, it is necessary to load the user's credentials. To do this, AWS CLI was used to configure the user's credentials with the following command:

aws configure --profile <InsertProfileName>
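
Once the profile is configured, creating a bucket from Node.js can be done as in the minimal sketch below, where the profile name, region and bucket name are placeholders rather than the values used in the project:

// Minimal sketch: creating an S3 bucket with the AWS SDK for JavaScript.
const AWS = require('aws-sdk');

// Load the IAM user's credentials from the configured profile.
const credentials = new AWS.SharedIniFileCredentials({ profile: '<InsertProfileName>' });
AWS.config.credentials = credentials;
AWS.config.update({ region: 'eu-west-1' });

const s3 = new AWS.S3();

// Create the bucket; bucket names must be globally unique.
s3.createBucket({ Bucket: '<my-bucket-name>' }, (err, data) => {
    if (err) console.error('Could not create bucket:', err);
    else console.log('Bucket created at:', data.Location);
});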

4.1.4 RDS Prerequisites

In this project a MySQL RDS database was set up. Note that it is possible to set up other types of RDS databases. Setting up an RDS MySQL database can be quite difficult without a guide, as there are many settings that need to be configured, but by following Amazon's documentation [12] on how to set up a MySQL RDS instance it becomes quite simple. The documentation also shows how to set up all the other types of RDS instances.

An important thing to check after setting up the RDS instance is the settings for inbound and outbound traffic. As seen in figure 12, these can be found in the EC2 dashboard under the Security Groups tab. During this project security was not a priority, so the RDS instance was made available from anywhere (0.0.0.0/0); otherwise there could have been problems connecting to the database when working at school or at home.

Figure 12: The security group for the RDS instance, where inbound traffic is set to 0.0.0.0/0 to allow traffic from anywhere.

There are two ways to connect to the MySQL database [11]: either with SSL or with the mysql package. In this project the mysql package was used.

To connect to the MySQL RDS instance using mysql, the following command can be used, where <endpoint> is the RDS instance endpoint and <mymasteruser> is the username that was created:

mysql -h <endpoint> -P 3306 -u <mymasteruser> -p

Once connected to the MySQL RDS instance, regular SQL statements can be used to navigate the database. It is also possible to manage the RDS database with MySQL Workbench [13], but it was not used in this project due to the small size of the database.
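
Connecting from Node.js follows the same pattern. A minimal sketch using the mysql npm package (installed with npm install mysql) is shown below, where the endpoint, user, password and database name are placeholders:

const mysql = require('mysql');

// Connection details for the RDS instance; all values are placeholders.
const connection = mysql.createConnection({
    host: '<endpoint>',
    port: 3306,
    user: '<mymasteruser>',
    password: '<password>',
    database: '<databasename>'
});

connection.connect((err) => {
    if (err) throw err;
    // Regular SQL works once the connection is established.
    connection.query('SELECT 1 + 1 AS solution', (err, rows) => {
        if (err) throw err;
        console.log('The solution is:', rows[0].solution);
        connection.end();
    });
});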


4.1.5 SQS Prerequisites

SQS does not have any prerequisites for setting up a queue. All that is needed is to use Amazon's SQS online console, click create new queue, choose standard or FIFO, type in a queue name (when using FIFO, the suffix .fifo has to be included in the name) and then choose "Quick-Create Queue". More in-depth steps can be found in Amazon's developer guide [16].

Figure 13: Setting up the permissions for the SQS queue.

In this project SQS was used in combination with S3. Depending on which Amazon Web Service is to send messages to SQS, different policies are needed. Editing an SQS policy is done by going to the online console for SQS, choosing the SQS queue whose policies are to be changed, clicking the "Permissions" tab and then clicking "Edit Policy Document (Advanced)" (see figure 13). More information about sending messages from S3 to SQS and setting up different policies can be found in [15].
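
The project's SQS code was written in Python with Boto 3, but for consistency with the report's main language, the minimal sketch below shows roughly equivalent calls with the AWS SDK for JavaScript; the queue URL is a placeholder:

const AWS = require('aws-sdk');
AWS.config.update({ region: 'eu-west-1' });

const sqs = new AWS.SQS();
const queueUrl = '<https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>>';

// Send a message to the queue.
sqs.sendMessage({ QueueUrl: queueUrl, MessageBody: 'Hello SQS' }, (err, data) => {
    if (err) console.error('Send failed:', err);
    else console.log('Sent message with id:', data.MessageId);
});

// Poll the queue for up to one message.
sqs.receiveMessage({ QueueUrl: queueUrl, MaxNumberOfMessages: 1 }, (err, data) => {
    if (err) console.error('Receive failed:', err);
    else if (data.Messages) console.log('Received:', data.Messages[0].Body);
});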

4.1.6 Amazon Transcribe and Comprehend Prerequisites

Both Transcribe and Comprehend require, as do most Amazon Web Services, an Amazon account and an IAM user. If the user does not want to use the Amazon online console for Transcribe and Comprehend, AWS CLI needs to be set up. The only thing required after that is to write code that uses the Python AWS SDK library Boto 3 [9], or another AWS SDK library that supports interaction with Transcribe and Comprehend.
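
As a minimal sketch of what such code could look like in the report's main language, the AWS SDK for JavaScript offers calls equivalent to those in Boto 3; the job name, bucket and file below are placeholders:

const AWS = require('aws-sdk');
AWS.config.update({ region: 'eu-west-1' });

const transcribe = new AWS.TranscribeService();
const comprehend = new AWS.Comprehend();

// Start an asynchronous transcription job on an audio file stored in S3.
transcribe.startTranscriptionJob({
    TranscriptionJobName: '<my-job-name>',
    LanguageCode: 'en-US',
    MediaFormat: 'mp3',
    Media: { MediaFileUri: 's3://<bucket>/<file>.mp3' }
}, (err, data) => {
    if (err) console.error('Transcription failed to start:', err);
    else console.log('Job status:', data.TranscriptionJob.TranscriptionJobStatus);
});

// Detect the sentiment of a piece of text.
comprehend.detectSentiment({ Text: 'AWS is great', LanguageCode: 'en' }, (err, data) => {
    if (err) console.error('Sentiment detection failed:', err);
    else console.log('Sentiment:', data.Sentiment);
});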
