
Building a scalable email response system

PETTER ANDERSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY


Master in Computer Science
Date: April 3, 2020
Supervisor: Gunnar Karlsson
Examiner: Mathias Ekstedt
School of Electrical Engineering and Computer Science
Host company: Tendium AB


Abstract

This thesis concerns the task of designing and building a system around a natural-language understanding (NLU) bot to help automate email communication, using cloud computing technology. Focus lies on a serverless environment within the Amazon Web Services (AWS) infrastructure. Unique email addresses were employed to easily keep track of email threads connected to specific requests, and a storage solution was developed in Amazon Simple Storage Service (S3) to store inbound attachments connected to each procurement request. The NLU bot was trained on existing email data and deployed to classify inbound email content. The classification was then used by the system to determine adequate responses.

Templates were used for outgoing emails, which required a large amount of data in order to be useful. The thesis initially sought to develop a fully automated system, but that goal was modified to require human approval of outgoing emails, both to closely monitor the accuracy of the NLU and to give a user-friendly experience. This decision resulted in a system that required more time and work from the user but that provided better reliability and accuracy. Improvements of the template matching mechanics would provide better template suggestions for outgoing email and are a main focus for future work. In hindsight, more resources were required for adapting and training the NLU bot in order for it to produce better classifications, which would have made it possible to more fully test the capabilities of the system.


Sammanfattning

This report covers the design and construction of a system around a natural-language understanding application to automate email communication with the help of cloud technologies. The focus is on a serverless environment within the Amazon Web Services (AWS) infrastructure. Unique email addresses were used to easily keep track of the email threads connected to different procurement cases. A storage solution was developed in Amazon Simple Storage Service (S3) to store inbound attachments connected to each such case. The natural-language understanding application was trained on existing email data and used to classify the content of inbound email. The classification was then used by the system to select adequate responses.

Templates were used for outgoing email, which required a large amount of data in order to be useful. The thesis initially aimed to develop a fully automated system, but that goal was changed during the work to require approval from a human before outgoing email is sent. This was done to better monitor the quality of the selected responses and to provide a better user experience. The decision resulted in a system that required more time and work from the user but improved reliability and precision. Improvements to the matching between email content and templates would give better response suggestions for outgoing email and are a main focus for future work. In hindsight, more resources were needed to adapt and train the natural-language understanding application so that it could produce better classifications, which could have enabled more complete testing of the system.


Contents

1 Introduction
1.1 Motivation
1.1.1 Societal impact
1.2 System requirements
1.3 Research question
1.4 Contribution
1.5 Limitations

2 Background
2.1 Public procurement
2.2 Chatbots and email response systems
2.2.1 Natural-language understanding
2.2.2 Chatbot applications
2.2.3 Google Smart Reply
2.3 Terminology and services
2.3.1 Database indices
2.3.2 Comparing SQL and NoSQL
2.3.3 Elasticsearch
2.3.4 HTTP and REST-based APIs
2.3.5 Multipurpose Internet Mail Extensions
2.3.6 JSON
2.4 APIs
2.4.1 Microservice architecture
2.4.2 API best practices
2.5 Cloud computing
2.5.1 Serverless computing
2.6 Amazon Web Services
2.6.1 Virtual private cloud
2.6.2 API Gateway
2.7 Storage options
2.7.1 Simple Storage Service
2.7.2 Relational Database Service
2.8 AWS services
2.8.1 Simple Email Service
2.8.2 Simple Queue Service
2.8.3 Simple Notification Service
2.8.4 Elastic load balancing
2.8.5 CloudWatch events
2.9 AWS computing
2.9.1 Elastic Compute Cloud
2.9.2 Lambda

3 System design
3.1 System overview
3.2 NLU bot
3.2.1 Bot application
3.2.2 Training the NLU bot
3.2.3 MailDB
3.3 Entry point: New procurements
3.3.1 API usage
3.3.2 Unique ID and email address
3.4 Send Mail module
3.5 Inbound email handling
3.5.1 Attachments storage
3.6 Mail analysis
3.6.1 Templates
3.6.2 Template mapper
3.6.3 NLU bot connector
3.6.4 Mail analysis system
3.7 Template logic
3.7.1 Scheduled mail
3.7.2 No response handling
3.7.3 Changed receiver
3.7.4 Completed and cancelled cases
3.7.5 Human intervention
3.8 Service considerations
3.8.1 Computing environments
3.8.3 Programming languages and frameworks
3.9 Tests of the system

4 Discussion
4.1 Functional requirements retrospective
4.2 Non-functional requirements retrospective
4.2.1 NLU Training Environment
4.2.2 Scalability Requirement
4.2.3 Security
4.3 Queue storage
4.4 Ethical discussion
4.4.1 Unique email addresses
4.4.2 Responsibility of automatic email sending
4.5 Sustainability
4.5.1 Benefits of cloud computing
4.5.2 Summary of tests of the system

5 Conclusions
5.1 Future work
5.1.1 NLU bot capabilities
5.1.2 Improve template mapper
5.1.3 Improve front-end
5.1.4 Scale up system
5.1.5 Work more with attachments

Bibliography


Introduction

This thesis concerns the task of automating email communication for the purpose of reducing the need for human labour in recurring and non-cognitively demanding email request jobs. To do so, automation technologies combined with an advanced natural-language understanding (NLU) bot are explored and evaluated. An architecture for an entire automated system is constructed, and the effectiveness as well as the limitations of such a system are discussed and evaluated.

The practical work is carried out at the office of Tendium.

1.1 Motivation

Tendium is a data analytics company that identifies and qualifies business opportunities within public procurement using proprietary technology for structured data and artificial intelligence.

Public procurement is the procurement of goods, construction and services on behalf of a public authority, such as a government agency. This process is regulated to mandate transparency and fairness. As such, all documents are public and available for anyone to request.

When a request for tender (RFT) is issued by a government agency, it is often published through services such as www.tendsign.com and is readily available to download. Companies will then typically generate competing bids to obtain the contract. These bids are generally not published anywhere and must therefore be requested manually from the government agency in question. If a request is submitted, the agency is under a legal obligation to provide the bid information unless restricted by some particular legal obstacle, such as confidentiality towards the company that generated it.


Requesting the bids associated with a particular procurement is usually done via email to some official at the government agency. The requests often involve some email correspondence concerning issues such as finding the right person to send the request to and ascertaining the exact details of the request.

The interest of Tendium is to develop an application that makes email communication for requesting bids for public procurements more efficient. In order to automate this process and to be able to request large amounts of documents efficiently, some intelligent agent must be able to carry out the email conversations.

1.1.1 Societal impact

Requesting public information has been limited by the large amount of human work it requires, which could now be reduced thanks to technological advances within automated communication. A parallel can be seen in the area of tech support, where a person first answers static questions about their case in order to minimize the human interaction required. For this to be realized within email communication, entirely new systems need to be built around these new technologies.

1.2 System requirements

This section defines the requirements and desired functionality of the system to be implemented in this thesis. The system is an end-to-end solution with the purpose of automating requests for public documents. It has two main components:

• Natural-language understanding (NLU) bot.

• System that integrates the NLU bot.

For the NLU bot application, the thesis covered one non-functional requirement:

1. Provide an environment to easily maintain and add training data to the NLU bot.

The functional requirements for the integration system include the following functions:


3. Utilizing unique dynamic email addresses to keep track of which request thread each inbound email belongs to.

4. Storing email threads and attachments in a database.

5. Sending inbound email as input to the NLU bot module.

6. Choosing an outgoing mail based on the NLU bot classification.

7. Providing a user interface making it easy to monitor the system and take manual control if necessary.

8. Triggering a signal that calls for human intervention.

9. Closing the corresponding case if the objective is completed.

10. Resending emails at a later time based on email content, such as the receiver being on vacation.

11. Changing receiver based on email content, such as the receiver redirecting to another person.

This thesis seeks to design and implement a proof-of-concept version of the integrated system around the NLU bot that is scalable and secure. The system shall be designed to perform to the same standard independent of the amount of email present in the system. Since the field of interest of the system is emailing public authorities, the required capabilities of the system depend on the number of initiated requests, which could exceed 10 000 a day. Usually, email communication between people does not call for immediate responses, compared to modern chat conversations. For the system to be independent of the amount of email, it should be able to handle up to 1000 concurrent emails entering the system, which would simulate a scenario where 1000 people send email responses simultaneously. The security requirements for the system entail the standards involved in only allowing authenticated users to interact with a cloud-hosted system.

The non-functional requirements of the system are:

12. The system should be able to handle the receipt of up to 1000 emails at the same time without impacting performance.

13. The system shall require authentication in order to prevent malicious activity.


1.3 Research question

With the help of an advanced natural-language processing bot, is it possible to develop an automated, efficient system for requesting public data (such as public documents from government agencies) via email, and to collect the contents of the requested data and store it in a predictable way?

The hypothesis is that a proof of concept of the system that lives up to the proposed requirements can be developed, and that the system can run almost autonomously with limited human administration.

1.4 Contribution

This thesis seeks to deploy a fully integrated system around an NLU bot, using cloud computing technology, following the technical philosophy of module separation and microservices. The contribution of the thesis includes pushing the boundaries of automation in communicative tasks as well as delineating an adequate architecture for such an automated system.

1.5 Limitations

Gavagai is a partner company to Tendium specializing in natural-language processing and understanding. Tendium hires Gavagai to provide an NLU bot that can be used for email communication. As such, developing the actual bot is outside the scope of this thesis. For the thesis work, there are two responsibilities connected to the NLU bot: integrating the bot into the system and providing an environment to easily train the bot. Configuration of the bot will be the responsibility of Tendium. However, the author concedes that some configuration of the NLU bot will be performed within the scope of the thesis in the process of integrating it.

The process of providing adequate tasks for the system, i.e. feeding it with procurement IDs to be requested and the appropriate contact information for the government agency authoring the procurement, is outside the scope of the system's capabilities. The system is developed under the assumption that this information will be provided by a human agent.


Background

This section will cover background research and other areas of interest for the system design. Section 2.1 provides a background to the process of public procurement.

Section 2.2 gives a background to chatbots and natural-language understanding concepts. It also covers previous work on email response delivery and response generation. Furthermore, it covers how to build and integrate that functionality into existing systems.

After that, some terminology and services, such as the open-source search engine Elasticsearch, are described in section 2.3, shortly followed by a section on application programming interfaces (APIs), discussing microservices and API best practices.

The remaining sections focus on cloud computing and Amazon Web Services (AWS): storage alternatives, different cloud computing architectures and AWS-specific services.

2.1 Public procurement

The value of purchases in Sweden covered by Swedish procurement laws corresponds to approximately a sixth of Sweden's gross domestic product. In 2016, it was valued at 683 billion SEK [1].

Tenders Electronic Daily (TED) is the online version of the "Supplement to the Official Journal" of the EU, dedicated to European public procurement. TED publishes 520 thousand procurement notices a year, including 210 thousand calls for tenders, which add up to approximately €420 billion. TED provides free access to public procurement notices from contracting authorities based in the European Union and in the European Economic Area. The notices are published almost daily and are available in TED in HTML, PDF and XML format [2].

TED provides one of the largest collections of data on European procurements. Every document available in TED has a unique reference number.

2.2 Chatbots and email response systems

This section covers background information on natural-language understanding and chatbots as well as some previous work on email response generation.

2.2.1 Natural-language understanding

For a machine to be able to make a valid response to an email, the content of the inbound email needs to be analysed. Natural-language understanding (NLU) is an area within artificial intelligence that deals with machine reading comprehension and is a subtopic of the natural-language processing field. The NLU processing takes place after initial text categorizations have been made by other natural-language processing algorithms, and the goal of the NLU algorithms is to map text to meaning, that is, to understand the text [3].

2.2.2 Chatbot applications

A chatbot is an application that can simulate a conversation with a user in natural language. The user sends a message to the chatbot, and the chatbot analyses it and creates a response. When creating and training chatbot applications, an important aspect is understanding the terminology used to categorize user input. In this case, user input is the messages that are sent to the chatbot in natural language by users of the system. The purpose of a user's input is called an intent. An intent is defined for each type of user request the application needs to handle. Important features of client input are called entities. Entities are terms or objects that provide a specific context for an intent. Intents can contain multiple different entities. Entities are defined by listing different synonyms or other possible values for each entity type [4].

To deploy a usable chatbot, it needs to be trained. That is, training data needs to be provided to the chatbot by an administrator in the form of labeled examples of common intents and entities. When using a deployed chatbot, classifications are generated from the user input in the form of classified intents and entities.
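As a rough illustration, labeled training data of this kind could be laid out as in the following Python sketch; the intent names, entity types and example phrases are hypothetical and not the data used in the thesis.

# Hypothetical labeled training data for a chatbot NLU, illustrating the
# intent/entity terminology above. All names and phrases are made up.
training_data = {
    "intents": {
        "onVacation": [
            "I'm currently on vacation, I'll be back on DATE",
            "Out of office until DATE",
        ],
        "redirectRequest": [
            "Please contact ORGANIZATION at EMAIL instead",
        ],
    },
    "entities": {
        "DATE": ["5 September", "next Monday"],
        "ORGANIZATION": ["the procurement unit"],
        "EMAIL": ["registrar@agency.example.se"],
    },
}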


Figure 2.1: Shows an example intent containing a date entity.

Figure 2.1 shows an example intent of the user being on vacation, containing the entity date, which describes the date the user will come back. Figure 2.2 shows an intent containing two entities: which organization to contact as well as the connected email address.

Figure 2.2: Shows an example intent containing two entities.

2.2.3 Google Smart Reply

One widely used automatic reply system is Smart Reply which is featured in Gmail today [5].

A main challenge when developing the email response system was reaching adequate performance in response quality, utility, scalability and privacy.

Response quality concerns getting high-quality individual responses in language and content. Utility concerns giving enough responses so that at least one generated response is selected by the user. Scalability concerns remaining within the latency requirements of an email system while processing millions of messages daily. Privacy requires that the system never inspects the data. The system architecture can be described by the following image:

Figure 2.3: Shows Smart Reply’s system design as seen in [5].

Figure 2.3 shows the system design for the Smart Reply system. The email is first preprocessed and analysed to see whether it should trigger a response. If a response should be triggered, the NLU module selects responses and suggests them to the user.

The authors discuss the triggering process that selects which emails are filtered out of the Smart Reply process. This filtering only lets through roughly 11% of messages, which significantly reduces the number of useless suggestions shown to users [5].

2.3 Terminology and services

This section covers technologies and terminology used in the thesis that are not AWS-specific.


2.3.1 Database indices

A database index is an index which improves the speed of queries and operations on the indexed data. However, each database index requires resources, both storage space and computing, for every update to the database in order to maintain the index. Indices should be used when it is known that a large number of queries will be made on a certain field, since this improves the retrieval time. The payoff, however, needs to be evaluated: faster queries versus requiring more resources on each database update [6].

2.3.2 Comparing SQL and NoSQL

Structured Query Language (SQL) is a domain-specific language used for managing relational databases. SQL tables have predefined schemas which means that the tables require a fixed table structure. Therefore SQL excels when working with structured data. Nested data requires multiple tables when working with SQL in order to allow querying on each field.

An alternative to SQL is NoSQL, which allows the data to be stored in a tree structure or in key-value based storage. NoSQL has dynamic schemas for unstructured data, which makes it possible to store nested data within a NoSQL document much more easily than within an SQL table. Due to indexing differences, SQL performs better on complex queries since it can utilize the indexing of the predefined table structure [7].

2.3.3 Elasticsearch

Elasticsearch is a scalable enterprise search engine developed in Java and is the most widely used enterprise search engine today [8]. Working with Elasticsearch provides applications with an analytics engine for use cases such as log analytics and real-time application monitoring. Elasticsearch can be combined with database technologies to improve a system's search efficiency and give real-time feedback on search results at great capacity [9].

2.3.4 HTTP and REST-based APIs

Representational state transfer (REST) is an architectural API design style that defines constraints used when creating web service APIs. REST APIs are based on URIs and the HTTP protocol. A uniform resource identifier (URI) is a string of characters that identifies a particular resource, where the most common form of a URI is a URL, also known as a web address [10]. The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol and is the foundation of data communication on the world wide web [11].

2.3.5 Multipurpose Internet Mail Extensions

The Multipurpose Internet Mail Extensions (MIME) standard is the standard format for data encoding in email communication over the Simple Mail Transfer Protocol (SMTP). MIME headers are used to give version and meta information about the data, and a message is built up of multi-part message bodies [12].
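Since inbound email in the system is later handled in MIME form, a small sketch of what a MIME message looks like programmatically may help; the example below uses Python's standard email library, and the addresses and file content are placeholders.

from email.message import EmailMessage

# Build a message; adding an attachment turns it into a multipart MIME
# structure with headers describing each part.
msg = EmailMessage()
msg["From"] = "anna.andersson@example.com"
msg["To"] = "registrar@agency.example.se"
msg["Subject"] = "Request for tender documents"
msg.set_content("Hello, could you please send the bids for procurement 123?")
msg.add_attachment(b"%PDF-1.4 ...", maintype="application",
                   subtype="pdf", filename="bid.pdf")

raw_mime = msg.as_bytes()  # the encoded form sent over SMTP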

2.3.6 JSON

JavaScript Object Notation (JSON) is a common language-independent data format. JSON was originally created in JavaScript and uses human-readable text when transmitting data. Objects are marked with curly brackets and contain values of type string, number, Boolean, array, or nested objects [13].

2.4 APIs

An application programming interface (API) maps functionality to communication protocol entry points to define and provide system usage. When designing a system of microservices, APIs define the endpoints of the system and how data is communicated between the microservices. The most common usage of APIs is to handle interaction between the different microservices within the system itself.

2.4.1 Microservice architecture

The microservice architecture is a design philosophy in software development where each microservice usually exposes a well-defined web API. Each service handles its own specific task, making the services independent of each other. This in turn means that each service can be written in a different programming language and use different storage methods, which gives great opportunities for scalability [14].

2.4.2 API best practices

In the article "API Design Matters", the author Michi Henning discusses important aspects to focus on when working with APIs [15]:


• "An API must provide sufficient functionality for the caller to achieve

its task." Ensure that all relevant functionality exists, nothing is missed

that is relevant.

• "An API should be minimal, without imposing undue inconvenience on

the caller." Smaller is better, do not overwhelm the user and understand

the context it is going to be used in. Not choosing design decisions and leaving things up to be configurable is against the minimalistic concept. The API should be clear about what it should and should not do. • "APIs should be designed from the perspective of the caller." APIs

should be thought of as a user interface as much as a GUI is, declarative function and variable names.

• "APIs should be documented before they are implemented." This pre-vents missing vital use cases during implementation and improves doc-umentation quality. The docdoc-umentation designer can approach the prob-lem directly through system requirements to minimize risk of missing use cases while also helping the implementer with a clear requirement specification for the product. Henning states: “The worst person to write documentation is the implementer, and the worst time to write documen-tation is after implemendocumen-tation”.

2.5 Cloud computing

Cloud computing is the delivery of computing services over the internet, typically charging only for the services that are used. Important benefits of cloud computing compared to using local servers are that it provides better scalability options and simpler manageability at a low cost [14].

The biggest cloud computing providers are Amazon with Amazon Web Services (AWS), Google with Google Cloud and Microsoft with Microsoft Azure.

Four popular cloud computing models are:

• Infrastructure as a Service (IaaS): IaaS provides virtualized computing resources over the internet. Servers, storage and other hardware traditionally found in on-premise data centers are hosted and provided by the cloud provider. It can be seen as a virtual machine. Amazon Elastic Compute Cloud (EC2) is an example of an IaaS service [14].


• Platform as a Service (PaaS): PaaS provides a computing platform which tends to include an operating system, programming language environments and databases. Examples of this are Heroku or AWS Elastic Beanstalk [14].

• Software as a Service (SaaS): SaaS provides access to an already installed application. No maintenance or coding is required; the software is usually accessed and operated directly from a web browser. Examples of SaaS systems are Netflix or Gmail [14].

• Functions as a Service (FaaS): FaaS platforms are broken down into stateless chunks, which means that each function executes without any context of the underlying server structure. Examples of FaaS models are AWS Lambda or Microsoft Azure Functions [14].

2.5.1 Serverless computing

Serverless computing is described as a method of providing back-end services on a pay-per-use basis. Using serverless computing allows developers to write and deploy code without having to worry about the infrastructure running the code. Compared to server-based alternatives, serverless computing only charges for the computing that is actually used. To develop serverless systems, the application architecture is broken down into small, stateless, independent parts.

Serverless fits well together with the microservice architecture, since each task that the different microservices tackle only executes when necessary, which gives a layer of separation between tasks [14].

2.6 Amazon Web Services

Amazon Web Services provides on-demand cloud computing platforms available to anyone, where you only pay for what is used. The growth of cloud computing is led by the availability of low-cost computers, high-capacity networks and storage devices.

2.6.1 Virtual private cloud

A virtual private cloud (VPC) is used to put services in a virtual network in order to better monitor and manage how the services interact with each other and with the outside world in AWS.


A VPC contains public and private subnets, which are ranges of IP addresses within the VPC. Private subnets have limited access to improve security, which makes them an optimal place for services such as databases [16].

Public subnets should be used for resources that must be connected to the internet. They use the VPC's Internet Gateway to connect to the internet [16].

2.6.2 API Gateway

AWS API Gateway is a fully managed service that helps developers create, publish, maintain, monitor and secure web APIs at any scale. The API Gateway handles all tasks involved in accepting and processing large numbers of concurrent API calls. This makes the API Gateway a common entry point for interacting with back-end systems [14] and for handling communication between services.

2.7 Storage options

This section covers storage alternatives on AWS.

2.7.1 Simple Storage Service

S3 is the Simple Storage Service, Amazon's popular persistent storage framework. It provides a key-value pair storage system with an API that allows for simple FTP-style interactions:

• get(uri)

• put(uri, bytestream)

• del(uri)
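As a minimal sketch, the three interactions above roughly correspond to the following boto3 calls; the bucket and key names are placeholders.

import boto3

s3 = boto3.client("s3")

s3.put_object(Bucket="example-bucket", Key="some/key", Body=b"bytestream")
obj = s3.get_object(Bucket="example-bucket", Key="some/key")
data = obj["Body"].read()                     # the stored bytes
s3.delete_object(Bucket="example-bucket", Key="some/key")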

S3 is a valid option for storage when persistence is important and the number of requests per minute is not immense. Since it does not require running active servers, it is also a cost-effective option.

The speed of S3 is lower compared to a locally attached disk drive. S3 has a property called eventual consistency, which means that S3 guarantees that all updates and changes persist and will eventually be visible to all clients, but how quickly this happens is not guaranteed. Storing the data in multiple data centres and working on a last-update-wins basis guarantees full read and write availability for S3 [17].


Each object in S3 is connected to a bucket, which is a container where objects are stored. Objects in S3 are simple byte containers which are identified by bucket-unique URIs. Scans can be performed on a bucket to retrieve either all objects or only those objects which match a given prefix URI. S3 performs well when it comes to throughput: the number of clients using S3 concurrently has no impact on performance for reading or writing data, which makes S3 superior to ordinary disks in this aspect.

The most common use case for S3 is long-term file storage. Another example use case, described in the AWS white paper "Storage Services Overview" [18], is to use S3 to speed up access to relevant data. This is done by combining S3 with a key-value database service. The actual information is stored in S3 while the database acts as the metadata storage for the data in S3, e.g. object name, size, keywords and so on. This makes the metadata easily indexed and queried, which makes it very efficient to retrieve the files stored in S3 [18].

2.7.2 Relational Database Service

The AWS Relational Database Service (RDS) is a service-based option for SQL servers such as MySQL. RDS instances can be reached from any AWS computing service. RDS gives multiple options when choosing a service type. Amazon Aurora is Amazon's own service solution, which provides automatic scalability and more functionality on top of standard MySQL or PostgreSQL. It is also faster than standard MySQL or PostgreSQL but unsurprisingly costs more. RDS also provides standard MySQL and PostgreSQL, which are less expensive compared to Amazon Aurora. All the solutions require network overhead to configure which services can reach the database.

2.7.2.1 DynamoDB

The AWS DynamoDB service is Amazon's own fully managed NoSQL database service, which requires low overhead and is easily integrated with other AWS services [19]. It does not have the same amount of functionality as other database services, but for simple storage solutions it is one of the best options due to its ease of use. DynamoDB works out of the box with other AWS services and requires less configuration than other solutions, such as RDS.

One negative aspect of DynamoDB is the cost and configuration required for having multiple indices in one table, which is easily achievable in an RDS SQL table. Having multiple indices allows for efficient queries on different table attributes; otherwise, queries can only utilize the primary index. Multiple indices are a cost-ineffective solution in DynamoDB, but this can be circumvented by using multiple identical tables containing different primary indices to query the data. This, however, requires more maintenance for updating the tables.

Another negative aspect of DynamoDB is the performance of the scan operation. Scans always fetch the entire database instead of fetching only the results that are queried; the filter is then applied to the fetched result. This becomes very inefficient on large data sets [20]. To use DynamoDB efficiently on large data sets, the usage of the scan operation should be minimized, and only queries based on the indexed fields should be used, to prevent a negative impact on the request rate.
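The difference between querying an indexed field and scanning can be illustrated with a short boto3 sketch; the table and attribute names below are hypothetical.

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("ExampleTable")

# Query: only items matching the indexed key are read.
by_key = table.query(KeyConditionExpression=Key("case_id").eq("case-123"))

# Scan: every item in the table is read, and the filter is applied afterwards.
by_filter = table.scan(FilterExpression=Attr("status").eq("active"))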

2.7.2.2 DocumentDB

AWS DocumentDB is a NoSQL database service that has full MongoDB support. MongoDB is a general-purpose, document-based, distributed NoSQL database program. DocumentDB is a good option for developers that want to use AWS database services but still develop their software within the MongoDB framework. It is also interesting when working with NoSQL on AWS and the DynamoDB service does not provide enough functionality.

2.8 AWS services

This section contains some additional important AWS Services that were used in the system.

2.8.1 Simple Email Service

The Simple Email Service (SES) helps developers send and receive email through AWS using verified domains and email addresses.

The chance of an email reaching its destination depends on multiple factors such as the content, list quality and infrastructure between the sender and target recipient.

A high-quality email, which is an email with a high probability of reaching the inbox of the recipient, is defined in the AWS white paper "Email Sending Best Practices" [21]. The author of the paper states: "A high rate of hard bounces strongly indicates to email receivers that you do not know your recipients very well." An email bounce indicates that an attempted delivery has failed, which is reported back to the sender. Amazon SES reroutes soft bounces automatically, where soft bounces are bounces caused by temporary failures such as the inbox of the recipient being full. Hard bounces are persistent delivery failures such as "recipient email does not exist", and when these occur the sender is notified.

The bounce rate refers to the rate of hard bounces occurring from the domain, which can have a negative impact on deliverability. A good objective is to keep the bounce rate below five percent to prove to ISPs that the recipients are known, which greatly raises the chances of the email reaching its target.

A high complaint rate could have a bad influence on the success rate of the email reaching its receiver. The complaint rate is based on the number of users marking the email as spam; the ISP records this and might mark future emails from the same sender as spam. "A high complaint rate strongly indicates to email receivers that you are sending email that recipients do not want" [21].

Two best practices to follow to improve the chances of the email reaching its intended recipient are described below [21].

• Think carefully about the addresses used as "From" addresses since sender reputation is connected to those and that is what the recipients see.

• Authenticate the outgoing email, both with the Sender Policy Framework (SPF) and with DomainKeys Identified Mail (DKIM), which gives credibility to the sending domain and prevents spoofing of "From" addresses [21].

2.8.2 Simple Queue Service

The Simple Queue Service (SQS) is a service supporting a scalable number of queues and scalable capacity. It works by adding messages to a queue, where each message contains a MessageBody and MessageAttributes, and the messages can then be fetched from the service. SQS queues are, in the same way as S3 objects, referenced by a URI and support interaction through an HTTP- or REST-based interface.

The SQS service supports the following methods:

• createQueue(URI): Creates a new queue.

• send(URI): Sends a message to the queue identified by the URI parameter. Returns the ID of the message in the queue.

• receive(URI, number-of-messages, timeout): Receives up to the given number of messages from the queue identified by the URI parameter. The timeout parameter sets a duration during which the requested messages cannot be requested by another client. This ensures that no messages are lost if a client crashes and prevents messages from being received more than once if two clients make requests at a similar time.

• delete(URI, msg-id): Deletes a message, based on its ID, from the queue. Typically, it is called after a client is done processing the message.

• addGrant(URI, user): Allows another user to interact, i.e. send and receive, with messages in a queue.

Each call to SQS either returns a result (for example, receive returns the message) or an acknowledgment of whether the action was successful or not (for example send and delete). SQS never loses active messages [17], but it deletes messages after 15 days in the queue.

There are two types of SQS queues: standard queues and First-In-First-Out (FIFO) queues. Standard queues allow higher throughput but do not guarantee processing the elements in the order in which they entered the system; they use so-called "best-effort ordering". All elements in the queue are guaranteed to be processed, using "at-least-once delivery", but this also means that elements can be processed more than once [22].

FIFO queues still have a high throughput of messages but do not provide the unlimited capacity that standard queues support. They guarantee that each element is processed exactly once, as well as that the elements are processed in order of queue occurrence [22].
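A minimal boto3 sketch of the queue operations described above could look as follows; the queue name and message layout are assumptions made for the example.

import boto3

sqs = boto3.client("sqs")

queue_url = sqs.create_queue(QueueName="example-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody='{"case_id": "case-123"}')

# The visibility timeout hides received messages from other consumers
# until they are deleted or the timeout expires.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                           VisibilityTimeout=30)
for msg in resp.get("Messages", []):
    print(msg["Body"])  # process the message here
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])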

2.8.3 Simple Notification Service

The Simple Notification Service (SNS) gives developers a way of notifying users or developers of the system. The service coordinates delivery to subscribed endpoints, such as email, SMS or mobile push notifications [23]. SNS can also send messages to other AWS services such as SQS or Lambda and trigger webhooks.

2.8.4 Elastic load balancing

The AWS service Elastic Load Balancing supports different ways to route HTTP/HTTPS traffic and spread out users to avoid overburdening resources. This is done by routing inbound connections to different available AWS resources or by routing to newly allocated resources that are created on demand [24].


2.8.5 CloudWatch events

CloudWatch events can be configured to be invoked at specific intervals, which can be defined as, for example, every day, week or month. Configuration of the service is similar to the configuration of cron, the periodic job scheduler available in Unix systems. These CloudWatch events can be used to trigger other AWS services or to run specific functions.
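As an illustration, a scheduled rule with a cron-style expression could be set up roughly as below; the rule name and target function ARN are placeholders.

import boto3

events = boto3.client("events")

events.put_rule(
    Name="daily-mail-check",
    ScheduleExpression="cron(0 8 * * ? *)",  # every day at 08:00 UTC
)
events.put_targets(
    Rule="daily-mail-check",
    Targets=[{"Id": "target-1",
              "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:example"}],
)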

2.9 AWS computing

This section compares computing options on AWS, covering both EC2 instances and Lambda functions.

2.9.1 Elastic Compute Cloud

EC2 stands for Elastic Compute Cloud and works as infrastructure as a service. EC2 instances are on-demand computing instances in the form of virtual machines, with varying cost based on their hardware specifications. Using EC2 instances, you are charged by the second for the time the service is running. Common use cases for EC2 instances are, for example, hosting websites or general server applications.

Two common storage options for EC2 are the EC2 Instance Store and the Elastic Block Store. The Instance Store is a hard drive physically attached to the host computer, where the data only persists during the life of the associated instance [25]. The Elastic Block Store (EBS) provides block-level storage volumes that can be attached to running instances and can be treated as a local file system by the EC2 instance. This works as a persistent disk volume with a fixed size, in contrast to the Instance Store, which does not provide persistent storage [26].

2.9.2 Lambda

AWS Lambda is Amazon's option for a function-as-a-service framework. Lambda is scalable in the sense that hardware is provided when it is required. For most users AWS Lambda is inexpensive, since each AWS user gets one million free invocations every month through the AWS free-tier plan [14].

The best supported languages in Lambda are Python and Node.js, but multiple languages such as Go are getting increasing support, as well as more common object-oriented languages such as Java and C# [27].

One example use case is serverless microservices, where the written code is simply uploaded and Lambda takes care of scaling and running the code. Lambda is highly integrated with the API Gateway, which allows synchronous calls to be made to Lambda functions and enables fully serverless applications [28].

A Lambda function only runs when called by another service, which means that if it is not required, no resources are used. However, this also means that it requires startup time after being inactive for a longer period of time, more than two hours by default. This is not usually a problem, but it can be problematic if the Lambda function is put in a virtual private cloud. Putting a Lambda function in a VPC increases the startup time from one second to over ten seconds. For an interactive front-end system this is crucial information. The rule of thumb is to put the Lambda function in the VPC only if it is required, and it is only required when the system needs to access resources within the VPC, such as database resources [29].

Lambda functions are invoked from other AWS services. Some services that work easily with Lambda are:

• API Gateway: Simply put, connects one or many endpoints in the API Gateway to the Lambda function, using the input data to the API as input to the Lambda function.

• CloudWatch events: Using CloudWatch events to trigger Lambda functions makes it easy to make recurring function calls on a scheduled basis.

• S3 events: Invokes Lambda through S3 bucket events such as creating, updating and removing a file.

• SQS: Lambda functions are capable of listening to SQS events, which invokes the Lambda function when for example new queue elements are added to the queue.

Standard queues support the functionality of using the queue as an event source for a Lambda function; however, this is not supported by FIFO queues [30].

These are only a few of the available options to trigger a Lambda function [31][32].


System design

This chapter focuses on the final system design of the email generation system. The chapter begins with an overview of the main system in section 3.1, followed by the NLU bot module in section 3.2, and then goes in depth into each module.

3.1 System overview

This section describes an overview of the entire email response system as well as the NLU bot training and integration. A simplified overview of the system can be seen below:

Figure 3.1: A simplified system overview.

The system has two entry points: either when a new procurement case is added through an API call, seen in figure 3.1 as New Procurements, or when an email reply enters the system, seen as Inbound Emails. All emails are entered into a queue to be analysed in the front-end system, seen as Mail Analysis. If the email is approved by the user, it is queued up for sending and is sent to the correct recipient, noted as Send Mail in figure 3.1.

The simplified overview does not contain information regarding database communication, template information, special email logic or where calculations are made; it is used to show the flow of the system design.

Here is an overview of the system in AWS services:

Figure 3.2: A system overview in AWS services.

This AWS overview shows more details for each module through the services they communicate with and how they are connected. This section does not fully explain how each service is used, which is covered in each module's separate section. As seen in figure 3.1, the two entry points to the system are either inbound email entering the system or a new procurement being added. The different modules communicate through the Mail Analysis Queue, seen in figure 3.2, which has the role of fetching the data for the mail analysis system. The system updates database information and adds new elements to the Mail Send Queue, which the Send Mail module utilizes to send the created emails to the correct receiver. The system was designed to support iterative improvements to the natural-language understanding module, with a focus on improving and changing the NLU during the usage of the system.

3.2 NLU bot

This section covers the NLU bot module of the system, how the training process of the bot was approached, as well as information on the MailDB database.

3.2.1 Bot application

The text analytics company Gavagai, a partner of Tendium, is developing an application with the intent to create chatbots and NLUs for their clients, as presented in section 2.2.2.

The input data to the NLU is the text to be analysed and the output is a classification of the given input. The output classification contains three arrays:

• tokens, which are the words given as input.

• intents, which are the contextual intentions of each word, described in more detail in section 2.2.2.

• entities, which are the marked tokens containing object information, described in more detail in section 2.2.2.
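A classification with these three arrays could, as a hypothetical example, have the following shape; the exact field names of the bot API are not reproduced here.

classification = {
    "tokens":   ["That", "sounds", "good"],
    "intents":  ["none", "soundsGood", "soundsGood"],  # one intent per token
    "entities": [None, None, None],                    # no entities marked
}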

3.2.2 Training the NLU bot

The bot is trained and tuned through reinforcement learning. This means that the capabilities of the bot are heavily dependent on adequate training data as well as prolific training data labeling.

Intent phrases might contain entities, which provide additional information within the phrase. For example, consider the intent seen in figure 2.1 in section 2.2.2:

"I'm currently on vacation, I'll be back on 5 September"

For the entity to be marked correctly, the date entity "5 September" is replaced with the entity name in capital letters in the intent phrase:

"I’m currently on vacation, I’ll be back on DATE"

By doing this, the NLU bot is given information on how to recognize entities within intents. Otherwise, "5 September" is treated as just a string, but in this way any potential date will easily be matched to the intent phrase.

To train the NLU bot, REST-based communication with the bot API is required to fill a module with data and train it. The intents and entities have been trained based on emails previously received when requesting bids over email. The amount of email data available is a couple of hundred emails. This data is also padded with additional intents and entities to improve the accuracy of the NLU by providing more data.
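The placeholder substitution described above can be sketched as a small helper; the entity values listed are examples only.

# Replace known entity values with their entity type name so that the
# stored intent phrase generalizes to any value of that entity.
ENTITIES = {"DATE": ["5 September", "next Monday"]}

def normalize_phrase(phrase):
    for entity_type, values in ENTITIES.items():
        for value in values:
            phrase = phrase.replace(value, entity_type)
    return phrase

normalize_phrase("I'm currently on vacation, I'll be back on 5 September")
# -> "I'm currently on vacation, I'll be back on DATE"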

3.2.3 MailDB

For the thesis work, the responsibilities regarding the bot application came down to providing an environment to annotate training data and integrating the model into the system. A MongoDB database was created and later ported to AWS DocumentDB to store received and sent mail as well as training data for the NLU bot. MailDB uses Elasticsearch for more efficient indexing and fetching of the data. To improve searchability and help give an overview of the data, the Elasticsearch tool Kibana is used to create different views of the data. This database is referred to as MailDB. Figure 3.3 displays the AWS service setup discussed.


Figure 3.3: Image showing the AWS structure for the MailDB module.

The API to interact with the server, the back-end of the MailDB system, was put on a load-balanced EC2 instance to ensure good availability, as seen in figure 3.3.

3.2.3.1 Mail storage

The motivation behind the NoSQL solution for the mail storage is based on the facts that emails tend to contain nested data and a large number of optional fields, and that most available Multipurpose Internet Mail Extensions (MIME) parsers, MIME being the format most emails are stored in, parse into JavaScript Object Notation (JSON). There are two key motivating factors for choosing NoSQL over SQL for the MailDB database. One is that the key-value document design of NoSQL works natively with JSON. The other is that possibly sparse data is better suited to a NoSQL solution, since an SQL table requires a schema entry for every field, which goes unused for most entries when working with sparse data. All mails have meta information appended before they are stored, such as a case-ID to fetch the mail in the future, as well as date information and additional procurement information if available.

3.2.3.2 Intent and entity storage

To make the task of adding new intents and entities more efficient and easy, the database MailDB was used as storage.

To provide an environment to easily annotate the training data, a simple front-end was created that makes API calls to the MailDB back-end service. The goal of the front-end was to simplify adding new information and to provide an overview of the current state of the intents and entities.

The main functionality required for the front-end system was to be able to create new intent types, new intent phrases, new entity types as well as new entities. Figure 3.4 shows the design for the front-end system.

Figure 3.4: The start page of the MailDB front-end. It contains example input of a new intent phrase.

The front-end loads all existing intents and entities and provides simple forms to add new information. When a form is filled in, the front-end creates a POST request with the information to MailDB. It also contains some basic logic for replacing existing entities with their entity type name when adding new intent phrases, as seen in figure 3.5, which has replaced the example input "Audi" from figure 3.4 with the entity type "Car".


Figure 3.5: Image showing the confirmation page of the front-end system, having replaced the existing entity Audi with its type Car from figure 3.4.


3.3 Entry point: New procurements

This section gives an overview and description of entering a new procurement case into the system. It contains an API overview of how to use the module as well as definitions of which data is stored and how the information is connected to the unique addresses.

Figure 3.6: Figure showing the starting of a new procurement case module in-depth.

Figure 3.6 shows an in-depth look into the logic of the module as well as the endpoints of the module with connected AWS services. The entry point to the module is the API Gateway endpoint, noted in figure 3.6 as StartMailCaseAPI, which executes the Lambda function containing all functionality. The Lambda function creates a new case-ID from the input data and checks if the ID is already active in the system by checking the MySQL database entries, referenced as the RDS Active Mails table in the figure. If the ID is available, a new email address is generated for the case, based on index counters in a DynamoDB database. The initial mail template is then loaded and filled with the input data, followed by the created mail being pushed to the Mail Analysis Queue.

3.3.1 API usage

The API entry point to start new procurement cases requires authentication and is set up within AWS API Gateway, with a POST request endpoint for REST-based connections.

Table 3.1: API definition for entering new procurements.

Required fields:
• procurement-title: Title of the procurement
• ref: Reference number
• contractor-mail: Contact email to request from

Optional fields:
• ID: Enforced unique ID
• contractor-name: Name of the contact person, for greeting purposes
• contractor: Company name
• date: Date of the procurement

The fields shown in table 3.1 are used to fill the initial mail template as well as to store metadata together with the case. The API has three required fields to start a new procurement case. The procurement-title is the name of the procurement and contractor-mail is the email address to request the bids from. The ref field is the reference number of the procurement, which defines a unique bid request in the system, but it is not globally unique and therefore cannot be used alone as a unique ID. The optional fields contractor, date and contractor-name allow adding metadata to a case to improve the stored information, but they are not required. The optional ID field allows setting an ID for a case, which is further discussed in the next section.
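A hypothetical call to the StartMailCaseAPI endpoint could look as follows; the URL, API key and field values are placeholders and not the actual deployment details.

import json
import urllib.request

payload = {
    "procurement-title": "Road maintenance 2020",
    "ref": "UH-2020-0042",
    "contractor-mail": "registrar@agency.example.se",
    "contractor-name": "Anna Andersson",   # optional
    "date": "2020-03-01",                  # optional
}

req = urllib.request.Request(
    "https://example.execute-api.eu-west-1.amazonaws.com/prod/start-mail-case",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "x-api-key": "<api-key>"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())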


3.3.2 Unique ID and email address

Each procurement request is assigned a unique ID and a unique sender address. The email address is used to keep track of inbound emails and match each email to the correct case-ID. The unique IDs ensure that a case is not started more than once.

If a TED-ID, which is the unique ID for the procurement if the procurement is available in TED, exists, it can be used as the enforced ID in the system, which means that the system will use this ID when referencing the case. Otherwise, a unique ID is generated from the available input data.

To keep track of the active mail instances and unique IDs, a Relational Database Service (RDS) SQL table was used, referenced as Active Mails in figure 3.6. This database table is indexed on both the unique ID and the unique email address to support efficient queries on both database fields: to fetch the information from the ID, but also from the unique email address when new inbound email enters the system.

The mail addresses are generated locally and use name combinations generated from the most common Swedish first and last names.

To achieve this, the most common names were first retrieved, and an index counter was created in DynamoDB for both first and last names, used to iterate through the names and fetch combinations, which leads to about 45 000 unique combinations. The counters reset when reaching the maximum index in order to reuse combinations. If a case is no longer active, its unique email address is made available for reuse in a new procurement case.
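The counter-based address generation can be sketched as follows, under the assumption that the last-name counter advances first and the first-name counter steps when it wraps around; the name lists and domain are placeholders.

FIRST_NAMES = ["anna", "lars", "maria"]               # common first names
LAST_NAMES = ["andersson", "johansson", "karlsson"]   # common last names

def next_address(first_idx, last_idx, domain="example.com"):
    address = f"{FIRST_NAMES[first_idx]}.{LAST_NAMES[last_idx]}@{domain}"
    # Step the counters; in the real system they are stored in DynamoDB.
    last_idx = (last_idx + 1) % len(LAST_NAMES)
    if last_idx == 0:
        first_idx = (first_idx + 1) % len(FIRST_NAMES)
    return address, first_idx, last_idx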

3.4 Send Mail module

This section covers the Send Mail module, which handles sending created mails to the correct recipients. This module is at the end of the system chain and gets its input data from the mail analysis system described in section 3.6.


Figure 3.7: Figure showing the logic and endpoints of the Send Mail module.

This module has a Lambda function connected to a Simple Queue Service (SQS) trigger, which triggers every time new elements are added to the queue. Figure 3.7 shows the Lambda function receiving the queue element as input, which is the email to be sent, and sending the outgoing mail through the AWS Simple Email Service (SES). For the mail to be sent, the from address of the mail needs to be part of a verified domain or be a verified email address on SES. For the thesis, a specific domain was configured and used for all inbound and outgoing mail through AWS SES.
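A minimal sketch of such an SQS-triggered handler sending through SES is given below; the queue message layout and address fields are assumptions made for the example.

import json
import boto3

ses = boto3.client("ses")

def lambda_handler(event, context):
    for record in event["Records"]:       # one record per queue element
        mail = json.loads(record["body"])
        ses.send_email(
            Source=mail["from"],          # must belong to a verified SES domain
            Destination={"ToAddresses": [mail["to"]]},
            Message={
                "Subject": {"Data": mail["subject"]},
                "Body": {"Text": {"Data": mail["body"]}},
            },
        )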

3.5 Inbound email handling

This section covers the logic of analysing inbound email to the system and storing the connected attachments, referenced as the Inbound Email module.


Figure 3.8: Figure showing an in-depth version of the structure for the inbound mail module.

Figure 3.8 shows an overview of the logic and endpoints of the module. New mails enter the module through inbound email arriving at the Simple Email Service. SES stores the email in MIME format in the S3 bucket mail-storage, as seen in figure 3.8. A Simple Notification Service topic is attached to the S3 bucket to trigger a Lambda function for every new inbound email entering the system. The Lambda function receives the meta information from the SNS notification and then loads the entire email file from the S3 bucket, seen as the entry point for the Lambda function in figure 3.8. It stores the email in the MailDB database, explained in section 3.2.3, and checks for the valid case-ID in the RDS database for this email receiver. Thereafter, the email content is sent to the next module, called the NLU bot connector, which handles interactions with the NLU bot.

To guarantee that all email data is received in the Lambda function, the data needs to be stored in S3 and then loaded in the Lambda function. There exists functionality to trigger Lambda functions with SNS notifications directly from SES. The problem with this solution, however, lies in the maximum size allowed for SNS messages, 256 KB, which supports maximum email sizes of 150 KB. Storing in S3, however, supports handling emails with sizes up to 30 MB, which is an important difference when sending attachments is of interest [33].

3.5.1 Attachments storage

When an inbound email contains attachments, the attachments are stored under the case-ID. S3 is the most suitable place for long-term storage on AWS, since it does not require running active servers, provides high availability and supports persistent storage in an inexpensive way. The files are stored separately and are easily indexed, which makes them easily available for other services or people wanting to use the files in the future.

The email attachments are stored in an S3 bucket. The S3 bucket structure separates different cases and has one directory for each mail containing attachments. This is because one bid contains many files, which makes it common to send one bid per email when requesting multiple bids. The path syntax is as follows:

<s3-bucket-path>/<case-ID>/<mail-ID>/<attachment-name>
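A minimal sketch of how attachments could be written to the bucket following this path syntax is shown below, assuming the parsed mail object exposes attachments with filename and content fields as in mailparser; the bucket name and identifiers are illustrative.

import { S3 } from "aws-sdk";
import { ParsedMail } from "mailparser";

const s3 = new S3();
const ATTACHMENT_BUCKET = "attachment-storage"; // illustrative bucket name

// Store every attachment of one mail under <case-ID>/<mail-ID>/<attachment-name>.
async function storeAttachments(mail: ParsedMail, caseId: string, mailId: string): Promise<void> {
  for (const attachment of mail.attachments) {
    await s3
      .putObject({
        Bucket: ATTACHMENT_BUCKET,
        Key: `${caseId}/${mailId}/${attachment.filename ?? "unnamed"}`,
        Body: attachment.content,
      })
      .promise();
  }
}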

3.6 Mail analysis

This section covers the NLU bot connector module, which is the integration of the NLU bot, as well as the mail analysis system. It also presents how templates and template mapping decide responses based on classifications from the NLU and explains how the NLU output is managed.

3.6.1 Templates

Entire mail responses have been written as templates for the system, each with its own ID and logic strings. The IDs are used to map the meaning of inbound emails to templates and are further discussed in section 3.6.2. The logic strings connect specific mail logic to a template, that is, logical behaviour triggered when reading an email that lies outside of writing a new mail body. This is further discussed in section 3.7.


3.6.1.1 Categorizing NLU bot output

The bot application is integrated into the email response system to analyse the content of inbound email, using a trained NLU to interpret the content. A method was written to normalize the bot output so that similar emails containing the same intent types are grouped together. This was done to decrease the number of different cases occurring in the reply selection process, matching emails with the same purpose together.

To normalize the bot output, the classified intents were first filtered on uninteresting intents. Examples of uninteresting intents are greetings and signatures, since they occur in most emails but have no impact on the meaning of the mail. Thereafter, duplicates were removed from the filtered intents, since one intent occurs for each word of the intent phrase. This can be seen in figure 3.9, which displays how the classification matches tokens and intents together. In this example the intent "soundsGood" occurs on both the word sounds and the word good. The meaning of a sentence is not related to the number of word occurrences; both "That sounds good" and "Sounds good" convey the same meaning, which is why duplicates are removed. The remaining intents are then sorted to cover all scenarios where these combinations occur. This normalizes the intent output, which will further be referenced as the intent string of an NLU bot classification.

Figure 3.9: An example of a bot classification turned into an intent string.
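A minimal sketch of the normalization described above is shown below, assuming the NLU output can be flattened to a list of per-token intent labels; the filter list, label names and the "+" separator are illustrative assumptions.

// Intents that occur in most mails but carry no meaning for reply selection.
const UNINTERESTING_INTENTS = new Set(["greeting", "signature"]); // illustrative

// Turn a per-token intent classification into a normalized intent string:
// filter uninteresting intents, remove duplicates and sort the remainder.
function toIntentString(tokenIntents: string[]): string {
  const filtered = tokenIntents.filter((i) => !UNINTERESTING_INTENTS.has(i));
  const unique = Array.from(new Set(filtered));
  return unique.sort().join("+"); // separator is an assumption
}

// Example: ["greeting", "soundsGood", "soundsGood"] -> "soundsGood"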

3.6.2 Template mapper

This section covers how a response is chosen based on the classification generated by the NLU.

The template mapping was made very simple:

If an email with intent string X is responded to with template Y, then every future email with intent string X should be responded to with the same template Y.


Figure 3.10: Illustration of the matching between intent strings and templates.

Figure 3.10 displays an example of how intent strings are matched to templates, where multiple different intent strings can match the same template.
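A minimal sketch of the mapping rule, kept here as a DynamoDB-style key-value store from intent string to template ID, is shown below; the table and attribute names are assumptions.

import { DynamoDB } from "aws-sdk";

const db = new DynamoDB.DocumentClient();
const MAPPING_TABLE = "template-mapper"; // illustrative table name

// Look up the template used the last time this intent string occurred, if any.
async function findTemplateId(intentString: string): Promise<string | undefined> {
  const result = await db
    .get({ TableName: MAPPING_TABLE, Key: { intentString } })
    .promise();
  return result.Item?.templateId;
}

// Record that intent string X was answered with template Y, so every future
// email with intent string X reuses template Y.
async function saveMapping(intentString: string, templateId: string): Promise<void> {
  await db
    .put({ TableName: MAPPING_TABLE, Item: { intentString, templateId } })
    .promise();
}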

3.6.3 NLU bot connector

This section covers the NLU bot connector module, which handles preprocessing of the email content and integrates the NLU bot into the system.


Figure 3.11: An in-depth view of the structure for the integration of the NLU bot.

The email content is preprocessed and sent to the NLU bot, as seen in figure 3.11, and runs within the same Lambda function as the previous Inbound Email module. When the NLU output is received, the system checks whether a mapping exists for the intent string, which is the normalized classification of the purpose of the email presented in section 3.6.1.1. If a mapping exists between the intent string for the email and a template, the matched template is used and added to the Mail Analysis Queue, referenced as scenario 2 in figure 3.11. If no match exists for this intent string, the email metadata is added to the queue to support choosing or creating a new template, referenced as scenario 3.
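A minimal sketch of this branching is shown below, assuming an SQS queue URL for the Mail Analysis Queue in an environment variable and reusing the mapper lookup from the previous sketch; names and message shapes are illustrative.

import { SQS } from "aws-sdk";

const sqs = new SQS();
const MAIL_ANALYSIS_QUEUE_URL = process.env.MAIL_ANALYSIS_QUEUE_URL!; // assumed env var

// findTemplateId is the mapper lookup from the previous sketch, passed in here
// so the routing logic stays independent of the storage choice.
async function routeClassifiedMail(
  intentString: string,
  mailMeta: object,
  findTemplateId: (intent: string) => Promise<string | undefined>
): Promise<void> {
  const templateId = await findTemplateId(intentString);
  // Scenario 2: mapping found, enqueue the matched template for approval.
  // Scenario 3: no mapping, enqueue only the metadata so a template can be chosen.
  const message = templateId
    ? { scenario: 2, intentString, templateId, mailMeta }
    : { scenario: 3, intentString, mailMeta };

  await sqs
    .sendMessage({ QueueUrl: MAIL_ANALYSIS_QUEUE_URL, MessageBody: JSON.stringify(message) })
    .promise();
}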

3.6.3.1 Preprocessing of email content

When preprocessing the data for transfer to the NLU bot, the email body requires filtering so that it only contains relevant information. This is done by parsing the email, extracting the body and then filtering out previous email content. This content consists of the previous emails attached to each email to indicate reply chains for the email client; it should be filtered out to prevent previous emails in an email chain from impacting the classification of the new email. Only the new text in each mail should be analysed.

Listing 3.1 shows an example of an inbound email that is a response to an outgoing email. The text field contains both the new email content and the email content it is responding to, seen in the field "text" in listing 3.1.

Listing 3.1: Parsed inbound email example.

{
  "attachments": [],
  "headers": {},
  "headerLines": [(...)],
  "text": "Hi!\nI'm on vacation. (...) Best Regards Petter
           Den ons 2 okt. 2019 kl 13:34 skrev Lennart Andersson
           <lennart.andersson@(...)>: Hi!\n> Would you kindly (...)",
  (...),
  "inReplyTo": "<(...)@eu-west-1.amazonses.com>"
}

The filtered output after the data is processed contains only the new email body information and can be seen in listing 3.2.

Listing 3.2: Filtered body content.

Hi!
I'm on vacation. (...)

A regular expression solution was used to match the most common ways for email clients to add previous email content to a mail body. However, since email clients use different conventions for separating new and previous content within an email, it does not cover every possible case. The approach currently adds support for every known form as it is encountered, but a better solution would be to use an existing open-source framework, such as the solution Google uses in Gmail, to outsource the handling of the different formats. The solution used for the thesis work relies on the regular expression approach, with the possibility of iterative improvements.
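A minimal sketch of this regex-based filtering is shown below; the two patterns cover only the Gmail-style quote headers illustrated in listing 3.1 and are illustrative, not the full set used in the thesis.

// Common markers that email clients insert before quoted previous content.
// Only two illustrative patterns are shown; real clients use many more.
const QUOTE_MARKERS: RegExp[] = [
  /^Den .+ skrev .+:$/m, // Swedish Gmail-style "Den <date> skrev <name>:"
  /^On .+ wrote:$/m,     // English "On <date>, <name> wrote:"
];

// Keep only the text that appears before the first quote marker.
function stripPreviousContent(body: string): string {
  let cutoff = body.length;
  for (const marker of QUOTE_MARKERS) {
    const match = marker.exec(body);
    if (match && match.index < cutoff) {
      cutoff = match.index;
    }
  }
  return body.slice(0, cutoff).trim();
}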

3.6.4 Mail analysis system

This section covers the mail analysis website that the administrator of the system interacts with, which contains front-end and back-end parts. Figure 3.12 shows the AWS structure of the mail analysis system.

Figure 3.12: The AWS structure of the mail analysis system.

An EC2 instance hosts both the front-end and the back-end of the system, which takes input data from the Mail Analysis Queue seen in figure 3.12 and sends output data, in the form of mails to be sent, to the Send Mail Queue. Updates to active procurement cases are made in the RDS MySQL database, while new outgoing mails are added to the MailDB database. The SNS services are further discussed in section 3.7.
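A minimal sketch of how the back-end could poll the Mail Analysis Queue and, after approval, forward a mail to the Send Mail Queue is shown below; the environment variables and message shapes are assumptions.

import { SQS } from "aws-sdk";

const sqs = new SQS();
const ANALYSIS_QUEUE_URL = process.env.MAIL_ANALYSIS_QUEUE_URL!; // assumed
const SEND_QUEUE_URL = process.env.SEND_MAIL_QUEUE_URL!;         // assumed

// Fetch the next mail waiting for administrator review, if any.
async function fetchNextForReview(): Promise<{ body: any; receiptHandle: string } | null> {
  const res = await sqs
    .receiveMessage({ QueueUrl: ANALYSIS_QUEUE_URL, MaxNumberOfMessages: 1 })
    .promise();
  const msg = res.Messages?.[0];
  if (!msg) return null;
  return { body: JSON.parse(msg.Body!), receiptHandle: msg.ReceiptHandle! };
}

// When the administrator approves a mail, move it to the Send Mail Queue
// and delete it from the analysis queue.
async function approveMail(mail: object, receiptHandle: string): Promise<void> {
  await sqs
    .sendMessage({ QueueUrl: SEND_QUEUE_URL, MessageBody: JSON.stringify(mail) })
    .promise();
  await sqs
    .deleteMessage({ QueueUrl: ANALYSIS_QUEUE_URL, ReceiptHandle: receiptHandle })
    .promise();
}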


3.6.4.1 Requiring human approval of outgoing mail

Content filters are widely used by ISPs to prevent malicious content. The mails sent in the thesis work are manually written templates, automatically selected based on the output of the chat-bot, so the content quality of the templates themselves should not be a problem.

However, if the system is mismanaged, ISPs can mark all outgoing mails from the used domain as spam, which would greatly reduce the chance of a mail reaching its destination. To prevent this, the requirement for human approval was introduced for the development and starting phase of the system's lifespan. This greatly reduces the risk of inaccurate mails going out, or of the system accidentally spamming authorities, since the administrator of the system takes full responsibility for every outgoing email.

This means the system is not fully automatic but rather a computer-assisted service.

3.6.4.2 Mail analysis functionality

When a new mail enters the mail analysis system, it is in one out of three forms:

1. New procurement mail-case.
2. Template selected.
3. No template selected.

Emails in scenario 1 are mails added by the New Procurements module discussed in section 3.3. These are the initial outgoing mails for new procurement cases being added to the system, consisting of starter templates filled out with the meta information of the case. Scenarios 2 and 3 are both added by the NLU bot connector module presented in section 3.6.3. The template selected scenario handles cases where a template mapping is found for the intent string generated from the email content classification. Scenario 3 handles the case when no mapping is found for the classification. Figure 3.13 shows the logic of the mail analysis website receiving data in the different scenarios.


Figure 3.13: The first part of an in-depth view of the structure of the mail analysis website, centered around fetching and creating the mail to be analysed by the administrator.

Scenarios 1 and 2 display the email to be sent, where the only input required from the user of the mail analysis system is deciding whether the mail should be sent or not. When scenario 3 occurs, where no match is found, the user of the system is given three options, seen in figure 3.13:


• Choose existing template
• Write new template
• Write unique response

The option to choose an existing template allows the administrator to choose from all known templates and use the chosen one as the new mail. This creates a mapping between the intent string and the template. Writing a new template and selecting it for an outgoing email also creates a new mapping in the template mapper described in section 3.6.2. The reason for the unique response alternative is that sometimes a generic template might not be accurate enough; an alternative should then exist to circumvent the system and still give the user the option to respond normally to the mail.

All three scenarios lead to a mail being created, as seen in figure 3.13, by displaying an email to the user in every path of the module.


Figure 3.14: The second part of an in-depth view of the structure of the mail analysis website, centered around approving the mail and updating all necessary values.

While choosing between sending or not, the user also has the alternative of not responding at all. If the template does not fit as a response, whether through a bad match, a wrong choice or perhaps a typo, the user can always change his or her mind by choosing a different template or by writing a manual response instead of the current one. This is seen in figure 3.14, which depicts the decision not to choose the mail. When the mail is approved for sending, multiple functions are invoked: the mail is stored in MailDB; the template mapper is updated if a new combination of intent string and template ID is used; a new template is stored if one was created; and if it is a new procurement case, the email address is connected to the case-ID so that all future emails received on the connected email address are sorted correctly to this procurement case. Thereafter, the email is added to the Send Mail Queue for sending, as seen in figure 3.14.
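A minimal sketch of this approval step is shown below, reusing the saveMapping helper and queue constants from the earlier sketches; the commented helpers and the mail field names are assumed names, not the thesis implementation.

import { SQS } from "aws-sdk";

const sqs = new SQS();
const SEND_QUEUE_URL = process.env.SEND_MAIL_QUEUE_URL!; // assumed

// Invoked when the administrator approves an outgoing mail.
async function onMailApproved(mail: {
  from: string;
  to: string;
  subject: string;
  body: string;
  intentString?: string;
  templateId?: string;
  isNewTemplate?: boolean;
  caseId?: string;
}): Promise<void> {
  // await storeInMailDB(mail);                              // store the outgoing mail (hypothetical helper)
  if (mail.intentString && mail.templateId) {
    await saveMapping(mail.intentString, mail.templateId);   // update the template mapper (earlier sketch)
  }
  // if (mail.isNewTemplate) await storeNewTemplate(mail.templateId, mail.body);   // hypothetical
  // if (mail.caseId) await connectAddressToCase(mail.to, mail.caseId);            // hypothetical

  await sqs
    .sendMessage({ QueueUrl: SEND_QUEUE_URL, MessageBody: JSON.stringify(mail) })
    .promise();
}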

A guide to the front-end system with use case examples can be found in appendix A.

3.7 Template logic

This section covers the supported template logic of the system. Each template has the option to contain specific functionality that is executed when that template is chosen for an outgoing email. The functionality covered in the thesis work is based on the system requirements and includes: handling sending mails at a later time; scheduling mails; changing the receiver; recognizing when mails are done or cannot be completed; providing logic to force human intervention; and automatically scheduling a reply when no response is received.

3.7.1 Scheduled mail

When the scheduled mail logic is used, the NLU output is scanned for date entities, and if a date entity is found, the email is scheduled for sending based on that date. The email is scheduled for a weekday three days after the found date. This was chosen to prevent the system from mailing people on the exact day they return and to give the receivers a chance to respond themselves.

A use case for this is, for example, when someone is on vacation and will be back on a certain date:

"I’m currently on vacation, I’ll be back on 5 September"
