Bachelor Thesis
HALMSTAD
Computer Science and Engineering, 300 credits
Intelligent chatbot assistant: A study of integration with VOIP and Artificial Intelligence
Bachelor Thesis in Computer Science and Engineering, 15 credits
Halmstad, 2020-05-29
Erik Wärmegård
Foreword & Acknowledgements
In the dawning of this thesis, I want to seize the moment to begin expressing my sincerest gratitude to some remarkable individuals who have made this work possible. To Kevin Hernández Diaz and Fernando Alonso-Fernandez, my two supervisors. Yes, Kevin, you read that correctly, not an assistant.
You’ve shown your quality as a supervisor - the very highest. Thank you for your continuous support throughout the thesis, providing comfort and open-minded ideas on how to evolve the project.
Viktor (Starbirch) and Ludvig (Narratory), a warm thank you for your coaching and collaboration throughout the work. I wish you both good fortune with the progression of this newly initiated start-up, and with future projects to come.
Last, but certainly not least: to my dear friends and close collaborators, Johannes and Linus.
Thank you for the admirable teamwork and support, whose existence was proven every step of the way.
It seems that there is absolutely nothing that can deteriorate your striking efficiency. Through hardship, confusion, and perseverance - we prevailed.
Erik Wärmegård , Halmstad, May 29, 2020
Abstract
Development and research on Artificial Intelligence have increased during recent years, and the field of medicine is no exception as a target for this modern technology. Despite new research and tools in favor of medical care, the staff is still under heavy workloads. The goal of this thesis is to analyze and propose the possibility of a chatbot that aims to ease the pressure on the medical staff and to provide a guarantee that patients are being monitored. With Artificial Intelligence, VOIP, Natural Language Processing, and web development, this chatbot can communicate with a patient, acting as an assistant tool that conducts preparatory work for the medical staff. The system is integrated through a web application where the administrator can initiate calls and store clients in the database. To ascertain that the system operates in real time, several tests have been carried out concerning the latency between subsystems and the quality of service.
Sammanfattning
In the development of intelligent systems, health care has established itself as a major target group. Despite advanced technologies, health care is still under heavy strain. The goal of this thesis is to investigate the possibility of a chatbot whose purpose is to ease the workload of medical staff while also offering a guarantee that patients receive the supervision and feedback they need. With the help of Artificial Intelligence, VOIP, Natural Language Processing, and web development, this chatbot can communicate with the patient. The chatbot acts as an assistive tool that provides preparatory groundwork for decision making by medical staff. A system that not only provides practical benefit but also promotes the progress that Artificial Intelligence is making within health care. The system is administered through a website that connects the several different components. Here, an administrator can initiate calls and save clients to be called to the database. To establish that the system operates in real time, several performance tests have been carried out concerning both latency and call quality.
Contents
1 Introduction
1.1 Intelligent call-up process
1.2 Goal, Purpose & Requirements
1.3 Structure of the thesis
2 Background
2.1 Artificial Intelligence and Machine Learning
2.2 Natural Language Processing
2.3 Cloud Computing
2.4 Data Communication
2.5 IP Telephony
2.6 PSTN and VOIP
2.7 Data Storage
2.8 Object-Oriented Programming & Application Programming Interface
2.9 Web Development
3 Current Technologies
3.1 Google Duplex
3.2 Twilio
3.3 Restcomm
3.4 Sinch
3.5 Voximplant
3.6 Dialogflow
3.7 Narratory
3.8 Google Cloud Platform
3.9 Amazon Web Services
4 Method
4.1 Choice of database structure
4.1.1 NoSQL vs RDBMS
4.1.2 Provider comparison
4.2 Choice of VOIP provider
4.2.1 Security
4.2.2 Provider Comparison
4.3 Web Prototype
4.4 Related works
5 Result
5.1 Database Integration
5.2 VOIP Integration
5.3 Database structure
5.4 Prototype testing
6 Discussion
6.1 Method
6.2 Result
6.3 Comparisons to related work
6.4 Goal & requirements comparison
6.5 Social Requirements
7 Conclusions
A Appendix
A.1 Database snapshot for authentication and mapping
A.2 Adding a client to the user currently logged in
A.3 Adding the client-content to the page-component
A.4 API-request to VOIP-provider
A.5 VOIP-provider initiates a phone call to the end user
A.6 Raw data from performance analysis
A.7 Dialogflow agent created within Voximplant-environment
A.8 Live processing of input in the phone call
A.9 Dialog log
1 Introduction
Although the medical field is under constant improvement through research and new advanced technologies, many patients have to undergo a long waiting process to receive the care they need. Myndigheten för vård- och omsorgsanalys, the Swedish authority for health and care analysis, reveals severe deterioration in the availability of medical care during the last three years [1]. That type of care includes availability by telephone and new doctor appointments within both primary and specialized care, along with treatment guarantees such as surgery. Swedish health care is under vast amounts of stress.
The lack of medical staff in several medical professions is another ongoing problem. Of the 21 medical professions surveyed, at least ten, almost half, report a scarcity of staff [2]. Contributing factors include an increase in the number of people suffering from chronic illnesses or complex diseases, along with an aging population, which leads to an increased demand for medical staff.
Ultimately, it is the patients who will suffer the most. This thesis takes on the mission of trying to reduce the pressure on Swedish health care, more concretely the burden on the medical staff, with the tools of AI, allowing an autonomous system to take over some of the currently manual labor done by doctors and nurses in their communication with the patient.
Development and research of modern Artificial Intelligence (AI) technologies have increased over the years, and the field of medicine is no exception. AI has been applied for medical purposes ever since the 1950s, when improvements in diagnosis were attempted with the assistance of computers. Today, enhanced sustainability and more efficient computing power, alongside vast amounts of digital data, have made medical AI applications increase in recent years [3].
In the medical literature as well, AI applications have been shown to improve medical professionals' diagnostic and therapeutic accuracy, as well as the overall clinical treatment process. AI can also assist doctors and medical professionals with general improvements of health information systems, geocoding of health data, and tracking of epidemics, but also with predictive modeling and decision support. In some cases, AI can supply real-time updates of medical information from several sources, such as journals, books, and patient data, and thus predict specific health outcomes.
In general, AI has helped to monitor certain diseases; one example is cancer detection, which has benefited from this technology. By collecting massive amounts of data, it is possible to discover and identify patterns and relationships within the data, which is effective in predicting cancer occurrence probabilities, even before symptoms occur. The accuracy in detecting cancer and predicting its outcome has improved by 15-20% [4] in recent years thanks to the applications of AI and Machine Learning.
This thesis was done in collaboration with Linus Lerjebo and Johannes Hägglund [5], a group that has undertaken the task of creating a prototype of a web application. The project was created by Viktor Björk, entrepreneur, founder of Starbirch AB, and owner of this project, which has its roots in Artificial Intelligence: to use intelligent systems to develop a system to be implemented in the medical field to relieve the workload.
1.1 Intelligent call-up process
This thesis revolves around the idea of an intelligent call-up process, which will relieve the medical staff.
Since the idea is at such an early stage, this prototype will explore the possibility of connecting Artificial Intelligence tools with VOIP and other cloud services to create an intelligent system that works as an assistant, administrated through a web application. This web application serves as the organizing tool for managing all the calls, clients, and analytics for Swedish health care. The entire work has been split into two Bachelor thesis projects. This thesis will focus on data storage, VOIP, and the overall integration of all components of the system. In contrast, the other thesis dives deep into the services of Speech Synthesis, Natural Language Processing, and complex data analysis. Both works try to answer the question of which providers offer the most suitable products for the system's needs. The system, concerning both works, consists of several intelligent subsystems and sub-processes, explained in the following steps:
1. Call an individual with Voice over Internet Protocol: Create a prompt that can contact an individual from the target audience.
2. Communicate with a synthetic voice: Using the tools of Natural Language Processing, having the AI communicate with an authentic and human-like voice.
3. Ask questions with the mentioned AI and listen for answers: Create a dialog between the AI and the called individual.
4. Collect and store answers which are to be analyzed: With a large set of data from the individuals, some conclusions and important discoveries regarding the individual’s health can be made.
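The four steps above can be sketched as a simple orchestration loop. The following Python sketch is only illustrative: the helper functions (`place_call`, `ask_question`, `store_answers`) are hypothetical stand-ins for the VOIP, speech-synthesis, and NLP provider APIs, not the actual implementation.

```python
# Minimal sketch of the intelligent call-up process. The helpers below
# are hypothetical stubs standing in for provider APIs; a real system
# would call e.g. a VOIP provider and a dialog engine instead.

def place_call(phone_number):
    """Step 1: contact an individual over VOIP (stubbed)."""
    return {"number": phone_number, "connected": True}

def ask_question(call, question):
    """Steps 2-3: synthesize the question and listen for an answer (stubbed)."""
    return f"answer to: {question}"

def store_answers(client_id, answers, database):
    """Step 4: persist the collected answers for later analysis."""
    database.setdefault(client_id, []).append(answers)

def run_call_up(client_id, phone_number, questions, database):
    call = place_call(phone_number)
    if not call["connected"]:
        return None
    answers = {q: ask_question(call, q) for q in questions}
    store_answers(client_id, answers, database)
    return answers

db = {}
result = run_call_up("client-1", "+46700000000",
                     ["How do you feel today?"], db)
```

The sketch shows the intended control flow: one call session drives all four steps, and the collected answers end up keyed by client in the database.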
1.2 Goal, Purpose & Requirements
The goal of this thesis is to ascertain the creation of an autonomous call-up process consisting of many independent, intelligent subsystems that cooperate alongside each other. At the end of the thesis, a prototype should have been created, which will be exposed to tests and later on deployed for live usage. This prototype will serve as the foundation for the company to continue working on, and act as the concrete example to see if the idea holds its ground in terms of relevance and functionality.
See the requirements below for more details of the thesis specifications. One primary task is thus substantial research and the creation of a solid ground for future work; the secondary task is the actual practical development of the product.
The purpose of this system is, in a more long-term sense, to relieve the medical staff and resources of the already existing heavy work pressure. Although the Swedish authorities are the primary target group, this product should hold the possibility of expanding its context beyond the borders of Sweden, being available to any company which might find the process useful. The ambition of this process is to create security for the patient, and also to promote the progress that Artificial Intelligence and Machine Learning are making in the medical field. Requirements and specifications of the thesis are stated below.
• Being able to answer the question of how to interweave several independent techniques and intelligent Machine Learning APIs to create a unified system. This should result in a prototype that is at least able to test the core functionality of making a call, asking a question, and fetching the answer, if completing the application isn't doable within the context of the thesis.
1.3 Structure of the thesis
Sections 2 and 3 provide the introductory knowledge and background required to better understand the technologies used in the thesis. These sections also cover current technologies and providers. Section 4 includes the methodology of the work, which presents investigations, research, and choices. After that, the results are presented in Section 5 with visual presentations that explain the system's different components. The Appendix contains detailed explanations of the integration as code fragments from the source code. Finally, Sections 6 and 7 contribute a view of the work just made, its strengths and weaknesses, and what possibilities remain to be explored.
2 Background
This section provides some explanation of the knowledge required for understanding the report. The term AI will act as a broad outline of many different techniques dedicated to solving specific tasks. Those techniques that are relevant for this thesis are explained in this chapter, along with Cloud Computing, Data Storage, and the building blocks needed for a web application prototype.
2.1 Artificial Intelligence and Machine Learning
The precise definition of Artificial Intelligence (AI) and its meaning has been, and still is, a subject of discussion. Due to its rapid development, the proposed definitions of AI have changed over time. A more recent definition [6] describes AI as “imitating intelligent human behavior”. Instead of looking narrowly at one definition, however, AI can be classified into four categories: systems that think like humans, systems that act like humans, systems that reason, and systems that act rationally.
A more formal definition of Artificial Intelligence was established in 1997 [7] as the collection of computations that make it possible to assist users to perceive, reason, and act. These functions are accomplished by computational devices and include at a minimum “representations of ’reality,’ cognition and information, along with associated methods of representation”. This representation could be of vision or language, which in the context of this thesis is quite relevant, since speech synthesis and speech recognition will be widely used. AI can also include robotics, virtual reality, and Machine Learning.
Machine learning provides automated methods of data analysis. A more formal definition [8] of this usage is that machine learning is “...a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data.” Machine learning can also perform various kinds of decision making as the concept of big data is getting more relevant.
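As a toy illustration of “detecting patterns in data and using the uncovered patterns to predict future data”, the sketch below implements a nearest-centroid classifier in plain Python. This is purely didactic, an assumption-free minimal example, and is not a method used in the thesis; the labels are invented for the example.

```python
# Toy nearest-centroid classifier: "learn" a pattern (the mean point of
# each class) from labeled data, then predict the class of unseen points.

def fit_centroids(samples, labels):
    """Compute the per-class mean of the training samples."""
    sums, counts = {}, {}
    for point, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is closest (squared distance)."""
    def dist2(label):
        return sum((c - p) ** 2 for c, p in zip(centroids[label], point))
    return min(centroids, key=dist2)

centroids = fit_centroids(
    [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.8, 5.3)],
    ["healthy", "healthy", "at-risk", "at-risk"])
prediction = predict(centroids, (4.9, 5.0))  # closest to the "at-risk" mean
```

Even this tiny model follows the fit/predict pattern that real machine learning libraries expose.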
2.2 Natural Language Processing
Natural Language Processing (NLP) is a subfield in both computer science and linguistics. NLP deals with computer applications where the input is natural language and is processed and tagged to the part of speech of words [9].
NLP consists of four standard tasks, which all serve the purpose of dissecting natural language into its components:
1. Part-Of-Speech Tagging labels each word with a unique tag that indicates its syntactic role (plural, noun, adverb).
2. Chunking aims at labeling segments of a sentence as, for example, noun or verb phrases.
3. Named Entity Recognition labels atomic elements in a sentence with categories such as "PERSON" or "LOCATION".
4. Semantic Role Labeling tags words by giving them a grammatical role in the sentence, such as the voice of the sentence (active or passive), the headword, etc.
NLP is often decomposed [10] into different stages. These different stages serve a certain purpose of analysis of the input text.
Text pre-processing is one of the stages and is the task of converting the raw text file into a well-defined sequence of linguistically meaningful units, such as graphemes, words, and sentences. This stage is the foundation for all further processing stages. It includes making all characters in the file machine-readable, along with character encoding identification and language identification, which determines the natural language of the document. Tokenization is part of text pre-processing and is the process of text and sentence segmentation, which converts the text into its component words and sentences.
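A minimal regex-based sketch of the tokenization step described above, splitting raw text into sentences and then into word tokens. Production NLP services handle this internally with far more robust rules; this is only illustrative.

```python
import re

def tokenize(raw_text):
    """Split raw text into sentences, and each sentence into word tokens."""
    # Naive sentence segmentation: split after ., ! or ? followed by space.
    sentences = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    # Naive word segmentation: runs of letters/digits, apostrophes allowed.
    return [re.findall(r"[A-Za-z0-9']+", s) for s in sentences if s]

tokens = tokenize("How do you feel today? I feel fine.")
```

Note the deliberate simplifications: abbreviations like "Dr." would wrongly end a sentence here, which is exactly why real tokenizers are more complicated.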
Lexical analysis concerns the techniques which perform analysis of the words in a sentence. This process can be quite complicated, since a word can take on different meanings depending on its context. Thus, a word's morphological variants are related to its lemma.
Syntactic parsing is grammar-driven parsing of the text. This stage has the task of determining the structural description of a string of words.
Semantic analysis is the process of making the computer understand the meaning of the given text. This includes information retrieval, information extraction, text summarization, data mining, and machine translation.
Natural Language generation, the final stage, is quite similar to the process by which humans render a thought into spoken language, although the protagonist in this case is the computer program. This process often takes shape in three parts: “(1) Identifying the goals of the utterance, (2) planning how the goals may be achieved by evaluating the situation and available communicative resources, and (3) realizing the plans as a text.”
2.3 Cloud Computing
Cloud computing can be defined as [11] “a set of network-enabled services, providing scalable, QoS guaranteed, normally personalized, inexpensive computing platforms on demand...”. Cloud computing is the use of shared computing resources, which are grouped in large amounts and offer their combined capacity on an on-demand, pay-per-cycle basis. This relatively new and much-trending concept is a paradigm shift: to choose cloud services instead of having local servers internal to the company to handle its applications.
The technology and machinery behind these cloud computing infrastructures are often abstracted from the user, thus shifting the focus of the actual usage of the service. These services offer scalable and easy-to-access availability through the internet. These cloud services are usually defined as having an abstraction between the resource and its underlying technical architecture. Cloud services are defined as having these following essential characteristics; on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
The benefits of this new concept are numerous. To name a few advantages: cloud computing saves resources, both in the economic sense and in information storage. Cloud technologies are paid for incrementally, and IT personnel are no longer required to manage the software, since it is handled by the cloud owner. The storage of data and information is scalable and optimized for the use of the customer, which provides optimal storage.
Cloud computing is often divided into three levels of services which support virtualization and man- agement of differing levels of the solution stack.
Software as a Service (SaaS) is the idea that someone can offer a hosted set of software that isn't owned by the customer. No programming or development is needed; the user only purchases the software that is required and configures it to the company's needs.
Platform as a Service (PaaS), similar to SaaS, provides hardware and a certain amount of software, such as databases, as a foundation on which the customer can build its application. This service diminishes both the cost and the complexity of managing the underlying infrastructure, making all requirements available from the internet.
Infrastructure as a Service (IaaS) is the delivery of both hardware, such as servers, storage, and networking, along with its associated software (operating systems, file systems), as a complete service. IaaS provides little management beyond keeping the data center operational, so the users must deploy and manage the software services themselves, as if it were their own data center. Amazon Web Services is an example of such an IaaS offering.
2.4 Data Communication
Figure 1: OSI 7-layer Model. A visual presentation of the different areas the network communication can be divided into. The data traverses through the different layers one by one, carried and managed by different protocols along the way from Application to Application, all the way down to the first Physical layer.
Communication between devices when transmitting data to and from any arbitrary end-point can be described by the Open System Interconnection Reference Model, or OSI Model for short, which is visualized in Figure 1. This generic model defines how applications communicate with each other, applicable for all network types. Each layer describes certain characteristics of the network communication and is helpful in troubleshooting and to simplify the workflow both with other network technicians as for individual work, to structure and pinpoint certain parts of the communication [12].
1. Physical is the first layer, and it defines the behaviour and control of the electrical aspects of the physical components of data communication, e.g. physical cards or sockets.
2. Data link, the second layer, defines the access strategy for sharing the physical medium and provides bridges between several networks.
3. Network layer establishes, maintains, and organizes networks of devices.
4. Transport layer is responsible for the reliability of the communication along with the integrity of the data, packaging data streams into packets and forwarding them towards the upper or lower layers. This layer consists primarily of two protocols, UDP and TCP, which have different approaches to how the data should be transmitted. Consider a scale where quality is weighed against speed. The Transport layer can also implement other data stream controls and flow control to satisfy the needs of the transmission for the system [13]. As far as this thesis is concerned, some external controls are to be implemented to meet the needs of the VOIP functionality; see Section 2.6 for additional details.
5. Session, the fifth layer, provides entities the two end-points can use to exchange data with each other. This layer is concerned with the organization of data flows.
6. Presentation layer is one of the more high-end layers, and this is where the data is either packed or unpacked depending on its direction in the communication flow. The Presentation layer is also the layer that covers the encryption/decryption, protocol conversions, and graphic expansions.
7. Application, the final layer, is where the end-user and end-application protocols are located. These are the high-level functions of programs that may use the network as a means of communication [13], covering web applications, user interfaces, and primary functions.
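The Transport-layer trade-off between UDP (low latency, no delivery or ordering guarantees) and TCP (reliability at the cost of speed) is why VOIP media streams typically ride on UDP. The following minimal Python sketch sends a single UDP datagram over the loopback interface; it only illustrates the fire-and-forget nature of UDP, not an actual VOIP stream.

```python
import socket

# Minimal UDP round trip over loopback: UDP delivers self-contained
# datagrams with no handshake, retransmission, or ordering guarantees,
# which keeps latency low for real-time media such as voice frames.

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # let the OS pick a free port
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"voice-frame-1", addr)  # fire-and-forget datagram

data, peer = server.recvfrom(1024)     # receive the raw datagram
server.close()
client.close()
```

With `SOCK_STREAM` (TCP) instead, the same exchange would require a connection handshake and would retransmit lost segments, trading latency for reliability.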
There are ways of actually measuring the quality of data communication. The quality of communication can be dissected into many components: it could be the robustness of the transmission, or whether or not the data is correct. With the help of mathematical formulas, one can not only discover errors but also correct them in the transmission, down to the bit level.
The Bit Error Rate, BER, is a measure of the quality of a certain transmitting device, the transmission path, and its environment, which are exposed to external factors that can affect the communication, such as noise and jitter. Jitter is the variation in delay of packet delivery [14]. The rate at which the data flows can be measured with an oscilloscope, and the frequency of transmitted bits is computed as:

f = 1/t

where t is the bit time interval and f is the bit frequency. BER is the ratio of the number of faulty bits to the total number of bits in a transmission:

BER = b_e / b_t

where b_e is the number of error bits and b_t is the total number of bits transferred. This measurement is usually performed by applying a pseudorandom bit stream to an interface, counting the bit errors, and comparing the transmitted and the received data [15].
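The BER formula can be computed directly by comparing a transmitted and a received bit stream position by position, as in this short Python sketch:

```python
def bit_error_rate(transmitted, received):
    """BER = b_e / b_t: errored bits over total bits transferred."""
    if len(transmitted) != len(received):
        raise ValueError("bit streams must be the same length")
    b_t = len(transmitted)
    b_e = sum(1 for tx, rx in zip(transmitted, received) if tx != rx)
    return b_e / b_t

ber = bit_error_rate([1, 0, 1, 1, 0, 0, 1, 0],
                     [1, 0, 0, 1, 0, 1, 1, 0])  # 2 errors in 8 bits
```

In practice the transmitted sequence would be a pseudorandom bit stream applied to the interface under test, as described above.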
Hamming distance is another way of measuring how much the error bits have affected the bit stream.
The Hamming distance is the number of positions in which two bit streams differ. In terms of vectors, two vectors x and y are compared, where x represents the transmitted value and y the value received by the user:

x = (x_1, x_2, ..., x_n)
y = (y_1, y_2, ..., y_n)

The Hamming distance between the two vectors, denoted d_H(x, y), is the number of positions in which x and y differ [16]:

d_H(x, y) = Σ_{i=1}^{n} [x_i ≠ y_i]
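Following this definition, the Hamming distance can be computed with a direct position-by-position comparison:

```python
def hamming_distance(x, y):
    """Number of positions in which the vectors x and y differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

d = hamming_distance((1, 0, 1, 1), (1, 1, 1, 0))  # differs in 2 positions
```

For equal-length bit streams, the Hamming distance equals the number of error bits b_e, which ties this measure directly to the BER computation above.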