
Author: Maksym Strutynskyi

Supervisor: Dr. Kostiantyn Kucher

Semester: Spring 2020

Subject: Computer Science

Master’s Degree Project

A concept of an intent-based contextual chat-bot with capabilities for continual learning


Abstract

Chat-bots are computer programs designed to conduct textual or audible conversations with a single user. The job of a chat-bot is to find the best response for any request the user issues. The best response is considered to answer the question and contain relevant information while following grammatical and lexical rules.

Modern chat-bots often have trouble accomplishing all these tasks. State-of-the-art approaches, such as deep learning, and large datasets help chat-bots tackle this problem better. While there are a number of different approaches that can be applied to different kinds of bots, datasets of suitable size are not always available. In this work, we introduce and evaluate a method of expanding the size of datasets. This will allow chat-bots, in combination with a good learning algorithm, to achieve higher precision while handling their tasks. The expansion method uses the continual learning approach, which allows the bot to expand its own dataset while holding conversations with its users. In this work, we test continual learning with the IBM Watson Assistant chat-bot as well as a custom case study chat-bot implementation. We conduct the testing using a smaller and a larger dataset to find out if continual learning stays effective as the dataset size increases. The results show that the more conversations the chat-bot holds, the better it gets at guessing the intent of the user. They also show that continual learning works well for both larger and smaller datasets, but the effect depends on the specifics of the chat-bot implementation. While continual learning makes good results better, it also turns bad results into worse ones, so the chat-bot should be manually calibrated should the precision of the original results, measured before the expansion, decrease.

Keywords: Machine learning, intent-based, chat-bot, dialogue systems, rule-based, Python, TensorFlow, TFLearn, continual learning, online learning, supervised learning, unsupervised learning, IBM Watson, Watson Assistant


Preface

Doing this project gave me an opportunity to obtain more knowledge about the current state of chat-bots in the area of Machine Learning and AI. In the process, I learned to use various tools for developing the classifiers used in chat-bots and gained an understanding of how they work. I also learned how to use tools for writing and organising large documents with references. I want to thank my supervisor Dr. Kostiantyn Kucher for providing me with useful insights throughout the project and for helping with organising my report using LaTeX and BibTeX.


Contents

1 Introduction
1.1 Background
1.2 Related work
1.3 Problem formulation
1.4 Motivation
1.5 Research questions and objectives
1.6 Scope/limitations
1.7 Target group
1.8 Outline

2 Method
2.1 Research approach description
2.2 Reliability and validity
2.3 Ethical considerations

3 Literature overview
3.1 Data management and training methods for dialogue systems
3.1.1 Methods of expanding a dialogue corpus
3.1.2 Training dialogue systems
3.2 Contextuality in intent-based chat-bots
3.3 A concept of continual learning
3.4 Concerns about catastrophic forgetting

4 Implementation
4.1 Using Python for machine learning tasks
4.2 TensorFlow and TFLearn
4.2.1 TensorFlow: An end-to-end open source machine learning platform
4.2.2 TFLearn: A deep learning library for TensorFlow
4.3 Software prerequisites for running the case study chat-bot
4.4 Initial preparations before the implementation
4.5 Intent-based chat-bot implementation
4.6 Adding contextuality to the chat-bot
4.7 Module for continual learning
4.8 Project limitations

5 Results
5.1 IBM Watson client evaluation results with the small corpus
5.2 IBM Watson client evaluation results with the small corpus and continual learning
5.3 IBM Watson client evaluation results with the larger dataset
5.4 IBM Watson client evaluation results with the larger dataset and continual learning
5.5 Demonstration of the case study chat-bot with continual learning
5.6 Case study chat-bot results with the small corpus
5.7 Case study chat-bot results with the small corpus and continual learning

6 Analysis
6.1 Analysis of the results from IBM Watson used with the small corpus
6.2 Analysis of the results from IBM Watson used with the larger corpus
6.3 Analysis of the results from the case study chat-bot used with the small corpus
6.4 When using continual learning makes sense
6.5 The effect of contextuality

7 Conclusion
7.1 Summary of the results
7.2 Future work

References

A Small corpus chat-bot training data — List of intents
B Small corpus training data — List of pizzas
C Small corpus chat-bot training data — List of drinks
D Custom chat-bot implementation — Dictionary
E Small corpus chat-bot training data — Intents with context
F Small corpus chat-bot validation data (part 1)
G Small corpus chat-bot validation data (part 2)


1 Introduction

One of the technologies making the most noise in the world today is Artificial Intelligence [1]. The AI ecosystem is vast, with only a small part of it being explored today [2]. Large businesses have shown interest in applying AI in different parts of their infrastructure in order to increase their profit [3]. When they start to explore their possibilities with AI, it becomes clear that it has a number of different applications [4]. From this comes the understanding that not every area of AI is equally developed. While some areas have a number of polished tools that can be directly applied to business, other areas are not completely explored, and the best-developed tools turn out to be prototypes [5].

Today, large companies choose to invest in the development of some area of AI that poses interest for their business [6]. They invest in research, software development and analysis, trying to be the first to develop a solution that will give them an advantage over competitors. One such tool, which has been adopted in business for some time already, is the multi-purpose chat-bot [7]. A chat-bot is a piece of software powered by AI that can hold conversations with humans using natural language. Chat-bots can help customers find the right services and products as well as answer their questions. In doing so, they aim to replace the human personnel currently doing this job. While chat-bots are unable to hold a truly natural conversation, they perform well at being consistent, as they will only do what they are programmed to do [8]. Chat-bots cannot replace humans in sales completely [9], but they can take over answering repetitive questions and taking simple orders. For cases when advanced support is needed, there is often a “Call a human agent” button present. With this short overview done, it is safe to say that since their first appearance, chat-bots have been bringing revolutionary changes [10] to the world of business and will keep doing so.

1.1 Background

Voice assistants such as Apple Siri, Microsoft Cortana, Amazon Alexa, and Google Assistant are widespread in today’s world [11]. These systems are continually developing thanks to breakthroughs in speech recognition and text-to-speech technologies. A major role in this development is played by new research in the area of deep learning [12], gains in the computing power of GPUs [13], and releases of ever more powerful mobile devices.

Since the inception of machine learning, with an increasing number of various datasets and new algorithms, the accuracy of speech recognizers [14] and speech synthesizers [15] has become good enough to be deployed in real-world customer products. At the same time, there has not been equal progress in the area of natural language understanding (NLU) and generation (NLG). The NLU and NLG components of dialogue bot systems, from the early research in 1966 [16] to the present commercial voice assistants, mostly rely on rule-based systems rather than systems that can “understand”. NLU and NLG systems are often developed for very narrow cases [17, 18]. General understanding of naturally-spoken language across multiple dialogue steps still cannot be achieved, even in single task-oriented situations.

Most of the products using NLU, such as chat-bots, have constraints on what users can say, how the system responds, and the order in which the various subtasks can be completed. They are very precise for specific inputs, but they do not cover all possible valid inputs that a user can provide. Such systems are not only unscalable, but also lack flexibility and cannot have a truly natural conversation with the user. The purpose of chat-bots is to process user requests and provide a desired response. Without flexibility and the ability to hold a natural conversation, they fail in their purpose. This is not surprising, as natural language has been changing and developing for centuries. Texts are often extremely context-dependent and ambiguous, especially in multi-person conversations across multiple topics.

Enabling an automated system such as a chat-bot to hold a steady task-based conversation with a human remains one of computer science’s most complex unsolved problems [16]. In contrast to more traditional NLP efforts, interest in statistical approaches to dialogue understanding and generation aided by machine learning has grown considerably in the past years [19–21]. However, the lack of high-quality dialogue data is hindering progress in this area [22].

There is a category of task-oriented chat-bots that are good at solving conversational tasks in areas such as booking and delivery. They are called intent-based chat-bots, and they operate by detecting the intent of the user for each request and providing a response based on that intent. Some of them also make use of contextual information during the conversation and define contextual intents to provide a better response to the user [23].

These chat-bots are made to hold one-on-one conversations with a single user at a time, fulfilling one or more tasks during the conversation. Although this category of chat-bots can avoid some of the previously mentioned problems, their quality still depends on the dataset used to train them [24].

1.2 Related work

There have been several attempts at solving the problem of expanding datasets. Some authors suggest using various data augmentation techniques [25–27]. For text datasets, for instance, one could add new sentences with synonyms or change the order of words to create new entries for the dataset. There is also some research into other ways of solving the dataset deficiency problem. One such way is using active learning [28] techniques to choose the data for the classifier to learn from. Another method is using crowdsourcing [29].

In the case of a chat-bot, the bot is exposed to the “crowd”, which interacts with it and supplies a large amount of data for future training. Another method worth mentioning is online learning [30]. This method is useful if the training data is provided in batches, for example because the dataset is unreasonably large or the data is being collected over time.

While such methods increase accuracy, they add a lot of noise to the dataset, incorporating a large number of sentences that will never be spoken. Another technique that could potentially be used to expand a dataset is continual learning. There is some research that has been done in this area to define and prove the effectiveness of this concept [31, 32], and the exploration of this concept is currently ongoing [33, 34]. Despite the amount of existing work in the area of continual learning, this concept has not yet been investigated in relation to chat-bots, which is the topic of this work. A more detailed literature overview is present in the Literature overview section of this thesis.

1.3 Problem formulation

This work is related to the research area of machine learning, specifically, chat-bots and the dialogue corpora used to train them. One of the reasons why intent-based chat-bots sometimes fail to perform requested tasks is that they fail to identify the intent of the user’s requests. In the case when the classifier is configured well, the problem comes from the dataset used to train the bot. As there is a scarcity of datasets for all possible dialogue topics, some datasets are just too small to contain all the possible ways to express an intent.

But when creating a chat-bot for a niche topic, any dataset is better than none. It takes quite some time and resources to collect more samples and further refine such a dataset so that it can progressively get better. Also, the quality will depend on how well the collected samples correspond to the actual users’ requests. In this work, we will attempt to solve the issue of available datasets being too small. We will use the previously mentioned approach of continual learning and two different chat-bots, along with a smaller and a larger dataset, to validate the approach for several use case scenarios. The goal is to expand their coverage over diverse inputs for the defined intents. The result would be a stronger classifier capable of correctly handling a more diverse variety of possible inputs.

1.4 Motivation

There exist several free dialogue datasets [35, 36], but they do not nearly cover all the possible topics and conversation types. There are small ones that do not perform well enough to be used in the real world [37], and there are larger ones that often contain contextual data that confuses the AI classifier, making it less accurate and more prone to false positives [22]. Datasets of smaller size usually prove to be of better quality than big datasets. It is much easier to handle, annotate and analyse them, which allows for careful polishing and effective noise removal. Small datasets can be handpicked from the big ones, created and curated manually by enthusiasts to satisfy a niche area, or even recorded using the Wizard of Oz methodology [38]. Such datasets end up complete and balanced, but because of their size, a classifier trained on one of them becomes limited to the exact sentences used in the dataset. Apart from the topics, different datasets are designed for different types of chat-bots, and datasets collected some years ago become outdated and unusable for chat-bots built with the latest advances in machine learning. Such is the case for datasets for contextual intent-based chat-bots, which are difficult to find in a usable format.

The approach of continual learning investigated in this work aims to alleviate this problem by introducing a way to expand chat-bot datasets.

1.5 Research questions and objectives

The goal of this thesis is to investigate the applicability of the continual learning approach to the problem of shortage of dialogue datasets for uncommon topics in the context of chat-bot implementation tasks. In order to reach this goal, we define a number of research questions and related objectives presented below in Table 1.1 and Table 1.2.

Table 1.1: Thesis project research questions

RQ1 Is the continual learning approach motivated for the task of small dialogue corpus dataset extension in relation to the previously applied approaches?

RQ2 Does continual learning benefit state-of-the-art chat-bots with regard to smaller and larger dialogue datasets?

RQ3 Does continual learning benefit contextual intent-based chat-bots developed using publicly available tools?

Continual learning appears to be a promising approach for tackling the research problem discussed in Section 1.3. However, this approach has not been widely discussed in the literature in the context of chat-bot implementation tasks yet. Thus, the research question RQ1 is concerned with positioning the continual learning approach in relation to the previously applied dataset extension techniques. In order to address this research question, our objective O1 is to conduct a literature review about the existing approaches for expanding small corpus datasets and to analyze whether the continual learning approach is feasible in similar scenarios.


Table 1.2: Thesis project objectives

O1 Conduct a literature review about the existing methods of expanding small corpus datasets

O2 Investigate the effects of continual learning on a state-of-the-art chat-bot using a small dialogue dataset

O3 Investigate the effects of continual learning on a state-of-the-art chat-bot using a larger dialogue dataset

O4 Develop a case study contextual intent-based chat-bot with support for continual learning using publicly available tools

O5 Investigate the effects of continual learning on the case study contextual intent-based chat-bot using a small dialogue dataset


RQ2 is concerned with the effects of continual learning when applied to a state-of-the-art chat-bot implementation: do the results (the performance) of such a chat-bot improve, deteriorate or remain unaffected when applying continual learning (compared to the case of not applying this approach)? Here, we aim to investigate such effects with an existing state-of-the-art chat-bot implementation in two scenarios, involving a small dialogue dataset (O2) and a larger dialogue dataset (O3).

Finally, RQ3 is concerned with an even more specific issue of combining continual learning with contextuality in intent-based chat-bots. To address this question and investigate the effects of continual learning in this scenario, our objective O4 is to develop a case study chat-bot implementation using publicly available tools (in order to have full control over the design of the chat-bot and its support for contextuality). Once the implementation is ready, our final objective O5 is to evaluate this implementation with a small dialogue dataset in two scenarios (with and without continual learning enabled), similar to the strategy chosen for O2.

1.6 Scope/limitations

This project has a number of limitations. We only use two corpora to validate our approach using the IBM Watson public API. The first corpus (the larger one) comes from a previous publication [39] and contains intents for mixed topics without contextual information. The second corpus (the smaller one) is handcrafted by the author of this thesis and contains dialogues on the topic of pizza ordering with contextual information present.

The developed chat-bot will only be tested with the small corpus, while both corpora will be used with Watson Assistant from IBM [40]. The developed case study chat-bot can only be used through the terminal (command-line) interface. There is a number of different chat-bots available, but we will only test our approach with the one we developed and the previously mentioned Watson Assistant. Limitations of the developed case study chat-bot are described in the Project limitations subsection of the Implementation section.

In order to automate testing datasets with Watson Assistant, we created a tool that uses the publicly available API [41]. As Watson Assistant does not support continual learning, we will imitate it by retraining the bot on the newly processed sentences in real time. A group of human subjects used to train and validate the chat-bots will consist of the author of this report and his colleagues.


1.7 Target group

This work might be interesting for researchers working in the area of Machine Learning, companies looking for a suitable chat-bot for business applications, or students seeking further knowledge about different areas of AI. This work could also help out people who are trying to find a larger dataset for their chat-bot applications, but are struggling to do so.

1.8 Outline

In the following Method section, we will define the problem we are going to solve, discuss different ways it can be solved, and motivate the solution we propose. The Literature overview section then contains an overview of existing work in the area of corpus expansion and a description of the related concepts. Afterwards, we dive into the Implementation of our custom chat-bot and the module responsible for continual learning for expansion of the dataset. We will discuss a few possible language/library combinations suitable for implementing this type of chat-bot and motivate our choice. The implementation is followed by the Results section, which will provide examples, productivity and efficiency metrics, and other materials that outline various qualities of the implemented solution, both for our custom chat-bot implementation and Watson Assistant. Every solution has its strong and weak sides, so in the next section, Analysis, we look at the obtained results, find out how the developed continual learning method relates to the baseline approach (with no continual learning), and discuss additional aspects of the behavior of our method. When the analysis is done, in the following Conclusion section we summarise our findings, answer the research questions, and see if our results could be used to solve other problems in the field. We discuss whether the method could potentially be improved and mention other paths that we could take to continue working on this project.


2 Method

2.1 Research approach description

In order to answer RQ1 formulated in the Introduction section, we will carry out a literature review of the existing ways of expanding a dialogue corpus that can be used to train a chat-bot (cf. thesis objective O1 in Section 1.5). The review will be included as part of the following Literature overview section. Afterwards, we will define the newly proposed method that uses the concept of continual learning [31] for expanding a dialogue corpus. As this work is about applying this method to contextual and intent-based chat-bots, along with the method we will provide details of these concepts, thus defining the boundaries of our study.

In order to answer RQ2, we need to find out if continual learning positively affects the expansion of smaller and larger datasets. For this purpose we (1) create a small corpus on the topic of pizza ordering and (2) use a larger publicly available corpus [39]. The small corpus contains 269 entries related to 16 intents. Some of the entries contain entities such as $pizza_name or $drink_name, which at run time get replaced with corresponding entries from the pizza and drink lists in Appendixes B and C. An extract from this corpus is shown in Appendix E. The larger corpus contains 457 entries related to 14 different intents. This dataset lacks information about the context, but it is larger than our pizza ordering corpus.
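For illustration, the substitution could be implemented along the following lines. This is a minimal sketch; the function and the list contents are illustrative, not taken from the actual implementation or from Appendixes B and C:

```python
import itertools

# Illustrative extracts; the real lists are given in Appendixes B and C.
PIZZAS = ["Margherita", "Hawaii", "Quattro Formaggi"]
DRINKS = ["Coca-Cola", "Fanta", "Sparkling water"]

def expand_entities(template):
    """Expand $pizza_name/$drink_name placeholders into concrete entries."""
    results = [template]
    for placeholder, values in (("$pizza_name", PIZZAS), ("$drink_name", DRINKS)):
        if placeholder in template:
            results = [r.replace(placeholder, v)
                       for r, v in itertools.product(results, values)]
    return results

# One template entry becomes len(PIZZAS) * len(DRINKS) = 9 training entries.
print(expand_entities("I would like a $pizza_name and a $drink_name"))
```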

The larger corpus consists of the “Ask Ubuntu Corpus” containing questions and answers from [42], the “Stack Exchange Corpus” containing questions and answers from [43], and the “Telegram Corpus” containing questions and answers about public transport in Munich gathered by Braun et al. [39].

The corpora will be fed to the public Watson Assistant API [41], with parts of them saved for validation of continual learning. With the training complete, we will imitate the continual learning process in order to collect performance data for this case scenario (cf. thesis objectives O2 and O3), which will allow us to give an answer to RQ2. The necessity of imitation arises from the fact that Watson Assistant is a proprietary state-of-the-art system, and the author of this thesis has neither access to its source code, nor the ability to extend the system with continual learning capabilities directly. The imitation will be carried out by retraining Watson Assistant with an extra example entry acquired after processing a single entry of the conversation.
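A sketch of how this imitation could look with the ibm-watson Python SDK is given below; the version string, service URL, workspace ID, API key, and confidence threshold are placeholders, and the actual evaluation tool may be structured differently:

```python
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials; the real tool talks to the public API [41].
assistant = AssistantV1(version="2020-04-01",
                        authenticator=IAMAuthenticator("API_KEY"))
assistant.set_service_url("https://api.eu-gb.assistant.watson.cloud.ibm.com")
WORKSPACE_ID = "WORKSPACE_ID"

def process_entry(text, threshold=0.7):
    """Classify one conversation entry; on a confident hit, feed the entry
    back as a new training example for the detected intent, after which
    Watson retrains itself (the imitation of continual learning)."""
    result = assistant.message(workspace_id=WORKSPACE_ID,
                               input={"text": text}).get_result()
    intents = result.get("intents", [])
    if intents and intents[0]["confidence"] >= threshold:
        assistant.create_example(workspace_id=WORKSPACE_ID,
                                 intent=intents[0]["intent"],
                                 text=text)
    return intents
```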

The validation data for the pizza ordering corpus will consist of 10 handcrafted conversations with the chat-bot that aim to cover most of the available intents in a manner that challenges the learning capabilities of the chat-bot. These conversations are presented in Appendixes F and G. The validation data for the larger corpus will consist of 10 entries for each of the intents. As we analysed the corresponding dataset [39], we found that only 8 out of 14 intents contain more than 20 examples. As we are going to reserve 10 examples for the purpose of validation, we will only include intents with 20 or more examples in our training data. The reason behind this limit is that having intents with a small number of examples would defeat the purpose of involving a larger corpus (compared to the other, small corpus mentioned above), which would question the legitimacy of our results in terms of RQ2. This leaves us with 8 intents and 403 entries.

The test sample for the larger corpus will consist of 10 conversation units for each of the intents, counting 80 inputs overall. The results will be organised in a table to facilitate comparison.
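The selection and reservation procedure can be summarised in a short sketch; the names are illustrative, and `corpus` is assumed to be a mapping from an intent to its example sentences:

```python
MIN_EXAMPLES = 20   # intents with fewer examples are dropped entirely
N_VALIDATION = 10   # examples reserved per retained intent

def split_corpus(corpus):
    """Keep only sufficiently large intents; reserve validation examples."""
    training, validation = {}, {}
    for intent, examples in corpus.items():
        if len(examples) >= MIN_EXAMPLES:
            validation[intent] = examples[:N_VALIDATION]
            training[intent] = examples[N_VALIDATION:]
    return training, validation
```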

As RQ3 inquires about chat-bots developed using publicly available tools, the most obvious way to answer this question includes the development of a custom case study continual learning chat-bot (cf. thesis objective O4). The implementation of the case study chat-bot is described in the Implementation section of this work. When developing the chat-bot, we will review the existing tools for developing chat-bots and pick the tools most suitable for this work in the opinion of the author of this thesis.

The choice will depend on the novelty of the tools, the amount of available documentation, and the software development experience of the author of this thesis.

With the development complete, we will collect the results to compare the performance of the chat-bot with and without continual learning (cf. thesis objective O5). There is a number of existing corpora for chat-bots, but to use them in our chat-bot they have to be thoroughly analysed and put into a correct format that the chat-bot can accept [35].

Another problem is that most of the public corpora do not contain the contextual information that is necessary to answer RQ3. While such corpora might demonstrate the impact of continual learning to some extent, it is possible that the best scenarios for testing the effects of continual learning would never occur, and thus it would be difficult to argue about the effectiveness of the concept. For this purpose, we have already built our own corpus (a custom dataset) on the topic of pizza ordering that will let us demonstrate the positive and negative sides of continual learning (see the discussion of RQ2 above). The parts of the corpus used as the training data are specified in Appendixes A, B, C and E. Testing continual learning with further datasets is a part of the Future work subsection of the Conclusion section.

In order to collect the results, we will use a sample dialogue set consisting of 10 different conversations with the chat-bot, previously mentioned as validation data above in this section. The 10 conversations can be found in Appendixes F and G and will cover scenarios which demonstrate the effect of continual learning. The conversations contain lists of inputs created by the author of this work. The lists contain inputs that people might use when ordering pizza and drinks from a pizzeria. These inputs will be supplied to the chat-bot to collect the results along with the performance data when working in different modes. The same corpus will also be used to collect the data from the Watson Assistant chat-bot [40] tested with an imitation of the continual learning technique in order to answer RQ2, as discussed above.

After collecting the performance data, which will consist of the probabilities achieved by the chat-bot in identifying the intent of user input, we will perform an in-depth analysis of this data and find out how exactly the chat-bot behaves with and without continual learning when given the same input sequence from the user. This will allow us to have an “apples-to-apples” comparison of the data, meaning that it will be possible to argue about the advantages and disadvantages of each approach. Having the results data from the case study chat-bot allows us to answer RQ3 formulated in the Introduction section.

In our opinion, validating our corpora with Watson Assistant and our own chat-bot implementation will allow us to collect enough data to validate the concept of continual learning for several scenarios of chat-bot development. The chat-bot we will develop represents a type of chat-bot that companies might develop for business purposes on a low budget. On the other hand, Watson Assistant is a multi-purpose intent-based contextual chat-bot that makes use of state-of-the-art algorithms to boost its performance for a number of diverse applications. Although it can be configured for ordering pizza and drinks, it remains a black box in terms of source code availability, which makes analysis from the outside less reliable.


2.2 Reliability and validity

The use of software and machine learning in this work introduces a number of concerns about reliability. It is safe to say that should others attempt to replicate the work described in this thesis, they will be able to get similar results only to a certain degree. In software development, there is a number of factors that might cause the same steps to produce different output results. In this project, we use a combination of a programming language, frameworks and libraries to validate our theoretical assumptions in practice. In order to replicate the results of this work, the performer must use the same versions of the language, libraries and frameworks, along with the same version of the OS. The list can be found in the Software prerequisites subsection of the Implementation section.

When considering the validity of this work, we make sure to discuss as many factors as possible in the Analysis section of this work. To prevent problems with construct validity, we make sure to support the claims we make with detailed explanations or external references. In order to deal with internal validity, we make sure to describe all the details of the steps that produced a certain result in the Results section. The previously described reliability concerns will also help us to avoid internal validity problems. As for external validity, we explicitly describe the limitations of this study in the Project limitations subsection of the Implementation section. The results of this work will be validated in a limited number of scenarios. We make sure to specify the scenarios with their details and only make careful assumptions about the scenarios that we do not test.

2.3 Ethical considerations

It is worth considering the way in which the suggested approach collects data. The continual learning concept in this work introduces a two-way data exchange between humans and machines. In our method, we explicitly collect information provided by the user in order to retrain our classifier, and this data may also be stored for later analysis. The implementation of the method in this work does not aim to collect or deal with sensitive information, but regardless of this intention, such information could still be provided by the user. In such a case, this information would be stored within the classifier and could potentially be extracted from it, which has to be considered when implementing this solution. We suggest several ways to deal with this problem. One way is using input sanitization to filter out possible sensitive information, such as names, birth dates or phone numbers, before it reaches the classifier and potential logging functions [44]. Another way is restricting the input to a certain pattern, the same way firewalls do [45], but this may substantially decrease the effectiveness of the algorithm. One more way is introducing an agreement about data collection, which users have to sign before interacting with the application [46]. Following one of the suggested methods will allow the system to deal with sensitive data and help resolve ethical concerns.
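As an illustration of the first suggestion, the sketch below masks a few kinds of sensitive data with regular expressions before the text reaches the classifier or any logging function. The patterns are toy examples, and a production system would need audited, locale-aware rules:

```python
import re

# Toy patterns only; real deployments need far more thorough rules.
SENSITIVE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<DATE>"),         # ISO dates
    (re.compile(r"\+?\d(?:[\s-]?\d){6,14}"), "<PHONE>"),      # phone numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),  # e-mail addresses
]

def sanitize(text):
    """Replace matches of known sensitive patterns with neutral tokens."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(sanitize("Call me at +46 70 123 45 67 or mail john@example.com"))
# Call me at <PHONE> or mail <EMAIL>
```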


3 Literature overview

In order to justify the need for a new approach, we must first overview the existing methods of expanding dialogue corpora. In order to test the method, we develop a case study chat-bot that will allow us to evaluate our new approach. Before developing the chat-bot, we specify which kind of dataset we plan to work with and motivate the choice of an algorithm to use for training. In order to support this choice, we overview the existing methods of training dialogue systems such as chat-bots.

A diagram providing an overview of this section is shown in Figure 3.1. This section is designed to complete O1 in order to answer RQ1. Here we investigate the existing methods of expanding dialogue datasets and define which category the approach of continual learning belongs to.

Figure 3.1: Overview of the Literature overview section

3.1 Data management and training methods for dialogue systems

3.1.1 Methods of expanding a dialogue corpus

There are a number of ways to approach dataset expansion. Data augmentation is one of these ways, as it preserves the information the data carries while changing the way it appears. There has been a lot of research done in the area of data augmentation techniques applied to image datasets [47], but in this work we are mainly interested in their application to dialogue corpora. Another approach to dataset expansion is data collection, which can be done in a manual or automatic manner.

Data augmentation based methods

Word shuffling This way of creating new sentences for the dataset does not require any additional sources, only the initial data. There are a number of ways to formulate a grammatically correct sentence in the English language, and it is possible to move words around while maintaining the grammatical correctness and the meaning of the sentence. When using this method, each sentence is tokenized into separate words, and then these words are shuffled and gathered back together to create a new sentence. With a modification, a third-party tool can be used to validate which sentences retain grammatical correctness after the transformation; an example is shown in Figure 3.2. The validated sentences are then used to extend the initial dataset. When applied to the input parts of the dataset, grammatical correctness can often be unimportant, as users can provide inputs that are imperfect in following the language rules. When such cases are possible, it can be beneficial to train the classifier using this method of data augmentation [48].

Figure 3.2: An example application of word shuffling technique
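A minimal sketch of this technique, without the third-party grammar validation step, could look as follows:

```python
import random

def shuffle_augment(sentence, n_variants=3, seed=0):
    """Create new dataset entries by shuffling the word order; a grammar
    checker could be plugged in afterwards to filter the output."""
    rng = random.Random(seed)
    words = sentence.split()          # tokenize into separate words
    variants = set()
    for _ in range(n_variants * 10):  # oversample; duplicates collapse in the set
        shuffled = words[:]
        rng.shuffle(shuffled)
        variants.add(" ".join(shuffled))
    variants.discard(sentence)        # keep only genuinely new sentences
    return sorted(variants)[:n_variants]

print(shuffle_augment("I would like to order a pizza"))
```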

Word replacement Using this technique requires a source of synonyms for all or some of the words from the initial dataset. It also requires some data preprocessing in order to identify the base forms of all words and their synonyms. Replacing all words with a random synonym can alter the meaning of the original sentence, resulting in worse results with the classifier. The most relevant way to approach this problem is to identify a number of key words that can be replaced, as shown in Figure 3.3. When the preprocessing is done, the algorithm generates new sentences for the dataset, replacing the key words with their synonyms. With some additional work during tokenizing, it is possible to apply this technique to expressions or even whole sentences, providing more quality data for the output dataset. One more way of following this approach is using semantic networks [49] that link different words with an “is-a” relation. An example of a semantic network is shown in Figure 3.4. In this case, the words can be replaced with their “parents”, following all the way up along the network, instead of working with synonyms. Additionally, recent research in this area proposes using predictive models [50] to predict the next word of a sentence given its beginning as input. The output prediction can then replace the following word in the sentence, thus forming a different sentence to add to the initial dataset. As demonstrated in Figure 3.3, this method can still create unreasonable sentences, which have to be filtered by a human supervisor before they can be used. In that example, the word “table” was replaced with “chart”, which does not make sense, as a different kind of table is meant.

Figure 3.3: An example application of word replacement technique

Figure 3.4: An example of semantic network [51]
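The key-word variant of the technique can be sketched as follows; the synonym table is a toy example, while a real implementation would draw on a thesaurus or a semantic network such as the one in Figure 3.4:

```python
# Toy synonym table keyed by pre-identified key words (base forms).
SYNONYMS = {
    "order": ["buy", "purchase"],
    "pizza": ["pie"],
}

def replace_augment(sentence):
    """Create one new sentence per (key word, synonym) pair. The output
    still needs human filtering: some replacements change the meaning."""
    words = sentence.split()
    new_sentences = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word.lower(), []):
            new_sentences.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return new_sentences

print(replace_augment("I want to order a pizza"))
```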

Syntax tree manipulation This method manipulates the core structure of a sentence, changing the tense, the clause or the voice of the sentence. It requires a round of Natural Language Processing (NLP), which is a computationally expensive task [52]. It can be applied to a number of sentences, generating altered versions of each while preserving the meaning and the grammatical correctness. To apply the method, we start by generating a syntactic dependency tree for each sentence in our input data. Then we make some transformations to the sentence structure, such as moving from active to passive voice, splitting a sentence containing a clause, or changing the tense of the sentence. An example of such a manipulation is shown in Figure 3.5. When all transformations are complete, we build a new sentence from the mutated syntax tree and add it to our dataset. This method faces problems if the input sentence has grammatical issues: the tree construction could fail or might not reflect what the user meant with the sentence. Thus, the input might need a round of preprocessing before its syntax tree can be manipulated to generate new entries for the dataset.


Figure 3.5: An example of a syntax tree manipulation [53]
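As an illustration of the first step, building the dependency tree, the sketch below uses spaCy (assuming the en_core_web_sm model has been downloaded); the actual transformations, such as the voice change in Figure 3.5, would then operate on this structure:

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def dependency_tree(sentence):
    """Return (token, dependency relation, head) triples: the raw material
    for transformations such as active-to-passive or tense changes."""
    return [(token.text, token.dep_, token.head.text) for token in nlp(sentence)]

for triple in dependency_tree("The chat-bot answered the question"):
    print(triple)
```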

Data collection based methods

Active learning Active learning is based on continual human-bot interaction. The idea of active learning comes from schools, where students are encouraged to actively participate in the learning process besides passively listening to the teachers [54]. This idea has since been applied in the area of machine learning to optimize the interaction between machine learning systems and humans [55–57]. An active learning system requires the help of a human supervisor in order to label its data. Applied to chat-bots, intent-based chat-bots in particular benefit directly from this approach, as each input sentence is mapped to one or more intents (or labels). Other dialogue systems can also benefit from active learning by receiving user feedback about their responses. A common approach to active learning in dialogue systems consists of the following steps. The user starts by sending a message to the bot. The bot then generates a number of responses using its preliminarily trained classifier and sends them to the user in order of decreasing confidence. The user picks one of the responses as “best” or suggests a better response, sending it back to the bot. The pair of input and response is then propagated through the layers of the bot’s network, increasing the probability of a correct response for that particular input. The user sends another message to the bot, and the process repeats [58]. In active learning, the selection criteria of the user are subjective, which is one of the challenges active learning faces in practice. Other challenges include multi-tasking, unknown classes and variable labeling cost [59].
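These steps can be condensed into a sketch like the following, where the `classifier` object with its `rank_responses` and `retrain` methods is hypothetical and stands in for a concrete model:

```python
def active_learning_session(classifier, training_pairs):
    """Run the interaction loop described above on the command line."""
    while True:
        user_input = input("User: ")
        if not user_input:
            break
        # The bot proposes candidates in order of decreasing confidence.
        candidates = classifier.rank_responses(user_input)
        for i, (response, confidence) in enumerate(candidates):
            print(f"{i}: {response} (confidence {confidence:.2f})")
        # The supervisor picks the best candidate or types a better response.
        choice = input("Pick a number or type a better response: ")
        response = candidates[int(choice)][0] if choice.isdigit() else choice
        # The confirmed pair is propagated back into the training data.
        training_pairs.append((user_input, response))
        classifier.retrain(training_pairs)
```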

Crowdsourcing Crowdsourcing can be viewed as simple, straightforward collection of data. Crowdsourcing in machine learning, apart from collecting data, can be used for different learning tasks, such as data labeling, image classification, chat-bot training and others. This method can often be less expensive than other data collection methods [29], and it can be done both online and offline. For offline data collection, a number of workers are supplied with instructions for what kind of data the machine learning task requires. Although it could seem that the process finishes when the workers are done, this is often not true. The crowdsourced data has to be checked to meet some quality standards. There are a few things that can go wrong with data quality. Apart from the possibility of incompetent workers, incomplete or ambiguous guidelines could result in different interpretations of concepts and inconsistent data labeling. This problem has been addressed [60–62] and has to be considered when using crowdsourcing for data collection. Online data collection can potentially use bigger crowds for machine learning tasks. A creative way of online crowdsourcing was introduced by Google’s implementation of reCAPTCHA [63]. Using reCAPTCHA, Google protects its services against automatic web crawlers while at the same time collecting free crowdsourced data for its machine learning tasks for image identification [64]. An honorable mention among applications of crowdsourcing is a chat-bot from 2016 developed by Microsoft and called Tay [65]. Tay was supposed to learn more about language over time by engaging people in dialogue using tweets and private messages. It was designed to emulate observed patterns in subsequent conversations with other users. The faulty part was insufficient input sanitization, which, a few hours after release, made Tay sound just like the internet. Over time, most of the tweeted messages became offensive, racist or sexist, as Tay was learning from random people on the internet. Having done this much damage, Tay was suspended 16 hours after its release, but it taught the ML community a valuable lesson.

When crowdsourcing is applied to dialogue systems, it can prove an effective way of training a chat-bot. A large pool of crowdsourced workers continually interacts with the bot to teach it more natural language and feed it more data for training. Alternatively, the workers can create datasets in order to feed them to the bot as learning or training data [66].

Online learning Online learning has been studied in a number of fields, such as game theory [67] and machine learning [30]. The goal of online learning is making a sequence of predictions about the answer to a defined question. For every given answer, the correct answer is revealed, and the classifier suffers a loss measured by the difference between the two answers. The difference from the active learning method here is that the correct answer is known and there is no supervisor giving feedback. As opposed to regular training methods, where the data is provided in batches, in online learning data is served in a sequential order. The classifier is trained and retrained step by step, getting closer to giving the correct answer at each step. In order for this method to work, there must be correlation between the questions asked; otherwise, the learning might worsen the performance of the classifier instead of improving it. Online learning is useful when the dataset gradually changes over time, or when new data is being streamed over time and has to be taken into consideration as soon as possible [68]. We have not found any work in the area of online learning applied to chat-bots or other dialogue systems. Thus, in this work we will explore how well it can be applied to chat-bots and find out about the advantages and disadvantages that it entails.
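For a concrete feel of sequential training, the sketch below uses scikit-learn's partial_fit interface with made-up data; this illustrates the concept and is not part of the thesis implementation:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# HashingVectorizer is stateless, so it never needs to see the full corpus.
vectorizer = HashingVectorizer(n_features=2**16)
classifier = SGDClassifier()
INTENTS = ["greeting", "order_pizza"]

# Data arrives as a stream; the model is updated one example at a time.
stream = [("hi there", "greeting"), ("one margherita please", "order_pizza")]
for text, label in stream:
    X = vectorizer.transform([text])
    classifier.partial_fit(X, [label], classes=INTENTS)  # incremental step
```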

3.1.2 Training dialogue systems

There is a number of different methods that can be used to train dialogue systems; among the widely used ones are rule-based [69], sequence-to-sequence based [70], reinforcement learning based [71] and hierarchical reinforcement learning based methods [72].

Rule based methods A good example of a system developed using rule-based methods is ELIZA [16]. ELIZA analyses user replies and attempts to match them with defined patterns. User input is analysed using decomposition rules, which depend on certain keywords being present in the input. The response is generated using some of the user input along with reassembly rules corresponding to the used decomposition rules. ELIZA is an early development and therefore has a number of issues, including dependence on keywords, inability to use the context of the conversation, and the difficulty of choosing non-controversial rules. On the other hand, it has laid the foundations for all rule-based dialogue systems. The advantage of using rule-based methods is having control over the selection of responses; this keeps the developed system consistent and ensures that the user is not offended by its responses. A more recent rule-based development is NADIA [73]. NADIA enables an expert to create a set of question-answer pairs that contribute to achieving the defined goals. The conversation is considered finished once all the goals have been achieved. There is a possibility to add optional questions that do not contribute to the completion of goals, but rather provide additional information to the user. The expert-defined model is then used by the NADIA system to engage in a conversation with the user, following a set of hard-coded rules to make the conversation natural and attempt to achieve all the goals.

To summarise rule-based methods: when the desired dialogue flow has a known number of states and user replies, rule-based methods outperform other training methods [20]. However, in scenarios where it is infeasible to enumerate all possible states and replies, it is not effective to use rule-based methods. When the user input fails to match the rules, the system has no choice but to give an “I don’t know, please ask something else”-kind of response. In this work, the developed pizza ordering chat-bot deals with simple dialogue tasks, such as asking for the name of the pizza to order, providing detailed information about the menu, and confirming the order. These tasks contribute to the goal of ordering a pizza and can be completed only one at a time. This makes the rule-based approach a reasonable choice considering the described advantages and disadvantages.

Sequence-to-sequence based methods Sequence-to-sequence learning has caught the attention of researchers since the publication about the RNN Encoder-Decoder used for statistical machine translation [74]. The sequence-to-sequence method described in that article has since been successfully applied to other tasks. Sequence-to-sequence methods, often referred to as seq2seq, are deep learning methods which learn to transform a given input into an output using an encoder and a decoder. The transformation is done according to defined dictionaries which contain words, expressions and sentences mapped to other figures of speech. Apart from the dictionaries, seq2seq methods also make use of the dialogue history, which consists of every word chosen before the current state of the dialogue. A probability distribution over the dictionaries, affected by the history, is the key to choosing the response sequence in seq2seq methods. A notable application of seq2seq methods is translation from English to French, where the method achieved great results compared to other methods [70]. It has later been adopted in building dialogue systems such as chat-bots. For building chat-bots, the problem was defined as translation of a certain user input to a defined output. This was achieved by adding a hidden state between the encoder and the decoder which stores “the meaning” of the user input. During the learning process, the model is taught to map this hidden state to an output sequence; in its nature, this approach is similar to intent-based chat-bots. The difference is that a certain intent is mapped to a defined number of outputs, whereas in seq2seq this mapping is not static and varies depending on the conversation history.

Recent advances in conversational NLP show that seq2seq models trained on a large dataset can create assistants of great quality, capable of answering simple questions and extracting relevant information [75]. However, although seq2seq methods give good results in domain-specific tasks, such as flight booking, models built on them can be difficult to customize when the learning process does not cover all possible dialogue flows, which can lead to incorrect and often confusing responses. The case study chat-bot that we develop in this work has to complete simple tasks in the area of pizza delivery and cannot afford to give confusing or unclear responses, so this approach is not optimal in our case.
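For illustration, the classic encoder-decoder skeleton can be written in Keras as follows; this is a structural sketch with arbitrary vocabulary and dimension sizes, and the training and inference loops are omitted:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, EMBED, HIDDEN = 5000, 128, 256  # arbitrary illustrative sizes

# Encoder: compress the input sequence into a hidden state ("the meaning").
enc_in = layers.Input(shape=(None,))
enc_emb = layers.Embedding(VOCAB, EMBED)(enc_in)
_, state_h, state_c = layers.LSTM(HIDDEN, return_state=True)(enc_emb)

# Decoder: generate the response conditioned on the encoder state.
dec_in = layers.Input(shape=(None,))
dec_emb = layers.Embedding(VOCAB, EMBED)(dec_in)
dec_out = layers.LSTM(HIDDEN, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(VOCAB, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```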

Reinforcement learning methods Reinforcement learning is an older technique initially developed to train robots to move. This machine learning method increases its efficiency over time by using a system of rewards, which can be positive or negative. The model is trained to maximize the reward by learning the best actions to take on each turn of the conversation. Each performed action is evaluated by a supervisor or an algorithm and yields negative or positive reward points based on the expectations. Chat-bots using this method become increasingly efficient the longer the conversation goes, as long as there is a way to evaluate their responses. Reinforcement learning methods gained more attention when combined with new deep learning methods in beating humans at Atari games [76]. It is even possible to further improve the deep learning methods by matching them against each other. But while reinforcement learning is suitable for game tasks, it is notable that in games the diversity of possible actions is limited and well defined. In the case of dialogues, the number of possible responses is large and, depending on the limitations, approaches the number of possible sentences in a language. With this method, as the number of possible sentences (actions) the algorithm discovers keeps increasing, the tasks of response generation become ever more complex.

Using reinforcement learning allows people to make more natural chat-bots, as they continually learn and improve from user feedback. The chat-bots are capable of handling long conversations while building their responses on previously acquired information. With the addition of deep reinforcement learning, the model is able to optimize the future reward while at the same time maintaining such properties of a good conversation as informativity, coherence and ease of answering [71]. However, the reinforcement learning method presents a number of limitations. Firstly, it takes time before the system is trained well enough to uphold a meaningful conversation with a human. Secondly, even for simple conversations with a single goal, it may require a large amount of quality training data due to the complexity of human dialogues. Lastly, grammatical correctness is in question, as pure reinforcement learning methods do not include space for additional grammatical rules on the output. In the case of the case study chat-bot that we develop in our work, the responses can be static and can be defined by a set of rules. This approach could work well, but it is outperformed by the rule-based approach, as the set of rules is not hard to define [20].

Hierarchical reinforcement learning based methods The hierarchical reinforcement learning approach is based on the previously described reinforcement learning with the addition of the temporal abstraction concept [77]. It attempts to solve the problem of dimensionality in reinforcement learning [78], where each dimension represents one action that can be taken in a given state. Using hierarchical reinforcement learning, in order to make a decision in a given state, the model follows the defined domain hierarchy and enters sub-domains with their own policies until the decision can be made. Policies at lower domain levels are optimized for simpler tasks, which together contribute to making a decision at the master domain level. The policies at the master domain level make a choice between using primitive or composed actions to maximize the reward. In the case when a composed action is chosen, the reward is delayed until all the sub-actions are complete and the information is passed back to the master domain. Where reinforcement learning models learn only primitive actions and need to make a decision about the output for each step of the conversation, a hierarchical reinforcement learning model can learn both primitive and composite actions, which decreases the number of dimensions at each level of the model and adds more control to the decision process [79].

Hierarchical reinforcement learning based methods outperform standard reinforcement learning methods in training speed and output precision. They use several layers of abstraction to handle the increasing number of parameters, which enables their use in large dialogue systems where standard reinforcement learning would be infeasible to apply.

Although hierarchical reinforcement learning solves some problems, it opens a number of new questions concerning the handling of deeper hierarchies, designing reward strategies for lower-level hierarchies, and the development of an automatic way of dividing complex actions into a hierarchy of composed actions [80]. For the same reason as in the case of the reinforcement learning method, in this work we stick with the rule-based approach for the development of the chat-bot.

3.2 Contextuality in intent-based chat-bots

The evolution of NLP and NLU has helped drive chat-bot technologies to a new stage, where they can sometimes successfully replace humans in simple tasks such as hotel booking or customer support. Chat-bots can be categorized in a number of different ways. They can be developed to handle chit-chat or be task-oriented. They can deal with open or closed domains. They can be intent-based or flow-based. While a simple chat-bot implementation can be achieved through simple pattern matching with defined responses, such chat-bots are very limited and cannot handle any degree of natural conversation with humans [81]. On the other hand, more sophisticated chat-bots using machine learning achieve great results in natural conversations with humans while limited to a specific domain and conversation flows [82].

One type of chat-bot well suited for goal-oriented tasks is the intent-based chat-bot. Such bots operate with utterances, intents and entities [83]. When having a conversation with a user, an input from the user is considered an utterance and will be processed by the bot in order to output a response. An utterance is mapped to one or more intents, where an intent explains the motive of the user behind the given input. Intent is about what the user wants at a specific step of the conversation, and it is often defined as a verb or a noun, such as “latest_news” or “set_alarm”. Utterances may contain entities which provide additional information within the intent. An example of such information could be “Monday” with the intent “weather_on_day” or “10AM” with the intent “set_alarm”. The output of an intent-based chat-bot is usually a static list of one or more elements, from which one is chosen at random, but it can differ depending on the entities within the utterance. While having semi-static output, with the help of machine learning the chat-bot is able to comprehend a large diversity of inputs, successfully mapping them to the intents and providing relevant responses to the user.
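In practice, an intent definition for such a bot often takes a form similar to the following; this is a hypothetical fragment in the spirit of the training data in Appendix A, not a verbatim extract:

```python
intent = {
    "tag": "order_pizza",
    # Utterances that should map to this intent; $pizza_name is an entity slot.
    "patterns": [
        "I would like to order a $pizza_name",
        "Can I get a $pizza_name please",
        "One $pizza_name to go",
    ],
    # Semi-static output: one of these responses is chosen at random.
    "responses": [
        "Sure! One $pizza_name coming up.",
        "Great choice, adding a $pizza_name to your order.",
    ],
}
```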

Although an intent-based chat-bot is able to digest all kinds of inputs, it struggles with context-specific ones [84]. The bottleneck is when the same input in different contexts requires completely different responses. When the chat-bot is taught to recognize the response, it cannot take the context into consideration out of the box. This problem can be fixed by adding contextuality to the chat-bot in the form of a state machine. This technique comes from another type of chat-bot: the flow-based one. A flow-based chat-bot uses a predefined conversation flow, which can be visualized as a flowchart, for instance. When a user starts the conversation with the bot, she/he is guided through the states step by step until a certain defined end-goal state is reached. This restrictiveness has its pros, such as clear abilities, maintained context and predictability, as well as its cons, such as limited options, unnatural conversation and difficulties with handling complex requests.

By combining the flow-based approach with the intent-based approach [85], we hope to mitigate some of the cons while at the same time maintaining the pros of both approaches.

User inputs can be divided into contextual and non-contextual. Contextual inputs, such as short answers to questions asked by the bot, can be handled by the flow-based part of the chat-bot. This allows the classifier to limit possible hits to the ones residing within the context. Non-contextual inputs, such as asking about the available menu, can be handled by the intent-based part of the chat-bot. In this case, the classifier excludes the contextual inputs and only takes into consideration the rest of the defined intents. In this structure, each transition in the state machine corresponds to one or more defined intents, while the intents may have two properties: context-dependent and context-changing. Intents without those properties reside outside of the state machine and can be accessed regardless of the current state. A small sketch of this combination is given below.
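As a minimal sketch of this structure, the following hypothetical Python fragment restricts the classifier's candidate intents to the current context plus the context-free intents; the state names, intent names and the classify interface are illustrative assumptions:

# Hypothetical sketch of combining the flow-based and intent-based parts.
# State names, intent names and the classifier interface are illustrative.
STATE_INTENTS = {
    "start":      {"order_pizza", "order_drink"},  # context-dependent intents
    "await_size": {"choose_size"},
}
GLOBAL_INTENTS = {"show_menu", "greeting"}         # reachable from any state
CONTEXT_CHANGING = {"order_pizza": "await_size"}   # intent -> next state

def handle_input(state, utterance, classify):
    # Limit possible hits to intents residing within the current context,
    # plus the intents defined outside of the state machine.
    candidates = STATE_INTENTS.get(state, set()) | GLOBAL_INTENTS
    intent = classify(utterance, candidates)       # assumed classifier call
    # Context-changing intents advance the state machine.
    next_state = CONTEXT_CHANGING.get(intent, state)
    return intent, next_state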

Context is a generic term which can be operationalized in several ways when it comes to chat-bots [86]. One type is situational context, the context of the user's location. It is about knowing the physical location of the user or which part of a website or application she/he is currently in. The chat-bot can use this information in order to provide correct responses, for example when the user asks about restaurants nearby. Another type is lingual context, the context of the conversation flow. Different words in a sentence can have different meanings depending on what was mentioned earlier in the sentence or some sentences ago. A lot of information can come from the particular experience of a user or things that surround her/him. This presents a challenge to chat-bots, as they are only able to respond based on information they can access. One more type is persistence context. This context is about the current state of the conversation. Persistence context helps bind inputs and outputs into a conversation flow, where some of them depend on others. As an example, a simple input “Yes” does not carry any meaning or intent without the previously asked question. The last type is emotional context. Different words or their order can be used to emphasize the emotions of the user within an input. Intentionally programmed chat-bots would be able to recognize this context and respond in an appropriate way, while other chat-bots might misinterpret the user's intent. An example of such context is providing sarcastic or rhetorical inputs to the chat-bot.

3.3 A concept of continual learning

One of the contributions of this work is the application of the concept of continual learning [31] to chat-bots. It adopts the idea of online learning with the difference that the data provided for this learning is crowdsourced [32]. The Literature review subsection covers the mentioned concepts of online learning and crowdsourcing. The idea is to keep training the chat-bot as the conversation goes while keeping the same dictionary, state machine and list of intents. Retraining the bot continually enables it to learn more ways of expressing the defined intents and to adapt to different sentence structures. Doing it in real time allows the bot to change its responses should the user try to clarify her/his request.

The retraining affects the probabilities within the classifier, which results in some requests hitting the threshold of identification which otherwise would result in a standard “Please ask something else” kind of message. Figure 3.6 demonstrates the steps of an interaction between a user and a continual learning chat-bot.

Figure 3.6: Steps of an interaction between a user and a continual learning chat-bot

Having described the idea, we now clarify it by providing a step-by-step approach to continual learning. Assuming we have a trained intent-based contextual classifier, we proceed in the following steps. The process is initiated by the user sending some input to the bot. The bot processes the input, considers the probabilistic distribution of matched intents, picks the most likely one and prepares the response to the user. The bot then sends its response to the user and continues to analyze the request. If the identified intent probability exceeds the defined threshold, the bot matches the input with the intent and re-calibrates its classifier to learn this new combination. If the intent probability does not exceed the threshold, this combination is discarded as potentially imprecise or harmful. Choosing a high threshold allows for more precise learning, while lowering it allows for more learning samples but can at the same time teach the classifier incorrect associations. This number has to be carefully picked and tested before using it in real case applications. The disadvantage of retraining in real time is unpredictability [87]. It is hard to tell how the classifier will react to different inputs after retraining, as users receive some control over training the bot. It is possible that after a number of retraining cycles the bot can “forget” some of the intents it was previously trained on. In the field of artificial intelligence this effect is called “catastrophic forgetting” [88] or “catastrophic interference” [89], and we discuss it in more detail in the following subsection. An alternative way of using this approach is to collect and process the data in some way before using it to calibrate the classifier. This way, no unwanted data will be used for training, and the results of the training process become more controlled and predictable.
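As an illustration of these steps, the following hypothetical sketch stores each confident input-intent pair and re-calibrates the classifier with it; the model interface (predict, retrain) and the threshold value are assumptions, not the interface of any particular library:

# Hypothetical sketch of the continual learning loop described above.
# The model interface and the threshold value are illustrative assumptions.
THRESHOLD = 0.8  # must be carefully picked and tested before real use

def on_user_input(model, utterance, new_samples):
    distribution = model.predict(utterance)  # {intent: probability}
    intent, probability = max(distribution.items(), key=lambda kv: kv[1])
    if probability >= THRESHOLD:
        # Confident match: learn the new utterance-intent combination.
        new_samples.append((utterance, intent))
        model.retrain(new_samples)           # re-calibrate the classifier
        return intent
    # Below the threshold: discard the combination as potentially
    # imprecise or harmful and fall back to a default response.
    return None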

Having provided the description of the continual learning concept, in the following section we describe its implementation with a case study chat-bot. During the testing of the implementation in different scenarios, we will use a small handcrafted dataset and record the results, which we present in the Results section. To validate this approach, we will compare the results to the baseline scenario without continual learning in the Analysis section.

3.4 Concerns about catastrophic forgetting

The concept of continual learning adds a new challenge to neural networks called “catastrophic forgetting” [90]. This challenge appears due to the tendency of existing trained skills to be lost as new data is added into the classifier. In intent-based systems this happens specifically when the network is trained further on one of the intents while lacking training for the rest of the intents. This causes the weights in the network to change in order to accommodate all possible scenarios for the trained intent, which in turn results in a reduction of probabilities for all other intents. This reduction will sometimes decrease a probability below the allowed threshold, making the network fail to recognize “close” inputs. Consider the example of a case where the probability of a certain input-intent combination is 0.81 and the set threshold is 0.8. After applying continual learning on other intents, new information will be considered when calculating the probability for the previously mentioned case, thus decreasing the probability below 0.8 and rendering the input unrecognized [91].
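The numeric example above can be restated as a short sketch; the intent names and probability values are the hypothetical numbers from the text:

# The 0.81-versus-0.80 example restated in code (hypothetical values).
THRESHOLD = 0.8
before = {"order_pizza": 0.81, "show_menu": 0.10}  # before further retraining
after  = {"order_pizza": 0.79, "show_menu": 0.12}  # after learning other intents

for label, dist in (("before", before), ("after", after)):
    intent, p = max(dist.items(), key=lambda kv: kv[1])
    status = "recognized" if p >= THRESHOLD else "unrecognized"
    print(label, intent, p, status)  # before: recognized; after: unrecognized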

This problem has been investigated, and there are a number of approaches to tackle it. Earlier attempts suggest storing previous data and retraining the system every time new data comes in [92]. However, such methods require a large amount of memory for storing old data and more computational iterations, which proves inefficient in large systems due to the substantial increase in retraining time. Several newer approaches include “learning without forgetting” [93], “progressive neural networks” [94], “elastic weight consolidation” [88], etc. Considering the research done on this topic, in this work we will not implement any of the methods to overcome catastrophic forgetting and instead concentrate on the concept of continual learning in chat-bots. This concludes the literature overview section and completes O1 specified among the thesis objectives in the Introduction section.


4 Implementation

This section will describe the implementation of our custom case study chat-bot and its modules, which is necessary to address the thesis objective O4. The diagram providing an overview of this section is shown in Figure 4.1.

Figure 4.1: Overview of the Implementation section

4.1 Using Python for machine learning tasks

Machine learning is a category of artificial intelligence (AI) methods that gives machines the ability to learn without being explicitly programmed [95]. This means that the developed software changes itself when exposed to new input. The change is based on the training data supplied to a certain mathematical model that makes decisions about the flow of the program [96]. Machine learning algorithms have over time been used for a number of applications such as virtual assistants, face recognition, speech recognition, spam filtering, customer support, and advertisement recommendations. These applications build on a number of software tools of varying maturity.

Tools differ in quality, amount of functionality provided and ease of use. High-level tools such as Watson Assistant [40] or Cloud AutoML [97] offer the possibility of classifier configuration that does not require knowledge about the technologies or algorithms used to create the classifier manually. Such commercial tools can be used to achieve great results in pre-programmed use cases without deep knowledge of machine learning.

These high-level tools are usually built using low-level tools, i.e., certain programming language libraries. Using low-level tools allows for a wider spectrum of customization, but also requires deeper knowledge about machine learning and the tool used. Low-level tools can be categorized by the programming language used. According to the Developer Economics survey report [98], the most used languages for machine learning tasks are the following: Python, C/C++, Java, R, and JavaScript. The slide presenting the languages and their application areas is shown in Figure 4.2.

Figure 4.2: Programming languages and their application areas in machine learning [98]

This figure shows that there is no such thing as the “best language for machine learning”. Not all of the listed languages are suitable for every machine learning application, and some less common applications might have better support in a language not listed here. The listed languages can be grouped by common areas of application, and every area often favors one language over another. For example, from the figure we can see that Python is widely used for sentiment analysis and natural language processing, while C/C++ is a preferred language for developing AI in games and network security applications. Java is a good choice when dealing with customer support management, and R can be used when dealing with AI in bio-engineering. This statistic only tells us the languages preferred by the community, but from this data we assume that popular application areas receive the best tool support in their preferred languages.

Currently, Python is the language used the most for a wide variety of applications; we can confirm this by looking at the number of libraries available in Python and the advantages of the language itself [99]. Besides the libraries for scientific computation such as NumPy, Matplotlib and Pickle, it also includes a number of machine learning libraries such as SciPy, Scikit-learn, Theano, TensorFlow, TFLearn, Keras, PyTorch and Pandas. Considering that Python is the most popular language for applications in the areas of NLP and chat-bots, and that it contains all the tools we need to implement a case study chat-bot for the experiments, we favor it over other languages in this work.

(27)

4.2 TensorFlow and TFLearn

4.2.1 TensorFlow: An end-to-end open source machine learning platform

TensorFlow is an open-source platform for machine learning. It provides developers with a number of tools and resources that allow them to create and deploy machine learning based applications. The library is regularly updated with new features to support state-of-the-art methods in machine learning [100]. TensorFlow has a graphical interface allowing developers to create and test models directly in the browser; alternatively, they can use it as a library for Python. There is also TensorFlow.js, a framework for Node.js applications, and TensorFlow Lite, a library for mobile and embedded devices. The TensorFlow documentation provides comprehensive guides and examples for getting started with machine learning as well as detailed documentation for professionals [101]. TensorFlow, as the name indicates, runs computations with tensors. In terms of machine learning, a tensor is a generalization of vectors and matrices to higher dimensions, which is represented as an n-dimensional array of a base datatype, such as int32, float32 or string.

Programs built with TensorFlow work by building a graph of tensors, where some tensors are set up by the developer and others are computed based on the already defined tensors. All elements of a tensor always have the same data type. When operating with tensor shapes (or sizes), most functions produce tensors of known shapes if the shapes of the input tensors are known; otherwise, it is only possible to find the shape of a tensor at run time [102]. Figure 4.3 shows an example of tensors of multiple dimensions. A tensor is effectively an input and output of a layer in a neural network model, where the network can have multiple layers. As we are using Python in this work, TensorFlow is a solid choice of library for our case study application: it has all the tools to create a chat-bot and implement continual learning.

Figure 4.3: An example of a tensor in different dimensions [103]
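As a small code illustration of tensors of different ranks (a sketch using the standard TensorFlow API; the values are arbitrary):

import tensorflow as tf

# Tensors of different ranks, each with a single base datatype.
scalar = tf.constant("hello")            # rank-0 tensor of type string
vector = tf.constant([1.0, 2.0, 3.0])    # rank-1 tensor of type float32
matrix = tf.constant([[1, 2], [3, 4]])   # rank-2 tensor of type int32

print(vector.shape, vector.dtype)        # (3,) <dtype: 'float32'>
print(matrix.shape, matrix.dtype)        # (2, 2) <dtype: 'int32'>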

4.2.2 TFLearn: A deep learning library for TensorFlow

TensorFlow by itself provides access to generic methods for model manipulation, but when it comes to customisation, the process can quickly become complex. To abstract from the complexity and concentrate on running experiments, there currently exist two popular libraries: TFLearn [104] and Keras [105]. TFLearn is a library built on top of TensorFlow that exposes TensorFlow functionality through a high-level API. This allows for faster setup and experiments with datasets, layers and different activation functions. One of the advantages of TFLearn is the possibility to use it simultaneously with TensorFlow. This allows for a combination of ease of use and in-depth configuration when developing neural network models. Keras is also a library for Python running on top of TensorFlow. While exposing TensorFlow functionality similarly to TFLearn, Keras cannot be used directly alongside TensorFlow. Keras comes with a number of initial datasets for testing and prototyping, and it is a solid choice for beginners. Both TFLearn and Keras are suitable for the chat-bot implementation, so here the decision is more about the preference and experience of the developer. For this project, we choose TFLearn, as more examples of various dialogue system implementations are available in this case. This decision will let us speed up the development of the chat-bot itself, so that we can concentrate on the implementation of the continual learning concept.
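As a preview of how this looks in practice, a minimal TFLearn model definition is sketched below; the layer sizes and input dimension are placeholders, not the configuration of our case study bot:

import tflearn

# Minimal TFLearn network definition (layer sizes are placeholders).
net = tflearn.input_data(shape=[None, 48])   # e.g., bag-of-words vector size
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 6, activation="softmax")  # one unit per intent
net = tflearn.regression(net)

model = tflearn.DNN(net)
# model.fit(train_x, train_y, n_epoch=1000, batch_size=8)  # training call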

4.3 Software prerequisites for running the case study chat-bot

Here we summarise our choice of tools. To implement the chat-bot, we will use the Python programming language. As for the OS, we will run Ubuntu Linux with the latest available kernel version. In Python, we will use TensorFlow and TFLearn for machine learning tasks and NLTK for word tokenization and stemming. We will also use NumPy for dealing with the arrays supplied as input to TFLearn. To install all the Python packages in our OS of choice, we will use the latest version of PIP. The software and library versions used for our case study chat-bot implementation are specified in Table 4.1.

Table 4.1: Versions of used software and libraries

Name                                  Version
Ubuntu Linux                          18.04.4 LTS
Linux Kernel                          5.3.0-46-generic x86_64
Python                                3.6.9
TensorFlow                            2.0.0
TFLearn                               0.3.2
Natural Language Toolkit (NLTK)       3.4.5
Panda                                 0.3.1
Package Installer for Python (PIP)    19.3.1
NumPy                                 1.17.4
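To reproduce this environment, the Python packages from Table 4.1 could be installed with pinned versions (a sketch; it assumes the PyPI package names match the table entries, and omits the Panda entry since its exact package name is uncertain):

pip install tensorflow==2.0.0 tflearn==0.3.2 nltk==3.4.5 numpy==1.17.4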

4.4 Initial preparations before the implementation

There are a number of different chat-bots that can be created using the latest machine learning technologies. Some act as virtual assistants, others provide customer support for businesses. For our case study project, we will create a chat-bot that allows the user to order pizza and/or drinks. The development process begins with installing the required packages. We will use PIP to install NumPy, NLTK, TensorFlow and TFLearn in our OS:

pip install numpy
pip install nltk
pip install tensorflow
pip install tflearn

When the required packages are installed, we have to define our initial training data. With TFLearn, we can specify a percentage of the dataset to be used for training and validation; thus, we will define and create the starting dataset. The preview of the dataset enumerating
