
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Conversational agent as kitchen assistant

BEATA RYSTEDT
MIA ZDYBEK

KTH
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Chatbot med konversationsgränssnitt som hjälpreda i köket
(Chatbot with a conversational interface as a helper in the kitchen)

BEATA RYSTEDT
MIA ZDYBEK

KTH
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Abstract

Chatbots, also called conversational agents, with speech interfaces are being used to an ever greater extent, but there are still many areas that are not completely explored. The idea of this project was born out of the belief that there is a need for an assistant in the kitchen that is able to search for recipes, answer questions regarding them, and guide and assist the user throughout the cooking process, all through conversation, since the hands are busy. This paper begins with an introduction to the subject of conversational agents and the related technology; then similar, already existing studies and methods are presented with their pros and cons. This is followed by an in-depth explanation of how the program was constructed into a working kitchen assistant. Lastly, the users' experiences of the performance and usability of the program were evaluated through tests and discussed. It turns out that conversational agents definitely can be integrated in the kitchen, and according to several sources, in a few years they will be implemented in all possible areas and change the technology of our time.


Sammanfattning

Konversationsrobotar med talgränssnitt används i allt större och större utsträckning men det finns fortfarande många områden som inte är helt utforskade. Idén till det här arbetet föddes ur uppfattningen att det existerar ett behov av en hjälpreda till köket som kan söka recept, svara på frågor kring receptet och vägleda och hjälpa användaren genom hela matlagningsprocessen i muntlig form eftersom händerna är upptagna med annat. Det här arbetet börjar med en introduktion i ämnet kring konversationsrobotar och tekniken bakom, sedan presenteras liknande arbeten och metoder som redan existerar inom området. Sedan följer en djupdykning i hur det framtagna programmet i detta arbete utvecklats fram till en fungerande matlagningsassistent. Till slut presenteras och diskuteras upplevelsen och användbarheten av konversationsroboten hos människor baserat på tester som gjorts. Det visar sig att konversationsrobotar mycket väl kan vara av användning i köket, och enligt flera källor kommer de att inom en snar framtid lavinartat implementeras i alla möjliga områden och förändra tekniken i vårt samhälle.


Contents

1 Introduction
  1.1 Background
    1.1.1 Conversational agents
    1.1.2 Natural Language Processing
    1.1.3 Speech interface
    1.1.4 Recipes and cooking
  1.2 Project
    1.2.1 Problem situation
    1.2.2 Goal
2 Related works & tools
  2.1 Related works
    2.1.1 Similar programs
    2.1.2 Related works
  2.2 Tools
3 System
  3.1 Overview
  3.2 Dialogflow platform
    3.2.1 Created intents
    3.2.2 Created entities
  3.3 Recipe class
  3.4 Speech interface
  3.5 Natural language processing
  3.6 Web search & web scraping
  3.7 Operate in recipe
4 Evaluation
  4.1 Method
  4.2 Results
5 Discussion
  5.1 Evaluation
  5.2 System limitations
    5.2.1 Recipe scraping and selecting recipes
    5.2.2 Dialogflow
  5.3 Improvements & comparisons
  5.4 Ethics and sustainability
6 Conclusion


1 Introduction

In this chapter, background about relevant technologies and about recipes and cooking is presented. Then, the problem situation from which the idea of the project emerged is discussed, and finally the goal and purpose of the project are presented.

1.1 Background

1.1.1 Conversational agents

Conversational agents, also called chatbots, combine conversational interface technology and Natural Language Processing, and sometimes also web-based services, to deliver interactive speech- or text-based dialogs [12]. The accessibility and usefulness of conversational agents are constantly increasing, and they are applied in more and more areas of people's everyday lives. Conversational agents from major companies, like Apple with Siri, Amazon with Alexa and Google with Google Assistant, have many functions such as home automation, music control, weather checks, ordering food, reading the news, searching the web and more. Developers are constantly evolving these complex agents, and more and more "skills" are added. In addition to these multi-use agents, there are assistants that specialize in single tasks. One example of a platform that can be used to develop such specialized agents is Google's Dialogflow (previously api.ai), which lets users compose their own agents and then integrate them into their own app, website or platform.

Chatbots can have no specific skills besides conversing, like a conversational chatbot, or specialize in one field, for example controlling an audio player or kitchen appliances. They can also combine skills to make all-round conversational assistants. A conversational bot's purpose is to entertain the user and simulate a conversation without a specific goal. A task-oriented bot's purpose is to interpret the request and perform the task the user is asking for. Most task-oriented chatbots also have some conversational skills. Chatbots of varying degrees of intelligence have existed for years. The first chatbot appeared in the 1960s and used simple keyword matching to generate an answer [20]. Chatbots today have evolved and are using technologies such as pattern matching, keyword extraction, NLP, machine learning and deep learning.
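The keyword-matching technique of those early chatbots can be sketched in a few lines of Python (a toy illustration with made-up rules, not any historical system):

```python
# Toy keyword-matching bot: scan the input for known keywords
# and emit a canned reply, in the spirit of 1960s chatbots.
RULES = {
    "recipe": "What would you like to cook?",
    "hungry": "Let me search for a recipe for you.",
    "hello": "Hi! I am your kitchen assistant.",
}

def reply(utterance: str) -> str:
    words = utterance.lower().split()
    for keyword, answer in RULES.items():
        if keyword in words:
            return answer
    return "Sorry, I did not understand that."

print(reply("Hello there"))
print(reply("I am hungry"))
```

No linguistic understanding is involved; any input without a known keyword falls through to the fallback answer, which is exactly the limitation that NLP-based agents address.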

1.1.2 Natural Language Processing

A conversational agent uses Natural Language Processing, NLP, to perform interactive dialogs with a user. NLP draws on computer and information sciences, linguistics, mathematics, electrical and electronic engineering, artificial intelligence and robotics, psychology, and other areas to explore how computers can be used to understand and manipulate natural language text or speech, by gathering information about how human beings understand and use language [6]. The aim of NLP research is to develop appropriate tools and techniques to make computer systems able to understand and manipulate natural languages to perform the desired tasks.

1.1.3 Speech interface

A speech interface allows the user to interact with a device through speech instead of with a mouse, keyboard or similar input device. Speech interfaces consist of two main parts: speech recognition, where an acoustic signal is transformed into textual words, i.e. the user's speech is recognized by the computer, and speech synthesis, which transforms text into speech. Even though a speech interface can transform speech to text and text to speech, it cannot in itself understand what the user is saying.

Speaking is a highly natural process for humans, while typing on a keyboard or clicking a mouse is not. Speech interfaces in technology have therefore been a hot topic for over 20, even up to 40, years [11]. People talked about how it would change the way we interact with our technology: our telephones, computers and other connected devices. Through the two parts of speech interfaces, speech recognition and speech synthesis, users would be free of the constraints of pointers, keyboards and even screens, and instead be able to use their voice to control the devices. However, for a long time the technology was not evolved enough to provide a pleasant interactive experience for the user; the synthesized speech was unpleasant to listen to, and the recognition did not properly understand acoustic input. With time, the technology improved, and today speech interfaces are widely spread and integrated in a variety of devices and programs, including but not limited to smartphones, computers, smart-watches and speakers.

1.1.4 Recipes and cooking

With the first recipes written on cave walls, followed by stone tablets and parchment rolls, today's collections of recipes are quite different [15]. Cookbooks have been around for about 2000 years, with the first collection of recipes written down in a cookbook by Marcus Gavius Apicius. Even though web-based recipe sources are becoming increasingly popular, cookbooks are still massively relevant: the most popular cookbook in the United States in 2016 sold over 400,000 copies [18].

However, the accessibility and low cost of recipes on websites and apps are not easily overlooked. Most big grocery stores have both apps and websites where they present recipes, and many recipe websites make sure to also have an app version of their recipe database. When looking for inspiration and related works for this project, a few apps with audio functions were even discovered. Mondelez Sverige AB (Philadelphia) has made an app called "Voice Cooking App" where users can search for recipes with audio commands and use their voice to get guided through the recipe, but it does not read anything out loud itself. An app called SideChef was also found, which reads you the instructions of a recipe and listens for audio to know when to proceed to the next step of the recipe. These apps are both later compared to the project program to deepen the understanding of conversational interfaces integrated into the area of cooking.

1.2 Project

The purpose of this project is to evaluate how conversational agents can be used to assist in cooking and recipe searches, by creating and evaluating a conversational agent with a speech interface for said purpose.

1.2.1 Problem situation

Modern conversational agents are widespread and include a wide variety of functions, but some areas are still unexplored. The most advanced conversational agents


can order food for you and search the web for recipes, but other than that they are not particularly useful when it comes to food and cooking. Imagine you are in the kitchen cooking from a recipe on your phone or tablet and your hands are dirty.

You need to know the next instruction of the recipe, how much of an ingredient to use or how long you should knead the dough, but the screen is locked on your device. You have no choice but to wash and dry your hands, or just make something up and hope it turns out well. Now imagine you are walking home from school or work and are wondering what to cook for dinner. It is freezing outside, so you would rather not take out your phone and take off your gloves. You could use your phone's built-in conversational assistant to help you search the web for recipes, but you would still need to take out your phone to see the results and choose a recipe. These two problem situations led to the decision to develop a conversational assistant that, completely through conversation, makes it possible to search for and select a recipe and to get information about the recipe in all stages of the cooking process.

1.2.2 Goal

The goal of the project is to develop a conversational assistant that can eliminate all need for other interfaces and guide the user through all stages of recipe searching and cooking, and then to evaluate this use of the conversational agent technology.

The assistant should have functions such as saving the ingredients of a recipe as a grocery list on the device, going back and forth through a recipe, and providing information about the amount of and instructions for an ingredient. The evaluation will then be used to determine how people think conversational agent technology can be useful and helpful in the intended areas, whether the developed agent works as expected, and to explore possible improvements and extensions of the agent.


2 Related works & tools

In this section, programs similar to the project program will be discussed, as well as related works about conversational agents and their opportunities and possibilities. Then, the tools used in the project will be presented.

2.1 Related works

Conversational agents are all around and can be built in a variety of different ways. Here, two cooking apps with conversational and visual interfaces will be presented and their features discussed. Then, similar works about conversational agents and different ways of implementing a conversational agent are presented.

2.1.1 Similar programs

When researching for this project, two applications similar to it were found. They are both recipe and/or cooking apps with conversational interfaces of different complexity, and they were used as inspiration for some of the features of the project.

Voice Cooking App is an application that can be downloaded to smartphones, where users can search for recipes with audio commands and use their voice to get guided through the recipe. These features are similar to what is attempted in this project, but the app has a much simpler conversational interface and Natural Language Processing usage, in the way that it only responds well to one-word commands. For example, when a user wants to search for "Cheesecake", he/she just says "cheesecake" when on the search page, or when they want to go to the next step, they have to be "in" the recipe and then say "next". This makes the app different from this project in the way that it has to be used together with its visual interface, and the interaction with the app is less natural and conversational. The Voice Cooking App also does not read anything out loud; the user reads the instructions and ingredients on the screen of their smartphone, which of course also requires the visual interface.

A visual interface is not a bad thing and can be useful for increasing understanding of the instructions, or for providing an alternate source of information if the user does not understand an ingredient or what a certain kitchen gadget is. However, a visual interface can be limiting in some cases, for example when a user wants to search for a recipe without taking the phone out of his or her pocket because of cold weather. An idea that came from examining this app was to have a visual interface, but only as an addition to, and not instead of, the functions of the project program. This way the user can choose whether or not to use the visual interface, and has the option to just use the program with the conversational interface. Other features that the Voice Cooking App has are integrating a timer into instructions that require keeping track of time, and functions for making notes on the recipe and sharing it with other users.

The other app that was found is called SideChef. This app combines the conversational and visual interfaces as discussed above by reading the instructions of a recipe, while showing a picture related to the instruction, and listening for audio to know when to proceed to the next step of the recipe. Similar to Voice Cooking App, this app seems to only accept short voice commands like "next" to show the next instructions. It does not use voice control for recipe searches or other functions than listening for when to go to the next step. To only have functions for short


commands could be considered "simple" and "boring", but it comes with one clear advantage: easy interpretation. If the user knows they are expected to just say "next", and the program only accepts "next" and nothing else, NLP is not necessary, and the disadvantages and problems that can come with NLP disappear. Also similar to Voice Cooking App, the SideChef app has functions like keeping track of time, rating recipes and sharing photos and recipes with friends and other users. A rating function could be considered useful because users get information about what previous users thought of a recipe. Integrating a timer into the instructions when it is relevant also eliminates the need for the user to switch programs in order to set a timer. Even though most smartphones have built-in conversational agents that can set a timer, it can cause problems if apps are switched automatically.

In conclusion, the lessons learned from Voice Cooking App and SideChef are that it could be a good idea to integrate a visual interface into the program, to integrate a timer into relevant instructions, and to have functions for rating and sharing recipes. This will not be pursued in this project, but could be part of future improvements.

2.1.2 Related works

Conversational agents can be implemented in several ways. This section will provide an insight into some similar works and methods, and also an idea of the existing possibilities regarding conversational agents.

Speech recognition with deep learning

A paper related to this thesis project is about speech recognition implemented with deep learning [7]. Since the conversational agent created in this project uses speech recognition, it is of relevance to have an insight into how its realization can vary. The aim of the mentioned paper was to research how speech input can be translated into text format with deep learning.

Deep learning is a way of establishing rules for processing data with the help of multiple layers of non-linear information, which are varying layers of abstraction. The definitions are created at the bottom level and then, with the help of hidden layers, an output is generated, as visualized in Figure 1 below. One example is that the input is a vector and the different layers are operators processing the vector into an output of the desired form. Deep learning connections, or neural connections, and their algorithms can be structured in many ways, for example as recursive neural networks, convolutional neural networks and deep belief networks.


Figure 1: Visualization of Deep Learning [2]
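As a toy illustration of this layered processing (a hand-written sketch with made-up weights, not the referenced paper's network), an input vector can be passed through a hidden layer with a non-linear activation to produce an output:

```python
def relu(v):
    # Non-linear activation applied elementwise.
    return [max(0.0, x) for x in v]

def layer(weights, bias, v):
    # One dense layer: weight matrix times input vector, plus bias.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

# Made-up weights for a 2-input, 2-hidden-unit, 1-output network.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]

x = [2.0, 1.0]
hidden = relu(layer(W1, b1, x))   # bottom level -> hidden layer
output = layer(W2, b2, hidden)    # hidden layer -> output
print(output)
```

Training a network means adjusting the weight matrices so that the outputs match the desired targets; the supervised and unsupervised techniques described below differ in how those targets are obtained.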

The techniques of implementing deep learning are divided into two parts: supervised and unsupervised. Supervised deep learning is based on training examples of input and output layer pairs, where the category of the input is known. Unsupervised training, on the other hand, is training where the input is not labeled, and so there is no direct way of evaluating the accuracy of the algorithm. The advantage is that the classification does not have to be done in advance, which saves time.

Usually speech recognition is implemented with a combination of these two methods: first unsupervised pre-training, which increases the efficiency of the supervised training by initializing the weights of the networks (pattern analysis). The speech recognition created in the study was very sensitive to noise in the input and did not perform as desired.

Conversational Agents Implemented with Neural Networks

One similar project that was encountered was a chatbot created with the help of a neural network-based model, which is a way of implementing deep learning, as explained above [17]. It means that the system is teaching and learning by itself, trying to resemble biological neural networks such as the ones inside the human brain. In other words, the model's training is based directly and only on the conversational data. As a consequence, neural networks require a lot of conversational training before the actual task-solving phase can start. On the other hand, the method does not entail any purpose-focused presets, meaning that once the model exists it can be used for various areas.

Figure 2: Neural networks [19]


In the study, two types of conversational agents were discussed, divided into retrieving and generating conversational agents. Retrieving agents are controlled by machine learning with databases as their foundation; this is the kind of agent created in this project. It means that the NLP responds to the invocation based on an existing set of information. The clear benefit is that it is easier for the program to determine the context and take the relevant action, but the downside is that it requires a large amount of work to construct the database, and that there is no way to assure that the program will not stumble upon something that does not fit in the database and therefore cannot be interpreted by it.

For the generating agents, such as the one evaluated in the study, the answers and suitable actions are governed by earlier conversations, and generated completely based on the user's input. The advantage of this is that all kinds of inputs can be handled by the program, which with much training presumably makes the program efficient, but the downside is that there is a big chance the chatbot will offer wrong, irrelevant or incomprehensible support.

The conversational agent in the study was trained to assist the user in booking a restaurant through sequence-to-sequence training, in this case learning how to map between input and appropriate answer. Some problems with such systems are that they have trouble handling diversity of utterances; they can tend to display commonly used phrases from the training, like "yes", "no" and "thanks", to maximize the probability of success even though they are not related to the conversation; and the consistency of the agent's output is unstable due to inputs from different users and variation in language. The produced agent had trouble with displaying options and providing extra information, with 0% per-conversation accuracy. This indicates that neural networks such as the one created in this study require a large amount of data to work properly. Some improvements were presented in the form of displaying information about how sure the system was about the answer, and training the program to detect when it is out of context.

Sentiment based Chatbots

Another way to create a conversational agent is by constructing it based on sentiment; in other words, the agent's reply is determined depending on the attitude of the input [4]. It could be the feeling of the user's input, the emotional reaction the user wanted to provoke by the input, or the user's evaluation of something.

One paper regarding this was done on short texts found on Twitter, simplified to association with only negative or positive sentiment. The Natural Language Processing part in the paper was implemented with the help of the Python package NLTK, which uses a statistical method named Naive Bayes classification, somewhat similar to the method in the section above. The big difference is that the learning algorithm is combined with a big set of data. The advantage of using statistical methods compared to decision trees and the like is that the program will be better at guessing the appropriate answer, instead of being clueless if the specific situation was not presented before.

In the mentioned paper, the method was to filter text on certain keywords which expressed positive or negative emotions, such as amazed, angry, lucky, excited and


disappointed. Then the agent was trained on a set of manually annotated data, since it must base the sentiment association on something. Thereafter the training set with associated sentiments was sent to a classifier that used the information to determine the probability of a certain sentiment based on the contained keywords. With the help of the generated statistical model, the program lastly determined the probable category for a freshly selected tweet.
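A minimal version of that pipeline, hand-rolled in plain Python rather than with NLTK's classifier, and with a made-up training set, could look like this:

```python
import math
from collections import Counter

# Tiny manually annotated training set, as in the described method.
TRAIN = [
    ("i am so excited and amazed", "positive"),
    ("feeling lucky today", "positive"),
    ("angry and disappointed", "negative"),
    ("this is terrible i am angry", "negative"),
]

counts = {"positive": Counter(), "negative": Counter()}
totals = Counter()
for text, label in TRAIN:
    for word in text.split():
        counts[label][word] += 1
        totals[label] += 1

def classify(text):
    # Naive Bayes with add-one smoothing over the training vocabulary.
    vocab = len(set(w for c in counts.values() for w in c))
    best, best_score = None, -math.inf
    for label in counts:
        score = math.log(0.5)  # uniform prior over the two classes
        for word in text.split():
            p = (counts[label][word] + 1) / (totals[label] + vocab)
            score += math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

print(classify("so excited"))
```

With four training sentences this is only a demonstration of the mechanics; the statistical advantage discussed above only appears with a large annotated corpus.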

Sentiment-based agents seem proper to use when the only purpose is to speak casually on a very simple level, and when the goal is to get a proper reaction based on the emotional state of the communication. But for upholding more complex conversations or task-focused dialogues they do not appear appropriate. By using more keywords, and maybe by creating a more context-based dialogue with the help of expressions, the accuracy of the association could be increased, and then maybe they could be of use in this project for the part responsible for small talk.

Chatbot trained on movie dialogue

The large set of data that a data-based conversational agent uses can be generated in several ways; it does not have to be annotated manually. In one similar project, the author used movie subtitles to train the chatbot [14]. The results were not so promising, and the responses generated by the conversational agent were more often out of context than not. This can perhaps be explained by the abrupt transitions in movie dialogues, the changes between scenes, and maybe that the database was not big enough. One other disadvantage is that the type of movie selected is crucial for how the agent turns out. The target group, the genre of the movie and other things regarding it will decide the behavior of the agent. But the idea of using movie dialogues was very interesting, since it is a possible replacement for manually generated databases and therefore increases the automation of creating conversational agents.

Conversational Agents in the Enterprise

One other study treated the subject of conversational agents used in enterprises [12]. Firstly, the various possible areas of use for conversational agents were discussed, and thereafter, more specifically, how they could be applied in the enterprise sector and what difficulties could then be encountered. Mostly it was concluded that conversational agents can be helpful providers of information regarding companies' products and services. It was also concluded that they could act as a cost-effective substitute for, or at least complement to, customer service, guided selling, website navigation and technical support. It was believed and predicted that the use of conversational agents would increase drastically over the years. One other paper [21] even claims that the conversational interface will one day substitute the need for programming by hand.

Summary

In summary, neural network based agents, compared to those founded on a database, can be applied more generally and have a good capacity for handling unknown inputs, but unfortunately have the tendency of not being able to fulfill their purpose if not trained well. Furthermore, a sentiment-based agent is not enough to cover the goal of this project, but can be implemented as a part of it and, if developed further, cover the casual conversation part of the program. It is also noted that conversational


agents have a broad spectrum of utility, not least inside the enterprise sector, and are predicted to be of greater and greater importance in society, and one day even substitute the need for programming by hand.

The best approach for this thesis project's purpose seems to be implementing a conversational agent that is trained with a big set of data and thereafter uses machine learning and statistics. Another conclusion that can be made is that the project is of great relevance, since the applications and use of conversational agents are expanding drastically.

2.2 Tools

In this section, information about the data, tools and libraries used for the project is presented. A library, sometimes referred to as a module, is a collection or pre-configured selection of routines, functions and operations that a program can use.

In this report it is assumed that the reader is as familiar with programming concepts as a peer student on the verge of taking a Bachelor's Degree in Engineering Physics. Therefore, the concepts of classes, objects, if statements and while loops will not be explained further.

Python

Python is the programming language chosen for this project, and there are several reasons for that. For starters, Python is easy both to learn and to understand, it has a really applicable standard library with simple but efficient built-in functions and expressions, and it is supported by many systems and platforms, which makes it suitable for integrating different sources. One more benefit of Python is that, since it is so commonly used and for so many different purposes, it has a broad amount of resources, both in the form of support for problem-solving in the coding, of SDKs for platforms, and of applicable libraries that are more explicitly described below.

Scraping the recipes, part 1: Requests

Requests is a Python module that allows users to send HTTP/1.1 requests without having to add query strings to URLs or to form-encode POST data [13]. It allows users to add content like headers, form data, multipart files and parameters via Python libraries, and likewise allows access to the response data. Requests has a long list of features, but is only used in this project to access the source code of a website. Compared to its alternatives, for example urllib and urllib2, Requests requires shorter code and is considered easier to use. Requests encodes the parameters automatically, which allows the user to just pass them as simple arguments, unlike in the case of urllib, where you need to use the method urllib.urlencode() to encode the parameters before passing them.

Requests is thread safe and automatically decodes the response into Unicode. In addition, Requests has superior error handling. If authentication failed, urllib2 would raise a urllib2.URLError, while Requests would return a normal response object, as expected. To see if the request was successful when using Requests, the user just has to check the boolean response.ok.
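A small sketch of this usage (the search URL and parameter name below are illustrative assumptions, and nothing is actually sent over the network in the second half):

```python
import requests

def fetch_source(url):
    """Return the page source if the request succeeded, else None."""
    resp = requests.get(url)
    return resp.text if resp.ok else None  # the boolean check mentioned above

# Parameters are encoded automatically; .prepare() builds the final URL
# without sending any request, so the encoding can be inspected directly.
req = requests.Request("GET", "https://www.bonappetit.com/search",
                       params={"q": "cheesecake"}).prepare()
print(req.url)
```

Compared with urllib, the caller never touches the query-string encoding: the params dictionary is turned into a percent-encoded query string internally.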


Scraping the recipes, part 2: lxml

The lxml XML toolkit is a Python module that works as a Pythonic binding for the C libraries libxml2 and libxslt [16]. A binding from Python to a library is an application programming interface (API) that provides glue code to use that library in Python. lxml combines the speed and XML feature completeness of the libxml2 and libxslt libraries with the simplicity of a native Python API.

It is the most feature-rich Python library for processing both XML and HTML. Other positives of this module are that it is very fast and memory efficient, and it works well in combination with Requests. The lxml package is used in this project to scrape data from the source code of websites using the XPath functions.

An alternative to this module is Beautiful Soup, whose name comes from "tag soup" and indicates that the module is specialized in handling invalid markup. It is a beginner-friendly package that creates a parse tree that can be used to extract data from HTML, and it also automatically converts incoming documents to Unicode and outgoing documents to UTF-8. It is slower but more flexible than lxml. lxml therefore seems to be the right choice when you know that the websites will have straightforward formatting and markup, and your program is simple enough to not need the complexity or flexibility of Beautiful Soup.
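The XPath-based extraction can be sketched as follows, on a made-up fragment of recipe HTML (the class names are invented for the example; real recipe pages use their own markup):

```python
from lxml import html

# A minimal, made-up recipe page used only for illustration.
PAGE = """
<html><body>
  <h1 class="recipe-title">Pancakes</h1>
  <ul class="ingredients">
    <li>2 eggs</li>
    <li>1 cup flour</li>
  </ul>
</body></html>
"""

tree = html.fromstring(PAGE)
# XPath expressions select nodes by tag and attribute; text() yields
# the text content of the matched elements as a list of strings.
title = tree.xpath('//h1[@class="recipe-title"]/text()')[0]
ingredients = tree.xpath('//ul[@class="ingredients"]/li/text()')
print(title, ingredients)
```

In the project itself, the page source fed to html.fromstring would come from Requests rather than a string literal.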

Scraping the recipes, part 3: Recipe Scrapers

Recipe Scrapers is an open source Python module made by Hristo Harsev that works as a simple web scraping tool for a variety of recipe sites [8]. It collects the title, cooking time, ingredients and instructions of a recipe. It uses urllib and Beautiful Soup to parse and scrape websites. A benefit of using this module is that it saved some valuable time. It had many of the features that were wanted for the program and could therefore be used in a simple way without modification. However, if the alternative of building a similar program ourselves had been chosen, maybe with the help of Requests and lxml, we could have had a better understanding of how the recipe scraping works, and the project program would have been more consistent.

Recipe website: Bon Appetit

When looking for a website to use for the project, a few aspects were prioritized. The website should be in English, have a big and varied database of recipes, have a flexible and well-functioning search function, and preferably be on Recipe Scrapers' list of recipe sites that it can scrape. The sites from Recipe Scrapers, which are all in English, were therefore examined, and after some difficulty with the search function of some of the other sites, including Jamie Oliver's and BBC Food's sites, Bon Appetit was chosen [5]. It is available at https://www.bonappetit.com/ and was chosen mainly due to its surprisingly big database of recipes of all kinds and its superior, user-friendly search function that handles misspellings and conjugations well. The Bon Appetit website originates from a food magazine with the same name published by Condé Nast. One clear disadvantage of Bon Appetit is that a few of the recipe pages have a different layout than the others, and Recipe Scrapers has a hard time scraping all of the information. However, it seems like this is the case for most modern sites due to campaigns etc., and this would probably be a problem with most other sites as well.


2.2 Tools

Exchanging Data: json

JSON, short for JavaScript Object Notation, is a file format used for transferring information in this project [1]. It structures the information to be transferred into data objects with two main structures: attribute/value pairs and ordered arrays. This means that all the information in a JSON file can be reached with the help of keywords and indices, making the stored information easily accessible.
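As a small illustration with a made-up recipe payload (not actual project data), the two structures map directly onto dictionaries and lists once parsed in Python:

```python
import json

# Hypothetical recipe data using the two JSON structures:
# attribute/value pairs (objects) and ordered arrays.
raw = '''{
    "title": "Tomato Soup",
    "servings": 4,
    "ingredients": ["4 tomatoes", "1 onion", "salt"]
}'''

recipe = json.loads(raw)
print(recipe["title"])           # keyword access: Tomato Soup
print(recipe["ingredients"][1])  # index access: 1 onion
```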

Natural Language Processing: Dialogflow

If a conversational agent is to be created, it needs a natural language processing (NLP) component. This is the part that interprets the input from the user and converts it into useful information in the form of fulfillments: it translates natural user requests into actionable data, ascertaining what the information means and what to do with it.

There are many different ways to implement NLP, either by programming it from scratch or by using already established databases and platforms. Since the time for this project was limited and the main focus was not on NLP creation, an online platform named Dialogflow was chosen. The main reason Dialogflow was chosen is that it is Google-based, and since Google is used to such a great extent all over the world, with so many applications and a good reputation, Dialogflow was presumed to be well developed. Another argument for Dialogflow is that the information delivered by its NLP can be extracted as a JSON file, which as mentioned has its advantages.

Dialogflow is a truly user-friendly instrument for the developer: it uses only two concepts for implementing the NLP, intents and entities [9]. Intents can be explained as groups of expressions that are to be linked to the same action, or as the parameter that categorizes the input [10]. For example, one intent in this project was 'Recipe Search Trigger', which was activated by phrases such as "Can you help me find a recipe?", "Search for a recipe" and "I am hungry".

A problem occurs if several intents have similar training phrases but completely different purposes. Since the NLP has to interpret unknown expressions, distinguishing between two such intents can be hard. A logical consequence of this is that the developer has to be very precise in defining what kind of expressions should be mapped to each intent, keeping the training phrases distinct and separated.

One solution, provided by the Dialogflow developers, is the possibility to prioritize intents, but this only works if some intents are preferred or more important.

An entity is the parameter that Dialogflow uses to sift out the useful information from the input. One entity in this project was 'ingredients', an ingredient dictionary with 1200 words and phrases. This means that every time the input contains a word or phrase that can be found in the dictionary, or is likely to fit in it, Dialogflow recognizes it as an ingredient and saves the value.
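To illustrate how intent and entity values come back to the program, here is a hand-written stand-in for a Dialogflow response. The real payload contains more fields, but the nesting shown (queryResult holding the matched intent and its parameters) follows Dialogflow's V2 JSON format:

```python
import json

# Hand-written stand-in for a Dialogflow V2 response (simplified).
response = json.loads('''{
    "queryResult": {
        "queryText": "find me a recipe with arugula and feta cheese",
        "intent": {"displayName": "Asking for ingredients"},
        "parameters": {"ingredients": ["arugula", "feta cheese"]}
    }
}''')

# The program only needs the matched intent and the entity values.
intent = response["queryResult"]["intent"]["displayName"]
ingredients = response["queryResult"]["parameters"].get("ingredients", [])
print(intent)       # Asking for ingredients
print(ingredients)  # ['arugula', 'feta cheese']
```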

Dialogflow has several prebuilt agents that can be imported and used together with the program. One easily activated is 'Small Talk', a module for casual conversation. Of course, the small talk would be of higher quality if developed manually and customized, but for this project the provided small talk was good enough. The small talk module gives the feeling of a much more developed conversational agent that is more pleasant to interact with.

Text to speech: pyttsx3

Pyttsx3 [3] is a speech synthesis module for Python that includes drivers for the text-to-speech synthesizers on OSX (NSSpeechSynthesizer), Windows (SAPI5) and Ubuntu (espeak). Pyttsx3 is used to register and unregister event callbacks, produce and stop speech, get and set speech engine properties, and start and stop event loops.

Speech recognition: SpeechRecognition

SpeechRecognition is a Python library for performing speech recognition, with support for several engines and APIs, online and offline, including CMU Sphinx, Google Speech Recognition, Google Cloud Speech API, Wit.ai, Microsoft Bing Voice Recognition, Houndify API, IBM Speech to Text and Snowboy Hotword Detection. It supports 120 languages, and defaults to the operating system's language if nothing else is stated.


3 System

In this chapter the program, with its sections and functions, is presented. The subsections of the chapter follow the sectioning of the code to give a clear view of how the program was built. Each subsection gives information about the functions in that part of the program, and about which modules have been used and how.

3.1 Overview

The main code of the program is divided into four sections: speech interface, natural language processing, web search & web scraping, and lastly a section for operating in recipes. The other part of the system is the online platform for natural language processing mentioned earlier, Dialogflow.

The structure of how a user's input is handled and a response is triggered looks like this:

Figure 3: Flowchart of how the user's input is handled in the program

1. The user speaks.

2. The speech interface part of the program translates the speech to text and forwards it to the NLP part.

3. The data is sent from the NLP part to Dialogflow, where it is interpreted and a json file is generated.

4. The json file is extracted to the NLP part of program.

5. Depending on which intent is activated in Dialogflow, the response from Dialogflow is either sent directly to the speech interface, which performs a text-to-speech translation and reads the response to the user, or first sent to one of the other two parts of the program (operating in recipes, or web search & web scraping) and from there to the speech interface.

6. The program goes into an infinite loop, which can only be broken if the quitting-intent is activated in Dialogflow (not shown in flowchart).
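The six steps above can be sketched as a dispatch loop. Here, listen, interpret and speak stand in for the speech interface and the Dialogflow round trip, and the intent names are illustrative, not the project's actual ones:

```python
def run_agent(listen, interpret, speak, handlers):
    """Minimal dispatch loop. `interpret` stands in for the Dialogflow
    round trip and returns (intent, response_text); `handlers` maps
    intents to the recipe-handling functions. Loops until a quitting
    intent is matched."""
    while True:
        text = listen()
        intent, response = interpret(text)
        if intent == "quit":
            speak("Goodbye!")
            break
        if intent in handlers:
            response = handlers[intent](response)
        speak(response)

# Scripted demo with canned input/output instead of audio.
inputs = iter(["hi", "bye"])
spoken = []
run_agent(
    listen=lambda: next(inputs),
    interpret=lambda t: ("quit", "") if t == "bye" else ("smalltalk", "Hello!"),
    speak=spoken.append,
    handlers={},
)
print(spoken)  # ['Hello!', 'Goodbye!']
```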


The flowchart in figure 4 shows how the program's functions are related to each other and how a user can navigate through the program.

Figure 4: Model of the program’s functions’ relations

When the program is started, a short greeting and introduction is read to the user. The user can then choose either to small talk with the program or to directly start a recipe search by triggering the recipe search intent in Dialogflow. If small talk is started, the user can choose to go to searching for a recipe at any point. When a search for a recipe is initialized, a recipe must be selected or the search aborted before another function can be used. Only if a recipe is selected can the functions in the lower part of the diagram be performed. If a new search is to be performed, the existing recipe must first be canceled. This is to prevent the program from aborting a recipe as a result of a misunderstanding of the input. For example, if the user has a recipe selected and says the name of an ingredient, the program might interpret this as a request to make a new search for that recipe, and to make sure it doesn't cancel against the user's will, it asks "Do you really want to cancel the current recipe?". At any point in the recipe part of the program, marked green in the diagram, except for when selecting a recipe, the user can go back and forth to small talk.

Now follows a more in-depth explanation of the components of the system.

3.2 Dialogflow platform

Since this conversational agent's purpose is to assist in the kitchen, the NLP had to be focused on recipe making, recipe recitation and, to some extent, casual language understanding.

3.2.1 Created intents

The intents used for recipe making are 'Asking for meal', 'Trigger recipe making' and 'Asking for ingredients'. The first covers cases where the user expresses the will to cook a meal, but not specifically which one. It has training phrases such as 'I want a recipe for something to cook for dinner' or 'I am in the mood for a snack'. The 'Trigger recipe making' intent captures many different ways of expressing interest in making a recipe but without ingredients to search for, with training phrases such as 'Can you help me find a recipe?' and 'I am hungry'. The last intent of the recipe making kind is the one triggered when the user asks for ingredients; the input can be 'Can you search for pizza?', 'Can you find me a recipe with arugula and feta cheese?' and 'Help me to cook something with tomatoes, beans and rice'.

The next section of intents is recipe recitation oriented. The goal of the agent is not only to find a recipe but to guide the user through it and to understand different commands for operating in it. Intents used for this are 'Trigger specific recitation', which is activated when the user asks for something specific regarding the recipe, for example 'How hot should the oven be?', 'Trigger start recitation', which activates on general recitation commands such as 'Read me the recipe', and 'Trigger step recitation', which is called when the input is step focused, with training phrases such as 'What step am I on?', 'What is the next step?' and 'What is step number 5?'.

The last section of intents was developed because the agent had trouble separating casual language from vital information. For example, the word 'okay' was categorized as an ingredient. Therefore intents for confirmation and opposition were created, as well as an intent for presentation of the program that answers questions such as 'What can you do?' and 'How can you help me?'. The names of the intents in this section are 'Yes', 'No', 'Ok' and 'Presentation of the program'.

3.2.2 Created entities

The entities of the program were, like the intents, chosen partly with a recipe focus and partly to sift out unuseful information. The entities that carry important parameter values regarding the recipe are 'ingredients', 'kitchen tools', 'time', 'temperature', 'verbs' and 'courses'. Entities used for maneuvering in the recipe are 'before', 'now', 'next' and 'start over', and the entities used to separate recipe values from casual language are 'filling words', 'no' and 'yes'.

3.3 Recipe class

To keep track of the current recipe, a Recipe class instance is initiated when a recipe is selected. A Recipe instance has fields for all relevant attributes of a recipe: title, cooking time, amount of servings, ingredients and instructions. Title, cooking time, ingredients and instructions are all set with the help of Recipe Scrapers when the instance is initiated, while the amount of servings is found with the help of the lxml module. This is because Recipe Scrapers does not have that function, and the amount of servings was considered an important piece of information for the user to know about the recipe.
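A minimal sketch of such a class, with field names of our choosing rather than the project's actual attribute names, could look like:

```python
from dataclasses import dataclass, field

@dataclass
class Recipe:
    """Holds the attributes of the currently selected recipe.
    In the project, all fields except `servings` would be filled
    by Recipe Scrapers; `servings` would come from lxml."""
    title: str
    cooking_time: int  # in minutes
    servings: int
    ingredients: list = field(default_factory=list)
    instructions: list = field(default_factory=list)

current = Recipe(
    title="Tomato Soup",
    cooking_time=30,
    servings=4,
    ingredients=["4 tomatoes", "1 onion"],
    instructions=["Chop the onion.", "Simmer everything for 25 minutes."],
)
print(current.title, current.servings)  # Tomato Soup 4
```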

3.4 Speech interface

The speech interface of the system has two parts: text-to-speech and speech-to-text.

The text-to-speech part uses the pyttsx3 module, which accesses the computer's operating system's speech synthesizer. On OSX this synthesizer is called NSSpeechSynthesizer and is the one used in the project.

The speech-to-text part uses the SpeechRecognition module. Here, SpeechRecognition is used online with Google Speech Recognition, which uses Google's deep learning neural network algorithms¹. The recording works by waiting until the user has started speaking, and then recording until a set number of seconds of silence is encountered. If no audio could be recognized, whether because the set number of seconds passed without any sound, because the sound was unrecognizable, or because the request to the Google Speech Recognition service failed, the listening process repeats until successful.
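The repeat-until-successful behaviour described above can be sketched as a plain retry loop. Here, recognize is a stand-in for recording audio and calling the Google Speech Recognition service, since the real calls need a microphone and network access:

```python
def listen_until_recognized(recognize, max_tries=None):
    """Repeat the listen/recognize cycle until it succeeds.
    `recognize` stands in for recording audio and sending it to the
    speech service; it returns text on success or raises on failure
    (as SpeechRecognition's UnknownValueError/RequestError do)."""
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        try:
            return recognize()
        except Exception:
            continue  # no audio, unintelligible, or request failed: retry
    return None

# Demo: a fake recognizer that fails twice, then succeeds.
attempts = iter([None, None, "read me the recipe"])
def fake_recognize():
    result = next(attempts)
    if result is None:
        raise ValueError("could not understand audio")
    return result

print(listen_until_recognized(fake_recognize))  # read me the recipe
```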

3.5 Natural language processing

One part of the code is named NLP and contains both functions for understanding the user and some that are more general. The conversating function is the most frequently called function in the whole code and is the connection to the NLP platform. This function is responsible for sending in the input and extracting the JSON file with all the useful information. Another NLP function, yes_or_no, is called in any situation where it is desired to confirm what the user wants.

The most important of the general functions in this section is trigger_recipe, which is called when the 'Make recipe' intent is activated. It has the task of finding out what kind of recipe the user wants to search for, making sure the keywords for the search are correct and thereafter sending them to the search_recipe function in the web search & web scraping section of the code. If a recipe is freshly selected, the step object has to be reset and a few alternatives for the progression of the program have to be provided, which is the purpose of another function in this section named select_or_not.

3.6 Web search & web scraping

The program has a section of code called "Web search and web scraping" that is responsible for finding recipes for keywords provided by the NLP section of the code, presenting these to the user, and getting the user to select a recipe. When searching for recipes, the program searches the Bon Appetit website for the keyword or keywords identified by Dialogflow, presents some of the found recipes to the user, and then asks the user either to choose one of the presented recipes, to ask to hear more choices, or to quit the search.

3.7 Operate in recipe

All the functions in this section are used to operate in the recipe once one is selected.

They are called inside the main function; which one is called depends on the activated intent. If the Recipe class object is of NoneType, all these functions return the error prompt 'No recipe selected'.

They are pretty straightforward: read_whole_recipe displays and recites the whole recipe when called, read_step uses the JSON file to choose and present a specific step of the recipe, and find_step returns the instruction for a certain step number.

¹ https://cloud.google.com/speech-to-text/


read_specific searches for a specific instruction with the help of keywords and uses the most_common function to determine which sentences in the instructions best match the search. read_beginning is the function called when general recitation commands are made, such as 'Read recipe' and 'Can we start cooking now?', and it also calls several other functions if the input contains certain words.
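The matching idea described here, picking the instruction sentences that best fit the search keywords, can be sketched with collections.Counter; the function name and scoring below are our illustration, not the project's actual code:

```python
from collections import Counter

def best_matching_sentence(keywords, sentences):
    """Score each instruction sentence by how many of the search
    keywords it contains, then let Counter.most_common rank them.
    Returns None when no sentence matches any keyword."""
    scores = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        scores[sentence] = sum(kw.lower() in lowered for kw in keywords)
    sentence, score = scores.most_common(1)[0]
    return sentence if score > 0 else None

instructions = [
    "Preheat the oven to 400 degrees.",
    "Chop the onion finely.",
    "Bake for 25 minutes, then let cool.",
]
print(best_matching_sentence(["oven", "degrees"], instructions))
# Preheat the oven to 400 degrees.
```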


4 Evaluation

In this chapter, the method and results of the evaluation are presented. The purpose of the evaluation is to gather data from sources other than us developers on how well the program performs, along with opinions on its usefulness.

4.1 Method

For the evaluation of the program, 10 people were asked to test it. They were not given any prior instructions except for the introductory greeting given by the agent when the program was started, saying that the user has to wait for the word "listening" to come up on the screen before talking. They were given a few minutes to try the program freely to make themselves familiar with it, and when they felt ready they were asked to perform a number of tasks. The tasks were all commands that the program should be able to perform, but since different people have different expectations of how different features should work, and formulate sentences differently, the tests were expected to provide useful information about possible improvements of the program. The tasks were:

• Find a recipe with one key ingredient.

• Find a recipe with two key ingredients.

• Find a recipe with more than two key ingredients.

• Select a recipe, then go through the steps chronologically.

• Ask to hear all the ingredients.

• Get it to save the ingredients as a grocery list.

• Ask about the amount of servings.

• Ask about the amount of a specific ingredient.

• Ask about what to do with a specific ingredient.

• Ask about something specific in the instructions (for example oven temperature).

• Ask to hear a specific step (for example step 5).

• Cancel the recipe and search for a new one.

• Try to have a conversation with it (small talk). Tell it about how you are feeling or ask about how it is doing.

When performing the tasks, the tester was asked to answer two questions for each task on a scale of 1-5:

• Did it work as expected? 1 = No, 5 = Yes.

• Do you think it is a useful function? 1 = No, 5 = Yes.

After finishing all of the tasks, the tester was asked to answer the following questions:


• Did the program do everything you thought it would?

• Did it work the way you expected? If no, specify.

• Do you think any functions need improvements? If yes: what and how?

• Do you miss any specific functions? What?

• What do you think of how well the program understands you?

• What do you think of the speech synthesis?

• Would you use this program if it was available in the form of an app? Why/why not?

• What could be done with the program to increase your interest in using it in the future?

and to leave any other comments they had.

4.2 Results

Presented in table 1 is the average score of how well the program features lived up to the testers' expectations and how useful they were, both on a scale of 1-5, where 5 was the highest.

Table 1: Average scores in evaluations on a scale of 1-5.

Task                                              Expectations  Usefulness
1 ingredient                                      4.56          4.89
2 ingredients                                     4.22          5
More than two key ingredients                     4.33          4.89
Go through the steps                              4.56          4.78
Hear all the ingredients                          4.78          5
Save the ingredients                              5             4.89
Amount of servings                                4.22          4.89
Amount of an ingredient                           4.44          5
Instruction for a specific ingredient             4.44          4.78
Ask about something specific in the instructions  3.22          4.78
Ask to hear a specific step                       4.11          4.22
Cancel the recipe and search for a new one        5             4.78
Small talk                                        3.78          3.11

As shown in the table, all features were judged useful (between 4 and 5 on a 1-5 scale) except small talk. Most features' functionality lived up to the testers' expectations, but the ones with the most problems were asking about something specific in the instructions, such as the oven temperature or how long to cook something, and small talk.


Features

Everyone who tried the program said that it had all of the expected features for a program like this, and one third said that it could do even more than they expected. No one missed any features that they were expecting.

However, when asked what features they would like to see added, there were a few suggestions: being able to see pictures of the dish, being able to see a film of the steps/instructions, being able to restrict the search to a certain type of recipes, for example vegetarian or vegan recipes or recipes excluding certain ingredients, and being able to pick metric or imperial units in the beginning.

Feature improvements

When asked about function improvements and what could be done to increase the testers' interest in the program, most people said that they needed the program to be faster. They seemed to have some problems with the speech recognition and speech synthesis, saying that it was too slow and that its ability to understand simple words was lacking. For example, a few testers had problems with the program interpreting "two" as "to", and "three" as "tree". The feedback for improvements also concerned the NLP and the interpretation of the input. At one point, a user said "I'm feeling happy" and the program answered with "Oh no, what happened?", as it would have if it had interpreted the input as the user feeling sad. The testers also had some issues with the program not understanding synonyms or alternative names for ingredients or cooking equipment. For example, one tester had the ingredient "coconut oil" in the ingredient list, but the instructions referred to the ingredient as "oil", so when the tester asked what to do with the coconut oil, the response was that the ingredient could not be found in the instructions. Other testers experienced similar cases.

NLP and speech recognition

When asked about the NLP and how well the program understood the inputs, some problems were noticed. Some of them have already been mentioned under "Feature improvements". Most testers said that the program understood them reasonably well, but sometimes ran into trouble with pronunciation or certain sentences. Only one tester had no problems whatsoever with the NLP and speech recognition. A few testers seemed to have a problem with the program not understanding their accent, and one tester thought that maybe it prefers American pronunciation as opposed to British.

Another problem that some testers had was that they could not pause in the middle of a sentence, because the program would cut them off and try to interpret only half of the sentence as input. This mostly happened when testers wanted to search for more than one ingredient and paused between the ingredients, so the program did not perceive all of them.

Speech synthesis

Not surprisingly, most testers said that the program sounded ”robot-like”, and left comments like ”neutral”, ”not very fun” and ”sounds like a computer”. One tester noticed faults in the pronunciation of some words. Another tester said that it would be good if the user could interrupt the program in the middle of reading something, maybe with some sort of verbal command, so that the user wouldn’t have to wait until the program had finished reading in case they didn’t want to hear any more.


Recipes

All of the testers said that the recipes from the Bon Appetit website were excellent and varied. One person had some problems when searching for more than one ingredient, with the program not presenting recipes containing all ingredients, but only one or two of the three that were searched for. Some users also noticed that the program missed one or two ingredients when it said what it was searching for, but then still searched for the right thing. It would say "Do you want to search for ingredient 1, ingredient 2 and ingredient 3?", the user would say "yes", and it would then say "Searching for recipes with ingredient 1", but still present recipes with all ingredients. Another problem that arose during the evaluation is that sometimes the same recipe was listed twice in the same search.

Personal use

When asked if they would use the program if it was an app that they could download to their phones, testers said either yes or maybe. Two thirds said that they would use it, and two of them gave the comments "perfect to use in grocery store to get shopping list" and "It provides excellent tips for recipes". Of the ones who answered maybe, one said it depended on whether they were alone or not.

Small talk

When commenting on the small talk feature, most testers said that it was a fun feature, but that it was not necessary for the program. One tester said "I'm not interested in talking to my cook book". There were also comments about the program not understanding enough to be able to carry a conversation, which is reflected in the rating below 4 for how it lived up to expectations in table 1.


5 Discussion

5.1 Evaluation

The evaluation of the program provided some insight into the flaws and possible areas of improvement of the program, as well as opinions on how well conversational agents can be integrated into cooking and used as a cooking assistant.

First of all, the program's features' ratings for usefulness were all high except for small talk. Some of them were rated 5 on a scale of 1 to 5 by all testers, and many others were rated between 4.5 and 5. The only one below 4.5, except small talk, was hearing a specific step (for example step 5). It can be hard for a user to remember what a specific step is, so asking for a step by number will not be as relevant as asking about what to do with a specific ingredient or piece of cooking equipment. The ratings of how well the functions lived up to the testers' expectations varied somewhat, but were for most functions between 4 and 5, which indicates that the testers were for the most part happy with the program's performance. Asking about something specific in the instructions got a lower rating, 3.22, which indicates that many of the testers had problems with that task. This can be because of the problems with Dialogflow discussed in section 5.2.2, but could also be because of improper coding. To determine the reason behind this, the function needs to be evaluated further and the failed tries analyzed.

Regarding feature improvements, the main focus of most of the testers was the speed of the program. The problems with the speed seemed to be mostly related to the speech recognition and speech synthesis modules, but also emerged because of connectivity issues at some points. The connectivity issues had nothing to do with the program, but are still interesting, because the limiting factor sometimes seemed to be the uploading speed of the network rather than the downloading speed. When communicating with the Google Speech Recognition API, audio files are uploaded and analyzed by the API. If the uploading speed is too low, it will take longer to send the audio files, and it will appear to the user as if the program is still listening because their words haven't been written out as text yet. The appearance of the program being slow can also have to do with how the speech recognition module works. As explained in section 3.4, the recognition function starts recording when audio is heard, and stops recording when a set number of seconds of silence is detected, without including the silence in the recording. If the set number of seconds of silence is large, the recognition will appear slow. This is a fine balance, because if the number of seconds of silence is too small, the recording will be cut off when the user pauses between words, which is also something that some testers complained about. From these evaluations it can therefore not be concluded whether the number of seconds should be increased or decreased.

Another improvement area was understanding of single words and short phrases.

The Google Speech API uses machine learning, which means that in addition to trying to hear exactly what the user says, it uses statistics to determine how likely it is that a word was said. Although Google is careful about revealing too much of how their APIs work, there is reason to believe that the algorithms they use can be adjusted and tweaked to perform better in different situations. For example, when our software asks for a number input, one could argue for the possibility of increasing the likelihood of the input being a number, i.e. making it more probable that


the input is "four" instead of "for". This would eliminate some of the problems the testers had with similar cases ("two" vs "to" and "three" vs "tree"). If Google's API cannot be adjusted to accommodate this, there are other similar modules which surely can.

The problems discovered in the evaluation that were related to the NLP and the interpretation of the input have to do with Dialogflow, and could most likely be eliminated with some adjustments. The specific case here was that "I'm feeling happy" was interpreted by the agent as if the user was feeling sad, which could be a result of the training phrase for the "happy" input being "I'm happy" while the training phrase for the "sad" input is "I'm feeling sad". The algorithms should, but do not seem to, know that "happy" and "sad" are the keywords and not "feeling".

If more training phrases with different formulations of the intent are created, the algorithms will be better able to identify the keywords and understand the context.

The program's failure to identify synonyms brought up some issues in the evaluation. This problem could be solved in a variety of ways, with the most straightforward being either using Dialogflow, or defining a dictionary of relevant synonyms and hard-coding the program to check for synonyms when searching for ingredients or instructions.
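A minimal sketch of the hard-coded dictionary approach could look like this; the synonym pairs and function names are illustrative only:

```python
# Illustrative synonym table; entries map alternative names to a
# canonical ingredient name. A real table would need careful curating.
SYNONYMS = {
    "coconut oil": "oil",
    "scallion": "spring onion",
    "cilantro": "coriander",
}

def candidates(ingredient):
    """All names to try when searching the instructions."""
    names = {ingredient}
    if ingredient in SYNONYMS:
        names.add(SYNONYMS[ingredient])
    names.update(k for k, v in SYNONYMS.items() if v == ingredient)
    return names

def find_in_instructions(ingredient, instructions):
    """Return the instruction steps mentioning the ingredient
    under any of its known names."""
    names = candidates(ingredient)
    return [step for step in instructions
            if any(name in step.lower() for name in names)]

steps = ["Heat the oil in a pan.", "Add the onion."]
print(find_in_instructions("coconut oil", steps))  # ['Heat the oil in a pan.']
```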

The issues the program had with some of the testers' pronunciations come with using Google's Speech API and are hard to do anything about without switching to another module. However, since Google's Speech API is one of the most advanced of its kind, we would probably not have more luck with other modules either.

The speech synthesis of the program got some critique for sounding, not surprisingly, "robot"-like and "like a computer". An alternative to using the computer's own speech synthesizer is using a platform like Google's Speech API, but for speech synthesis. There are many such options available, and they are constantly developing and getting more "human"-like in their speech, but users often have to pay for them. One therefore has to weigh the value that a more "human"-like voice brings against the cost of using the platform.

The problems that arose related to the recipes and the recipe searching mostly have to do with how the Bon Appetit website works. When searching for recipes with multiple ingredients or keywords, the website will try to present recipes that have all of the keywords, but if it does not find many, or any at all, it will present recipes with only one or two of the keywords. The fact that recipes were sometimes listed twice in the same search is also a fault on Bon Appetit's side, but one that could easily be handled by a few lines of code in the program. When the program said what it was searching for and missed one or two ingredients, it was because of an error in the response from Dialogflow. In fact, the line that coded for that sentence could simply be removed, since the user is already told what is being searched for when asked whether the identification of keywords is correct.
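Those few lines could, for instance, keep the first occurrence of each search hit while preserving the search order (the URLs below are made up):

```python
def drop_duplicate_results(results):
    """Remove repeated search hits, keeping the first occurrence
    of each and preserving the original order."""
    seen = set()
    unique = []
    for url in results:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

hits = [
    "https://www.bonappetit.com/recipe/tomato-soup",
    "https://www.bonappetit.com/recipe/pizza",
    "https://www.bonappetit.com/recipe/tomato-soup",  # duplicate listing
]
print(drop_duplicate_results(hits))  # first two URLs only
```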

The testers' positive responses when asked if they would use the program if it was commercially available indicate that integrating a conversational agent into the area of cooking and recipe searching worked well with our program. Of the testers that said they would use the program, only two gave comments, and both


of the comments were about how well it worked for the recipe searches. This could be an indication of several things: that the project's conversational agent was more successfully integrated into the recipe searching area than the cooking guidance area, that those two specific users were more interested in finding recipes than having the recipe instructions read to them, or that the recipe search area is more fitting for integration of conversational agents than the cooking guidance area. To determine this, more evaluations would have to be carried out.

5.2 System limitations

Before the evaluation was started, but after the program was finished, some errors in the program were observed. These errors are addressed in this section, along with some difficulties that were encountered while developing the program.

5.2.1 Recipe scraping and selecting recipes

As mentioned in section 2.2, Recipe Scrapers sometimes has trouble scraping the correct recipe information from some pages. This stems from the fact that not all recipe pages have the same layout, and that Recipe Scrapers is coded for the most common format only. At one point it was noticed that only some of the instructions of a recipe were scraped, so the source code for that particular page (https://www.bonappetit.com/recipe/blue-cheese-and-bacon-lettuce-boats) was examined along with the code of Recipe Scrapers. It was then discovered that Recipe Scrapers assumed that all of the instructions were written in the same section of the source code, and failed to retrieve all of the information when they were not.

In the select recipe-function, it is assumed that the user will select a recipe by saying a number between 1 and 5, and the function will not accept, for example, "I want recipe number 5". This is due to bad coding and could easily be fixed. It could also be remedied by using Dialogflow to interpret the input, but since the function was coded before Dialogflow was integrated into the program, this option was not considered until later.
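The fix could be as simple as extracting the first digit from the utterance instead of requiring a bare number, so that both "3" and "I want recipe number 3" are accepted (a sketch; the function name is illustrative):

```python
import re

def parse_selection(utterance, n_results=5):
    """Extract a recipe choice from a free-form utterance.

    Returns the chosen number (1..n_results), or None if no
    valid choice is found in the utterance."""
    match = re.search(r"\b([1-9])\b", utterance)
    if match:
        choice = int(match.group(1))
        if 1 <= choice <= n_results:
            return choice
    return None
```

This keeps the original contract (a number between 1 and 5) while tolerating the surrounding words a speech interface naturally produces.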

5.2.2 Dialogflow

The main difficulty encountered when using Dialogflow was making fine distinctions internally between different intents and entities. A problem that occurred often was that the training phrases implemented in an intent took over the small talk feature, reducing the natural flow of the conversation with the agent. For instance, one small talk phrase was "I need a hug", which instead was interpreted as a recipe-search training phrase, since it is so similar to "I need a recipe".

Balancing between utilizing the machine learning of the program to its full extent and still having a secured path to the correct interpretations is also hard. Machine learning techniques make programs "guess" what to do based on earlier experiences or settings, in this case databases of training phrases and entities. This is of course desired, but it is not optimal for the program to guess too much. An example of this, which was also brought up earlier, was the problem with the word "okay": whenever it was mentioned, the NLP started the process of searching for a recipe with "okay". The program is designed to search for words that resemble ingredients, and in some way "okay" was assumed to be one of them. One could ask how the program jumps to that conclusion, but on the other hand it is desirable that the program is able to search for ingredients it has never encountered before. One way to address this issue was to create an entity called "filling words" to categorize words without any value for the purpose of the agent.
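The effect of the "filling words" entity can be illustrated outside Dialogflow as a simple post-filter on candidate keywords (a sketch; in the actual program the filtering happens through Dialogflow's entity matching, and the word list below is illustrative):

```python
# Words that carry no value as search keywords. In the real agent
# this role is played by a Dialogflow entity called "filling words".
FILLING_WORDS = {"okay", "ok", "please", "some", "well", "hmm"}

def extract_search_keywords(candidates):
    """Drop filler words from the keyword candidates, so that
    e.g. "okay" never becomes a recipe search term."""
    return [w for w in candidates if w.lower() not in FILLING_WORDS]
```

The trade-off discussed above remains: the filter must be permissive enough to let through ingredients the agent has never seen, while catching the conversational noise that should never reach the search.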

5.3 Improvements & comparisons

The suggestions that were raised as improvements to the program in the evaluations were: being able to see pictures of the dish, being able to see a film of the steps/instructions, being able to choose that only a certain type of recipes should be shown, for example vegetarian or vegan recipes, or recipes excluding certain ingredients, and being able to pick metric or imperial units in the beginning.

Being able to choose that only a certain type of recipes is shown, for example vegetarian or vegan recipes, could be easily implemented thanks to Bon Appetit's similar function. Bon Appetit has filters that allow a visitor to pick whether they want the recipe to be vegetarian, gluten free, healthy or vegan. It also has filters for selecting what meal & course one wants, for example dinner, breakfast, snack or dessert. Another filter allows a visitor to tick ingredients that must be in the recipe. There is no built-in filter in Bon Appetit that allows a visitor to "block" or exclude certain ingredients, but it could be integrated into the program by code, for example by scraping the recipes before listing them, checking the ingredients, and excluding the ones with undesired ingredients.
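The exclusion workaround described above could look roughly like this, assuming each search hit has already been scraped into a list of ingredient strings (the data shape and function name are illustrative, not Bon Appetit's API):

```python
def exclude_recipes(recipes, blocked_ingredients):
    """Filter out recipes that contain any blocked ingredient.

    `recipes` is assumed to be a list of dicts with an
    "ingredients" key holding the scraped ingredient lines."""
    blocked = {b.lower() for b in blocked_ingredients}
    kept = []
    for recipe in recipes:
        text = " ".join(recipe["ingredients"]).lower()
        if not any(b in text for b in blocked):
            kept.append(recipe)
    return kept
```

Since this requires scraping every hit before the results can be listed, it would add noticeable latency to the search; caching scraped recipes or limiting the check to the first page of results could keep that cost down.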

The first two suggestions require a visual interface for showing pictures, texts and videos. As discussed in section 2.1.1, combining a visual interface with a conversa- tional one comes with advantages like the ones suggested, and should be considered a relevant possible future improvement of the program.

To get a more extensive picture of how the conversational agent technology can be integrated into the area of cooking and recipe searching, the evaluation could be extended. More testers with different backgrounds and interests would provide a better and wider base of results. The evaluations could also be carried out in a kitchen, allowing the tester to try the program in a "live" scenario, leading to more relevant results.

5.4 Ethics and sustainability

It is well known that a diet without animal products is more ecologically sustainable, but there are still ongoing discussions about how necessary animal products are for human beings. The project program could encourage users to eat a more vegetarian or vegan diet, and maybe even only show such recipes, but would then risk being perceived negatively by users who consider animal products an essential part of the human diet. This constitutes a conflict of interest in the sense that, on the one hand, we want to emphasize and encourage sustainable development, while on the other hand, we want to create a relevant service that has the potential to reach as many users as possible.

As the conversational agent technology improves, so does the usage of the agents, and so does the difficulty of discovering when they are used for destructive purposes.


Spambots are examples of chatbots that automatically send out unsolicited messages through different channels. The purposes of this can be advertisement, to increase a website’s search engine ranking, or more destructive purposes like tricking users in some way. When it becomes harder to discover this, it also becomes easier for people to use this technology to scam people.

Another ethics-related fear among people is that computers, machines and artificial intelligences will deprive people of their jobs. Technology development has effects in many areas, and especially in the work area. At an individual level, a person's job could be replaced by technology, but people are skilled at adapting and usually find new tasks and opportunities. For an engineer, it can be as simple as going from crunching the numbers one day to programming the technology that makes the calculations the next. In other situations, and especially for less educated people, it can be significantly more complicated. It can be difficult to get a further or new education if you are not able to afford or access one.

There is usually a solution; one can change industry completely, move to another geographical area and so on, but for some people, it just does not work out. This has made it more important than ever to think ahead in the choices we make in life and ask: "Will my education be relevant in 10, 25, 50 years? Will the profession I am aspiring to still be a profession in the future, or will it be replaced by technical solutions?" The choices of education and job, and even the foundations of the educations themselves, are adapted to what is needed at the time and what is expected to be needed in the future. From a broader perspective, looking at an entire society and considering the situation over a long period of time, the question becomes simpler. One can then look at data in the area and conclude that jobs do not disappear but are usually created by technology development. What the jobs are will of course change; for example, when a job within service is lost, maybe two jobs within IT are created. In summary, from a short-term individual perspective technology can be problematic for people's job situation, but from a longer-term social perspective, it is probably positive for the job market.

References
