Different recipient designs with dialogue partners: An experimental comparison between a Chatbot and a Human communication partner

Linköpings universitet SE–581 83 Linköping

Bachelor thesis, 18 ECTS | Cognitive Science

2018 | LIU-IDA/KOGVET-G--18/026--SE

Different recipient designs with dialogue partners

An experimental comparison between a Chatbot and a Human communication partner

Skillnader och anpassningar i en kommunikativ övning med en dialogpartner (Differences and adaptations in a communicative exercise with a dialogue partner)

Anna Westin

Supervisor : Henrik Danielsson Examiner : Arne Jönsson



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

Abstract

Chatbots are becoming more common in modern society, but there are almost no studies that explore both the differences and the causes that divide human communication from communication with a Chatbot. The aim of this thesis was to explore the different recipient designs people adopt when communicating with a human and with a Chatbot. A Chatbot was built, and an experiment was conducted that measured the performance and experience of the participants. A thematic analysis then identified causes for these experiences. The study focused on finding new differences in addition to exploring people's boredom, frustration, understanding, repetition, and performance in a task. The study found differences, and causes of them, in people's recipient design when communicating with a human compared to a Chatbot, as well as differences in the performance of a task. Hopefully, this will help future research find solutions for the differences found.

(4)

Contents

Abstract
Contents
List of Tables
1 Introduction
1.1 Aim
1.2 Research Question
1.3 Limitation
2 Theory
2.1 Communication and recipient design
2.2 Earlier studies
2.3 Ethics of Chatbots
2.4 The DiapixUK material
2.5 Likert scale
2.6 Thematic analysis
3 Method
3.1 Participant
3.2 Material
3.3 Procedure
3.4 Questionnaire
3.5 Analysis
4 Result
4.1 Task
4.2 Questionnaire
4.3 Thematic analysis
5 Discussion
5.1 Result
5.2 Method
5.3 Ethical issues
5.4 Future studies
5.5 Conclusion
Bibliography
6 Appendix

List of Tables

3.1 Codes from the thematic analysis
4.1 Occurrences of three of the codes from the thematic analysis
4.2 Sample of transcript from the human condition
4.3 Sample of transcript from the human condition
4.4 Sample of transcript from the chatbot condition
4.5 Hierarchically arranged coding for Frustration in the human condition
4.6 Hierarchically arranged coding for Frustration in the chatbot condition
4.7 Sample of transcript from the chatbot condition
4.8 Sample of transcript from the chatbot condition
4.9 Sample of transcript from the human condition
4.10 Hierarchically arranged coding for Participant understanding in the human condition
4.11 Hierarchically arranged coding for Participant understanding in the chatbot condition
4.12 Sample of transcript from the chatbot condition
4.13 Hierarchically arranged coding for Human/Chatbot understanding in the human condition
4.14 Hierarchically arranged coding for Human/Chatbot understanding in the chatbot condition
4.15 Sample of transcript from the human condition
4.16 Hierarchically arranged coding for Repetition in the chatbot condition
4.17 Hierarchically arranged coding for Repetition in the chatbot condition


1

Introduction

Chatbots are becoming a bigger part of our modern society. Gartner predicts that by 2020 more than 85% of customer interactions will be managed by Artificial Intelligence (AI) (McQueen 2018). Today people experience interaction with a Chatbot on an almost daily basis. People encounter Chatbots when they go online to make airline reservations, buy clothes, and get customer support. This suggests humans have no difficulty transferring their language skills to virtual applications (Hill et al. 2015). Chatbots have made the computer not only a mediating object between humans in a conversation, but a participant in it. Norbert Wiener, a progenitor of Cybernetics, emphasized the importance of the ethical and social questions this development has brought upon us. These questions included our responsibility to this new Other, as he called it: how should we respond to it, and how should this Other respond to us (Norbert 1952). It is important to research the differences that divide different types of dialogue partners, and not to underestimate the complexity Chatbots have as participants in a communication process.

Earlier research has explored whether a difference can be found when communicating with a human compared to communicating with an intelligent agent. However, almost no research both specifies what the differences are and identifies their causes.

1.1

Aim

This study aimed to investigate people's experience in a communicative cooperation with two different dialogue partners, a human and a Chatbot, to explore the different recipient designs people adopt. The study focused on finding new differences in addition to exploring people's boredom, frustration, understanding, repetition, and performance in a task.

1.2

Research Question

• Can differences be found in people's recipient design when communicating with a human compared to a Chatbot?


• Can differences be found in people's performance and experience in a communicative cooperation task with a human compared to a Chatbot, and what are the causes of the differences found?

1.3

Limitation

The experiences that this thesis focuses on most are boredom, frustration, understanding problems, and repetition. Other differences not concerning experience will, however, also be presented.

To measure the performance of the participants, only execution time and how well they do in the task will be measured.


2

Theory

To explore different recipient designs in communication, this theory chapter presents communication, maxims, and recipient design. The chapter will also explain differences found today in communication between Chatbots and humans, the ethics of Chatbots, the material of the experiment, and the methods used to analyse the data.

2.1

Communication and recipient design

Communication, according to Grice (1975), is characterized by some degree of cooperative effort. The participants in a communication process recognize, to some extent, a common purpose or a set of purposes, and/or a direction mutually accepted by both parties. This can take the form of the participants wanting to come up with a solution to a problem together. This direction or purpose can be set at the beginning or evolve during the exchange. As a result, there are conversational moves that are considered unsuitable at each stage of the exchange. Grice expresses that we can formulate a general principle that the participants of the communication are expected to observe throughout the exchange: make your conversational contribution such as is required by the purpose of each stage of the exchange. Grice labels this the Cooperative principle, and it involves four categories of maxims and submaxims. The four categories are called Quantity, Quality, Relation, and Manner. Quantity refers to the quantity of information to be provided during the exchange: it should not be more or less informative than is required by the purpose of the exchange. Quality requires the contribution in the exchange to be true. Relation requires the contribution to be relevant to that stage in the exchange. Manner requires the contribution to be expressed clearly to the other party. To meet these maxims, the speaker must make assumptions about the knowledge of the listener (Grice 1975). If, however, any of Grice's maxims are broken by one of the participants in the interaction, the other participant may start to break the maxims too. This also applies to interactions with computers (L. Bell 2003). Sacks et al. (1978) use the term recipient design, and A. Bell (1984) uses the term audience design, to express the different practices used by the participants in an interaction to adjust to the situation and their contributions to each other. This involves the speaker being considerate of the listener's abilities (like hearing ability) and


adapting the exchange in the dialogue based on that consideration. Blokpoel et al. (2012) further describe recipient design as perspective taking: a behaviour is selected by the speaker based on hypotheses about the beliefs and knowledge of the recipient. A recipient design can, for example, take the form of the speaker speaking more clearly and using simpler words and shorter sentences when meeting someone with a foreign accent who is in a hurry. When communicating, and especially when cooperating, a common ground needs to be established to get people on the same level with each other. Humans possess knowledge and experiences that can be unique to the person, and to establish common ground, information about this knowledge and these experiences is shared (Enfield 2006). As small pieces of this information are presented, grounding occurs (Brennan 1998), which is when the listeners confirm that they understand the information and intentions given (Clark, Brennan, et al. 1991). Common ground can also take the form of the speaker making assumptions about the knowledge and experience of the listener, trusting that the listener will draw conclusions from them without more information having to be presented. This makes communication faster and more effective (Enfield 2006).

Another way to make communication more effective is turn taking. Turn taking regulates who should speak, with the speakers taking turns with each other in the conversation. This keeps the speakers from interrupting and talking simultaneously. This skill requires the speakers to be able to identify a turn's construction and whose turn will be next (Lindström 2008).

2.2

Earlier studies

Earlier comparative studies on conversations with Chatbots and with other humans have revealed that people communicate with Chatbots for a longer duration but use shorter messages and a smaller vocabulary in the dialogue than when they communicate with another human. People also exhibited greater profanity with a Chatbot than with a human, and have been shown to alter their communication to match that of the Chatbot (Hill et al. 2015). Another study has shown that students' interest in a task drops significantly more when engaged with a Chatbot than with a human partner (Fryer et al. 2017). In terms of communication attributes and personality traits, people also change when talking to a Chatbot versus a human: people are more open, agreeable, extroverted, conscientious, and self-disclosing when communicating with a human than with a Chatbot (Mou and Xu 2017).

The gender of the Chatbot's voice has also been shown to affect how people perceive the Chatbot. A female voice makes the Chatbot be perceived as having more knowledge about love and relationships, while a male voice makes the Chatbot be perceived as more credible and dominant, and as having more knowledge about technical issues (Nass et al. 1997).

2.3

Ethics of Chatbots

Social Chatbots are a popular subject within the ethics of AI. Gehl and Bakardjieva (2016) explained that one of the ethical problems is Chatbots acting as humans. Chatbots are used in customer services to help the customers, but the customers do not always know it is a Chatbot they are talking to. This undermines social bonds, increased group solidarity, and meaningful discourse in the common world.

Gehl and Bakardjieva (2016) went on to explain the problem of Chatbots becoming more common in the home and in mobile phones as virtual assistants. The assistants are most often female, which encourages a long heritage of feminized obedience and predominantly masculine behaviour. Gehl and Bakardjieva (2016) also explored the discussion of whether social Chatbots should have rights. We lack convincing arguments for why we should include machines as moral patients, but at the same time we lack reasons to exclude them from consideration as they are becoming


a bigger part of our society. To be accepted both as artifacts and as agents in the ecosystem of social and verbal interaction, Chatbots have already undergone a trial of legitimacy. This means their meaning, compatibility, and relevance with respect to group norms were validated. The question of whether they should have rights can, however, also be asked in a different manner: can we hold these entities responsible if they do wrong? This question is yet to be answered.

2.4

The DiapixUK material

Diapix is a material inspired by Map Task. Map Task was developed by Anderson et al. (1991) and involves an instruction type of communication between two parties: the giver and the follower. The giver has a map with a route drawn on it. The giver must describe details of the map route and key elements to the follower, who will try to draw the exact same route on the other map. The conversation created in tasks like Map Task has a natural turn-taking and shows the participants' communication ability. Researchers can control which words will most probably be used (Baker and Hazan 2011). This also affects the expressions used in the conversation (Van Engen et al. 2010). The structure of Map Task results in an even relationship between the giver and the follower (Baker and Hazan 2011).

Diapix consists of a set of image pairs instead of one person having a map to describe to the other person. The image pairs have 10 differences in every pair, and the two cooperating parties must find the differences between the pictures. DiapixUK is a further development of Diapix, making the images adjustable to fit different nationalities. Its image pairs involve 12 differences instead, equally scattered, three into every quadrant of the image. The pictures show farms, beaches, and streets. Every image also has 12 keywords linked to it that are likely to come up when both parties try to solve the task. Examples of keywords for the Street image are "Pill", "Sign", "Pie" and "Shop"; these are objects seen, or read on signs, in the picture.

However, the images differ when measuring the median number of occurrences of keywords produced per image pair per speaker. When tested on human participants by Baker and Hazan (2011), the farm scene's median was between 0 and 4, and the street scene's median was between 0 and 8.5. The median range of the street scene was larger because the keywords "sign" and "shop" were repeated a lot. One image took approximately 8 minutes to solve per participant, and an average of 613 words was used per participant. They concluded that the material gives enough speech material for linguistic and acoustic analysis. The same study also examined whether a learning effect was created when doing more than one DiapixUK task. They concluded that the material does not create a learning effect, and that all images take about the same time to find the 12 differences in.

2.5

Likert scale

Questionnaires using a Likert scale are often used to measure characteristics, attitudes, and opinions on a matter. The scale can differ, but the most common is a five- or seven-point ordinal scale that goes from Never (1) to Always (5 or 7) (Sullivan and Artino Jr 2013). The questionnaire used in this study contained a seven-point Likert scale.

2.6

Thematic analysis

A thematic analysis is a descriptive method that attempts to examine what is going on in the data by finding themes or categories to describe it. The procedure starts with transcribing the recordings from an interview or study, and then coding them. Usually passages of text are coded, but sections of an audio or video recording can also be coded (Ryan and Bernard 2003).


When coding, the researcher typically has some ideas about which codes the text will produce (Ryan and Bernard 2003). If the researcher has a more open mindset and wants all ideas to arise entirely from the data, Charmaz (2006) wrote a few suggested questions that the researcher can ask about the data when coding:

• What is going on?

• What are people doing?

• What is the person saying?

• What do these actions and statements take for granted?

• How do structure and context serve to support, maintain, impede or change these ac-tions and statements?

Charmaz wrote these for use in Grounded theory, but they are often used in thematic analyses as well. The codes can be organized by non-hierarchical or hierarchical coding, called flat and tree coding respectively (Gibbs and Taylor 2010).

When the codes are generated, they are assembled to find one or more themes. Ryan and Bernard (2003) wrote a few suggestions on how themes can be found. Of these suggestions, three were relevant for this paper:

• Key-words-in-context – look for the range of uses of key terms in the phrases and sentences in which they occur.

• Transitions – one of the discursive elements in speech which includes turn-taking in conversation as well as the more poetic and narrative use of story structures.

• Social science queries – introduce social science explanations and theories, for example, to explain the conditions, actions, interaction and consequences of phenomena.

Then the text or transcript is reviewed again, once for every theme. When this process is finished, the themes are summed up and an interpretation of what the data means is made (Howitt 2016).


3

Method

The Method chapter will first present information about the participants who took part in the study and the material used in the experiment. It will then explain the procedure of the experiment and the analysis method, in which the codes used in the analysis process will be presented.

3.1

Participant

A pilot study was conducted on 8 participants. The participants in the pilot study did not participate in the final study.

The final study included 11 women and 13 men. In total there were 24 participants with an average age of 21 (SD = 1.78). All participants had Swedish as their first language and were recruited through a convenience sample of university students at the undergraduate or advanced level. All participants took part in both the Chatbot condition and the human condition.

3.2

Material

The DiapixUK material for the images Farm and Street was used in the study. The material is meant for an English audience, resulting in the names of stores and text on signs in the images being in English.

A Chatbot was built that runs through a Python script. The Chatbot was built by considering problems that had come up in an earlier study by Fornander (2017), where the same material had been used with a Chatbot. That study had multiple problems with its Chatbot, and an evaluation of the problems was made from it. The problems were compiled into a list.

• Multiple repetitions of lines by the Chatbot made the participants give up on finishing the task.

• The voice recognition of the Chatbot was bad. A Python module for Google Speech-to-Text was used for voice recognition.


• The Chatbot was considered less communicatively competent because of its inability to listen and talk at the same time.

• The test coordinator had to manually push the responses to the Google module because the program sometimes failed to do it automatically.

• The voice quality of the Chatbot was bad and hard to understand. Microsoft Speech was used as speech synthesis.

• Two of the measured variables in Fornander's (2017) study could have been affected by the participants not having English as their first language. A study by Van Engen et al. (2010) demonstrated that the word-to-token ratio and execution time differ depending on whether both participants in a Diapix dialogue have English as their first language.

The repetition of lines was fixed by using Chatscript. Chatscript is both a natural language processing engine and a dialogue flow scripting program. Sentences can be linked to keywords, such that the word Sheep covers all different forms of the word, or be randomly selected when there is nothing else to talk about. Most importantly, lines can be set to be spoken only once by the Chatbot. Chatscript has also won the Loebner prize four times, a competition that awards the computer programs considered by judges to be the most human-like (Wilcox 2015).
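The once-only behaviour described above can be sketched in plain Python. This is an illustrative reimplementation, not the Chatscript engine or the thesis code; the keyword lines and fallback lines are invented, and unlike real Chatscript the keyword match here does not cover inflected word forms:

```python
import random

class OnceOnlyBot:
    """Minimal sketch of a dialogue manager in which each scripted line
    is spoken at most once, mimicking Chatscript's rule erasure."""

    def __init__(self, keyword_lines, fallback_lines):
        # keyword -> line spoken when that keyword is heard
        self.keyword_lines = dict(keyword_lines)
        # lines chosen at random when no keyword matches
        self.fallback_lines = list(fallback_lines)

    def respond(self, utterance):
        words = utterance.lower().split()
        for keyword, line in list(self.keyword_lines.items()):
            if keyword in words:
                # Erase the rule so the line is never repeated.
                del self.keyword_lines[keyword]
                return line
        if self.fallback_lines:
            line = random.choice(self.fallback_lines)
            self.fallback_lines.remove(line)
            return line
        return "I have nothing more to say about that."

bot = OnceOnlyBot(
    {"sheep": "I have two sheep next to the barn."},
    ["Shall we look at the top left corner?"],
)
print(bot.respond("do you see any sheep"))  # keyword line, spoken once
print(bot.respond("what about the sheep"))  # keyword erased, falls back
```

Because matched rules are deleted, asking about the same topic twice yields a fallback line the second time, which is the property that stopped the repeated lines reported by Fornander (2017).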

For speech recognition, Google Speech-to-Text was used. A script performing asynchronous speech recognition against the Google Speech-to-Text API was used instead of the Python Google module. Google Speech-to-Text was chosen because studies have shown it is the speech recognition service with the lowest Word Error Rate (WER): Google Speech has 9% WER, compared to, for example, Microsoft Speech with 18% WER (Këpuska and Bohouta 2017). Using the API instead of the Python module allowed the Google Speech-to-Text script to be modified. It was modified to perform better and to react to input, so the test coordinator no longer had to manually send away the responses when the program failed to do so on its own. Words likely to come up were also added to the script to help the speech recognition. This vocabulary consisted of the keywords the Chatbot reacted to and words participants had used to describe objects in the pilot studies.
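The effect of supplying a vocabulary of expected words can be approximated with a small stand-alone sketch. The actual system passed the word list to the Google Speech-to-Text API; the post-correction function and the word list below are invented for illustration and only mimic the idea of biasing recognition towards expected words:

```python
import difflib

# Invented example vocabulary: keywords the Chatbot reacts to plus
# words participants used in the pilot study.
EXPECTED_WORDS = ["sheep", "barn", "sign", "shop", "pill", "pie"]

def bias_towards_vocabulary(recognized, vocabulary=EXPECTED_WORDS, cutoff=0.8):
    """Snap each recognized word to the closest expected word, if one
    is similar enough; otherwise keep the word unchanged."""
    corrected = []
    for word in recognized.lower().split():
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

print(bias_towards_vocabulary("i see a shep by the baarn"))
# → "i see a sheep by the barn"
```

A real recognizer biases its hypotheses before decoding rather than correcting afterwards, but the sketch shows why a task-specific vocabulary helps: near-misses on expected keywords get pulled back to the intended word.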

A solution to the Chatbot's inability to listen and talk at the same time was tested but abandoned because it raised more problems than it solved: Google asynchronous speech recognition could only run for 90 seconds at a time and would restart in the middle of people speaking.

A Swedish voice synthesis was chosen to avoid the effect English has on execution time when it is not the participant's first language. For the Swedish voice synthesis, Amazon's cloud service Amazon Polly was used. The voice synthesis was evaluated in the pilot study as sufficient, but the Human understanding Chatbot variable in the questionnaire would also test this.

A Turtle Beach Ear Force P11 headset was used to talk and listen to the Chatbot. The headset was chosen because it has noise cancellation, which would help the speech recognition.

The conversations between the participant and the two conditions were recorded using a smartphone app. The Chatbot's lines could not be heard in the recordings, so an automatic transcription program was created that took every line the Chatbot said and put it in a text file. The transcription program also included what the participant said, but since that had many errors because of the speech recognition, later correction of the transcript was needed.


3.3

Procedure

The experiment consisted of two sessions, one after the other, with one condition in each session. The conditions were the human condition and the Chatbot condition. The order of the sessions was alternated so that fatigue from the task would not be a variable affecting the result. The images were also alternated between the Street and Farm images for the two conditions, to prevent differences between the images from affecting the result.
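The alternation of session order and images can be sketched as a simple counterbalancing scheme. This is illustrative only; the thesis does not state the exact rotation used, and the assign helper is an invented name:

```python
from itertools import cycle, product

# The two session orders and two image orders combine into four
# schedules, rotated so that each occurs equally often.
combinations = list(product(
    [("human", "chatbot"), ("chatbot", "human")],  # session order
    [("Street", "Farm"), ("Farm", "Street")],      # image order
))

def assign(participants):
    """Pair each participant with the next order/image combination."""
    wheel = cycle(combinations)
    return {p: next(wheel) for p in participants}

schedule = assign(range(1, 25))  # 24 participants, 6 per combination
```

With 24 participants, each of the four combinations is used six times, so neither task fatigue nor image differences systematically favour one condition.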

The experiment started with the participants getting information about the study and signing a participation consent form. The participants were first instructed to start in the left corner and only speak of one object or object pair at a time. "Two green melons" was given as an example of an object pair; none of the images included any melons, and the phrase was only an example. The participants were also informed that, when talking to the Chatbot, they should say the phrase "What did you say?" when they wanted the Chatbot to repeat something it had just said.

In the Chatbot condition, the participant was handed an image from the DiapixUK material and sat in front of it at a table, marking with a pen the differences found during the conversation. A computer was placed next to the participant with the screen turned away. It was positioned there to give the participant a headset with headphones and a microphone to talk to the Chatbot. The participant was left alone in the room with the computer during the session, so as not to make the participant uncomfortable.

The human condition also consisted of the participant sitting at a table and being given another image from the DiapixUK material. The test coordinator and the participant sat back to back so that facial or other gestures would not affect the result. The test coordinator had a manuscript that was the same as the one implemented in the Chatbot. The manuscript only involved the topics in the image. A manuscript was chosen to keep the answers consistent and to ensure that the choice of words or the sentence structure would not be variables affecting the result.

3.4

Questionnaire

From the tasks the participants performed, two variables were measured regarding performance:

Execution time The time it took for the participant to solve the task.

Performance in task How many correct differences the participant found in the images during the task.

A questionnaire was answered by the participant after finishing the tasks in both conditions. The questionnaire included ten questions that measured the experience of the participant in terms of:

Boredom If the participant got bored at any point with the human or the Chatbot.

Frustration If the participant got frustrated at any point with the human or the Chatbot.

Participant understanding If the participant felt he or she understood the human or the Chatbot.

Human/Chatbot understanding If the participant felt he or she was understood by the human or the Chatbot.


Boredom and Frustration were answered with free text, for the participants to write the reasons why they experienced what they had, or just to write "yes/no".

Participant understanding, Human/Chatbot understanding, and Repetition were answered on a Likert scale where 1 was "Never" and 7 was "Always".

The question about boredom was based on earlier research showing that participants get bored more easily with Chatbots than with humans when doing a task. This effect had previously only been seen in exercises that included learning a new language (Fryer et al. 2017).

During the pilot study, frustration was a word the participants used to explain their experience. There is a lack of research measuring whether frustration is a factor that could differ between a conversation with a human and one with a Chatbot, and therefore a question was added on this.

The questions regarding understanding and repetition were added because these two factors had been a big problem in Fornander's (2017) study when using the same material with a Chatbot.

3.5

Analysis

A thematic analysis was made with the purpose of finding more differences, as well as causes of the boredom, frustration, repetition, and understanding problems the participants had experienced.

The Chatbot condition was first transcribed and coded. The human condition was then listened to, and only the important parts that fitted the codes, or other interesting parts, were transcribed. This method was used because there were approximately 8 hours of recordings in total, and using two different methods to code the data gave the test coordinator less of a fatigue effect, which made the test coordinator effective for more hours.

Charmaz's (2006) suggestions were used when coding the data, with more focus on the questions "What are people doing?" and "How do structure and context serve to support, maintain, impede or change these actions and statements?". The codes that were produced can be seen in Table 3.1.


Table 3.1: Codes from the thematic analysis

Codes used in both the Chatbot condition and the human condition:

• Word failure: Participant stumbled on a word

• Profanity: Participant used curse words

• Talking to herself: Participant talks to herself

• Simultaneous talking: Participant talked or made agreeing noises at the same time as the human/Chatbot was speaking

• Informing of difference: Participant says there is a difference by using the keywords "other", "different", "difference"

• Tell what difference: Participant describes an object that is different in her image

• Confusion: Participant does not understand

• Correction: Participant corrects something she thinks is described wrong

• Participant-repetition: Participant repeats herself

• Mimic: Participant mimics the sentence structure and words of the human/Chatbot

• Confirmation: Participant asks to confirm something in the image

• Successful confirmation: Human/Chatbot answers the participant's question of confirmation

Codes used in only one of the conditions:

• Repetition of speaker: Participant repeats what the speaker said

• Unwanted repetition: Human/Chatbot describes a topic again without the participant asking for a repetition by saying "What did you say?"

• Untopic-repetition: Human/Chatbot makes a repetition, but not about the topics

• Fail to confirm: Human/Chatbot does not answer the participant's question of confirmation

• Off script: Something is said by the human/Chatbot that is not in the manuscript

• Weird comment: An irrelevant comment is made by the human/Chatbot

• Interruption: Human/Chatbot interrupts the participant

• Change topic: Human/Chatbot changed topic without a natural transition

• Irritation: Participant shows signs of irritation through profanity or haughtiness in language, for example saying "k" instead of "okay", or frustration in the voice of the participant


The results are divided into three subsections. The first subsection includes the result of the task, the second subsection includes the result of the questionnaire, and the third and last subsection includes the thematic analysis.

A significant difference could be found between the two conditions on all measured variables apart from execution time. The Chatbot condition had a higher score in the measured variables Frustration, Boredom, Participant understanding, Human/Chatbot understanding, and Repetition.

The themes from the thematic analysis are presented in hierarchically arranged coding tables, together with samples from the transcripts. The themes from the thematic analysis for the human condition are the following:

• Frustration:
  – Annoyed with herself
• Participant understanding:
  – Participant try to understand
  – Participant fail to understand
  – Participant try to make herself more understandable
• Human/Chatbot understanding:
  – Test coordinator try to make herself more understandable
• Repetition:
  – Participant repeats what test coordinator said
  – Participants repeat themselves


The themes from the thematic analysis for the Chatbot condition are the following:

• Frustration:
  – Annoying Chatbot
  – Hypothetical causes
• Participant understanding:
  – Participant try to understand
  – Participant fail to understand
  – Participant try to make herself more understandable
• Human/Chatbot understanding:
  – Chatbot fail to understand
• Repetition:
  – Chatbot repeats itself
  – Participants repeat themselves

In the human condition there was also a much higher occurrence of the codes Simultaneous talking, Informing of difference, and Tell what difference. A table in the thematic analysis section will present this as well.

4.1 Task

Execution time A paired-samples t-test was conducted to compare the execution time in the human condition and the Chatbot condition. There was not a significant difference between the human condition (M=7.5750, SD=1.1812) and the Chatbot condition (M=7.3892, SD=1.0960); t(23)=0.562, p = 0.580. This means participants spent on average 7.58 minutes solving the task in the human condition and 7.39 minutes in the Chatbot condition.

Performance in task A paired-samples t-test was conducted to compare how many correct differences in the images the participants found in the human condition and the Chatbot condition. There was a significant difference between the human condition (M=9.42, SD=1.349) and the Chatbot condition (M=8.13, SD=1.569); t(23)=3.332, p = 0.003. These results show the participants found more correct differences in the human condition than in the Chatbot condition.
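The paired-samples t-test reported here boils down to a small computation on the pairwise differences. A minimal pure-Python sketch of that computation is given below; the sample numbers are invented for the example and are not the study's raw data.

```python
import math
from statistics import mean, stdev

def paired_t(sample_a, sample_b):
    """Paired-samples t statistic: t = mean(d) / (stdev(d) / sqrt(n)),
    where d holds the pairwise differences a_i - b_i."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Invented execution times in minutes, one pair per participant.
human   = [7.1, 8.2, 6.9, 8.8, 7.4, 7.0]
chatbot = [7.0, 8.0, 7.2, 8.5, 7.1, 7.3]
t_stat, df = paired_t(human, chatbot)
# With the study's 24 participants df = 23, and |t| must exceed
# roughly 2.07 to reach significance at alpha = .05 (two-sided).
```

The p-value reported alongside a value such as t(23)=0.562 is then read from the t distribution with the corresponding degrees of freedom, which is what statistics software does automatically.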

4.2 Questionnaire

A paired-samples t-test was used on the Likert-scale data in the questionnaire. All data were normally distributed except for the Participant understanding parameter, so not all assumptions for a valid t-test were fulfilled for this parameter. However, this is quite common, because "real world" data is never as perfect as we would like it to be. A paired t-test can therefore still be conducted and be valid as long as the rest of the assumptions are fulfilled, which they were.

In the questionnaire the participants only answered "yes" or "no" on Boredom and Frustration, and did not write more of an explanation, which made these paired nominal data. A McNemar test was therefore done on these variables.


Boredom Twenty-four participants were recruited to test if they experienced more boredom with a Chatbot or a human. An exact McNemar's test determined that there was a statistically significant difference in the "yes" and "no" answers on their experienced boredom, p = 0.001. The Chatbot condition had more yes answers than the human condition.

Frustration Twenty-four participants were recruited to test if they experienced more frustration with a Chatbot or a human. An exact McNemar's test determined that there was a statistically significant difference in the "yes" and "no" answers on their experienced frustration, p = 0.001. The Chatbot condition had more yes answers than the human condition.
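An exact McNemar's test considers only the discordant pairs, i.e. the participants whose yes/no answer differed between the two conditions. The sketch below is a minimal illustration of the computation; the counts used are invented, since the thesis reports only the p-values and not the underlying contingency tables.

```python
from math import comb

def exact_mcnemar(b, c):
    """Exact (binomial) McNemar's test on paired yes/no answers.
    b = participants answering yes in one condition but no in the other,
    c = the reverse. Under H0 the b + c discordant pairs split 50/50,
    so the two-sided p-value is a binomial test with p = 0.5."""
    n = b + c
    p = 2 * sum(comb(n, i) * 0.5 ** n for i in range(min(b, c) + 1))
    return min(p, 1.0)

# Invented example: 12 discordant participants, all of them answering
# "yes" only in the chatbot condition, which gives a very small p-value.
p_value = exact_mcnemar(12, 0)
```

The concordant pairs (participants who answered the same in both conditions) do not enter the computation at all, which is why the test suits paired nominal data like these.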

Participant understanding the conditions A paired-samples t-test was conducted to compare participants' experience of how well they understood the Chatbot/human in the human condition and the Chatbot condition. There was a significant difference between the human condition (M=6.92, SD=0.282) and the Chatbot condition (M=6.54, SD=0.658); t(23)=-2.584, p = 0.017. These results suggest the participants experienced understanding more of the human than of the Chatbot.

Conditions understanding participants A paired-samples t-test was conducted to compare participants' experience of how well the Chatbot/human understood the participant in the human condition and the Chatbot condition. There was a significant difference between the human condition (M=6.75, SD=0.442) and the Chatbot condition (M=4.17, SD=1.761); t(23)=-7.597, p < 0.001. These results suggest the participants experienced that the human understood more than the Chatbot.

Repetition A paired-samples t-test was conducted to compare participants' experience of having to repeat themselves in the human condition and the Chatbot condition. There was a significant difference between the human condition (M=1.79, SD=0.833) and the Chatbot condition (M=3.00, SD=1.319); t(23)=7.198, p < 0.001. These results suggest the participants experienced having to repeat themselves more in the Chatbot condition than in the human condition.

4.3 Thematic analysis

The thematic analysis will be presented by hierarchical coding. The first level is the theme, the second level contains the codes involved in that theme, and the third level contains the codes that came right before the code on the second level. The hierarchical order can be read as "caused by..." after the theme.

Finding when or why the participant experienced boredom in the data was not successful. The experiment was conducted in Swedish, but all samples of the transcript presented in this thesis are freely translated from Swedish to English.

It was discovered that the human condition had a much higher occurrence of the codes Simultaneous talking, Informing of difference and Tell what difference. This can be seen in Table 4.1.

Simultaneous talking

Most of the participants added information or made noises of agreement while the test coordinator or the Chatbot was speaking. There were more occurrences of simultaneous talking in the human condition than in the chatbot condition, meaning the participants talked at the same time as the speaker more often with the human than with the chatbot. Samples can be seen in Table 4.2 and Table 4.3.



Code                   Occurrences in chatbot condition   Occurrences in human condition
Simultaneous talking                  2                                 37
Inform of difference                 15                                 35
Tell what difference                 21                                 49

Table 4.1: Occurrences of three of the codes from the thematic analysis

Test coordinator: In the enclosure there is a sign that says [Participant: what sign] in English Meet Sue and Ted.   [Simultaneous talking; confusion]
Participant: In the enclosure there is no sign.   [mimic; Tell what difference]

Table 4.2: Sample of transcript from the human condition

Test coordinator: To the right there is a sort of game store or a casino [Participant: mm] with green roof and green windows [Participant: mm]. The store has a big signboard on the roof that in English says Sals shop of betting.   [Simultaneous talking]
Participant: Okay, I don't see the green sign, but it says Place your bets in the window.   [Informing of difference]

Table 4.3: Sample of transcript from the human condition

Participants informed of the differences they found in the images. There were more occurrences of informing of differences in the human condition than in the chatbot condition, meaning the participants more often told the human than the chatbot when they had found a difference. Samples can be seen in Table 4.4.

Diapix bot: Nobody is saying anything.
Participant: Then we have a difference   [Informing of difference]

Table 4.4: Sample of transcript from the chatbot condition

Tell what difference

Participants told exactly what was different in their image. There were more occurrences of telling the difference in the human condition than in the chatbot condition, meaning the participants gave a more descriptive answer on what was different to the human than to the chatbot. Samples can be seen in Table 4.2.


Frustration

Seen in Table 4.5, one theme for causes of frustration in the human condition was created; Annoyed with herself. This theme is based on the irritation of the participant being caused by the participant's own word failures. The participant got mad at herself rather than at the test coordinator. A sample from the transcript of an occurrence of this can be seen in Table 4.9. Seen in Table 4.6, two themes for frustration in the chatbot condition were created; Annoying chatbot and Hypothetical causes. The Annoying chatbot theme is based on the participants' cause of irritation being changes of topic and rude comments, both done by the chatbot. Samples from the transcript of occurrences of this can be seen in Table 4.7 and Table 4.8.

The Hypothetical causes theme consists of guesses on what more could have caused the frustrating experience for the participant. These guesses have no support in the data and should only be seen as subjective hunches by the test coordinator. The Hypothetical causes theme included the chatbot interrupting the participant, repeating itself without being asked, giving weird and irrelevant comments, and failing to confirm when the participant asked a question. The weird comments by the chatbot seem to have been triggered by the participants seeking confirmation, meaning the participant probably continued asking about a topic that had already been described by the chatbot. As the chatbot did not repeat lines unless asked to, the chatbot's natural language engine chose other lines from the manuscript, which unfortunately didn't always make sense.

• Annoyed with herself
  – Irritation
    ∗ Word failure

Table 4.5: Hierarchically arranged coding for Frustration in the human condition

• Annoying chatbot
  – Irritation
    ∗ Change topic
    ∗ Rude comment
• Hypothetical causes
  – Interruption
  – Untopic-repetition
  – Unwanted repetition
  – Weird comment
    ∗ Confirmation
  – fail to confirm
    ∗ Correction
    ∗ Confirmation

Table 4.6: Hierarchically arranged coding for Frustration in the chatbot condition



Participant: No, it doesn't say that on mine. It says Place your bets.
Diapix bot: Do you think I care if you agree with me?   [rude comment]
Participant: Wait what? Yes we're suppose to solve this together. K.   [irritation]

Table 4.7: Sample of transcript from the chatbot condition

Participant: There it's a pharmacy with blue doors and there are two signs there one says Extra shine with pearl toothpaste.
Diapix bot: There is a bird cage, and a paper, in the red window of a pet store. A red sign is also hanging on the closed door.   [change topic]
Participant: But what the f*ck she said we were suppose to start to the left.   [irritation]

Table 4.8: Sample of transcript from the chatbot condition

Participant: Below is one, one, sh*t, wood-beige hut where it says Sell your second hand goods and inside there are lots of items in different colors.   [word failure; profanity]
Test coordinator: I have a small store under the betting shop. [Participant: mm]. It says Antiques on the signboard. [Participant: okay].   [Simultaneous talking]

Table 4.9: Sample of transcript from the human condition

Participant understanding

Seen in Table 4.10 and Table 4.11, the same three themes were created for Participant understanding in both the human and the chatbot condition; Participant try to understand, Participant fail to understand, and Participant try to make herself more understandable. The codes that created the themes are very similar in both tables, but what caused those reactions (level three of the lists in the tables) is almost all different.

The theme Participant try to understand involved the participant giving an impression of trying to create understanding for herself by correcting the chatbot/human when something was described wrong according to the participant. It also involved the participant asking questions to make sure of something, or talking to herself to individually regroup what had been said. In the human condition this occurred after the participant had talked at the same time as the test coordinator. In the human condition the participant also sometimes repeated what the test coordinator had just said. The participant did the same in the chatbot condition, except for repeating what the chatbot had said. No causes for this behaviour were found in the chatbot condition.


The theme Participant fail to understand involved confusion from the participant. This confusion was caused by the participant speaking at the same time as the test coordinator in the human condition, and in the chatbot condition by the chatbot giving a weird comment. In the chatbot condition the participant failing to understand was also caused by the chatbot failing to confirm something the participant asked about, which in turn was triggered by the participant seeking confirmation on a matter or correcting something the chatbot had described that the participant thought was described wrong. A failure to understand by the participant in the chatbot condition was also caused by the chatbot changing topic. The chatbot changed topic when the participant started talking to herself, gave a more descriptive answer on what she saw was different in her image, wanted confirmation on a matter, or repeated herself. All these behaviours seemed to trigger keywords irrelevant to the topic they talked about, and therefore a change of topic occurred. Samples from the transcript of this can be seen in Table 4.12.

The theme Participant try to make herself more understandable involved the participant starting to mimic sentence structure and words used by the chatbot/human. An example from the transcript of this can be seen in Table 4.2. This was sometimes done after a confusion had occurred in the human condition, or after a weird comment had been said by the chatbot in the chatbot condition. The participant also gave a more descriptive answer on an object that was different in the images. This was done after confusion had occurred or after the participant had spoken at the same time as the test coordinator in the human condition. No causes of this behaviour were found in the chatbot condition. The participant also informed when a difference was found in both conditions, as well as spoke at the same time as the chatbot/human with agreeing noises or by adding information. In the chatbot condition this simultaneous talking could occur after being interrupted by the chatbot. The participant also repeated herself, which in the human condition was done after talking at the same time as the test coordinator. In the chatbot condition it occurred when the chatbot did a repetition that was off topic or made a change of topic.



• Participant try to understand
  – Correction
    ∗ Simultaneous talking
  – Confirmation
    ∗ Simultaneous talking
  – Talking to herself
  – Repetition of speaker
• Participant fail to understand
  – Confusion
    ∗ Simultaneous talking
• Participant try to make herself more understandable
  – Mimic
    ∗ Confusion
  – Tell what difference
    ∗ Confusion
    ∗ Simultaneous talking
  – Informing of difference
  – Simultaneous talking
  – Participant-repetition
    ∗ Simultaneous talking

Table 4.10: Hierarchically arranged coding for Participant understanding in the human condition


• Participant try to understand
  – Correction
  – Confirmation
  – Talking to herself
• Participant fail to understand
  – Confusion
    ∗ Weird comment
  – fail to confirm
    ∗ Confirmation
    ∗ Correction
  – Change topic
    ∗ Talking to herself
    ∗ Tell what difference
    ∗ Confirmation
    ∗ Participant-repetition
• Participant try to make herself more understandable
  – Mimic
    ∗ Change topic
  – Tell what difference
  – Informing of difference
  – Simultaneous talking
    ∗ Interruption
  – Participant-repetition
    ∗ Untopic-repetition
    ∗ Change topic

Table 4.11: Hierarchically arranged coding for Participant understanding in the chatbot condition



Diapix bot: In my picture a happy farm worker is riding a tractor to the left of a big building.
Participant: He isn't happy.   [correction]
Diapix bot: Okey I have the same in my image.   [fail to confirm]
Participant: You said he was happy. Is that a difference? It's a interpreting question I guess.   [confirmation]
Diapix bot: Nobody is saying anything.   [change topic]
Participant: What no one in the whole picture or just the tractor driver? I don't get it.   [confirmation]

Table 4.12: Sample of transcript from the chatbot condition

Human/Chatbot understanding

Seen in Table 4.13, the theme for the human condition is Test coordinator try to make herself more understandable. This theme includes the test coordinator going off script to answer a question from the participant. A sample from the transcript can be seen in Table 4.15.

Seen in Table 4.14, the theme for the chatbot condition is Chatbot fail to understand. This theme includes the chatbot not understanding the participant, which led to confusion, changes of topic, repetitions that were both unwanted and off topic, failures to confirm in answer to a question, weird comments by the chatbot, and the participant being interrupted by the chatbot. The participant wanting confirmation in a question or repeating herself caused most of these reactions from the chatbot. The participant stumbling on words was sometimes the cause of the chatbot interrupting.

• Test coordinator try to make herself more understandable
  – Off script
    ∗ Confirmation

Table 4.13: Hierarchically arranged coding for Human/Chatbot understanding in the human condition


• Chatbot fail to understand
  – Confusion
    ∗ Weird comment
  – Change topic
    ∗ Talking to herself
    ∗ Tell what difference
    ∗ Confirmation
    ∗ Participant-repetition
  – Untopic-repetition
    ∗ Participant-repetition
  – Unwanted repetition
    ∗ Participant-repetition
    ∗ Confirmation
  – fail to confirm
    ∗ Confirmation
    ∗ Correction
  – Weird comment
    ∗ Confirmation
  – Interruption
    ∗ word failure

Table 4.14: Hierarchically arranged coding for Human/Chatbot understanding in the chatbot condition

Test coordinator: I don't have a sign that says that. I only have the one about toothpaste in the right window.
Participant: Okey so no sign that says anything about pills?   [confirmation; Participant-repetition]
Test coordinator: No   [offscript]

Table 4.15: Sample of transcript from the human condition

Repetition

Seen in Table 4.16, the themes for repetition in the human condition are Participant repeats what test coordinator said and Participants repeat themselves. These were the two kinds of repetitions that occurred in the human condition. The participants sometimes repeated the test coordinator, which occurred after the test coordinator went off script or successfully confirmed something the participant had asked. The participants also repeated themselves sometimes after speaking at the same time as the test coordinator, as a way to make sure they had heard what had been said.

Seen in Table 4.17, the themes for repetition in the chatbot condition are Chatbot repeats itself and Participants repeat themselves. The chatbot repeated itself by doing an unwanted repetition



about a topic without the participant asking "What did you say?", or by doing a repetition that was not about their current topic, for example asking "What else do you see?" two times in a row. Samples from the transcript of this can be seen in Table 4.18. Both kinds of repetitions done by the chatbot occurred when the participant repeated herself, but the unwanted repetition could also be triggered by the participant asking a question to confirm something. Participants repeated themselves when the chatbot did a repetition that was off topic or changed topic completely.

• Participant repeats what test coordinator said
  – Repetition of speaker
    ∗ Off script
    ∗ Successful confirmation
• Participants repeat themselves
  – Participant-repetition
    ∗ Simultaneous talking

Table 4.16: Hierarchically arranged coding for Repetition in the human condition

• Chatbot repeats itself
  – Untopic-repetition
    ∗ Participant-repetition
  – Unwanted repetition
    ∗ Participant-repetition
    ∗ Confirmation
• Participants repeat themselves
  – Participant-repetition
    ∗ Untopic-repetition
    ∗ Change topic

Table 4.17: Hierarchically arranged coding for Repetition in the chatbot condition

Diapix bot: What else do you see?
Participant: I see two boys that are dressed up as ghosts.
Diapix bot: What else do you see?   [Untopic-repetition]
Participant: I see two boys that are dressed up as ghosts.   [Participant-repetition]
Diapix bot: What else do you see?   [Untopic-repetition]
Participant: One of them has red pants and the other one has blue pants.

Table 4.18: Sample of transcript from the chatbot condition


The Discussion chapter will discuss the results, the method used in the study, ethical issues, and future studies. Because the results are considerably large, the discussion of the results is divided into subsections separating the focus areas of this thesis, to make it easier for the reader to follow.

5.1 Result

A significant difference could be found between the two conditions on all variables from the questionnaire apart from execution time. The Chatbot condition had a higher score in the measured variables Frustration, Boredom, Participant understanding, Conditions understanding participants, and Repetition. The human condition however had higher occurrences of the codes Simultaneous talking, Informing of difference, and Tell what difference. The thematic analysis found differences and causes for all measured variables except for Boredom. Cues of boredom are difficult to find unless the person specifically expresses that she is bored. The questionnaire however found a significant difference in people's experience of boredom with the human/Chatbot: people get more bored with Chatbots than with humans. This is in line with Fryer et al.'s (2017) finding that interest in a task drops significantly more when engaging with a Chatbot than with a human partner.

Performance

The result of the task showed that there was not much difference in execution time for the task, as earlier research has suggested. The average execution times differed by only about 11 seconds (0.19 minutes) between the two conditions. This could have been influenced by the material used. When the Diapix material was tested by Baker and Hazan (2011), the median time it took to solve the task was around 8 minutes per image. The results show the average time being around there too, which suggests this material is not fitted to find differences in execution time between two different conditions.

There was however a significant difference found in performance in solving the task between the two conditions. The results suggest a task is easier to solve with a human than with a Chatbot, but it



takes the same amount of time to solve the task. Later in this result discussion there will be suggestions of recipient designs people adopt when cooperating with a human or a Chatbot. One of these recipient design suggestions is that people use more of a cooperative recipient design when cooperating with a human than they do with Chatbots. The result of the performance in the task being higher with humans also supports this theory.

Higher occurrences of certain codes

It was discovered in the thematic analysis that the human condition had a much higher occurrence of the codes Simultaneous talking, Informing of difference and Tell what difference. A table presenting this can be seen in Table 4.1.

Simultaneous talking meant the participant started to talk before the Chatbot/human finished talking. This could be a noise of agreement by the participant (see Table 4.3). This is called grounding, as people are confirming that what has been said has been understood. The participants also sometimes added information in the middle of the Chatbot or the test coordinator talking. In the Chatbot condition this could occur after the participant had been interrupted by the Chatbot. It was coded as a way for the participant to try to make herself more understandable. This suggests the participant tried to establish common ground. However, the sentence the participant was about to say and the added information did not always match. The result of Simultaneous talking suggests that when listening to a human, the participant knew the human was listening as well as talking. The participant did not get any information on how the Chatbot worked, but it seems people assume that Chatbots work like humans. This assumption might have added frustration, as Chatbots do not work like humans and therefore disappoint the assumption made.

The results also suggest people have trouble identifying a turn's construction in turn taking with both humans and Chatbots, as the participants did not always wait for their turn to speak in the communication.

Informing of difference meant the participant confirmed differences found, for example by saying "Then we have a difference" (see Table 4.4). To inform of a difference is a means of establishing common ground. There were more occurrences of this when people cooperated with a human than with a Chatbot. This could have been caused by people making an assumption about the listener's knowledge of the world. People are known for making these assumptions and for assuming the listener will draw conclusions themselves without a confirmation being needed. This assumption of already having common ground seems to have been stronger with the Chatbot, while when communicating with a human the need to establish common ground by grounding seems to have arisen more often.

Tell what difference meant people told exactly what the difference was in their image. This is also a means of establishing common ground. People did this more with the human than with the Chatbot, suggesting people have more of a cooperative recipient design with humans and want the human to solve the task as much as they do. People not telling the Chatbot exactly what is different in their image as often as with the human suggests the Chatbot was regarded more as a tool to help them solve the task.

All these findings however need to be tested for significance before concluding that there is a real difference between the two conditions in regard to Simultaneous talking, Informing of difference and Tell what difference. The result was still presented as a suggestion of there being a difference because of the notably large difference in occurrences of these codes between the two conditions in the thematic analysis.
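One way such a test could be set up (an illustration only, not part of the study's analysis) is an exact two-sided binomial test that, under the null hypothesis, treats each coded occurrence as equally likely to come from either condition. This assumes occurrences are independent events, which repeated codings from the same participant are not, so the result would only be indicative:

```python
from math import comb

def binom_two_sided(k, n, p=0.5):
    """Two-sided exact binomial p-value: total probability of all
    outcomes no more likely than observing k successes out of n."""
    prob = lambda i: comb(n, i) * p ** i * (1 - p) ** (n - i)
    p_k = prob(k)
    return min(1.0, sum(prob(i) for i in range(n + 1) if prob(i) <= p_k + 1e-12))

# Occurrence counts from Table 4.1 (chatbot condition, human condition).
counts = {"Simultaneous talking": (2, 37),
          "Inform of difference": (15, 35),
          "Tell what difference": (21, 49)}
p_values = {code: binom_two_sided(cb, cb + hu) for code, (cb, hu) in counts.items()}
```

A count split such as 2 against 37 is then extremely unlikely under a 50/50 null, which is consistent with the impression of a real difference, but a test respecting the per-participant structure of the data would be needed for a proper conclusion.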


Frustration

The result from the questionnaire suggested people experience more frustration with a Chatbot than with a human. The thematic analysis found one theme to explain the irritation the participant sometimes expressed, which was interpreted as a sign of frustration. In the human condition people's cause of frustration seems to have been themselves. The participant stumbled on words, which caused profanity and irritation from the participant (see Table 4.9). In the Chatbot condition people's cause of frustration seems to have been the Chatbot. The Chatbot did what was coded as a change of topic (see Table 4.8), meaning the participant said something, the Chatbot misunderstood the keywords, and it started describing another object. The Chatbot also did what was coded as a rude comment in the thematic analysis, that is, the Chatbot expressed a rude comment to the participant (see Table 4.7). The script of what the Chatbot was to say was composed of lines used by Chatbots already implemented in ChatScript. The reason for using this was for the Chatbot to handle unexpected phrases from the participant and make the conversation seem more natural. The Chatbot reacts to keywords, and if none of those keywords are found the natural language engine kicks in and produces what the script considers a relevant answer. For the experiment, lines were added to describe the images and some of ChatScript's original lines were deleted. The rudeness from the Chatbot never occurred in the pilot study, and was therefore unexpected.
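The keyword-then-fallback behaviour described here can be sketched as follows. This is a simplified illustration, not ChatScript's actual engine, and the rules and canned lines are invented for the example:

```python
# Simplified illustration of keyword-triggered replies with a fallback.
# All rules and canned lines below are invented for the example.
RULES = [
    ({"sign", "signboard"}, "The store has a big signboard on the roof."),
    ({"tractor", "farm"},   "A farm worker is riding a tractor."),
]
FALLBACK_LINES = ["What else do you see?", "Nobody is saying anything."]

def reply(utterance: str, fallback_index: list) -> str:
    words = set(utterance.lower().split())
    for keywords, line in RULES:
        if words & keywords:  # any keyword match selects this line
            return line
    # No keyword matched: produce a generic fallback line, which to
    # the user can read as a change of topic or an off-topic repetition.
    line = FALLBACK_LINES[fallback_index[0] % len(FALLBACK_LINES)]
    fallback_index[0] += 1
    return line
```

When the participant's utterance happens to contain a keyword tied to an unrelated line, this kind of matching produces exactly the misdirected topic changes described above.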

Another theme, called Hypothetical causes, was also created from the thematic analysis of the Chatbot condition. This theme consisted of subjective guesses from the test coordinator to explain the experienced frustration. These subjective guesses were based on what the test coordinator herself found frustrating with the Chatbot. This theme included the codes Interruption, Untopic-repetition, Unwanted repetition, Weird comment, and fail to confirm. All these codes are explained in Table 3.1, and are actions the Chatbot did. They are also the exact same codes that explain what happened when the Chatbot showed signs of not understanding (see Table 4.14). Based on this a parallel can be drawn, and a hypothetical cause of people experiencing more frustration with Chatbots could be that Chatbots demonstrate signs of not understanding the person.

Participant understanding

The result from the questionnaire suggests people experience more trouble understanding Chatbots than humans. From the thematic analysis three themes regarding this were created; Participant try to understand, Participant fail to understand, and Participant try to make herself more understandable. People tried to understand by correcting the Chatbot/human when things were described in a way they considered wrong, and by asking questions to confirm something (see Table 4.12). Why people did this with the Chatbot was not found, but with the human it sometimes occurred right after simultaneous talking. The same result could be found in the human condition for the reasons why confusion occurred, as well as for the participant repeating herself or giving a more descriptive answer to what the differences in the image were. Simultaneous talking being the reason in the human condition for most of people's trouble understanding, or their urge to create a better understanding, suggests people's recipient design when talking to humans is more focused on themselves. People focus on conveying what they want to say, which leads to having to confirm and repeat themselves. The participants in the experiment also talked to themselves from time to time when trying to understand, in both the Chatbot and the human condition. One more difference concerning the participant trying to understand was that the participant sometimes started repeating the human, but never did this with the Chatbot.



When the participant failed to understand in the Chatbot condition, confusion occurred there as well. The cause of this was sometimes a weird comment from the Chatbot. A failure to understand for the participant could also occur when the Chatbot failed to answer a question or changed topic. These two behaviours from the Chatbot were both caused by behaviours from the participant trying to understand or make herself more understandable. For example, when the participant tried to confirm or repeat something about a topic that had already been cleared according to the Chatbot, and the Chatbot did not have the option of repeating itself unless asked to do so, the Chatbot instead gave answers that made the participant confused. This broke maxims of the Cooperative Principle by Grice (1975), as the contribution by the Chatbot was not relevant, clear, true, or informative.

People also tried to make themselves more understandable to both the human and the Chatbot, for example by mimicking the other speaker or repeating themselves. In the human condition this could occur after confusion or simultaneous talking; in the Chatbot condition it occurred after the Chatbot had changed topic or made an off-topic repetition. In both conditions the participant's need to make herself more understandable seems to have been caused by an assumption that what she had said might have been missed, or that her help was needed to clear up the understanding problems.

The quality of the speech synthesis was approved in the pilot test, but as people had more trouble understanding the Chatbot than the human, the quality of the speech synthesis could still have affected this result and should be evaluated further.

Human/Chatbot understanding

The result from the questionnaire suggests that people experience Chatbots as having more trouble understanding them than humans do. The theme that came from the thematic analysis in the human condition is called Test coordinator tries to make herself more understandable, meaning that the test coordinator sometimes went off script to answer the participant's questions. In the Chatbot condition the theme was called Chatbot fails to understand. This failure led to confusion caused by strange comments from the Chatbot, and it was demonstrated by the Chatbot changing topic, making unwanted and off-topic repetitions, and failing to answer questions.

The two different themes of the two conditions suggest that when the human makes the same effort as the participant to make herself more understandable, people experience themselves as being more understood. When the same effort is not made by both parties in a communication process, the other party does not seem to give an impression of understanding the first party.

The Google speech recognition got many words wrong, even with the help of added words intended to trigger the correct keywords. Perhaps a better headset could have improved the recognition.
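One way to bias Google's speech recognition toward expected vocabulary is its speech adaptation ("phrase hints") feature, where a list of likely phrases is attached to the recognition request. The thesis does not specify how the extra words were supplied; the sketch below only builds a RecognitionConfig-style dictionary in the shape the Speech-to-Text REST API expects, with a hypothetical keyword list taken from the DiapixUK street scene.

```python
# Minimal sketch of biasing Google Speech-to-Text toward task keywords
# via "phrase hints" (speech adaptation). The keyword list and boost
# value are hypothetical; the dict mirrors the REST API's
# RecognitionConfig JSON shape but no API call is made here.

def build_recognition_config(keywords, boost=15.0):
    """Return a RecognitionConfig-style dict with phrase hints."""
    return {
        "languageCode": "en-US",
        "speechContexts": [
            {"phrases": list(keywords), "boost": boost}
        ],
    }

config = build_recognition_config(["shop", "sign", "street", "bus stop"])
print(config["speechContexts"][0]["phrases"])
```

In the real client the dict would be passed as the config of a recognize request; stronger boost values make the listed phrases more likely to be returned, which could reduce the keyword misrecognitions described above.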

Repetition

The result regarding whether the participants experienced that they had to repeat themselves showed a significant difference between the human and Chatbot conditions. The thematic analysis revealed two different kinds of repetition in both conditions. In the human condition the themes were called Participant repeats what the test coordinator said and Participants repeat themselves. In the Chatbot condition the themes were called Chatbot repeats itself and Participants repeat themselves.


People repeated what the human said, but not what the Chatbot said. This was caused by the human going off script and/or successfully answering a question from the participant. People repeating what the human said has already been discussed as an attempt to understand what had been said. The repetition themes in the Chatbot condition were, however, both caused by the participant repeating herself, which in turn made the Chatbot repeat itself. People seem to have been more influenced by the Chatbot and adapted to its behaviour by acting the same way, while in the human condition people imitated the exact words of the human. This suggests that people adopt two different recipient designs when talking to a human and when talking to a Chatbot.

5.2 Method

The study could have been divided into two different studies: one to develop and evaluate the Chatbot, and one for the experiment.

The Chatbot was implemented in Chatscript, a natural language processing engine and dialogue flow scripting program. It was chosen because its rule-based engine prevents the Chatbot from repeating lines. Repetition of lines was a big problem in an earlier study (Fornander 2017), and made the participants give up on the task earlier when cooperating with the Chatbot than with a human. This problem was fixed by using Chatscript. However, it also introduced new problems, as a keyword could only trigger a line once. This resulted in the Chatbot changing subject when it ran out of lines on a topic, while the participants sometimes did not feel finished and tried to get back to the subject. Hence, the participants still experienced repetition in the communication. Fixing the problem in this way meant that the Chatbot broke maxims of the Cooperative principle by Grice (1975). Adding more lines on each topic to the script could fix this problem. However, the absence of a significant difference in execution time suggests that the participants did not give up on the task as in the earlier study, and that problem can be considered solved.
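The once-only behaviour described above can be sketched as follows. This is an illustrative Python simulation of Chatscript-style rule erasure, not actual Chatscript code; the topic name, keyword, and response lines are invented for the sketch.

```python
# Illustrative simulation of Chatscript-style rule erasure: each rule
# fires at most once, so a topic can run dry and force a topic change.
# Topic name and response lines are invented for this sketch.

class Topic:
    def __init__(self, name, keyword, lines):
        self.name = name
        self.keyword = keyword
        self.lines = list(lines)  # remaining responses; used ones are erased

    def respond(self, utterance):
        """Fire the next unused rule if the keyword matches, else None."""
        if self.keyword in utterance and self.lines:
            return self.lines.pop(0)  # erase the rule after it fires
        return None  # topic exhausted or keyword absent: change topic

shops = Topic("shops", "shop",
              ["I see a flower shop.", "The shop sign is red."])

print(shops.respond("is there a shop"))        # first rule fires
print(shops.respond("what about the shop"))    # second rule fires
print(shops.respond("tell me about the shop")) # None: topic exhausted
```

The third call returning nothing corresponds to the situation reported above: the participant reuses a keyword such as "shop", but the Chatbot has no unused lines left and switches topic instead of answering.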

A change in the manuscript caused some problems. A line had been changed from "OK" to "Okay I have the same in my image", because the speech synthesis could not pronounce "OK" or "Okay" when nothing followed the word. In the pilot study, every occurrence of "OK" had come after the participant described something that was the same in both images and the Chatbot agreed. In the final study, however, the line was also used by the Chatbot to change topic when a topic had already been discussed, and it could then contradict what the Chatbot had said, as it confirmed that something was the same even though the Chatbot had previously said it was not. A better answer needs to be implemented in future studies.

The street scene was challenging to implement because it consists almost entirely of signs and shops, so the participants used the keywords "shop" and "sign" repeatedly, while the Chatbot could only react to each once. Baker and Hazan (2011) mention that keyword repetition was higher in the street scene when testing the DiapixUK material. A different choice of images from the DiapixUK material might have solved this.

For the speech synthesis a female voice was chosen because the test coordinator was female. Studies have, however, shown that the gender of the voice influences how people perceive a Chatbot. According to Nass et al. (1997), a female voice gives the impression of having more knowledge of love and relationships, while a male voice gives the impression of being more credible and dominant, and of having more knowledge of technical issues. This suggests that participants might have found a male voice easier to cooperate with, as it would have been perceived as more credible. Since the test coordinator was female, a female voice was still chosen to prevent gender variables from affecting the result.

References
