Designing a Digital Voice-Controlled Travel Guide: Investigating the User Experience of Voice-Controlled Customer Service


Master Thesis in Interaction Technology and Design, 30 ECTS
M.Sc in Interaction Technology and Design, 300 ECTS

Spring term 2019

Designing a Digital Voice-Controlled Travel Guide

Investigating the User Experience of Voice-Controlled Customer Service

Lovisa Carlsson June 6, 2019


Abstract

Automated text answers can be used to make customer service more efficient. However, text is not always a suitable type of interaction. The aim of this study was to find out if the integration of a Voice User Interface could improve the user experience of a messenger function provided by a large digital travel agency. It has also been examined how useful the integration would be and how it affects three parameters that are used to measure customer satisfaction: time, quality of answers and quality of personal treatment. Existing guidelines for how to design a multimodal voice user interface were used to create a hi-fi prototype of a digital voice-controlled travel guide. The prototype was tested on users using three different types of interaction: voice only, voice and visual feedback, and text only. The results showed that the integration of voice commands did not have any negative impact on the three parameters. It was also shown that voice can be used to increase the accessibility of the messenger function, which implied that it would be useful to integrate voice technology. In summary, the study showed that the integration of voice technology will not have a negative effect on the user experience. Further research is recommended to motivate the increased accessibility that voice commands can bring to the messenger function.


Acknowledgements

First of all I would like to send many thanks to my thesis advisor Ulrik Söderström at the Department of Applied Physics and Electronics at Umeå University. Söderström has been a support throughout the process by giving me feedback on my choice of objective and method.

I would like to send my thanks to everyone at TUI who encouraged, motivated and supported me to proceed with this study. A special thank you to Hanna Andersson for reflecting on ideas with me and helping me improve my report. Thank you to my team members in the UX team at TUI for spreading joy throughout the process. A special thanks to Victoria Cleverby for advising me on how to create customer journeys.

Thanks to Ann-Louise Erl, product owner of the "Ask the guides" function, for letting me investigate your product and encouraging me to propose new solutions.

Thank you Chris Carmichael, head of Innovation Lab at TUI Nordic, for your thoughts on the future of voice technology. Thank you Christopher Riddersäter for giving me tech advice when creating the prototype.

I would also like to thank TUI Customers for answering my survey and the participants who participated in the usability tests.

Thanks to Matilda Nilsson, Sofia Hedlund and Alfred Ödling for peer-reviewing this thesis. Also, many thanks to Simon Asp who was the opponent of this study.

Last but not least I would like to acknowledge Thomas Mejtoft, who was the second reader of this study. I am grateful for his feedback and comments on this thesis.


Contents

1 Introduction
1.1 Objective
1.2 Thesis Outline

2 Background
2.1 Case Study Description
2.2 TUI "Ask The Guides" Function

3 Theory
3.1 Brief History Of Voice User Interface
3.2 Why Voice User Interface?
3.3 Different Types Of Voice-Enabled Interfaces
3.4 Environments And Contexts Where Voice Agents In Screen-First Devices Are Frequently Used
3.5 The Conversational Interaction-Loop
3.6 Guidelines For Designing a Voice-Enabled Multimodal Interface

4 Method
4.1 Literature Study
4.2 Questionnaire
4.3 Interview
4.4 Analysing Data
4.5 Affinity Diagram
4.6 Customer Journey Map
4.7 Prototype Concept
4.8 Test Concept

5 Result
5.1 Questionnaire
5.2 Interview
5.3 Analysing Data
5.4 Affinity Diagram
5.5 Customer Journey Map
5.6 Prototype
5.7 Flow Diagram
5.8 Multimodal Interface
5.9 Test Results

6 Discussion
6.1 Literature Study
6.2 Questionnaire
6.3 Interview TUI Innovation Lab
6.4 Analysing Data
6.5 Affinity Diagram
6.6 Customer Journey Map
6.7 Prototype Concept
6.8 Test Concept

7 Conclusion
7.1 Future Work


1 Introduction

Professional, high-quality customer service is crucial for many companies in the travel industry. Many tourists save up for a long time before going on their annual holiday. Therefore, they often have high expectations that agencies will provide high-quality customer service [1].

New technologies that can improve customer service are entering the market at a fast pace. In 2016 a certain interface for consumer interaction became popular, namely chatbots [2]. Ever since they were established, chatbots have continuously become smarter, more autonomous and more self-learning. Digital travel agencies are embracing this sort of human-computer interaction and are figuring out how to use the technology to improve their customer service. One of many examples is the AI chatbot AVA, launched by the low-cost airline Air Asia in January 2019 [3]. This bot helps users to book trips and make changes to bookings, among other things.

Another technology that is continuously becoming better and more popular is the Voice User Interface (VUI) [4]. VUIs are strongly connected to chatbots: the technology enables users who cannot see or type to interact with a chatbot using voice commands [5]. A study made by the tech company Capgemini1 showed that among users who use a voice assistant2, 81% use it via their smartphones [6]. Travel agencies can take advantage of the fact that a great number of travellers today carry a chatbot- and voice-enabled device with them during their holidays.

TUI Nordic3 is one of the leading digital travel agencies in the world. They want to drive digitisation and continuously develop their products and the whole travel experience for their customers [7]. TUI Nordic provides customer service in several formats. One of them is a messenger function called "Ask the guides"4. Via the messenger function, customers can ask guides located at the destination about anything, from questions about accommodation to weather conditions.

At the time of writing, developers at TUI Nordic are integrating a chatbot into the "Ask the guides" function. According to the developers, this improvement means that users will receive more prompt answers. Unfortunately, not all customers will be able to benefit from this function, since it requires the user to have clear vision and the ability to type.

Approximately 25% of the population in Sweden has reading problems and 5-8% has dyslexia. More than half of the population over 16 years old has reading glasses. Extrapolated from the number of people registered at hospitals for the partially sighted, 120,000 people are visually impaired, and at least 30,000 people are severely visually impaired [8].

Voice technology could be used to increase the accessibility of the "Ask the guides" function, especially for people with different disabilities [9]. Therefore it is important that TUI takes voice technology into consideration when implementing their chatbot. Apart from the increased accessibility, predictions say that people will be weaving voice technology into their daily lives and routines [4]. A study made by Google shows that one of the top reasons why people turn to their voice-activated speakers5 is that these empower them to instantly get answers and information. 41% of people who own a voice-activated speaker say it feels like talking to a friend or another person. This could be a strong indicator that voice technology can be used to improve the customer experience for travellers.

Cathy Pearl, Head of Conversation Design Outreach at Google, says that you should never add a chatbot for the sake of adding a chatbot [5]. This study takes the same approach towards integrating a VUI into the "Ask the guides" function.

1https://www.capgemini.com/us-en/

2A voice assistant is a digital assistant that aids the user through their phone or other voice-enabled devices. The user interacts with the assistant using voice commands. Speech recognition and natural language processing are used to accomplish this.

3https://www.tui.se/om-tui/om-foretaget/

4A detailed description of the "Ask the guides" function can be found in Section 2.2

5A voice-activated speaker is a speaker that can receive and interpret dictation or understand and act upon spoken commands.


1.1 Objective

The objective of this study is to find out whether a Voice User Interface could be used to improve the customer experience of the "Ask the guides" function. The following three research questions will be examined; answering them attains the aim:

• How does a VUI affect the three parameters (speed, quality of answers and quality of personal treatment) that are used to measure quality of customer service?

• How useful is it to integrate a VUI into the "Ask the guides" function?

• Can a VUI be used to improve the customer experience of the "Ask the guides" function?

1.2 Thesis Outline

This thesis has 7 chapters. Chapter 1 introduces the context and the objective of the study. Chapter 2 contains a brief background about TUI and the "Ask the guides" function that the thesis focuses on. Chapter 3, the theory, gives a brief history of voice user interfaces, along with information about why VUIs are important and what types of voice-enabled devices are on the market today. It also describes in which environments and contexts voice agents in screen-first devices are used, provides a more technical description of how the conversational interaction-loop works and, last but not least, explains a set of guidelines for how to design a voice-enabled multimodal interface.

Chapter 4, the method, presents how a literature study, a questionnaire, an interview with an expert, a data analysis, an affinity diagram, customer journey maps, a prototype and a test were made. The questionnaire was made in order to collect user opinions, and the data analysis together with the customer journey maps were made to observe user behaviours. The affinity diagram was used to make sense of the mixed data collected at that point. The prototype was made in order to perform the most important part of the study: the test. The test was performed so that the questions in the objective could be answered.

Chapter 5 presents the results from the different parts of the method. Chapter 6 discusses the qualitative and quantitative data collected from the results; this chapter also brings up the errors that occurred in the study. Chapter 7, the conclusion, states the outcome of the study and answers the questions in the objective.

2 Background

With each recent decade we have seen a new form of human-computer interaction emerge and quickly become commonplace, see Figure 1. Two decades ago we gained access to the web. Ten years later, as mobile devices grew more capable, we could carry the web with us anywhere. People can now experience the potential of voice user interfaces. Voice commands can be used to perform several tasks, for instance to control lights, play games and order food [10].

Voice can also be used to improve customer service. Companies in the travel industry have acknowledged this and are figuring out how to use voice in the best way. Several hotels have replaced room service via phone with an in-room voice assistant. Instead of phoning the receptionist, hotel guests can ask the digital assistant for restaurant recommendations, ask what time breakfast is served, and set a wake-up alarm [1].

2.1 Case Study Description

This study was made in collaboration with the digital travel agency TUI Nordic. They are continuously working with new technologies and want to drive digitisation. Therefore they are gradually integrating voice technology into their products. For example, they have integrated a search function into their website that enables users to search for a holiday using voice commands.


Figure 1: Figure by Amazon that describes the evolution of human-computer interaction. Image retrieved from the web page for Amazon Alexa (https://developer.amazon.com/alexa-skills-kit/vui)

TUI wants to continue exploring new contexts where voice technology can be used to improve the customer experience. One of the functions where voice could be useful is the messenger function "Ask the guides". The function is described in Section 2.2.

2.2 TUI "Ask The Guides" Function

The "Ask the guides" function is a messenger platform that enables customers to talk to a guide located at the holiday destination. The guides send service messages throughout the customer's holiday and the customers can ask questions. The queries are answered as soon as possible: before departure all incoming queries are answered within 24 hours, and during the holiday the guides answer incoming queries within 30 minutes, 24/7. The service is accessed via the TUI app (iOS and Android), via My account6 and via SMS. The customers can also reach the guides by phone, 24/7, when on holiday.

Measuring The Customer Experience of the "Ask the Guides" Function

In order to measure the customer experience of the "Ask the guides" function TUI follows three parameters:

• Speed: How fast the guide replies to a question or feedback from the customer

• Quality of answers: How well the guide can answer a question or reply to the feedback

• Quality of personal treatment: How the customer perceives the personal treatment from the guide.

It is important to take these parameters into consideration when making changes to the "Ask the guides" function. None of the above-mentioned parameters should deteriorate when integrating voice technology into the "Ask the guides" function.

6TUI.se/TUI.no/TUI.dk/TUI.fi


Figure 2: The iOS mobile interface for the "Ask the guides" function ("Fråga Guiderna" in Swedish). The user can access the messenger platform by clicking on the button that says "Fråga Guiderna".

3 Theory

In this theory section a brief history of voice user interfaces is presented, along with information about why VUIs are important and what types of voice-enabled devices are on the market today. It is also described in which environments and contexts voice agents in screen-first devices are used. This section also provides a more technical description of how the conversational interaction-loop works. Last but not least, a set of guidelines for how to design a voice-enabled multimodal interface is explained.

3.1 Brief History Of Voice User Interface

The first version of a VUI was a single-speaker digit recognition system, built in the 1950s by Bell Labs. In the 1960s and 1970s, research broadened the limited vocabularies of these systems into "continuous speech recognition", meaning that pauses between every word in the conversation were no longer needed. Twenty years later the first feasible speaker-independent (anyone could talk to it) system emerged [5].

Interactive voice response (IVR) systems belong to the first generation of VUIs; they had the capacity to perform tasks after processing human speech over the telephone. IVR systems rapidly became prevalent: by the year 2000, all owners of a landline phone could perform several tasks using just a phone and a human voice [5], such as:

• Book plane flights

• Transfer money between accounts

• Order prescription refills

We are today in what we could call the second generation of VUIs. The prevalent systems of today are virtual assistants in mobile apps, such as Siri7, Alexa8 and Google Assistant9, which combine visual and auditory information, and voice-only devices like Apple HomePod, Amazon Echo and Google Home [5]. The fact that our ears are our second-most important sensors, and that speech is the most common and fundamental means of human communication, makes it clear that VUIs have increased the accessibility of electronic devices [9].

3.2 Why Voice User Interface?

VUI technology has several advantages, some of them are briefly described in this section.

No Hands Or Eyes Needed

The most essential advantage is that voice enables the user to interact with a product while performing a second task. For example, using Google Assistant to set a calendar event while doing the dishes allows the user to interact with the system in a hands-free and eyes-free way [11].

Speed

Speed is also an advantage of voice. A study from Stanford University showed that speech is three times faster than keyboard text entry for short messages in English and Mandarin on touchscreen phones [12].

Empathy

Humans often struggle to understand the tone of text messages. Therefore empathy is an important advantage of voice. It is easier to know if a person is annoyed or just being sarcastic by listening to the tone of their voice rather than distinguishing it from a written message [5].

Accessibility

Last but not least, voice has increased the accessibility of human-computer interaction for people with different disabilities. For example, a person with one hand can find it easier to use speech instead of text, and visually impaired users do not have to use graphical interfaces when setting a reminder; instead, they can ask their assistant to set the reminder for them using voice commands [9].

3.3 Different Types Of Voice-Enabled Interfaces

Voice can be used in different ways. This section gives a short introduction to the three most common ones.

Voice-Only Devices

Voice-only devices have no visual displays and users can only communicate with the machine using voice. Since the level of communication and the options are limited on these devices, people often use them to complete simple tasks such as getting answers to straightforward questions [13]. One example of a voice-only device is the "Google Home Mini" developed by Google, see Figure 3.

7https://www.apple.com/siri/

8https://developer.amazon.com/alexa

9https://assistant.google.com/


Figure 3: The "Google Home Mini" is a voice-only device developed by Google. Image collected from Google (https://support.google.com/googlehome/answer/7029281?hl=en).

Voice-First Devices

For these devices voice is the primary user interface but not the only one. A voice-first device has a built-in screen that displays a user interface which is not loaded with app icons. Instead, the system uses the visual interface to prompt voice commands from the user [13]. One example of a voice-first device is the "Echo Show" developed by Amazon, see Figure 4.

Figure 4: The "Echo Show" is a voice-first device developed by Amazon. Image collected from the Amazon website (https://www.amazon.de).

Multimodal Interfaces

A multimodal interface uses several channels for input and output. This type of user interface combines voice, touch, audio and different types of visuals. It can be more complex to build a multimodal interface than an interface that uses only visuals or voice. The reason why we should consider creating multimodal interfaces can be traced back to the fact that humans have five senses. How our senses work together constitutes how we perceive things around us. Imagine watching a movie in a cinema and cutting out one of our senses, for example hearing: removing one sense creates a different experience [13].


Figure 5: The GUI for Google assistant. The user can ask questions and give demands with voice or text and the system will respond with text and voice.

Voice Agents In Screen-First Devices

A screen-first device is, for example, an iPhone or an Android mobile device. Google Assistant and Apple Siri are examples of voice agents. A voice agent in a screen-first device is a Graphical User Interface (GUI) enhanced with voice functionality; this type of agent is a multimodal interface. The user can interact with the GUI by typing or by voice, or use the assistant with only one of the two. Google's app Google Assistant is a popular example of a voice agent that can be used in a screen-first device, see Figure 5 [13].

3.4 Environments And Contexts Where Voice Agents In Screen-First Devices Are Frequently Used

For three years in a row the digital marketing agency Stone Temple has published studies on mobile voice usage trends. The most recent study was made in 2019 [14]. In this study they asked 1000 people where and when they use voice.

The study showed that people were becoming more comfortable with using voice commands on their mobile devices in public environments. Even though the stigma of talking to our devices was dissipating, most people were still using voice commands at home, alone or with friends [14].


Regarding the contexts people use voice in, making a call, texting and getting directions are the top three use cases. The study shows that the majority of the participants would prefer to use voice commands instead of their hands when their hands are full. The number one application people use voice for turned out to be making a call. In 2018 the most popular one was texting, which dropped to the number two spot in 2019 [14].

3.5 The Conversational Interaction-Loop

It is important to understand how machines handle conversation and the way humans and machines communicate. This section will describe the interaction with voice commands, focusing on the technical perspective. The voice agency Voxable10 has formulated what they call the Conversational Interaction-Loop [15]. The expression describes what a machine must be able to do in order for a human to converse with it. The loop is divided into four parts, which are described below.

Figure 6: Visual representation of the conversational interaction-loop. Image recomposed from an image in a blog post by Voxable: https://www.voxable.io/blog/2018/05/09/the-conversational-interaction-loop

Listen - Automatic Speech Recognition

Machines need to have the means to understand the assortment of sounds in human speech and translate them into text. Generally, developers accomplish this with machine learning algorithms trained to recognize human language from streaming audio. This technology is called automatic speech recognition (ASR): a process in which a computer recognizes the words a user is saying and converts them into text in real time. ASR accuracy is continuously improving, and the technology is built into browsers, operating systems, phones and voice-first devices like Amazon Echo and Google Home [15].

Understand - Natural Language Understanding

Machines must be trained to understand text originating from human speaking and typing. Natural Language Understanding (NLU) is the technology that renders messy human language into structured data. There are several NLU platforms that are accessible and affordable for independent developers to use, such as Google's Dialogflow, IBM Watson and Microsoft LUIS [15].

Process - Bot Intelligence

The system must make sense of the data from the previous phase and logic must be put in place for how the machine handles that information. Usually this is called bot intelligence.

10Voxable is an agency that designs and develops conversational and voice interfaces: https://www.voxable.io/


Bot intelligence combines data extracted from the NLU with contextual data and business logic to handle the complexity of the conversation. In order to perform the correct action, the machine, for example a chatbot, may have to ask a follow-up question to clarify the user's intent [15].

Respond

When the system has captured data and made sense of it, it is time to generate a beneficial response to the human. There are different types of responses: a response can, for example, communicate the state of an action, relay information, or request more information from the user [15].

Voice responses can be delivered using three different technologies: audio feedback, synthesized speech and recorded audio. Synthesized voice, or text-to-speech (TTS), is a technology that can mimic a human voice. Today this technology translates written language into sounds that humans recognize as speech, with very high quality. In a VUI, responses can also be made with text-based interaction as a supplement; the responses are then conveyed through a graphical interface such as the one in the Google Assistant app, see Figure 5 [15].
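The four stages of the loop can be illustrated with a small, self-contained sketch. All function names, the hard-coded gate number and the keyword-based "NLU" are assumptions made for this example only; a real system would use ASR, an NLU platform and TTS as described above.

```python
import re

def listen(audio_transcript):
    # ASR stage: a real system would convert streaming audio to text.
    # Here the transcript is simply passed through.
    return audio_transcript

def understand(text):
    # NLU stage: map the utterance to an intent and slots.
    # Toy keyword matching stands in for a real NLU platform.
    flight = re.search(r"\b([A-Z]{2}\d+)\b", text)
    if "gate" in text.lower():
        return {"intent": "get_gate",
                "slots": {"flight": flight.group(1) if flight else None}}
    return {"intent": "unknown", "slots": {}}

def process(parsed):
    # Bot intelligence stage: combine NLU output with business logic,
    # asking a follow-up question when a required slot is missing.
    if parsed["intent"] == "get_gate":
        if parsed["slots"]["flight"] is None:
            return "Which flight number are you asking about?"
        return "Flight {} departs from gate 34.".format(parsed["slots"]["flight"])
    return "Sorry, I did not understand that."

def respond(message):
    # Respond stage: a real system would hand this to TTS or a GUI.
    return message

def interaction_loop(audio_transcript):
    return respond(process(understand(listen(audio_transcript))))

print(interaction_loop("Gate number for flight DY123?"))
# Flight DY123 departs from gate 34.
```

The gate number would of course be looked up in a booking system in practice; the point of the sketch is only how the four stages hand data to each other.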

3.6 Guidelines For Designing a Voice-Enabled Multimodal Interface

In this section a set of guidelines for designing a voice-enabled multimodal interface is explained. The guidelines were stated by Nick Babich, a developer and UX enthusiast, in an article published in Smashing Magazine in December 2018 [13]. He writes about how multimodal interfaces give users more power and how they should be designed and built. The guidelines cover the preparatory work a designer should carry out, what to consider when designing conversations, and how to on-board the user.

Make Sure You Solve The Right Problem

When starting the process of implementing a VUI it is important to determine whether voice can improve the UX of the function or product. This is important because design should solve problems. Thus, a designer should conduct user research in order to figure out what problem is being solved by implementing a VUI. It can be useful to create a customer journey map11 to find problems that can be fixed with voice technology.

Create Conversational Flows

Conversations that only last one turn are not common when people talk to each other. Thus, interactions with a system should be designed in a way that gives the user a natural feeling. In voice technology, a natural feeling of conversation can be created by building and testing conversational flows. According to Babich (2018), a conversational flow consists of dialogs,

"the pathways that occur between the system and the user. Each dialog would include the system’s prompts and the user’s possible responses."

Conversational flows can be visualized by creating flow diagrams. A VUI designer should create a flow diagram for each use case. For this study, a typical use case could be asking for the departure gate of a certain flight.

There are three crucial components that constitute a voice command:

• Intent: the purpose of the customer's interaction with a voice-enabled interface.

• Utterance: how the customer expresses their inquiry. For example, a customer can ask for the right gate by saying "Gate number for flight DY123?" or "From what gate does my flight to Thailand depart?"

• Slots: variables that are expressed in a customer's voice command. In the above example "flight" is a slot. The customer might have to add additional information to the inquiry, in this case a flight number.

11A customer journey map is a visualization of a customer's interactions with a product or function in different contexts and situations.
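As an illustration, the two example utterances above could map to the same intent, with named pattern groups becoming slots. The patterns and names below are made up for this sketch and are not from any real NLU platform.

```python
import re

# Each pattern maps one utterance shape to the intent it expresses;
# the named groups become slots. Patterns are illustrative only.
UTTERANCE_PATTERNS = [
    (r"gate number for flight (?P<flight>\w+)", "get_gate"),
    (r"from what gate does my flight to (?P<destination>\w+) depart", "get_gate"),
]

def parse_command(utterance):
    for pattern, intent in UTTERANCE_PATTERNS:
        match = re.search(pattern, utterance.lower())
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return {"intent": "unknown", "slots": {}}

print(parse_command("Gate number for flight DY123?"))
# {'intent': 'get_gate', 'slots': {'flight': 'dy123'}}
```

Both example utterances resolve to the intent "get_gate", but they fill different slots; a real system would then ask follow-up questions for any slot that is still missing.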

Do Not Put Words In The User's Mouth

People know how to talk, thus voice commands should not have to be explained. A VUI designer should create natural language conversations and take different speaking styles into consideration. It is not good practice to send out phrases to the user like:

"To find out the right gate number for your flight, you need to say, ’travel guide give me the gate number for flight DY123’."

Instead the system should tell the user what it can help them with, for example:

"You can ask me questions about your flight and other things related to your travel."

Strive For Consistency

Regardless of the context users find themselves in, the VUI needs to be consistent in language and voice. Consistency creates closeness in interaction.

Always Provide Feedback

It is important to keep users informed of the current status of the system. This can be achieved by providing feedback at the right time. The system should be designed so that the user is aware that the system is listening; this feedback can be a flashing light, a specific sound or an animation. The user should also be provided with conversational markers that tell them where they are in the conversation. Last but not least, the system should confirm when a task is completed. For example, if a user makes a request to set an alarm, the system needs to confirm that the action has been executed. Without confirmation the user needs to go into the alarm clock app and check whether the alarm is set.

Avoid Long Sentences

Humans' short-term memory cannot retain a lot of information, thus it is not wise to provide the user with too much information. Sending out long sentences can mean that users forget important information. Audio is a so-called slow medium, which means that people take in spoken information more slowly than written information.

Provide Next Steps Sequentially

Not only long sentences can be tricky for the user; providing too many options is also a bad idea. A VUI designer should limit the scope of options the user has and break down the interaction with the system into smaller parts.

Have a Strong Error-Handling Strategy

The system will not be able to understand the user at all times, and it should be able to handle the error situations that occur. It is important not to blame the user when an error occurs; instead, handle these kinds of situations gracefully and let the system show the user its lack of understanding. When people talk to each other one might have to repeat oneself, and the interaction with a system can work the same way.
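One possible shape of such a strategy is escalating re-prompts that never blame the user. The prompt texts and the two-retry limit below are assumptions for this sketch, not taken from the thesis or any guideline.

```python
# Escalating re-prompts: the first retry is a simple request to repeat,
# the second adds a hint about what the system can do, and after that
# the system offers a fallback instead of looping forever.
REPROMPTS = [
    "Sorry, I didn't catch that. Could you say it again?",
    "I'm still having trouble. You can ask me about your flight, hotel or destination.",
]

def handle_unrecognized(failed_attempts):
    # failed_attempts: how many times in a row the system failed to understand.
    if failed_attempts < len(REPROMPTS):
        return REPROMPTS[failed_attempts]
    return "Let me connect you to a human guide who can help."

print(handle_unrecognized(0))
print(handle_unrecognized(2))
```

Note that the wording puts the lack of understanding on the system ("I didn't catch that") rather than on the user, in line with the guideline above.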


Keep Track of Context

At any time the system should be able to save input from the user and remember what was expressed recently. For example, a user may ask for the departure time of a certain flight and, after receiving an answer from the system, respond with "When does it arrive at the destination?". The system has to remember that the user refers to the flight when they say "it".
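A minimal sketch of this kind of context tracking could look as follows. The class, its one-entity memory and the pronoun substitution are illustrative simplifications; real systems track richer dialogue state.

```python
# The dialogue state remembers the most recently mentioned entity so that
# a follow-up pronoun ("it") can be resolved against it.
class DialogueContext:
    def __init__(self):
        self.last_entity = None  # e.g. the flight most recently discussed

    def remember(self, entity):
        self.last_entity = entity

    def resolve(self, utterance):
        # Substitute the remembered entity for the standalone pronoun "it".
        if self.last_entity is None:
            return utterance
        words = [self.last_entity if w.lower() == "it" else w
                 for w in utterance.split()]
        return " ".join(words)

ctx = DialogueContext()
ctx.remember("flight DY123")        # the user asked about this flight earlier
print(ctx.resolve("When does it arrive at the destination?"))
# When does flight DY123 arrive at the destination?
```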

Learn About Your Users to Create More Powerful Interactions

A voice-enabled system can be more personalized when a technique called "intelligent interpretation" is used. This means that the system continuously learns about the user in order to adjust its behaviour accordingly. The technique enables the system to answer more complex questions, such as "What should I not forget to bring for my flight and vacation?".

Give Your VUI a Personality

A higher level of user engagement can be achieved by giving the VUI a personality. People associate voice with humans rather than with systems. VUI designers can create a personality by constructing a persona of the voice-enabled agent and referring to this persona when designing dialogues.

Build Trust

Users need to trust the system in order to be motivated to use it. Thus, the VUI designer needs to build trust between human and machine. Voice-enabled systems might require user on-boarding to gain the user's trust. One way of doing this is to offer meaningful examples of what is and is not possible with the system. For example, the system can introduce itself by saying "Hi, I'm your virtual travel guide. You can ask me anything related to your holiday. I will try to answer your question in the best way possible.".

Handle Security And Data Privacy

There can be situations where the user provides the system with personal information even though the system has not asked for it. The system needs to be able to handle this information and make sure it does not end up anywhere it could harm the user. For example, if a user for some reason gives the system his or her bank credentials, it would be harmful if the system read out "I did not understand the following message: 'my credit card number is 123456789'".
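A very simplified redaction step, sketched below under the assumption that long digit runs are sensitive, could mask such input before the system ever echoes it back:

```python
import re

# Naive masking of card-like digit sequences before any message is echoed.
# The 9-16 digit heuristic is an assumption for illustration only; real
# systems would detect many more kinds of personal data.
CARD_PATTERN = re.compile(r"\b\d{9,16}\b")

def redact(message: str) -> str:
    """Replace card-like numbers so they are never spoken back to the user."""
    return CARD_PATTERN.sub("[redacted]", message)
```

With this in place, echoing redact("my credit card number is 123456789") would speak "[redacted]" instead of the number itself.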

Conduct Usability Testing

All systems should be tested early and often, and voice-enabled systems are no exception. Multimodal interfaces have their own benchmarks when it comes to testing. The following phases should be considered when testing multimodal interfaces:

• Ideation phase: When creating dialog flows, practice reading them out loud. Record the conversations and listen to them in order to judge whether they sound natural.

• Testing with lo-fi prototypes: Create a Wizard of Oz test: let users interact with a system that they believe is operated by a computer but is in fact operated by a human.

• Testing with hi-fi prototypes: Unlike when testing a graphical interface, the user cannot voice their opinions about the system while interacting with the voice-enabled system. It is better to observe the user's interactions with the system and ask for opinions once the test session is finished.


4 Method

The method is divided into 8 parts: (1) a literature study, (2) a questionnaire and an interview with Chris Carmichael at the TUI Destination Experiences department, (3) an analysis of what questions customers ask via "Ask the guides", (4) an affinity diagram that creates the basis for (5) a customer journey map, (6) prototyping the concept, (7) testing the concept, and (8) analysing the results. A visual representation of the method can be seen in Figure 7.

Figure 7: The figure describes the order of the different parts of the method for the study

4.1 Literature Study

A literature study was conducted to create a theoretical basis for the forthcoming parts of the method. The following fields were researched:

• History of voice user interfaces

• Why and when a voice user interface should be used

• The technology behind voice user interfaces

• In what contexts and environments users tend to use voice technology

• How to design a voice user interface from a user experience design perspective

4.2 Questionnaire

A questionnaire was sent out to TUI customers in order to collect their opinions about, and habits with, voice technology. It was of interest to find out whether the customers use a voice assistant or would be open to using one, how they would use voice technology when they travel, and what type of questions they would ask a virtual guide via voice. The questionnaire consisted of 10 questions and was published on the TUI Nordic web page. It was created and published using Hotjar, a user testing site and tool.

4.3 Interview

The department TUI Destination Experiences has implemented automated answers in their version of the "Ask the guides" function. Their goal is to answer 25% of the questions with automated answers. The aim of the interview was to see how they work with customer experience and whether they have any plans to integrate a VUI in their version of "Ask the guides". An interview was conducted with Chris Carmichael, head of innovation at the Innovation Lab at TUI Destination Experiences. The interview was held via Skype12 and recorded in order to recall what was said. It was divided into two parts. In the first part, Carmichael gave a brief presentation of how the automated answers work and how the functionality has been integrated. In the second part, we talked about the user experience of automated answers and how they will continue working with this functionality. Finally, Carmichael was asked to share his thoughts about the future of voice technology.

4.4 Analysing Data

During the execution of this study, no Swedish data regarding the usage of the "Ask the guides" function could be accessed. Therefore, data collected by TUI Destination Experiences for their version of the function was analyzed. This part of the method was conducted in order to find out what questions customers were asking via the "Ask the guides" function, in what context they were asking them, and what sort of messages they were sending: complaints, neutral or positive.

4.5 Affinity Diagram

In order to make sense of all the mixed data and information collected at this point of the study, an affinity diagram13 was made. Insights from the literature study were brought together with:

• Users' opinions, habits and needs collected via the questionnaire

• Insights from interviewing an expert

• Data on the usage of the "Ask the guides" function

All the information and data were grouped into clusters of similar themes and patterns. Afterwards the clusters were named in order to create an information structure and discover themes. The clusters were then ranked against one another with regard to the users' needs and the objective of this study.

4.6 Customer Journey Map

Four different customer journey maps14 were created with the aim of visualising the process that a TUI customer goes through in order to get in contact with a guide through the "Ask the guides" function. Each journey had a unique context and question formulation. The contexts and questions were based on the data analysed in section 4.4. Each journey was also based on a unique customer persona that TUI Nordic works with. Each persona was described with the following parameters:

12A communication tool for free calls and chat (https://www.skype.com/en/)

13Affinity diagram is a method that is used in many phases of design thinking (www.interaction- design.org/literature/article/affinity-diagrams-learn-how-to-cluster-and-bundle-ideas-and-facts)

14A journey map is a common UX tool that is used to visualize the process that a person goes through in order to accomplish a task. (https://www.nngroup.com/articles/journey-mapping-101/)


Figure 8: Some of the data collected for the affinity diagram

• Name

• Family

• Living location

• Personality

• Education

• Online behaviour type

• Question: Which question the customer asks the guide

• Context: In what situation and environment, as well as the time of the day that the customer uses the "Ask the guides" function

A visualisation of the customer journey map can be seen in Figure 9. The journey map is divided into 4 rows and 6 columns. The rows represent the user's actions, thoughts and feelings, as well as opportunities for improvement. The columns represent the different steps of the user journey. The rows and columns are described below.

Rows:

• Doing: Describes each step the user has to go through in order to accomplish a task.

• Thinking: What the customer is thinking when performing each step.

• Feeling: The feelings that arise when the customer is performing a step.

• Opportunities: Solutions that could rectify the obstacles a user is experiencing.

Columns:

• Step 1: Open application or click on notification.


• Step 2: Click on the button that says "New message from the guides".

• Step 3: Write a message.

• Step 4: Await answer.

• Step 5: Receive answer.

• Step 6: Respond to answer.

Figure 9: A template for the customer journey map that was created and used for this study.

4.7 Prototype Concept

In order to test how the integration of voice technology could affect the customer experience of the "Ask the guides" function, a Hi-Fi prototype of a digital voice assistant15 was created, following the guidelines presented in section 3.6. To build the conversational experience, a development tool called Dialogflow16 was used. Using Dialogflow, an action17 for the Google Assistant was implemented.
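To give an idea of the fulfillment side, a Dialogflow ES webhook replies with a JSON body containing a "fulfillmentText" field; the sketch below uses an assumed intent name and answer text, not the actual ones from the prototype:

```python
# Minimal sketch of a Dialogflow ES fulfillment handler.
# "CheckFlightDelay" and the answer strings are assumptions for illustration.
def handle_fulfillment(request_json: dict) -> dict:
    # Dialogflow ES sends the matched intent under queryResult.intent.displayName.
    intent = request_json["queryResult"]["intent"]["displayName"]
    if intent == "CheckFlightDelay":
        return {"fulfillmentText": "Your flight to Malaga is on time."}
    return {"fulfillmentText": "I'm not sure how to help with that yet."}

reply = handle_fulfillment(
    {"queryResult": {"intent": {"displayName": "CheckFlightDelay"}}})
```

In a deployed action, this handler would sit behind an HTTPS endpoint registered as the fulfillment webhook in the Dialogflow console, and the answer would be looked up from real booking data.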

Before implementing the action, a flow diagram was created for two of the scenarios from the customer journey maps. The flow diagrams were tested on co-workers at TUI: the dialogues in the flow diagrams were spoken out loud by two persons in order to locate errors in the flow.

Two icons were created using the design tool Sketch18 and design guidelines provided by TUI. The purpose of the icons was to give the VUI a personality and create a more personalized feeling.

15See section 3.3 for explanation

16https://dialogflow.com/

17Actions on Google enables developers to extend the functionality of the Google Assistant with Actions.

18https://www.sketch.com/


4.8 Test Concept

In this study, a checklist for planning usability tests was used. The list was written by Hoa Loranger (2016) and published on the web page of the Nielsen Norman Group, a leading firm in research-based user experience [16]. A test plan was created according to the checklist. The steps of the test plan are listed below and described in this section.

• Name of the product being tested

• Study goals

• Logistics: time, dates, location, and format of study

• Participant profiles

• Tasks

• Metrics

• Description of the system (e.g., mobile, desktop, computer settings)

Name of the product being tested

The product tested was a prototype of an automated version of the TUI "Ask the guides" function: a Hi-Fi prototype of a digital voice assistant, presented as a digital TUI assistant.

Study goals

The main goal of the study was to find out whether a VUI could be used to improve the customer experience of the "Ask the guides" function. In order to investigate this, the following questions were in focus during the study:

• How does a VUI affect the three parameters speed, quality of answers and quality of personal treatment, which are used to measure the quality of customer service via the "Ask the guides" function?

• How useful is it to integrate a VUI into the "Ask the guides" function?

• Can a VUI be used to improve the customer experience of the "Ask the guides" function?

Logistics

There were two test sessions: the first was conducted in the lunch restaurant at the TUI office and the second in a quiet meeting room. Different locations were chosen because people can experience using voice differently depending on the environment they use it in. The format of the study was in-field and in-person for both sessions. One test took approximately 15 minutes, and all the tests were conducted over two days.

Participant Profiles

Twelve participants were tested, all of whom had booked a trip via TUI or some other travel agency. The participants were within the age range 20-61+, and 9 of them were employees at TUI. The participants had mixed educational backgrounds and different levels of tech skills.


Tasks

Two different tasks were tested. Task 1 was tested in the crowded lunch restaurant and Task 2 in the quiet meeting room. At the beginning of each test, the participants were informed about the task and that they were tested anonymously. The participants were also informed that they could ask questions or terminate the test at any time. Each task description started by letting the participant know that he/she could assume that the digital TUI assistant knew all the details regarding their trip, i.e. the participant did not have to memorize flight number, booking number, name of the hotel, etc.

Both of the tasks were created from the customer journey maps, see section 4.6. Each task was tested 6 times, using three different types of interaction, in order to compare how the interaction types affected the three parameters: speed, quality of answers and quality of personal treatment. The interaction types are described below:

• Voice Only: The participants could only interact with the prototype using their voice. They could communicate with the system by talking and listening, and could not use visual feedback to finish the task.

• Text Only: The participants interacted with the prototype using their hands. They could communicate with the system by typing messages and receiving visual feedback, and could use the visual interface to finish the task.

• Voice and Visual: The participants interacted with the prototype primarily by using their voice. They could communicate with the system by talking, listening and receiving visual feedback, and could use the visual interface to finish the task.

Each type of interaction was tested by two different participants per task. Table 1 below describes the setup for each task.

Task 1

Type of Interaction    Participants
Voice Only             2
Text Only              2
Voice and Visual       2

Task 2

Type of Interaction    Participants
Voice Only             2
Text Only              2
Voice and Visual       2

Table 1: One table for each task, describing how many participants tested each type of interaction. Each participant conducted one test.


Task 1 - Check if flight is delayed: The participants were told that they were leaving on a flight to Malaga in a few hours. They were also informed that there had been announcements on social media that there was an ongoing flight strike in Spain. The strike might have affected their flight. The task was for the participant to ask the digital TUI assistant if their flight to Malaga was on time. The tests for this task were conducted in a crowded lunch restaurant.

Task 2 - Look up what transfer is included in the trip: The participants were told that they had booked a trip to Malaga with departure in one week. The task was to ask the digital TUI assistant what transfer they had booked from Malaga Airport to the hotel where they were supposed to stay. The tests for this task were conducted in a quiet meeting room.

Metrics

Three parameters were measured in this test: speed, quality of answers and quality of personal treatment. Observations of how the participants interacted with the system were also made, in order to see how useful the system was and what obstacles the participants came across. After each test, a short interview was held with the participant in order to collect their opinions and experiences.

Speed: The time it took for a user to finish the task. A timer was used to measure the time. The timer was not shown or mentioned to the participant, since that could have caused unnecessary stress.

Quality of answers: The participant's opinion of how well the digital TUI assistant could answer their question. In the interview held after the test, the participants were asked to grade the quality of the answer they got on a scale from 1 to 9, where 1 was equivalent to "not good at all" and 9 to "very good". It was also mandatory for the participants to motivate why they gave a certain grade.

Quality of personal treatment: The participant's opinion of how personalised they experienced the service to be. In the interview held after the test, the participants were asked to grade the quality of the personal treatment they got on a scale from 1 to 9, where 1 was equivalent to "not good at all" and 9 to "very good". It was also mandatory for the participants to motivate why they gave a certain grade.

Description of the system

The system that was tested was the prototype described in section 4.7. A smartphone (OnePlus 619) running the Android operating system was used to test the Google Assistant action. For the tests that involved voice as interaction type, a pair of noise-canceling headphones (model Jabra Evolve 7520) was used. A laptop (MacBook Air21) was used to time each test session and to take notes during the interviews.

19https://www.oneplus.com/se

20https://www.jabra.se/business/office-headsets/jabra-evolve/jabra-evolve-75

21https://www.apple.com/shop/buy-mac/macbook-air


5 Result

In this section the results from the study are presented. The section starts with a summary of the results from the questionnaire, followed by answers from the interview with Chris Carmichael from TUI Destination Experiences. Thereafter, the results from the analyzed data are presented, along with insights and decisions derived from the affinity diagram. Finally, the results from each customer journey are presented, as well as the outcome of the prototype and the tests.

5.1 Questionnaire

The questionnaire had 157 respondents. None of the questions were mandatory, and therefore the number of people who answered each question varies in the result. The age distribution of the respondents and their interest in new technologies are listed below, see Table 2 and 3.

Age

Age      Amount (% of people)
-20      1,3
21-30    2
31-40    3,3
41-50    16,7
51-60    23,3
61+      53,4

Total: 150 respondents

Table 2: The age distribution of the respondents.

Interest in new technologies (1 = not interested at all, 5 = very interested)

Grade    Amount (% of people)
1        14,4
2        10
3        32
4        20,4
5        23,2

Total: 125 respondents

Table 3: How interested the respondents were in new technologies.

Out of 113 customers, 36,3% thought that the "Ask the guides" function works well and 34,5% thought that it works very well. 24,8% answered neutrally, 3,5% thought it was poor, and only one person found that it worked extremely poorly. The customers appreciated that the "Ask the guides" function enabled them to get in contact with a guide located at the holiday destination, both before and during their holiday. It gave them a feeling of reassurance. One customer expressed the following:

"I really appreciate the fact that I can get in contact with the guides at the destination of my holiday, before I get there. Sometimes I have questions or an inquiry regarding the hotel or the room."

Several customers expressed that they received answers within a decent time span using the "Ask the guides" function. They also expressed that the quality of the answers was satisfying:


"They replied to my questions quickly and the answers I got were correct and qualitative."

"They were there to help me at any time and they replied quickly to my in- quiries."

Accessibility was also a reason why some of the customers liked the "Ask the guides" function:

"I like the fact that I at any time can contact the guides via my mobile phone."

"The "Ask the guides" function enables me to in an easy way receive answers to my questions."

"It was easy to find answers to my questions and I didn’t encounter any difficul- ties."

The question that asked the customers for feedback on what could be improved with the "Ask the guides" function only received 9 answers. Among these answers, people thought that the response time should be decreased and that they wanted to be able to contact the guides at an earlier stage:

"It would be better if I could start contacting the guides after I have booked my holiday. Not just two weeks before departure."

When asking the customers if they would be willing to use the "Ask the guides" function with voice commands instead of typing, the majority answered no and the remainder answered yes, see Table 4.

Would you consider using "Ask the guides" with voice commands (as a substitute for typing)?

Answer    Amount (people)    %
Yes       39                 37,8
No        64                 62,2

Total: 103 respondents

Table 4: How many of the respondents would be willing to use the "Ask the guides" function with voice commands instead of typing.

The most common task that the customers had performed using voice commands was receiving an answer to a question; the second most popular was setting a reminder, see Table 5.


Which applications have you controlled using voice commands?

Answer                              Amount (people)    %
A - Order food                      5                  4,9
B - Calendar event                  5                  4,9
C - Set a reminder                  20                 19,6
D - Receive answer to a question    42                 41,2
E - Play music                      18                 18
F - Other (call a contact, send message,
    translate text, get restaurant
    recommendations, control lights,
    car navigation)                 12                 11,6

Total: 102 respondents

Table 5: The applications that the customers have controlled using voice commands.

The most popular voice assistant among the customers was Apple's Siri: 54 out of 73 respondents had used Siri, 18 had used Google Assistant and only 1 person had used Amazon Alexa. The customers were asked whether, if voice were available, they would want to use voice commands in combination with typing, or only voice commands, when interacting with the "Ask the guides" function. The majority (50% of 103 respondents) answered that they would not use voice commands at all, and 44% answered that they would use voice in combination with typing.

5.2 Interview

The interview with Chris Carmichael, head of innovation at TUI Destination Experiences, came to revolve around one central topic: how to solve problems before they happen.

"The more we understand our customers the more we understand what they need from us. The more likely we can solve problems before they happen." - Chris Carmichael

At TUI Destination Experiences, they analysed what kind of questions the customers were asking in their version of the "Ask the guides" function. They could see that customers often asked for the departure and arrival terminals of their flights. Having customers repeatedly ask these kinds of questions caused a heavy workload for the guides who responded to the messages, and as a result customers received answers slowly. Carmichael and his team therefore saw the benefit of using AI to lower the workload and improve the customer experience. By providing the customers with automated answers, the response time would be improved and the workload for the guides decreased. Automated answers were implemented in order to give the customers more prompt answers to simple questions.

Automation also enabled the users to receive answers to questions before asking them. For example, it enables the system to send out reminders of which terminal and gate the customers should go to.

Carmichael explained that his team focuses on investigating areas that they believe will be part of everyday technology in the near future.


"What will our customers expect from us in the future?" - Chris Carmichael

When talking about the future of the "Ask the guides" function, Carmichael expressed that voice will have a great impact. He believes in the future of voice technology and thinks that voice is the next area that should be investigated with regard to the "Ask the guides" function. He thinks it is important to look at how the customers will use voice in the future and how this can improve the customer experience.

5.3 Analysing Data

Because the implementation of the chatbot for the Swedish version of the "Ask the guides" function was still in the making during this study, no Swedish data had been collected yet. Therefore, data from the UK version had to be used. The data consisted of messages that customers had sent to guides via the "Ask the guides" function. Each message had been given a main tag and a sub tag depending on what type of message it was. For example, questions about flight strikes or departure terminals were structured under the main tag "flight" and the sub tags "info" and "terminal". The questions were also labeled with one of three sentiments:

• Neutral: not negative message

• Concerned: a guest waiting, worried, or talking about something important or urgent

• Angry: a guest disgusted, unhappy, or complaining strongly.
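The tagging scheme above lends itself to simple tallying; the sample messages below are made up, but mirror the main tag / sub tag / sentiment structure described:

```python
from collections import Counter

# Made-up sample messages in the described tagging structure.
messages = [
    {"main": "flight", "sub": "terminal", "sentiment": "neutral"},
    {"main": "flight", "sub": "info", "sentiment": "concerned"},
    {"main": "transfer", "sub": "pickup", "sentiment": "neutral"},
]

# Tally messages per main tag and per sentiment label.
by_main_tag = Counter(m["main"] for m in messages)
by_sentiment = Counter(m["sentiment"] for m in messages)
```

Counting tagged messages in this way is what makes it possible to report which categories and sentiments dominate, as summarized in the findings below.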

The results from the data analysis are summarized below:

Most frequent messages: The top three categories of messages were flight, transfer and mixed requests. Messages regarding flights can be anything from general flight information to questions about luggage, check-in or flight changes. Examples of transfer questions are "Can I change my pick up time to 4pm" and "Is it possible to change our return coach transfer to a private taxi transfer?". Mixed requests is a category that contains requests such as "Can you remind me how much the departure tax is please." and "Who do I speak to about a lost hand luggage bag? It was left on the coach." Customers tended to ask simple questions such as "When is my flight departure?" as well as more complex questions such as "Do they have strawberry flavoured ice cream at the hotel?".

Sentiment of messages: The majority of the messages had a neutral sentiment and only a small share were concerned or angry, which means that customers tend to send messages in neutral situations rather than when they are angry or concerned.

Context when sending messages: The data was not structured in terms of the context or environment in which the messages were sent. However, by analyzing a large chunk of messages, the conclusion could be drawn that messages had been sent in different types of situations and environments: at home, at the airport, in the hotel room, etc.

5.4 Affinity Diagram

The result of the affinity diagram22 came to be a summary of important insights and decisions regarding what type of prototype should be implemented in order to test the purpose of this study. The insights and decisions are presented below.

Insights

In section 3.4 it was stated that people are most comfortable with using voice commands at home. However, people are becoming more comfortable with using voice commands

22See section 4.5
