
Identifying Similarities and Differences in a Human – Human Interaction versus a Human – Robot Interaction to Support Modelling Service Robots


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2009

Identifying Similarities and Differences in a Human – Human Interaction versus a Human – Robot Interaction to Support Modelling Service Robots

FARRAH SAM

KTH

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE



Abstract

With the ongoing progress of research in robotics, computer vision and artificial intelligence, robots are becoming more complex, their functionality is increasing and their ability to solve particular problems is becoming more efficient. For these robots to share our lives and environment with us, they should be able to move autonomously and be easy for users to operate.

The main focus of this thesis is on the differences and similarities between a human-to-human and a human-to-robot interaction in an office environment. Experimental methods are used to identify these differences and similarities and to arrive at an understanding of how users perceive robots and their abilities, in order to support the development of interactive service robots that can navigate and perform various tasks in a real-life environment. A user study was conducted in which 14 subjects were observed while presenting an office environment to a mobile robot and then to a person. The results show that users used the same verbal phrases, hand gestures, gaze, etc. to present the environment to the robot as to a person, but they were more explicit when identifying the different items to the robot. The subjects took less time to show a person around than the robot.

Keywords: Human robot interaction – human communication – hand gestures – human augmented mapping – service robots.


Sammanfattning

Through research in robotics, computer vision and artificial intelligence, robots are becoming more and more complex. Their functionality is constantly increasing and their capacity to solve specific problems is becoming more efficient. For these robots to be part of our everyday lives and our environment, they must be able to move independently (autonomously) and be easy for users to handle.

This thesis focuses on the differences and similarities between human-human and human-robot interaction in an office environment. Using an experimental method it is possible to discover these differences and similarities and thereby understand how people perceive robots and their abilities. This can contribute to the development of service robots that can navigate and perform various tasks in everyday life. A user study was conducted in which 14 subjects were observed while they presented an office environment both to a human and to a robot. The result of the study was that the subjects used the same kinds of verbal expressions, hand gestures, gaze, etc. to present the environment to a human as to the robot. They expressed themselves in more detail to the robot when identifying different items in the environment. The subjects needed more time to present the environment to the robot than to a human.

Keywords: human-robot interaction – human communication – hand gestures – robot navigation – service robots.


Contents

1 Introduction
  1.1 Robots in General
  1.2 Problem Definition
  1.3 Outline of The Report
2 Background
  2.1 Human Communication
  2.2 Human Robot Interaction
    2.2.1 Human Computer Interaction and HRI
    2.2.2 Human Interaction with Service Robots
3 The study
  3.1 Scenario
  3.2 Method
    3.2.1 Subjects
    3.2.2 Instructions
    3.2.3 Technical Realisation
    3.2.4 Data Collection
    3.2.5 Pilot Study
  3.3 Hypotheses
  3.4 Evaluation
4 Results From The Study
  4.1 Results Regarding Presentation Strategies
    4.1.1 Verbal Communications
      4.1.1.1 Presentation of Regions
      4.1.1.2 Presentation of Locations
      4.1.1.3 Presentation of Objects
    4.1.2 Non-Verbal Communications
      4.1.2.1 Hand Gestures
        4.1.2.1.1 Locations
        4.1.2.1.2 Objects
      4.1.2.2 Gaze
  4.2 Presentation time
5 Observations
  5.1 General observations
  5.2 Interviews
6 Discussion
  6.1 Similarities
  6.2 Differences
  6.3 Hypotheses
  6.4 Limitations
7 Conclusion
References
Appendix A: The Instruction Sheet
Appendix B: Interview Questions
Appendix C: Observations from the Study
  Observations Regarding Regions Presentation
  Observations Regarding Location Presentation
  Observations Regarding Objects Presentation


Chapter 1

Introduction

Robots will soon be of great value in many domains, due to the ongoing progress in research on artificial intelligence, computer vision, and robotics.

Some of the most promising applications of robots are within industry, the military, hospital care, home care and entertainment. The idea of a general service robot is not as far-fetched as many might think. In order for these robots to take part in users' lives and share their environment, a mobile service robot needs to move within this environment from one location to another to provide its services, and users should be able to interact with and operate the robot system easily.

Humans communicate naturally with each other through direct and indirect channels. Direct channels include speech, facial expressions and gestures. Indirect channels are not under the sender's control, such as emotions shown by the sender, whose interpretation depends on the receiver.

Understanding how users, ordinary people, communicate with robots in everyday environments would help researchers design appropriate interactive robot interfaces that could be used by a variety of users in different situations.

This thesis presents an effort to identify the differences and similarities in a human to a human versus a human to a robot interaction scenario in an office environment.

1.1 Robots in General

Since the beginning of mankind, humans have striven for new inventions to facilitate and improve their quality of life. The word robot, which refers to a mechanical or virtual artificial agent, was introduced by Karel Capek in his play R.U.R. (Rossum's Universal Robots), which premiered in 1921. The word's origin is Czech, where robota means labour. Since then, research in robotics has come a long way. The idea is that robots would take over uncreative manual work or difficult and dangerous work from humans. Using robots in industry resulted in mass production of goods with higher quality than would otherwise have been possible. Teleoperated robots are used in semi-structured environments such as undersea and nuclear facilities. They perform non-repetitive tasks and have limited real-time control. Autonomous robots are used when it is no longer feasible to remotely operate the robotic system. These robots can do different tasks in unstructured environments and adapt to changing surroundings without continuous human intervention, e.g. when used in outer space, where it takes a long time to send and receive signals from Earth. The robot types mentioned above are designed to complete their tasks without much human interaction.

Recently, there has been an increasing demand for domestic robots such as service robots. The aim of these robots is to assist people in a real-life environment. Human-robot interaction research plays a major part in designing these robots.

1.2 Problem Definition

One can think of the scenario where a house owner needs domestic help. On the first encounter the house owner will show the aid around the house, identify different locations and objects for her/him and give her/him the instructions to be followed. If the aid is a robot, the robot has to be designed to operate together with humans and share their environment; it should be safe enough to coexist with humans and easy to operate by different categories of people: young, old, handicapped, etc.

The general idea behind this thesis is to find the common differences and similarities in how a human presents a familiar environment to another human versus to a robot in the same environment. This is the basis for a study that consists of observing how a person guides another person around an office environment and how the same person guides a robot around the same environment. The results from this work could help with the design of interactive service robots that can easily be operated even by persons who have never interacted with robots before.


1.3 Outline of The Report

Chapter 1 contains an introduction to the problem.

Chapter 2 contains an overview of the background of the problem, with a description of the fields that contribute most to it.

Chapter 3 describes the study that was made and the chosen methods.

Chapter 4 provides the results of the study.

Chapter 5 provides observations made during the study.

Chapter 6 discusses the results, the differences and similarities in a human to a human versus a human to a robot interaction in an office environment.

Chapter 7 contains a conclusion of the work based on the gained results.


Chapter 2

Background

The idea for this study came up during work on the licentiate thesis "Initial steps toward human augmented mapping" (Topp, 2006). The idea behind human augmented mapping is that a human and a robot interact to make an association between how humans perceive their environment and what the robot learns as a map1, see figure 2.1. Robotic maps are often metric, whereas humans adopt a topological way of presenting the environment.

Figure 2.1 shows how the robot sees the environment with data from a laser range finder (Topp, 2006)

1 A mobile robot can move within an unknown environment while at the same time creating and updating a map of the area; this is achieved by the simultaneous localization and mapping (SLAM) technique (Folkesson, 2005).

In mobile robotics, SLAM refers to the process of creating geometrically accurate maps of the environment (see figure 2.1). Such a map gives the robot the ability to know its location in terms of geometrically defined positions or coordinates. SLAM can use many different types of sensors to obtain data for building the map, such as laser range finders, cameras and sonar sensors. Imagine that one night you are sitting comfortably in your house and suddenly there is a blackout. You move around blindly and may touch objects and walls so that you will not get lost. Identifying the things that you have felt through touch, like a doorframe, may help you estimate where you are. Sensors are the robot's sense of touch. The distances at which the laser beams are reflected off the walls in the room help build a map (XY) of the room, which is updated as the robot moves.
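As a concrete illustration of the laser-based mapping idea, the following is a minimal sketch, not taken from the thesis or from Topp's system, of how individual range readings could be turned into 2D points in a fixed map frame; the pose, beam angles and ranges below are invented values.

```python
import math

def scan_to_points(pose, angles, ranges, max_range=8.0):
    """Convert one laser scan to 2D points in the map frame.

    pose   -- (x, y, heading) of the robot in the map frame
    angles -- beam angles relative to the robot heading (radians)
    ranges -- measured distances for each beam (metres)
    Readings at or beyond max_range are treated as 'no hit' and skipped.
    """
    x, y, heading = pose
    points = []
    for angle, r in zip(angles, ranges):
        if r >= max_range:          # nothing reflected this beam
            continue
        a = heading + angle         # beam direction in the map frame
        points.append((x + r * math.cos(a), y + r * math.sin(a)))
    return points

# Illustrative scan: robot at the origin facing along +x,
# three beams at -45, 0 and +45 degrees; the last beam sees nothing.
pose = (0.0, 0.0, 0.0)
angles = [-math.pi / 4, 0.0, math.pi / 4]
ranges = [2.0, 1.5, 8.0]
print(scan_to_points(pose, angles, ranges))
```

Accumulating such points over many scans, while correcting the robot's pose estimate, is what produces the kind of metric map shown in figure 2.1.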


Humans have their own personal preferences and terms for the entities in the environment. The robot's first task is to learn the environment as humans do, sharing a common concept like "the kitchen". For example, a user wants the robot to clean "the kitchen". The robot has to be able to go to "the kitchen" according to the user's perception of "the kitchen". By integrating the user into the robot's mapping process, the resulting map will contain the user's personal preferences, see figure 2.2. While both the user and the robot are in the "kitchen", the user can label the map by saying "this is the kitchen" to the robot.

Figure 2.2 shows how the robot can see the environment with the help of a user.
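To illustrate the labelling step, here is a toy sketch of a "human augmented" label store; the class and method names are invented for illustration and assume the robot already knows its own metric pose from SLAM.

```python
class LabelledMap:
    """Toy 'human augmented' map: metric poses tagged with user labels."""

    def __init__(self):
        self.labels = {}            # label -> (x, y) where it was given

    def add_label(self, label, robot_pose):
        """Store the user's name for the place the robot is currently in."""
        self.labels[label] = robot_pose

    def resolve(self, label):
        """Return the stored position for a user term such as 'kitchen'."""
        return self.labels.get(label)

m = LabelledMap()
m.add_label("kitchen", (3.2, -1.5))     # user says "this is the kitchen"
print(m.resolve("kitchen"))             # robot can later navigate back here
```

In this way the user's own terms become addressable positions in the robot's otherwise purely metric map.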

Through this thesis the need came up to comprehend how people perceive service robots. Will they use the same presentation strategies as they would to instruct a human aid in their homes, workplaces, etc.? Such an investigation might give an idea about the mental model users have of service robots and their abilities, which can help researchers design robot interfaces that are easy for users to operate.

2.1 Human Communication

The old methods of interacting with robots involve entering commands through a keyboard, mouse, etc. As robots get more complex, their functionality increases and they become capable of doing multiple tasks; it thus becomes essential for users to interact with robots using their natural methods of communication. Humans communicate with each other through direct and indirect channels. The direct channels, which are easily recognised by the receiver, include both verbal and non-verbal means. Verbal communication uses words, as in written or spoken communication. Non-verbal communication uses gestures, facial expressions and other bodily movements, and also includes the use of colours, lights, sounds, etc. The indirect channels are those that are recognised subconsciously by the receivers and are not under the control of the senders, in the sense of how receivers comprehend or feel those signals from the senders (Tomasello, 2006). One type of non-verbal communication is gestures. Gestures are bodily movements that coincide with speech (McNeill, 1992). People unwittingly produce gestures along with their speech during communication.

They can give a clue about the content of the speech and, at times, they clarify reasoning that the speaker cannot articulate. People use gestures even when they cannot see the listeners; they even produce gestures while talking on the telephone (Rime, 1982). Although the speaker at times hesitates and makes errors, gesturing almost always reflects the speaker's intentions.

Gestures could be classified as follows:

• Conscious gestures include emblematic gestures. These are culturally dependent, such as the victory sign, and vary from one culture to another (Kendon, 1997). They also include propositional gestures, as when one uses the hands to measure the size of a symbolic space, or when the speaker points to an object and asks for its location to be changed (Bolt, 1987). These conscious gestures constitute the minority of gestures.

• Unconscious, unwitting or unplanned gestures are the majority. They include the following (Cassell, 1998):

1. Iconic gestures: these depict by their form some features of the described event. For example, only with a gesture can the narrator show the listener how the handle of a caulking gun is manipulated.

2. Metaphoric gestures: these are representational, but what they represent has no physical form, e.g. when someone rolls his hands while saying the actions continued for a long time.

3. Deictic gestures: these populate the space between the speaker and listener with discourse entities. They do not have to be made with a pointing index finger; the speaker could use the whole hand to represent entities, ideas or events in space.


4. Beat gestures: small baton-like movements that do not change in form with the content of the speech, as when one moves a hand up and down while saying that he is the first speaker, to be followed by another speaker.

However, not all researchers agree on the types and names of the different gestures, nor on their value. Krauss et al. (2001) named emblematic gestures symbolic gestures. They also included deictic gestures and what they called motor gestures, which consist of simple repetitive rhythmic movements that bear no definite relation to the content of the accompanying speech; these are the same as the beat gestures mentioned above. Finally, they used the term lexical gestures; these are not easily defined. They include gestures that do not fit the description of the three types above, i.e. the symbolic, deictic and motor gestures. They proposed that lexical gestures accompany speech while appearing to bear no relation to its semantic content. They describe ideas, and for this reason some call them ideational gestures, others representational gestures, and still others "illustrators". Lexical gestures vary in length and are non-repetitive. Restricting gesturing might adversely affect speech, and they proposed that lexical gesturing and speech operate in concert. These researchers believe that the contribution of gestures to communication has been overstated.

Gestures are found in every culture, but what is universal are the types, not the shapes. Humans communicating with a computer or a robot could use these gestures. They might replace the keyboard, the mouse or speech as a direct command language (Cassell, 1998).

2.2 Human Robot Interaction

Human robot interaction (HRI) is a rather new research field that studies interactions between people (users) and robots. HRI is multidisciplinary with contributions from the fields of human-computer interaction, artificial intelligence, robotics, natural language understanding, and social science (psychology, cognitive science, anthropology, and human factors).

The target of HRI is to develop principles and algorithms to allow more natural and effective communication and interaction between humans and robots.


HRI is based on studies about how humans collaborate and interact, and uses those studies to motivate how robots should interact with humans2.

2.2.1 Human Computer Interaction and HRI

Human computer interaction is a research field based on the study of interaction between people (users) and computers. It is a combination of computer science, behavioural sciences, design and several other fields of study. This research field is one of the contributor fields to HRI.

There is a fundamental difference between HCI and HRI. HRI concerns systems that have complex, autonomous and dynamic control systems. These systems operate in real-world environments that are non-static and changing (Scholtz, 2003).

The evaluation methods from HCI that can be adapted for HRI are inspection methods, empirical methods and formal methods. The empirical evaluations are more used in HRI because they involve the users performing typical tasks in as realistic an environment as possible (Kiesler et al., 2004).

Kiesler et al. (2004) point out the following three aspects that differentiate autonomous robots from other computer technologies:

• Users perceive autonomous robots differently from other computer technologies (Friedman et al., 2003). People's mental models of autonomous robots are often more anthropomorphic than their models of other computer systems. There are two factors that explain this:

1. The role of science fiction in the media, books, etc.

2. The powerful impact of autonomous movement on perception.

• In the future, robots will be fully mobile and this will bring them into physical proximity with people, other robots, objects, etc. in real-life environments. In some cases a complex feedback system is required so that users can help the robot act from a distance.

2 http://www.ro-man.org/

• Robots have to learn about themselves and their environments. They should have at least some control over the information that they process and the actions they emit. Computer agents in desktop, automotive and other applications make decisions, and the functionality of these agents is increasing rapidly, but a robotic system has to make decisions that take into consideration the safety of the users and the robot, detect and respond to changes in the environment and in its users, etc.

2.2.2 Human Interaction With Service Robots

Service robots can assist humans in different settings, whether at home, at the office, etc. There are currently many research projects in HRI trying to develop a service robot that operates in the co-presence of humans.

HRI research includes many aspects, such as the evaluation of robot behaviours and designing them so that they appear as natural as possible to humans (social robots). Research other than that dealing with service robots is not included in this chapter, since the purpose of this thesis is to help facilitate human-robot interaction during the presentation of an environment. The main focus in this section is on studies conducted with service robots in an office environment.

The following are some studies that are relevant to the presented study:

A pilot study by Topp et al. (2006) investigated users who presented an environment familiar to them to a service robot. The aim of the study was to use and validate a proposed generic environment model for a service robot. The researcher modelled the environment using a hierarchy of graphs to incorporate locations and regions. Locations are specific positions that can represent the position of large objects that are considered static, for example a table or a sink counter. Regions are any portions of space large enough to allow for different locations in them, such as rooms or hallways.


Green et al. (2006a) describe the development process of a contextualized corpus for HRI. Two user studies were conducted in a scenario called the home tour, where the users showed a single room in the first study and a whole floor in the second study to a robot using combined speech and gestures. The aim of these studies was to support the development of a cognitive robot.

Green et al. (2006b) describe and discuss how high-fidelity (Hi-Fi) simulation methods, or Wizard-of-Oz techniques, can be employed to develop natural language user interfaces for robots with cognitive capabilities. Data from a Hi-Fi simulation study can be analysed with methods from psychology, linguistics, etc., and visualization of the human-robot interaction and assessment of the users' attitudes towards robots enables designers to conceptualize what the system could or should do in real situations. The aim was that a new system would be developed to replace the Hi-Fi simulation with real components. In the present study these Hi-Fi simulation methods were taken into consideration.

Huttenrauch et al., (2006a) investigated spatial positioning and interaction episodes in HRI. The study involved subjects that showed a robot a living room setting and taught it new places and objects. The study focused on the transitions between interaction episodes by investigating how users organize the task of showing a new environment to a robot.

Huttenrauch et al., (2006b) investigated the spatial relationships between a human user and a mobile robot. In the scenario of the study a user introduces her surroundings to a newly bought service robot by showing it around to learn relevant places and objects so that the robot could do the tasks on its own. The aim was to improve the design of the robots’ behaviour strategies regarding spatial management.


Chapter 3

The study

The study to be described in this chapter was set up to find out about differences and similarities in strategies that humans might use when presenting an office environment to another human versus a robot.

The questions guiding the study were:

• What is the presentation strategy of a human to a human versus a human to a robot?

• Are the ways of presenting the surroundings the same from a human to a human as from a human to a robot?

• How long does it take to show an environment to a human versus to a robot?

• What anomalies or unexpected behaviours, if any, occur in these forms of presentation?

Answers to these questions might help us find out whether humans treat service robots as individuals. Further, it is not clear whether their treatment of a robot as an individual is the same as their treatment of a human. This would give us an understanding of the mental model users have of service robots and their abilities, and of the implications for the design of underlying models and interaction capabilities.

For the study, a proposed hierarchy-of-graphs model incorporating locations, objects and regions for a service robot was used to help evaluate the presentation of the environment to a robot or a person.

The following list explains regions, locations and objects as defined by Topp (2006); see also section 2.2.2. A small illustrative sketch of this hierarchy follows the list.

A Location is the area from which a large object that is not manipulated as a whole is reachable/visible, e.g. a sofa, fridge or pigeon-holes.

An Object is a small object that can be manipulated, e.g. a cup, plate or remote control.


A Region is a container for one or several locations. It offers enough space to navigate (rooms, corridors, delimited areas in hallways).
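The hierarchy can be pictured as a simple containment structure. The sketch below only illustrates the region/location/object model defined above; the class names are invented and do not come from Topp's implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Object:
    name: str                      # small manipulable object, e.g. "cup"

@dataclass
class Location:
    name: str                      # area from which a large static object is reachable
    objects: List[Object] = field(default_factory=list)

@dataclass
class Region:
    name: str                      # room, corridor or delimited area
    locations: List[Location] = field(default_factory=list)

# A region contains locations, and locations contain objects.
kitchen = Region("kitchen", [
    Location("sink counter", [Object("dishes")]),
    Location("table", [Object("TV remote control")]),
])
print([loc.name for loc in kitchen.locations])
```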

3.1 Scenario

Figure 3.1: the floor plan of the office environment on which the experiments took place.

The scenario of the study was that an office employee shows a portion of an office building for the first time to a newly employed cleaning aid. Figure 3.1 shows the floor plan where the trials were conducted, with the marked areas: Elin A. Topp's office, the computer vision laboratory, the hallway, the bathroom and the kitchen with some of their contents (see appendix A), and the conference room, which was discarded after the pilot study (see the pilot study section). The subjects were instructed to first show a person around the environment and then show the same environment to the robot, or vice versa, in such a way that the person or robot could come back later and perform a number of service tasks. In order to do this it was necessary that the subjects had already seen the environment.

3.2 Method

The following section explains the selection of subjects, the instructions given to them, and the methods used for data collection.


3.2.1 Subjects

An important precondition for both the pilot study and the experimental study was that the subjects should have no significant knowledge about robotics. This requirement was made because users will probably have little or no knowledge about robotics when they purchase a service robot. The experiments took place on one floor of a building at the KTH university campus. The recruited subjects were KTH students who had not studied robotics. The subjects' ages were between 20 and 29 years.

The subjects were divided into two groups. The first group consisted of the subjects who presented the environment to a person and then to the robot, or vice versa. The second group consisted of the persons who were shown around and acted as the cleaning aid. They are referred to as participants, or in general as a person, as in "the subject showed a person around".

                              Females   Males   Total
Pilot study subjects             1        1       2
Main study subjects              7        7      14
Main study participants          3        4       7
Total                           11       12      23

Table 3.1: Distribution of the subjects.

All but one of the subjects were asked if they had previous knowledge about robotics, and they all denied it. The remaining one was a PhD student with some knowledge of robotics; this factor was not significant because she was only shown around.

The subjects and participants had different ethnic backgrounds. They were from Bangladesh, France, Germany, Greece, Iran, Iraq, Kazakhstan, "Palestine", Russia, Sweden, and Taiwan. All subjects were given a cinema ticket for their participation in the study.


3.2.2 Instructions

On arrival at the experiment, the subjects were given an instruction sheet (see appendix A) that explained the task and the functionality and abilities of the robot. The participants were simply being shown around and were not supposed to know anything about the experiment site, so they were given verbal instructions only. All the subjects were told that they had the right to abort the experiment at any time. The experiment leader and the advisor, Elin Topp, were available for help at any time.

The initial task for the subjects was to go around the environment and get familiar with it. After they got familiar with the environment they were asked to show a person around. Then the robot was shown around. The estimated time needed to show the robot around was approximately 15 minutes at most. There was no time limit for the first two parts of the experiment; however, if the 15-minute limit for showing the robot around was exceeded, the subjects were asked to abort the experiment.

The subjects were free to choose where to start and finish their task. They were free in their conduct and in the way they talked to the participant or the robot. The participants were also given the freedom to talk to the subjects during the experiment. When the subjects had completed their task they were interviewed regarding their behaviour and thoughts. The interview was estimated to last less than five minutes and was semi-structured (see appendix B).

The subjects knew that the robot was following them autonomously; however, they were not given any instructions regarding how to interact with the robot, so that the subjects' actual perception of the robot and its abilities could be observed (Scholtz et al., 2003). When they started showing the robot around, the subjects were given commands (follow me, stop, turn left, turn right) to control the movement of the robot only if they asked for them. The subjects were also given general information about the robot to avoid problems during the interaction (see appendix A).


Figure 3.2 shows the robot that was used in the experiments.

The robot used for this study was a Performance PeopleBot, commercially available from MobileRobots (see figure 3.2a). The robot is 1.1 m tall and has a laser range finder (the blue part on the lower platform). The laser range finder is a sensor device (like the "eye" of the robot) by which it finds its way. A person who is standing behind the laser range finder cannot be detected, because she or he is standing behind the baseline of the laser range finder (see figure 3.2b).

The maximum field of view of the laser range finder is slightly more than three meters. A person cannot be detected outside the field of view.

The subjects got the explanation that in order to be detected, they had to move a few steps in front of the robot and that the minimum distance between them and the robot should be one meter. The robot would come a bit closer before it stops.

The subjects were instructed not to walk too fast, so that the robot would not lose track of them. The maximum speed of the robot had to be reduced when approaching and passing doors, other narrow passages and cluttered areas. By reducing its speed, the robot would avoid colliding with anything, such as users or a doorframe, but it would need a while to go through a door or a narrow passage.
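The movement rules explained to the subjects (stay within the laser's roughly three-metre range, stop about one metre from the person, slow down near doors and in cluttered areas) can be summarised as a small speed rule. The sketch below is only one plausible reading of that description, not the controller actually running on the PeopleBot; all speeds and thresholds are assumptions.

```python
def follow_speed(person_distance, near_doorway,
                 stop_distance=1.0, max_range=3.0,
                 normal_speed=0.4, slow_speed=0.15):
    """Return a forward speed (m/s) for following a person.

    person_distance -- distance to the tracked person in metres,
                       or None if the laser has lost the person
    near_doorway    -- True when passing doors, narrow or cluttered areas
    """
    if person_distance is None or person_distance > max_range:
        return 0.0                      # person not detected: stand still
    if person_distance <= stop_distance:
        return 0.0                      # close enough: stop
    return slow_speed if near_doorway else normal_speed

print(follow_speed(2.0, near_doorway=False))   # open corridor
print(follow_speed(2.0, near_doorway=True))    # doorway: slow down
print(follow_speed(0.8, near_doorway=False))   # within a metre: stop
```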


3.2.3 Technical Realisation

The robot navigated autonomously when it was following a user, but if the user's safety was compromised, the robot's navigation would be switched to remote control immediately. The system was controlled through a graphical user interface and it was not possible to use a speech recognition system. The advisor, Elin Topp, simulated the dialogue system: the subjects' utterances were interpreted into commands and fed manually into the interface. The advisor decided the robot's behavioural strategies, of which there were two.

If a location or object was presented, the robot did not move and stated immediately that it had stored the given information. If a region was presented, the robot stated that it needed to have a look around and performed a 360° turn before confirming the information.
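The two behavioural strategies amount to a simple rule keyed on the kind of entity being presented. The following sketch illustrates that rule only; the actual wizard interface and its command set are not documented in the thesis beyond the description above, so the function and message strings are invented.

```python
def handle_presentation(kind, name):
    """React to a presented entity the way the wizard-controlled robot did.

    kind -- "region", "location" or "object"
    Returns the actions the robot performs, in order.
    """
    if kind in ("location", "object"):
        # locations and objects: confirm immediately, without moving
        return [f"say: stored information about {kind} {name}"]
    if kind == "region":
        # regions: look around first, then confirm
        return ["say: I need to have a look around",
                "turn: 360 degrees",
                f"say: stored information about region {name}"]
    raise ValueError(f"unknown kind: {kind}")

print(handle_presentation("object", "flowerpot"))
print(handle_presentation("region", "kitchen"))
```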

3.2.4 Data Collection

There were 14 trials. In each trial the subject presented his/her surroundings to a person and to the robot. The experiment leader recorded the trials with a digital video camera. All the subjects were interviewed after finishing the experiment, and the interviews were recorded. Due to battery problems with the digital camera, the subjects in trials 1-8 were asked to repeat the interviews in a shorter version. In trials 9-14 the digital camera was changed, but the same shorter version of the interview was applied. Time was measured using a computer clock.

3.2.5 Pilot Study

The pilot study was conducted a week before the main study to make sure that the experimental setting, the instructions to the subjects and the data collection method were satisfactory. After the pilot study one change was made: one of the regions and its contents (a safe, a table and chairs) was discarded because it took the subjects too long to go through six regions.


3.3 Hypotheses

The study was set up to investigate the differences and similarities in how a human presents an environment to another human versus to a mobile robot, and whether the same verbal and non-verbal communication is used for the robot as for a person. The assumption was made that subjects would not consider the robot able to comprehend what they were presenting to it in a controlled environment. Two hypotheses were formulated to test whether humans would treat the robot as an individual and whether they would present the environment to the robot as they would present it to humans.

H1: Humans present the environment thoroughly to a robot but less thoroughly to other humans.

H2: Humans take less time presenting the environment to other humans than presenting it to the robot.

3.4 Evaluation

Every experiment was divided into two parts, a human-to-human interaction and a human-to-robot interaction. In the analysis, each part was divided into the verbal communications that were used to present regions, locations or objects and the non-verbal communications that were used to present locations or objects.

Non-verbal communication was evaluated for hand gestures and gaze. A comparison was made between how the subjects presented regions, locations or objects to the robot and to a person. The time that the subjects took to show the robot or a person around was measured.


Chapter 4

Results From The Study

In this section the results from the study are presented. The study consisted of 14 trials, which gives a reasonable amount of data to analyse in terms of the occurrence of different phenomena. The observations and the answers obtained in short interviews also allowed us to investigate how subjects reasoned about their experience of presenting an environment to a person or a robot.

4.1 Results Regarding Presentation Strategies

The subjects used verbal and non-verbal ways to present the regions, locations or objects to the robot or a person.

4.1.1 Verbal Communications

Verbal communication was used to identify and present regions, locations or objects, assuming that both the presenter and the receiver speak the same language. For example, person A shows person B her/his kitchen, which contains a table and dishes besides other artefacts; after a while person A could ask person B to fetch the dish that lies on the table. The same thing can be done with a robot3.

4.1.1.1 Presentation of Regions

All the subjects except one, when they showed the regions to either the robot or a person, used phrases like "this is the computer vision lab" or "here is the hallway". Only one subject said to the robot or a person, "see around this or that region". When the subjects presented regions before entering them, they said for example "we are going to the kitchen" or "we continue to the kitchen".

3 The user could use natural language to command the robot. The user's spoken utterance would be parsed and interpreted into text, the class of the verb and the status of the goal checked, and the result translated into commands (Perzanowski et al., 1999).

The way the subjects showed the different regions to the robot or a person is shown in table 4.1.

Every subject could present five regions, so the 14 subjects could present 70 regions. The subjects entered 64 regions (91.4%) of the total with the robot and 66 (94.3%) with a person respectively.

The subjects presented 50 regions (71.4%) of the total to the robot and 48 (68.8%) to a person. They presented the regions while they were inside them 47 times; of these, 25 (35.7%) were to the robot and 22 (31.4%) to a person. They presented the regions before entering them 34 times; of these, 12 (17.1%) were to the robot and 22 (31.4%) to a person.

Seventeen times, the subjects presented the regions both before and after entering them; of these, 13 (18.6%) were to the robot and four (6%) to a person. A chi-square test showed a statistically significant difference between the regions labelled after being entered and before being entered (chi-square = 4.85, DF = 1, p-value = 0.0275).
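The chi-square figures in this chapter come from 2x2 contingency tables of presentation counts. The exact table behind chi-square = 4.85 is not reproduced here, so the snippet below only illustrates how such a test can be run on counts of this kind; the table is built from the after/before figures quoted above, the continuity correction is disabled, and the output will not necessarily match the thesis value.

```python
from scipy.stats import chi2_contingency

# 2x2 table from the counts quoted above:
# rows = robot / person, columns = presented after / before entering.
# Illustrative only; the thesis does not state that this exact table
# produced chi^2 = 4.85.
table = [[25, 12],
         [22, 22]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```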

The subjects entered the regions without presenting them 32 times. Of these 14 (20%) were to the robot and 18 (25.6%) to a person.

Eight of the subjects entered all the regions (see appendix A) with the robot or a person, while seven of the subjects did not mention all the regions either to a person or to the robot. Four subjects did not enter all the regions because the 15-minute time limit for showing the robot around had run out. The remaining two did not present the bathroom to either the robot or a person.

For more details see appendix C.


                              The robot                           A person
Region                  After     Before and  Before        After     Before and  Before
                        entering  after       entering      entering  after       entering
Computer Vision Lab        6          1          1             7          0          2
Hallway                    6          -          3             6          1          2
Bathroom                   1          7          3             2          1          7
Kitchen                    7          4          1             4          1          7
Elin A. Topp's office      5          1          4             3          1          4
Sum:                      25         13         12            22          4         22
Average per subject:      ≈2         ≈1         ≈1            ≈2         ≈0         ≈2

Table 4.1: How the subjects presented the different regions to the robot and to a person ("After entering" = presented after entering the region, "Before and after" = presented both before and after entering, "Before entering" = presented before entering).

4.1.1.2 Presentation of Locations

The subjects labelled the locations by saying "this is a table", "this is a chair", etc., or "we got a table", "we have a chair". Some subjects immediately defined the locations and gave instructions on entrance, like "this is a table, clean it". Some only defined them without giving instructions; others started by giving instructions without defining the locations, like "clean the table" or "move the chairs". See table 4.2.

The subjects could present 196 locations in total to the robot or a person, i.e. 392 for both. Of those, locations were actually presented 292 times. These were divided into defined and not defined: 175 were defined and 117 not defined. Of the defined, 98 (50%) were to the robot and 77 (39.3%) to a person. The undefined were 51 (26%) to the robot and 66 (33.7%) to a person. There is a statistically significant difference (chi-square = 4.32, DF = 1, p-value = 0.0376) between the defined and undefined locations when comparing the robot to a person.


                                            The robot                       A person
Region               Location            Def.   Def.+   Instr.        Def.   Def.+   Instr.
                                          only   instr.  only          only   instr.  only
Computer vision lab  Chairs                 3      5       5             2      3       8
Computer vision lab  Computer desks         2      5       5             4      0       7
Hallway              Printer desk           3      6       4             2      6       3
Hallway              Printer                3      7       3             5      3       4
Bathroom             Toilet                 4      4       3             4      2       6
Bathroom             Sink                   5      4       3             3      2       6
Kitchen              Sink counter           7      4       4             1      6       4
Kitchen              Tables                 1      4       4             2      4       6
Kitchen              Chairs                 4      3       5             3      4       5
Elin A. Topp office  Clothes hanger         4      2       4             3      2       4
Elin A. Topp office  Little table           2      3       3             2      2       5
Elin A. Topp office  Chair                  2      1       3             2      2       3
Elin A. Topp office  Computer desk          5      1       2             2      3       1
Elin A. Topp office  Office chair           2      2       3             2      3       4
Sum:                                       47     51      51            35     42      66
Average per subject:                       ≈3     ≈4      ≈4            ≈3     ≈3      ≈5

Table 4.2: Locations and how the subjects presented them to the robot and to a person in the different regions (Def. only = defining only, Def.+instr. = defining and giving instructions, Instr. only = giving instructions only).

4.1.1.3 Presentation of Objects

The subjects labelled the objects by either saying "this is the TV remote control", "this is a flowerpot", etc., or just "TV remote control", "flowerpot", etc. Some, immediately on entering, defined the objects and gave instructions, like "this is the flowerpot, put the flowerpot on the table". Some only defined them without giving instructions; others started by giving instructions without defining the objects, like "make sure the dishes are clean" or "put the flowerpot on the table". See table 4.3.


The subjects could present 56 objects in total to the robot or a person, i.e. 112 for both. Objects were presented 76 times. Of these, 32 were defined and 44 were not defined. The defined were 20 (35.7%) to the robot and 12 (21.4%) to a person. The undefined were 44; of these, 18 (32.1%) were to the robot and 26 (46.4%) to a person. There is not quite a statistically significant difference (chi-square = 3.45, DF = 1, p-value = 0.0631) between the defined and undefined objects when comparing the robot to a person.

                                              The robot                       A person
Region               Object                Def.   Def.+   Instr.        Def.   Def.+   Instr.
                                            only   instr.  only          only   instr.  only
Hallway              Paper                    2      1       8             0      4       6
Kitchen              TV remote control        3      5       2             1      3       7
Kitchen              Dishes                   2      3       7             1      2       9
Elin A. Topp office  Flowerpot                1      3       1             0      1       4
Sum:                                          8     12      18             2     10      26
Average per subject:                         ≈1     ≈1      ≈1            ≈0     ≈1      ≈2

Table 4.3: How the subjects presented the objects to the robot and to a person in the different regions (Def. only = defining only, Def.+instr. = defining and giving instructions, Instr. only = giving instructions only).

4.1.2 Non-Verbal Communications

Non-verbal communication was used to refer to the locations or objects that were presented. During presentation, two distinctive non-verbal behaviours accompanied the verbal communication: gaze and hand gestures.


4.1.2.1 Hand Gestures

All the subjects used pointing and touch to present the locations or objects. For example, a subject points toward a table so that the person being shown around can distinguish it from other locations or objects in the kitchen during the presentation. The same thing could be applied to the robot4.

The hand gestures that were observed in the trials could be divided into four categories, see figure 4.1:

• Pointing from a near distance: the subject pointed directly at a location or an object from a near distance.

• Pointing from a far distance: the subject pointed directly at a location or an object from a far distance. For example, if a dish was on a table it was hard to determine whether the subject meant the dish or the table.

• Pointing haphazardly: the subject did not point directly at a location or an object, or moved her/his hand in a semicircle when presenting multiple locations like "computer desks".

• Touch: the subjects touched a location, touched or grasped an object.

4 As described in the background and method chapters, a laser range finder is the eye of the robot that enables it to find its way. A camera fitted with an optical filter adjusted to the frequency of the laser could be attached above the laser (Sofge et al., 2003). The camera registers the reflection of the laser light off a person, location, object, etc. in the region and builds a depth map (XY) based upon location and pixel intensity. The data points for bright pixels are clustered and interpreted as a hand. The hand's locations from different shots, and the positions of the hand, are compared to hand gestures already stored to determine whether the gesture is valid. The recognized hand gesture combined with a verbal command could present a specific location or object.
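A very rough sketch of the idea in this footnote (find the bright, laser-lit pixels in a camera frame, take their centroid as a hand position, and follow that position over a few frames) could look as follows; the threshold, frame format and gesture rule are all invented for illustration and do not come from Sofge et al.

```python
def hand_position(frame, threshold=200):
    """Centroid of bright pixels in a grayscale frame (list of rows)."""
    bright = [(x, y) for y, row in enumerate(frame)
                     for x, value in enumerate(row) if value >= threshold]
    if not bright:
        return None
    n = len(bright)
    return (sum(x for x, _ in bright) / n, sum(y for _, y in bright) / n)

def classify(track):
    """Very crude gesture rule on a track of hand positions."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    if abs(x1 - x0) < 2 and abs(y1 - y0) < 2:
        return "pointing (hand held still)"
    return "moving hand"

# Two tiny illustrative frames with one bright (laser-lit) pixel each.
frames = [
    [[0, 0, 0], [0, 255, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 250, 0], [0, 0, 0]],
]
track = [hand_position(f) for f in frames]
print(classify(track))
```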


Figure 4.1: Illustrations of the categories above: a) a subject pointing directly at the sink from a near distance, b) a subject pointing directly at the table from a far distance, c) a subject pointing haphazardly while presenting a table, and d) a subject touching a table.

4.1.2.1.1 Locations

The subjects used hand gestures, in the form of pointing at or touching the locations, to accompany the verbal presentation. Tables 4.4 and 4.5 show the different hand gestures that were used for the defined and undefined locations.

Hand gestures were used 260 times, 139 to the robot and 121 to a person. Pointing was used 201 times, 100 (38.6%) to the robot and 101 (39%) to a person. Touching was used 59 times, 39 (15.1%) to the robot and 20 (7.7%) to a person. There is a statistically significant difference (chi-square = 4.9, DF = 1, p-value = 0.0268) when comparing pointing and touching for the robot versus a person.


                                      Point from a    Point from a    Point          Touch
                                      near distance   far distance    haphazardly
Region               Location           P     M         P     M        P     M        P     M
Computer vision lab  Chairs             1     5         -     -        2     4        1     -
Computer vision lab  Computer desks     1     5         -     -        2     5        1     -
Hallway              Printer desk       3     1         -     -        -     -        5     2
Hallway              Printer            2     1         1     1        -     -        5     1
Bathroom             Toilet             7     2         -     1        1     -        -     -
Bathroom             Sink               4     2         -     1        -     1        3     -
Kitchen              Sink counter       4     2         1     -        -     2        4     -
Kitchen              Tables             -     2         1     1        -     1        3     -
Kitchen              Chairs             3     2         1     1        1     -        4     -
Elin A. Topp office  Clothes hanger     3     3         -     1        1     -        2     1
Elin A. Topp office  Little table       3     3         -     -        -     1        2     -
Elin A. Topp office  Chair              1     2         -     -        -     1        1     -
Elin A. Topp office  Computer desk      2     1         -     -        -     1        2     -
Elin A. Topp office  Office chair       -     -         -     -        -     2        2     -
Sum:                                   34    31         4     6        7    18       35     4
Average per subject:                   ≈2    ≈2        ≈0    ≈0       ≈1    ≈1       ≈2    ≈0

P: The subjects defined the locations, or defined them and gave instructions, during presentation of locations.
M: The subjects only gave instructions during presentation of locations.

Table 4.4: How the subjects pointed at and touched locations during presentation of locations to the robot in different regions.


                                      Point from a    Point from a    Point          Touch
                                      near distance   far distance    haphazardly
Region               Location           P     M         P     M        P     M        P     M
Computer vision lab  Chairs             2     -         1     -        1     5        -     -
Computer vision lab  Computer desks     2     -         -     2        1     4        1     -
Hallway              Printer desk       2     2         -     1        1     -        3     -
Hallway              Printer            4     -         -     -        1     1        3     1
Bathroom             Toilet             2     4         -     1        -     -        1     -
Bathroom             Sink               2     5         -     1        -     -        -     -
Kitchen              Sink counter       3     1         1     1        1     1        1     -
Kitchen              Tables             2     -         2     1        1     2        1     1
Kitchen              Chairs             1     -         2     1        -     2        -     1
Elin A. Topp office  Clothes hanger     7     1         -     -        -     -        3     -
Elin A. Topp office  Little table       6     5         -     1        -     -        -     -
Elin A. Topp office  Chair              3     1         -     1        1     -        -     -
Elin A. Topp office  Computer desk      4     -         -     1        1     -        1     -
Elin A. Topp office  Office chair       1     1         -     1        -     -        2     1
Sum:                                   41    20         5    12        8    15       16     4
Average per subject:                   ≈3    ≈1        ≈0    ≈1       ≈1    ≈1       ≈1    ≈0

P: The subjects defined the locations, or defined them and gave instructions, during presentation of locations.
M: The subjects only gave instructions during presentation of locations.

Table 4.5: How the subjects pointed at and touched locations during presentation of locations to a person in different regions.

4.1.2.1.2 Objects

The subjects used hand gestures, in the form of pointing at or touching the objects, to accompany the verbal presentation. Tables 4.6 and 4.7 show the different hand gestures that were used for the defined and undefined objects.

Hand gestures were used 46 times, 23 to the robot and 23 to a person. Pointing was used 27 times, 11 (24%) to the robot and 16 (35%) to a person. Touching was used 18 times, 12 (26.1%) to the robot and 7 (15.2%) to a person. There is no statistically significant difference (chi-square = 2.24, DF = 1, p-value = 0.1342) when comparing pointing and touching for the robot versus a person.

                                         Point from a    Point from a    Point          Touch or
                                         near distance   far distance    haphazardly    grasp
Region               Object                P     M         P     M        P     M        P     M
Hallway              Paper                 1     1         -     -        -     -        1     1
Kitchen              TV remote control     -     1         1     1        -     -        6     -
Kitchen              Dishes                -     1         1     -        -     -        3     -
Elin A. Topp office  Flowerpot             2     1         -     1        -     -        -     -
Sum:                                       3     4         2     2        0     0       11     1
Average per subject:                      ≈0    ≈0        ≈0    ≈0        0     0       ≈1    ≈0

P: The subjects defined the objects, or defined them and gave instructions, during presentation of objects.
M: The subjects only gave instructions during presentation of objects.

Table 4.6: How the subjects pointed at and touched objects during presentation of objects to the robot in different regions.

                                         Point from a    Point from a    Point          Touch or
                                         near distance   far distance    haphazardly    grasp
Region               Object                P     M         P     M        P     M        P     M
Hallway              Paper                 -     -         1     -        -     -        3     1
Kitchen              TV remote control     1     1         4     1        -     -        2     -
Kitchen              Dishes                -     -         2     -        1     -        1     -
Elin A. Topp office  Flowerpot             1     2         -     2        -     -        -     -
Sum:                                       2     3         7     3        1     0        6     1
Average per subject:                      ≈0    ≈0        ≈1    ≈0       ≈0     0       ≈0    ≈0

P: The subjects defined the objects, or defined them and gave instructions, during presentation of objects.
M: The subjects only gave instructions during presentation of objects.

Table 4.7: How the subjects pointed at and touched objects during presentation of objects to a person in different regions.


4.1.2.2 Gaze

All the subjects gazed at the locations or the objects when they presented them to the robot or a person. Gazing indicates that subjects were interested in the location or the object they were presenting.

4.2 Presentation time

The time needed to present the environment to the robot was on average 6:15 minutes, not including the time the robot needed to go from one room to another. The time needed to present it to a person was on average 2:48 minutes.

The difference is statistically significant (t = 6.0562, DF = 26, p-value < 0.0001) when comparing the robot to a person.

The table below shows that the subjects took a longer time to show the robot around compared to a person. Note that the time taken by the robot to navigate between the regions was omitted, whereas the times for showing a person around contain no omissions.

Trial        Time to show the robot     Time to show a person
             around (minutes)           around (minutes)
Trial 1             4:05                       3:35
Trial 2             3:05                       2:57
Trial 3             8:00                       2:00
Trial 4             2:20                       2:07
Trial 5             5:38                       3:00
Trial 6             5:20                       2:00
Trial 7             7:10                       1:01
Trial 8             6:50                       3:00
Trial 9             5:20                       3:09
Trial 10            9:50                       3:58
Trial 11            5:55                       2:24
Trial 12            8:00                       2:42
Trial 13            7:10                       3:01
Trial 14            7:23                       2:40
Sum:               86:06                      37:34
Average per
subject:            6:15                       2:48

Table 4.8: How long the subjects took to show the robot or a person around.
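Converting the times in Table 4.8 to minutes and running an independent two-sample t-test (the pooled-variance form, which matches the reported DF = 26) gives a statistic of about 6, in the range of the reported t = 6.0562; small differences can come from how the mm:ss values were rounded. A sketch using scipy:

```python
from scipy.stats import ttest_ind

def minutes(mmss):
    """Convert a 'm:ss' string from Table 4.8 to minutes as a float."""
    m, s = mmss.split(":")
    return int(m) + int(s) / 60

robot = list(map(minutes, ["4:05", "3:05", "8:00", "2:20", "5:38", "5:20",
                           "7:10", "6:50", "5:20", "9:50", "5:55", "8:00",
                           "7:10", "7:23"]))
person = list(map(minutes, ["3:35", "2:57", "2:00", "2:07", "3:00", "2:00",
                            "1:01", "3:00", "3:09", "3:58", "2:24", "2:42",
                            "3:01", "2:40"]))

# Pooled two-sample t-test; equal_var=True gives DF = 14 + 14 - 2 = 26.
t, p = ttest_ind(robot, person, equal_var=True)
print(f"t = {t:.3f}, p = {p:.6f}")
```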


Chapter 5

Observations

5.1 General observations

In addition to the results, a few observations could be made about the trials. First, the robot said to every subject "Hello, my name is Minnie. Please show me around"; nine of the subjects did not answer. Three of the subjects (one female and two males) answered by saying "hello Minnie" and another two (females) just said "hello" (or the Swedish "hej") in a low voice. The greetings between the subjects and participants could not be taken into account because some of the subjects and participants knew each other beforehand and came to the experiment site together.

As the subjects were of different ethnic backgrounds and none of them was a native English speaker, there were noticeable difficulties with the language during the presentations to the robot or a person. The robot gave the verbal response "stored information about location or object (computer desk, office chair, etc.)". Six of the subjects (five female, one male) had problems understanding this response. The subjects used the phrase "this is <location> (computer desk, office chair, etc.)" while presenting a location to the robot or a person. Some made grammatical mistakes by saying, for example, "this is tables" or "this is chairs" instead of "these are tables" or "these are chairs".

The subjects tried to follow the instruction sheet regarding the names of regions, locations or objects ("office chair", "little table", etc., see appendix A), but all of them in one way or another used their own descriptive names, like "white table" or "blue chair", to present locations to the robot or a person. This was also true for the regions and the objects.

During the interviews, eight subjects used "he" to refer to the robot instead of "it". This was not taken into consideration since in their native languages, such as French and Arabic, there is no separate pronoun for inanimate objects.

The subjects did not wait for confirmation from the participant after presenting regions, locations or objects; however, they waited for confirmation from the robot to varying degrees. Three of the subjects said, "it is good", "good" or

