
Linköping University | Department of Computer and Information Science
Bachelor Thesis, 18 hp | Cognitive Science
Spring term 2020 | LIU-IDA/KOGVET-G--20/011--SE

Exploring interaction with unmanned systems

A case study regarding interaction with an autonomous unmanned vehicle

Jesper Pettersson

Supervisors: Björn Johansson (LiU), Ove Jansson (FOI)
Examiner: Peter Berggren


Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Unmanned systems (US:s) of various kinds are becoming increasingly sought-after resources, as they do not have an onboard driver and can thus be employed in situations that are either dangerous or impossible for humans to engage in. Consequently, US:s are envisioned to play a large role in both civilian and military contexts in the future, which presents new challenges regarding how humans should interact with these new systems. The purpose of this thesis was to explore different ways of interaction between human and artificial agents and how the interfaces of autonomous systems support this interaction. To investigate this, a literature overview of previous research regarding various ways of interacting with unmanned systems was conducted. This illustrated that a multimodal interface offers a more robust and natural form of interaction compared to fixed interaction principles, which have their advantages and disadvantages depending on both context and situation. Moreover, a case study was conducted to explore human-autonomy interaction in a realistic battle mission, simulated in Virtual Battle Simulator 3. The results from the study indicate that speech is an essential mode of communication for controlling an unmanned autonomous ground vehicle in a mounted setting. Furthermore, problems were identified with the visual and auditory feedback from the unmanned vehicle, for which verbal feedback was identified as a possible solution. Experience with both the simulation environment and the role of commander of mechanised units was also identified as an important factor to account for in future studies.


Acknowledgement

I would like to thank my supervisors Björn Johansson and Ove Jansson for providing valuable tips, discussions and general support throughout the project; the people of the MTO department at FOI; and lastly, my wonderful partner Cornelia Böhm, who provided me with advice and encouragement when I needed it the most.

Linköping in June 2020


Table of contents

Definitions and abbreviations ... viii

1. Introduction ... 9

1.1. Aim and research questions ... 9

1.2. Delimitations ... 10

2. Step one – Literature overview... 11

2.1. Unmanned Aerial Vehicle ... 11

2.2. Unmanned Ground Vehicle ... 11

2.3. Design ... 12

2.4. Design principles ... 12

2.5. Multimodal interaction and intelligent adaptive interfaces ... 16

2.6. The impact of autonomy on interaction principles ... 18

2.7. Levels of Automation ... 19

2.8. Section summary ... 20

3. Step two – Case study ... 21

3.1. Scenario ... 23

3.2. Participants ... 24

3.3. Apparatus ... 24

3.4. Procedure ... 25

3.5. Analysis ... 27

3.6. Research ethics ... 29

4. Results of the Case Study ... 31

4.1. Scenario run 1 ... 31

4.2. Scenario run 2 ... 35

4.3. Interview 1 ... 41

4.4. Scenario 3 ... 43

4.5. Interview 2 ... 48

5. Discussion ... 51

5.1. RQ1: Current interaction principles that can be applied in a military context ... 51

5.2. RQ2 & 3: Problems and challenges when interacting with unmanned vehicles, and possible solutions ... 52

5.3. Method discussion ... 55

5.4. Future studies ... 56

6. Conclusions ... 57

References ... 59

Appendix 1 – Quick reference guide for TGCU ... 65


Definitions and abbreviations

BMS Battle Management System

BFT Blue Force Tracking

CL Cognitive Load

HAT Human Autonomy Teaming

HHO Heads-up, Hands-free Operation

IFV Infantry Fighting Vehicle

Interface A device or program enabling a user to communicate with a computer

LOA Levels of Automation

MUM-T Manned-Unmanned Teaming

OCU Operator Control Unit

RPAS Remotely Piloted Aircraft System

SA Situation Awareness

TGCU Teleoperated Ground Combat Unit

UAV Unmanned Aerial Vehicle

UGV Unmanned Ground Vehicle

US Unmanned System

VBS3 Virtual Battle Simulator 3


1. Introduction

The presence of unmanned vehicles has drastically increased in both the military and civilian domains. These unmanned vehicles (UV:s) do not have an onboard human driver; as a result, they are sought after in scenarios where human life might otherwise be at risk, as a cheap option for data collection, or where a human operator cannot be physically present. UV:s exist in many different shapes, sizes and categories, such as Unmanned Aerial Vehicles (UAV:s), Unmanned Ground Vehicles (UGV:s), Unmanned Sea Vehicles (USV:s) and Unmanned Spacecraft (USC:s), to name a few. Within the Swedish Armed Forces (Swedish: Försvarsmakten, [FM]), UGV:s have been implemented to sweep Anti-Personnel Landmines and Anti-Tank Mines, while UAV:s have mainly been implemented for short-term reconnaissance during field missions and long-term information gathering used in planning and allocation of resources (Försvarsmakten, 2013; Försvarsmakten, 2015). As technology advances, more and more of these unmanned vehicles develop autonomous functions that encompass everything from stabilising and collision avoidance functions for smooth flights, to tracking and reporting of targets.

As autonomous functions develop, so does the need for developing ways to interact with autonomous systems, ranging from simply controlling the system to cooperating with the system in a more team-like manner. This field is often referred to as Manned-Unmanned Teaming (MUM-T) or Human-Autonomy Teaming (HAT) (Battiste et al., 2018; Rudnick & Schulte, 2017; Taylor et al., 2017). Within this field, the autonomous unit(s) and the humans are regarded as a team (much like a human-human team), and yet many problems remain regarding how humans and autonomous systems should interact with each other. Problems within the areas of communication, trust and execution of commands need to be addressed in order to achieve effective collaboration between humans and autonomous systems.

The following work was initiated on behalf of the Swedish Defence Research Agency and was carried out within the Gemensamma Teknikbehov inom Obemannade och Autonoma System (GT-OAS) and Ledning av Autonoma och Sammansatta System med Intelligenta Enheter (LASSIE) projects. The GT-OAS project aims to find solutions regarding how unmanned and autonomous systems can best be implemented in order to facilitate the interaction between the user and the system. The LASSIE project aims to investigate how command and control, and supporting functions, will be affected by the introduction of autonomous intelligent units that are capable of interacting with both other autonomous intelligent units and humans.

1.1. Aim and research questions

This thesis aims to explore different ways of interaction between human and artificial agents and how the interfaces of autonomous systems support this interaction. The following research questions will be used:

1. What are the current interaction principles that can be applied in a military context?


2. What problems and challenges have been identified when interacting with unmanned vehicles, and what are possible solutions?

3. Based on the literature overview that underlies questions 1 and 2, which interaction problems can be identified through a case study and how can they be improved? This question will be elaborated on after the results from the literature overview have been presented.

The research questions will be answered using a two-step process, consisting of an initial literature overview to answer research questions 1 and 2. This overview will be based on a previously created library of research articles and military reports.

Research question 3 will be examined through a case study with a relevant military scenario containing an autonomous unit corresponding to the Teleoperated Ground Combat Unit (TGCU) described in Johansson et al. (2019).

1.2. Delimitations

This thesis will only provide recommendations regarding potential improvements that can be made to identified problems, and thus, no design proposals will be made. The case study will be conducted within the boundaries of the LASSIE project, meaning that the scenarios have been pre-made and will not be adjusted. Further, the case study will utilise the pre-existing voice commands used to control the TGCU.


2. Step one – Literature overview

The following section provides a short background on UAV:s and UGV:s. Moreover, it presents the results of a literature overview which was conducted on a preassembled database of articles within the GT-OAS project, which the Swedish Defence Research Agency provided to the researcher. The database consisted of 173 articles regarding unmanned vehicles divided into ten categories: Interaction via Gestures, Multi-Modal Interaction, Haptic Interaction, A Mixture of Modalities, Aspects of Autonomy, Graphical User Interfaces and Interface Design, Approaches to, and Principles of, Interaction Design, 3D- and VR-Techniques, Human Factors, and Operator Viewpoints. The literature overview focuses on human-autonomy interaction principles that have been studied in different contexts in order to answer the first and second research questions. The impact of autonomy on interaction principles and one approach to defining levels of automation are also discussed.

2.1. Unmanned Aerial Vehicle

Unmanned aerial vehicle (UAV) refers to an aircraft without any human passengers or pilots onboard. Thus, a UAV is unmanned insofar as no human can actively direct and pilot the aircraft from onboard; rather, it is either a remotely piloted aircraft system (RPAS) or capable of flying autonomously (Valavanis & Vachtsevanos, 2015). As a result of being pilotless, UAV:s have been deemed better suited for dull, dirty and dangerous missions than manned aircraft (Nonami et al., 2010). Hence, UAV:s have been applied in various contexts such as disaster relief, search and rescue, as well as reconnaissance, surveillance and intelligence (Nonami et al., 2010). These contexts may have different needs of a UAV, and as a result UAV systems come in various configurations, with different capabilities, endurance levels and sizes. However, UAV:s can generally be placed in one of four categories: fixed-wing, rotary-wing, blimps and flapping-wing (Nonami et al., 2010). Fixed-wing UAV:s are generally energy efficient, offering long range, endurance and high speeds (Nonami et al., 2010), making them suitable for reconnaissance over long periods of time (MSB, 2018). Rotary-wing UAV:s have the advantage of being able to take off and land vertically, in addition to being highly manoeuvrable with the capacity to hover at a given position. These characteristics make rotorcraft suitable for urban areas where there may be a need for precise manoeuvres (MSB, 2018; Nonami et al., 2010). Blimps refers to airships and balloons that are lighter than air. These systems are generally slow and large but offer long endurance (Nonami et al., 2010). Flapping-wing UAV:s imitate birds and insects, insofar as their wings are flexible and/or able to morph (Nonami et al., 2010). Common to these categories of UAV:s is the use of sensors, for instance radar sensors that are used to identify the altitude, distance and direction of different objects (MSB, 2018). In addition to radar sensors, camera sensors are often attached to UAV:s. Cameras provide the means of generating a visual image of an area and are commonly used in both civilian and military contexts (MSB, 2018; Nonami et al., 2010). Further, CBRN sensors have also been attached to UAV:s to allow for safer testing and exploration of potential chemical, biological, radiological or nuclear threats (MSB, 2018).

2.2. Unmanned Ground Vehicle

Unmanned ground vehicle (UGV) refers to a vehicle that is capable of operating while in contact with the ground and without the presence of an onboard human driver (Xin & Bin, 2013). These vehicles often operate in unknown environments and face many navigational challenges due to the various ground conditions that exist in their operational environment, such as obstacles and different terrain types (Odedra et al., 2009). As with UAV:s, UGV:s come in various configurations that are often defined by their given task and the environment in which they are to operate (Odedra, 2011). However, a UGV generally consists of sensors, platform, control, human machine interface, communication and system integration (Nguyen-Huu & Titus, 2009). Sensors allow the vehicle to perceive its surroundings, thus enabling controlled movements, which is crucial in highly unpredictable environments. The platform provides the utility infrastructure, power, and locomotion for the unmanned system. The UGV's control systems determine the level of autonomy that the vehicle possesses, ranging from classic algorithmic control to adaptive control and multiple robot collaboration. The human machine interface may vary depending on how the UGV is controlled, e.g. by joystick and a monitor, or by natural modes of interaction such as speech commands. The communication link between the human operator and the vehicle varies and may be in the form of fiber optics or a radio link, and in the case of military UGV:s, communication aspects such as secrecy and accuracy of information exchange are of great importance. System integration refers to the synergy within the unmanned system, which is dependent on the choice of sensors, components, configuration and system level architecture (Nguyen-Huu & Titus, 2009).
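
As a minimal illustration of this component breakdown, the sketch below writes the general UGV make-up described by Nguyen-Huu and Titus (2009) out as a simple data structure; it is purely illustrative, and the example field values are hypothetical rather than taken from any cited system.

```python
# Illustrative sketch only: the UGV components named by Nguyen-Huu and Titus (2009)
# expressed as a plain data structure. Example values are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class UGVConfiguration:
    sensors: List[str]              # perception of the surroundings
    platform: str                   # utility infrastructure, power and locomotion
    control: str                    # control approach; determines the level of autonomy
    human_machine_interface: str    # e.g. joystick and monitor, or speech commands
    communication_link: str         # e.g. fiber optics or radio link
    # "System integration" is the synergy of the choices above rather than a separate part.


if __name__ == "__main__":
    example = UGVConfiguration(
        sensors=["camera", "lidar"],
        platform="tracked chassis with diesel-electric drive",
        control="waypoint following with obstacle avoidance",
        human_machine_interface="speech commands plus tablet OCU",
        communication_link="radio link",
    )
    print(example)
```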

UGV:s have been used in various applications and environments, often where humans cannot operate, for instance exploration in space (Odedra, 2011). Furthermore, UGV:s often replace humans in dangerous situations, where they perform defence, reconnaissance, transportation and security tasks in order to save lives (Odedra et al., 2009; Odedra, 2011). Within the military domain, UGV:s have been used to enhance soldiers' capabilities on the battlefield (Odedra, 2011), for instance to find and disarm mines and improvised explosive devices (IED:s) (Barnes et al., 2019; Odedra, 2011).

2.3. Design

Design, according to Simon (1988), is concerned with how things should be, where people devise artefacts in order to attain goals. This has been considered the central activity of engineering (Simon, 1988). Design can be likened to problem solving, insofar as it is a natural human activity (Razzouk & Shute, 2012) aimed at changing existing situations into preferred ones (Simon, 1988). Thus, design is about finding alternatives by creating multiple possible solutions (Saffer, 2009) based on scientific learning about human cognition, ergonomics and senses (Goodwin, 2009). Design as a craft, according to Goodwin (2009), focuses on understanding the context in order to serve human needs and goals, to ultimately solve the given problem(s) within various constraints – e.g. cost, time and regulations (Goodwin, 2009).

2.4. Design principles

Design principles can briefly be described as guidelines for how humans interact with various objects through their interfaces. For example, a design principle can recommend controlling objects with gestures in clear daylight, while warning against such use after sunset or in fog.

Humans employ various interaction modalities in their daily lives, such as gestures and speech, and supporting similarly natural interaction has become a goal in human-machine interaction. The following section will provide a literature review of various design principles and their applications in trying to achieve a more natural human-machine interaction in civilian and military contexts.

2.4.1. Gestures

Gestures are movements of the hands and arms which happen during interaction (communication) between people (McNeil, 1992). The phenomenon of using gestures while communicating has been observed for at least 2000 years (Goldin-Meadow, 1999) and is reportedly found in various cultures (Bernardis & Gentilucci, 2006; Goldin-Meadow, 1999; Streeck, 1993). Gestures can be found when speech-ability is unavailable (Goldin-Meadow, 1999) and when speech cannot convey the message or thoughts adequately (Bernardis & Gentilucci, 2006; Goldin-Meadow, 1999), and they have the ability to reduce cognitive effort or serve as a thinking tool (Goldin-Meadow, 1999).

Traditionally, missions involving UAV:s have often been generated and altered under extreme conditions and in environments where traditional systems based on mouse and keyboard interfaces can be seen as inefficient (Dernoncourt, 2014) and potentially disturbed by vibrations (Chandarana et al., 2017). One approach to solving this problem has been to implement more natural and intuitive interfaces that allow for more human-to-human like interaction. Chandarana et al. (2017) examined the possibility of using gestures for UAV flight path generation. They implemented a gesture-based interface, using a Leap Motion controller to capture a user's gesture input via three infrared cameras, and then compared this to a mouse and keyboard-based interface as a baseline. Their results showed that although the mouse and keyboard interface performed better in terms of correctly defined flight paths, the gesture-based interface resulted in a fairly high percentage of correctly defined flight paths (74.36%). Chandarana et al. (2017) argued that these results indicated that only a small amount of training and guidance was needed in order to understand their gesture-based interface. However, subjects with prior UAV experience performed worse than subjects with no previous experience. In addition, the majority of the test subjects rated their own workload as higher while using the gesture-based interface (Chandarana et al., 2017).

As described in the previous paragraph, there is a need for more natural and intuitive interfaces in the domain of unmanned ground vehicles as well. Today, UGV:s are often teleoperated (i.e. the operator controls the UGV with an operator control unit [OCU] such as a tablet, laptop or game controller). This mode of interaction often results in a singular focus on the UGV, which negatively affects the operator's situation awareness (SA) and increases cognitive load (CL), while also rendering the operator vulnerable in battlefield situations (Marge et al., 2012). Marge et al. (2012) proposed Heads-up, Hands-free Operation (HHO) as a solution to this problem. Marge et al. examined whether a combination of autonomous person following (where a UGV is able to trail humans autonomously) and gesture control of the UGV could enable the operator to conduct a given mission faster, reduce the experienced CL and recall their surroundings better by keeping their head up and their hands free. In their study, the operator could toggle the UGV's following mode by forming a "T" shape with their arms; these gestures were identified by the experimenter, who remotely toggled between the two modes. The results from Marge et al.'s study indicated an overall better performance using HHO compared to traditional teleoperation, and results from their NASA-TLX survey showed that 73% of the participants preferred the HHO UGV. Marge et al. (2012) argued that HHO offered a less demanding alternative to traditional teleoperation, which could be used advantageously in a military setting since it would eliminate the need of having another soldier defending the operator.
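
To make the HHO concept concrete, the sketch below shows, in schematic Python, how a recognised "T" gesture could toggle a UGV's person-following mode. The gesture label and the UGV class are hypothetical stand-ins, not part of Marge et al.'s (2012) actual implementation, in which the experimenter recognised the gesture.

```python
# Illustrative sketch only: toggling a (hypothetical) UGV's follow mode when a
# "T"-shaped arm gesture is recognised, in the spirit of Marge et al.'s (2012) HHO.
from dataclasses import dataclass


@dataclass
class FollowMeUGV:
    following: bool = False

    def toggle_follow(self) -> None:
        self.following = not self.following
        print("Person-following mode:", "ON" if self.following else "OFF")


def handle_gesture(ugv: FollowMeUGV, gesture_label: str) -> None:
    # In the cited study a human experimenter recognised the gesture (Wizard of Oz);
    # here a classifier output label stands in for that step.
    if gesture_label == "T_SHAPE":
        ugv.toggle_follow()


if __name__ == "__main__":
    ugv = FollowMeUGV()
    for label in ["WAVE", "T_SHAPE", "T_SHAPE"]:
        handle_gesture(ugv, label)
```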

Within a military context, gestures have always been used as a form of communication. The usage of hand and arm signals has the advantage of rapidly and covertly conveying information for coordinating actions in a dynamic context (Barnes et al., 2019). Different implementations aimed at replicating Soldier-Soldier gesture communication have been examined; these include the usage of instrumented gloves, handheld devices as well as camera-based systems. Elliott et al. (2016) concluded in a review of gesture-based systems that instrumented gloves would be best suited for the dismounted Soldier. Barnes et al. (2019) highlighted several advantages of using gestures as a mode of communication with unmanned systems. These include naturalistic means for direction cues (i.e. pointing), being covert (i.e. silent), being effective under high noise levels, requiring no line of sight (with instrumented gloves), being consistent with naturalistic movements, and the fact that gestures are already used for Soldier-Soldier communication. Barnes et al. (2019) also noted that there are some disadvantages with gesture communication: options for gesture recognition differ in reliability, and the modality struggles with a limited vocabulary. Gesture communication between humans and unmanned systems is still primarily dependent on physical co-presence, where visibility becomes key to conveying information. This visibility can be affected by various factors such as fog, smoke, mist or night operations, especially if the gesture recognition system is camera-based (Elliott et al., 2016).

2.4.2. Haptic

Haptic feedback in the form of vibrotactile cues (e.g. mobile phone and watch vibrations) has long been used to direct and manage attention (Mortimer et al., 2020). Today, the ever-increasing need to present complex data results in an elevated risk of auditory and visual overload, and the human sense of touch provides a unique communication channel (Jones & Sarter, 2008) that can be utilized by tactile applications in order to supplement visual cues, resulting in reduced workload and increased performance (Elliott et al., 2009).

Today, teleoperation of unmanned systems relies heavily on visual feedback due to the distance between the operator and the vehicle, and this physical detachment complicates the process of maintaining an effective awareness of the unmanned system (Luz et al., 2018; Micconi et al., 2016). Luz et al. (2018) examined the possibility of multimodal feedback by utilizing other human senses in addition to vision, in order to reduce the burden on the visual channel that might otherwise be overloaded, resulting in loss of relevant information. Their study implemented three different types of devices designed to provide the user with haptic feedback: a traction cylinder, a vibrotactile glove and an E-Vita (tactile tablet). These devices provided tactile stimuli in the form of friction, vibration, and texture to convey the estimated traction state of a UGV. Luz et al. (2018) found that traction and vibration, compared to the visual-only modality, improved the operators' situation awareness regarding their understanding of the UGV's traction state. The authors note that these results show the viability of adding haptic feedback to the interface as an alternative sensor modality that has the potential to decrease the risk of visual overload.
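
As a rough sketch of how such haptic feedback could be driven, the example below maps an estimated traction state to a vibration amplitude; the slip measure, mapping and values are hypothetical and are not taken from Luz et al.'s (2018) devices.

```python
# Illustrative sketch only: mapping an estimated UGV traction state (slip) to a
# vibrotactile amplitude, loosely inspired by Luz et al. (2018). Values are hypothetical.
def traction_to_vibration(slip_ratio: float) -> float:
    """Map slip (0 = full grip, 1 = no grip) to a vibration amplitude in 0..1."""
    slip_ratio = min(max(slip_ratio, 0.0), 1.0)   # clamp to the valid range
    return slip_ratio ** 0.5                      # non-linear: small slips already noticeable


if __name__ == "__main__":
    for slip in (0.0, 0.1, 0.5, 0.9):
        print(f"slip={slip:.1f} -> vibration amplitude {traction_to_vibration(slip):.2f}")
```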

The utilization of haptic feedback has also been explored within the realm of UAV:s. Micconi, Aleotti and Caselli (2016) examined the possibility of enhancing a Remotely-Piloted Aerial System (RPAS) by implementing a haptic-based teleoperation system in order to provide additional feedback within environmental monitoring. Micconi et al. (2016) implemented an RPAS carrying a nuclear radiation sensor, where the simulated sensor data was used to trigger force feedback to haptically convey detected radiation. The results indicated a slight decrease in exploration time and average detection error; however, this difference was not statistically significant. Micconi et al. (2016) noted that operators were able to focus solely on piloting the UAV when force feedback was enabled, suggesting that force feedback could supplement the visual modality by reducing the operator's workload.

Tactile options, such as tactile displays, have been examined as potential tools for spatial and directional orientation information, attention management and short communication within a military context (Barnes et al., 2019). In addition to being easy to learn and recall, tactile cues have been shown to be as comprehensible as hand and arm signals, with the added advantage of working in poor visibility conditions (e.g., fog or smoke) and when people are out of line of sight (Barnes et al., 2019) – qualities that make tactile cues suitable for covert or night operations. However, tactile cues are limited to a small vocabulary (Barnes et al., 2019). Barnes et al. (2019) give an account of research indicating that a combination of tactile cues (from a tactile belt system) and a chest-mounted display has resulted in increased navigation accuracy, lower experienced cognitive workload as well as reduced mission times. Furthermore, tactile messages in the form of status updates or alerts have proven to be a viable option for bidirectional communication between the operator and the unmanned system (Barnes et al., 2019).

2.4.3. Speech

Speech is a natural communication channel for humans, used both face-to-face and remotely via phones and mobile devices. In recent years, speech-based interfaces have increased in popularity thanks to smartphones. According to Kumar et al. (2014), this increase in popularity can be traced to big tech companies such as Apple, Google and Samsung launching their various speech-driven services (e.g. Siri, Google Now and S Voice). Speech-based interaction offers several advantages, among them a faster communication channel compared to the traditional mouse and keyboard interface. Eliminating the need for an onscreen keyboard can also increase the available screen real estate (Kumar et al., 2014). Furthermore, speech allows the user to interact with the system in a hands- and eyes-free way (Kumar et al., 2014); when used with UAV:s it can enhance the operator's experience by enabling full visual attention to be directed at the unmanned system (Fernandez et al., 2016). Speech is an indispensable communication channel within military operations, where radio-based communications are both sent and received (Barnes et al., 2019). Speech provides several advantages as a communication channel, such as being effective during poor visibility conditions, in addition to increasing comprehension and lowering workload in stationary settings (Barnes et al., 2019). However, speech is not as effective in noisy contexts, nor is it covert in situations that require silence (Barnes et al., 2019). Barnes et al. (2019) note that speech is more effective in combination with another communication channel, for instance visual displays, where speech can be used for status updates, attention management (e.g. alarms) and conveying critical information (Wickens, 2008). Speech has also been shown to reduce workload when controlling robotic assets and is believed to be an essential component in achieving seamless interaction with autonomous systems (Barnes et al., 2019).

2.4.4. Visual displays

In today's digital society, visual-spatial displays have become ubiquitous in human communication. These visual displays are external representations that can take various forms, such as a computer or smartphone screen, or be printed on a piece of paper (Hegarty, 2011). Visual displays have been implemented in order to enhance cognition in various ways, for instance by providing means to externally organize and store information spatially, as well as to offload cognitive processes onto perceptual processes (Hegarty, 2011). Thus, interaction with visual displays has the potential to enhance the user's thinking. However, several factors affect the user's ability to comprehend the displayed information, such as attentional, perceptual and encoding processes (Hegarty, 2011). For instance, the user can fail to encode essential information as a result of being distracted by task-irrelevant information that is highly salient (Hegarty, 2011).

Visual displays are omnipresent in today's military operations and are used for displaying complex information such as graphics, map-based information or camera-based video feeds (Barnes et al., 2019). However, visual-only displays have several disadvantages, such as line of sight being a prerequisite for using them; furthermore, they increase the risk of attention tunnelling and information overload (Barnes et al., 2019). Barnes et al. (2019) also note that hand-held visual-only displays can obstruct tactical situations in various ways due to their "head-down" nature. This entails interfering with weapon use and potentially causing the operator to lose awareness of the immediate surroundings. Additionally, the usage of visual displays is not suitable for night operations, as it might expose the position of the user (Barnes et al., 2019).

2.5. Multimodal interaction and intelligent adaptive interfaces

Although there are many studies regarding ways of interacting with technology, it should be noted that human communication and interaction is naturally multimodal (Bunt et al., 1998), combining various verbal and nonverbal channels to communicate with others (Quek et al., 2002). Humans employ gestures and other body language, such as gaze and facial expressions, to enhance communication and interaction, thus allowing more complex information to be conveyed than through a single modality (Bischoff & Graefe, 2002; Quek et al., 2002). Traditionally, human-computer interaction has focused on single modes of communication, such as visual information on a screen with keyboard input (Turk, 2014). As technology has evolved, so has the amount of available information, which in turn increases the risk of information overload (Brewster, 1997). Having an abundance of available information can lead operators to focus on processing information instead of performing tasks, resulting in generally poorer performance and decision making (Wickens, 2008). Distributing the information across multiple channels can help to unburden the visual channel (Wickens, 2002). Another approach is to adapt the modality of the information depending on the context using Intelligent Adaptive Interfaces (Hou et al., 2018). These interfaces can react to an operator and/or task state and adapt the displayed information and/or control accordingly (Hou et al., 2018), and can thus provide the same flexibility as a human.
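
A minimal, rule-based sketch of this idea is shown below: an output modality is chosen from a simple description of the operator's context. The rules and thresholds are hypothetical illustrations (the 75 dB figure echoes the noise level discussed later in this section), not the mechanism proposed by Hou et al. (2018).

```python
# Illustrative sketch only: a rule-based stand-in for an intelligent adaptive
# interface that picks a feedback modality from a simple context description.
# Rules and thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class OperatorContext:
    noise_db: float      # ambient noise level
    visibility: str      # "good" or "poor"
    hands_busy: bool     # operator's hands occupied?
    covert: bool         # does the situation require silence?


def select_feedback_modality(ctx: OperatorContext) -> str:
    if ctx.covert:
        return "tactile"                          # silent, works without line of sight
    if ctx.noise_db > 75:                         # speech becomes unreliable in high noise
        return "visual" if ctx.visibility == "good" else "tactile"
    if ctx.hands_busy or ctx.visibility == "poor":
        return "speech"
    return "visual"


if __name__ == "__main__":
    ctx = OperatorContext(noise_db=80, visibility="poor", hands_busy=True, covert=False)
    print(select_feedback_modality(ctx))          # -> "tactile"
```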

Natural interaction using a multimodal interface is one step towards achieving human-to-human like interaction between humans and unmanned systems. Taylor (2017) argues that multimodal interfaces which enable natural interaction are key to integrating unmanned systems, transforming their role from tools to actual team members.

Multimodal interfaces have several potential benefits, among them a greater learning rate for new systems if the user is able to interact in already familiar ways. Furthermore, multiple modes of communication enable one to choose how to communicate in a given situation without loss of information, where two modes can be used simultaneously in order to ascertain that the intended message is received correctly (Taylor, 2017). Multimodal communication therefore offers a more robust interaction between humans and unmanned systems, compared to a single mode of interaction (Barber et al., 2016). Contextual factors such as a given task, as well as the environment in which the task is performed, can dictate which modes of interaction are of use (Taylor, 2017), and these factors can be utilized by an intelligent adaptive interface to customize the mode of interaction (Hou et al., 2018). For instance, spoken communication may not be possible or desirable in situations where the task requires absolute silence, or in which the noise levels render spoken communication useless. Gestures cannot, in most situations (Barnes et al., 2019), be used where participants are unable to see each other, or if one's hands are occupied (Taylor, 2017). Thus, in order to introduce natural modes of interaction, one needs to understand how the user can and wants to interact with the systems in a given context (Mosier et al., 2017; Taylor, 2017). Along these lines, Abioye et al. (2019) investigated the feasibility and practical suitability of the multimodal combination of visual gestures and speech in human-aerobotic interaction and how this was affected by varying noise and lighting levels. Abioye et al. (2019) found that speech recognition accuracy/success rate fell as noise levels rose, where noise levels beyond 75 dB caused the recognition to be very unreliable. However, certain speech command words, such as "land", were more noise resistant than others at higher noise levels. Abioye et al. (2019) concluded that a careful selection of words for the speech command phrases could potentially increase the accuracy and validity of speech recognition at higher noise levels. In addition, the study found that environment background and lighting conditions had next to no effect (less than .5%) on the quality of gesture recognition.

Another important aspect of achieving natural interaction between humans and unmanned systems is the processing time for issued commands, where delays in communication can have a high impact on the system's usability (Barber et al., 2016). Barber, Howard and Walter (2016) examined the possibility of real-time interaction with autonomous systems using a multimodal interface. In their study, the authors developed and tested a prototype consisting of Automated Speech Recognition (ASR), a custom gesture recognition glove, and natural language understanding available on a tablet. The gesture glove was used to convey the nine gestures that their gesture recognition system supported (left, right, forward, back, clockwise, counterclockwise, resume, pause and pointing). The operator could issue commands to a UGV using gestures, speech, or a combination of the two. The UGV would deliver responses in the form of text-to-speech, auditory cues or via the visual display. The multimodal interface consisted of a live video feed, the current command, UGV status information as well as a top-down interactive map. Barber, Howard and Walter (2016) argued that this setup would allow for flexible bi-directional communication that would free the operator from having to constantly look at the display, while also providing the option of having more detailed information available on the screen if the UGV is no longer within the operator's line of sight. The results from the study indicate that the multimodal interface prototype was able to operate in real time, with delays ranging from 201.62 ms to 1682.01 ms (Barber et al., 2016).
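
The sketch below illustrates the general shape of such multimodal command handling: a speech command and a gesture received within a short time window are fused into one UGV command, and the processing delay is measured. The command vocabulary, fusion rule and time window are hypothetical simplifications, not Barber et al.'s (2016) actual pipeline.

```python
# Illustrative sketch only: fusing speech and gesture events into one command and
# timing the processing step. Vocabulary, fusion rule and window are hypothetical.
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputEvent:
    modality: str   # "speech" or "gesture"
    payload: str    # e.g. "move" (speech) or "POINT_LEFT" (gesture)
    t: float        # timestamp in seconds


def fuse(speech: Optional[InputEvent], gesture: Optional[InputEvent],
         window_s: float = 1.0) -> Optional[str]:
    """Combine speech and gesture if they arrive within the same time window."""
    if speech and gesture and abs(speech.t - gesture.t) <= window_s:
        direction = gesture.payload.replace("POINT_", "").lower()
        return f"{speech.payload} {direction}"    # e.g. "move left"
    if speech:
        return speech.payload
    if gesture:
        return gesture.payload.lower()
    return None


if __name__ == "__main__":
    start = time.monotonic()
    command = fuse(InputEvent("speech", "move", 10.2),
                   InputEvent("gesture", "POINT_LEFT", 10.6))
    delay_ms = (time.monotonic() - start) * 1000
    print(f"fused command: {command!r}, processing delay: {delay_ms:.2f} ms")
```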

Combining interaction modalities such as speech and gestures has shown great potential within military contexts (Barnes et al., 2019). Traditional modalities such as speech provide several advantages, including lowered workload when communicating with robotic assets, effectiveness during poor visibility conditions, and the provision of 3-D direction cues (Barnes et al., 2019). However, speech is not covert when the situation calls for silence, nor is it effective during high noise levels. Furthermore, purely speech-based controls face problems such as communicating explicit directions and spatial relationships in a 3-D environment. In a multimodal interface, gestures could potentially be implemented to clarify localisation information provided by speech, and tactile feedback can be used to assert direction (Barnes et al., 2019). Barnes et al. (2019) highlight that the applicability of different interaction modalities will differ depending on context. However, the authors note that certain principles can be carried across contexts, such as torso-based tactile cues. These tactile cues have been shown to be best at indicating direction but are not as effective at communicating distance. Thus, the authors concluded that torso-based tactile cues can be combined with speech cues in order to convey both direction and distance.

2.6. The impact of autonomy on interaction principles

Automation has been introduced to various systems in order to increase precision and efficiency and to reduce the operator's training requirements and workload (Sarter et al., 1997). While automation has led to more efficient systems, the human operator's role has often shifted from an active role to a more passive one of monitoring, handling exceptions and managing automated resources. The classic view of automation envisioned a reduction in human "errors" and in workload for the human operator as a result of this new role (Sarter et al., 1997). However, it has been shown that workload is not necessarily reduced by automation; it only changes or shifts, presenting the operator with new challenges (Hollnagel & Woods, 2005) that demand a greater knowledge of the system to solve (Sarter et al., 1997). Furthermore, automation has not eliminated human "errors"; their nature might have changed, while new kinds of errors such as "mode errors" have been introduced (Sarter et al., 1997). According to Sarter, Woods and Billings (1997), mode errors occur when an operator performs something that is appropriate for one mode while the system currently is in a different mode, indicating that the operator lacks an accurate mental model of the automated system (Sarter et al., 1997). Sarter et al. argue that mode awareness – i.e. awareness of the automation's behaviour and status – is essential to maintain a representative SA of the system (Sarter et al., 1997). The loss of mode awareness may lead to what Sarter et al. (1997) call "automation surprises", in which the operator is surprised by the behaviour of the automation. Feedback from the system can be used in order to keep the operator in the loop and thus reduce the chance of automation surprises (Sarter et al., 1997), and much effort should be placed on avoiding the ironies of automation (Bainbridge, 1983).

A similar development of autonomy is envisioned within the military domain. As autonomy improves, manual control will be used as a fallback mode for emergencies and the human’s role will shift to supervision of multiple unmanned systems (Barnes et al., 2014).


Subsequently, Barnes et al. (2014) reviewed U.S. Army Research Laboratory (ARL) research on the human's role in future autonomous systems, with the goal of procuring design guidelines to further improve upon human-autonomy collaboration. Chen and Barnes (2014) developed several guidelines to allow for control over multiple autonomous systems, which Barnes et al. (2014) summarised as follows:

• Agent/human interaction needs to be flexible. The user interface should support bidirectional communications and control structures to effect rapid change. The system should be able to adjust to operator workload and allow agents to act autonomously under operator-specified conditions (Barnes et al., 2014, p. 3).

• The user interface must enable the operator's ultimate decision authority. The mechanism for ensuring human authority needs to be embedded in the agent architecture (e.g., mixed-initiative systems) (Barnes et al., 2014, p. 3).

• Automation transparency is essential. Lee and See's (2004) 3P's (purpose, process, and performance) as well as the history of the system's 3P's should be presented to the operator in a simplified form, such as integrated graphical displays. The user interface must support operator understanding of the agent's behavior and the mission environment as well as effective task resumption after interruptions (Barnes et al., 2014, p. 3).

• Visualization and training techniques should act as enablers of human-agent collaboration. Appropriate human-agent trust (Lee and See, 2004) can be reinforced by both training and visualization methods. Specific visualization techniques (e.g., augmented reality) have proven to be particularly useful in improving SA in the type of complex environments where agent technology is most beneficial. Operators should be trained to understand the system's 3P's (Barnes et al., 2014, p. 3).

• Human individual differences must be part of the human/agent design process. This guideline can be accomplished by interface design, selection, training, and/or designing agents that are sensitive to individual differences among humans (Barnes et al., 2014, p. 3).

2.7. Levels of Automation

There have been many attempts at defining various levels of automation. Parasuraman, Sheridan and Wickens (2000) proposed a conceptual model for types and levels of automation (LOA), where the authors argued that functions can be automated to various degrees ranging from low to high (i.e. from fully manual to fully automated). Parasuraman et al. (2000) suggested four stages of automation: (1) information acquisition; (2) information analysis; (3) decision and action selection; and (4) action implementation. These stages represent input and output functions based on human information processing (Table 1).

Table 1. Four stages of automation based on human information processing (Parasuraman et al., 2000)

Stage   Description                      Human information processing
1       Information acquisition          Sensory processing
2       Information analysis             Perception/working memory
3       Decision and action selection    Decision making
4       Action implementation            Response selection

The first stage of automation, information acquisition, refers to automation that supports processes linked to sensing and registering input data. This stage aims to support human sensory and perceptual processes. Thus, automation in this stage may incorporate systems capable of scanning and observing the environment (e.g., radar) in order to assist humans in monitoring activities. Higher levels of automation within the information acquisition stage may use criteria to organise and highlight sensory information, for instance an automated traffic control system that lists aircraft according to handling priority (Parasuraman et al., 2000). Automation in the information analysis stage assesses and manipulates information by performing tasks similar to human cognitive functions, such as inferential processes and working memory. In this stage, automation may provide predictions (e.g., the future flight course of another aircraft), integration of multiple input variables, or context-dependent summaries of data to the user (Parasuraman et al., 2000). In the decision and action selection stage, the automation selects from decision alternatives and may for instance recommend diagnoses to medical doctors. Table 2 illustrates potential levels of automation for this stage. In the final stage, action implementation, automation executes the selected action and may complete all, or parts, of a task (e.g. an autopilot in an aircraft) (Parasuraman et al., 2000).

Table 2. Levels of automation of decision and action selection (Parasuraman et al., 2000, p. 287)

Range   Levels of automation
High    10. The computer decides everything, acts autonomously, ignoring the human.
         9. informs the human only if it, the computer, decides to
         8. informs the human only if asked, or
         7. executes automatically, then necessarily informs the human, and
         6. allows the human a restricted time to veto before automatic execution, or
         5. executes that suggestion if the human approves
         4. suggests one alternative
         3. narrows the selection down to a few, or
         2. The computer offers a complete set of decision/action alternatives, or
Low      1. The computer offers no assistance: human must take all decisions and actions
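
For readers who prefer code to tables, the sketch below encodes the four stages and the decision/action-selection levels as plain Python data, for example to annotate which functions of an unmanned system are automated and to what degree. The encoding itself is an illustration and not part of Parasuraman et al.'s (2000) model.

```python
# Illustrative sketch only: Parasuraman et al.'s (2000) stages and levels written
# out as Python data for annotation purposes. The encoding is not from the paper.
from enum import IntEnum


class Stage(IntEnum):
    INFORMATION_ACQUISITION = 1        # sensory processing
    INFORMATION_ANALYSIS = 2           # perception / working memory
    DECISION_AND_ACTION_SELECTION = 3  # decision making
    ACTION_IMPLEMENTATION = 4          # response selection


LOA = {  # levels of automation of decision and action selection (1 = low, 10 = high)
    1: "Computer offers no assistance; human takes all decisions and actions",
    2: "Computer offers a complete set of decision/action alternatives",
    3: "Narrows the selection down to a few",
    4: "Suggests one alternative",
    5: "Executes that suggestion if the human approves",
    6: "Allows the human a restricted time to veto before automatic execution",
    7: "Executes automatically, then necessarily informs the human",
    8: "Informs the human only if asked",
    9: "Informs the human only if it, the computer, decides to",
    10: "Computer decides everything and acts autonomously, ignoring the human",
}

if __name__ == "__main__":
    # Hypothetical profile: target reporting is fairly automated, but action
    # selection is left to the human commander.
    profile = {Stage.INFORMATION_ACQUISITION: 7, Stage.DECISION_AND_ACTION_SELECTION: 2}
    for stage, level in profile.items():
        print(f"{stage.name}: level {level} - {LOA[level]}")
```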

2.8. Section summary

This section has given a short introduction to UGV:s and UAV:s. Additionally, various ways of interacting with unmanned systems and different human-autonomy interaction principles, such as gestures, speech, visual displays and haptic feedback, have been discussed from both a civilian and a military context. These human-autonomy interaction principles were shown to have their advantages and disadvantages depending on both context and situation. Consequently, the combination of multiple interaction principles in the form of multimodal interfaces has been detailed and discussed as a way of providing a more natural human-autonomy interaction. Lastly, automation was discussed, in part how it can affect the interaction principles but also how levels of automation can be defined based on Parasuraman et al.'s (2000) conceptual model.

Based on the human-autonomy interaction principles identified in the literature provided for the first step, namely gestures, haptic feedback, speech, and visual displays, some of the principles will be tested in a case study. The principles will be chosen based on their feasibility; this is further described in the following section.


3. Step two – Case study

This section describes the method used to answer the third research question, for which data was collected through a case study conducted at the Swedish Defence Research Agency. Three observations and two interviews were conducted over two separate sessions, spanning a total of three scenario runs, with one additional group discussion after the first session. The case study is limited to the boundaries of the LASSIE project, meaning that not all of the previously identified human-autonomy interaction principles can be studied. The case study will investigate speech and visual displays in order to answer the third research question.

The case study was based on a scenario created by Johansson et al. (2019) in which Sweden has been attacked by a hostile nation. The scenario simulated a realistic battle mission, in which a mechanised reconnaissance platoon is to perform a counter-attack against Spång. The simulation ran in Virtual Battle Simulator 3 (VBS3), a virtual training environment offering a first-person-shooter (FPS) perspective and multiplayer capabilities for military simulation. VBS3 was developed by Bohemia Interactive and is used today as a training and education tool by numerous militaries around the world. VBS3 provides common vehicle and weapon models from various nations. In addition, it enables the creation and implementation of custom scenarios and units (Bohemia Interactive, n.d.).

The platoon consisted of an Infantry Fighting Vehicle (IFV), a caterpillar-tracked combat vehicle with a turret that rotates 360 degrees, with a crew composed of a driver, an IFV commander and a gunner; these roles were all played by participants in the study. The IFV, seen in Figure 1 below, also has room for an additional seven soldiers (Försvarsmakten, 2013).

Figure 1. UGV on the left and IFV on the right in VBS3

During the mission, the IFV commander had one autonomous UGV at their disposal, which could be controlled either manually or by voice command. The unmanned vehicle used in the case study had previously been outlined by Johansson et al. (2019). The authors proposed different fictional autonomous units with various levels of autonomy as a basis for future research in a study to be conducted at the Swedish Defence Research Agency. The autonomous abilities were based on technical predictions as well as already available technology (Johansson et al., 2019). One of the proposed levels was Level 1 – Teleoperated ability for ground reconnaissance. This level, according to Johansson et al. (2019), could potentially be achieved by 2025-2030, as it entails a low level of autonomy with autonomous abilities that are realistic with today's technology. The authors made use of a Teleoperated Ground Combat Unit (TGCU) in order to illustrate their Level 1 of autonomy (Johansson et al., 2019).

Figure 2. TGCU model for VBS3 developed by the Swedish Defence Research Agency, on the left. TGCU model inside of VBS3 on the right

The unmanned vehicle in the images above (Figure 2) runs on caterpillar tracks, and its propulsion system consists of a diesel generator and two batteries, enabling a top speed of circa 20 km/h (Johansson et al., 2019). The TGCU can perform missions for 12 hours, given that its two batteries are fully charged, or for 24 hours in the case of static tasks (e.g. fixed position surveillance). An operator can control the TGCU, using simple voice commands, from a management site or from inside a combat vehicle. However, the weapons can only be manually controlled using a terminal with a screen (Johansson et al., 2019). The TGCU is capable of receiving movement directives (e.g. "move to position x") as well as performing reconnaissance within a given sector, where it will report movement of identified humans or vehicles. The TGCU's navigational ability is limited in rough or dense terrain, and it may signal to be manually operated in order to get past certain obstacles. The TGCU will report to the operator if it experiences malfunctions or disturbances that it interprets as attacks. If contact is lost with the operator, the TGCU will complete its last given task or perform it until its power reaches a critical level, either inducing a standby mode or prompting a search for a nearby recharge station.

In the case study’s scenarios and simulation, the TGCU was controlled using a Wizard of Oz method. Wizard of Oz is a data collection technique suitable for situations in which the interaction with a system may be trivial, in comparison to the response from the system which may be complicated, expensive, or currently not technologically possible to implement. In Wizard of Oz studies, participants interact with what they believe to be a computer system through an interface. However, a human or “wizard”, mediates the interaction between the participant and the system, creating the illusion of a fully functional system (Dahlbäck et al., 1993).
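
As a toy illustration of the technique (not of the study's actual setup), the snippet below shows the basic shape of a Wizard of Oz turn: the participant's utterance is routed to a human "wizard", whose typed reply is presented back as if it came from the system.

```python
# Illustrative sketch only: a minimal console Wizard of Oz loop. The participant
# believes a computer answers; in reality a human "wizard" types the response.
def wizard_of_oz_turn(participant_utterance: str) -> str:
    print(f"[to wizard] participant said: {participant_utterance!r}")
    return input("[wizard] type the system's response: ")


if __name__ == "__main__":
    reply = wizard_of_oz_turn("A1, advance to the marked position")
    print(f"[shown to participant as the system] {reply}")
```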


3.1. Scenario

Johansson et al. (2019) produced several scenarios with the purpose of evaluating how autonomous units with different levels of autonomy could aid mechanised combat. In the scenario, Sweden has been attacked by a hostile nation. The enemy has landed in both Norrköping and Oxelösund, with additional troops having airdropped at Malmen Airport in Linköping. One assignment is to take control of Spång, a fictional village located at Prästtomta firing range, which is currently occupied by an enemy mechanised company consisting of ten Armoured Personnel Carriers and a mortar section equipped with heavy grenade launchers. The Swedish counter-attack against Spång is conducted with one mechanised battalion, consisting of two mechanised companies and two tank companies. The reconnaissance platoon, played by the participants, will initiate the attack by advancing slowly in order to identify targets and mines, while avoiding being spotted. One mechanised company will circumvent and attack Spång from the North West, while the rest of the battalion attacks head-on, pinning the enemy to Spång (Johansson et al., 2019).

Johansson et al. (2019) based the scenario used in the case study on the autonomous abilities of the autonomous unit described in Level 1 – Teleoperated ability for ground reconnaissance. In this scenario, one IFV is advancing on Spång from north of the lake Ommen (see Figure 3). The IFV crew consists of an IFV commander, a gunner, and a driver, each one controlled by a participant. The IFV commander has one TGCU at their disposal, which can be controlled either manually or by voice command (Appendix 1 – Quick reference guide for TGCU). The enemy mechanised platoons are located in different forest areas north of Spång, while the grenade launching platoon is located in central Spång (see Figure 3).

Figure 3. Starting situation for the scenario runs. The IFV, played by the participants, can be seen on the top right in blue color. The enemy platoons in red can be seen in and around Spång on the bottom left


3.2. Participants

Data was collected during three scenario runs, in which three participants played the roles of an IFV crew per run. The roles included a driver, a gunner and an IFV commander; all roles were filled by employees at the Swedish Defence Research Agency. The IFV commander was the only participant under study, and two different participants filled the role of the IFV commander.

The IFV commander (1) in the first and second run had some experience with tanks and combat vehicles, but never as an IFV or tank commander. The participant was familiar with simulation environments such as VBS3 but had no previous experience with unmanned vehicles.

The IFV commander (2) in the third run had plenty of experience with tanks and mechanised combat. The participant had done their military service with tanks, been involved with educating mechanised soldiers and acted as a platoon tank commander. The participant was familiar with simulation environments akin to VBS3, which had been used in the training of mechanised soldiers. The participant had no previous experience with unmanned vehicles.

Table 3. Scenario runs and designated IFV commander. An X represents one study director playing the role

Scenario run   IFV commander     Gunner            Driver
1              IFV commander 1   X                 X
2              IFV commander 1   X                 X
3              IFV commander 2   IFV commander 1   X

3.3. Apparatus

The physical simulation environment contained six computers, seven computer displays, six sets of mouse and keyboard, a joystick, and a gaming steering wheel. Each station (see Figure 4) had a printout of relevant VBS3 key bindings for controlling their character in the simulation. The IFV commander's station also had a quick reference guide containing the available voice commands to control the TGCU, as well as physical maps of the area surrounding Spång.

3.3.1. Materials

The materials used for the data collection consisted of a video camera for recording the IFV commanders’ interaction with the TGCU. Field notes were produced in a semi-structured observation protocol using pen and paper. The participants in the interviews were given information both verbally and via the informed consent form. A computer was used to take notes during the interviews.


3.4. Procedure

All scenario runs were conducted at the Swedish Defence Research Agency and recorded with a video camera. Initially, the participants were seated in a group, whereupon they were presented with the overall scenario and the specific scenario's objectives (see Section 3.1). Then the participants were given maps of the area and a period of ten minutes to strategise and form a battle plan, after which each participant was given a designated station (Figure 4).

The gunner and driver stations consisted of one computer, a computer display, a mouse, a keyboard, and a headset with microphone for communication. The IFV commander’s station had the same equipment as the rest of the IFV crew, except for the additional equipment needed to interact with the TGCU. This entailed an additional computer display in which the commander could see the optical view (i.e. camera feed) and visual feedback (i.e. textual feedback) of the TGCU; a joystick to manually control the TGCU; and speakers in order to receive auditory feedback from the TGCU (Figure 5).


To give the TGCU instructions, the IFV commander would have to initiate the communication by using its calling name "A1" (e.g. "A1 follow the road"); otherwise the TGCU would not perceive the instructions. The IFV commander could use the in-game map (i.e. the map in VBS3) to see the current position of the TGCU and also use it to mark spots or areas on the map that the TGCU could navigate to when ordered (Figure 6).

Figure 5. The IFV station with the IFV commander’s point of view on the left screen and the TGCU's point of view on the right screen

Figure 6. The map view that the IFV commander could use to mark areas and spots for the TGCU. The TGCU can be seen in the middle of the image, with the IFV to its right


Before the scenario started, the participants were given a moment to explore the simulation environment in VBS3 in order to familiarise themselves with their roles in the IFV as well as the necessary controls. Each participant was given a quick reference guide for the control scheme of VBS3. The IFV commander was also given a quick reference guide of available voice commands to control the TGCU (Appendix 1 – Quick reference guide for TGCU). Once the participants felt ready, a camera overlooking the IFV commander’s station would start recording and the simulation began. The starting positions of the IFV (top right) and enemy units (bottom left) can be seen in Figure 3.

The Wizard of Oz setup was curtained off from the participants and consisted of a designated station containing two computers, two computer displays, a gaming steering wheel and two sets of keyboard and mouse. One of the displays showed the point of view of the TGCU, and the other was used to send visual and auditory feedback to the IFV commander. The station was manned by two study directors (i.e. “wizards”): one controlled the movement of the TGCU, while the other controlled the visual feedback (i.e. textual responses from the TGCU) and auditory feedback (i.e. audio signals sent through the speakers). The communication between the IFV commander and the TGCU was conducted over Voice over Internet Protocol (VoIP) in VBS3, while the communication within the IFV crew was conducted over the in-game radio network.

One additional computer was used as a server to host the scenario. This also provided the Wizard of Oz station with admin functionalities, such as displaying a map of the geographical area in which the units moved. The map was used by the wizard controlling the TGCU as a navigation tool and to mark enemies on the map, simulating the TGCU’s ability to identify and report movements of spotted enemy troops and vehicles.
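For illustration only, the kind of contact report that the wizards simulated on behalf of the TGCU could be represented as a simple data structure. The fields and values below are assumptions made for the sake of the example and do not reflect an actual reporting format used in the study.

```python
# Hypothetical sketch of a TGCU contact report; in the study these
# reports were produced manually by the wizards, so the fields below
# are illustrative assumptions rather than an actual message format.
from dataclasses import dataclass


@dataclass
class ContactReport:
    """A simulated observation of an enemy unit reported by the TGCU."""
    time: str           # in-scenario time, e.g. "16:45"
    unit_type: str      # e.g. "tank" or "infantry"
    map_reference: str  # position marked on the in-game map
    movement: str       # observed direction of movement


report = ContactReport(time="16:45", unit_type="tank",
                       map_reference="hypothetical grid reference",
                       movement="moving north")
print(f"A1 reports {report.unit_type} at {report.map_reference}, {report.movement}")
```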

After the first session, a group discussion was held with the IFV crew to discuss the feasibility of the simulation and the implementation of the TGCU. After each session, a semi-structured interview was conducted with the IFV commander to follow up on what had been observed. Before the interviews were initiated, the IFV commander was presented with the informed consent form (Appendix 2 – Informed consent form). These interviews detailed how the IFV commander perceived the interaction with the unmanned system; other potential ways of interaction; trust towards the TGCU; and the received feedback and how this could be improved. Neither the group discussion nor the interviews were recorded.

3.5. Analysis

The case study was of a qualitative nature, collecting data through observation, video recordings and interviews.

3.5.1. Observation

The scenario runs were observed at the Swedish Defence Research Agency. Observation as a skill has been described as: “the act of perceiving the activities and interrelationships of people in the field setting through the five senses of the researcher” (Angrosino, 2007, p. 37). Observation may begin as soon as the researcher enters a field setting, whereupon the researcher records what is observed in as much detail as possible, while striving to interpret as little as possible (Angrosino, 2007). Field notes are the traditional form of data collection; these are usually produced immediately after an event has been observed (Howitt, 2010). The observer may use other forms of data collection, such as group discussions, semi-structured interviews, and video recordings, to complement the observation (Howitt, 2010).

The researcher can take on different roles during the process of observation, which may affect the data collection (Angrosino, 2007). These roles include total or complete participation, total or complete observation, participant as an observer, and observer as non-participant.

The role of total or complete participation sees the researcher as a full member of the organisation or group being studied, without disclosing their role as a researcher to the other group members. Partaking in total or complete observation as a researcher entails being as detached as possible from the field setting, leaving the researcher to observe without interacting with the people of study. Being a participant as an observer involves the researcher partaking in activities and interacting with the ones being observed, whilst being recognised as a researcher. Observer as non-participant refers to researchers whose role is known to the people of study, where the researcher observes without actively engaging in the group's or organisation's activities (Howitt, 2010). During the scenario runs, the researcher took on the role of observer as non-participant, producing field notes of events involving the TGCU in a semi-structured observation protocol. Overall events were noted and marked for further inspection, as the scenario runs were recorded with a video camera.

3.5.2. Video

The scenario runs were recorded with a video camera, with the IFV commander’s point of view as the focus of the recording. Video can be used to record what people actually do, capturing actions, communication and activities as they naturally emerge within the constraints of the context. Analysts can therefore use video to gain an immediate depiction of the context, actors’ use of the context, constraints that dictate action, and various mistakes, miscues or errors (McNeese, 2004). The use and reuse of video material enables the analyst to structure data in a way that helps to inform why users succeed or fail in complex settings (McNeese, 2004). Furthermore, video provides a temporal-spatial record that presents the analyst with a sequential timeline of scenes and events as they occur over time (McNeese, 2004). The video material from the scenario runs was transcribed into an observation protocol containing four columns: time, context, what happened, and comments. The observation protocol detailed events involving the TGCU in various ways.
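As a minimal sketch of how such a four-column protocol can be structured for later analysis, the example below represents one entry (taken from Table 4 in the results) as a tabular record written to a CSV file. The CSV representation and field names are illustrative assumptions, not the actual protocol file used in the study.

```python
# Minimal sketch of the four-column observation protocol used when
# transcribing the video material; the CSV layout is an assumption.
import csv
from dataclasses import dataclass, asdict


@dataclass
class ObservationEntry:
    """One row of the observation protocol."""
    time: str
    context: str
    what_happened: str
    comments: str


entries = [
    ObservationEntry(
        time="06:29",
        context="The IFV commander gives A1 new instructions",
        what_happened='The IFV commander: "A1 scout in the red area"',
        comments="",
    ),
]

# Write the protocol to a CSV file for later qualitative analysis
with open("observation_protocol.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["time", "context", "what_happened", "comments"])
    writer.writeheader()
    writer.writerows(asdict(e) for e in entries)
```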

3.5.3. Qualitative interview

Interviews of a qualitative nature were conducted with each IFV commander in order to discover personal experiences (Howitt, 2010) and capture expert knowledge in their own words (McNeese, 2004). Qualitative (i.e. semi-structured) interviews rely upon a simple structure consisting of the most essential questions, leaving the option of expanding the interview to thoroughly explore the subject. This may entail adding new questions, or entire topics, if they seem suitable. Consequently, the interview guide may change after an interview depending on which questions and topics were probed (Howitt, 2010).

In qualitative interviewing, the researcher strives to explore how the interviewee thinks by acting as an active listener while encouraging rich and detailed replies (Howitt, 2010). The interview guide should therefore consist of open questions – i.e. questions that the interviewee cannot answer with a yes or no (Howitt, 2010).


Notes were taken during the interviews using a computer in order to record the interviewees’ answers to the questions. Not everything could be written down immediately, as the researcher acted as an active listener and no audio was recorded. Hence, after each interview had been completed, the material was read through multiple times and the notes were expanded upon. A qualitative analysis was then conducted on the material, whereupon the notes were grouped into categories reflecting the collected material. The categories were then revised to match what had been observed during the scenario runs.

3.6. Research ethics

This study complied with the four ethical requirements of research: the requirement of information, the requirement of informed consent, the requirement of confidentiality and the requirement of utilisation (Vetenskapsrådet, 2002). These requirements were presented to the interview participants in the informed consent form (Appendix 2 – Informed consent form), which was then signed.


4. Results of the Case Study

The following section presents the case study’s results, divided into subsections containing observational data from each run-through of the scenario, in addition to data from the group discussion in the first scenario session. Two complementary interviews, conducted with the participants acting as IFV commander, supplement the observational data. A total of two hours of video material was analysed, in addition to two hours of interview material.

4.1. Scenario run 1

The run lasted 28 minutes and 19 seconds, at which point the IFV was eliminated; the IFV crew did not manage to take Spång before this occurred. The TGCU was eliminated after 19 minutes and 23 seconds.

4.1.1. TGCU usage

In the first run, the TGCU was mainly used to scout larger areas ahead of the IFV. The IFV commander adopted a hands-off approach, leaving it to complete its task without interfering. The various areas that the TGCU was to explore were placed on the map by the IFV commander (Figure 7).

The excerpts below show examples of the IFV commander issuing orders to the TGCU after having marked an area on the map (Tables 4 and 5).


Table 4. IFV commander issuing scout orders to the TGCU

Time  | Context | What happened
06:24 | The IFV commander initiates communication with A1 | The IFV commander says: ”Speech Control” while looking at the quick reference guide with available voice commands
06:29 | The IFV commander gives A1 new instructions | The IFV commander: ”A1 scout in the red area”

Table 5. IFV commander issuing scout orders to the TGCU

Time  | Context | What happened
16:09 | The IFV is standing still, the gunner is on lookout. A1 is standing still and awaiting further instructions | The IFV commander places a green area on the map
16:45 | The IFV has moved and taken a fighting position. A1 is still standing still and awaiting instructions. The IFV commander initiates communication with A1 | The IFV commander: ”Speech Control”
16:49 | The IFV commander gives A1 new instructions | The IFV commander looks at the quick reference guide and says: ”Adam 1 scout the green area”

There were instances during the first run where the IFV commander would sit with the map on the left screen and alternate between looking at the map, which displayed the movement of the TGCU in real time, and the right screen displaying the TGCU’s point of view (Table 6). This way, the IFV commander could monitor the movements of the TGCU.

Table 6. IFV commander monitoring the TGCU's movements

Time          | Context | What happened
17:13 – 18:18 | The IFV is still in a fighting position. A1 is moving towards the green area | The IFV commander’s focus alternates between looking through A1’s perspective and the map to see its movement

Insights/Summary

During the first run, the TGCU and its autonomous abilities were utilised to scout designated areas and report any identified movements, an intended way of implementing the TGCU in a battle mission. However, the TGCU was easily forgotten whenever it had completed its task, leaving it to stand idle in sometimes compromising positions where it could be spotted more easily by enemy units. This could be the result of insufficient experience conducting missions with unmanned vehicles.

4.1.2. Interaction

The IFV commander controlled the TGCU using voice commands only (i.e. manual control was not used), consulting the quick reference guide several times when giving the TGCU new instructions. The IFV commander expressed in the interview that it felt easy to control the TGCU with voice commands.

It was observed that during the first run, the IFV commander never interacted with the TGCU while the IFV was on the move (Table 7).

Table 7. The IFV commander interacting with the TGCU

Time  | Context | What happened | Comments
00:56 | The simulation has just begun, the IFV is stationary. The IFV commander initiates communication with A1 | The IFV commander: ”Speech Control” | –
01:00 | The IFV commander gives initial instructions to A1 | The IFV commander: ”A1 follow the road” | The IFV commander looked at the quick reference guide with available voice commands

This observation was confirmed in the interview, where the IFV commander mentioned only controlling the TGCU while the IFV stood still. This was brought up in relation to perceived workload, which the IFV commander stated was low during the run. However, the IFV commander noted that this was probably because everything else came to a stop when interaction with the TGCU took place.

The interaction sequences often began with the IFV commander ordering the IFV to a halt and the gunner to keep a lookout in a certain sector, whilst the IFV commander made use of the map to mark an area for the TGCU (Table 8).

Table 8. Stopping the combat vehicle in order to interact with the TGCU

Time  | Context | What happened | Comments
15:12 | The IFV has stopped. A1 is still stationary and awaiting instructions | The IFV commander opens up the map and says to the crew: ”We’re going to make use of the UGV for a while” | The IFV commander could have perceived the feedback earlier, but makes it explicit that new instructions are to be given to A1

Insights/Summary

During the first run, the IFV commander consulted the quick reference guide frequently when issuing voice commands to the TGCU. This could be an indication that the available voice commands were either difficult or non-intuitive to use when interacting with the TGCU. However, this could be a question of experience, where the quick reference guide is only necessary during the early stages of learning how to interact with an unmanned system. The perceived need to only interact with the TGCU while the IFV was stationary might be an indication that the cognitive workload would become too high if the IFV commander were to interact with the unmanned system while the IFV was on the move. Again, this could be a matter of insufficient experience, resulting in increased workload while interacting with the TGCU.
