
Giving the Audience a Perspective:

Investigating the opportunities and challenges of introducing independent audience controlled cameras to a collaborative virtual environment

JACOB TÄRNING

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Giving the Audience a Perspective: Investigating the opportunities and challenges of introducing independent audience controlled cameras to a collaborative virtual environment

JACOB TÄRNING

Master’s Programme, Computer Science, 120 credits
Date: January 19, 2021

Supervisor: Mario Romero

Co-supervisor: Ingemar Markström
Examiner: Tino Weinkauf

School of Electrical Engineering and Computer Science
Swedish title: Ge Publiken ett Perspektiv:

Swedish subtitle: En undersökning av möjligheterna och

utmaningarna av att introducera självständiga publikkontrollerade kameror till en kollaborativ virtuell miljö


© 2021 Jacob Tärning

Abstract

In the current field of Mixed Reality collaboration there is, at the time of writing, a clear lack of research regarding cooperation between an immersed Virtual Reality user and a non-immersed audience. Therefore, the impact of adding audience-controlled cameras to the two parties’ cooperation was investigated. This was done through a user study where the participants cooperated as a Virtual Reality user and an audience to find hidden objects. The study was conducted with two prototypes. In the first prototype, the audience was only able to see a mirrored view of the immersed user. In the second, independent smartphone-controlled cameras were placed at the disposal of the audience. The results of the user study indicate that the second prototype was overall less practical than the first in terms of effectiveness, efficiency and satisfaction. However, in certain scenarios where the number of objects within a scene made verbal communication inefficient, the second prototype prevailed, indicating its potential in more visually complex scenes.

Keywords

Asymmetric, Audience, Collaboration, Cooperation, Presenter, Virtual Reality


Sammanfattning

I det nuvarande fältet av samarbete inom Mixed Reality finns det i skrivande stund en tydlig avsaknad av forskning angående samarbete mellan en fördjupad Virtual Reality-användare och en icke-fördjupad publik. Därför undersöktes effekten av att inkludera publikkontrollerade kameror i de två parternas samarbete. Det gjordes genom att utföra en användarstudie där de två parterna samarbetade som Virtual Reality-användare och publik för att hitta gömda objekt. Användarstudien genomfördes med två prototyper. I den första prototypen kunde publiken endast se en speglad bild av den fördjupade användarens synfält. Detta kompletterades av enskilda smartphonekontrollerade kameror avsedda för publiken i den andra prototypen. Användarstudiens resultat indikerar att den andra prototypen generellt presterade sämre än den första inom områdena effektivitet, verkan och tillfredsställelse. Dock, i vissa scenario när antalet objekt i en scen gjorde verbal kommunikation ineffektiv visade sig den andra prototypen vara bättre, vilket indikerar dess potential för mer visuellt komplexa scener.

Nyckelord

Asymmetrisk, Publik, Kollaboration, Samarbete, Presentatör, Virtuell Verklighet


Contents

Introduction 1

Related Work 3

Collaboration . . . 3

Reality-virtuality . . . 4

Cooperation in Mixed Reality . . . 4

Remote Cooperation in One Medium . . . 4

Remote Cooperation in Multiple Media . . . 5

Co-located Cooperation in Mixed Reality . . . 7

NASA Task Load Index . . . 9

Usability Metric for User Experience . . . 10

Method 11

User Study . . . 11

Participants . . . 12

Precautions against covid-19 . . . 14

Prototypes . . . 15

Vo Prototype . . . 16

Vg Prototype . . . 21

Development . . . 23

Metrics . . . 26

Effectiveness . . . 26

Efficiency . . . 26

Satisfaction . . . 27

Z-testing . . . 27

Results 28

Effectiveness . . . 28

Score . . . 28

Error Rate . . . 31


Task Completion Rate. . . 33

Efficiency . . . 36

Task Completion Time . . . 36

Subjective Workload . . . 43

Satisfaction . . . 44

Individual Results . . . 45

Discussion 47

Effectiveness . . . 47

Efficiency . . . 49

Satisfaction . . . 51

Attendance . . . 52

Subtask Separation . . . 54

Communication . . . 55

Conclusion 57

Future Work 58

Acknowledgments 59

References 61

A Script 68

Verbal + Pointing . . . 68

English . . . 68

Svenska . . . 69

Verbal Only . . . 70

English . . . 70

Svenska . . . 70

B Surveys 72


List of Figures

1 Reality-virtuality continuum . . . 4

2 Session room . . . 13

3 Vo and Vg sessions . . . 14

4 Vo, simplified . . . 17

5 Vg, simplified . . . 17

6 Vg phone view, simplified . . . 18

7 Simplified prototypes, overhead view . . . 18

8 Target highlighting . . . 19

9 Simplified Presenter and Audience scenes . . . 20

10 VR example view . . . 21

11 Network diagram . . . 22

12 Mobile view 1 . . . 22

13 Mobile view 2 . . . 23

14 Presenter avatar . . . 24

15 Audience avatar . . . 24

16 Gaze indicator . . . 25

17 Average scores . . . 29

18 Normalized average scores . . . 29

19 Individual scores . . . 30

20 Selection and marking error rates . . . 31

21 Individual selection error rates . . . 32

22 Average Vg Audience subtask completion rates. . . 35

23 Individual Vg Audience subtask completion rates . . . 35

24 Average stage completion times. . . 36

25 Individual stage completion times . . . 37

26 Mean main task completion times . . . 38

27 Individual main task completion times . . . 38

28 Vo subtask completion times . . . 40

29 Vg subtask completion times . . . 40

30 Individual Presenter subtask completion times . . . 42


31 Individual Audience subtask completion times . . . 42

32 Subjective workload . . . 43

33 UMUX scores . . . 44


List of Tables

1 Stage values . . . 16

2 Score . . . 30

3 Error rate . . . 32

4 Marking error rate . . . 33

5 Stage completion rate . . . 33

6 Individual stage completion results . . . 34

7 Vg Audiences' subtask completion rates . . . 34

8 Stage completion times . . . 37

9 Mean main task completion times . . . 39

10 Presenter subtask completion times . . . 41

11 Audience subtask completion times . . . 41

12 Subjective workload. . . 43

13 UMUX scores . . . 45

14 Individual statistics . . . 45


List of acronyms and abbreviations

AR Augmented Reality
FoV Field of View
FPS First Person Shooter
HMD Head Mounted Display
HUD Heads Up Display
LAN Local Area Network
MR Mixed Reality
NASA-TLX NASA Task Load Index
PDA Personal Digital Assistant
RTLX NASA Raw Task Load Index
SUS System Usability Scale
UMUX Usability Metric for User Experience
VR Virtual Reality


Introduction

From a novelty seemingly right out of a science fiction movie, the field of Virtual Reality (VR) has seen much progress in recent years. This figurative explosion has not only led to commercial products like VRChat [1] and Beat Saber [2], but to applications within a wide range of fields. Examples include, but are not limited to, engineering [3], healthcare [4, 5, 6] and virtual training [7, 8]. The benefits of VR simulations over real-life situations can be many. One of them is the ability to perform risk-filled procedures in a simulated environment, thus reducing the associated risk. For example, a surgeon can practise on virtual patients before carrying out the real procedure [4], or an astronaut may prepare for potentially fatal scenarios, like a fire emergency in a moon base, in the safety of a lab on Earth [7].

Another advantage that the virtual has over the real is that it does not require co-location. Normally, to share an environment users would have to be physically present, but if the environment is virtual, users can work remotely and still be digitally immersed in a shared environment. While the issue of remote work can be and has been addressed with numerous solutions like video conferences, these solutions are not optimal for all forms of tasks. Research has shown that virtual reality improves task precision [9, 10] and completion times [9, 11, 12] for activities involving 3D models and interactions, like assembling a water pump [12] or teaching a remote colleague a 3-dimensional path [9]. While it may be possible that video conferencing and other solutions are superior methods under some conditions, it is clear that VR offers something that the other alternatives lack.

Given the potential for remoteness inherent in VR technology, it is no big surprise that ways to utilize it have been studied thoroughly, especially the specific subset of collaboration when working remotely in virtual reality.

VR opens new avenues for the users to interact, avenues normally constrained by a default desktop setup. It is, for example, very hard for a normal desktop user to replicate the freedom of movement a VR user has with the nowadays common Head Mounted Display (HMD) and two-controller setup. Complementing VR with other technologies like depth sensors and


Augmented Reality (AR) only makes this more apparent. Through video streams, a remote VR user can see the local area, and the local user may perceive the remote user with AR. Add depth sensors to that, and the remote user can gesture to the local user with his or her hands. Suddenly, it is almost as if the remote helper is in the room with the local user to cooperate.

It is perhaps due to this inherent ability and the potential applications enabled by it that much of the focus has been on remote cooperation. However, comparatively little research has been done regarding cooperation in VR in a co-located environment. Even less has been studied where collaboration between immersed and non-immersed users is concerned. It should also be noted that the terms VR user and immersed user will be used interchangeably from now on, i.e. to refer to someone being immersed in and using VR.

Nevertheless, this is not a subject that should be overlooked. Consider a lecture or presentation held in VR. Providing every audience member with a VR HMD is most likely infeasible, especially as the size of the audience grows. The obvious solution would be to simply mirror the lecturer's or presenter's view on a screen, but this limits the ways of interaction between the audience and the immersed user to being almost strictly verbal. Visual cues like gaze and pointing are lost when one of the parties is totally immersed and the other is not. This presents a challenge when the audience needs to direct the VR user's attention to something in the virtual space, which leads to the following research question:

In a collaborative environment where an audience and a Virtual Reality presenter are cooperating to find points of interest in a Virtual Environment, what are the opportunities and challenges of combining a first-person view and multiple first- and third-person views, controlled by the presenter and the audience respectively, for the task of increasing presenter and audience collaboration?

Followed by the hypothesis:

Introducing third-person views controlled by each audience member will noticeably increase the cooperation between the two parties compared to a setup where the audience is limited to the presenter’s mirrored view.

The metrics for measuring these differences are discussed further in depth in the Collaboration and Metrics sections.


Related Work

In this chapter, background regarding Collaboration and Reality-virtuality is described. Related work done in the field is summarized under Cooperation in Mixed Reality. Lastly, two surveys used in the study are presented in NASA Task Load Index and Usability Metric for User Experience.

Collaboration

Collaboration is defined as “the process of two or more individuals working together to solve a common problem or achieve a common goal” by Thomsen et al. [13]. While other words like cooperation and teamwork might not carry the exact same meaning as collaboration, they will nonetheless be used interchangeably in this paper from now on.

The main task of this study is to measure the cooperation between a VR user and an audience operating their own independent cameras while cooperating to find points of interest in a Virtual Environment, and how it may be improved. To evaluate cooperation within a shared system, the three criteria of effectiveness, efficiency and satisfaction are proposed by Gutwin and Greenberg in their study on collaboration mechanics in shared-workspace groupware [14]. Effectiveness concerns whether the collaborative task was completed or not, and any associated error rate. Efficiency denotes the amount of resources such as time and effort used to finish the activity, and satisfaction, how happy the participants are about the process and its outcome. Satisfaction may, however, depend on the two earlier criteria, as noted by Gutwin and Greenberg. If effectiveness and efficiency are low, then satisfaction is probably low as well, and vice versa. The methods used to evaluate these three criteria can be found in the Metrics section.


Reality-virtuality

The Reality-virtuality continuum is a concept introduced by Milgram et al. [15], which describes the relationship between real environments and virtual environments as extrema on a scale instead of opposing antitheses. An example of the continuum can be found in Figure 1. Mixed Reality (MR) is a broad term that covers most of this continuum, except the endpoints of said scale. MR is further defined by them as an environment “in which real world and virtual world objects are presented together within a single display”. The researchers note, however, that their discourse mainly concerns visual displays.

Figure 1 – Based on the continuum introduced by Milgram et al. [15]

Another way to use the phrase MR is when referring to systems which mix technologies from different parts of the continuum, for lack of a better term.

While individual items in such systems might not be covered by the broad concept that is MR, it is a useful term nonetheless. The real and virtual may not be mixed on the same screen, but in the total experience.

Cooperation in Mixed Reality

Remote Cooperation in One Medium

Cooperation in VR and other media on the Reality-virtuality continuum is a widely researched subject, especially in recent years due to the advent of readily available VR technology. A common theme among much of the research done in this field is remote collaboration, usually in the form of a local user performing a task with the guidance of a remote expert. The effectiveness of collaboration in an immersive virtual environment was explored by Coburn, Salmon and Freeman [9] by letting pairs of participants teach each other, in VR or over video conferencing calls, how to draw certain 3D paths. They report that VR was the preferred medium of teaching and that


al. [16] when moving virtual objects in an enclosed area. From analysing the participants' conversations, they found that both parties tended towards operator-centric language. They also report a higher mental workload for the operators.

Multiple perspectives as an enhancement to VR collaboration were explored by Park, Kapoor and Leigh [17] when working with oceanic data from Chesapeake Bay. During sessions they noted issues with avatar occlusion, gesture misinterpretation during menu usage, a lack of shared intent and interface state, interference with each other's work in a shared environment, and longer times to synchronize views from separate, local environments.

The collaboration between users of either VR or AR devices was compared by Müller et al. [18]. Their experiment, where participants cooperated to find virtual blocks sharing symbols, was done in both a co-located and a remote environment. The independent variable was whether the users used VR or AR devices. They report some pros stated by their participants regarding both technologies, like more reference points in AR and less visual clutter in VR, that there was no significant difference in workload between conditions, and that co-located VR users performed better in the experiments.

Onebody [10] was proposed by Hoang et al. as an alternative for remote collaboration in the field of posture guidance, comparing it to other media such as video conference calls and pre-recorded videos. Their solution was reported as having higher upper-limb accuracy than the alternatives and enabled bi-directional feedback, but also had the longest task completion times.

Remote Cooperation in Multiple Media

Another common trend is to also mix AR and VR. Here, the AR user usually acts as the local agent, while the VR user acts as the remote expert.

Piumsomboon et al. [19, 20, 21] explored the cooperation between AR and VR users in multiple studies. In the first, they investigated how a visible


Field of View (FoV) and gaze affected performance when identifying and moving virtual targets. They found that users with FoV- and gaze-enabled interfaces performed better than those without them [19]. In the second, the authors investigated the effects of keeping a minimized virtual avatar of the remote VR user in the local FoV when looking away from the remote user. Their results indicate that the inclusion shortens task completion times and heightens feelings of social presence [20]. In the third study, connecting the remote VR user to a wide-lens camera was explored in four experiments.

These experiments examined including a virtual avatar and frustum of the remote user, the avatar's dependence on camera rotation, the VR user's dependence on camera rotation, and avatar placement. It is reported that both a frustum and an avatar had a positive effect on the results, the former more than the latter, that the local AR user preferred having complete control over the avatar's orientation, that the remote VR user preferred having their view independent of camera rotation, and that the preferred position of the avatar was at the shoulder [21]. The shoulder being the preferred placement of a remote helper is supported by Cai and Tanaka's [22] study on MR shoulder-to-shoulder collaboration.

Gao et al. [23] investigated how different inclusions of the local view of an area, reconstructed for the remote user as a point cloud, affected the participants' performance. They found no statistically significant differences between a mirrored Heads Up Display (HUD) view, a view attached to the virtual local user, and no local view. They do note, however, some of the pros and cons of the respective modes, like straightforward progress checking on a smaller, mirrored HUD view, and a bigger and clearer view when attached to the local user's position.

Kolkmeier et al. [11] compared a similar setup of a local AR user with a remote VR expert seeing spatial meshes generated from point clouds captured from the local area. In the baseline condition, the VR user was able to move around and be visible to the AR user, which was compared to an AR-dependent view, a remote desktop view and no remote user embodiment.

They found that their proposed method was generally superior, but note a few interesting discoveries. Participants who reported high familiarity with video games had shorter task completion times on the desktop setup, remote users from the same group who moved around more further decreased task completion times, and there was no significant decrease in performance between the proposed method and the condition without virtual


tasks quicker than those with an eye gaze. Head gaze was also reported as more stable by the participants [24]. In the second study, visualizing the remote user's hands was compared to a more traditional annotation system for desktop setups, and their results indicate that the former was more useful than the latter [12].

Kim et al. [25] studied how different feature sets impacted the performance of a local AR user assembling LEGO™, arranging tangrams and folding origami with the help of a remote VR user. The explored sets were visualized hands of the remote helper only, hands and pointing, hands and sketching, and all three features together. It is reported that the participants' preferred conditions were those with the sketch tool included, and that the tool also improved their results in most tasks, with some exceptions. The pointer, however, had no significant impact on the parties' remote collaboration. It was also reported that the extra tools increased the mental workload of the participants.

Co-located Cooperation in Mixed Reality

Research on co-located virtual cooperation is a bit more sparse, however.

Learning and collaboration in VR were studied by Jackson and Fagan [26] by letting three classes of ninth-grade students explore a virtual replica of Seattle and investigate the effects of global warming on the city. They found that collaboration increased engagement in the virtual activities, which in turn was enabled by the inclusion of a voice chat. Moreover, it was also noted that the voice chat and virtual presence helped the instructors advise the participants, and that a small number of the students were afflicted by some sort of malaise.

While the authors do not name it explicitly in their report, it is assumed they describe the well-known phenomenon of cyber sickness [27].

A user study with an ethnographic approach was conducted by Khadka et al. [28] to investigate how co-located users collaborated when exploring and analysing datasets in VR. From it, they determined that it is important to maintain spatial relations, communicative gestures and natural interactions.


These three affordances were the most effective means of communication between collaborators.

As can be seen from the above examples, most research focuses on cooperation between a limited number of users. This is in stark contrast to Ahn et al. [29], who studied the challenges and issues of implementing interactivity for a larger audience, in this specific case a VR theater audience. One issue highlighted is the mapping of interactable objects to the audience, noting the infeasibility of matching every user with a controllable object due to the limited ways of distinguishing them. Instead, they advocate for objects controlled by groups of users through various methods, such as statistics and first-come-first-served.

Taking a different approach towards VR and audiences, Terrier et al. [30] studied the effects of social inhibition, caused by both a co-located and a remote audience, on a VR user's performance. They found that a remote audience negatively impacted the results more than a co-located one.

In the same vein, another uncommon approach to the subject of cooperation in MR is the collaboration of immersed and non-immersed users. As hopefully demonstrated by the earlier sections, most research presented in this report tends to focus either on cooperating users immersed to some degree, or on completely non-immersed users. These two categories are rarely combined in the presented studies. The works of Furukawa et al., Gugenheimer et al., and Chan and Minamizawa [31, 32, 33] all focus on this specific form of cooperation. TeleSight [31] utilizes both a dome screen displaying a 180-degree view of the virtual world, which extends the field of view of the immersed participant, and an interactable robot head mimicking the rotation of the HMD on the user. FaceDisplay [32] instead used three touch screens and a depth camera mounted on an HMD that enabled non-immersed users to see the virtual world and interact with it through touch and gestures. FrontFace [33] combined eye tracking with an extra external touch screen mirroring the user's eyes and their virtual view when immersed, also using touch and gestures as interaction methods and extending them with voice recognition.

Gugenheimer et al. found that there was a significant imbalance between immersed and non-immersed users in favour of the latter when testing their hardware with a cooperative and a competitive game. They attributed this to the situational blindness of the VR user in regard to the real world compared to their counterpart. They also report a higher discomfort by the VR user


In their study on MR collaboration in a museum, Brown et al. [34] let groups of three users explore a museum through three different media: a physical visit, VR, and a Web browser. Communication was done with audio over a video conferencing system and by sharing a map. The on-site visitor was tracked by a handheld Personal Digital Assistant (PDA) used to view said map. They noted how locational awareness helped the participants coordinate and navigate the museum.

NASA Task Load Index

The NASA Task Load Index (NASA-TLX) was developed by Hart and Staveland [35] as a general evaluation tool for subjective workload. It resulted from a research project conducted by the NASA-Ames Research Center with the goal of finding which factors influenced variations in subjective workload assessments between participants taking part in various similar and dissimilar experimental assignments and exercises. This tool has been used to determine the workload imposed by a wide variety of tasks, situations and settings, ranging from Air Traffic Control and aircraft cockpits to power plant control rooms and computer usage [36].

The survey is administered either during or directly after a task has been completed, and consists of two parts. The first part is a six-item survey. Each item is scored by the respondent on a scale from 0 to 100, in five-point increments, resulting in a 21-point scale. These six items concern mental, physical and temporal demand, effort, performance and frustration.

The second part consists of 15 pairwise questions, one for each possible pair of the six previous items. The respondent is asked to pick the item which contributed more towards the perceived workload. The answers from the second part are used to calculate a weight for each individual item; the weighted ratings are then combined to generate a final workload score.
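To make the two-part scoring procedure concrete, the sketch below computes a weighted NASA-TLX score from hypothetical ratings and pairwise choices. The item names, example values, and the convention of dividing the weighted sum by the 15 comparisons (and of taking the plain mean for the RTLX) are illustrative assumptions, not data or definitions taken from this study.

```python
# Illustrative sketch of NASA-TLX scoring (weighted version).
# All ratings and pairwise winners below are hypothetical example data.

ITEMS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

# Part 1: each item rated on a 0-100 scale in 5-point increments (21 levels).
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 65, "frustration": 35}

# Part 2: 15 pairwise comparisons; for each pair the participant picks the
# item that contributed more to the perceived workload.
pairwise_winners = ["mental", "mental", "temporal", "effort", "mental",
                    "effort", "temporal", "frustration", "performance",
                    "effort", "mental", "temporal", "effort", "mental",
                    "performance"]

# Weight of an item = number of times it was chosen (0-5); weights sum to 15.
weights = {item: pairwise_winners.count(item) for item in ITEMS}

# Weighted workload: sum of rating * weight over the 15 comparisons
# (one common formulation of the combination step).
tlx = sum(ratings[i] * weights[i] for i in ITEMS) / 15

# Unweighted RTLX variant: simple mean of the six ratings
# (one common convention for the raw score).
rtlx = sum(ratings.values()) / len(ITEMS)

print(f"NASA-TLX: {tlx:.1f}, RTLX: {rtlx:.1f}")
```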

A common modification to the NASA-TLX is to remove the weighting process, making it simpler to administer than the original. This version is sometimes known as the NASA Raw Task Load Index (RTLX) [36]. It


remains inconclusive whether the RTLX is an acceptable replacement, since studies comparing it with the original survey have found it to be more, equally, and less sensitive than the NASA-TLX [36].

Usability Metric for User Experience

The Usability Metric for User Experience (UMUX) is a four-item, seven-point Likert scale survey used to subjectively assess a system's usability [37]. It was designed by Finstad to yield similar results as the System Usability Scale (SUS) [38] under the same conditions. Another reason behind its creation was as a response to some of the perceived shortfalls of the SUS, with lack of compactness and the usage of five-point Likert scales pointed out as two of them. Furthermore, the survey was developed with the ISO 9241-11 standard in mind, which formalizes the term usability [39]. Said standard includes the three criteria effectiveness, efficiency and satisfaction. This is particularly fitting since they match the three criteria for cooperation proposed by Gutwin and Greenberg [14].

The four items included in the survey are the following:

1. [This system’s] capabilities meet my requirements.

2. Using [this system] is a frustrating experience.

3. [This system] is easy to use.

4. I have to spend too much time correcting things with [this system].

Strongly disagree and strongly agree are defined as the extremes of all four scales. The bracketed text in the survey items is to be replaced with the name of the system being tested. Similarly to SUS, odd items are scored as (score − 1) while even items are scored as (7 − score). The final UMUX score is then calculated as

U = \frac{100 \sum_{x \in S} x}{24}

where S is the set of all the scores and U the final score, which ranges from 0 to 100, just like SUS scores. All the final scores generated by the surveyed population may then be averaged to generate a mean usability score.
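As a minimal sketch of the scoring described above, the function below converts four 1-to-7 Likert responses into a 0-100 UMUX score; the example responses are hypothetical, not taken from the study.

```python
# Minimal sketch of UMUX scoring as described above.
# responses: four Likert answers (1 = strongly disagree, 7 = strongly agree),
# ordered as items 1-4. Odd items are scored as (response - 1),
# even items as (7 - response); U = 100 * sum(scores) / 24.

def umux_score(responses):
    if len(responses) != 4:
        raise ValueError("UMUX has exactly four items")
    scores = [r - 1 if i % 2 == 1 else 7 - r
              for i, r in enumerate(responses, start=1)]
    return 100 * sum(scores) / 24

# Hypothetical example: one participant's answers to items 1-4.
print(umux_score([6, 2, 7, 3]))  # -> 100 * (5 + 5 + 6 + 4) / 24 ~ 83.3
```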


Method

In the following sections, the method of the study is described. Information regarding the conducted user study can be found in User Study, with details on Participants and Precautions against covid-19 under their respective subsections. An in-depth description of the testing environment can be found in Prototypes. Finally, all tracked metrics of the user study are detailed in the Metrics section.

User Study

To adequately answer the research question, a two-condition, between-subject controlled user study was performed. Participants took part in the study in triplets. Under each condition, one participant out of the group of three took the role of Presenter, with the remaining two assuming the role of Audience.

In the first condition, dubbed Verbal Only (Vo), the first prototype was utilized. This condition was designated as the control group. The Audience was only able to see the mirrored view of the immersed Presenter. In the second condition, the Audience received smartphones to control virtual observers. The point of view of the virtual observer was rendered on the screen of the phone, enabling them to freely move around, point and interact with the virtual environment. Dubbed Verbal + Gaze (Vg), the second condition was the proposed solution to increase Audience and Presenter collaboration, in accordance with the research question.

In the study, the participants were tasked with finding as many objects hidden in the virtual environment as possible, under a time limit of five minutes. This was repeated in six different stages, with increasing difficulty.

The first level of difficulty served as a tutorial. However, the stage was still considered part of the actual experiment, so data was still tracked and included in the final results. Stages were changed when either all targets


were found or time ran out, whichever happened first, with clear indicators for whichever had occurred.

During the trials, information regarding the hidden objects was unevenly divided between the Presenter and the Audience. This enforced unequal access to the information required to complete the task, thus ensuring cooperation between the two parties. In the experiment, targets would become visible to the Audience one by one by pulsing between their original colour and black. Finding the target was the Audience's first task.

The second was to guide the Presenter to select the target, with the next target becoming visible to the Audience after the current one was selected. Selecting the current target was the Presenter's only task. Audience members in the Vo condition could only guide the Presenter verbally, based on what they saw through the mirrored view, while members of the Audience in the Vg condition were also able to indicate the location of the hidden object with both their line-of-sight vector and avatar position in the virtual space. No video or audio were recorded during the trials. The main reasons behind this decision can be found in .

Audience members of the Vg condition also had access to a marking button to mark objects for other Audience members. This function was mainly used to separate task completion times and record error rates of the Audience. Members of the Audience in the control group lacked this option, so a button to be pressed when the Audience spotted the target was used instead to separate task completion times. However, this was deemed a potential vector for covid-19 due to forcing the Audience members to sit closer together and introducing a common area touched by multiple participants.

To remedy this, the researcher conducting the study operated the button, observing the Audience and pressing it when the Audience members showed signs of noticing the current target. This decision was made to cut down on development time and because of the ease of implementing the chosen solution.

Participants

Participants were students and faculty recruited in groups of three. Users formed these triplets at their own discretion. The students were mainly recruited from the course Advanced Graphics and Interaction given at KTH, with some exceptions. Students taking said course were offered extra credits


Figure 2 – Overview of the area where the user study was conducted. Areas of interest in the figure are highlighted as follows: (A) VR interaction area, (B) Audience seats and zone, (C) TV with the mirrored VR view, (D) base stations and (E) router used in the Vg prototype.

in it as compensation for their participation. Participants were required not to be colour blind and to have binocular vision. The users were not pre-screened for basic communication and collaboration skills. In total, 18 participants took part in the study. The age of the members of the user group ranged from 20 to 52, with a mean age of 27.5 and a median of 25. One participant (5.6 %) did not report age or gender. About 16.7 % of the participants identified themselves as female, 77.8 % as male, and 0 % as other. Each triplet of participants was randomly assigned to either the Vo condition or the Vg condition, with an equal number of triplets assigned to Vo and Vg, respectively.

Upon arrival, the participants were read one of the scripts from appendix A detailing the goal of the study, their assigned task, the mechanics of the environment and the tools available to them. They were also instructed that the trial could be paused if desired, such as if the presenter were to develop cyber sickness. Then, the participants were allowed to decide among


(a) Example of a typical Vo session. The Audience members in the background can be seen guiding the immersed Presenter in the foreground. The phone in the hand of the furthest away Audience member is not part of the experiment.

(b) Example of a typical Vg session. Here the Audience members are interacting with both the Presenter through their smartphones and verbally amongst themselves.

Figure 3 – Vo and Vg sessions.

themselves which roles, immersed Presenter or non-immersed Audience, to assume. After picking roles, immersing the Presenter and handing out phones where applicable, the participants were allowed some time to familiarize themselves with the virtual environment. The trials were then initiated after verbal confirmation regarding their readiness. The participants, especially the Presenter, were allowed to take breaks from the trial when desired, mainly to prevent cyber sickness and eye fatigue. Upon completing or retiring from the trial, all participants were asked to complete two surveys, the RTLX and UMUX surveys, extended with a few extra items for more information. These concerned age, gender and prior VR/First Person Shooter (FPS) experience and can be found in appendix B. All Presenters were queried about their prior VR experience, but only Audience members in the Vg condition were asked about their prior FPS experience. Prior VR and FPS experience were tracked to account for skilled participants possibly performing better. The RTLX was administered using the NASA-TLX app for iOS [40].

Precautions against covid-19

Since the user study was carried out in a co-located environment during a global pandemic, measures were taken to reduce any potential vectors of covid-19 for the participants. VR headsets were switched out between trials


the designated area. The area where the study was conducted can be seen in Figures 2 and 3.

Prototypes

Two prototypes were developed in parallel to provide a testing environment for the study. The simplified prototypes can be seen in Figures 4, 5, 7 and 9.

The virtual environment was shared between the two prototypes, consisting of an empty rectangular space with a cylindrical landmark in each corner, to which sets of geometrical shapes were added. The geometrical shapes used in the virtual environment were cubes, spheres, cylinders and tetrahedrons.

These four shapes were chosen because they are easily distinguishable from each other.

The positions and relative sizes of these shapes were sampled from the equation sin(xy)/10 · cos(y/2) + 1.5 and its derivative respectively, where x, y ∈ [−10, 10]. This equation is a cosine function which is simple to control. It also generates a field with numerous hills and valleys over the used intervals, which later served as natural occluders in the levels. The aforementioned function was sampled with a few different step sizes m to generate all stages except the tutorial stage. All objects in each set were also scaled by a fixed value n for every stage, either to avoid objects intersecting in the more crowded sets or to fill space in the less dense ones. The shapes were coloured with different hues to provide more visual diversity. The colours used were yellow, orange, light red, violet and indigo. These colours were chosen to avoid misidentification due to colour blindness, which ended up being a moot point, since participants were in the end required to see all colours, as stated in the Participants section. Saturation was determined by the objects' relative height. Examples of the virtual environment can be seen in Figures 8, 10, 12, 13, 14, 15, 16a and 16b.
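To make the stage-generation procedure above concrete, here is a rough sketch; the exact grouping of the height function, the half-open sampling interval, the shape/colour cycling and all names are assumptions made for illustration, not the prototypes' actual Unity implementation.

```python
# Rough sketch of how a stage's shapes could be sampled, as described above.
# The height function and the way shapes/colours are cycled are assumptions;
# the thesis samples a sine/cosine field over x, y in [-10, 10] with step m
# and scales every object by a fixed factor n per stage.
import itertools
import math

def height(x, y):
    # Assumed reconstruction of the sampled field.
    return math.sin(x * y) / 10 * math.cos(y / 2) + 1.5

def generate_stage(m, n):
    """Sample a grid of shapes with step size m and per-stage scale factor n."""
    shapes = itertools.cycle(["cube", "sphere", "cylinder", "tetrahedron"])
    colours = itertools.cycle(["yellow", "orange", "light red", "violet", "indigo"])
    objects = []
    steps = round(20 / m)                # x, y sampled over [-10, 10) with step m
    for i in range(steps):
        for j in range(steps):
            x, y = -10 + i * m, -10 + j * m
            objects.append({
                "position": (x, height(x, y), y),
                "scale": n,              # the thesis additionally varies relative
                                         # size via the function's derivative
                "shape": next(shapes),
                "colour": next(colours),
            })
    return objects

# Stage 2 values from Table 1: step size m = 2, scale factor n = 1.5
# yield a 10 x 10 grid, matching the 100 objects listed for that stage.
print(len(generate_stage(2, 1.5)))       # -> 100
```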

The Presenter was able to explore this area, with the chief method of locomotion being teleportation. The second tool available to the VR user was a selector. The selector would try to select whatever the user was pointing at with the corresponding VR controller, as indicated by a laser pointer attached


Stage Step Size m Scale Factor n Nr. Objects Hidden Objects Occluder Ratio

1 4 3 25 25 1/24

2 2 1.5 100 30 1/99

3 1 1 400 30 1/399

4 0.5 0.5 1600 30 1/1599

5 0.25 0.25 6400 30 1/6399

Table 1 – Stage generation values and statistics. Occluder Ratio denotes the ratio between the number of occluders and the current hidden object in each stage. The tutorial stage is omitted since it was created manually.

to said controller. Haptic feedback was used when selecting. The former tool was mapped to the left controller's trigger button, while the latter was mapped to the right's. A compass in the VR user's HUD was also available to aid with navigation in the virtual environment. The HUD compass is visible in Figures 8, 10 and 15.

In each stage, a certain number of targets were hidden in plain view. These targets would become visible to everyone except the Presenter, one at a time, by pulsing black. Whenever the Presenter managed to select the current target, the next one would become visible. An audio cue was played when the current target was selected to further emphasize a correct selection.

There was a strict time limit of 5 minutes to find all targets in each stage. If all targets were found before running out of time, a visual cue of exploding confetti was played. If they were not, a notice would appear on the Presenter's HUD indicating that time was up. All remaining targets were also disabled, rendering the participants unable to progress any further in the current stage.

A counter and timer tracking targets and time left respectively were also visible.

Vo Prototype

In the first prototype, a mirrored view of the Presenter's perspective was presented on a big screen for any onlookers to see. The pulsing targets were only highlighted on this screen, as shown in Figure 8. Targets were never highlighted on the VR HUD; they were visible, but not highlighted. The main difference between the two prototypes lay in the affordances created by each setup.


Figure 4 – Simplified example of the Vo condition. Agents are labeled as follows: (A) Presenter, (B) Audience. Notice how the current target is highlighted in black on their screen.

Figure 5 – Simplified example of the Vg condition. Here, the Audience members' observer avatars (C) can be seen. In contrast to Figure 4, the current target is not highlighted on the TV screen, but on the smartphones used by the Audience members.


Figure 6 – Simplified phone view in the Vg condition of the right avatar in Figure 5. As in Figure 4, notice how the current target is highlighted black in this view. Furthermore, the Audience member's own gaze indicator is extended fully and acts as a pointer for its owner.

(a) Overhead view of the simplified Vo example. (b) Overhead view of the simplified Vg example.

Figure 7 – Simplified prototypes, overhead view.


(a) Current target, as seen by the Presenter. The current hidden target is the muted yellow cube in the middle of the figure.

(b) Same current target, as seen by the Vo Audience. Currently black in the pulse cycle highlighting it. The right controller with the laser pointer was displaced during the photo shoot.

(c) Same current target, as seen by the Vo Audience. Currently yellow in the pulse cycle highlighting it.

Figure 8 – Target highlighting.


(a) Vg Presenter scene, simplified. The following agents in the scene are labeled: (A) the Presenter, (B) Audience observers.

(b) Vg Audience scene of Audience member B.1, simplified. Notice how B.1's pointer extends all the way, compared to Figure 9a.

Figure 9 – Simplified Presenter and Audience scenes.


Figure 10 – VR example view. The target is not highlighted in this view.

Vg Prototype

The second prototype was extended with network support for smartphone devices.

The network diagram can be found in Figure 11. These smartphones were used by the Audience in the study, and their screens were the only place where the targets were visible in the second prototype. The smartphone version of the prototype allowed its users to move and look around freely, point with a laser pointer and mark objects for other non-immersed users. Marking objects would cause them to begin pulsing white. If the current target was marked, it would alternate between pulsing black, white and its original colour. Furthermore, an indicator would also turn from red to green if the current target was marked. Figures 6 and 12 show a simplified version of the mobile prototype and an overview of the phone interface, respectively. Audience members were visible to the Presenter and other members of the Audience as a floating eye with a partial laser pointer indicating their gaze, as seen in Figure 15. The gaze indicator had a fixed length, unless said length was longer than the distance from the point of projection to the target; in that case, the shorter distance was used and the indicator essentially became a pointer. A full-length pointer was considered during the pilot development, but was rejected as it was deemed too great an advantage for the Vg condition. The fixed length of the gaze vector was chosen to be long enough to be visible at a distance, but not too long, as shown in Figures 16a and 16b. If it was too long, it enabled Audience members to position themselves high above the stage and easily point out the current target for the Presenter. This behaviour was exhibited during the pilot when a full-length laser pointer was available to the Vg pilot Audience. The alternative of a relative-length gaze indicator was found to be distracting when quickly changing focus between objects in the foreground and background.
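The fixed-length gaze indicator rule described above can be summarized as a simple clamp. The sketch below is an illustrative reconstruction; the length constant and function name are assumptions, not values or code from the prototype.

```python
# Illustrative reconstruction of the gaze-indicator rule described above:
# the indicator has a fixed length, unless the ray from the observer hits an
# object closer than that, in which case it is shortened to the hit distance
# and effectively becomes a pointer (Figure 16b).

FIXED_LENGTH = 3.0  # metres; assumed value, "long enough to be visible at a distance"

def gaze_indicator_length(hit_distance):
    """hit_distance: distance from the observer to the first object hit by the
    gaze ray, or None if nothing is hit within the fixed length."""
    if hit_distance is None:
        return FIXED_LENGTH                  # free gaze: fixed-length indicator
    return min(hit_distance, FIXED_LENGTH)   # near an object: acts as a pointer

print(gaze_indicator_length(None))   # 3.0
print(gaze_indicator_length(1.2))    # 1.2 (pointer behaviour)
```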

(33)

Figure 11 – Network diagram of the Local Area Network (LAN).

Figure 12 – Mobile view. The circles on the periphery (A, B) of the view provide the controls for the audience view. The left circle (A) controls position and the right circle (B) controls orientation. To move vertically, the user swipes up and down anywhere on the screen; swiping down moves the view up and swiping up moves it down. Further points of interest are: (C) the marking button, (D) indicator visualizing whether the correct target has been marked, (E) timer, (F) score tracker and (G) laser pointer.

(34)

participants, with the focus on general usability and other pre-identified topics.

This process was repeated until the prototypes were deemed good enough for the purpose by the participants, as indicated by the post-session reviews. The development was done with the Unity game engine [41], utilizing Steam's SteamVR software development kit plugin [42]. An HTC Vive VR hardware set with extra headsets was used in the study, in conjunction with Sony Xperia Z5 (model E6653) smartphones and a full HD 1080x1920 resolution TV screen.

Figure 13 – Mobile view, different perspective. The Presenter avatar can be seen near the green corner landmark.

(35)

Figure 14 – Mobile view of Presenter avatar.

Figure 15 –VRview of Audience avatar.

(36)

(a) Gaze indicator. The length used was chosen to make the indicator visible at a distance and from multiple angles, but without turning it into a pseudo-pointer at greater distances.

(b) Gaze indicator. When close enough to an object, the indicator functions as a pointer.

Figure 16 – Gaze indicator.

(37)

Metrics

For this study, the captured metrics of effectiveness, efficiency and satisfaction estimated the level of collaboration between Presenter and Audience. To do so, the following data points were collected:

Effectiveness

Task completion, score and error rate were used to determine the participants' effectiveness. Two different task completion rates were tracked: one for whether or not all hidden objects were found in a stage, and another for how often the Vg Audience managed to mark the current target before the Presenter selected it. The score tracked the number of targets found in each stage, and the error rates recorded erroneous selections and markings. Score tracked the effectiveness of each group of participants and was a global metric for all groups and all conditions. The error rates, on the other hand, tracked the effectiveness of the Presenters in both conditions and of the Audience in the Vg condition.

The error rates track the action of selecting for the Presenter and marking for the Vg Audience. Selecting is the Presenter action of pointing at a target with the controller and clicking the button on the controller. The number of selections made before the correct target was selected was tracked in each round of every stage. Marking is the Vg Audience action of pointing at targets to highlight them for other Audience members. The Presenter does not share the view of the marked objects, only the general direction of gaze from the audience avatar, as seen in Figures 16a, 16b, 9a and 9b. The error rates and Vg Audience completion rate were averaged over all stages.

Efficiency

Task completion times and subjective workload, as estimated by the RTLX, determined the participants' efficiency. Task completion times were measured from the point where the current target became visible to when it was selected by the Presenter. These times were further divided into the time for the Audience to find the target and the time for them to guide the Presenter to it and select it. The task completion times were then averaged for each stage. For cases where all targets were found in the stage, the total time for that stage was recorded. When a stage was left uncompleted, the maximum time of five minutes was used instead. The users' subjective workload was determined using the unweighted RTLX version of the NASA-TLX. The total sum of all


Satisfaction

The score from the UMUX survey was used to infer the participants' satisfaction. While the survey does not tackle satisfaction directly, it was developed with it in mind, in accordance with the ISO 9241-11 standard [39].

Z-testing

Z-testing [43] was carried out at α = 0.05 where applicable, with unbiased standard deviations computed for both conditions' relevant data points. While the user study ended with fewer participants than desired, p-values were derived from the respective z-scores calculated with the standard deviations of Vo. The z-scores were determined by the following formula:

z = \frac{\mu_g - \mu_o}{s_o / \sqrt{n_g}}

Here z denotes the z-score, \mu_g and \mu_o the sample means, s_o the Vo sample standard deviation, and n_g the total number of Vg data points.
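A minimal sketch of this test, assuming the reported p-values are two-sided (an assumption; the report does not state this explicitly, although a two-sided test reproduces the order of magnitude of the p-values in Table 2). The example values are the stage 5 scores from Table 2.

```python
# Minimal sketch of the z-test described above: the Vg mean is compared to the
# Vo mean using the Vo standard deviation and the number of Vg data points.
# A two-sided p-value is assumed here.
import math
from statistics import NormalDist

def z_test(mu_g, mu_o, s_o, n_g, alpha=0.05):
    z = (mu_g - mu_o) / (s_o / math.sqrt(n_g))
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p, p < alpha

# Stage 5 scores from Table 2: Vg mean 3.333, Vo mean 4.667,
# Vo standard deviation 0.577, n_g = 3 Vg groups.
z, p, significant = z_test(3.333, 4.667, 0.577, 3)
print(f"z = {z:.2f}, p = {p:.2e}, significant: {significant}")
# -> z ~ -4.00, p ~ 6e-05, in line with the 6.33E-05 reported in Table 2
```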


Results

In the sections below, the results for determining the users' effectiveness, efficiency and satisfaction are presented in order. Under the Effectiveness section, score, error rate and task completion are detailed. Task completion times and subjective workloads can be found in the Efficiency section. Finally, UMUX scores are detailed in the Satisfaction section and the individual statistics of each group are presented under Individual Results.

Effectiveness

Score

In Figure 17 it is possible to see that Vo groups generally achieved better scores than their counterparts in all stages but stages 3 and 4. In the former, Vg groups scored 1 point more on average, but this change is not statistically significant, as shown in Table 2. In the latter, the difference between the two conditions was significant. Here, the Vg groups' average score was almost double that of the Vo groups, at 12.33 compared to 6.67 respectively.

By comparison, the difference between the two conditions' medians is only 3, a bit more than half of the difference between the averages. This suggests that the disparity in stage 4 might not be as large as the averages indicate.

Finally, the only stage other than stage 4 with a significant p-value was the final stage. Here the trend of Vo groups scoring better than the Vg groups resumed. Moreover, the median delta is also closer to the average delta, lending further credence to there being a significant difference between the two populations.


Figure 17 – Mean score per stage with standard deviations. The values with a standard deviation of 0 were caused by all teams in that condition achieving the same final score by completing the stage.

Figure 18 – Scores normalized by the Occluder Ratio of each stage. Notice how similar the Vg stage 4 and 5 scores get after normalization. This is most likely due to happenstance, as even shifting the score by 1 will result in a wildly different normalized score when using stage 5's Occluder Ratio.


Figure 19 – Individual scores of each session for every stage. The highest scores attainable are 8 and 25 for the Tutorial and stage 1 respectively, and 30 for the rest. Note how one Vo team was very close to completing stage 2, falling short by only 3 points. Individual trials are numbered.

Stage Tutorial 1 2 3 4 5

Vo mean 8 25 23.667 16.667 6.667 4.667

Vg mean 8 21.667 21.333 17.667 12.333 3.333

Vo median 8 25 24 16 7 5

Vg median 8 22 19 16 10 4

Vo standard deviation 0 0 3.512 2.082 0.577 0.577
Vg standard deviation 0 3.512 7.767 6.658 5.859 1.155

P-value - - 0.25 0.405 0 6.33E-05

Table 2 – Table of average and median scores, together with standard deviations and p-values.


Figure 20 – Average selection and marking error rate for Presenters and Vg Audience members, respectively.

Selection Error Rate

The average selection error rate for the Presenters of both conditions may seem somewhat inconsistent at first glance, as Figure 20 demonstrates. However, the trends of the averages suggest that Presenters experienced a bump in the difficulty of selecting the correct target in stage 3 for the Vo condition and stage 4 for the Vg condition. Moreover, the p-values in Table 3 support that there is a statistically significant difference between the selection error rates for half of the stages, namely stages 1, 2 and 5. On that note, the elephant in the figure needs to be addressed. The sudden sharp increase in error rate for the Vg crowd on stage 5 could possibly be because of an outlier, so said outlier was removed for comparison and can be seen as the third series for the final stage. The outlier was the group labeled 5 in Figure 21. The corrected error rate still shows a significant increase compared to the earlier stages, indicating that the presumed outlier might be more accurate than initially thought.


Figure 21 – Individual selection error rates from each session for every stage. Individual trials are numbered.

Stage Tutorial 1 2 3 4 5

Vo mean 0.375 0.04 0.268 0.22 0.3 0.214

Vg mean 0.125 0.138 0.078 0.226 0.189 1.6

Vo median 0 0 0 0 0 0

Vg median 0 0 0 0 0 1.5

Vo standard deviation 1.013 0.197 0.608 0.465 0.733 0.579
Vg standard deviation 0.338 0.583 0.324 0.505 0.518 1.430

P-value 0.227 5.73E-05 0.013 0.920 0.358 3.73E-14

Table 3 – Table of average and median selection error rates, together with standard deviations and p-values. Since there was an even number of data points for the Vg stage 5 error rate, the two middle values were averaged, resulting in a median of 1.5.


condition, Vo, the error rates curiously share a similar trend across the stages.

Stage Tutorial 1 2 3 4 5

Vg mean 0.188 0.086 0.211 0.085 0.297 0.150

Vg median 0 0 0 0 0 0

Vg standard deviation 0.704 0.343 0.790 0.341 1.131 0.366

Table 4 – The average and median marking error rates, together with standard deviations, for the Vg Audiences.

Task Completion Rate

Stage Completion Rate

As Tables 5 and 6 show, only the first three stages, tutorial included, were ever completed by the participants. The last of these three, stage 2, was only completed once, by a Vg team. The same team was also the only one to finish stage 1 in the Vg condition, which can be seen in Table 6. It should also be noted that not only did all teams from the Vo condition complete stage 1, but one of these teams was also close to finishing stage 2, finding all but three hidden objects in that stage. This can be concluded from Figure 19. The stage completion rate for the earlier stages further supports that the Vo participants performed better than their Vg counterparts in those stages; however, although the Vo groups achieved better scores than the Vg groups in stage 2, only one group from the latter condition managed to actually complete it.

Stage Tutorial 1 2 3 4 5

Vo 1 1 0 0 0 0

Vg 1 0.333 0.333 0 0 0

Table 5 – Stage completion rates. 1 means all teams of that condition completed the stage, while 0 means none did.


Condition Trial Tutorial Stage 1 Stage 2 Stage 3 Stage 4 Stage 5

Vo 1 Yes Yes No No No No

Vo 3 Yes Yes No No No No

Vo 6 Yes Yes No No No No

Vg 2 Yes Yes Yes No No No

Vg 4 Yes No No No No No

Vg 5 Yes No No No No No

Table 6 – Individual stage completion results.

Audience Subtask Completion Rate

The subtask referred to here is the first of the Vg Audience's two aforementioned subtasks, namely finding and marking the current hidden object. In Figure 22 and Table 7 it is possible to see the completion rate of this subtask across the different stages. The lowest completion rate being recorded in the tutorial stage might be attributable to the Vg prototype's learning curve. Yet when the individual completion rates from Figure 23 are taken into consideration, it is clear that the last group in Vg is an outlier. This group completed the first subtask only once during the whole trial, and subsequently lowered the final results. The completion rates without this outlier can also be seen in Figure 22, and as expected, the results there are uniformly higher.

Stage Tutorial 1 2 3 4 5

Vg subtask completion rate 0.292 0.477 0.563 0.547 0.351 0.6
Sans outlier 0.438 0.66 0.8 0.784 0.448 0.833
Vg standard deviation 0.505 0.392 0.458 0.461 0.264 0.382

Table 7 – Vg Audiences' subtask completion rates with accompanying standard deviations.


Figure 22 – The average completion rate of the first subtask of the Vg Audiences.

Figure 23 – Individual Vg Audience subtask completion rates for each stage. Here it is clear one team was an extreme outlier compared to the other two. Individual trials are numbered.


Efficiency

Task Completion Time

Stage Completion Time

Generally, the Vo participants completed the first two stages significantly quicker than their Vg counterparts, as Figures 24 and 25 show. This is further supported by the p-values found in Table 8 for these stages at the α = 0.05 level. This correlates with the results visible in Figures 26 and 27, and Table 9, where main task completion times are shown. Lastly, while the Vg groups on average finished stage 2 quicker than the Vo groups, this is not indicative of the whole truth, as the median values for both conditions are equal. It is also not possible to calculate the p-value for this stage, since there was no variance in the Vo condition.

Figure 24 – The mean stage completion times for the first three stages. The remaining stages were left out since they were never completed during the user study.


Figure 25 – Individual Stage completion times for all sessions. Individual trials are numbered.

Stage Tutorial 1 2 3 4 5

Vo mean 84.793 211.334 300 300 300 300

Vg mean 149.962 296.935 295.139 300 300 300

Vo median 79.687 211.345 300 300 300 300

Vg median 144.786 300 300 300 300 300

Vo standard deviation 13.031 63.987 0 0 0 0

Vg standard deviation 43.967 5.309 8.419 0 0 0

P-value 0 0.02 - - - -

Table 8 – Average and median stage completion times with standard deviations and p-values per stage. A completion time of 300 implies that the group or groups did not finish the stage.


Figure 26 – Average completion times of the main task, finding and selecting the current hidden target.

Figure 27 – Individual averages of the main task completion times for each session per stage. Individual trials are numbered.


In several of the later stages, the variation within the samples was large enough for the differences to not be statistically significant at the α = 0.05 level. In the end, the Vg participants only managed to be faster on average in two stages, with only one of these differences in mean main task completion time being statistically significant. The results here also correlate inversely with the two conditions' average scores from Figure 17.

Stage                    Tutorial  1       2       3       4         5
Vo mean                  10.599    8.453   12.302  17.811  40.731    54.606
Vg mean                  18.745    13.591  13.295  16.779  21.714    45.246
Vo median                8.559     6.984   11.1    15.584  26.334    49.516
Vg median                17.065    12.918  10.685  15.571  18.815    35.5
Vo standard deviation    6.712     4.965   6.680   9.516   27.682    34.582
Vg standard deviation    8.789     6.742   8.241   11.738  12.699    25.491
P-value                  2.76E-09  0       0.234   0.430   2.93E-05  0.392

Table 9 – The average and median completion times of the main task of finding and selecting a hidden target, accompanied by standard deviations and p-values.

Subtask Completion Time

While not entirely obvious, it should be noted that the mean subtask completion time on the final stage for the Vg condition in Figure 29 differs from the average main task completion time for that condition and stage in Figure 26. This is most likely explained by the way subtask completion times were calculated for this condition. When computing these averages, completed tasks where the Audience failed to mark the target were dropped, since no point in time existed at which to split the main task completion time into subtask completion times. Therefore, the samples from which the subtask completion time averages were computed were subsets of those used for the averages in Figure 26. Other than that, the trends from Figures 28 and 29 correlate well with those in Figure 26.
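
The filtering described above can be illustrated with a small sketch using made-up numbers: only tasks with a recorded marking time can be split into the two Vg subtasks, so unmarked tasks are dropped before averaging.

```python
# Hypothetical Vg samples: (total main-task time in s, time of the Audience's
# mark in s, or None if the target was never marked).
tasks = [(21.4, 9.1), (35.0, None), (17.3, 6.8)]

# Drop tasks without a mark: there is no point at which to split them.
marked = [(total, mark) for total, mark in tasks if mark is not None]

find_times = [mark for _, mark in marked]                # Audience: find & mark
select_times = [total - mark for total, mark in marked]  # Presenter: select

print(round(sum(find_times) / len(find_times), 2))      # 7.95
print(round(sum(select_times) / len(select_times), 2))  # 11.4

# The unmarked 35.0 s task still counts towards the Figure 26 main-task
# average, which is why the two averages can diverge.
```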


Figure 28 – The average Vo completion times for the two subtasks of finding and selecting the current hidden target.

Figure 29 – The average Vg completion times for the two subtasks of finding and selecting the current hidden target.


Judging by the p-values, there were statistically significant differences between the two conditions' Presenter subtask completion times in most stages. These can be found in Table 10. According to the Audiences' p-values in Table 11, the same was true for the first three stages. Nevertheless, the disparity of these results should not be taken as definitive proof either, since the subtask completion times were split differently under the two conditions. In the baseline condition, Vo, task times were split manually by the researcher overseeing the experiment, while Vg subtask completion times were measured from the point where an Audience member marked the current target. Thus, as mentioned in the relevant figure descriptions, the two conditions' results should not be compared directly. They are nevertheless presented together here to avoid increasing the page count of an already lengthy report any further.

Stage                    Tutorial  1      2         3       4         5
Vo mean                  4.042     3.696  6.907     10.958  15.870    14.361
Vg mean                  5.988     5.327  3.415     7.010   5.311     5.378
Vo median                2.6       2.575  4.934     8.867   15.775    15.01
Vg median                0.811     2.938  2.194     4.553   2.510     2.320
Vo standard deviation    3.593     2.698  5.912     7.152   6.013     8.232
Vg standard deviation    9.358     4.766  3.663     11.415  6.228     7.390
P-value                  0.152     0.001  3.95E-04  0.003   2.43E-10  0.015

Table 10 – Presenter subtask completion times. While they are presented together, the completion times of the two conditions should not be directly compared due to them being measured differently.

Stage                    Tutorial  1         2         3      4       5
Vo mean                  6.570     4.782     5.046     7.011  24.861  39.135
Vg mean                  12.489    7.824     8.447     8.867  15.758  22.153
Vo median                4.492     3.8       4.6       5.267  12.06   21.584
Vg median                9.359     6.107     6.562     7.517  13.188  17.359
Vo standard deviation    6.282     3.374     3.272     5.611  29.538  39.457
Vg standard deviation    9.078     4.748     5.952     4.081  10.809  15.035
P-value                  0.013     5.21E-07  4.47E-10  0.075  0.266   0.292

Table 11 – Audience subtask completion times. The conditions should not be compared directly since different measurement methods were used.


Figure 30 – Individual Presenter subtask completion times per stage. While presented together, the two conditions’ results should be analyzed separately due to their differing measuring methods. Individual trials are numbered.

Figure 31 – Individual Audience subtask completion times per stage. Again, the two conditions’ results should not be directly compared since different methods of measurement were used. Individual trials are numbered.


As Figure 32 shows, the Vg Presenters on average reported a noticeably lower subjective workload than their Vo counterparts, while the Audience members of the two conditions reported similar workloads, something reflected by the means and p-values of Table 12.

Figure 32 – The mean subjective workload of the Presenters and the Audience members respectively.

Role                     Presenter  Audience
Vo mean                  43.333     41.389
Vg mean                  28.333     39.861
Vo median                40.833     45.833
Vg median                22.5       42.917
Vo standard deviation    12.276     14.07
Vg standard deviation    13.097     12.499
P-value                  0.034      0.79

Table 12 – Average and median workloads as reported by the participants with derived standard deviations and p-values.
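
For context, the workload figures above come from NASA-TLX questionnaires; a raw (unweighted) TLX score of the kind reported here is commonly computed as the plain mean of the six subscale ratings. The sketch below uses invented ratings, not any participant's answers.

```python
# Invented subscale ratings on a 0-100 scale; the RTLX score is their mean.
subscales = {
    "mental demand": 55,
    "physical demand": 15,
    "temporal demand": 40,
    "performance": 35,
    "effort": 50,
    "frustration": 30,
}

rtlx = sum(subscales.values()) / len(subscales)
print(rtlx)  # 37.5
```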


Satisfaction

On average, Presenters from the Vo and Vg conditions gave the two applications final scores of 84.72 and 86.11 on the UMUX survey, respectively. The difference between the averages of the two conditions was greater on the Audience side, however. Audience members in the Vo condition scored their prototype 64.17 on average, while those in the Vg condition scored theirs 56.25. This can be seen in Figure 33. The differences between the two conditions are not statistically significant at the α = 0.05 level, according to the p-value for the Audience found in Table 13. Nevertheless, the table's median values indicate that Vg Audience members tended to rate their prototype lower than their counterparts, but it is not possible with the given data to say with certainty that this was not caused by random noise in the sample population.

Figure 33 – The averaged UMUX scores the two prototypes received from the participants.


Role                     Presenter  Audience
Vg standard deviation    10.486     15.081
P-value                  0.759      0.214

Table 13 – Mean UMUX scores, median UMUX scores, standard deviations and p-values.
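
Similarly, the UMUX scores above come from the four-item UMUX questionnaire. Assuming the standard seven-point version, where the odd items are positively and the even items negatively worded, a single respondent's score is usually computed as in the sketch below, here with invented answers.

```python
# Hypothetical answers to UMUX items 1-4 on a 1-7 scale.
responses = [6, 2, 7, 3]

# Odd items (index 0, 2) contribute (answer - 1) points; even items
# (index 1, 3) contribute (7 - answer); the 0-24 total is rescaled to 0-100.
points = sum((r - 1) if i % 2 == 0 else (7 - r) for i, r in enumerate(responses))
umux_score = points / 24 * 100
print(round(umux_score, 2))  # 83.33
```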

Individual Results

The individual statistics for each group in both conditions are shown in Figures 19, 21, 23, 25, 27, 30 and 31, and in Table 14. When analyzing the individual results per stage as a function of self-estimated experience, there is no clear correlation, with one major exception. Curiously, the Vo group that achieved the best result also had the Presenter with the lowest self-rated VR experience out of all Presenters across both conditions. No similar trend is present in the data for their Vg counterparts. Yet if subjective workload and UMUX scores are considered, another pattern appears.

Condition                     Vo      Vo      Vo      Vg      Vg      Vg
Trial                         1       3       6       2       4       5
VR experience                 2       3       1       5       3       3
Presenter workload            40.833  56.667  32.5    19.167  22.5    43.333
Presenter UMUX score          79.167  95.833  79.167  75      95.833  87.5
Mean Audience FPS experience  -       -       -       5.5     5.5     6
Mean Audience workload        50.417  49.583  24.167  25      50.417  44.167
Mean Audience UMUX score      81.25   41.667  75      72.917  52.083  43.75

Table 14 – Individual trial data on experience, workload and UMUX score.

Groups across both conditions with better performance in all tracked data points except error rates generally rated their workload as lower, and gave the prototypes higher UMUX scores. This was consistent for both Presenters and Audience members, particularly regarding the RTLX scores. It held less for the Presenters' UMUX scores, whereas the Audiences' UMUX scores correlated better with the results. This could be argued as a point against using the UMUX survey to infer satisfaction, but it should be remembered that the Presenters generally scored the prototypes high, compared to the Audiences, whose scores were lower and more varied. Since Audience members of better performing groups generally rated the usability higher, it seems plausible that they were also more satisfied with their results, satisfaction being one of the key factors the survey was designed to capture. The generally higher usability scores, and therefore satisfaction, match Gutwin and Greenberg's claims of higher satisfaction when greater effectiveness and efficiency are attained [14].

Finally, that lower subjective workload seemingly went hand in hand with better performance could be explained by those participants finding the given task easier than those who rated their workload as higher.
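
One way to put a number on the pattern described above, shown here only as an illustration and not as an analysis performed in the study, is a rank correlation between the per-trial mean Audience workload and UMUX values of Table 14.

```python
# Per-trial mean Audience values transcribed from Table 14 (trials 1, 3, 6, 2, 4, 5).
from scipy import stats

audience_workload = [50.417, 49.583, 24.167, 25.0, 50.417, 44.167]
audience_umux = [81.25, 41.667, 75.0, 72.917, 52.083, 43.75]

# A negative rho would indicate that lower reported workload tends to go
# together with higher usability ratings.
rho, p = stats.spearmanr(audience_workload, audience_umux)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```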
