Reaching out to grasp in Virtual Reality
A qualitative usability evaluation of interaction techniques for selection and manipulation in a VR game
MIKAEL ERIKSSON
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION
Sträck ut och ta tag i virtuell verklighet
En kvalitativ användbarhetsstudie av interaktionstekniker för val och manipulering i ett VR-spel
Mikael Eriksson mikaele3@kth.se
Computer Science Master of Science in Computer Science
Supervisor Björn Thuresson
Examiner Danica Kragic
Employer Resolution Games
Supervisor at Resolution Games Carl-Arvid Ewerbring
2016-06-22
Abstract
A new wave of VR headsets is being developed for the commercial market, with examples such as the HTC Vive, the Oculus Rift and Playstation VR. These headsets arrive together with a new generation of hand motion controllers which allow the users to reach out and grasp in the virtual environment. Earlier research has explored a range of possible interaction techniques for immersive VR interaction, mainly focusing on the quantitative and objective performance of each technique. Yet even with this research, picking the right technique for a given scenario remains a challenging task. This study tries to complement earlier research by instead investigating the qualitative and more subjective aspects of usability, while making use of the upcoming commercial VR hand controllers. The purpose was to provide guidelines to help future immersive VR interaction designers and researchers. Two interaction techniques (classic GoGo and ray casting with a reel) were chosen to represent the two most commonly used interaction metaphors for selection and manipulation, i.e. grabbing and pointing. Eleven users were then recruited to try the two interaction techniques inside a shopping scene originally part of a commercial VR game. Each user had to complete five tasks with each technique while “thinking aloud”, followed by an interview after the test. The sessions were recorded and analysed based on five usability factors. The results indicated a strong preference for the GoGo interaction technique, with arguments based on how natural its interaction was. These results confirmed several conclusions drawn in earlier research about interaction in immersive VR, including the strength of natural interaction in scenarios which have the capacity to reach a high grade of naturalism, as well as the importance of showing the user when the interaction technique differs from realistic behaviour. Last but not least, the results also pointed to the importance of further study of immersive VR interaction techniques over long-term use and when combined with user interfaces.
Sammanfattning
En ny våg av VR-hjälmar håller på att utvecklas för den kommersiella marknaden, med exempel såsom HTC Vive, Oculus Rift och Playstation VR. Dessa VR-hjälmar kommer tillsammans med en ny generation av rörelsekänsliga handkontroller som tillåter användarna i den virtuella miljön att nå ut och greppa tag. Tidigare forskning har utforskat en mängd möjliga interaktionstekniker för immersiv VR-interaktion, med fokus på de kvantitativa och objektiva faktorerna för varje teknik.
Trots denna forskning så är valet av interaktionsteknik för ett givet VR-scenario fortfarande en svår uppgift. Denna studie försöker komplettera tidigare forskning genom att granska de mer subjektiva och kvalitativa aspekterna av användbarhet, samtidigt som den nya generationen av handkontroller för VR används. Syftet med studien var att framställa rekommendationer för att underlätta för framtida interaktionsdesigners och forskare inom VR. Två interaktionstekniker (klassisk GoGo samt strålkastning med fiskerulle) valdes ut för att representera de två mest använda interaktionsmetaforerna för val och manipulering, det vill säga att greppa och att peka. Elva användare rekryterades för att pröva de två interaktionsteknikerna, inom ramen för ett shoppingscenario som ursprungligen ingick i ett kommersiellt VR-spel. Varje användare ombads att utföra fem uppgifter med varje teknik samtidigt som de “tänkte högt”, vilket följdes av en avslutande intervju. Sessionerna spelades in och analyserades utifrån fem användbarhetsfaktorer. Resultaten visade att användarna föredrog GoGo, på grund av att dess interaktion ansågs vara mer naturlig. Resultaten bekräftade även ett flertal slutsatser från tidigare forskning kring interaktionstekniker för VR, såsom styrkan i naturlig interaktion i situationer som har kapacitet att nå en hög grad av realism och vikten av att visa användarna när interaktionstekniken bryter mot ett realistiskt beteende. Sist men inte minst visade resultaten även på vikten av framtida studier, dels gällande användning av interaktionstekniker över en längre tid och dels gällande hur dessa interaktionstekniker ska kombineras med användargränssnitt.
1.1.1 Objective
1.1.2 Delimitations
1.2 Key concepts
1.2.1 Interaction Technique
1.2.2 Immersion
1.3 Word List
1.4 Ethical Aspects
2 Background
2.1 Virtual Reality
2.1.1 The History of Virtual Reality
2.1.2 Applications of Virtual Reality
2.1.3 Virtual Reality Hardware
2.1.4 Virtual Reality Input
2.2 Usability
2.2.1 Usefulness
2.2.2 Efficiency
2.2.3 Effectiveness
2.2.4 Learnability
2.2.5 Satisfaction
3 Related Work
3.1 Interaction in VR
3.1.1 Proprioception
3.1.2 Control-Display Ratio
3.1.3 Interaction Spaces
3.2 Selection
3.2.1 Ray Based Selection
3.2.2 Virtual Hand Selection
3.2.3 Selection Challenges
3.3 Manipulation
3.3.1 GoGo Interaction Techniques
3.3.2 Ray Casting Interaction Techniques
3.3.3 Hybrid Techniques
3.3.4 Two Hand Manipulation
3.4 Naturalism and 3D Interaction
3.4.1 Traditional 2D Versus Natural 3D Interaction
3.4.2 Natural Interaction for Video Games
3.4.3 Is Natural Interaction always Optimal?
4 Method
4.1 Implementation
4.2 Tools
4.2.1 Unity
4.2.2 Oculus Rift
4.2.3 Oculus Touch
4.3 Preparation for User Tests
4.3.1 Test Plan
4.3.2 Pilot Test
4.3.3 Execution of User Tests
4.3.4 Instructions for Users
4.4 Analysis of Data
5 Result
5.1 Task Completion: Usefulness
5.2 Task Time: Efficiency
5.3 Interaction Technique Effectiveness
5.3.1 Selection
5.3.2 Rotation
5.3.3 Translation
5.4 Learnability
5.5 Satisfaction
6 Discussion
6.1 Achieved Usefulness
6.2 No Efficiency Difference
6.3 The GoGo Interaction Technique
6.3.1 Selecting with GoGo: Moving Your Virtual Hand
6.3.2 GoGo and Menu Interaction
6.3.3 GoGo CD Ratio Threshold Problem
6.3.4 The Satisfaction of Natural Interaction
6.3.5 Learning GoGo
6.4 The Ray Based Interaction Technique
6.4.1 Selection with Ray
6.4.2 The Lever-Arm Rotation Problem
6.4.3 Sensitive Joystick
6.4.4 Ray Less Intuitive and Immersive
6.4.5 Menu Interaction with Ray
6.5 Method Discussion
6.5.1 VR Novelty Effect
6.5.2 Supplemental Quantitative Measurements
6.6 GoGo versus Ray
6.7 Conclusions and Future Work
6.7.1 Pick Natural Interaction in Situations which Allow it
6.7.2 Not Keeping it Real? Show it!
6.7.3 User Interfaces are also Part of the Interaction Technique
1 Introduction
Computers serve a central role in today's society, but our screens and our interaction with the digital world are still mostly two-dimensional (2D). Virtual Reality (VR) aims to change that, providing the user with a three-dimensional (3D) immersive experience. The main purpose of VR is to provide the user with the illusion of being somewhere else, fooling the user's senses into believing that they actually are inside a virtual environment. In recent years a range of VR technology has been developed and prepared for release to the commercial market, including head-mounted displays (HMDs) such as the Oculus Rift, the HTC Vive and Playstation VR, all coming in 2016. Since VR offers an additional dimension compared to the classical computer setup, it opens up new possibilities for how interaction with the digital world could be carried out.
Indications of 3D interaction in research can be found as early as the 1960s, but the field started to really take shape in the 1990s. An early foundation for structuring research around interaction techniques in immersive virtual environments can be found in the work of Poupyrev et al. (1997) and Bowman & Hodges (1999), who present formal frameworks for developing and evaluating interaction methods for virtual environments. One problem with immersive VR interaction has been the multitude of hardware setups, making it hard for researchers to draw conclusions that generalize beyond the specific setup. With the dawn of commercial 3D game motion controllers, starting with Nintendo's Wii Remote and followed by the Playstation Move and the Microsoft Kinect, new common hardware arose for 3D interaction research. These controllers offered cheap hardware that was widely available to the general public, and input with more degrees of freedom (DoF) than the classical 2D mouse and keyboard setup.
Now, together with the new VR HMDs to be released to the commercial market, a second generation of consumer motion controllers is being produced. They will offer new research opportunities within immersive VR interaction, especially since they are designed from the start to be part of an immersive VR experience together with the HMDs. These controllers aim to give the user virtual hands, allowing the user to stretch out and interact with the virtual world. At first thought, the obvious way to design interaction with these controllers is to mimic real hands, striving for a high grade of naturalism. Yet VR is not bound by physical laws, and allows the design of “hypernatural” or “magical” interaction techniques which can overcome limitations of natural interaction, such as reaching beyond arm's length. This means that as a VR interaction designer, you must take care in choosing which grade of naturalism your interaction technique should strive for to achieve a good user experience (Bowman et al., 2012).
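The hypernatural reach mentioned above can be illustrated with the arm-extension mapping of the classic GoGo technique: within a threshold distance the virtual hand follows the real hand one-to-one, and beyond it the reach grows non-linearly. The following Python sketch is a minimal illustration of that idea only; the threshold and gain values are arbitrary examples, not values taken from this study or from any specific implementation.

```python
# Sketch of a GoGo-style arm-extension mapping. Within a threshold
# distance d_threshold the virtual hand mirrors the real hand 1:1
# (natural zone); beyond it, extra reach grows quadratically
# (hypernatural zone). d_threshold and k are illustrative tuning
# parameters, not values from the study.

def gogo_reach(real_dist: float, d_threshold: float = 0.4, k: float = 6.0) -> float:
    """Map the real hand's distance from the body (metres) to the
    virtual hand's distance."""
    if real_dist < d_threshold:
        return real_dist                      # natural 1:1 zone
    extra = real_dist - d_threshold
    return real_dist + k * extra * extra      # non-linear "magic" zone

print(gogo_reach(0.3))   # inside the threshold: unchanged
print(gogo_reach(0.7))   # beyond the threshold: extended reach
```

The quadratic term is what lets a seated user grab objects far outside physical arm's length while keeping near-field manipulation precise.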
To simplify the process of designing interaction for immersive VR and to create a solid hardware-independent framework, research has focused on three universal interaction tasks: selection, manipulation and navigation (Bowman & Hodges, 1999; Bowman et al., 2001). Selection and manipulation are commonly regarded as a pair, since they are similar in nature and face similar design challenges. A range of interaction techniques intended for selection and manipulation has been developed and tested, but often in an isolated, specially built test environment with a particular hardware setup. One approach to testing interaction techniques has been to establish so-called testbeds (Bowman & Hodges, 1999), which serve as independent tests that can quantitatively compare interaction techniques against each other without an inherent bias towards certain techniques. These test scenarios form a stable base of knowledge with quantitative data, but they are far from what an end user of a VR application will see today, using bare-bones graphics and interaction tasks without a context.
To complement and extend this existing base of knowledge, this study will evaluate the usability of the two most common interaction technique metaphors for selection and manipulation, grabbing and pointing, from a qualitative perspective.
Further, the test scenario will be based on a real VR game scene (part of a game planned for release in the first quarter of 2016), to bring the test scenario closer to an actual end user experience. The game in its original form only used the orientation of the user's head and one button as input; this will be changed to instead allow the user to have virtual hands. The hardware used will consist of hand controllers which allow 6 DoF and which are constructed specifically for immersive VR to represent the user's hands. The study thereby uses a hardware setup which may become common if VR reaches mainstream consumer use.
1.1 Research Question
The overall objective of this study can be formulated in the following research question:
How should you design and implement hand based interaction techniques to achieve a high grade of usability for immersive VR interaction?
To limit the span of this research question, only interaction consisting of selection and manipulation will be included in the study; navigation will not be considered. Further, from each of the two most common interaction metaphors for immersive VR, grabbing and pointing, a representative technique will be chosen for study (this choice is further detailed in section 4.1.1). The purpose of answering this question is to extend the existing knowledge about interaction techniques for immersive VR, providing new knowledge in the form of:
● Evaluating the usability of the two most common interaction techniques for selection and manipulation from a qualitative perspective
● Using a test scenario close to what a real end user would experience
● Utilising the new VR motion input controllers specially designed to provide the user with virtual hands
This usability evaluation will then be able to provide further insights into how developers should design interaction techniques for immersive VR and add to the growing research knowledge base on immersive VR interaction, by providing a qualitative, user-focused perspective on modern VR technology.
1.1.1 Objective
To answer the research question of this study two interaction techniques (pointing and grabbing) will be implemented, and then tested in a scene from a real VR game. The game scenario is a shopping scene, where the user will need to select and manipulate objects to complete purchases and investigate wares. The two interaction techniques will then be evaluated by user testing them inside the shopping scenario, measuring the usability of each individual interaction technique according to these five aspects:
○ Usefulness
○ Efficiency
○ Effectiveness
○ Satisfaction
○ Learnability
These aspects are part of the definition of usability provided by Rubin & Chisnell (2008). Finally, the study will also investigate which of the two interaction techniques is the generally preferred one for the scenario. Thus the research question will be answered by
● Evaluating the individual usability of two interaction techniques in a game scenario close to a real end user situation.
● Evaluating which interaction technique is generally preferred for the game scenario.
1.1.2 Delimitations
An important restriction of this study is that it only considers immersive VR through an HMD, which in this case means that the user's head and hands can be tracked in 6 DoF (for a definition of immersion, see section 1.2.2). Further, tracking with finger precision is not studied, nor is tracking of the user's gaze (eye tracking) or voice interaction. Finally, as stated earlier, navigation is not considered in this study; the player will remain seated throughout the testing scenario. The user will be limited to selection and manipulation of objects, where manipulation includes positioning and rotating objects.
In the evaluation of usability, the sixth factor of user experience defined by Rubin & Chisnell (2008), accessibility, which they define as what makes a product usable for people with disabilities, will not be included, due to the small scope of this study.
1.2 Key concepts
In this section terms central to the study are explained and defined.
1.2.1 Interaction Technique
There is no widely agreed upon definition of what an interaction technique is, even though there are some attempts at a definition, such as:
“An interaction technique is the fusion of input and output, consisting of all software and hardware elements, that provides a way for the user to accomplish a task” (Tucker, 2004, pp. 2022).
Tucker's (2004) definition shows that an interaction technique can cover all aspects of the process of a user completing a task, from the hardware setup to the software design. To avoid defining a new interaction technique each time a small aspect of the process changes (e.g. a control stick is used instead of two buttons), an interaction technique must be allowed some flexibility, as long as it remains true to its core concept. To this end, and in a similar manner to Bowman & Hodges (1999), an interaction technique in this study will be considered a process of interaction, which can detail both hardware and software elements, but with at least one defining aspect. Since this study concerns a specific hardware setup (6 DoF tracked HMD and virtual hand controllers), the core differentiation between interaction techniques in this study will lie in the interaction metaphor used, which in this case is either grabbing or pointing. The reason for this is that the interaction techniques should then be usable with a range of hand controllers with similar capabilities, rather than being tied to one specific hand controller.
1.2.2 Immersion
Immersion is a frequently used term within VR, often accompanied by “presence”, both of which a VR developer strives to achieve with their VR application. Even though the terms may seem interchangeable, they are by definition different. Presence is defined as the subjective feeling of a user of actually being inside the virtual environment, while immersion is based on the traits of the VR hardware (Slater & Wilbur, 1997). Slater & Wilbur (1997) defined immersion as a description of technology, which can be split into the aspects inclusiveness, extensiveness, surrounding, vividness and matching. Inclusiveness is a measure of how much of the outside world is hidden from the user. Extensiveness is a measure of how many senses are included in the VR experience. Surrounding describes how far the user can turn around, from a limited field to a full 360 degree view. Vividness is determined by the capabilities of the display device, such as the resolution of the screen and its capacity to display colours. While the first four aspects are related to (visual) output to the user, matching concerns how correctly the user's bodily movements are mimicked in the virtual world, also including senses other than the visual, such as audio and haptic feedback. From this definition of immersion it becomes clear why an HMD can be argued to be more immersive than watching a 3D film with polarized glasses: the HMD offers higher inclusiveness (it shuts out almost everything of the user's surroundings), and it changes the view of the virtual world as the user moves her head, achieving a high grade of matching.
1.3 Word List
CAVE: Cave Automatic Virtual Environment, a VR setup where the user is immersed in VR by having images projected on the walls, ceiling and floor of the room. See section 2.1.3.3.
GoGo: Interaction technique based on the metaphor of grabbing, where the user can stretch out their virtual hand beyond their real arm length. See section 3.3.1.
Head Mounted Display (HMD): One or two screens attached to the head of the user, staying in place as the user rotates or moves around.
Immersion: The immersion of a VR experience is determined by the abilities and performance of the hardware used, in contrast to presence, which is the user's perceived feeling of “being there” inside the virtual environment. See section 1.2.2 for a more thorough definition.
Interaction Technique: See section 1.2.1.
Manipulation: The interaction act of moving or rotating an object. One of the universal interaction tasks together with selection and navigation.
Presence: The perceived feeling of a user of truly being inside the virtual environment.
Ray: Short name for the ray casting based interaction technique used alongside GoGo in this study; uses pointing as the base interaction metaphor, with a reel functionality to pull objects closer. See section 4.1.3 for details.
Ray Casting: Casting a ray from the hand or head position of the user to determine a target for selection. See section 3.3.2.
Six Degrees of Freedom (6DoF): The ability to track the three rotational axes (3DoF) together with positional tracking of the three spatial coordinates.
Selection: The interaction act of selecting an object, marking that object as the current target. One of the universal interaction tasks together with manipulation and navigation.
Three Degrees of Freedom (3DoF): The ability to track the three perpendicular rotational axes.
Translation: Changing the position of an object in space, without changing the rotation or scale of the object.
Universal Interaction Task: Navigation, selection and manipulation together form a set of universal or atomic interaction tasks that can be combined to form more complex interaction.
Usability: See section 2.2 for the definition of usability used in this study.
Virtual Reality: Computer-simulated reality which provides the user with the illusion of being somewhere else.
1.4 Ethical Aspects
A VR experience which has a lacking frame rate or is badly designed can cause the user to feel uncomfortable and even sick. To minimise the risk of making test users ill, the expertise available at Resolution Games will be incorporated into the study, both in that the initial game scenario is provided by the company and in that they have offered to provide feedback on the test scenario. A well designed and convincing VR experience can also cause the user to feel uncomfortable if the scenario contains disturbing elements; seeing violence carried out in front of you in an immersive 3D world is very different from the same scene playing on a 2D screen. This means that care must be taken that offensive elements are not included in the VR scenario, which will once again be ensured by drawing on the expertise of Resolution Games.
To further make sure that the test users feel comfortable and safe, participation will be fully voluntary, and users will have the right to abort their participation at any time (including during the actual test scenario). All personal data will be kept confidential throughout the study and disposed of at the end of the research. Test users will also remain anonymous throughout the whole process.
2 Background
This section starts with a brief history of virtual reality, followed by a subsection on the applications of Virtual Reality.
Thereafter follows an introduction to common hardware used to provide a VR experience, both visual output devices and different input devices. Finally, this section ends with a subsection describing which factors are considered a part of usability in this study and how they are measured.
2.1 Virtual Reality
2.1.1 The History of Virtual Reality
In this section a short review of VR history follows, based on the work of Mazuryk & Gervautz (1996), Earnshaw (1993) and The Verge (n.d.).
The first attempt at creating a VR device is commonly attributed to the Sensorama, a multisensory simulator built by Morton Heilig from 1960 to 1962. The simulator consisted of a video augmented with a variety of sensory stimulation, such as binaural audio, scent and haptic feedback, to make the user feel immersed in the displayed artificial world. The user was situated in front of a booth-like object with an extension going out from the top, engulfing the user's head. Even before the Sensorama there are examples of flight simulators that bordered on a virtual reality experience, even if the term itself had not been invented yet.
The paper “The ultimate display” by Sutherland (1965) painted a picture that VR still strives for, a fully immersive experience where all human senses would receive feedback, or as Sutherland writes:
“The ultimate display would, of course, be a room within which the computer can control the existence of matter. A chair displayed in such a room would be good enough to sit in” (Sutherland, 1965).
Sutherland then went on to create the “Sword of Damocles”, which in turn is credited as the first computerized HMD with head tracking and stereo view, gaining its intimidating name from its bulk and size. Thereafter followed a range of VR technology, much of it developed by the military to simplify training through simulation. The first VR products for the consumer market were the DataGlove (1985) followed by the Eyephone HMD (1988), both created by the company VPL. Military flight simulators proved to be a great accelerator for VR research, pushing technology to better simulate a realistic flight scenario without the cost and risk associated with flying an actual physical airplane.
From the 1980s to the 1990s VR became a real and growing market, seen as a possible next step in a computerized world after the revolution of graphical user interfaces. Pop culture, such as the movie “The Lawnmower Man”, helped put virtual reality on the map for a wider public. VR techniques such as the CAVE system were introduced, placing the user in a cube-shaped room where some or all surfaces consisted of 2D screens, creating the illusion of a virtual environment.
Yet in the mid-1990s the VR hype blew over and the market quieted down, with the internet instead moving in as the main new technology. Apart from being crowded out by the internet, commercial VR also had other problems leading up to its flop, the most commonly cited being that the hype surrounding the products tended to promise more than the technology of the time could deliver. VR remained in the shadows until 2012, when the Oculus Rift HMD was first revealed, the first step towards a new generation of VR technology (The Verge, n.d.).
2.1.2 Applications of Virtual Reality
The core idea of VR, to create an alternate reality for users to experience, can seem like the final level of escapism, where a user can hide from the real world. This has also been described in science fiction literature and media, for example the dystopic future described in “Ready Player One” by Ernest Cline (2011). In this dystopia, the world of 2044 has descended into chaos through a widespread energy crisis, leaving the citizens of Earth struggling to live in a declining world. The only solace offered is the OASIS, a huge VR network where users can escape their miserable lives into a virtual world where everything is possible. This dark future aside, the story also gives examples of how VR can serve in many different forms, such as training, education and entertainment. Many of these applications are already in some way a reality; some examples are:
● Training and simulation related to military activity, especially flight simulation as noted in section 2.1.1, played a major part in pushing VR research ahead. VR can simulate scenarios which would be costly to arrange in reality for training purposes, while also removing the risk a real situation could imply (e.g. firing bullets, performing a complicated surgical operation or driving a big vehicle) (Mazuryk & Gervautz, 1996).
● Visualisation and virtual tourism. By equipping students in a classroom with VR technology, the teacher can bring them to historical artifacts around the world without ever leaving the classroom. In the same manner, complex data that is difficult to display in 2D can be spread out into a 3D space the user can explore (Mazuryk & Gervautz, 1996).
● Teleoperation. Robots can work in conditions where no human could survive, and one option for controlling these robots is teleoperation, i.e. letting the user see through the robot's eyes and move it by moving their own limbs from a safe location (Mazuryk & Gervautz, 1996).
● Entertainment in the form of VR games was a minor factor in the products developed in the 1990s, but today the game industry can be considered a major force in pushing the limits of new VR technology, serving as the main application of VR for a possible future consumer market (Oculus VR, 2015).
2.1.3 Virtual Reality Hardware
VR hardware provides output to fool a user's senses into believing that he or she is actually present in a virtual environment, and to allow the user to interact with the virtual world the hardware must in turn accept input from the user to respond to. In this section, common VR solutions for output and input are reviewed.
2.1.3.1 Head Mounted Displays
An HMD commonly appears to be something between a pair of ski goggles and a helmet (see figure 1), providing two digital screens in front of the user's eyes that show slightly different images of the same scene, creating the illusion of depth through parallax.
Figure 1: Examples of head-mounted displays (HMDs); to the left Google Cardboard, to the right Samsung Gear VR. Pictures originally from Wikipedia (2015a) and Wikipedia (2015b).
When the user moves his or her head, the images change to display the virtual environment from the new perspective, providing the illusion that the user is actually looking around in the virtual world (Mihelj et al., 2014). Modern consumer examples of VR HMDs include the Oculus Rift, HTC Vive, Samsung Gear VR, Google Cardboard and Playstation VR.
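The stereo principle described above amounts to rendering the scene from one virtual camera per eye, offset horizontally by half the interpupillary distance (IPD). The small difference between the two images is what produces the depth illusion. The Python sketch below is purely illustrative; the 0.064 m IPD is a commonly cited average, not a value from this study, and real engines handle the per-eye offset internally.

```python
# Illustrative sketch: compute per-eye camera positions for stereo
# rendering by offsetting the head position half an IPD to each side.
# Assumes the user looks down the -z axis; the 0.064 m default IPD is
# an assumed typical value, not one measured in the study.

def eye_positions(head_pos, ipd: float = 0.064):
    """Return (left_eye, right_eye) world positions for a head at
    head_pos = (x, y, z)."""
    x, y, z = head_pos
    half = ipd / 2.0
    left = (x - half, y, z)    # left eye camera
    right = (x + half, y, z)   # right eye camera
    return left, right

left, right = eye_positions((0.0, 1.7, 0.0))  # head 1.7 m above the floor
```

Rendering the scene once from each of these positions, and presenting each image to the corresponding eye, is what creates the parallax-based depth cue described in the text.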
2.1.3.2 Temporally and Spatially Multiplexed Displays
To give a sense of depth through only one display, these displays provide different images to each eye, either by alternating between the two images over time or by separating them in space. In the first case the screen quickly alternates between an image for the left eye and one for the right eye, and a pair of active glasses, synchronized in time with the screen, blocks the view of the inactive eye, creating an illusion of depth (Mihelj et al., 2014). The second approach makes the display show both the left and the right image at the same time, but through the use of polarized glasses each eye receives only one of the images. A common example of the use of polarized glasses is 3D cinema (Mihelj et al., 2014).
2.1.3.3 CAVE Systems
CAVE (Cave Automatic Virtual Environment) systems create an immersive VR experience by equipping a small room with displays instead of walls (sometimes including the floor and the ceiling) that together create the illusion of a virtual environment. By then adding motion tracking and surround sound, a very convincing VR experience can be created, with the obvious weaknesses being the space and money required to set up such a system (Mihelj et al., 2014).
2.1.4 Virtual Reality Input
The range of input devices that can be used for VR is as large as for computer interaction in general, and detailing them all is out of the scope of this study. One reason it is hard to design interaction for immersive VR is precisely this wide range of existing input devices (Bowman, Wingrave & Campbell, 2001). Therefore this section is limited to a review of current tracking technology, followed by the most common input devices bundled with current consumer VR solutions. For a more detailed review of different VR input devices, see Mihelj et al. (2014), pp. 53-95.
2.1.4.1 Tracking
One important class of VR input is tracking of the user's body or of separate body parts, such as the hands or the head. This can be done in a multitude of ways:
● Inertial tracking: tracking through physical controllers, held or attached to the body, that have built-in accelerometers and gyroscopes and thus can provide data about acceleration and rotational changes of the sensor.
● External tracking: tracking the user or a physical controller through external cameras, either based on a marker system or through computer vision algorithms; the most common examples of the latter are the Microsoft Kinect and the Leap Motion.
○ Another example is eye tracking, i.e. by tracking the user's eyes the application can determine where the user is currently focusing their attention.
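As a minimal illustration of the inertial approach above: gyroscopes report angular rates, so orientation must be obtained by integrating those rates over time, which is also why purely inertial tracking drifts. The toy Python sketch below integrates a single yaw axis with fixed timesteps; real trackers fuse all three rotational axes with accelerometer data, and the function name and sample values are invented for illustration.

```python
# Toy sketch of inertial orientation tracking: accumulate yaw angle
# from gyroscope rate samples (degrees per second) using simple Euler
# integration. Real sensor fusion works in 3D and corrects for drift.

def integrate_yaw(rates_deg_per_s, dt: float, start_deg: float = 0.0) -> float:
    """Accumulate a yaw angle from a sequence of gyro rate samples,
    each covering a fixed timestep dt (seconds)."""
    yaw = start_deg
    for rate in rates_deg_per_s:
        yaw += rate * dt          # angle += angular rate * timestep
    return yaw % 360.0            # wrap to [0, 360)

# Four samples of 90 deg/s over 0.25 s each: a quarter turn in total.
print(integrate_yaw([90.0, 90.0, 90.0, 90.0], dt=0.25))
```

Any small bias in the rate samples accumulates with every step of this loop, which is the drift problem that external (camera-based) tracking avoids.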
2.1.4.2 Consumer Input Devices
For the major upcoming consumer HMD VR solutions there is a clear divide: solutions based on a PC make use of external tracking, commonly through a marker system, while mobile VR solutions use inertial trackers. This applies to the HMDs themselves, but also to the bundled input devices.
The first example of a VR controller is the classic game pad, which, lacking any tracking, only uses thumbsticks and buttons to provide input, and has been used together with both mobile and PC VR solutions. These controllers are, however, less immersive than their motion-tracked counterparts. For the PC-based HMD solutions the main upcoming input devices are currently 6 DoF externally tracked hand controllers, able to function as the virtual hands of the user. Yet the fingers of the user are not tracked through computer vision; instead the state of the mechanical buttons must serve as an indication of the user's current finger positions.
2.2 Usability
Usability can at first impression appear to be a one-dimensional measure: either a system is usable or it is not. Yet usability is a complex composite measure (Nielsen, 1994; Rubin & Chisnell, 2008), and two products can be equally usable but for very different reasons. Therefore, to be able to determine the usability of a product, a definition of the factors that make something usable must first be established. Nielsen (1994) remarks that the factors of usability that are of interest in a particular situation are highly dependent on the product and the user group considered. As an example, the evaluation of a software interface or of a physical hardware manual can both consider usability, yet due to their different natures the usability tests should be adjusted accordingly (Nielsen, 1994).
For this study the definition of usability as described by Rubin & Chisnell (2008) will be used, decomposing usability into the five components: Usefulness, Efficiency, Effectiveness, Learnability and Satisfaction, each described in the following subsections.
2.2.1 Usefulness
Rubin & Chisnell (2008) describe usefulness as the degree to which the user can complete the task the product is intended to solve. Regardless of how easy a product is to learn or how satisfying it is to use, if it cannot solve its intended task it is useless to the user. Usefulness can seem like an obvious factor to consider (why test something if it cannot perform its intended task?), but from a user test perspective it is an important reminder to make sure that the users actually can achieve their specific goals with the product.
2.2.2 Efficiency
Efficiency is how quickly a user can finish a task, commonly determined by measuring the time it took for a user to complete a given task (Rubin & Chisnell, 2008). Nielsen (1994, p. 30) takes a slightly different approach, detailing the "efficiency of use" as the performance of a user who can be considered an expert of the system. Nielsen (1994), who names learnability as the main component (rather than usefulness, as Rubin & Chisnell (2008) suggest), therefore makes an important point: when measuring the efficiency of a user, one must consider the level of mastery that user has achieved. This displays the complexity of measuring usability, in that the individual factors are not independent of each other. For user tests it is therefore important to consider the level of expertise the user has when measuring and comparing efficiency.
2.2.3 Effectiveness
Rubin & Chisnell (2008) describe effectiveness as: "the extent to which the product behaves in the way that users expect it to and the ease with which users can use it to do what they intend" (p. 4). They further write that quantitative error rates are the most common way to measure a product's effectiveness. As an example, if a user pushes a button on a user interface in the belief that it will save their current work, but it instead deletes it, the system has clearly behaved unexpectedly from the user's point of view, misleading the user into performing an error. The definition of effectiveness by Rubin & Chisnell (2008) ties in with the usability factor Nielsen (1994, p. 26) defined as "errors": the system should have a low error rate, it should be easy to recover from an error, and catastrophic errors should not occur. Nielsen (1994, pp. 32–33) thus separates minor errors (such as an incorrect selection forcing the user to reselect), which only slow the process of completing the task, from catastrophic errors where the user's work can be lost or the system stops working.
2.2.4 Learnability
In the usability definition by Rubin & Chisnell (2008) learnability is closely tied to effectiveness. The learnability of a system determines how hard or easy it is to understand the system, i.e. how much effort a new user must put into learning its functionality. They further write that learnability can also refer to the ability of a user to remember how to use a system with infrequent use, which connects to the factor Nielsen (1994) in his definition of usability referred to as "memorability", i.e. how well a system is remembered under sparse usage patterns.
2.2.5 Satisfaction
Finally, satisfaction covers the user's subjective opinions of the system, whether they like it or dislike it, commonly recorded either orally through interviews or in written form through surveys (Rubin & Chisnell, 2008). Nielsen (1994, p. 33) points out that satisfaction is especially important for systems related to leisure rather than labour, since a satisfying experience does not necessarily entail an efficient or easy-to-learn system. He also notes that there are more objective alternatives for measuring the satisfaction of a user, such as measuring blood pressure, heart rate or electrical brain activity (electroencephalography), but that such means of measurement can further intimidate already nervous test subjects. Instead he suggests that the satisfaction of the system can be determined by averaging the subjective answers of many users.
3 Related Work
This related work section first introduces existing research regarding interaction techniques in immersive virtual environments, with special focus on the challenges of the field along with important concepts. First, interaction in VR is discussed to give an overview of its different components, followed by subsections explaining proprioception (3.1.1), the Control-Display ratio (3.1.2) and interaction spaces (3.1.3), since these are common tools used when evaluating interaction techniques. After that the two 3D interaction tasks examined in this study, selection (3.2) and manipulation (3.3), are presented in more detail. Earlier work on the two main interaction technique types, based on either a grabbing metaphor (section 3.3.1) or a pointing metaphor (section 3.3.2), is also reviewed. The chapter then continues with a short review of more advanced interaction techniques, including hybrid techniques (3.3.3) and two-handed interaction techniques (3.3.4). Finally the chapter ends with earlier research exploring whether naturalism is a good design choice for 3D interaction (section 3.4).
3.1 Interaction in VR
A perfect immersive VR experience would offer the user the same interaction as in the real world, using the full human body and range of senses to explore the digital landscape. VR technology has yet to reach a state where an exact replication of the real-world experience is possible; thus other options must be explored to induce immersion and create a good user experience (Bowman et al., 2012). Designing interaction in immersive VR is hard since many factors affect the result. Building on work such as Norman (1986), Poupyrev et al. (1997) created a classification of these factors in their work on immersive manipulation in VR, ordering them into the following five groups:
● User-dependent: the knowledge and skill of the user.
● Input/Output device dependent: attributes of controllers (input) and screens (output), their strengths and limits.
● Interaction technique dependent: attributes of the interaction technique used, such as whether it is based on a pointing or a grabbing metaphor, or the number of DoF (Argelaguet & Andujar, 2013).
● Application dependent: design and layout of the immersive virtual environment, placement and size of virtual objects.
● Task context dependent: the goal of the task, how to complete it, how to fail etc.
This list highlights the multitude of factors which together determine successful interaction in VR, showing the inherent complexity of interaction. A similar list of factors can be found in the work of Argelaguet & Andujar (2013), which however focuses solely on the task of selection.
Instead of focusing on all the possible factors that create good interaction in VR, Mine et al. (1997) focused on the challenges which are particularly hard when designing the interaction. They suggested the following points as the main general challenges in designing immersive VR interaction:
● Fatigue: traditional 2D mouse and keyboard interaction requires small, precise movements of the hands, while the rest of the body and particularly the arms can rest on an ergonomic chair. However, in immersive VR, if the interaction technique is designed to be natural (mimicking real-world interaction), wide arm movements or even whole-body interaction can quickly tire the user (Mine et al., 1997; Argelaguet & Andujar, 2013).
● Limited haptic feedback: a natural part of our everyday interaction is the haptic feedback we receive: when we push a door open we feel its weight, when we press a button down we can sense the button trying to swing back up again. This feedback helps us to perform precise interaction and orient ourselves (Mine et al., 1997). Yet in VR there is still no good way to recreate this feedback in a realistic manner (Bowman et al., 2012).
● No unifying framework for interaction (Mine et al., 1997): for desktop computers, interaction based on WIMP (Windows, Icons, Menus, Pointers) is completely dominant, where the user controls a pointer through a mouse and provides alphanumeric input on the keyboard. However, for immersive VR there is still no dominant framework for interaction which developers can rely on.
3.1.1 Proprioception
Proprioception is the ability of each human to determine the current position of their individual body parts, and their relative distances to each other. Since haptic feedback is limited in VR, Mine et al. (1997) suggest that basing interaction techniques on proprioception could be a way to increase the perceived immersion of the user. They argue that proprioception can provide a real-world physical frame of reference for the user to orient themselves within, and allow manipulation of objects even when the user does not have visual contact with the target.
Mine et al. (1997) present three ways in which proprioception can be used to enhance virtual interaction: direct manipulation, physical mnemonics and gestural actions. Direct manipulation using a one-to-one mapping with the user's hand allows the user to manipulate an object, and determine its position relative to themselves, even with their eyes closed, since they are constantly aware of their hand's position relative to their body. Physical mnemonics is the idea of making virtual objects available by placing them relative to the user's body, much like a tool belt or pockets. This way, for example, menus can be constantly available to the user, but placed at the side or the back of the user's body to avoid visual clutter. Finally, body-relative interaction can aid in remembering gestural actions (Mine et al., 1997). However, proprioception has the natural limitation of only working within arm's (or foot's) reach, since everything is relative to the user's body and each body part.
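The physical mnemonics idea can be made concrete with a small sketch: an object such as a menu is stored as an offset in the user's body-local coordinate frame and converted to world coordinates every frame. The function below is an illustrative simplification (hypothetical names, yaw-only rotation), not code from Mine et al. (1997).

```python
import math

def body_relative_position(body_pos, body_yaw, local_offset):
    """World position of an object stored relative to the user's body,
    e.g. a menu kept at the left hip (a 'physical mnemonic').

    body_pos     -- (x, y, z) of the user's body in world space
    body_yaw     -- rotation of the body about the vertical axis (radians)
    local_offset -- (x, y, z) offset in the body's local frame
    """
    c, s = math.cos(body_yaw), math.sin(body_yaw)
    lx, ly, lz = local_offset
    # Rotate the local offset by the body's yaw (rotation about the
    # y axis), then translate by the body position.
    wx = c * lx + s * lz
    wz = -s * lx + c * lz
    return (body_pos[0] + wx, body_pos[1] + ly, body_pos[2] + wz)
```

Because the offset follows the body frame, the menu stays at the hip regardless of where the user walks or turns, so proprioception lets it be found without visual search.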
3.1.2 Control-Display Ratio
The Control-Display ratio (CD ratio) determines how changes in the controller input map to changes of the selection tool. A CD ratio of 1 means that the mapping between the controller input and the selection tool in the virtual environment is one to one, the prime example being a user's virtual hand following their real physical hand exactly (Argelaguet & Andujar, 2013).
A CD ratio lower than 1 means that the user can extend their virtual hand further away than their real one; the Go-Go interaction technique (Poupyrev et al., 1996) is an example of this, using a non-linear mapping between the real hand and the virtual one which allows the user to stretch the virtual hand beyond the bounds of the real one (the Go-Go interaction technique is detailed further in section 3.3.1). Finally, a CD ratio higher than 1 means that the user's movements get downscaled, large physical movements giving small effect.
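Poupyrev et al. (1996) implement this non-linear mapping by keeping the CD ratio at 1 within a threshold distance of the body and growing the virtual arm quadratically beyond it. The sketch below follows that scheme; the threshold and gain constants are illustrative and would in practice be tuned to the user's arm length.

```python
import math

def gogo_virtual_hand(chest, real_hand, threshold=0.4, k=1 / 6):
    """Non-linear Go-Go style mapping from real to virtual hand position.

    Within `threshold` metres of the chest the mapping is one to one
    (CD ratio 1); beyond it the virtual arm length grows quadratically,
    i.e. the CD ratio drops below 1. `threshold` and `k` are
    illustrative constants.
    """
    offset = [h - c for h, c in zip(real_hand, chest)]
    r_real = math.sqrt(sum(o * o for o in offset))
    if r_real < threshold:
        r_virtual = r_real                                   # linear zone
    else:
        r_virtual = r_real + k * (r_real - threshold) ** 2   # extended zone
    scale = r_virtual / r_real if r_real > 0 else 0.0
    return tuple(c + o * scale for c, o in zip(chest, offset))
```

Close to the body the virtual hand exactly overlaps the real one, preserving proprioceptive feedback, while distant objects still become reachable.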
Interaction techniques based on CD ratio can be classified into three groups depending on how the CD ratio is controlled:
either manual, target oriented or velocity oriented techniques (König et al., 2009). Manually switched techniques allow the user to change the ratio, e.g. in a selection scenario to initially use a low ratio to quickly move close to the target (with less accuracy), and thereafter raise the ratio to be able to make small corrective movements. Switching modes can however induce a heavy cognitive load on the user (König et al., 2009). Target oriented techniques remove this cognitive load by automatically adjusting the CD ratio near the target for the user, but due to their autonomous nature they slow the interaction heavily in cluttered virtual environments (König et al., 2009; Argelaguet & Andujar, 2013). Velocity oriented techniques, finally, base the CD ratio on the speed of the input controller.
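A velocity-oriented technique can be sketched as a simple interpolation: slow controller movement yields a high CD ratio (downscaled, precise), fast movement a low ratio (amplified, coarse). The thresholds and ratios below are assumed values for illustration, not parameters from König et al. (2009).

```python
def velocity_cd_ratio(speed, v_slow=0.05, v_fast=0.5,
                      ratio_precise=2.0, ratio_coarse=0.5):
    """CD ratio as a function of controller speed (m/s).

    Below v_slow the ratio is high (movement downscaled for precision),
    above v_fast it is low (movement amplified for reach); in between
    the ratio is linearly interpolated. All constants are illustrative.
    """
    if speed <= v_slow:
        return ratio_precise
    if speed >= v_fast:
        return ratio_coarse
    t = (speed - v_slow) / (v_fast - v_slow)
    return ratio_precise + t * (ratio_coarse - ratio_precise)

def apply_cd_ratio(hand_delta, speed):
    """Scale a hand displacement by the current CD ratio: since the
    ratio is control movement over display movement, the virtual
    displacement is the physical one divided by the ratio."""
    ratio = velocity_cd_ratio(speed)
    return [d / ratio for d in hand_delta]
```

Because the ratio changes continuously with speed, no explicit mode switch is needed, which is the cognitive-load advantage the text describes.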
3.1.3 Interaction Spaces
3.1.3.1 Motor Space
The physical working space available to a user is referred to as the motor space. The motor space varies depending on the hardware setup and the physical environment surrounding the user while they interact with the virtual world. As an example, if the user is wearing a tracked HMD and sitting in a swivel chair, the user's motor space is constrained by the size of the room, the tracking range of the HMD and the available rotation of the chair (Argelaguet & Andujar, 2013).
3.1.3.2 Visual Space
The visual space consists of the part of the virtual world the user can see, and is limited by the field of view available on the screen or screens used (Argelaguet & Andujar, 2013). If the hardware setup allows for head tracking, the visual space can be changed simply by the user rotating or translating their head in space. If the motor space and visual space are mapped one to one, the user can use proprioceptive feedback to directly interact with the virtual world, since where they feel their hand is relative to their body will also hold true in the visual representation of the virtual world. However, if the spaces are decoupled (for example in a classic desktop setup with a mouse and screen), the user needs visual feedback (the mouse pointer) to map from the motor space (the area provided by the mouse pad) to the visual space (the screen).
3.1.3.3 Control Space
The virtual space of objects available to interact with can, depending on the interaction technique used, be smaller or larger than the actual motor space of the user. Argelaguet & Andujar (2013) refer to this virtual interaction space as the control space. Interaction techniques with a low CD ratio allow the user to reach objects beyond their physical motor space, creating a larger control space at the cost of accuracy, while techniques with a high CD ratio give the user a control space smaller than the motor space but with higher accuracy. Figure 2 displays an example of the three interaction spaces when a stationary user explores a virtual box-shaped room using a 6 DoF hand tracking device and an interaction technique which doubles the range of the virtual hand compared to the range of the real hand. By turning her head the user can see the whole room, shifting the visual space. The user can move her hands all around her, and even pass her physical hand through the virtual wall behind her. However, she then loses visual contact with her hand, since she can only see objects within the room, and can further only directly interact with objects within the control space.
Figure 2: an example of the three interaction spaces, motor space, visual space and control space, involved in a possible VR setup. The dashed lines mark the visual space of the user (the smiley), the part of the virtual room the user can see. The semicircle surrounding the user marks the control space, within which the user can directly interact with virtual objects. Finally, the small full circle directly around the user marks the motor space, the actual physical space within which the user can move her physical hand.
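The doubled reach in the Figure 2 example corresponds to a constant CD ratio of 0.5: the virtual hand's offset from the body is simply the real offset scaled by two. A minimal sketch with hypothetical names:

```python
def control_space_hand(chest, real_hand, scale=2.0):
    """Virtual hand position under a constant CD ratio of 1/scale.

    With scale == 2 the virtual hand reaches twice as far from the body
    as the real hand, doubling the radius of the control space relative
    to the motor space, as in the Figure 2 example.
    """
    return tuple(c + scale * (h - c) for c, h in zip(chest, real_hand))
```

Unlike the non-linear Go-Go mapping, this constant scaling trades accuracy everywhere, including close to the body, for the larger control space.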
3.2 Selection
Selection is the task of letting the user mark an object as the target for further action. Selection is known to be a universal interaction task, serving as a building block for more complex interaction tasks (Bowman & Hodges, 1999; Bowman et al., 2001; Steed, 2006; Argelaguet & Andujar, 2013). Due to its atomic nature selection is a well-studied area, especially since selection usually is a prerequisite for manipulation (Argelaguet & Andujar, 2013). In this section only the techniques and challenges of selection will be reviewed, yet many of the concepts can also be applied to manipulation.
Selection in immersive virtual environments has two dominating interaction technique categories: ray based techniques and virtual hand techniques (Steed, 2006; Argelaguet & Andujar, 2013). Both can be related to real-world selection metaphors we use in our everyday life, i.e. pointing and grabbing respectively. Ray based selection is further discussed in subsection 3.2.1 and virtual hand selection in subsection 3.2.2. There is also a family of indirect selection techniques, such as