Object Selection by Relative Hand to Cursor Mapping: Design and Evaluation of a 2D-Object Selection Technique for Hand Tracking in Virtual Environments

DEGREE PROJECT IN MEDIA TECHNOLOGY, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2021

MIKAEL DAHLGREN

KTH

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Mikael Dahlgren

Department of Media Technology and Interaction Design KTH Royal Institute of Technology

Stockholm, Sweden mdahlgre@kth.se

March 7, 2021

Abstract

Hand and tracker jitter negatively affect performance with ray-casting in virtual environments, making it difficult to acquire small objects that require high levels of precision. This study evaluates an alternative target acquisition technique for 2D interfaces in virtual environments based on relative hand-to-cursor mapping. Trade-offs between relative hand-to-cursor mapping and ray-casting are explored in a comparative evaluation utilizing Fitts’s law, SUS surveys and interviews. The results demonstrate that relative hand-to-cursor mapping performs on par with ray-casting while also allowing higher precision in certain selection tasks.

Summary

Ray-casting is negatively affected by jitter in the hands and in the tracking technology in virtual environments, which makes it difficult to select objects that require high precision. This study evaluates an alternative technique to ray-casting for object selection in 2D interfaces in virtual environments, called relative hand-to-cursor mapping. Trade-offs between these two techniques are evaluated in a comparative study using Fitts’s law, SUS surveys and interviews. The results demonstrate that relative hand-to-cursor mapping performs comparably to ray-casting while also allowing higher precision in certain object selection tasks.

1 Introduction

Since the resurgence of consumer virtual reality (VR) electronics in the early 2010s, input modalities have advanced from traditional game controllers and gaze-tracking to VR controllers with six degrees of freedom [1]. In December 2019, hand tracking was introduced for the first time as an integrated feature of a virtual reality head-mounted display (HMD) through a software update for the Oculus Quest, allowing interface interactions through a select-by-pointing technique [10]. Tracking the user’s hands allows for system interaction without the need for any peripheral device. Hand tracking is achieved by having a convolutional network extrapolate hand-bone positions from the video feed of cameras mounted on the HMD. A drawback of this approach is that it introduces tracking jitter, which varies with lighting and environmental conditions [8].

1.1 Object Selection

Object selection is the act of specifying an object from a set. It is one of the fundamental interaction primitives in 3D interaction tasks and must be supported by any interactive virtual reality application [15]. One commonly used object selection technique is select-by-pointing via ray-casting [3, 5, 6, 15] (see Figure 1). All currently available consumer HMDs use selection by pointing via ray-casting for object selection, which works well for spatially tracked controllers or HMD orientation tracking [1]. However, a drawback of ray-casting is its sensitivity to noise: small movements at the origin of the ray result in large translations at its end, where the cursor or object is being moved. This worsens as the length of the ray increases [3, 11, 14, 15, 23].
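To give a rough sense of scale for this amplification, the following minimal sketch computes how far the end of a ray moves for a given angular jitter at its origin. The jitter angle and the ray lengths are illustrative assumptions rather than measurements from this study, and the function name is hypothetical.

```python
import math

def ray_end_displacement(ray_length_m, jitter_deg):
    """Lateral displacement (metres) at the ray's end for an angular jitter at its origin."""
    return ray_length_m * math.tan(math.radians(jitter_deg))

# Assumed values: 0.5 degrees of angular jitter at three ray lengths.
for length in (0.5, 2.0, 4.0):
    shift_mm = ray_end_displacement(length, 0.5) * 1000.0
    print(f"ray length {length} m -> cursor shift of about {shift_mm:.1f} mm")
```

For a fixed jitter angle the displacement grows linearly with the ray length, which is why the same tremor is more disruptive for distant targets.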

1.2 Motivation

This paper argues that ray-casting is not a satisfactory object selection technique when paired with the current state of hand tracking in consumer hardware, as it suffers from low precision and unresponsive behaviour due to input noise and the noise filtering this necessitates. To address this challenge, this paper proposes a selection technique that uses relative hand-to-cursor mapping (RHCM), which takes the offset of the hand from an anchor point and translates it to an offset of the cursor from the screen center (see Figure 1). This technique attempts to avoid the problem, inherent to ray-casting, of small movements resulting in large translations, thus allowing higher precision in object selection tasks. RHCM was compared to ray-casting by a total of thirteen participants, using a Fitts’s test for quantitative data, and surveys and semi-structured interviews for qualitative data.

RQ: How does a relative hand position mapping approach perform compared to ray-casting in target acquisition in 2D user interfaces utilizing hand tracking in virtual environments?

1.3 Delimitation

In order to limit the scope of the study, no alternative gesture for the selection action was evaluated. An analogy for the reader to consider is the difference between the selection action of clicking a mouse button and the target acquisition action of moving the mouse cursor to a target. This study is concerned with the latter. The study also limits its scope to manipulating a cursor on a virtual two-dimensional screen. This is motivated by the observation that most application navigation in HMDs is done via 2D interfaces projected onto a surface in front of the user in virtual 3D space [1]. As 2D-interface navigation is becoming the norm in VR, it is relevant to explore object selection techniques that effortlessly facilitate such interactions [13].

2 Related Research

Bowman et al. [15] categorize selection techniques as follows: selection by pointing (e.g. ray-casting), selection by touching (e.g. virtual hand intersection), selection by occlusion (e.g. intersecting virtual objects) and indirect selection (e.g. voice commands). Ray-casting [21] is perhaps the most commonly used pointing-based technique, and is the technique against which relative hand-to-cursor mapping (RHCM) was tested. With ray-casting, the user points a virtual ray extending from the hand or input device to specify an object in the scene (see Figure 1). Ray-casting suffers from a number of issues, mostly caused by individual hand tremor and tracker noise coupled with the fact that small movements are amplified at the ray’s end, making fine adjustments difficult [3, 11, 14, 15, 23]. Ray-casting behaviour is also inconsistent across different distances between the user and the target. These issues make ray-casting difficult to use when objects are small, as selecting such objects by pointing requires high levels of precision [21].

Figure 1: The two object selection techniques evaluated in this paper. Left: Hand mapping. Right: Ray-casting

Various solutions to these issues with ray-casting have been proposed. Cone-casting [16] extends ray-casting by adding a cone centered around the ray, making the area for selecting targets larger. While this makes selection easier, objects have to be separated by at least the diameter of the cone, or multiple targets will be selected, which reintroduces the precision problem of the original ray-casting [3]. Another proposed refinement to ray-casting is the bubble-cursor [12], where a circular cursor dynamically resizes so that it only contains one object. This solution, however, also degrades in efficiency in cluttered environments and results in unwanted interactions [2, 3, 14]. Another approach is to apply a progressive refinement technique, notable examples of which are SQUAD [14] and PRISM [11]. SQUAD solves the precision problem by dividing the selection into multiple steps in order to keep the need for precision low, which allows for accurate selections, but at a cost of performance and restricted interface design [14]. PRISM uses continuous progressive refinement, moving the cursor slower than the user’s hand at low velocities, when small movements are needed, and fast at high velocities, when the user wants to move towards a general area. While this approach does allow for higher accuracy, the non-linear movement causes a significant mismatch between the physical pointing direction and the perceived pointing position [2, 3, 14].

To combat noise from spatial input, Vogel et al. (2007) and Kopper et al. (2011) showed that a dynamic recursive low-pass filter is a suitable solution to tracking noise and hand jitter without noticeable trade-offs in cursor responsiveness [14, 24].

While Oculus has not publicly released any research papers regarding their ray-cast implementation for hand-tracked interaction, some insights are found in a blog post detailing their best practices for hand tracking [9]. In order to achieve a stable ray, a secondary position on the user’s body is set to anchor the ray, and the hand controls its direction. This anchor position switches dynamically between three different body locations (shoulder, hip, eye level) depending on head gaze direction. The Oculus hand tracking API documentation does mention that the technique uses a noise filter, without disclosing what kind of filter it is or how it is implemented [8].

3 Design And Implementation

3.1 Design Method

The design phase was largely affected by COVID-19 (see Section 6.1, Impact of COVID-19). A choice was made to embrace ‘autoethnographical research through design’, a first-person research technique, during the early stages of research. First-person research involves data collection and experiences from the researchers themselves, and has become an increasingly viable alternative to more traditional HCI methods [17, 22]. Examples of such methods are ‘autoethnography’, ‘autobiographical design’ and ‘autoethnographical research through design’.

Although autobiographical design is a common and productive design practice, there are no accepted detailed models of how to perform or report on it in research, and as such it’s not accepted as good research practice [7, 22].

Figure 2: Autoethnographical research through design

‘Autoethnographical research through design’ attempts to bridge this gap by combining autobiographical design with analytical autoethnography [1], which introduces a dedicated interpretative step where the designer generalizes from their own experience to refine, elaborate and revise the theoretical understanding of the subject matter in question. ‘Autoethnographical research through design’ lets designers obtain knowledge about an artefact through self-design combined with documentation [7, 22]. In practice, this means working in iterative cycles of documenting observations, analysis, designing and implementing, and evaluating self-use of the system (see Figure 2).

This allows for rapid and continuous design, consisting of many minor technical challenges and design revisions. For the sake of brevity, the following sections present summarised excerpts from the documentation and the major findings from the design process in a compact format.

3.1.1 Evaluation

Evaluation was carried out by completing one or several sessions with the evaluation application. The evaluation application consists of three segments which are used to gather quantitative and qualitative data (see Section 4.3 Evaluation Application). For quantitative data, the evaluation application uses a Fitts’s test, a method specifically used to evaluate the performance of object selection techniques [18, 19]. A Fitts’s test consists of a series of target selection tasks for which time and accuracy are recorded in order to produce a value called throughput (see Section 4.3.1 Fitts’s circle). Higher throughput values suggest better performance. For qualitative data, the evaluation application contains two use case scenarios: a text-input task, which places high demands on precision, and a user interface navigation task, which places low demands on precision but high demands on cursor behaviour and system response.

3.2 Design Iterations

Figure 3: Design iterations. Left: Virtual trackpad. Middle: Bound tracking volume. Right: Hand mapping

In line with issues highlighted in the literature, using the Oculus implementation of ray-casting in evaluation sessions produced high error rates and a poor user experience, which was attributed to noise filtering and exaggerated movements [3, 11, 14, 15, 23]. The goal was set to design a technique that is more accurate and subjectively more responsive than the implemented ray-casting solution. Improvements to ray-casting as suggested in prior work come at a cost of either performance or responsiveness. A decision was therefore made to avoid ray-casting.

Another initial design constraint was to allow the arm to be parallel to the body with the hand level at about hip height, so as to avoid fatigue and discomfort [2, 15, 23].

3.2.1 Virtual trackpad

Using real-life analogues as inspiration for brainstorming, ideas were explored through sketching and acting out interactions. From these exercises a design decision was made to implement a ‘virtual trackpad’: a static virtual surface which, when intersected with a finger, moves the cursor (like using a trackpad on a laptop). The virtual trackpad was placed near the body at hip height (see Figure 3) and then evaluated through self-evaluation sessions.

Findings from evaluating the virtual trackpad showed that interacting with a virtual surface without haptic feedback resulted in unwanted behaviour (see Table 1).

| Note - Virtual Trackpad | Proposed design action |
|---|---|
| The lack of haptic feedback makes it impossible to know if the hand is touching the trackpad surface or not. This causes unresponsive cursor behaviour. | Track the hand in a volume around the trackpad instead of surface contact |
| The placement of the tracking surface has a great impact on ergonomy. | Evaluate different placements |
| To reach the far edges of the virtual screen the trackpad has to be very large and requires large and awkward arm movements | Amplify cursor movement by implementing a multiplier to relative cursor position |

Table 1: Excerpts from documentation of findings and subsequent design changes for the virtual trackpad.

3.2.2 Bound tracking volume

Findings from the virtual trackpad motivated the implementation of a tracking volume around the trackpad, tracking the finger position when inside the bounds of the cuboid (see Figure 3). Different placements and orientations were evaluated, with the origin of the box put in low and high positions and rotated towards and away from the user. The most relaxed placement of the tracking anchor was found to be slightly higher than hip height and rotated to face the user. Cursor sensitivity was implemented as a result of findings from the evaluation sessions of the virtual trackpad. Higher cursor sensitivities removed the need for large arm movements. However, this came at the cost of accuracy.

| Note - Bound Tracking Volume | Proposed design action |
|---|---|
| Increased cursor sensitivity allows for smaller arm movements at the cost of increased jitter and lowered accuracy | Evaluate a range of cursor sensitivities |
| Hand sometimes moves outside the tracking volume and the cursor becomes unresponsive. Visualization of the trackpad offers no substantial feedback about the system status. | Exclude visualization of the virtual trackpad altogether. Instead, continuously track in an infinite volume |

Table 2: Excerpts from notepad of findings and subsequent design changes for the bound volume tracking.

| Cursor sensitivity | Average throughput (bits/s) | Throughput range (bits/s) |
|---|---|---|
| 4 | 4.1275 | 3.9 - 4.3 |
| 5 | 4.4001 | 3.7 - 4.9 |
| 6 | 4.5811 | 4.2 - 5.1 |
| 7 | 4.3380 | 3.9 - 4.8 |
| 8 | 3.4562 | 2.9 - 4.0 |

Table 3: Mean throughput of cursor sensitivities in self-evaluation sessions.

3.2.3 Hand mapping

Findings from evaluating the bound tracking volume led to the removal of the virtual trackpad altogether, and the tracking volume was extended to infinity, meaning that the system continuously tracks the finger position in relation to a point (see Figure 3). Fitts’s circle tests were performed in self-evaluation sessions with various cursor sensitivities (see Table 3). The evaluation session had ten tests per sensitivity configuration, fifty tests in total, and the order of these tests was randomized. The data showed that the cursor sensitivity which yielded the highest throughput was 6.0. The threshold values for the cursor sensitivity were found by establishing the lowest sensitivity at which the arm had to be almost fully extended in order for the cursor to reach the edges of the screen, and then increasing this value until noise amplification rendered the cursor unusable. At this point the technique felt comfortable to use, and in self-evaluations it produced higher throughput than was possible with ray-casting.

3.3 Implementation

3.3.1 Cursor position

The position of the cursor is continuously set to the screen center plus the offset between the hand and the anchor position, scaled by the cursor sensitivity (see Figure 4). As the cursor is not concerned with depth, the depth position of the anchor or the user’s hand is not taken into account when setting the cursor position. 9/16 of the body height was found to be an adequate approximation of hip height. Therefore, to account for variability in user height and still place the anchor at about hip height, the y-position of the anchor point is set to 9/16 of the user’s head height. Here, y is the vertical axis, x the horizontal axis, and z is depth. The x-position of the anchor is set to the x-position of the user’s HMD.

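    // Position of the head
    Vector3 HMD;

    // Position of the anchor: horizontally centred on the HMD, at about hip height
    Vector3 A = new Vector3(HMD.x, HMD.y * 9f / 16f, 0f);

    // Position of the cursor plane center
    Vector3 P;

    // Position of the user’s hand
    Vector3 B;

    // Cursor sensitivity
    float S = 6.0f;

    // Position of the cursor: plane center plus the scaled offset between
    // anchor and hand; depth is ignored
    Vector3 cursor = new Vector3(P.x + (A.x - B.x) * S, P.y + (A.y - B.y) * S, P.z);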

Figure 4: Pseudo code of the cursor position implementation

3.3.2 Noise filter

Results from experiment group 1 motivated the implementation of a noise filter (see Section 5). Vogel et al. (2007) and Kopper et al. (2011) propose a dynamic recursive low-pass filter as a suitable solution without noticeable trade-offs in cursor responsiveness [14, 24]. In this implementation, the filtering was handled by continuous Gaussian smoothing over a window of 32 samples.

The intended effect is to apply high noise reduction when precision is required and low noise reduction when fast response time is required. In short, at each moment the cursor’s position is set to a weighted average of the 32 most recent positions. Each sample weight is set by a Gaussian function. The bias of the weights is set by a distribution parameter which is linearly interpolated between a low and a high limit based on the cursor’s velocity. Through self-evaluation, it was found that velocities under 15 mm/s counted as “low” and 33 mm/s as “high”.
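The filter described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the text gives the window size (32 samples) and the velocity thresholds (15 and 33 mm/s), whereas the two limits of the distribution parameter (`SIGMA_MIN`, `SIGMA_MAX`), the shape of the Gaussian over sample age, and all names are hypothetical.

```python
import math
from collections import deque

WINDOW = 32          # number of recent positions averaged (from the text)
LOW_SPEED = 15.0     # mm/s, "low" velocity threshold (from the text)
HIGH_SPEED = 33.0    # mm/s, "high" velocity threshold (from the text)
SIGMA_MIN = 0.5      # assumed: narrow Gaussian, light smoothing at high speed
SIGMA_MAX = 8.0      # assumed: wide Gaussian, heavy smoothing at low speed

def sigma_for_speed(speed):
    """Linearly interpolate the Gaussian width between its limits from cursor speed."""
    t = (speed - LOW_SPEED) / (HIGH_SPEED - LOW_SPEED)
    t = min(1.0, max(0.0, t))
    return SIGMA_MAX + (SIGMA_MIN - SIGMA_MAX) * t

def gaussian_weights(count, sigma):
    """Normalized Gaussian weights over sample age; index 0 is the newest sample."""
    raw = [math.exp(-0.5 * (age / sigma) ** 2) for age in range(count)]
    total = sum(raw)
    return [w / total for w in raw]

class AdaptiveGaussianFilter:
    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)  # newest sample is kept at the left

    def update(self, position, speed):
        """Add a raw (x, y) position and return the smoothed position."""
        self.samples.appendleft(position)
        weights = gaussian_weights(len(self.samples), sigma_for_speed(speed))
        x = sum(w * p[0] for w, p in zip(weights, self.samples))
        y = sum(w * p[1] for w, p in zip(weights, self.samples))
        return (x, y)
```

At low speeds the wide Gaussian spreads the weight over many past samples, which suppresses jitter at the cost of lag. At high speeds the narrow Gaussian puts most of the weight on the newest sample, so the cursor follows the hand closely.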

4 Method

4.1 Procedure

Upon arrival, the participants were given an introduction to the study, and they provided their informed consent in a demographic and VR-experience survey. They then received an Oculus Quest HMD and entered VR to complete all steps in the evaluation application. They were given up to five minutes to familiarise themselves with the selection technique before beginning the first task of the evaluation application. There were three tasks to complete: a Fitts’s circle test, a guided UI navigation task, and a text input task. When these tasks were completed, the participants removed their HMD and a SUS survey was administered. This procedure of VR trial and SUS questionnaire was then repeated for the remaining selection technique. Last, a semi-structured interview took place (see Figure 5). For each participant, the order in which the selection techniques were presented was randomized. This randomization was balanced so that an equal number of participants evaluated RHCM first and ray-casting first. To avoid participation bias, each participant was told that both selection techniques had been developed in prior research, and that the only concern of this study was to compare the two.

Participants were video recorded with a smartphone (Google Pixel 2) from a full-body, front-facing camera angle while performing the VR selection tasks. Interviews were audio recorded using the same device.

Figure 5: Illustration of the experiment procedure

4.2 Participants

Participants were recruited through personal contacts of the author, using social media or contact by phone. The recruited participants were not required to have any previous experience with VR. The participants were made aware of the potential risk of motion sickness and of the fact that they could opt out of the study at any time. All participants provided informed consent to participate in the study. None of the participants had prior knowledge of the subject of the study. In total there were 13 participants. As initial findings prompted changes to the design of the selection technique, the participants are categorized into two groups, where group 1 evaluated the first variation of the technique (RHCM-v1) and group 2 the second variation (RHCM-v2). Group 1 had six participants in total, of whom two were female and four male, all right-handed and aged between 24 and 30. Group 2 had seven participants, of whom three were female and five male, all aged between 25 and 34, one of whom was left-handed. Participants were incentivized by receiving a scratchcard for participating.

4.3 Evaluation Application

Figure 6: UI navigation segment of the evaluation application

An evaluation application was produced in which both RHCM and ray-casting could be evaluated (see Figure 6). The application was designed to collect both quantitative and qualitative data, and consists of three main segments: a Fitts’s test (quantitative), a guided UI navigation task (qualitative) and a text-input task (qualitative). There was also an introduction screen designed to help participants familiarize themselves with the test environment and interaction technique.

The resolution of the UI screen in the virtual environment is 1920 x 1080 px. Its size is 90.3 inches, and it is placed at a distance of 2 m and a height of 1.5 m. These values were set after measuring the size, placement and resolution of the UI elements as presented in the Oculus user interface. The tools used for developing the application were Unity (Unity Editor, Version 19.1.2.4069442) together with the Oculus integration package (Oculus Unity SDK, version 16.0). The ray-casting technique used in the evaluation was an unaltered implementation provided in the Oculus integration package.

4.3.1 Fitts’s circle

A common approach for evaluating the performance of selection techniques is to implement a test based on Fitts’s law [18, 19]. There are different configurations depending on how many directions are being tested. For 2D, a Fitts’s circle test is the generally applied configuration (see Figure 7). The reader should note that the term trial, in this context, describes the act of moving from one point and selecting the next target, and the term sequence describes the act of completing all trials in a circle. Each sequence has its targets set to a unique combination of circle diameter and target width. Thus, a full Fitts’s circle test consists of as many sequences as the number of widths multiplied by the number of distances.

Figure 7: Fitts’s circle test in 2D.

Figure 8: Geometry for a trial

Inspired by data transmission equations concerned with bandwidth, signal strength and noise, Fitts’s law aims to measure target selection tasks as quantifiable transmissions of data. The channel through which the data is transmitted is the human performing the target selection. The rate of information through this channel is obtained by, for each sequence, dividing the index of difficulty (ID) by the mean movement time (MT). As ID is measured in bits and MT in seconds, the rate of transmission, called throughput (TP), is measured in bits/second, as seen in (1).

ID is a relationship between the target width (W) and the movement amplitude (A) (see Figure 7). The ratio between them gives the amount of data required to perform the trial. The logarithm of the ratio is taken in order to express it in bits, as seen in (2).

$$TP = \frac{ID}{MT} \tag{1}$$

$$ID = \log_2\left(\frac{A_e}{W_e} + 1\right) \tag{2}$$

The use of the effective values (subscript “e”) is an improvement suggested by Crossman in an unpublished report from 1957 to include spatial variability, or accuracy, in the calculation [19]. With this, W_e is computed as 4.133 × SD_x, where SD_x is the standard deviation of the selection coordinates, and the effective target amplitude A_e is the actual movement distance for the trial. Adjusted in this way, throughput becomes a single human performance measure that includes both speed and accuracy in human responses (see Figure 9).

In order to achieve this, coordinate data need to be gathered for each trial’s starting position (“from”), target position (“to”) and select position (see Figure 8). Given the points “from”, “to” and “select”, the lengths a, b and c are calculated. dx and A_e can then be obtained, as seen in (3) and (4). Note that dx is 0 for a selection at the center of the target.

$$dx = \frac{c^2 - b^2 - a^2}{2a} \tag{3}$$

$$A_e = a + dx \tag{4}$$
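Equations (1) to (4) can be combined into a per-sequence throughput calculation, sketched below. This is an illustrative sketch rather than the study’s code. It assumes that a is the distance from “from” to “to”, b the distance from “to” to “select” and c the distance from “from” to “select” (the assignment under which dx is 0 for a selection at the target center), that SD_x is the sample standard deviation of dx over the sequence, and that A_e is averaged over the trials of a sequence. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def trial_geometry(src, dst, sel):
    """Return (dx, A_e) for one trial from its "from", "to" and "select" points."""
    a = math.dist(src, dst)   # nominal movement distance
    b = math.dist(dst, sel)   # distance between target center and selection
    c = math.dist(src, sel)   # distance actually travelled
    dx = (c * c - b * b - a * a) / (2.0 * a)   # equation (3)
    return dx, a + dx                          # equation (4)

def sequence_throughput(trials, movement_times):
    """Throughput in bits/s for one sequence of (from, to, select) trials."""
    geometry = [trial_geometry(*trial) for trial in trials]
    w_e = 4.133 * stdev(dx for dx, _ in geometry)   # effective target width
    a_e = mean(amp for _, amp in geometry)          # effective amplitude
    id_e = math.log2(a_e / w_e + 1.0)               # equation (2)
    return id_e / mean(movement_times)              # equation (1)
```

Note that the sketch needs at least two trials with differing selection offsets, since an SD_x of zero would make W_e zero.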

The Fitts’s test was implemented in accordance with Crossman’s improvement [19] (see Figure 9). Nine combinations were used: A = 300px, 500px, 700px in unique combinations with W = 80px, 120px, 160px (see Figure 10). These corresponded to task difficulties ranging from ID = 1.52 bits to ID = 3.28 bits. The scale of target conditions was chosen such that the largest A and W spanned the height of the virtual screen with a margin of 30px. Each sequence included 12 trials. The target to select was highlighted in blue, while the other targets remained idle in gray. Upon selection, the highlight moved to the opposite target. Selections proceeded in a rotating pattern around the circle. An erroneous selection made the background of the screen briefly turn red. Participants were told to be as quick as possible in their selection, but to slow down their pace if they felt that they were making too many consecutive errors. The data from the tests was calculated by the application and stored in text files.
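As a quick cross-check of the task difficulties, the nominal index of difficulty for the nine conditions can be enumerated as follows. The sketch applies the form of equation (2) to the nominal A and W instead of the effective values; the names are hypothetical.

```python
import math
from itertools import product

AMPLITUDES_PX = (300, 500, 700)
WIDTHS_PX = (80, 120, 160)

def index_of_difficulty(amplitude, width):
    """Nominal index of difficulty in bits."""
    return math.log2(amplitude / width + 1.0)

# One entry per unique (A, W) combination, nine in total.
conditions = {
    (a, w): index_of_difficulty(a, w)
    for a, w in product(AMPLITUDES_PX, WIDTHS_PX)
}

# Print the conditions from easiest to hardest.
for (a, w), bits in sorted(conditions.items(), key=lambda item: item[1]):
    print(f"A={a}px W={w}px ID={bits:.2f} bits")
```

The easiest condition is the smallest amplitude with the largest width, and the hardest is the largest amplitude with the smallest width.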

Figure 9: Throughput computed using effective values

Figure 10: Configuration of the Fitts’s circle trial

4.3.2 Guided UI navigation

The UI navigation segment was implemented to obtain qualitative data on a selection technique’s performance in scenarios which require low precision and large cursor movements. It consists of two pages for the participant to navigate. The first page (see Figure 6) shows two horizontally scrollable rows containing six large (640px x 200px) and eight medium (320px x 200px) app icons, as well as a side menu. From the menu, one item leads to page 2, which consists of a large vertically scrollable container with 24 small (200px x 200px) app icons. The participants were instructed to take their time, were told that their performance was not being measured, and were encouraged to try interactions multiple times to get a feel for the technique. They were then given the following instructions:

1. Navigate to the end of the large app list, then scroll it back. Repeat this step for the medium app list.

2. Find and select “Page 2” in the side menu.

3. Navigate to the bottom of the app list, then back up.

4. Find and select “APP X”

Figure 11: Text-input segment of evaluation application

4.3.3 Text input

The text input segment was implemented to obtain qualitative data on a selection technique’s performance in scenarios which require high precision and small cursor movements. Participants were instructed to type the sentence “the quick brown fox jumps over the lazy dog” at a comfortable yet effective pace on a virtual keyboard. The sentence is 35 letters long and uses all 26 letters of the alphabet, requiring the participant to move all over the keyboard (see Figure 11).

4.4 SUS-Survey

The system usability scale (SUS) is a generalized ten-item Likert scale survey which aims to assess the usability of a system [6]. It produces a single score on a scale of 0–100, which can then be compared with the scores of other systems. Each statement is rated by the participant from 1 to 5, based on their level of agreement. It should be noted that even though the score ranges from 0–100, it is not a percentage. SUS scores can be translated into ratings such as ‘worst imaginable’, ‘poor’, ‘OK’, ‘good’, ‘excellent’ and ‘best imaginable’ [4]. After completing the evaluation application, each participant completed the SUS survey. Participants were told to consider the statements in relation to manipulating the cursor, and not, for example, the VR experience or the test application. Participants could ask for clarification while taking the survey.
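How the ten ratings turn into a single 0–100 score is not spelled out above. The sketch below follows the standard SUS scoring procedure, on the assumption that the survey used the standard item order, in which odd-numbered statements are positively worded and even-numbered statements negatively worded. The function name is hypothetical.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten ratings on a 1-5 agreement scale."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten ratings between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items: higher agreement is better
        else:
            total += 5 - rating   # even items: lower agreement is better
    return total * 2.5            # scale the 0-40 sum to 0-100
```

A participant who answers 3 (neutral) on every statement gets a score of 50, which illustrates why the score should not be read as a percentage of agreement.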

4.5 Interviews

The interviews were held last, after participants had completed the evaluation application with both selection techniques and filled in the corresponding SUS surveys. They were told that the interview would be about their experience of manipulating the cursor with their hand. Since participants were not aware of the names of the selection techniques, they were encouraged to refer to them by the order in which they had used them. For questions where comparisons did not feel relevant or possible, participants were encouraged to give separate answers for each selection technique.

Although the interview followed a script, follow up questions or additional questions were added when there was a need for clarification of statements, or unexpected behaviour during the test sessions.

4.5.1 Interview script

1. Describe in detail what you had to do in order to manipulate the cursor.

2. Describe what your hand and arm felt like while manipulating the cursor.

3. How did you learn to manipulate the cursor?

4. Describe the nature of the connection between you and the cursor.

5. How did these methods compare to manipulating a cursor with any other method which you’re familiar with?

6. What did it feel like to be able to control the cursor with your hand?

7. Describe what it was like to perform each task (Fitts’s test, UI navigation, text input).

8. Are there any particular thoughts you have about the experience that we haven’t touched upon in this interview?

4.5.2 Interview analysis

The interview recordings were transcribed and then analyzed by coding the transcriptions into codes and sub-codes, as suggested by Malterud (2012) [20]. This method consists of repeatedly analysing interview transcripts with increasing granularity in order to identify broad themes and common sentiments within them. These themes and sentiments are then used to categorize statements with codes and sub-codes (see Table 4). The end result is a table of occurrences of codes and sub-codes, giving an overview of the sentiments expressed across multiple interviews.

| Code | Sub-code | Applied when mentions of... |
|---|---|---|
| Physical | Ergonomical | Awkward / Relaxed poses |
| Physical | Sensation | Strain / Comfort |
| Productive | Accuracy | Hitting / Missing targets |
| Productive | Responsive | Cursor’s relative behaviour |
| Introspective | In control | Sense of being in control |
| Introspective | Intuitive | Learning curve, interaction rules |
| Introspective | Ease of Use | Effort / Leniency |

Table 4: Codes and sub-codes identified during transcript analysis

5 Results

This section presents the results for the two groups of participants. Group 1 evaluated ray-casting and RHCM-v1. Group 2 evaluated ray-casting and RHCM-v2 (see Table 5).

| Participant group | Technique variant | Noise filter | Cursor sensitivity |
|---|---|---|---|
| Group 1 | RHCM-v1 | No | 6.0 |
| Group 2 | RHCM-v2 | Yes | 4.5 |

Table 5: Configuration of the two variants of the evaluated selection techniques.

5.1 Participants

In total there were 13 participants, nine male and four female, with a mean age of 28.57 (SD = 3.80). All participants were right-handed except for one participant in group 2. Group 1 consisted of six participants, four male and two female, with a mean age of 25.83 (SD = 2.32). Group 2 consisted of seven participants, five male and two female, with a mean age of 30.62 (SD = 3.42).

5.2 Throughput

Throughput (bits/s) is a performance measure based on the speed and accuracy of selection tasks of varying difficulty [18, 19] (for details, see 4.3.1 Fitts’s Circle). To determine whether the means of two sets of data are significantly different from each other, dependent-sample t-tests were applied.
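As an illustration of how such a throughput value can be obtained, the following sketch assumes the effective-throughput formulation described by MacKenzie [18, 19], where the effective width is 4.133 times the standard deviation of the selection endpoints. The function name, parameter names and example values are hypothetical; Section 4.3.1 remains the authority on how throughput was computed in this study.

```python
import math
import statistics


def effective_throughput(amplitude_px, endpoint_offsets_px, movement_times_s):
    """Throughput (bits/s) for one amplitude/width configuration.

    amplitude_px: effective movement amplitude in pixels
    endpoint_offsets_px: signed offsets of each selection from the target
        centre along the movement axis, one value per trial
    movement_times_s: movement time of each trial in seconds
    """
    # Effective width: 4.133 standard deviations of the selection endpoints.
    effective_width = 4.133 * statistics.stdev(endpoint_offsets_px)
    # Effective index of difficulty (Shannon formulation), in bits.
    effective_id = math.log2(amplitude_px / effective_width + 1)
    # Throughput is the index of difficulty divided by mean movement time.
    return effective_id / statistics.mean(movement_times_s)


# Hypothetical trial data for a single configuration, NOT study data.
offsets = [-10.0, 5.0, 12.0, -7.0, 0.0, 9.0, -4.0, 3.0]
times = [1.9, 2.1, 2.4, 1.8, 2.0, 2.2, 2.3, 1.7]
print(round(effective_throughput(300.0, offsets, times), 2), "bits/s")
```

Because the effective width is derived from where selections actually landed, a technique that is fast but imprecise is penalised in the same measure as one that is precise but slow.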

The results from group 1 evaluating ray-casting (M: 1.54, SD: 0.64) and RHCM-v1 (M: 1.60, SD: 0.64) indicate no significant difference in throughput between the techniques, t(53): 0.76, p: 0.22. The results from group 2 evaluating ray-casting (M: 1.57, SD: 0.57) and RHCM-v2 (M: 1.72, SD: 0.57) indicate a significant difference in throughput, t(62): -1.71, p: 0.046 (see Figure 12).

5.2.1 Target amplitude

Isolating target amplitude, results from group 1 showed no significant difference between the techniques, with p > 0.05 for all configurations in t-tests. Results from group 2, however, showed a significant difference, t(62): -2.55, p: 0.02, between the techniques at small (300px) amplitudes: ray-casting (M: 1.50, SD: 0.52), RHCM-v2 (M: 1.79, SD: 0.57) (see Figure 13). Configurations with medium (500px) and large (700px) amplitudes produced no significant difference between the techniques in group 2, with p > 0.05 for all configurations.


Figure 12: Grand mean of throughput. Left: Group 1, Right: Group 2

Figure 13: Mean throughput over target amplitude. Left: Group 1, Right: Group 2

Figure 14: Mean throughput over target width. Left: Group 1, Right: Group 2


5.2.2 Target width

Isolating target width, results from group 1 showed a significant difference, t(62): -2.41, p: 0.03, between the techniques for small (80px) width targets: ray-casting (M: 1.13, SD: 0.61), RHCM-v1 (M: 1.45, SD: 0.63). In group 2, there was a significant difference, t(20): -2.31, p: 0.03, between the techniques for medium (120px) width targets: ray-casting (M: 1.55, SD: 0.54), RHCM-v2 (M: 1.86, SD: 0.35). For small (80px) width targets in group 2, the difference, t(20): -2.01, p: 0.059, is notable but does not qualify as significant (see Figure 14).

5.3 Error Rate

In group 1, ray-casting (M: 12.04%, SD: 3.84%) produced a higher mean error rate than RHCM-v1 (M: 8.95%, SD: 6.82%). In group 2, ray-casting (M: 8.33%, SD: 7.8%) also produced a higher mean error rate than RHCM-v2 (M: 5.82%, SD: 4.86%). However, no statistically significant difference in error rate was found between the selection techniques in either group 1 (t(5): 2.57, p: 0.10) or group 2 (t(6): 2.45, p: 0.29).
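The dependent-sample t-test used throughout this section can be sketched as follows. The sketch computes only the t statistic and the degrees of freedom; with one paired observation per participant, the degrees of freedom are n - 1, which is consistent with t(5) for the six participants in group 1 and t(6) for the seven in group 2. The function name and the error rates below are hypothetical and are not the study's data.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a dependent-sample t-test."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have equal length")
    # The test operates on the per-pair differences.
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-participant error rates (%), NOT the study's data.
ray_casting = [12.0, 9.0, 15.0, 10.0, 14.0, 11.0]
rhcm = [8.0, 10.0, 9.0, 7.0, 12.0, 6.0]
t_stat, dof = paired_t(ray_casting, rhcm)
print(f"t({dof}) = {t_stat:.2f}")
```

Obtaining the p-value additionally requires the t-distribution, for example through a statistics library such as SciPy (`scipy.stats.ttest_rel`).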

5.4 SUS-Survey

The total mean of the SUS-scores shows that ray-casting performed better in usability than RHCM-v1. RHCM-v2 and ray-casting performed comparably (see Table 6).

             Group 1                      Group 2
             Ray-casting   RHCM-v1        Ray-casting   RHCM-v2
SUS Mean     72.9          57.1           69.6          71.1
SUS Range    50.0 - 87.5   17.5 - 72.5    50.0 - 87.5   60.0 - 82.5
SUS Rating   Good          Ok             Good          Good

Table 6: SUS-score of the selection techniques from both groups
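For reference, an individual SUS-score is conventionally derived from the ten questionnaire items as sketched below, following the standard scoring procedure of the scale [6]. The function name and the example response sheet are hypothetical. Individual scores are always multiples of 2.5, which is consistent with the range endpoints in Table 6.

```python
def sus_score(responses):
    """Convert ten 1-5 Likert responses into a 0-100 SUS score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be in the range 1-5")
        if item_number % 2 == 1:
            total += response - 1  # odd items are positively worded
        else:
            total += 5 - response  # even items are negatively worded
    # The summed contributions (0-40) are scaled to 0-100.
    return total * 2.5


# Hypothetical response sheet, NOT taken from the study.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```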

5.5 Interviews

5.5.1 Group 1

Overall, sentiments were more positive for ray-casting than for RHCM-v1. Five out of six participants expressed difficulty in identifying the cause of the perceived difference between the techniques, and four out of six perceived the techniques to be more similar than different. Three participants attributed increased usability in ray-casting to it being less linear in relation to their hand movements: “Even when I held my hand still the cursor moved around. It’s like it didn’t know where my hand was” (Participant 3 about RHCM-v1). Two participants perceived RHCM-v1 to be more accurate whilst simultaneously requiring more effort to use, and preferred RHCM-v1 in the typing task while preferring ray-casting in the UI navigation task. A sentiment reiterated by five out of six participants was that RHCM-v1 was too rigid and too responsive to small movements to be a good experience to use: “The technique (ray-casting) was more floaty but also more fun to use. The other (RHCM-v1) felt like it mimicked my exact movements, which felt more rigid, like holding a laser pointer” (Participant 1). Interview transcription analysis (see Table 7) showed differences in mentions of accuracy and ease of use. Intuitive controls were frequently mentioned as a positive for both techniques.

5.5.2 Group 2

Overall, sentiments were more positive for RHCM-v2 than for ray-casting. Five out of seven participants perceived the techniques to share more similarities than differences. Three participants claimed to feel a stronger connection to the cursor in RHCM-v2 compared to ray-casting, attributing it to a more accurate response to hand movement: “With (ray-casting) it felt like I had to adapt to the cursor, while with the other (RHCM-v2), the cursor responded to me” (Participant 11). Six out of seven participants thought they were more accurate with smaller targets with RHCM-v2 compared to ray-casting. Even participants who preferred ray-casting overall preferred RHCM-v2 in typing tasks where small target acquisition was required: “It (ray-casting) was a bit more responsive, but that made it harder to miss. The other (RHCM-v2) was more helpful when I had to make small movements” (Participant 10). Interview analysis (see Table 7) shows that there is a sentiment difference between the techniques regarding accuracy and ergonomy (Ray-casting: 2, RHCM-v2: 0).

                             Group 1                       Group 2
                             Ray-casting    RHCM-v1        Ray-casting    RHCM-v2
Code           Sub-code      Pos.   Neg.    Pos.   Neg.    Pos.   Neg.    Pos.   Neg.
Physical       Ergonomic     0      0       2      2       2      0       1      1
               Sensation     0      1       0      2       0      0       0      1
Productive     Accuracy      1      1       3      1       0      2       6      0
               Responsive    1      0       0      0       2      1       0      0
Introspective  In control    1      0       1      2       2      0       2      1
               Intuitive     6      0       6      0       3      0       4      0
               Ease of use   4      0       0      2       3      1       3      0

Table 7: Occurrences of positive and negative sentiments total count collected from the interview sessions

6 Discussion

This paper has presented RHCM, a selection technique for manipulating a cursor on a 2D-plane when utilizing hand tracking. RHCM was evaluated against ray-casting in a repeated-measures experiment. The experiment was divided into two groups, where the difference between the groups was the configuration of RHCM. In group 1, RHCM-v1 performed equally to ray-casting in terms of throughput and error rate, but worse in the SUS-survey and interviews, which was attributed largely to rigid and over-sensitive behaviour (see Section 5.5 Interviews). This motivated a reconfiguration of RHCM where cursor sensitivity was lowered and a noise filter was implemented. This re-configured version (RHCM-v2) was evaluated against ray-casting by participants in group 2. RHCM-v2 saw improvements in throughput, SUS-survey and interview sentiments. RHCM-v2 achieved throughput equal to or higher than ray-casting, scored equal to ray-casting in the SUS-survey, and received more positive sentiments in interviews.

RHCM-v1 and ray-casting produced equal throughput overall (see Figure 12), with a significant difference only in small target acquisition, where RHCM-v1 performed slightly better than ray-casting. RHCM-v2, however, showed a significant improvement in throughput over ray-casting, producing higher throughput overall, at small target amplitudes, and at medium target width (see Section 5.2 Throughput). The results also indicate better performance with small targets for RHCM-v2 than for ray-casting, although this could not be statistically proven. RHCM produced a lower error rate than ray-casting in both groups, but the difference in means did not reach statistical significance.

The quantitative data is reflected in qualitative data in terms of accuracy with small objects and acquisition over large distances. In small target acquisition, RHCM-v2 was preferred over ray-casting, while ray-casting is perceived as more ergonomic than RHCM-v2. This is most likely an outcome of the difference in implementation between the techniques, where ray-casting controls cursor position by pointing angle, and RHCM by hand position.

Traversing large distances with the cursor requires more effort with RHCM compared to ray-casting, and making small adjustments requires more effort with ray-casting compared to RHCM. This is consistent with interview sentiments of ray-casting being preferred in the UI navigation segment of the experiment, but not in the text-input segment.

RHCM was designed using autobiographical design, as described in section 3.1 Design Method. One caveat with using autobiographical design is that it cannot produce broadly generalizable results [22]. During the design phase, long-term self-use of the technique made the author subconsciously able to differentiate between noise and intentional cursor movement, and the author thus performed much better with high cursor sensitivity than first-time users. This became evident during interviews with subjects in group 1, who described the cursor as overly responsive and rigid compared to ray-casting, which was perceived as more fluid. The subsequent implementation of a noise filter and adjusted cursor sensitivity proved to have a positive effect, and RHCM received a much more positive SUS-rating and more positive sentiments in interviews from group 2. The experiment showed that with RHCM, sensitivity is a factor which affects usability, in contrast to ray-casting, where the cursor position is determined by a pointing angle and thus no sensitivity setting can be applied.
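The role of the two configuration parameters can be illustrated with a minimal sketch of a relative mapping, in which the cursor moves by the change in hand position scaled by a sensitivity factor. The class name, the unit of the sensitivity value and, in particular, the use of exponential smoothing as the noise filter are assumptions made for illustration; the sketch is not the implementation evaluated in this study.

```python
class RelativeCursor:
    """Hypothetical relative hand-to-cursor mapping on a 2D plane."""

    def __init__(self, sensitivity, smoothing=1.0):
        # sensitivity: cursor travel per unit of hand travel (unit assumed).
        # smoothing: 1.0 disables filtering; smaller values filter more.
        self.sensitivity = sensitivity
        self.smoothing = smoothing
        self.x = 0.0
        self.y = 0.0
        self._filtered = None  # last filtered hand position

    def update(self, hand_x, hand_y):
        """Feed one tracked hand position and return the cursor position."""
        if self._filtered is None:
            # The first sample only establishes the reference position.
            self._filtered = (hand_x, hand_y)
            return self.x, self.y
        prev_x, prev_y = self._filtered
        # Assumed noise filter: exponential smoothing of the hand position.
        new_x = prev_x + self.smoothing * (hand_x - prev_x)
        new_y = prev_y + self.smoothing * (hand_y - prev_y)
        # Relative mapping: move the cursor by the scaled hand displacement.
        self.x += self.sensitivity * (new_x - prev_x)
        self.y += self.sensitivity * (new_y - prev_y)
        self._filtered = (new_x, new_y)
        return self.x, self.y


# Sensitivity values from Table 5; the smoothing factor is an assumption.
unfiltered = RelativeCursor(sensitivity=6.0)
filtered = RelativeCursor(sensitivity=4.5, smoothing=0.5)
for cursor in (unfiltered, filtered):
    cursor.update(0.0, 0.0)
    print(cursor.update(1.0, 0.0))
```

In this sketch, small tracker jitter is amplified by the sensitivity factor unless the filter attenuates it first, which is one way to read the over-sensitive behaviour reported by group 1.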

The data suggests that RHCM is preferable to ray-casting in tasks with small target acquisition over small or medium distances, such as text input, but less useful in situations where accuracy is not a main concern, such as UI navigation with large to medium targets.

6.1 Impact of Covid-19

The Covid-19 epidemic limited the ability to recruit participants, affecting the scope of the study. Thus, the results might have shown greater statistical significance in a larger study. Also, as Covid-19 led to increasingly strict restrictions on the ability to meet others in person, early prototype testing on participants was deemed an impractical and immoral method of design evaluation. A choice was therefore made to embrace an autobiographical research method during the design of the technique, which has the shortcoming that generalizable results are difficult to achieve [22].

Another way in which Covid-19 affected the study was the location of the trials. Due to restrictions on available locations and the safety concerns of the participants, all trials had to be conducted in the homes of the participants. In order to ensure similar tracking conditions between trials, efforts were made to conduct each trial in a windowless room illuminated by one ceiling lamp, and objects on the floor such as mats and chairs were removed. While this does not guarantee that the conditions were equal between subjects, the conditions were kept identical within each subject's test.

6.2 Future Research

The results from the experiment showed that RHCM is preferable in cases where accuracy is required, but less so in situations where large cursor translations have to be made. To combat this, variations in orientation and sensitivity of RHCM could be evaluated to find ideal configurations of the technique for interactions which do not require much accuracy. It would also be useful to explore interaction techniques for switching between ray-casting and RHCM depending on the use case. A limitation of RHCM is that it has no means of depth selection. For increased utility, a solution to this should be explored in which depth cues are gathered by means other than pointer angle, for example gaze direction. Kopper et al. showed that their continuous progressive refinement technique increases accuracy at the cost of a pointing-cursor mismatch [14]. With RHCM, pointing at the target is not required, and RHCM is therefore not affected by this shortcoming. This makes continuous progressive refinement a potentially well-suited addition to RHCM in future research.

7 Conclusion

RHCM is presented as an alternative 2D-selection acquisition technique to ray-casting. RHCM was designed and implemented into a prototype which was evaluated using Fitts’s law selection tasks and guided UI navigation / text-input tasks. Two variations of RHCM, differing in cursor response configuration, were compared to ray-casting.

One variant, RHCM-v1, produced comparable quantitative results, but low qualitative results. The other variant, RHCM-v2, performed equal to or better than ray-casting in both qualitative and quantitative results. Results were favourable when target widths were medium to small, and distances between targets were small. In all other tasks, RHCM-v2 performed equal to ray-casting. Overall, the results indicate that a relative hand position mapped solution performs better than ray-casting in target acquisition tasks which require high accuracy, and performs equally in others. However, in order to present a convincing alternative to ray-casting, further work should explore variations in the configuration of the technique, as well as solutions to improve performance in tasks with large amplitudes.


References

[1] William Albert and Thomas Tullis. Measuring the user experience: collecting, analyzing, and presenting usability metrics. Newnes, 2013.

[2] Ferran Argelaguet and Carlos Andujar. A survey of 3d object selection techniques for virtual environments. Computers & Graphics, 37(3):121–136, 2013.

[3] Felipe Bacim, Regis Kopper, and Doug A Bowman. Design and evaluation of 3d selection techniques based on progressive refinement. International Journal of Human-Computer Studies, 71(7-8):785–802, 2013.

[4] Aaron Bangor, Philip Kortum, and James Miller. Determining what individual sus scores mean: Adding an adjective rating scale. Journal of usability studies, 4(3):114–123, 2009.

[5] Doug A Bowman and Larry F Hodges. Formalizing the design, evaluation, and application of interaction techniques for immersive virtual environments. Journal of Visual Languages & Computing, 10(1):37–53, 1999.

[6] John Brooke. Sus: a “quick and dirty” usability scale. Usability evaluation in industry, page 189, 1996.

[7] Wei-Chi Chien and Marc Hassenzahl. Technology-mediated relationship maintenance in romantic long-distance relationships: An autoethnographical research through design. Human–Computer Interaction, 35(3):240–287, 2020.

[8] Facebook Technologies, LLC. Hand tracking documentation page, 2020. https://developer.oculus.com/documentation/unity/unity-handtracking, Last accessed on 2020-10-16.

[9] Facebook Technologies, LLC. Oculus developer blog: Hand tracking - best practises, 2020. https://developer.oculus.com/learn/hands-design-bp/, Last accessed on 2020-10-16.

[10] Facebook Technologies, LLC. Oculus developer release notes: v12, 2020. https://developer.oculus.com/blog/oculus-developer-release-notes-v12, Last accessed on 2020-10-16.

[11] Scott Frees, G Drew Kessler, and Edwin Kay. Prism interaction for enhancing control in immersive virtual environments. ACM Transactions on Computer-Human Interaction (TOCHI), 14(1):2–es, 2007.

[12] Tovi Grossman and Ravin Balakrishnan. The bubble cursor: enhancing target acquisition by dynamic resizing of the cursor’s activation area. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 281–290, 2005.

[13] John Carmack. 3d interfaces are usually worse than 2d interfaces, 2019. https://www.facebook.com/permalink.php?story_fbid=2407256322842204&id=100006735798590, Last accessed on 2020-10-16.

[14] Regis Kopper, Felipe Bacim, and Doug A Bowman. Rapid and accurate 3d selection by progressive refinement. In 2011 IEEE Symposium on 3D User Interfaces (3DUI), pages 67–74. IEEE, 2011.

[15] Joseph J LaViola Jr, Ernst Kruijff, Ryan P McMahan, Doug Bowman, and Ivan P Poupyrev. 3D user interfaces: theory and practice. Addison-Wesley Professional, 2017.

[16] Jiandong Liang and Mark Green. Geometric modeling using six degrees of freedom input devices. In 3rd Int’l Conference on CAD and Computer Graphics, pages 217–222. Citeseer, 1993.

[17] Andrés Lucero, Audrey Desjardins, Carman Neustaedter, Kristina Höök, Marc Hassenzahl, and Marta E Cecchinato. A sample of one: First-person research methods in hci. In Companion Publication of the 2019 on Designing Interactive Systems Conference 2019 Companion, pages 385–388, 2019.

[18] I Scott MacKenzie. Fitts’ law as a research and design tool in human-computer interaction. Human-computer interaction, 7(1):91–139, 1992.


[19] I Scott MacKenzie. Fitts’ law. Handbook of human-computer interaction, 1:349–370, 2018.

[20] Kirsti Malterud. Systematic text condensation: a strategy for qualitative analysis. Scandinavian journal of public health, 40(8):795–805, 2012.

[21] Mark R Mine. Virtual environment interaction techniques. UNC Chapel Hill CS Dept, 1995.

[22] Carman Neustaedter and Phoebe Sengers. Autobiographical design in hci research: designing and learning through use-it-yourself. In Proceedings of the Designing Interactive Systems Conference, pages 514–523, 2012.

[23] Robin Schlünsen, Oscar Ariza, and Frank Steinicke. A vr study on freehand vs. widgets for 3d manipulation tasks. In Proceedings of Mensch und Computer 2019, pages 223–233. 2019.

[24] Daniel Vogel and Ravin Balakrishnan. Distant freehand pointing and clicking on very large, high resolution displays. In Proceedings of the 18th annual ACM symposium on User interface software and technology, pages 33–42, 2005.


TRITA-EECS-EX-2021:69
