
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer Science

Master thesis, 30 ECTS | Datateknik

2020 | LIU-IDA/LITH-EX-A--20/058--SE

A Comparison of WebVR and

Native VR

Impacts on Performance and User Experience

Matteus Hemström

Anton Forsberg

Supervisor: Vengatanathan Krishnamoorthi
Examiner: Niklas Carlsson


Upphovsrätt

This document is held available on the Internet – or its possible replacement – for a period of 25 years from the date of publication, barring exceptional circumstances. Access to the document implies permission for anyone to read, to download, to print single copies for his/her own use and to use it unchanged for non-commercial research and for teaching. Subsequent transfers of copyright cannot revoke this permission. All other use of the document requires the consent of the copyright owner. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature. The moral rights of the author include the right to be mentioned as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in such a form or in such a context as is offensive to the author's literary or artistic reputation or distinctiveness. For additional information about Linköping University Electronic Press, see the publisher's home page http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

The Virtual Reality (VR) market has grown considerably in recent years. It is a technology that requires high performance in order to provide a good sense of presence to users. Taking VR to the web would open up a lot of possibilities but also poses many challenges. This thesis aims to find out whether VR on the web is a real possibility by exploring the current state of performance and usability of VR on the web and comparing it to native VR. The thesis also aims to provide a basis for discussions on the future of VR on the web. Two identical VR applications were built to make the comparison, one built for the web and one built for native Android. Using these applications, a two-part study was conducted, with one part focusing on performance and the other on the user experience. The performance evaluation measured and compared performance for the two applications, and the user study used two separate questionnaires to measure the users' experienced presence and VR sickness.

The performance study shows that the web application is clearly lagging behind the native application in terms of pure performance numbers, as was expected. On the other hand, the user study shows very similar results between the two applications, contradicting what would be expected based on performance. This hints at successful mitigation techniques in both the hardware and software used.

The study also suggests some interesting further research, such as investigating the relationships between performance, VR sickness, and presence. Other possible further research would be to investigate the effect of prefetching and adaptive streaming of resources, and how it could impact VR on the web.


Acknowledgments

We would like to express our greatest thanks to Niklas Carlsson. Without your help, motivation and continued interest through these years, this thesis would never have been finished.

A big thank you to Johanna and Mimmi for being our greatest supporters and always believing in us.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Delimitations

2 Background
2.1 Valtech
2.2 Valtech Store

3 Theory
3.1 What is Virtual Reality?
3.2 History of VR
3.3 Immersion
3.4 Presence and Telepresence
3.5 Native VR
3.6 VR on the Web
3.7 Web versus Native
3.8 The Importance of Performance
3.9 Real-time 3D
3.10 Prefetching and Level of Detail

4 Method
4.1 Research Method - A Multi-method Strategy
4.2 Virtual Experience Implementations
4.3 Hardware - Test Devices and HMD
4.4 Performance Measurements
4.5 Presence and Simulator Sickness Measurements
4.6 Analysis Procedures

5 VR Applications

6 Experimental Results
6.1 Performance Test Procedure
6.2 Performance Test Results
6.3 SSQ Test Results
6.4 IPQ Test Results
6.5 Correlation between SSQ and IPQ

7 Discussion
7.1 Results
7.2 Method
7.3 The work in a wider context

8 Conclusion
8.1 Future Work

Bibliography
A User Study Test Protocol
B Performance Test Protocol
C Questionnaires
C.1 Demographics Questionnaire
C.2 Post-session Questionnaire
D IPQ results
E SSQ results
F WebVR Specification


List of Figures

2.1 Valtech Store
3.1 WebVR Overview
3.2 WebGL Overview
4.1 Implementation Overview
4.2 Overview of the 3D Scene Building Process
4.3 Component and Services Architecture
5.1 Start view - WebVR application
5.2 Start view - Native application
5.3 Teleport indicator - WebVR application
5.4 Teleport indicator - Native application
5.5 Information window - WebVR application
5.6 Information window - Native application
5.7 Wall material menu - WebVR application
5.8 Wall material menu - Native application
6.1 Performance Measurements
6.2 Individual FPS Measurements
6.3 SSQ score change between pre and post exposure (calculated as post score − pre score)
6.4 SSQ total score change between pre and post exposure without outliers
6.5 IPQ scores for native compared to web (zero means equal scores, positive means native scored higher than web)


List of Tables

3.1 Standardized loadings of items on respective subscales
3.2 SSQ categories
6.1 Performance Data Description
6.2 Native correlations


Chapter 1

Introduction

1.1 Motivation

In recent years, the popularity of VR in the consumer market has grown considerably. There are great challenges with VR technology since it is very sensitive to performance and other factors that may influence usability and presence.

The portability and mobility that come with the combination of smartphones and the web have proven very successful in the consumer market. Great investments in products such as Google Daydream VR and Samsung Gear VR show that there is a clear drive to take VR to the mobile web.

Taking new technology and products to the web is not always easy. History reveals that demanding applications are often built for native platforms to take advantage of the performance and development tools provided by those platforms. For example, there is still an ongoing "native versus web" battle in the business of smartphone applications.

With the combination of WebGL and WebVR it is possible to build VR experiences using web technology only, allowing VR to utilize the success and accessibility of the web platform. But there is a performance concern: can real-time rendered VR on the web meet the high performance requirements?

1.2 Aim

The aim of this thesis is to find out if VR is ready for the web and vice versa. The thesis will explore the current state of performance and user experience of real-time rendered VR on the web by comparing it with native VR. The result of the thesis could be used as a basis when choosing whether to use a native platform or the web platform to develop a VR application. It should be noted that the technologies used to develop VR, and especially VR on the web, are developing rapidly; thus the aim of this thesis is also to discuss and predict the future of VR on the web.

1.3 Research questions

1. How does a WebVR application compare to a native VR application in terms of user experience?

2. Can the web platform deliver the performance required for VR applications in terms of frame rate, latency and other performance metrics?


1.4 Delimitations

This thesis focuses on the comparison of native VR and WebVR in terms of performance and experienced presence. Although VR is available both as native and web applications on multiple platforms and hardware devices, this thesis will only perform measurements and a user study on the Android platform and the GearVR headset, due to limited time and access to hardware.

The performance and user experience could also differ depending on the development process and choice of tools. The applications used in this thesis will be developed using the game engine Unity and the JavaScript framework three.js. Other engines such as Unreal or similar will not be taken into consideration.


Chapter 2

Background

The comparative study of this thesis was done at the Stockholm office of Valtech. The applications developed for the comparative study were VR experiences of a physical concept store, called Valtech Store. This chapter gives a short background about the company and Valtech Store.

2.1 Valtech

Valtech is a global IT consulting firm specializing in digital experiences. Founded in France in 1993, Valtech has grown to more than 2500 employees in over 15 countries. In Sweden, Valtech has 265 employees in two offices and aims to be a digital partner focusing on technology, digital strategy and design.

2.2 Valtech Store

To stay relevant in a digital world, Valtech's customers need to be able to adapt and evolve regardless of business. This is also true for Valtech, which strives to stay up to date with trends and movements in the digital world. As a way to gather insights and knowledge, and to show what Valtech can offer customers, a smart online store concept called Valtech Store was created.

Valtech Store is a physical experiment store at the Stockholm office where a number of prototypes are shown. These prototypes demonstrate how new and emerging technologies such as Internet of Things, Big Data and Virtual Reality can be used in a physical retail space.


Chapter 3

Theory

This chapter provides a theoretical background for the thesis. First, we present a brief explanation of the term Virtual Reality, followed by an introduction to some terms and methods for measuring and evaluating VR experiences. This is followed by a historical walkthrough of Virtual Reality. After this, technical descriptions of WebVR and the Unity game engine used for the native application are given. The chapter also describes why performance is a critical factor for VR, the side effects that poor performance might lead to, and presents some ways of mitigating performance issues.

3.1 What is Virtual Reality?

To the general public, Virtual Reality (VR) is known through consumer-available products such as the Oculus Rift and HTC Vive. The term is understood as a set of technologies that enables immersive experiences for its users that are even more captivating than high-definition movies and games. But in the scientific community, Virtual Reality is often interpreted in terms of human experiences. Back in 1992, Jonathan Steuer introduced a definition of Virtual Reality that is not tightly coupled to technological hardware, but instead based on "presence" and "telepresence". He defined presence as “the sense of being in an environment”, telepresence as “the experience of presence in an environment by means of a communication medium”, and Virtual Reality as “a real or simulated environment in which a perceiver experiences telepresence” [1].

Virtual Reality has other, interchangeable terms such as "Artificial Reality", "Cyberspace", "Microworlds", "Synthetic Experience", "Virtual Worlds", "Artificial Worlds", and "Virtual Environment" [2] [3] [4]. Michael A. Gigante stated in his book from 1993 [4] that "many researcher groups prefer to avoid the term VR because of the hype and the associated unrealistic expectations”. He also writes that telepresence “represents one of the main areas of research in the VR community”, which is also supported by the fact that the longest-established academic journal in the field is named "Presence: Teleoperators and Virtual Environments" [5].

3.2 History of VR

The ideas and concepts behind Virtual Reality far predate the term itself and can be argued to date back to the 1800s. In 1838, the first stereoscope was invented by Charles Wheatstone [6], using two images and mirrors to give the viewer a sense of depth. Technology has come a long way since then, but the concept of stereoscopic images is still used in today's virtual reality.


3.2.1 The Beginning

In 1965, American computer scientist Ivan Sutherland introduced the concept of "The Ultimate Display" [7]. This concept describes the ultimate display as “a room within which the computer can control the existence of matter”, a description that fits well when applied to Virtual Reality technology of today. A few years later, Sutherland and his team at MIT built what is regarded as the first VR head-mounted display (HMD) system, called “The Sword of Damocles” [8]. The display was a large, ceiling-suspended device that a user strapped to his or her head. It was capable of displaying simple “wire-frame” shapes according to the user's head movements. In the following years, plenty of research was done in the field, improving both computer graphics and display capabilities, laying the foundation for what was yet to come.

3.2.2 The First Wave - 1990s

In 1982, an advanced flight simulator that used an HMD to present information to pilots, called the Visually Coupled Airborne Systems Simulator (VCASS), was developed at the US Air Force Medical Research Laboratories. Two years later, in 1984, a stereoscopic HMD called the Virtual Visual Environment Display (VIVED) was developed at the NASA Ames research center [4]. But it was not until the late 1980s that the actual term “Virtual Reality” was coined by Jaron Lanier, founder of the Visual Programming Lab (VPL) [9]. VPL Research was the first company to develop and sell commercial VR devices, most notably the DataGlove and the EyePhone HMD [3]. By the early 1990s, Virtual Reality had become very popular in both the scientific community and the general population. The potential of Virtual Reality started to show and a number of companies started to incorporate VR in their production and daily operations [10]. In 1994, Frederick Brooks held a lecture at the University of North Carolina called “Is There Any Real Virtue in Virtual Reality?”. In this lecture Brooks assessed the current state of VR, concluded that “VR almost works”, and listed a number of areas that needed to improve in order to fulfill Sutherland's 1965 vision. In 1999, Brooks revisited VR to see how much progress had been made in the five years since his lecture at UNC [11]. He stated “VR that used to almost work now barely works. VR is now real.” and found that VR was now actually in use, not only in research labs but in actual production. He finished the report by presenting the main challenges that still needed to be solved: minimizing latency, rendering large models in real time, choosing the best type of display for each application, and improving haptic augmentation. The research and further development of VR technology carried on, but the consumer market and public interest collapsed and would not return until over ten years later.

3.2.3 The Rise of VR - 2010s

In 2012, a small company called OculusVR, started by VR enthusiast Palmer Luckey, reignited the VR hype. Being a VR enthusiast, Luckey collected old VR HMDs for his own use. Eventually he grew frustrated with the poor performance and immersion these '90s era HMDs provided and decided to build his own version, which he called the Oculus Rift. One of the prototypes Luckey built eventually landed in the hands of John Carmack, creator of games such as Doom and Quake. Carmack, in turn, showed this prototype at the Electronic Entertainment Expo (E3), which led to articles on tech sites, and suddenly the Rift was the hottest device of the gaming industry [12]. In August 2012, the recently founded OculusVR launched a crowd-funding campaign on Kickstarter for their VR HMD, the Oculus Rift. The goal was to raise $250,000 in order to produce 100 HMDs. After 24 hours, the campaign had raised over $600,000 and when it was over the number had reached almost $2.5 million [13]. After the success of the Oculus Kickstarter and the developer kits that followed, VR was once again a hot topic and more HMDs soon followed. As of today, there are a number of HMDs available on the consumer market. Apart from the Oculus Rift, the main alternatives are the HTC Vive and the Playstation VR [14] [15]. There are also a number of HMDs that work by inserting a smartphone into the headset. Of these, the main players are the Samsung GearVR, Google Daydream and the very cheap Google Cardboard [16] [17] [18].

3.3 Immersion

Immersion describes to what extent a system can deliver an environment that shuts out the real world. A good computer display will increase immersion because it will be able to provide greater detail. A set of noise-canceling headphones will increase immersion because it will hide the sounds of the real world. A good head tracking device will improve immersion because the user will be less likely to notice the computer-rendered frames lagging behind. Immersion is thus defined in technological terms, and was well described by M. Slater and S. Wilbur in 1997 [19].

In a paper written by B. Witmer and M. Singer in 1998, another interpretation of immersion is used. They define immersion as “a psychological state characterized by perceiving oneself to be enveloped by, included in, and interacting with an environment that provides a continuous stream of stimuli and experiences” [20]. Immersion as described by B. Witmer and M. Singer is a psychological state affected by the quality of the virtual reality experience. They recognize technology as one of the factors that enables immersion.

In 1999, M. Slater writes in response to B. Witmer and M. Singer that he will use the term "system immersion" to denote his meaning, and "immersive response" to denote the meaning of B. Witmer and M. Singer. This distinction is also made explicit by Schubert et al. in a paper written in 2001, in which they denote B. Witmer and M. Singer's meaning of immersion as "psychological immersion" [21].

To avoid confusion, we will follow the distinction made by Schubert et al. and simply use "immersion" to denote M. Slater's meaning, while using the term "psychological immersion" to denote B. Witmer and M. Singer's meaning.

3.4 Presence and Telepresence

Telepresence was first explained by Marvin Minsky in an article published in 1980 in which he wrote about telepresence as a remote presence in which you could feel, see and interact at a remote location through telecommunication and robotics [22]. Sheridan and Furness introduced a new journal, "Presence: Teleoperators and Virtual Environments" in 1992 and continued to use Minsky’s wording of telepresence only when dealing with teleoperation and remote control. Over time the two terms, presence and telepresence, started to merge and are sometimes used interchangeably. The International Society of Presence Research states that presence is a shortened version of telepresence [23].

In 1997, Lombard and Ditton published "At the Heart of It All: The Concept of Presence" [24] in which they examined the concept of presence by identifying the previous takes on presence found in the literature and explicating the concept as a whole. Some of the takes on presence are conceptualized from social perspectives while others focus on perceptual and psychological immersion. Lombard and Ditton summarize the concept as “the perceptual illusion of nonmediation”, which occurs when “a person fails to perceive or acknowledge the existence of a medium in his/her communication environment and responds as he/she would if the medium were not there”, or as Biocca and Levy put it, “the real world is invisible” [25]. The Lombard-Ditton definition of presence is based upon perception, mediation, medium and environment; terms that are well discussed and explained by J. Gibson in his book "The Ecological Approach to the Visual Perception of Pictures" written in 1978 [26]. The book challenges earlier work in perception and brings multiple fields (physics, optics, anatomy and visual physiology) into account as he suggests “an entirely new way of thinking about perception and behavior”. He writes that the “environment of animals and men is what they perceive”, which is not the same as the physical environment in which we also exist.

3.4.1 Measuring Presence

One of the first well known and established methods of measuring presence is the use of the presence questionnaire (PQ) and the immersive tendencies questionnaire (ITQ) introduced by B. Witmer and M. Singer in 1998 [20]. They base the presence questionnaire on the factors thought to contribute to presence: Control Factors, Sensory Factors, Distraction Factors and Realism Factors. The factors in these groups were derived from earlier work and are described in the first issue of Presence: Teleoperators and Virtual Environments. They conclude that individuals who report a high PQ score tend to report less simulator sickness.

About one year later M. Slater criticizes the Witmer-Singer questionnaire in "Measuring Presence: A Response to the Witmer and Singer Presence Questionnaire" [27]. He writes that “The purpose of this note is to explain why I would never use the W&S questionnaire for studying presence - even though I am sure that in itself it can lead to useful insights about the nature of VE experience”. His reasoning is that the questions measure differences in the individuals and not the immersive system, meaning that for example individual skill and experience will influence the result. He also criticizes the scale and summarization of scores by arguing that “A question that has high variability across subjects in the experiment will obviously correlate more highly with the sum than one with a lower variance”.

Freeman et al. find in 1998 that “prior training given to observers significantly affects their presence ratings” and discuss that measuring the subjective construct of presence is potentially unstable [28]. In "Using Presence Questionnaires in Reality", published in 1999, M. Usoh et al. warn that presence questionnaires such as the Witmer-Singer questionnaire should not be used for comparisons across environments, since the subjects will “'relativise' their responses to presence questions to the domain of their given experimental experiences only”.

A good property of questionnaires is that they can be completed after the experiment, and thus do not interfere with the subject's focus during the experiment. They do, however, make the results influenced by the memory of the subject; thus events in the later part of the experiment might have a greater impact on the questionnaire result.

There are many different versions of questionnaires to measure presence. In the summarization of presence measurement techniques by W. Ijsselsteijn and J. van Baren in 2004, 28 presence questionnaires are evaluated, which they report as the largest category of methods [29]. In 2014, C. Rosakranse et al. identify the five canonical questionnaires as the SUS (Slater, Usoh, Steed), PQ (Witmer, Singer), IPQ (Schubert, Regenbrecht, Friedmann), ITC-SOPI (Lessiter, Freeman, Keogh, Davidoff) and the Lombard-Ditton questionnaire [30].

The IPQ is made up of items from previous presence questionnaires. As such it contains items from the SUS, the PQ and other questionnaires. It was first presented in 2001 by Schubert et al. in an article published in Presence [21]. The article reports two exploratory factor analyses conducted as two sequential dependent studies, followed by a confirmatory factor analysis that resulted in 14 items. Schubert et al. report that three distinct presence components were identified: spatial presence (SP), involvement (INV) and experienced realism (REAL). Out of the 14 items, 5 items load on spatial presence, 4 items load on involvement, and 4 items load on experienced realism. There is one additional item, a general “sense of being there” (G1), that loads on all three components. The IPQ is based on and supported by previous work such as the ITC-SOPI. It supports the critique of Slater against the PQ. Further, it is a short questionnaire with categorized items, allowing distinction between the presence components.

In 2000, M. Slater and A. Steed described a new method to measure presence in "A Virtual Presence Counter" [31]. The motivation for a new measurement method for presence is anchored in, among others, the work by Freeman et al. Because of the subjective nature of presence, the subjects can “clearly be influenced in their responses by the information that they gather during the course of the experiment” [31]. Any data accumulated and results derived thereof may be influenced by the expectations of the subjects. It is clear that both Freeman et al. and M. Slater and A. Steed aim towards a more objective measurement method. The Virtual Presence Counter method is based on the assumption that an individual only acts towards one environment in any given moment. Even though an individual perceives input from multiple environments, such as the virtual environment, the internal mental environment and the physical environment, each action performed by the individual will be a response to a single environment. The method presented in "A Virtual Presence Counter" counts each transition from the virtual environment (V) to the real world environment (R). A Markov chain can be constructed with the two states and used to calculate the probability of being in the virtual environment, a value that can be used as a measurement of presence. The transition V → R is called a "break in presence" (BIP). The Virtual Presence Counter is more objective than presence questionnaires, but it is not fully objective since individuals make subjective decisions to report a BIP. Another problem with the method is that it requires focus from the subject during the experiment, which in turn leads to less psychological immersion and presence.
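As an illustration of the counting idea, the minimal sketch below derives the stationary probability of being in the virtual environment from the two estimated transition probabilities of such a two-state Markov chain. It is an illustrative sketch of the reasoning above, not code or notation from the thesis; the function name and the example numbers are assumptions.

// Minimal sketch (not from the thesis): a two-state Markov chain over
// V (present in the virtual environment) and R (attending to the real world).
// pVR: estimated probability of a break in presence (V -> R) per time step,
// pRV: estimated probability of returning to the VE (R -> V) per time step.
// Both would be estimated from the reported BIPs and returns during a session.
function presenceProbability(pVR, pRV) {
  // Stationary probability of state V for a two-state Markov chain:
  // pi_V = pRV / (pRV + pVR)
  return pRV / (pRV + pVR);
}

// Example: frequent returns and rare breaks give a presence estimate close to 1.
console.log(presenceProbability(0.05, 0.8)); // ~0.94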

There are strengths and weaknesses with both subjective and corroborative objective measurement methods. An underlying problem is that presence is a multifaceted term. Most measurement methods tend to be subjective because of the early statements by T. Sheridan that presence is “[...] a mental manifestation, not so amenable to objective physiological definition and measurement”, and therefore “subjective report is the essential basic measurement” [32]. In 2000, Ijsselsteijn et al. conclude that the most promising method for measuring presence is to use an aggregated combination of both subjective and objective methods [33].

3.4.2 IPQ - Igroup Presence Questionnaire

The Igroup Presence Questionnaire (IPQ) is a questionnaire and scale for measuring presence, first introduced in 2001 by Schubert et al. [21]. The authors argue that presence is a result of the user's interpretation and mental model of the virtual environment (VE) and identify two cognitive processes that combine to create a sense of presence: the construction of a mental model and attention allocation. The authors propose that the sense of being in and acting from within a virtual environment, together with the sense of concentrating on the virtual environment while ignoring the real environment, are two components included in the sense of presence. Based on this theory, also backed up by Witmer and Singer [20], the authors perform two exploratory factor analysis studies to identify items that make up the presence construct. The exploratory factor analyses used questionnaires from previous studies by Witmer and Singer [34], Ellis et al. [35], Carlin et al. [36], Hendrix [37], Slater, Usoh and Steed [38], Towell and Towell [39] and Regenbrecht et al. [40]. Following the exploratory factor analyses, a confirmatory factor analysis was performed to reach a good model for measuring presence.

Table 3.1 shows how the items load on their respective category. The general item G1 is not included in the table, but it loads strongly on all three categories [41].


Table 3.1: Standardized loadings of items on respective subscales

Category  Item                                                          Study 1   Study 2
SP        Sense of VE continuing behind me                               0.583     0.623
SP        Sense of seeing only pictures                                 −0.686    −0.643
SP        Sense of being in the virtual space                            0.756     0.741
SP        Sense of acting in the VE                                      0.847     0.727
SP        Felt present in the VE                                         0.821     0.789
INV       Awareness of real world stimuli                                0.740     0.652
INV       Awareness of real environment                                  0.762     0.845
INV       Attention to the real environment                             −0.780    −0.783
INV       Captivated by the VE                                           0.724     0.695
REAL      How real VE seemed in comparison to the real world            −0.753    −0.633
REAL      Consistency of experiencing the VE and a real environment      0.708     0.728
REAL      How real VE seemed in comparison to an imaginary world         0.795     0.730
REAL      The virtual world seemed more realistic than the real world    0.564     0.767

3.5 Native VR

The majority of native 3D VR experiences are developed in the same way as traditional 3D material and games. The most common way is by using a game engine such as Unity or Unreal [42]. Both Unity and Unreal have SDK integrations for the major VR HMDs, making it easy to get started with VR development. In this thesis, the native application is developed in Unity.

In general, native applications have higher performance since the code is more platform-specific and can utilize hardware in a better way, as explained further in Section 3.7. Native Android applications running on Android 4.3 or newer can use OpenGL ES 3.0 [43], which allows for a number of performance improvements over OpenGL ES 2.0 [44]. WebGL 1.0 (which is explained further in Subsection 3.6.3) only exposes the OpenGL ES 2.0 feature set.

3.5.1 Unity

Unity is a cross-platform game engine that is popular for developing games for PC, consoles and mobile devices. Launched in 2005 by Unity Technologies, it was initially only available for Mac OS X, but Windows support has since been added. The list of supported target platforms has also grown since launch and now includes a large number of desktop, mobile, console, TV and web platforms. Unity is also a common choice for VR development and currently has support for Oculus, OpenVR, GearVR and Playstation VR [45]. Building a native VR application in Unity lets developers access VR devices directly from Unity through a base API compatible with multiple VR platforms. A number of automatic changes are made when VR is enabled, such as setting the correct render mode and FOV for HMDs and enabling head tracking input to the camera of the scene [46].

3.6 VR on the Web

When the web emerged in the early 1990s, it was mainly a collection of interlinked documents. The typical document was a static web page written in HTML, which was downloaded, parsed and rendered by a web browser. By the late 1990s, dynamic web pages gained popularity, which was achieved by embedding client-side scripts in web pages. The scripting language named JavaScript was created and popularized, as was the style sheet language called CSS.


The evolved set of technologies from the 1990s enabled the rich web applications of the 2000s. The web showed promise as a platform for software applications as they “look and act more like desktop applications” [47]. Web technology was no longer used only to display documents, but to create rich applications such as Google Maps. When introducing Google Maps via the official Google blog in 2005, Bret Taylor wrote “Click and drag the map to view the adjacent area dynamically - there’s no wait for a new image to download.” [48].

3.6.1 Web Standards

The web provides an open platform that can be further developed by anyone. Thus, the web never seems to settle down; new technology, requirements and ideas are ever increasing. Open standards are an essential keystone of the success of the web. There are many groups working to create and establish standard specifications for emerging web technologies. One of the most important standards organizations for the web is the W3C, short for World Wide Web Consortium. The design principles of the W3C encourage a "Web for all" and a "Web on everything" [49]. The W3C defines an umbrella of standards called the Open Web Platform that together make up a major part of the web.

WHATWG, short for Web Hypertext Application Technology Working Group, is a community that plays an important role in web standards. In 2000, W3C published the XHTML 1.0 specification, intended to "reproduce, subset, and extend HTML 4." [50] The WHATWG was founded in 2004 as a reaction to the direction W3C had taken with the introduction of XHTML, showing a lack of interest in HTML [51]. Since then, WHATWG has provided continuously updated specification documents without versioning for HTML, DOM, URLs and other web technology.

Some of the most important web technologies with well established standards are:

HTML "HTML is the World Wide Web’s core markup language." [52] W3C provides a ver-sioned standard for HTML, such as HTML 5.1 [52], while WHATWG provides a living standard, dropping the version number from the name and simply calls it "HTML" [53].

CSS Describes the rendering of structured documents. W3C provides an umbrella of CSS module specifications that together define CSS. In the W3C document titled "CSS Snapshot 2015" [54], 24 specifications are listed to define what CSS is.

JavaScript Most modern web pages use JavaScript, which can be considered the programming language of the web [55, p. 1]. In the late 1990s, JavaScript was standardized by ECMA (the European Computer Manufacturer's Association) and given the name ECMAScript. In general, the language is still referred to as JavaScript, even though the language standard is ECMAScript [56, Brief History] [55, p. 265].

Document Object Model (DOM) An API that describes the events, nodes and tree structure of structured documents [57]. In practice, the DOM is the API exposed to JavaScript that enables web applications to read and update the structure and style of a document. Most of the development of the DOM is done by WHATWG, while W3C provides stable snapshots of the standard.

JavaScript APIs Both W3C and WHATWG have published a number of standards for JavaScript APIs, such as the WebStorage standard. In general, W3C publishes JavaScript API standards separately from the HTML specifications [58], while WHATWG keeps the standards within the HTML specification document [53].

3.6.2 WebVR

With the release of the Oculus Rift and the newly ignited VR hype came the idea of bringing support for VR HMDs to the web. In 2014, work had begun at both Mozilla and Google to bring VR support to the web [59]. In March 2016, a proposal specification for the WebVR API 1.0 was announced and the W3C WebVR Community Group was launched [60] [61].

The Goal of the W3C WebVR Community Group is to help bring high-performance Virtual Reality (VR) to the open Web. Currently, access to VR devices is limited to specific native platforms. For example, Oculus Rift supports Windows and HTC Vive supports SteamVR platform only. With WebVR specs, VR experiences could be available on the Web across platforms. ([62])

The WebVR API aims to provide support for VR HMDs in web browsers by exposing access to VR displays and head tracking data to the web browser. This enables developers to read and translate movement and position data from the headset into movement in a three-dimensional scene. A VR experience can be created by rendering a 3D scene using the perspectives provided by the headset, and presenting frames to the headset display.

The typical steps to render a frame to a VR display, without error handling, are:

1. Get a VRDisplay object by using getVRDisplays() (WebVR Specification line 198):

let vrDisplay;
navigator.getVRDisplays().then(displays => {
  vrDisplay = displays[0];
});

2. Get (or create) a canvas element and set its size so that it is large enough for both viewports (left viewport and right viewport). The recommended viewport sizes can be acquired through getEyeParameters() (WebVR Specification line 24):

const leftEye = vrDisplay.getEyeParameters('left');
const rightEye = vrDisplay.getEyeParameters('right');

const canvas = document.getElementById('target-canvas');
canvas.width = Math.max(leftEye.renderWidth, rightEye.renderWidth) * 2;
canvas.height = Math.max(leftEye.renderHeight, rightEye.renderHeight);

3. In response to a user gesture, pass the canvas to requestPresent() (WebVR Specification line 99) and start a render loop specific to the VRDisplay:

vrDisplay.requestPresent([{ source: canvas }]).then(() => {
  vrDisplay.requestAnimationFrame(onVRFrame);
});

4. In a render loop, get the current projection and view matrices along with the current head pose using getFrameData() (WebVR Specification line 41), render each eye, and submit the frame using submitFrame() (WebVR Specification line 117):

const frameData = new VRFrameData();

function onVRFrame() {
  vrDisplay.requestAnimationFrame(onVRFrame);

  // Update the frameData object with the current pose and matrices
  vrDisplay.getFrameData(frameData);

  // Render left eye
  render(
    frameData.leftProjectionMatrix,
    frameData.leftViewMatrix,
    frameData.pose.position,
    frameData.pose.orientation
  );

  // Render right eye
  render(
    frameData.rightProjectionMatrix,
    frameData.rightViewMatrix,
    frameData.pose.position,
    frameData.pose.orientation
  );

  // Capture the current state of the canvas and display it
  vrDisplay.submitFrame();
}

Figure 3.1: WebVR Overview

As a whole, the WebVR specification extends 4 existing interfaces with WebIDL partials: the Navigator interface, the Window interface, the Gamepad interface and the HTMLIFrameElement interface. In addition to the partials, the specification defines 9 other interfaces, one of which is an event that is tightly coupled with the Window interface. The architectural overview is visualized in Figure 3.1.

The VRDisplay interface (WebVR Specification line 1) is the main piece of WebVR. It includes generic information about a VR device along with the functions necessary to display content on its display. It is the core interface for accessing other WebVR interfaces associated with functionality or data related to the VR device, such as VRFrameData (WebVR Specification line 169).


The most important partial is the Navigator extension (WebVR Specification line 197) which provides access to the VRDisplay interface. It is a small extension, but acts as host of the functionality required to bootstrap a WebVR application because it can be used to check whether WebVR is enabled and to fetch the VRDisplay objects.

The Window interface extension (WebVR Specification line 221) defines a set of events that can be subscribed to, as such it is not a crucial interface that is required to create a VR experience. It does however allow the application to react to events such as when a VR device is connected or disconnected. A VR application can do without these events and instead require that the VR display is connected when loading the web application.

The Gamepad interface extension (WebVR Specification line 235) contains a single attribute called displayId, which can be used to identify whether a Gamepad is associated with the specified VR display. It should be noted that the W3C Gamepad specification is still a working draft as of January 2017 [63]. As such, the Gamepad API is not stable, has not reached W3C Recommendation status and is not very well supported by web browsers.

The extension of HTMLIFrameElement (WebVR Specification line 231) adds the appropriate security requirements when combining IFrames and WebVR. An IFrame should not be allowed to access the WebVR API unless explicitly allowed by its ancestor.

3.6.3 WebGL

The idea of displaying 3D scenes on the web is as old as the web itself. Back in 1994, David Raggett wrote a paper "Extending WWW to support Platform Independent Virtual Reality" [64] for the first WWW conference, describing his ideas of a platform independent 3D standard through a new declarative markup language called VRML (Virtual Reality Markup Language). The format became an ISO standard in 1997 and became known as VRML97. It enabled 3D scenes to be described in text, which could be transferred over the web and displayed in a VRML viewer. VRML was superseded by X3D in 2004, which was more powerful and considered closer to the web as it supported the XML format [65, p. 3]. To further close the gap between the web and 3D, X3DOM was developed, which defined an XML namespace with an <x3d> tag allowing X3D to be embedded in an HTML or XHTML page. Because of the declarative nature of the web page, 3D web applications struggled to match the practice of 3D programming, which was to use imperative APIs such as OpenGL and DirectX. To fill this gap, Google developed O3D while Mozilla developed Canvas3D as attempts to give JavaScript (which is imperative in nature) access to OpenGL or DirectX [66] [65, p. 6]. Through these independent attempts emerged WebGL, developed and standardized through the Khronos Group.

The Khronos Group describes WebGL in the specification:

WebGL™ is an immediate mode 3D rendering API designed for the web. It is derived from OpenGL® ES 2.0, and provides similar rendering functionality, but in an HTML context. WebGL is designed as a rendering context for the HTML Canvas element. ([67])

As such, WebGL, like X3DOM, is not plugin-based but tightly coupled to the DOM. In contrast to X3DOM, however, instead of manipulating the DOM tree to modify the 3D scene, the developer is in direct control of an OpenGL context through a JavaScript API.

As shown in Figure 3.2, the WebGL API is exposed to the developer by the getContext method of the canvas element. When a WebGLRenderingContext is created, WebGL also creates a drawing buffer in the process, onto which later API calls are rendered.

The drawing buffer is automatically presented to the HTML page. When execution returns to the event loop, thus allowing the web browser to render, the screen will be updated with the drawing buffer if changes have occurred.
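A minimal sketch of this flow is shown below: a WebGLRenderingContext is obtained from a canvas element via getContext, draw calls render into the drawing buffer, and the result is presented once control returns to the event loop. The canvas id is an assumption made for illustration.

// Minimal sketch of obtaining and using a WebGL rendering context.
// A canvas element with id 'gl-canvas' is assumed to exist in the page.
const glCanvas = document.getElementById('gl-canvas');
const gl = glCanvas.getContext('webgl'); // also creates the drawing buffer

// Subsequent API calls render into the drawing buffer...
gl.clearColor(0.0, 0.0, 0.0, 1.0);
gl.clear(gl.COLOR_BUFFER_BIT);
// ...which is presented to the page once control returns to the event loop.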


Figure 3.2: WebGL Overview

WebGL 1.0 is derived from OpenGL ES 2.0, and as such rendering is done in a similar manner. This allows programmers familiar with OpenGL to ease into WebGL development without having to learn a lot of new concepts.

3.6.4 Gamepad API

The Gamepad API is a W3C specification that, as of January 2017, is in the Working Draft stage. The Gamepad API enables developers to directly access and utilize gamepads and other controllers in web applications. It does this by providing interfaces that describe and interact with gamepad data. For VR on the web, the Gamepad API is needed in order to use handheld controllers such as the Oculus Touch or HTC Vive controls, or the buttons and touchpad on the Samsung GearVR. To use a gamepad, a Gamepad object is accessed through an extension of the Navigator interface in the following way:

var gamepads = navigator.getGamepads();

This will return an array with all connected gamepads that have been interacted with [68]. The state of the gamepads can then be queried to check for pressed buttons or position of analog sticks in the following way:

var gp = gamepads[0];

// True if the button with index 0 is currently pressed.
gp.buttons[0].pressed;

// Value between -1.0 and 1.0 representing the position of the axis with index 0.
gp.axes[0];
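For VR input specifically, a hedged sketch of how gamepads could be matched to a VR display via the displayId attribute from the WebVR Gamepad extension, and then polled once per frame, is given below. The function name and the polling structure are illustrative assumptions, not code from the thesis.

// Illustrative sketch (not from the thesis): poll gamepads each frame and
// keep only those associated with the given VRDisplay via displayId.
function pollVRGamepads(vrDisplay) {
  const gamepads = navigator.getGamepads();
  for (let i = 0; i < gamepads.length; i++) {
    const gp = gamepads[i];
    // Entries can be null; displayId ties a gamepad to a specific VRDisplay.
    if (gp && gp.displayId === vrDisplay.displayId) {
      if (gp.buttons[0] && gp.buttons[0].pressed) {
        // React to input here, e.g. trigger a teleport (application-specific).
      }
    }
  }
}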


3.7 Web versus Native

The debate on whether to develop mobile applications on the web platform or on native platforms such as Android and iOS has been researched by A. Charland and B. LeRoux [69]. They list a number of factors that are important to consider when making the decision to develop a native or web application. These factors include cross-platform support, programming language, API access, platform conventions, performance and more. A. Charland and B. LeRoux write that “the performance argument that native apps are faster may apply to 3D games or image-processing apps, but there is a negligible or unnoticeable performance penalty in a well-built business application using Web technology”. VR applications require real-time 3D rendering and can thus fall victim to the performance argument.

There has been research comparing the performance of 3D web applications to native 3D applications. M. Mobeen and L. Feng claim that a WebGL call is now comparable to a native call because JavaScript engine performance has improved significantly in recent years [70]. R. Hoetzlein finds in a performance comparison that the native OpenGL application in his tests is about 7 times faster than the WebGL application [71].

There is a vast difference between how web applications and native applications load their assets. A web application must load assets using HTTP network requests, while a native application can load assets from disk. A web application must be careful not to load too many resources, because HTTP requests have high latency in comparison to loading from disk. Prefetching and level of detail are techniques that can be used to mitigate these problems, and they are described later in Section 3.10.

3.8 The Importance of Performance

For a VR HMD to deliver a truly immersive experience and to actually convey a sense of presence, there are a lot of requirements. The fact that the display is right in front of you and takes up the entire field of vision raises the bar for performance and quality by a significant amount. To deliver a sense of presence through VR, the vision and mind of the viewer need to be convinced that the virtual images he or she is seeing are real. Michael Abrash names three broad areas that affect how we perceive virtual environments through an HMD: tracking, latency and the way that the display interacts with the user’s vision [72]. If these things are not done well it will impact immersion in a very negative way, resulting in loss of detail, graphical artifacts or the form of motion sickness that is commonly referred to as Virtual Reality Sickness when talking about VR.

3.8.1 Virtual Reality Sickness

One big drawback of immersive VR is the fact that some users experience symptoms very similar to those of motion sickness, including nausea, eye strain, disorientation, headache and more. VR sickness is different from motion sickness in that the user does not actually move; the perceived motion is completely visually induced [73]. VR sickness is not a new phenomenon; a similar affliction known as simulator sickness has been known since at least the 1950s when flight simulators started to see use [74]. It is still not entirely known what causes VR sickness on a biological level and a few theories exist. The most common theory is sensory mismatch, which is when visual and other outside parameters are experienced differently by the human senses. For example, a user who experiences motion in a virtual environment while in reality sitting still will have the visual system and brain experience and expect a motion that the vestibular system does not feel [75]. There are also a number of technical factors that can contribute to VR sickness, such as errors in the positional tracking, latency, display refresh rate and image persistence [73] [76].


3.8.1.1 Measuring VR Sickness

There are three primary ways of measuring VR sickness: questionnaires, postural instability, and physiological state. The questionnaire is the most commonly used method and was made popular with the Pensacola Motion Sickness Questionnaire (MSQ) in 1965. Today, the most widely used questionnaire is the Simulator Sickness Questionnaire (SSQ), developed by Kennedy et al. in 1993 [75]. The SSQ was derived from the MSQ and made more suitable for what Kennedy et al. call simulator sickness by removing parts that only concerned motion sickness [77]. The SSQ consists of a number of questions asking the user to rate the severity of symptoms on a four-point scale. These scores are then computed for three individually weighted categories: nausea (N), oculomotor (O), and disorientation (D). A total score is calculated as the sum of all category scores multiplied by a constant. It should be noted that the SSQ does not provide an absolute measurement of sickness, but should be used as an instrument for correlation analysis. Table 3.2 shows the scoring and weights for each category in the SSQ, as well as the total score weight.

Table 3.2: SSQ categories

Symptom                      Nausea   Oculomotor   Disorientation
General discomfort              1         1             0
Fatigue                         0         1             0
Headache                        0         1             0
Eyestrain                       0         1             0
Difficulty focusing             0         1             1
Increased salivation            1         0             0
Sweating                        1         0             0
Nausea                          1         0             1
Difficulty concentrating        1         1             0
Fullness of head                0         0             1
Blurred vision                  0         1             1
Dizzy (eyes open)               0         0             1
Dizzy (eyes closed)             0         0             1
Vertigo                         0         0             1
Stomach awareness               1         0             0
Burping                         1         0             0
Total                           N         O             D

Weighted category scores: Nw = N × 9.54, Ow = O × 7.58, Dw = D × 13.92
Total score is calculated as (Nw + Ow + Dw) × 3.74

Each symptom is scored between 0 and 3 by the user, according to how much they experience each symptom. The three categories N, O and D are then calculated by adding the user scores for the symptoms that are weighted with a 1 for each category in Table 3.2. As an example, if a user gives the "General discomfort" question a score of 2, N and O will each get 2 added to their respective category score and D will not have anything added. This is done for all questions, giving a raw score for each category. This score is then multiplied by the unit weight for each respective category to provide a more stable and reliable score. The total score is the sum of all weighted category scores multiplied by the total score weight 3.74. The weights used in the formula do not have any interpretive meaning, but are used to produce similarly varying scales to allow for easier comparisons [77].
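To make the scoring procedure concrete, the sketch below computes the weighted category scores and the total score from a set of symptom ratings, following the weights in Table 3.2 and the description above. It is an illustrative sketch rather than code used in the thesis; the object layout and function name are assumptions.

// Illustrative sketch of SSQ scoring as described above (not code from the thesis).
// Category weights per symptom: [Nausea, Oculomotor, Disorientation], from Table 3.2.
const SSQ_WEIGHTS = {
  'General discomfort':       [1, 1, 0],
  'Fatigue':                  [0, 1, 0],
  'Headache':                 [0, 1, 0],
  'Eyestrain':                [0, 1, 0],
  'Difficulty focusing':      [0, 1, 1],
  'Increased salivation':     [1, 0, 0],
  'Sweating':                 [1, 0, 0],
  'Nausea':                   [1, 0, 1],
  'Difficulty concentrating': [1, 1, 0],
  'Fullness of head':         [0, 0, 1],
  'Blurred vision':           [0, 1, 1],
  'Dizzy (eyes open)':        [0, 0, 1],
  'Dizzy (eyes closed)':      [0, 0, 1],
  'Vertigo':                  [0, 0, 1],
  'Stomach awareness':        [1, 0, 0],
  'Burping':                  [1, 0, 0],
};

// `ratings` maps each symptom name to a user score between 0 and 3.
function ssqScores(ratings) {
  let n = 0, o = 0, d = 0;
  for (const [symptom, score] of Object.entries(ratings)) {
    const [wn, wo, wd] = SSQ_WEIGHTS[symptom];
    n += wn * score;
    o += wo * score;
    d += wd * score;
  }
  const nausea = n * 9.54;
  const oculomotor = o * 7.58;
  const disorientation = d * 13.92;
  // Total score as described above: sum of the weighted category scores times 3.74.
  const total = (nausea + oculomotor + disorientation) * 3.74;
  return { nausea, oculomotor, disorientation, total };
}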

3.8.2 Frame Rate, Latency and Persistence

Since VR is supposed to mimic the real world in a believable way, the performance of what you see needs to be as high as possible. Of course, it will never be as good as reality but hopefully technological advances will get us close enough for a good experience.

3.8.2.1 Frame Rate

Frame rate is the term for the frequency with which an image shown on a display is updated. Frame rate is often measured and expressed in frames per second (FPS). The frame rate of a 3D application depends on the graphical complexity as well as the graphical computation power of the computer running the application. Another limiting factor is the fact that displays have a limit to how fast they can update the shown image. This is commonly called the display’s refresh rate and is different from frame rate in that it includes repeating identical frames if the display has not received an updated frame from the source. Most traditional LCD displays today have a refresh rate of 60Hz, though displays capable of 120 or 144Hz are becoming increasingly common and some models are capable of over 200Hz. Most VR HMDs have a refresh rate of around 90Hz, although truly immersive VR requires frame rates considerably higher than that. Michael Abrash theorizes that the refresh rate needed for the eye to not be able to differentiate VR from reality is between 300 and 1000Hz for 1080p resolution at a 90 degree field of view [76].

3.8.2.2 Latency

Latency refers to the delay in time between an input to the system and the output produced and shown to the user. In the case of VR, the output is the updated visual image shown in the HMD. For VR, the delay between head movement and updated images is one of the most critical factors in delivering a good experience and is often called Motion-To-Photon latency (MTP) [78]. Too much MTP latency is one of the primary causes of VR sickness [79]. Generally, latencies lower than 20ms tend to be unnoticeable to the human senses, although some put the threshold for latency in VR systems as low as 3ms [80] [81].

3.8.2.3 Persistence

One visual artifact that is more commonly noticed in VR HMDs than in normal displays is smearing. This occurs when a user’s head is rotating while at the same time the eyes are focused on an object in the virtual world. Since the eye can focus and see clear images even during very fast rotations, a display that is refreshing images at a limited rate will end up looking smeared. This phenomenon happens when displays are full-persistence, meaning the pixels are lit during the entirety of the frame that the display is showing. One way that smearing can be minimized is by using low-persistence displays. With low persistence, pixels are only lit for a short time during each frame but with higher intensity to compensate. This means that images are not displayed long enough to appear smeared during head rotation [76]. The most popular HMDs of today all incorporate low-persistence displays. Although low persistence seems to solve one problem, it might make another problem more noticeable, namely strobing. Strobing is the perception of multiple copies of the same image at the same time [82] and is mostly hidden if smearing is apparent. Strobing occurs when the distance an image moves between frames is greater than some threshold, often around 4-5 arcmin, which would convert to 4-5 degrees/second of eye movement relative to the image at 60 frames per second [76] [83]. The clear solution to the strobing issue is increasing the frame rate. A higher frame rate would reduce the time between frames and in turn the distance images move between frames.

3.8.3 Mitigation Techniques

Immersive VR obviously calls for high performance, both in terms of frame rate and MTP latency. Even though computers and smartphones have gotten more and more powerful each year, maintaining a high frame rate and low MTP latency is still a challenge and might not always be possible. In order to help achieve this, and to some extent mask the negative effects that occur when it is not possible, there are some techniques that can be used.

3.8.3.1 Time Warp / Reprojection

Time warp, also known as reprojection, is a technique used in all of the major VR HMDs today. Time warp reduces the perceived MTP latency by modifying the generated image before sending it to the display, to reflect movements that happened after rendering finished. Each rendered frame in VR is based on the positional data received from the HMD at the very start of the render loop. This means that at 60 FPS, where a new frame is displayed approximately every 17 ms, each image shown is based on positional data that is 17 ms old. Time warp reduces this latency by modifying the rendered image right before sending it to the display, using newly captured positional data. When time warp is run separately from the rendering loop, it can also help to maintain a high frame rate. This technique is known as asynchronous time warp (ATW) and is the version most commonly used. Separating the rendering and warping processes makes it possible for time warp to intervene when a frame is taking too long to render. If a frame is not rendered in time for submission to the display, the last shown frame will be shown again, causing a noticeable judder. With ATW, time warp can be applied to the last finished frame to reflect new movement, masking the missed frame and smoothing out visual artifacts that might otherwise occur.

3.8.3.2 Space Warp

Another technique that is only used in the Oculus Rift at the moment but is under development for the HTC Vive is what Oculus calls Asynchronous Space Warp (ASW). Where time warp only accounts for rotational movements of the user's head, space warp goes one step further and also accounts for animations and movements of objects within the virtual world as well as movement of the first-person camera. When a VR application fails to maintain the set frame rate, ASW steps in just like ATW and modifies the last rendered frame to account for any movement and animations happening in the scene. When used together with ATW, it will help VR applications to maintain low latency and high frame rates and allow VR to be run on less powerful hardware. Since space warp performs prediction and extrapolation of movements, there is a risk of visual artifacts when it fails. Some typical situations where ASW might fail are rapid changes in lighting and brightness, and rapid object movements where ASW needs to fill in the space left behind when the object moves.

3.9 Real-time 3D

A 3D scene is typically made up of a set of vertices. The process of rendering a 3D scene is a multi-step process. A simplified, high-level description of these steps is:

1. The vertices of the 3D scene are projected into 2D screen space.

2. The vertices are assembled into triangles, which in turn are transformed through the rasterization process into "fragments" that represent the area of the screen that each triangle occupies.

3. Each fragment is colored and can then be used to composite the final image by keeping the fragments closest to the camera for each pixel.

The process of rendering a 3D scene is called the graphics pipeline. In reality it is a more complex process than the steps described above and typically involves tasks such as lighting, texture lookups and shadow mapping [84]. Most of the steps above can be grouped into steps that can be effectively parallelized. One such step is the fragment operation step; the color of each fragment can be calculated in parallel [85, pp. 880–881]. Because of this property, the graphics pipeline is not well suited to the CPU, which is better at executing a series of dependent operations. Thus, the graphics pipeline is normally executed on a GPU (Graphics Processing Unit), which in contrast to the CPU offers a much higher throughput (executed instructions per time unit) at the cost of a higher latency [85].

Up until the early 2000s the GPU was a fixed-function processor. The 3D graphics pipeline was a single-purpose engine and it enforced a specific step-by-step rendering process [85]. The GPUs of today enable a flexible and general-purpose graphics pipeline, broadening and empowering new usage areas such as protein folding, while offering an astonishing amount of computational power [84].

Software developers use graphics APIs such as OpenGL or DirectX to utilize GPUs. These APIs implement a graphics pipeline and enable a programmable pipeline through shader programming languages: HLSL (High Level Shading Language) for DirectX and GLSL (OpenGL Shading Language) for OpenGL [84] [85].
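
To make the pipeline steps above concrete, the sketch below shows the two programmable stages as GLSL sources of the kind that would be supplied to WebGL. The attribute and uniform names are placeholders chosen for this illustration.

// Vertex stage: project each vertex from model space into 2D clip space.
const vertexShaderSource = `
  attribute vec3 position;          // vertex position in model space
  uniform mat4 modelViewProjection; // projection * view * model
  void main() {
    gl_Position = modelViewProjection * vec4(position, 1.0);
  }
`;

// Fragment stage: color each fragment produced by rasterization. Fragments
// are independent of each other, which is why this stage parallelizes so
// well on the GPU.
const fragmentShaderSource = `
  precision mediump float;
  uniform vec4 baseColor;
  void main() {
    gl_FragColor = baseColor;
  }
`;

A WebGL application would compile and link these with gl.createShader(), gl.compileShader() and gl.linkProgram(), after which depth testing keeps the fragment closest to the camera for each pixel.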

3.9.1 Measuring Real-time 3D Applications

When B. Goldiez et al. published "Real-Time Visual Simulation on PCs" in 1999 they discussed the metrics and performance of 3D rendering applications and hardware by introducing a benchmark suite [86]. In regards to evaluating a PC graphics system, they write that "Average frame rate is typically the metric for such benchmarks, with triangle throughput and pixel fill rate not uncommon alternatives." Although 1999 may seem like ancient times in regards to 3D applications, average frame rate is still a popular metric [71] [70] [87].

Frame rate is a rate metric, as it is calculated by dividing the number of times an event occurs (in this case, a frame being produced) by a given period of time. The frame rate is derived from the execution time required by a computer system to render a frame. D. J. Lilja writes in [88] that "program execution time is one of the best metrics to use when analyzing computer system performance" and that we can "[...] use them to derive appropriate rates".

A problem with measuring programs, including 3D and VR applications, is that the investigated programs and the instrumentation programs execute on the same hardware. The result is that by observing the program, we change what we are trying to measure. The measurement tools are therefore not entirely predictable, as they introduce error and noise, and it is important to be aware of the accuracy and precision of any results. Further, because of the complex nature of the computer, there are many variables, such as cache misses, that are hard to account for. This can produce outliers and introduce errors in the results, making it important to measure several times and to report mean, median and variance [88].
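
As a small illustration of how these statistics and the resulting frame rate could be reported, the sketch below summarizes a set of per-frame execution times; the numbers and the function name are only examples.

// Summarize per-frame execution times (in milliseconds) and derive the
// average frame rate from them. Illustrative example only.
function summarize(frameTimesMs: number[]) {
  const n = frameTimesMs.length;
  const mean = frameTimesMs.reduce((sum, t) => sum + t, 0) / n;
  const sorted = [...frameTimesMs].sort((a, b) => a - b);
  const median =
    n % 2 === 1 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const variance = frameTimesMs.reduce((sum, t) => sum + (t - mean) ** 2, 0) / n;
  const averageFps = 1000 / mean; // a rate: frames per second
  return { mean, median, variance, averageFps };
}

// Example: frame times of 16, 17 and 18 ms give a mean of 17 ms,
// which corresponds to an average frame rate of about 59 FPS.
console.log(summarize([16, 17, 18]));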

Computer processors of today often use dynamic frequency scaling to decrease generated heat and to save power, which is especially important on mobile devices with limited battery capacity [89]. While conducting a test, it is important to account for frequency and thermal changes of the CPU and GPU, as they impact performance. For example, if one test is conducted directly after another, the processors may start with higher temperatures in the second test, causing lower clock rates, lower performance and a biased result.
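
One simple way to reduce this bias is to let the device cool down between test runs, as in the sketch below; the cooldown duration and the runTest() placeholder are only illustrative.

// Run a series of tests with a cooldown pause in between, so that a later
// run does not start with higher CPU/GPU temperatures (and thus lower clock
// rates) than an earlier one. Durations and runTest() are placeholders.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runTest(name: string): Promise<void> {
  // Placeholder: start the scenario, collect frame times, store the results.
  console.log(`running ${name}`);
}

async function runAll(tests: string[], cooldownMs = 5 * 60 * 1000): Promise<void> {
  for (const test of tests) {
    await runTest(test);
    await sleep(cooldownMs); // let temperatures and clock frequencies settle
  }
}

runAll(["native-vr", "web-vr"]);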

3.9.1.1 Measuring Performance in WebGL

To measure rendering times and frame rate in WebGL applications, researchers in [90] and [87] force the WebGL canvas to redraw continuously while counting the number of times the scenes were rendered. Congote et al. report in [87] that they use the setTimeout() function to create the redraw loop. However, Mozilla recommends using the requestAnimationFrame() function when performing animations, because the callback interval will generally match the display refresh rate [91].
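
A minimal sketch of such a measurement loop is shown below: the scene is redrawn in every requestAnimationFrame() callback while frames are counted, and the frame rate is reported once per second. The render() function is a placeholder for the application's WebGL draw code.

// Measurement loop driven by requestAnimationFrame(): redraw the scene in
// every callback, count frames and log the frame rate once per second.
function render(): void {
  // Placeholder for the WebGL draw calls (gl.clear, gl.drawArrays, ...).
}

let frameCount = 0;
let windowStart = performance.now();

function loop(now: number): void {
  render();
  frameCount++;

  if (now - windowStart >= 1000) {
    const fps = (frameCount * 1000) / (now - windowStart);
    console.log(`frames per second: ${fps.toFixed(1)}`);
    frameCount = 0;
    windowStart = now;
  }

  // The callback interval generally matches the display refresh rate.
  requestAnimationFrame(loop);
}

requestAnimationFrame(loop);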

3.10 Prefetching and Level of Detail

A 3D web application requires many resources, such as JavaScript, style sheets, images, and models. To keep load times low, it is possible to defer resources that are not required initially, and to begin with resources of lower quality. There is a trade-off between visual detail and performance. The process of adapting rendering quality to the executing environment is called level of detail, often shortened to LOD.

3.10.1 Prefetching

Prefetching is the process of fetching resources that may be accessed in the near future. Prefetching techniques are widely used in many fields of computing, such as hardware [92], compilers [93], and the Web [94]. In early work on prefetching of memory references from 1978, A. Smith states that "[b]y prefetching these pages before they are actually needed, system efficiency can be significantly improved." [92] Smith's pages are not web pages but memory pages; his statement is arguably true for web pages as well. Prefetching for the Web was researched as early as 1995, when Padmanabhan et al. in [94] suggested new HTTP methods (GETALL and GETLIST) to allow the server to send both the HTML document and its images in a single response. Their suggestions were never adopted on the Web. In 2004, Fisher et al. presented other techniques in Link Prefetching in Mozilla [95]; these are the techniques used on the Web today, and they have also been standardized by the W3C as resource hints. A Web server can propose resources using an HTTP Link header or an HTML link element, which the browser can act upon:

Link: <texture.jpg>; rel="prefetch"
<link rel="prefetch" href="texture.jpg">

By deferring HTTP requests and keeping the initial set of resources to a minimum, the page load time can be decreased. The motivation is, as Padmanabhan et al. put it: "People use the Web to access information from remote sites, but do not like to wait long for their results" [96].

Similarly, when it comes to prefetching and network optimization for 3D scenes and VR applications, there are several techniques that can be used as well. The most obvious one is to only transfer the resources that are required initially, and to defer other resources until they are needed, as mentioned above. For advanced applications where a user can move freely between different scenes, this poses the additional challenge of optimizing scene fetching by downloading only the scene that the user will move to next. This requires some sort of prediction or smart prefetching algorithm to avoid interruptions for loading resources. In [97], the authors present a player for branched video that uses prefetching policies, smart buffer management and parallel TCP connections for buffer workahead. The authors show that their implementation can provide seamless playback even when users wait with their branching decisions until the last second. A branched video is similar to a 3D application with multiple possible scene choices, where each scene can be seen as the equivalent of a possible video branch choice. As such, similar solutions for prefetching could be used for a 3D web application.
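
As a simple sketch of how the resource hints described above could be used for this, the example below prefetches the assets of the scene that a (hypothetical) prediction step expects the user to enter next. The scene manifest format and the asset URLs are invented for this illustration.

// Prefetch the assets of the predicted next scene using <link rel="prefetch">.
// The manifest structure and URLs are hypothetical examples.
interface SceneManifest {
  name: string;
  assets: string[]; // URLs to models, textures, audio, ...
}

function prefetchScene(scene: SceneManifest): void {
  for (const url of scene.assets) {
    const link = document.createElement("link");
    link.rel = "prefetch";
    link.href = url;
    document.head.appendChild(link); // the browser may fetch it with low priority
  }
}

// Example: the prediction step has decided that the "museum" scene is the
// most likely next destination, so its resources are requested ahead of time.
prefetchScene({
  name: "museum",
  assets: ["scenes/museum.gltf", "textures/marble.jpg", "audio/ambience.mp3"],
});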

3.10.2 Level of Detail

Level of detail (LOD) techniques can be used to initially load low-quality textures and models, and then swap them for high-quality resources as they are downloaded. LOD is a well-researched topic, but the main motivator for the topic has been to fully utilize the hardware, not to speed up page load. Luebke et al. write in [98] that "[t]he complexity of our 3D models - measured most commonly by the number of polygons - seems to grow faster than the ability of our hardware to render them.", a statement that is magnified by the fact that VR applications have to render each polygon twice. While growing 3D model complexity is a rendering problem, it is also a fetching problem for the Web platform.
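
The sketch below shows the basic "load low quality first, then swap" idea for a single texture. The material type and the texture URLs are placeholders and not tied to any particular 3D engine.

// Load a small texture first so the scene can be shown quickly, then replace
// it with the full-resolution version once that has been downloaded.
// Material and the URLs are placeholders, not taken from a specific engine.
interface Material {
  map: HTMLImageElement | null; // texture currently used when rendering
}

function loadImage(url: string): Promise<HTMLImageElement> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => resolve(img);
    img.onerror = reject;
    img.src = url;
  });
}

async function applyTextureWithLod(material: Material): Promise<void> {
  material.map = await loadImage("textures/floor_256.jpg");  // low quality
  material.map = await loadImage("textures/floor_2048.jpg"); // high quality
}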

There are a number of other ways to reduce the visual complexity of a 3D scene. One solution is to calculate which parts of a 3D scene are hidden from view and then not render those objects at all. This is often done with Z-buffers [99] and occlusion culling [100]. Another approach that is highly applicable to VR applications builds on the fact that the human eye only senses fine detail within a circle of a few degrees. This technique is called foveated rendering and works by tracking the user's gaze and adapting the rendered resolution and LOD based on this. This is especially relevant for displays with a high field of view (FOV), such as VR headsets, where big performance gains can be had by not rendering the entire image in high quality. Studies have shown significant performance savings on both VR headsets and desktop displays when using foveated rendering [101] [102].

Chapter 4

Method

To make the comparison between native VR and VR on the web, two VR applications were developed. One was a native Android application and the other a web application. The applications were made as identical as possible to allow for a fair comparison. The implementations of these applications are described in Section 4.2.

A literature review was conducted, which helped guide the choice of evaluation method. The evaluation and comparison consisted of two parts: one that measured the application performance, and one that measured presence and VR sickness through a user study. The user study was made separate from the performance measurements, meaning that no comparisons can be made between individual user sessions and the application performance in those sessions. The user study let users try both versions of the application in separate sessions and answer both the SSQ and the IPQ to measure VR sickness and presence.

Section 4.4 presents the performance measurement procedures, while Section 4.5 describes the presence and VR sickness measurement procedures along with the user study. The reasoning and theory behind this multi-method approach is described in Section 4.1.

4.1 Research Method - A Multi-method Strategy

Multi-method research as a distinctive research method was described in 1989 by Brewer and Hunter [103], although its origins can be traced back to 1959 when Campbell and Fiske [104] used a multi-method approach to study psychological traits. In 2006, Brewer and Hunter emphasized that multi-method research does not imply that "one must always employ a mix of qualitative and quantitative methods in each project" [105]. This is what differentiates multi-method research from mixed-methods research, which does imply a mix of both qualitative and quantitative data [106]. A multi-method strategy is more flexible than mixed-methods as it advocates choosing a combination of methods based on the particular problem, instead of relying on a fixed set of methods.

4.1.1 Rationale for the Chosen Method

The literature review showed, as described in Subsection 3.8.1.1, that bad VR experiences can lead to VR sickness. As such, VR sickness was a construct of interest because it could potentially differ between the implementations.

The literature review revealed three major factors to be measured and compared between the web and native VR applications: presence, VR sickness and application performance. As shown in Subsection 3.9.1, measuring software is not entirely predictable. In addition, presence is a highly subjective construct and is hard to measure. By combining multiple measurements (presence, VR sickness and application performance), the validity of the research is improved. Brewer and Hunter argue that using a multi-method approach "tests the validity
