
LiU-ITN-TEK-A--17/021--SE

Visualizing Realtime Depth Camera Configuration using Augmented Reality

Isabell Jansson


Master's thesis in Media Technology,
carried out at the Institute of Technology at Linköping University


Supervisor: Ali Samini

Examiner: Karljohan Lundin Palmerius

Norrköping 2017-06-08


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Abstract

This report documents a thesis project in Media Technology that has been carried out at SICK IVP AB and the Department of Science and Technology at Linköping University.

A Time-of-Flight camera from SICK IVP AB is used to monitor a region of interest, which implies that the camera has to be configured: it has to be mounted correctly and be aware of the region of interest. The configuration process currently requires the user to manage the captured 3-Dimensional data in a 2-Dimensional environment, which obstructs the process and possibly causes configuration errors due to misinterpretations of the captured data. The aim of the thesis is to investigate the concept of using Augmented Reality as a tool for facilitating the configuration process of a Time-of-Flight camera and to evaluate whether Augmented Reality enhances the understanding of the process. In order to evaluate the concept, a prototype application is developed. The thesis report discusses the motivation and background of the work, the implementation as well as the results.

It is shown that Augmented Reality enhances the understanding of the captured data from the Time-of-Flight camera as well as decreases the interpretation time for the data. It is also shown that intangible interaction is considered simple and natural in this context. Additionally, it is shown that Augmented Reality facilitates the configuration process and enhances the understanding of which area the camera is monitoring.


Acknowledgments

I would like to thank SICK IVP AB for the opportunity to carry out this master's thesis project, and especially my supervisor Johan Falk for providing key insights into the thesis. Additionally, I would like to thank my supervisor Ali Samini and my examiner Karljohan Lundin Palmerius at Linköping University.

Finally, I would like to extend my gratitude to my family and friends for their support during my studies.


Contents

Abstract
Acknowledgments
List of Figures
1 Introduction
   1.1 SICK IVP AB
   1.2 Motivation
   1.3 Aim
   1.4 Method
   1.5 Research questions
   1.6 Delimitations
2 Background and Related Work
   2.1 Current working procedure
      2.1.1 Device and configuration software
      2.1.2 Analyze depth data through SOPAS ET
      2.1.3 Configuration process through SOPAS ET
      2.1.4 Problems of the process
   2.2 Augmented Reality
      2.2.1 Devices and techniques
      2.2.2 Interaction techniques
      2.2.3 AR for enhanced understanding
3 Theory
   3.1 Usability and Usability tests
      3.1.1 Design Principles
      3.1.2 Evaluation
   3.2 3D Graphics
      3.2.1 Coordinate systems
      3.2.2 Object Transformations
      3.2.3 Perspective Projection
      3.2.4 Graphics pipeline
4 Prestudy
   4.1 Investigation of the problem
   4.2 Interaction techniques in AR
5 Implementation Details
   5.1 Development environment
   5.2 Interaction with virtual objects
   5.3 User Interface
   5.4 Object hierarchy
      5.4.1 Hands Manager
      5.4.2 Box
      5.4.3 Camera
      5.4.4 Cartesian Box
      5.4.5 Corner Box
      5.4.6 Blob Receiver
      5.4.7 Blob Sender
      5.4.8 Point Cloud
6 Usability Tests
   6.1 Test 1 - Interaction with one virtual object
      6.1.1 Test design
      6.1.2 Results
   6.2 Test 2 - Evaluation of the concept
      6.2.1 Test design
      6.2.2 Results
7 Discussion
   7.1 Intangible interaction for this purpose
      7.1.1 Accuracy
   7.2 AR as a tool for this purpose
   7.3 Different potential solutions
8 Conclusion and Future Work
   8.1 Conclusion
   8.2 Future Work
Bibliography
Appendices
A User Test 1
   A.1 Introductory description (Inledande beskrivning)
   A.2 Procedure (Utförande)
   A.3 Summary (Sammanfattning)
B User Test 2
   B.1 Introductory description (Inledande beskrivning)
   B.2 Procedure (Utförande)
   B.3 Summary (Sammanfattning)


List of Figures

2.1 The different images captured by the camera
2.2 The 2D image (left) and 3D point cloud (right) captured by the ToF camera visualized side by side in SOPAS ET
2.3 The impact of the orientation of the point cloud for interpreting the data
2.4 Specify a region of interest through Polar Reduction
2.5 Specify a region of interest through Cartesian Data Reduction
3.1 Left handed coordinate system (left) and right handed coordinate system (right)
3.2 Relation between the world coordinate system and an object coordinate system
3.3 Data flow through the graphic pipeline stages in Direct3D 11
5.1 A hovered object is rendered with a bounding box
5.2 Start menu
5.3 User Interface
5.4 Cartesian Data Reduction through HoloLens


1 Introduction

This master’s thesis has been carried out at SICK IVP AB and the Department of Sci-ence and Technology at Linköping University. This chapter introduces the company and discusses the motivation and the goals for the thesis, as well as the the method and delimitations.

1.1 SICK IVP AB

SICK IVP AB, hereinafter SICK, is a subsidiary of the German company SICK AG, located in Linköping, Sweden. They develop Machine Vision solutions for industrial production and quality assurance.

1.2 Motivation

A Time-of-Flight (ToF) camera from SICK has several possible fields of application. It can be used to monitor a region of interest, or to automate a machine, a process or other equipment used in industrial production. This implies that the camera needs to be configured: it has to be placed correctly and, if the camera is supposed to monitor a specific area, the application needs to know which area to monitor. These areas are referred to as regions of interest and are currently specified through the software SOPAS Engineering Tool (ET), which is developed by SICK. To configure the regions of interest, the user has to understand the 3-Dimensional (3D) data that the camera registers and be able to interact with it through the 2-Dimensional (2D) interface in SOPAS ET. Visualizing 3D data on a 2D screen is often problematic and the data can easily be misinterpreted by the user. Currently there is no connection between the visualized data in SOPAS ET and the real world, which obstructs the process. The regions of interest are specified in relation to the captured data, which means that misinterpretation of the data will directly cause errors in the regions of interest. For further reading about this problem, see Chapter 2.

Augmented Reality (AR), or Mixed Reality (MR), is a technique that allows the real world to be enhanced with a virtual world. With AR it is possible to visualize the captured data in its real context, allowing the user to look at the captured point cloud as a virtual object overlaying the real world. The user would be able to configure the region of interest in relation to the real world and not in relation to the captured data, which currently is the case. This could simplify the configuration process and reduce possible errors due to misinterpretations. For further reading about AR, see Section 2.2.

1.3 Aim

The aim of this thesis is to investigate the concept of visualizing and exploring real-world data in its real context through AR, and whether the configuration process of a ToF camera is facilitated by using AR as a tool during the process. Microsoft HoloLens [1], hereinafter HoloLens, is the first standalone Optical See-Through Head-Mounted Display (OST-HMD) on the market and will be used for rendering in AR. It will be investigated how this device meets the requirements for this purpose and the need for facilitating the configuration process at SICK. As will be described in Chapter 2, a ToF camera provided by SICK is configured through SOPAS ET. There are two possible approaches for specifying the region of interest for the ToF camera:

• Polar Reduction - Reacts to changes in depth. Specified through two horizontal planes which represent the upper and lower limits of the region, as well as an angle representing the field of view for the area.

• Cartesian Data Reduction - Reacts to changes in height. The outer limits are specified through a bounding box.

The aim of the configuration process is only to specify the region of interest for the ToF camera; the algorithms for monitoring the area run inside the camera. In order to evaluate the concept of using AR as a tool for visualizing captured data from the ToF camera and simplifying the configuration process, a prototype application for HoloLens will be created. The configuration process includes interaction with virtual objects and it is essential that this interaction is simple and intuitive in order to evaluate the concept of AR as a tool. The configuration is performed on the factory floor where the machines are located, and therefore it is important that the process includes as little equipment as possible; it has to be fast and simple to put on the device and get started. It will be investigated whether the gesture recognition supported by HoloLens is a good tool for the interaction, whether it is easy to use and whether it gives the required accuracy for the configuration. Intangible interaction techniques have in previous research proved to be less accurate and more time-consuming than tangible interaction techniques. However, HoloLens is a new device on the market with intangible interaction as its main interaction technique, which in this case would benefit the working environment. Still, intangible interaction has to meet the accuracy requirements to be usable in the process.


Two user tests will be carried out during the project. The first test will be a smaller pilot test with focus on evaluating the interaction with one virtual object in order to gain insight into how and why the interactions could be improved and if it is simple to use. The second test will be more comprehensive than the first with focus on conducting the configuration process and understanding the captured data in order to evaluate the concept of AR as a tool to facilitate the configuration for SICK. The concept of using AR as a tool to control and configure a ToF camera will also be discussed in relation to the current system.

1.4 Method

The work will be divided into three main parts:

• Prestudy of the company's needs and the current state of the art in AR.

• Developing a prototype application for configuring the regions of interest and controlling the captured data through the HoloLens.

• Evaluation of the concept of AR as a tool for configuring the regions of interest with the prototype application as well as controlling the captured data for the ToF camera.

1.5 Research questions

The research questions are specified below.

• Does Augmented Reality enhance the understanding of the data captured by the ToF camera compared to the current visualization tool?

The data is currently visualized through SOPAS ET with no connection to the real world which reduces the understanding. Visualizing the data in its real context gives direct feedback which would possibly simplify the configuration process and prevent misconfigurations.

• Can interaction with virtual objects be accomplished through gestures and hand recognition such that it appears natural when using the HoloLens?

Using gesture interaction has been considered time-consuming in previous work. However, for the purpose of this configuration it is essential that the process includes as little equipment as possible. The process also only requires interaction with larger objects and manipulations of objects related to the real world, which could benefit from a process as similar as possible to interacting with real objects.

• Is the hand recognition in HoloLens precise enough to be applied in the configuration process of the ToF camera?

Earlier studies indicate that hand interaction can be inaccurate and is therefore not recommended when precise results are needed. The configuration process does not require millimeter precision, and therefore it could be enough to use the hand for specifying the regions.


• Does HoloLens meet the requirements to be used as a tool during the configuration process of a ToF camera such that it facilitates the process and enhances the understanding?

Since this device is new on the market there are few earlier studies that can be applied directly to this application. The device could be enough for confirming the potential of using AR as a tool for the configuration process, but might not meet the requirements for using it in the industry.

1.6 Delimitations

The delimitations for the project are stated below.

• HoloLens is the only available AR device and is therefore the only device that will be evaluated for this purpose. The thesis will focus on whether AR is a suitable tool for controlling the captured data and configuring the camera, and whether the device, as well as intangible interaction, is a good choice for the purpose.

• It will not be investigated how the ToF camera can be automatically detected by the HoloLens to get the position and orientation of the camera and render the captured data from its correct view. A virtual camera will be aligned manually to the real camera in order to achieve the correct transformations.

• For evaluating the concept, it is not required to implement both of the algorithms for specifying regions of interest in the prototype application, and therefore only the Cartesian Data Reduction algorithm will be implemented.


2 Background and Related Work

This chapter describes the current configuration process for a ToF camera and the problems during the process. It also discusses the current techniques available for investigating the depth data from the camera and their drawbacks. Further on, AR, available devices and the advantages and disadvantages of their techniques are discussed.

2.1 Current working procedure

SICK develops multiple sensors for different solutions in the production and quality assurance industry. A smart sensor can be used to facilitate heavy and dangerous work, monitor an area to keep track of changes, or perform quality assurance in order to locate and remove products or components with production errors. A process involving smart sensors results in a healthier workplace, increased production and less wastage, which, depending on the application, also reduces the environmental impact.

2.1.1 Device and configuration software

A Time-of-Flight (ToF) camera is a camera system that provides a depth image over a captured area. By measuring the phase delay between a radiated and a reflected IR wave launched from each sensor pixel, the distance to an object can be calculated from the electric charge values, the speed of light and the signal frequency [2]. Further on, the distances can be used to produce a 3D image. This is not a 3D image in its strict definition, though, since data from the object's back side cannot be registered by the camera and therefore cannot be reconstructed. This type of reconstruction is often called a 2.5-Dimensional (2.5D) object [2].
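The exact expression used in [2] is not reproduced in this excerpt; as a point of reference, a continuous-wave ToF sensor commonly converts the phase delay Δφ estimated from the electric charge values into a distance using the speed of light c and the modulation frequency of the IR signal:

$$d = \frac{c}{4\pi f_{mod}}\,\Delta\varphi$$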

The sensor Visionary-T is a 3D camera based on the ToF technique and is used for monitoring areas over time to keep track of changes. The camera provides realtime 3D data at up to 30 frames per second. The configuration process for the camera is currently performed through the software SOPAS ET, which is developed by SICK. The tool can be installed as an application on the computer or used through a web interface. When the camera is configured it runs as a stand-alone device and its images can be visualized through SOPAS ET. The connection between the Visionary-T sensor and SOPAS ET is established through an internet connection.

2.1.2 Analyze depth data through SOPAS ET

The camera measures the distance with the ToF principle in each point, which is stored in a distance map, measures the reflectivity of the IR light from the sensor, which is stored in an intensity map, and calculates the confidence for each measured distance value, which is stored in a confidence map. A dark pixel corresponds to a low value, which in the distance image means that a shorter distance was captured. A shorter distance probably has higher confidence, which means that the corresponding pixel in the confidence map will be bright. The three images can be seen in Figure 2.1.

Figure 2.1: The different images captured by the camera: (a) distance map, (b) intensity map, (c) confidence map

The depth image is used to create a point cloud. The point cloud can be seen through SOPAS ET as a 2D image, a 3D point cloud or side by side as illustrated in Figure 2.2. The pixels are colored depending on whether the user is interested in the depth, intensity or confidence data.
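The report does not detail how the point cloud is computed; purely as an illustration, the sketch below unprojects a distance map into 3D points with a pinhole camera model. The intrinsic parameters (fx, fy, cx, cy), the millimeter unit and the row-major buffer layout are assumptions, not the actual Visionary-T data format.

```csharp
// Hypothetical sketch: unproject a distance map into a 3D point cloud using an
// assumed pinhole camera model. Intrinsics and data layout are placeholders.
using System.Collections.Generic;
using System.Numerics;

public static class PointCloudBuilder
{
    public static List<Vector3> FromDistanceMap(
        ushort[] distanceMm, int width, int height,
        float fx, float fy, float cx, float cy)
    {
        var points = new List<Vector3>(width * height);
        for (int v = 0; v < height; v++)
        {
            for (int u = 0; u < width; u++)
            {
                float z = distanceMm[v * width + u] * 0.001f;  // mm -> m
                if (z <= 0f) continue;                          // invalid measurement
                float x = (u - cx) * z / fx;                    // back-project pixel (u, v)
                float y = (v - cy) * z / fy;
                points.Add(new Vector3(x, y, z));
            }
        }
        return points;
    }
}
```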

SOPAS ET offers several tools for investigating the data in order to understand it. The scene can be moved, rotated and zoomed in order to view the point cloud from different angles and distances. The user is able to visualize a grid corresponding to the orientation of the floor, and a virtual camera, to facilitate the understanding of the point cloud's orientation. For the orientation of the point cloud to correspond to the position of the grid, and to be able to configure the camera further on, the user has to specify the position and orientation of the sensor. An example is shown in Figure 2.3, where Figure 2.3a displays the point cloud of a room rendered upwards and Figure 2.3b displays the same room after the point cloud has been rotated.

2.1.3 Configuration process through SOPAS ET

Once the orientation and position have been specified for the sensor, the regions of interest can be specified. By defining the regions of interest, the camera only registers data in the defined area. The regions of interest can be specified through two different approaches in SOPAS ET.

Figure 2.2: The 2D image (left) and 3D point cloud (right) captured by the ToF camera visualized side by side in SOPAS ET

Figure 2.3: The impact of the orientation of the point cloud for interpreting the data: (a) before rotating, (b) after rotating

The first method is called Polar Reduction, where the camera outputs a curve over the objects nearest to the camera within the specified region. The user specifies the region by placing two horizontal planes defining the upper and lower limits of the region, an angle of view and the number of segments in the angle, which determines the accuracy of the output curve. An example of the process is shown in Figure 2.4. Figure 2.4a illustrates the arrangement: the camera is facing a shelf where only the upper shelf should be defined as the region of interest. Figure 2.4b illustrates how two planes delimit the upper and lower data and Figure 2.4c illustrates the scene when the angle has been specified. The figure also displays the angle, which is divided into the specified number of segments, and the output curve is rendered as the black line in the figure. If something is removed from the shelf, the curve changes, which makes the camera able to react to the change. Figure 2.4d illustrates the result of the process; see also the sketch after Figure 2.4.

Figure 2.4: Specify a region of interest through Polar Reduction: (a) arrangement - only the upper shelf is relevant, (b) place two green planes, (c) specify angle of interest, (d) overview of the result
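The in-camera algorithm is not published in the report; purely as a reading of the description above, the following sketch computes such an output curve from a point cloud. The axis convention (y up, z forward), names and parameters are assumptions.

```csharp
// Illustrative sketch of the Polar Reduction idea: keep points between the two
// horizontal planes, bin them by angle within the field of view and output the
// nearest radial distance per segment. Not SICK's in-camera implementation.
using System;
using System.Numerics;

public static class PolarReduction
{
    public static float[] NearestCurve(
        Vector3[] points, float lowerY, float upperY,
        float fieldOfViewRad, int segments)
    {
        var curve = new float[segments];
        for (int i = 0; i < segments; i++) curve[i] = float.PositiveInfinity;

        foreach (var p in points)
        {
            if (p.Y < lowerY || p.Y > upperY) continue;            // outside the planes

            float angle = (float)Math.Atan2(p.X, p.Z);             // horizontal angle
            if (Math.Abs(angle) > fieldOfViewRad / 2f) continue;   // outside the field of view

            int bin = (int)((angle + fieldOfViewRad / 2f) / fieldOfViewRad * segments);
            if (bin < 0) bin = 0;
            if (bin > segments - 1) bin = segments - 1;

            float distance = new Vector2(p.X, p.Z).Length();       // radial distance
            if (distance < curve[bin]) curve[bin] = distance;      // nearest object wins
        }
        return curve;
    }
}
```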

The second method, Cartesian Data Reduction, aims to output the highest data relative to a grid within a specified volume. The grid can be considered as a discrete height map. The user defines the volume's lower and upper limits for the X, Y and Z coordinates as well as the resolution of the grid. An example of the process can be seen in Figure 2.5: Figure 2.5a displays the arrangement, where the camera should monitor a stack of boxes, Figure 2.5b illustrates the discrete height map displayed in SOPAS ET when the user has defined the volume and the resolution of the height map (number of segments in the grid), and Figure 2.5c displays the output from the camera in 2D and 3D. If a box is removed, the discrete height map changes, which triggers the algorithm to alert; see also the sketch after Figure 2.5.

Figure 2.5: Specify a region of interest through Cartesian Data Reduction: (a) arrangement, (b) discrete height map, (c) output of the Cartesian Data Reduction
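Again only as an illustration of the description above, not SICK's implementation, the sketch below builds a discrete height map from the points inside the configured volume; the choice of z as the height axis is an assumption.

```csharp
// Illustrative sketch of the Cartesian Data Reduction idea: within a user-defined
// volume, keep the highest value per grid cell to form a discrete height map.
using System.Numerics;

public static class CartesianDataReduction
{
    public static float[,] HeightMap(
        Vector3[] points, Vector3 volumeMin, Vector3 volumeMax,
        int cellsX, int cellsY)
    {
        var heights = new float[cellsX, cellsY];
        for (int x = 0; x < cellsX; x++)
            for (int y = 0; y < cellsY; y++)
                heights[x, y] = float.NegativeInfinity;            // empty cell

        float cellSizeX = (volumeMax.X - volumeMin.X) / cellsX;
        float cellSizeY = (volumeMax.Y - volumeMin.Y) / cellsY;

        foreach (var p in points)
        {
            // Discard points outside the configured volume.
            if (p.X < volumeMin.X || p.X >= volumeMax.X ||
                p.Y < volumeMin.Y || p.Y >= volumeMax.Y ||
                p.Z < volumeMin.Z || p.Z >= volumeMax.Z) continue;

            int cx = (int)((p.X - volumeMin.X) / cellSizeX);
            int cy = (int)((p.Y - volumeMin.Y) / cellSizeY);

            if (p.Z > heights[cx, cy]) heights[cx, cy] = p.Z;      // keep the highest point
        }
        return heights;
    }
}
```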

2.1.4 Problems of the process

As described in the previous Section 2.1, SOPAS ET is used to configure the sensor and analyze the data. Even though the software provides a 3D point cloud, it is only rendered on a 2D screen, which potentially causes confusion for the user: it is difficult to sense the depth, and the visualization method requires correct orientation of the point cloud in order to understand the data.

There is no connection between the captured point cloud and the real world, which makes it difficult to understand the data and control it. Since there is no connection to the real world, distinct features are required in the real world in order to interpret the data. The process requires the user to switch between looking at the screen and looking at the real world to understand what the point cloud represents. SOPAS ET requires that the user manually specifies the height and the orientation of the camera in order to retrieve the correct configuration for it. SOPAS assumes that the camera is placed in (0, 0) in (x, y), and the specified height affects which position in the real world corresponds to the origin in SOPAS. This is important for the user to be aware of: since the region of interest is specified in SOPAS' coordinate system, the user has to measure in the real world in relation to the origin to know how the region should be specified. This is time consuming and requires that the user knows the exact mounting setting for the camera. Since no visual feedback from the real world is provided, it is difficult to verify that the specified height and orientation are correct, and also that the mounting of the camera in the real world is correct. As Figure 2.3a illustrated, an incorrectly oriented point cloud is difficult to interpret, which obstructs the process and eventually causes misconfigurations when the regions of interest are defined. Since the regions of interest are specified in relation to the point cloud and not to the real world, misinterpreting the point cloud directly causes misconfigurations of the regions of interest.

2.2 Augmented Reality

Augmented Reality (AR) techniques enhance the real world with virtual objects. In contrast to Virtual Reality (VR) techniques, where the user is completely immersed inside the virtual world with the real world completely occluded, AR keeps the real world visible for the user. Virtual objects overlay real objects or are composited with the reality; they are also interactive in realtime and visualized in 3D. In contrast to VR, it can be seen as a supplement of the reality rather than a replacement [3]. AR is an example of Intelligence Amplification (IA), facilitating a task for a human using a computer. By combining virtual objects with the real world, the user's perception of and interaction with the real world are enhanced, and the virtual objects are used to display additional information about the real world that the user's senses cannot detect directly [3].

AR is a possible tool for facilitating the analysis of the captured data and configuring the ToF camera, since it allows the user to investigate the data in its real context. This section gives an introduction to available devices and techniques in the field of AR and discusses why AR can enhance the understanding of the data.

2.2.1 Devices and techniques

The AR technique can generally be divided into two categories, independent of the type of device: marker-based AR and marker-less AR [4]. The difference between the techniques is the method for finding where an object should overlay the real scene. Marker-based AR uses a marker in the real world, e.g. a QR code, for overlaying the object, while marker-less AR tries to detect features in the real world to use as a marker.

There are several devices available to explore AR, e.g. mobile devices and OST-HMDs. With mobile devices the user still has to investigate 3D data in a 2D context, though it has a direct connection to the real world. Most of the available OST-HMD devices only display additional information to the real world, such as text or symbols projected as a 2D image with no interaction with the real world, e.g. devices similar to Google Glass [5]. There are a limited number of devices that record the environment and enable the user to explore virtual objects together with the real-world objects without requiring markers. A device similar to HoloLens is Oculus OST Rift [6], though this device does not enable hand tracking and is not a standalone computer, which means that it has to be connected to a computer. However, it enables a similar augmented experience of the objects. OST Rift is also significantly larger than HoloLens. For the kind of configuration process that is required at SICK, it is essential that the device is a standalone computer and requires as little additional equipment as possible. HoloLens runs the Windows Holographic platform under the Windows 10 operating system. It contains an inertial measurement unit which includes an accelerometer, a gyroscope and a magnetometer, four environment understanding cameras providing head tracking, one ToF camera providing hand tracking, a color camera, four microphones and one ambient light sensor. It uses an Intel 32-bit architecture and has a custom-built Microsoft Holographic Processing Unit (HPU) in addition to the Central Processing Unit (CPU) and Graphics Processing Unit (GPU). It has 64GB of flash storage and 2GB of RAM for the CPU, plus 1GB for the HPU. It has two see-through holographic lenses that generate holograms, two HD 16:9 light engines and automatic pupillary distance calibration, since every user has a different interpupillary distance [1].

2.2.2 Interaction techniques

Interaction with virtual objects in AR can be divided into two categories: tangible and intangible interaction techniques [7]. Tangible interaction techniques refer to techniques where the user physically touches something, e.g. touch-based systems such as mobile screens and device-based systems such as keypads or wands. Device-based systems allow the AR application to retrieve data from the device's accelerometer and compass to get the position and orientation of the device, which can be used for manipulating the orientation of the virtual object. On the contrary, intangible interaction means that there is no physical connection between the user and the device. The input comes from midair gestures from the user's hand, or through speech etc. A system using midair gestures as input can either be marker-based, with markers attached to the hand, or marker-less, where the hand and fingers are recognized by the system [7].

Experiments with different variants of tangible and intangible interaction, through marker-based and marker-less approaches, have been evaluated in previous works in terms of accuracy, time spent on the task, engagement and entertainment. For mobile devices, a comparison between tangible and intangible interaction was made for selecting objects and menus as well as translating the object [7]. The overall results indicate that using a touch screen gives the highest performance in terms of accuracy and time spent on the task, while a finger-based approach is considered more fun and engaging. Using a device could be considered a compromise in both performance and entertainment. A detailed list of the compared techniques is presented in Table 2.1 [7].

Another evaluation regarding hand-based interaction with interfaces in AR also indicates that tangible approaches are preferred when navigating through menus; participants during the tests explicitly stated interest in using an everyday object as a tool when interacting with virtual objects [8]. Using a hand-gesture recognition system has limitations regarding the accuracy, which obstructs the selection of smaller objects such as icons in the menu. In the context of AR, it is essential that the user is able to interact with smaller objects such as menu items or other virtual objects in the augmented world. Hand-based interaction in AR often requires a specific gesture to perform an operation. For menu navigation, some systems limit the interaction to be performed by one hand while others consider two-hand interaction. However, two hands do not necessarily result in more effective navigation even though they are designed to handle more complex operations, but they provide a more immersive and natural feeling, which also is an important aspect for intuitiveness and simple interaction [8]. Another aspect of interacting with menus is the positioning of the menu. The position of the user interface is considered disturbing if it limits the user's view, especially when larger menus are displayed. Results of the study indicate that the user prefers the ability to determine the position of the interface [8].

2.2.3 AR for enhanced understanding

There are several independent theoretical frameworks indicating that Augmented Reality can be used for enhancing the learning process and increasing the understanding [9]. It is a cognitive tool which positions the user in a physical and social context, which primarily connects AR to situated and constructivist learning theory. The situated learning theory implies that all learning occurs in a specific context, and that the quality of the learning is proportional to the interactions between people, places, objects etc. Situated learning through an immersive interface, as in AR, is important in terms of transfer, the application of knowledge learned in one situation that can be applied in another situation.

Table 2.1: Comparison between different interaction techniques for mobile AR

Tangible methods:

• Freeze-View Touch (touch screen where the screen is frozen during the interaction): Compared to a gesture-based approach, the results indicate this solution to be less stressful and to have a higher learning rate.

• Device-based interaction with midair virtual objects (no physical connection between the objects and the real world): This method required the least time for translation tasks.

• Touch-based interaction with midair virtual objects: Compared to device-based and finger-based interaction, this approach achieved the highest results for selection tasks.

Intangible methods:

• Interaction with objects related to the real world (as opposed to midair): Objects attached to the real world tended to give better interaction performance than midair objects, especially for gaming applications since it was considered engaging.

• Finger-based (with markers) interaction with midair virtual objects: Lower performance than touch-based and device-based interaction, but more engaging due to the enhancement of the real world.

• Bare-hand-based AR interface (user's hand as marker for where the virtual object should be positioned): High accuracy and recognition rate.

• Gesture-based interactions through different devices: More entertaining and engaging compared to freeze-view touch-based interactions. Several devices were tested together with a mobile device as screen: a mobile device, a Kinect depth sensor and a Primesense depth sensor. More work is required for higher accuracy, but the approach meets the basic requirements for general interaction tasks and the user is able to operate naturally.


The constructivist learning theory implies that new knowledge and understanding depend on the individual's earlier experiences, knowledge, sociocultural background and context. It assumes that knowledge is embedded in the context in which it is used: managing authentic tasks in realistic situations [9].

By interacting with and manipulating invisible or abstract data in 3D through AR, it has been shown that the conceptual understanding improves [10]. Another benefit of AR is that it enables visualizing 3D objects in their natural shape instead of on a 2D screen. It also enables navigation through gestures, which gives a natural interaction with the objects.

A previous study of multi-dimensional data visualization with HoloLens [11] shows that the device enables rapid insight and easy comparison of the results in a natural space while considering a larger data set than what would have been possible on a computer screen. The experiment does not explore the full potential of HoloLens, though: it has the potential of visualizing higher dimensions of the data, and even though the visualization appears in the real world it does not interact with it. Despite this, it shows the advantages of exploring real-world data with AR tools.


3 Theory

This chapter aims to give the theoretical background of the essential areas for this thesis. It discusses design principles and how the usability is evaluated, as well as relevant theory behind 3D graphics.

3.1 Usability and Usability tests

When developing an application with human interaction, it is recommended to have a user-centered focus. This section introduces relevant design principles to apply and how usability is evaluated.

3.1.1 Design Principles

To develop a simple and user-friendly interface there are some guidelines available. These guidelines can be applied to almost every human-computer interaction system, though they have to be interpreted for each environment [12].

• Strive for consistency: Be consistent in the use of actions, terminology etc.

• Cater to universal usability: The interface should be able to adapt to the user’s experience, e.g. frequent users are able to use shortcuts and new users are able to find help if needed.

• Offer informative feedback: The system should provide feedback for every action. The feedback may vary depending on the action, a larger action should give a more substantial feedback.

• Design dialogs to yield closure: A sequence of actions should have a beginning, middle and end for giving the user the satisfaction of accomplishment.

• Prevent errors: Prevent users from making errors and if an error occurs, the user should be guided through the process in order to handle the error.

• Permit easy reversal of actions: If possible, actions should be reversible. This encourages the user to explore unfamiliar options since it gives the impression that errors can be undone if something goes wrong.


• Support internal locus of control: Make the user feel like he or she is in control of the system.

• Reduce short-term memory load: Keep the interface as simple as possible, since human short-term memory is limited.

A good design of the user interface enhances the immersion of the application. A cognitive system given too much information causes stress and frustration for the user, while too little information causes tiredness among the users. The immersion is enhanced when the cognitive system is challenged to its limits without exceeding them.

3.1.2 Evaluation

Usability tests are performed in order to evaluate a product by observing actual and potential users while they use the product under controlled conditions. They give the developer information about whether the product fulfils its intended purpose and whether it is easy to use and user-friendly. With a usability evaluation, developers are able to discover flaws in the design before the release and improve the product [13].

It is important to define a clear purpose of the test and inform the participant about it. The purpose will affect how much the moderator, the person in charge of the test, interacts with the participants. Earlier stages of the development require more interaction than later stages, since they may test prototypes or explore different design alternatives. The type of test also affects the interactions: if the goal is to find issues and explore design possibilities, more interaction is required than if the test is measuring performance. Since the participants' purpose is to evaluate the product, there are no right answers and they cannot make mistakes during the test. However, it is important for the moderator to be the leader of the tests [13].

It is important to let the participants work on tasks without being interrupted by talking from the moderator, and to let the participants explain in their own words what they experience. This results in a more natural experience and a better result of the evaluation. The moderator is there to observe, listen and, if necessary, give the user tasks to explore and encourage the participants to explain what they experience. The amount of interaction may vary depending on the development stage; earlier stages may require more guidance than later stages [13].

To prevent the moderator from influencing the participants, the test can be conducted with prewritten instructions and questions that ensure that every participant receives similar information. If possible, the moderator could leave the room to avoid influencing the participants [13].

3.2 3D Graphics

3D graphics covers modelling a 3D object, animating the object and rendering the scene. The scene is often rendered as a 2D image. This section discusses some key theory of coordinate systems and transformations for handling objects in the scene, as well as the rendering pipeline for 3D graphics in Direct3D 11.

3.2.1 Coordinate systems

Coordinate systems are required as a frame of reference for the space. A coordinate system for a 3D scene has three axes, x, y and z, and the order and direction of the axes depend on whether the coordinate system is right- or left-handed; both coordinate systems are illustrated in Figure 3.1 [14].


Figure 3.1: Left handed coordinate system (left) and right handed coordinate system (right)

Several coordinate systems can be used to represent different objects, each in its own coordinate system. The world coordinate system sets the frame of reference for the world, while an object coordinate system is used to simplify calculations for an object. An object coordinate system can be transformed to the world coordinate system by a transformation; see Section 3.2.2. Figure 3.2 illustrates an object which is positioned in (0, 0, 0) in object coordinates, but in (4, 1, 0) in world coordinates. The object also has a rotation of 150° around the z-axis.

Before rendering, objects are transformed into the camera coordinate system which can be seen as an object coordinate system for the camera.

3.2.2 Object Transformations

A transformation can be described by a 4×4 matrix. Translation, rotation, scaling and shearing can all be described by a matrix and are used to set the position and orientation of an object in space. They are affine transformations, which means that the bottom row is always [0 0 0 1], and a combination of these always results in an affine transformation as well. Suppose an object has n vertices v_i, i ∈ [0, n]. Each vertex can be transformed by a translation, rotation, scaling and shearing, denoted T, R, S_scale and S_shear respectively, which transform each vertex to a new position v'_i.


Figure 3.2: Relation between the world coordinate system and an object coordinate system

The combined transform [14] is described in Equation 3.1. Matrix multiplication is associative, which implies the equality in the equation [15].

$$v'_i = (T\,R\,S_{shear}\,S_{scale})\,v_i = T\big(R\big(S_{shear}(S_{scale}\,v_i)\big)\big), \qquad i \in [0, n] \tag{3.1}$$
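As a minimal C# illustration of Equation 3.1 (shear omitted), the sketch below composes an object-to-world transform with System.Numerics types; the prototype itself uses SharpDX for rendering. The 150° rotation and the (4, 1, 0) translation echo Figure 3.2. Note that System.Numerics uses row vectors (v·M), so the multiplication order is the reverse of the column-vector notation in Equation 3.1.

```csharp
// Minimal sketch of composing an affine object-to-world transform (cf. Equation 3.1).
// With row vectors, scale * rotation * translation applies the scale first and the
// translation last.
using System;
using System.Numerics;

class TransformExample
{
    static void Main()
    {
        Matrix4x4 scale = Matrix4x4.CreateScale(2.0f);
        Matrix4x4 rotation = Matrix4x4.CreateRotationZ((float)(150.0 * Math.PI / 180.0));
        Matrix4x4 translation = Matrix4x4.CreateTranslation(new Vector3(4f, 1f, 0f));

        Matrix4x4 objectToWorld = scale * rotation * translation;

        // Transform one object-space vertex into world space.
        Vector3 vertex = new Vector3(1f, 0f, 0f);
        Vector3 worldPosition = Vector3.Transform(vertex, objectToWorld);
        Console.WriteLine(worldPosition);
    }
}
```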

3.2.3 Perspective Projection

For rendering the scene as a 2D image, the scene has to be projected onto an image plane, which is performed through a perspective projection [14]. When rendering a stereo view for an OST-HMD, the objects have to be transformed depending on which eye is being rendered, since the stereo view can be considered as two cameras.
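The projection matrices themselves are not given in this excerpt; for reference, a common left-handed Direct3D perspective projection matrix (row-vector convention), with vertical field of view θ, aspect ratio a, and near and far planes z_n and z_f, has the form:

$$
P = \begin{pmatrix}
\frac{1}{a}\cot\frac{\theta}{2} & 0 & 0 & 0\\
0 & \cot\frac{\theta}{2} & 0 & 0\\
0 & 0 & \frac{z_f}{z_f - z_n} & 1\\
0 & 0 & \frac{-z_n z_f}{z_f - z_n} & 0
\end{pmatrix}
$$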

3.2.4 Graphics pipeline

The graphics pipeline has minor differences depending on the Application Programming Interface (API). Available APIs are the Open Graphics Library (OpenGL) and DirectX. This section discusses the graphics pipeline stages in Direct3D 11, which is a DirectX API for rendering 3D graphics. An overview of the data flow from input to output through the stages in the pipeline can be seen in Figure 3.3, a description of the stages follows below, and a short usage sketch follows the list.

• Input-Assembler Stage: This is the first stage in the rendering pipeline and its purpose is to assemble primitive data (e.g. points, lines and triangles) from the user-specified buffers into different primitive types (e.g. point lists, line lists and triangle strips) that will be handled in the following pipeline stages [16].


Figure 3.3: Data flow through the graphic pipeline stages in Direct3D 11

• Vertex Shader Stage: This stage performs per-vertex operations (e.g. transformations, morphing and per-vertex lighting) on the output data from the Input-Assembler Stage [16].

• Hull Shader Stage: Direct3D 11 includes three stages that implement tessellation, where low-detail subdivision surfaces are converted into higher-detailed primitives, which allows the pipeline to take a lower number of polygons as input and still render in high detail. This reduces the memory consumption and improves the performance of the program. The Hull Shader Stage is the first stage in the tessellation pipeline and the remaining two stages follow below. It takes control points defining a low-order surface as input and transforms them into points that define a patch [16].

• Tessellator Stage: This stage subdivides a domain into smaller objects (e.g. triangles, points or lines) and has a canonical domain in a normalized coordinate system. It operates once per output patch from the Hull Shader Stage and outputs texture coordinates and the surface topology to the following stage [16].

• Domain Shader Stage: The vertex position for a subdivided point can be calculated from the output control points defining a patch from the Hull Shader Stage and the texture coordinates from the Tessellator Stage [16].

• Geometry Shader Stage: Unlike the vertex shader, this stage operates on a full primitive and is able to use edge-adjacent primitives as input. It also has the ability to generate new vertices for the output. The output of the shader stage is appended to an output stream object with the topology point stream, line stream or triangle stream. The output is sent to the vertex buffer via the Stream Output Stage, to the Rasterizer Stage, or to both. Depending on the device, this stage may not be supported and is then skipped [16].

• Stream Output Stage: This stage continuously streams vertex data from the previous stage (or the Vertex Shader Stage if the Geometry Shader is inactive) to one or more buffers in memory. The data can be streamed back into the pipeline either through the Input-Assembler Stage or through shaders using a loading function [16].

• Rasterizer Stage: In this stage, vector information from shapes or primitives is converted into a raster image. This includes clipping vertices to the view frustum, perspective projection, mapping primitives to a 2D viewport and deciding how the Pixel Shader should be invoked [16].

• Pixel-Shader Stage: This stage enables per-pixel lighting and post-processing [16].

• Output-Merger Stage: The final rendered pixel is generated in this stage by using the pixel data from the Pixel Shader and results from previous stages, e.g. whether the pixel is visible or not, and by blending the final color of the pixel [16].
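The report does not include rendering code; the hypothetical sketch below shows how a SharpDX-based application typically feeds geometry through the Input-Assembler Stage and the programmable shader stages before drawing. Creation of the device, buffers, input layout and compiled shaders is assumed to happen elsewhere, and the use of a point list for the point cloud is an assumption.

```csharp
// Hypothetical SharpDX sketch: bind geometry to the Input-Assembler Stage,
// set the vertex and pixel shaders and issue a draw call.
using SharpDX.Direct3D;
using SharpDX.Direct3D11;

public static class DrawHelper
{
    public static void DrawPointCloud(
        DeviceContext context, Buffer vertexBuffer, int vertexCount,
        int vertexStride, InputLayout layout,
        VertexShader vertexShader, PixelShader pixelShader)
    {
        // Input-Assembler Stage: vertex data and primitive type.
        context.InputAssembler.InputLayout = layout;
        context.InputAssembler.PrimitiveTopology = PrimitiveTopology.PointList;
        context.InputAssembler.SetVertexBuffers(
            0, new VertexBufferBinding(vertexBuffer, vertexStride, 0));

        // Programmable stages.
        context.VertexShader.Set(vertexShader);
        context.PixelShader.Set(pixelShader);

        // The fixed-function stages (rasterizer, output merger) then produce the pixels.
        context.Draw(vertexCount, 0);
    }
}
```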


4 Prestudy

A prestudy was conducted at the beginning of the work in order to gain insight into relevant areas and investigate which techniques to use. The investigated areas and the results follow below.

4.1 Investigation of the problem

An investigation of the problem was conducted to understand the shortcomings of the current solution for configuring the regions of interest for a ToF camera. The current solution for verifying the data from a ToF camera and configuring the regions of interest is the software SOPAS ET. This forces the user to work in a 2D environment with 3D data, which makes it difficult to sense the depth of the captured data, and the data can easily be misinterpreted if the captured data has the wrong orientation. The process also lacks connection to the real world, which makes it unintuitive and could cause errors due to misunderstanding of the data: the user has to specify the regions of interest relative to the point cloud and not to the real world, so misinterpreting the point cloud directly causes configuration errors. This also implies that the user has to switch between looking at the 2D screen and the real world in order to match objects in the real world to the captured point cloud and interpret it. AR is a possible tool for the configuration process, though since this process often is performed on a factory floor, it is essential that the process includes as little equipment as possible and uses a standalone device.

4.2 Interaction techniques in AR

An investigation of recent research regarding interaction in AR, and how it can be applied to a possible solution for SICK, was also conducted. Previous works indicate that tangible interaction is preferred when high accuracy is required, but intangible interaction is considered more engaging and entertaining. However, these evaluations were conducted with mobile devices and other hand-tracking systems, not with the HoloLens that is available during the thesis. It was chosen to implement intangible interaction even though previous research indicates that it is not preferable. The configuration process does not require accuracy in terms of millimeters, and therefore the accuracy of intangible interaction could be enough. Since HoloLens is a new device on the market and is built to be used with intangible interaction, it was chosen to investigate whether this is accurate and simple enough to be used in the application, and whether it benefits the process to use the hand for manipulating virtual objects in a way similar to how a user would interact with a real object. It is also essential that the configuration process includes as little equipment as possible.


5 Implementation Details

This chapter gives an introduction to relevant implementation details and the object hierarchy in the prototype application.

5.1 Development environment

When developing applications for HoloLens there are two possible approaches, DirectX or Unity. Due to the low performance of HoloLens, it was chosen to use DirectX for the prototype application to keep the application as simple as possible. With Unity it is easier to add different features to the hologram: it is easier to display a cursor, add menus and buttons etc. But Unity also includes components that are unnecessary for this application, which decreases the performance. DirectX only handles simple primitives such as points, lines and triangles, which means more work, and there are no tools for displaying features similar to those in Unity; everything has to be handled by the developer. The open source wrapper for the DirectX API, SharpDX [17], is used for rendering. The application is written in C#.

5.2 Interaction with virtual objects

As discussed in Sections 2.2.1 and 2.2.2, interaction with virtual objects can be accomplished with several techniques. HoloLens provides information about the orientation of the head as well as position and gesture recognition for one hand (only one hand can be tracked at a time, but both hands can be used). Microsoft has chosen to implement intangible interaction in their menus, sample holograms and windows, where a cursor is controlled through the user's gaze direction and the hand is used to click, hold and drag etc. The hand can also be used for adjusting the size of a window or object, where the position of the hand is tracked.

For this application it was chosen to implement intangible interaction with only the hand as a tool for interacting with virtual objects, e.g. aligning a virtual camera to the real camera for rendering the point cloud from the correct view, or specifying the region of interest for the Cartesian Data Reduction algorithm. Since intangible interaction does not give any haptic feedback, feedback has to be given by alternative methods, since it is important for the experience and helps the user understand what is happening [12]. When hovering a selectable object, a white bounding box is rendered around the object. According to the design principle Strive for consistency, a consistent feedback system is easier to learn and understand for the user. Different virtual objects can have different appearances, and therefore it was not motivated to use color, opacity or alternation between solid and wireframe rendering modes. Rendering a bounding box also maintains the focus on the selected object.

To select and manipulate (move or rotate) a virtual object, the hold gesture is used with a modified threshold for triggering a hold event. The hold gesture is the gesture most similar to how a user would grab a real object, though it was considered slow and ungainly if the user had to wait for the original hold event to be triggered. The gesture is performed by positioning the hand as can be seen in Figure 5.1 and then closing the thumb and the index finger together while performing the manipulation. According to [7], objects with a connection to the real world work better with intangible interaction than objects with no connection to the real world.

To get feedback about whether the HoloLens registers the hand position and where it is registered, a cursor is rendered. It is rendered as a non-filled square when no object is selected, and filled during a manipulation. HoloLens registers the hand position at the center of the hand and not at any finger tip. This causes the hand to be in front of the virtual object that the user wants to manipulate, which was considered disturbing and obstructed the eyes from focusing on the object. An offset to the registered hand position is set to solve this problem. The cursor, as well as the white bounding box that appears when an object is hovered, are shown in Figure 5.1.


5.3 User Interface

When working with DirectX through SharpDX there are no applicable libraries for creating and rendering interfaces and buttons in a 3D holographic view, i.e. rendering 2D objects together with the 3D holograms. Due to the time limitations, it has not been prioritized to port an existing library to work with SharpDX and HoloLens or to create a new library for this. The solution is to let the application switch between a 2D and a 3D context depending on whether an interface or the 3D hologram should be rendered. The interfaces are XAML views which are created when the application launches. Depending on which menu should be rendered, the application navigates to different XAML pages, enabling different menus to be shown to the user. The interaction with XAML views, as well as the position and some design choices, are limited in HoloLens and cannot be modified. This implies that the user has to interact with the interface by navigating a pointer through the gaze and using the hand to click, which is the standard interaction technique in HoloLens. The device uses its environment mapping techniques to try to position the XAML view on a wall, if there is one. When it has found a position, it is possible for the user to reposition the view to a more suitable position, if necessary.

Initially, when the application starts, the menu in Figure 5.2 is shown. The user is able to specify the IP address and port for the camera. When the user presses Connect to camera, the application tries to establish a connection to the camera. If it succeeds, the text Connect to camera is changed to Connected and the button is disabled, while the button Start Application is enabled. If the application cannot open the connection, the error message Could not connect to camera is shown. The interface is designed to prevent the user from making errors by disabling buttons when they should not be pressed, e.g. the application will not be able to start if the connection could not be established, and if an error occurs, the user is informed. This is according to the design principles presented in Section 3.1.1 [12].
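The report does not show the connection code; a hypothetical sketch of the Connect to camera flow using the UWP StreamSocket API is given below. The class and method names, and the handling of the camera protocol, are assumptions.

```csharp
// Hypothetical sketch of the "Connect to camera" flow. On success the UI can enable
// Start Application; on failure it can show "Could not connect to camera".
using System;
using System.Threading.Tasks;
using Windows.Networking;
using Windows.Networking.Sockets;

public class CameraConnection
{
    public StreamSocket Socket { get; private set; }

    public async Task<bool> TryConnectAsync(string ipAddress, string port)
    {
        try
        {
            Socket = new StreamSocket();
            await Socket.ConnectAsync(new HostName(ipAddress), port);
            return true;
        }
        catch (Exception)
        {
            Socket?.Dispose();
            Socket = null;
            return false;
        }
    }
}
```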

When the user has pressed Start Application, the holographic view is launched. If the user opens a menu now, the interface in Figure 5.3 is shown. This menu contains all settings the user is able to control. The user can switch between different manipulation modes: Move is used to move an object, Rotate is used to rotate an object around the y-axis and Tilt is used to tilt the object back and forth. Apply Cartesian Data Reduction turns the Cartesian Data Reduction algorithm on or off and visualizes a Cartesian Box (Section 5.4.4), and View Point Cloud is used to render or hide the captured data from the camera.

5.4 Object hierarchy

This section presents the object hierarchy and explains the ideas behind the main parts.


Figure 5.2: Start menu

Figure 5.3: User Interface

5.4.1 Hands Manager

The HoloLens recognizes a few standard gestures, such as tapping and holding, and the purpose of this class is to register the interactions and respond to the events. The gestures are used to trigger events for manipulating the virtual camera as well as the region of interest.

5.4.2 Box

This is an abstract class that enables all boxes (camera, cartesian box and corner box) to be handled similarly. A box can be selected by hovering the cursor over the object. If the ray that is launched from the user's head in the direction towards the hand intersects the box, it is hovered and the bounding box is rendered around the object. The intersection is tested through a ray-box intersection test for an oriented bounding box, since the bounding box is not aligned with the coordinate axes. If the user performs the hold gesture, the manipulation starts. A box can be manipulated by moving and rotating it. The rotation can be performed around the y-axis as well as by tilting the object back and forth. The new position for the object when it is moved is calculated by Equation 5.1.
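A common way to implement such a test, and presumably close to what is done here, is to transform the ray into the box's local frame and perform an axis-aligned slab test. The sketch below is an illustration under that assumption; it uses System.Numerics instead of SharpDX types and illustrative names.

```csharp
// Sketch of a ray vs. oriented-bounding-box test: transform the ray into the box's
// local frame with the inverse box transform, then do a standard AABB slab test.
// System.Numerics is used here for brevity; the application itself uses SharpDX types.
using System;
using System.Numerics;

public static class RayObbIntersection
{
    public static bool Intersects(Vector3 rayOrigin, Vector3 rayDir,
                                  Matrix4x4 boxWorld, Vector3 halfExtents)
    {
        if (!Matrix4x4.Invert(boxWorld, out Matrix4x4 worldToBox))
            return false;

        // Ray in box-local coordinates (direction uses the 3x3 part only).
        Vector3 o = Vector3.Transform(rayOrigin, worldToBox);
        Vector3 d = Vector3.TransformNormal(rayDir, worldToBox);

        float tMin = float.NegativeInfinity, tMax = float.PositiveInfinity;
        float[] origin = { o.X, o.Y, o.Z };
        float[] dir    = { d.X, d.Y, d.Z };
        float[] ext    = { halfExtents.X, halfExtents.Y, halfExtents.Z };

        for (int i = 0; i < 3; i++)
        {
            if (Math.Abs(dir[i]) < 1e-6f)
            {
                if (Math.Abs(origin[i]) > ext[i]) return false; // parallel and outside the slab
                continue;
            }
            float t1 = (-ext[i] - origin[i]) / dir[i];
            float t2 = ( ext[i] - origin[i]) / dir[i];
            if (t1 > t2) (t1, t2) = (t2, t1);
            tMin = Math.Max(tMin, t1);
            tMax = Math.Min(tMax, t2);
            if (tMin > tMax) return false;
        }
        return tMax >= 0f; // intersection in front of (or at) the ray origin
    }
}
```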

$$\vec{x}_O = \vec{x}_H + \hat{x}_\Delta \, r \, \lVert \vec{x}_H - \vec{x}_F \rVert + \vec{x}_E \tag{5.1}$$

In the equation, $\vec{x}_O$ is the new position for the object, $\vec{x}_H$ is the user's head position, $\vec{x}_F$ is the user's hand position, and $\hat{x}_\Delta$ is the normalized direction vector from $\vec{x}_H$ to $\vec{x}_F$, which is calculated in Equation 5.2.

$$\hat{x}_\Delta = \frac{\vec{x}_F - \vec{x}_H}{\lVert \vec{x}_F - \vec{x}_H \rVert} \tag{5.2}$$

The ratio $r$ is the ratio between the distance from the user's head to the object along the new direction and the distance between the user's head and the hand position, Equation 5.3.

$$r = \frac{d_O}{d_F} \tag{5.3}$$

The ratio is calculated once when a movement is started, and $d_O$ and $d_F$ are defined in Equations 5.4 and 5.5.

$$d_O = \hat{x}_\Delta \cdot (\vec{x}_O - \vec{x}_H) \tag{5.4}$$

$$d_F = \lVert \vec{x}_F - \vec{x}_H \rVert \tag{5.5}$$

When the movement is initiated, the object should move from its initial position independently of the start position of the hand. To achieve this, the position has to be corrected with a small error vector $\vec{x}_E$, which is calculated according to Equation 5.6. The error vector as well as the ratio are calculated once per manipulation, in contrast to the remaining part of Equation 5.1, which is evaluated every frame.

$$\vec{x}_E = \vec{x}_O - \left(\vec{x}_H + \hat{x}_\Delta \, r \, \lVert \vec{x}_H - \vec{x}_F \rVert\right) \tag{5.6}$$
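Put together, the move manipulation amounts to a handful of vector operations per frame. The following sketch illustrates Equations 5.1 to 5.6; it uses System.Numerics rather than SharpDX types, and the class and method names are illustrative rather than taken from the actual implementation.

```csharp
// Sketch of the move manipulation from Equations 5.1-5.6 (System.Numerics for brevity;
// names are illustrative, not the exact thesis code).
using System.Numerics;

public sealed class MoveManipulation
{
    private float ratio;     // r, computed once when the manipulation starts
    private Vector3 error;   // x_E, computed once when the manipulation starts

    public void Begin(Vector3 head, Vector3 hand, Vector3 objectPos)
    {
        Vector3 dirToHand = Vector3.Normalize(hand - head);              // x_delta (Eq. 5.2)
        float dO = Vector3.Dot(dirToHand, objectPos - head);             // Eq. 5.4
        float dF = (hand - head).Length();                               // Eq. 5.5
        ratio = dO / dF;                                                 // Eq. 5.3
        error = objectPos - (head + dirToHand * ratio * (head - hand).Length()); // Eq. 5.6
    }

    // Called every frame while the hold gesture is active; returns the new object position.
    public Vector3 Update(Vector3 head, Vector3 hand)
    {
        Vector3 dirToHand = Vector3.Normalize(hand - head);
        return head + dirToHand * ratio * (head - hand).Length() + error; // Eq. 5.1
    }
}
```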


5.4.3 Camera

In order to render the point cloud from the correct view, the application has to be aware of the position and orientation of the ToF camera. This is done by aligning a virtual model of the camera with the real camera. This class handles the virtual camera and calculates the final transformation matrix for the camera. The final transformation matrix $T_{tot}$ is composed of a translation matrix $T_{move}$ and two rotation matrices, from the rotation around the y-axis, $R_{rotate}$, and the tilting, $R_{tilt}$, see Equation 5.7.

$$T_{tot} = T_{move} R_{rotate} R_{tilt} \tag{5.7}$$

The rotation around the y-axis is calculated by measuring the relative hand movement, $\vec{x}_{F(current)} - \vec{x}_{F(init)}$, along the direction $forward \times up$, where $forward$ is the forward direction from the user's head and $up$ is the direction upwards from the user's head. The relative movement corresponds to the movement from when the manipulation started until the current position of the hand. The angle is calculated according to Equation 5.8.

$$angle = \left(\vec{x}_{F(current)} - \vec{x}_{F(init)}\right) \cdot \left(forward \times up\right) \tag{5.8}$$

For the tilting, the angle only depends on the relative movement along the y-axis, since the direction upwards will be consistent during all sessions.
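The sketch below illustrates Equations 5.7 and 5.8 under some assumptions: System.Numerics row-major matrices are used instead of SharpDX types, the tilt is taken to be a rotation around the local x-axis, and all names are illustrative.

```csharp
// Sketch of the virtual camera transform (Equations 5.7-5.8). Matrix4x4 in System.Numerics
// is row-major, so the composition order below follows that convention rather than the
// column-major notation used in the text.
using System.Numerics;

public sealed class VirtualCamera
{
    public Vector3 Position;
    public float YawAngle;    // rotation around the y-axis
    public float TiltAngle;   // tilt back and forth (assumed to be around the local x-axis)

    // Yaw increment from the relative hand movement projected onto forward x up (Eq. 5.8).
    public static float YawFromHand(Vector3 handCurrent, Vector3 handInit,
                                    Vector3 headForward, Vector3 headUp)
    {
        Vector3 side = Vector3.Cross(headForward, headUp);
        return Vector3.Dot(handCurrent - handInit, side);
    }

    // T_tot = T_move R_rotate R_tilt (Eq. 5.7), expressed with row-major matrices:
    // a point is tilted first, then rotated around y, then translated.
    public Matrix4x4 WorldTransform()
    {
        Matrix4x4 tilt = Matrix4x4.CreateRotationX(TiltAngle);
        Matrix4x4 yaw  = Matrix4x4.CreateRotationY(YawAngle);
        Matrix4x4 move = Matrix4x4.CreateTranslation(Position);
        return tilt * yaw * move;
    }
}
```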

5.4.4 Cartesian Box

A Cartesian Box represents the outer bounds in the Cartesian Data Reduction algorithm. This box has to be scalable to be able to represent any region of interest that should be specified. In each corner of the box, a smaller box called a corner box is rendered as a handle to use for scaling. If the ray from the head in the direction towards the hand intersects a corner box, the cartesian box should be scaled according to the hand movement. Depending on which handle that is selected, the signs of the scaling in x, y and z are different. The amount of scaling along each axis is described in Equation 5.9.

$$\begin{pmatrix} \dfrac{w+\Delta w}{w} & 0 & 0 & 0 \\ 0 & \dfrac{h+\Delta h}{h} & 0 & 0 \\ 0 & 0 & \dfrac{d+\Delta d}{d} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{5.9}$$

In the equation, w, h and d are the width, height and depth of the box, respectively, and ∆w, ∆h and ∆d are the relative movements from the initial position of the corner box to its new position, calculated along each axis separately in the same way as a new position for an object, according to Equation 5.1. After the scaling, the position of the box has to be corrected by a translation of half the relative movement in order to make the scaling appear in the direction of the selected corner box and not symmetrically around the center of the cartesian box.
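A sketch of the scaling, assuming the per-axis corner signs work as described, is given below; the types are System.Numerics and the names are illustrative.

```csharp
// Sketch of scaling the cartesian box from a dragged corner handle (Eq. 5.9) and
// recentering it by half the relative movement so the scaling happens towards the handle.
using System.Numerics;

public static class CartesianBoxScaling
{
    // size: current width, height, depth; delta: relative corner movement along x, y, z;
    // cornerSign: +1/-1 per axis depending on which corner handle is dragged (assumption).
    public static (Matrix4x4 scale, Vector3 centerCorrection) FromCornerDrag(
        Vector3 size, Vector3 delta, Vector3 cornerSign)
    {
        // Diagonal scaling (w+dw)/w, (h+dh)/h, (d+dd)/d as in Equation 5.9.
        Matrix4x4 scale = Matrix4x4.CreateScale(
            (size.X + delta.X) / size.X,
            (size.Y + delta.Y) / size.Y,
            (size.Z + delta.Z) / size.Z);

        // Move the box center by half the relative movement, signed per corner, so the
        // opposite corner stays fixed instead of the box growing symmetrically.
        Vector3 centerCorrection = 0.5f * delta * cornerSign;
        return (scale, centerCorrection);
    }
}
```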


The final transformation of a cartesian box is described in Equation 5.10, where $S_{scaling}$ is the scaling described above.

$$T_{tot} = T_{move} R_{rotate} R_{tilt} S_{scaling} \tag{5.10}$$

The cartesian box is rendered as a transparent solid model with a wireframe that highlights the edges of the box.

In order for the real camera to apply the Cartesian Data Reduction algorithm, the outer bounds of the cartesian box have to be sent to the camera. The real camera assumes that it is placed in the origin, which implies that the coordinates of the cartesian box have to be transformed into the camera coordinate system. This is done by multiplying the final transformation matrix for the cartesian box, $T_{tot}$, with the inverse camera matrix. The coordinates are sent to the camera by the blob sender class. If the camera registers anything inside the cartesian box, the box changes its color.
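Expressed in code, and assuming row-major matrices as in System.Numerics, the transformation could look like the following sketch; the names are illustrative.

```csharp
// Sketch of expressing the cartesian box in the camera's coordinate system:
// combine the box transform with the inverse of the virtual camera transform.
using System.Numerics;

public static class RegionOfInterestExport
{
    public static Matrix4x4 BoxInCameraSpace(Matrix4x4 boxWorld, Matrix4x4 cameraWorld)
    {
        Matrix4x4.Invert(cameraWorld, out Matrix4x4 worldToCamera);
        // With row-major matrices this maps box-local points into camera coordinates,
        // which is what the real camera expects since it assumes it sits in the origin.
        return boxWorld * worldToCamera;
    }
}
```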

Figure 5.4 illustrates a cartesian box that is used to specify the region of interest. The cartesian box is placed inside a shelf, and Figure 5.4a illustrates the cartesian box when it is empty. If something is registered inside, as illustrated in Figure 5.4b, the box is colored blue.

5.4.5 Corner Box

This class represents a box that acts as a handle for a corner that should be manipulable. It has the same intersection test as the remaining boxes. The position and orientation of a corner box are decided by its parent, a cartesian box. However, the rotation has to be performed around the center of the cartesian box and not around the corner box itself.

5.4.6 Blob Receiver

This class handles the connection between the HoloLens and the ToF camera, and receives and parses the data from the camera. The connection is established through a TCP socket. The camera sends an array of bytes that has to be converted into a depth image where every pixel is an unsigned integer and corresponds to the depth at a position projected onto the image plane. The depth values map 1:1 from an unsigned integer to millimeters in the world. The class also receives the intensity image, the confidence image and information about the camera's lens parameters, which is needed to convert a pixel in the image plane to a position in the real world.
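Since the exact blob format of the camera is not reproduced here, the following sketch only illustrates the general idea of converting a received byte array into a depth image; the 16-bit little-endian layout and the absence of a header are assumptions made for the illustration.

```csharp
// Illustration only: convert a received byte buffer into a depth image where each
// pixel is an unsigned integer in millimeters. The real blob format of the camera
// (headers, field order) is not shown; 16-bit values are an assumption for the sketch.
using System;

public static class DepthBlobParser
{
    public static ushort[,] ToDepthImage(byte[] payload, int width, int height)
    {
        var depth = new ushort[height, width];
        int offset = 0;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                depth[y, x] = BitConverter.ToUInt16(payload, offset); // depth in mm
                offset += 2;
            }
        }
        return depth;
    }
}
```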

5.4.7 Blob Sender

The real camera uses different ports to send and receive data, which implies that another connection has to be established in order to send information back to the camera. This class opens another connection through a TCP socket and translates all variables that should be set for the camera, e.g. the outer bounds for the Cartesian Data Reduction algorithm, into a binary stream and sends it to the camera.
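A corresponding sketch of the sending side is given below; the UWP StreamSocket and DataWriter classes are used, but the binary layout that is written is a placeholder rather than the real protocol of the camera.

```csharp
// Illustration of sending configuration values back to the camera over a second TCP socket.
// The actual binary layout expected by the camera is not shown; writing floats in sequence
// is a placeholder for the real encoding.
using System.Threading.Tasks;
using Windows.Networking;
using Windows.Networking.Sockets;
using Windows.Storage.Streams;

public static class BlobSenderSketch
{
    public static async Task SendBoundsAsync(string ip, string port, float[] bounds)
    {
        using (var socket = new StreamSocket())
        {
            await socket.ConnectAsync(new HostName(ip), port);
            using (var writer = new DataWriter(socket.OutputStream))
            {
                foreach (float value in bounds)
                    writer.WriteSingle(value);   // placeholder encoding
                await writer.StoreAsync();       // flush the binary stream to the camera
            }
        }
    }
}
```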


(a) Nothing is detected inside the box

(b) Something is detected inside the box

Figure 5.4: Cartesian Data Reduction through HoloLens

5.4.8 Point Cloud

This class creates the point cloud that corresponds to what the camera is registering. The resolution of the point cloud depends on the camera resolution, which is 144 × 175 pixels. The point cloud is created from two 2D images, one storing the image coordinates and one storing the depth image from the camera. The first image is created with 144 × 175 pixels, representing vertices at positions [0–174, 0–143, 0], where the x and y coordinates correspond to the image coordinates in the depth image received from the camera. The vertices are loaded to the buffer once, while the depth image from the camera is loaded to the buffer every frame since it has changed since the last frame. The conversion from an image coordinate to a world coordinate is performed in the vertex shader. After the conversion, the position of the point cloud corresponds to a position in the world in millimeters; the HoloLens coordinate system maps to meters, which requires a final scaling. This results in a point cloud assuming that the camera is positioned at [0, 0, 0] in the real world, which is probably not the case. The point cloud has to be transformed to its correct position, which is given by the camera position, and rotated according to the virtual camera's orientation. Since HoloLens uses stereo rendering, every vertex is multiplied with a projection matrix depending on which eye is rendered. These projection matrices are provided by the HoloLens.
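The conversion itself is a standard pinhole back-projection. The application performs it in the vertex shader; the same mathematics is sketched below on the CPU side for clarity, with assumed lens parameter names (focal length and principal point) and the final millimeter-to-meter scaling.

```csharp
// Sketch of back-projecting a depth pixel to a 3D point (pinhole camera model), the same
// math the vertex shader performs. The lens parameter names (fx, fy, cx, cy) are
// assumptions; the real camera reports its own lens parameters.
using System.Numerics;

public static class DepthToWorld
{
    public static Vector3 Unproject(int u, int v, ushort depthMillimeters,
                                    float fx, float fy, float cx, float cy)
    {
        float z = depthMillimeters / 1000f;   // millimeters -> meters (HoloLens units)
        float x = (u - cx) * z / fx;
        float y = (v - cy) * z / fy;
        return new Vector3(x, y, z);          // camera-space point; the virtual camera
                                              // transform then places it in the world
    }
}
```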

During development it was chosen to render the point cloud as a solid transparent mesh together with a wireframe. Due to the poor contrast of the holograms in HoloLens, it is not sufficient to render the point cloud as default points. A possible solution would be to increase the point size, but since the point cloud aims to visualize what the camera sees, rendering it as a solid mesh was evaluated as the clearer visualization method. However, rendering a solid mesh reduces the interpretation of depth and is therefore complemented with the wireframe. The rendered point cloud is illustrated in Figure 5.5.


6 Usability Tests

Two usability tests were conducted during the development. The first test was a minor pilot test of the interaction with a virtual object to gain insight into how different interactions are perceived and whether the process is simple and natural. The second test aimed to evaluate the concept of visualizing the captured data in its real context, as well as evaluating the configuration process for the ToF camera through AR with intangible interaction. This chapter presents how the tests were conducted as well as some key results from the tests.

6.1 Test 1 - Interaction with one virtual object

The aim of this test was to investigate whether the interaction with the virtual camera was intuitive and simple, and to get feedback about what needed to be improved to enhance the experience. The interaction with the virtual camera can be applied to other objects, such as a box defining a region of interest, and therefore it was chosen to limit the test to only evaluate the interaction with a camera. During the evaluation the application was reduced such that it did not handle the connection with the ToF camera; it only had a virtual camera that could be moved, rotated and tilted. The test can be found in Appendix A.

6.1.1 Test design

An introduction was given to the participants to inform them about the aim and the process of the test. They were also informed about the different gestures that the HoloLens is able to recognize. The participants were encouraged to use the think aloud method during the test to inform the mediator about their thoughts. The participants were given written instructions in order to guarantee that they all received equal information before the evaluation started. During the test, the participants were given specific tasks to perform. The tasks were designed to initially introduce the participant to the HoloLens and the standard interactions in order to make the participant aware of the device. The participants were guided to start the application and thereafter, step by step, try different interactions. Lastly, the participants were asked to perform a more comprehensive task which included all of the previous steps.
