
Mälardalen University

Master Thesis

Evaluation of Intuitive VR-based HRI for

Simulated Industrial Robots

Author:

Joonatan Mänttäri

Supervisor: Dr. Giacomo Spampinato
Examiner: Prof. Lars Asplund


Abstract

While the accessibility and technology behind industrial robots are improving and becoming less expensive, the installation and configuration of industrial robot cells still proves to be an expensive venture, especially for small and mid-sized companies. It is therefore of great interest to simulate robot cell installations, both for verification of system functionality and for demonstration purposes for clients.

However, the construction and configuration of a simulated robot cell is a time-consuming process and requires expertise that is often only found in engineers who are experienced with software programming and spatial kinematics. Simplifying the process would bring great advantages, not only in making more efficient use of software engineers' time but also in marketing applications.

As this paper will show, the use of Virtual Reality (VR) in simulating, displaying and controlling robots is a well investigated subject. It has been shown that VR can be used to show robot simulation in more detail and to specify path movement in task programming. This paper focuses upon finding and evaluating an intuitive Human Robot Interface (HRI) for interacting with simulated robots using virtual reality.

An HRI is proposed and evaluated, using the Oculus Rift Head Mounted Display (HMD) to display a 3-dimensional (3D) VR environment of a robot cell in ABB RobotStudio. Using marker-based tracking enabled by ARToolkit, the user's position in real-world coordinates is forwarded to the virtual world, along with the position and orientation of a hand-held tool that allows the user to manipulate the robot targets that are part of the simulated robot's program.

The system as an HRI was successful in giving the users a strong sense of immersion and a much better understanding of the robot cell and the positions of the defined robot targets. All participants were also able to define robot targets much faster with the proposed interface than when using the standard RobotStudio tools. Results show that the performance of the tracking system is adequate with regard to latency and accuracy for updating the user position and the hand-held tool when using a video capture resolution of 640x480.


Acknowledgements

I would like to thank my examiner and supervisors both at Mälardalen University and at Sejfo AutoIT for their guidance and support during this thesis. I would also like to thank Sejfo AutoIT and Karl Ingström specifically for giving me the opportunity to conduct such an interesting and motivating master thesis.


Acronyms

2D 2-dimensional. 2, 20, 25, 26, 29
3D 3-dimensional. iii, 1, 4, 8, 18–30, 36, 37
API Application Programming Interface. 11, 13
AR Augmented Reality. iii, iv, 3, 11, 14, 22–29, 32
CAVE Cave Automatic Virtual Environment. 30
CVE Collaborative Virtual Environment. 23
DTW Dynamic Time Warping. 24
FOV Field of View. 11, 30, 37, 77–79
GMI Groups Manager Interface. 23
GUI Graphical User Interface. iii, 2, 3
HII Hardware Independent Interface. 24
HMD Head Mounted Display. iii, iv, 1, 4, 6, 10, 11, 18, 19, 21–24, 26, 28, 30, 79
HRI Human Robot Interface. iv, 1–4, 24, 30, 41, 72–74
IC Integrated Circuit. 24
IPC Inter-Process Communication. 34
IPD Inter-Pupillary Distance. v, 37
ISF Incremental Sheet Forming. 20
kNN k-Nearest Neighbor. 24
LED Light Emitting Diode. 10
MCAR Mobile Collaborative Augmented Reality. 28
OLED Organic Light Emitting Diode. 10
SDK Software Development Kit. 11, 30
SME Small and Medium-Sized Enterprises. 3
TCP Transmission Control Protocol. 34
TUI Touch User Interface. 2
VR Virtual Reality. iv, 1, 3, 4, 11, 18–30, 33, 36, 37, 79
VRML Virtual Reality Modeling Language. 20


List of Figures

1 A Teach Pendant handheld device for monitoring and configuring ABB industrial robots. . . . 2
2 The ABB RobotStudio Graphical User Interface (GUI) displaying the rotational manipulation tool. . . . 3
3 An illustration of the geometry involved in stereopsis. Image acquired from the work by De Silva et al. [51]. . . . 5
4 A side-by-side stereogram that is to be viewed with the wall-eyed method. Image acquired from the report by Johansson [30]. . . . 6
5 A side-by-side stereogram that is to be viewed with the cross-eyed method. Image by Voklan Yuksel [57]. . . . 7
6 The Holmes stereoscope, which utilizes prismatic lenses to direct and isolate the user's field of view for each eye so that the different perspectives reach each eye without strain. Image by Davepape [23]. . . . 7
7 An autostereogram showing a picture of a 3D mermaid if focused correctly. Image from http://cdn.acidcow.com. . . . 8
8 A simplified visualization of the parallax barrier controlling which pixels are seen by what eye. Image from Wikimedia Commons [20]. . . . 8
9 Clock-wise polarized light being turned into linearly polarized light by a quarter wave plate in the viewer's glasses. The linearly polarized light is polarized along the transmission axis of the linear polarizer filter so that the image is passed. Counter clock-wise polarized light would become polarized along an orthogonal axis and instead be blocked. Image provided by Wikimedia Commons [22]. . . . 9
10 An example of Liquid Crystal Shutter Glasses. The necessary electronics are hidden under the frames. Image provided by Amidror1973 at en.wikipedia [11]. . . . 9
11 An example of plastic red-cyan anaglyph glasses. . . . 10
12 A demonstration of how two separate image feeds are displayed to each user with the Oculus Rift HMD. Image from www.venturebeat.com. . . . 10
13 An example of how the image sent to the Oculus is distorted with barrel distortion to account for the lens distortion. . . . 12
14 The Oculus Rift Developers Kit out of the box. . . . 12
15 A screenshot of the RobotStudio environment. . . . 13
16 A summary of the method for marker-tracking and display of virtual models that ARToolkit uses for Augmented Reality (AR). Image from the ARToolkit Documentation page [13]. . . . 14
17 The relationship between the marker coordinate system and the camera coordinate system in ARToolkit. Image from the work by Kato and BillingHurst [31]. . . . 14
18 The results of the different image processing techniques utilized by ARToolkit.
19 The perpendicular vector pair v1 and v2 calculated from u1 and u2. Image from the work by Kato and BillingHurst [31]. . . . 16
20 Tracking range of ARToolkit depending on marker size. Image from the ARToolkit homepage [1]. . . . 17
21 Tracking error of ARToolkit depending on distance to and rotation of marker. Image from the ARToolkit homepage [1]. . . . 17
22 The system set-up used by Natonek et al. [25] . . . 18
23 A picture of the Oculus Rift HMD system. . . . 19
24 The view of the VR simulation in the work by Kopasci . . . 19
25 An overview of the software system used in Martinez et al. . . . 20
26 The hardware setup used in Martinez et al. . . . 20
27 The interface provided to users for the VR robot in the work by Zhu et al. [59] . . . 21
28 An overview of the system developed by Yuan et al. . . . 22
29 An example of the view available to the operator in the system discussed in the work by Tang et al. [53] . . . 22
30 The working area together with the VR environment background available in the work by Tao et al. [44] . . . 22
31 The view displayed to the tele-operator, as provided by the AR system by Moon et al. . . . 23
32 The HMD used in the work by Mollet et al. [43] (left) and an example of the AR view in the 3rd level of control (right). . . . 24
33 A system overview of the HRI interface discussed in the method by Cerlinca et al. [18] . . . 24
34 The user scanning and creating a digital model of the work object in the work by Reinhart et al. [49] . . . 25
35 The projected height-map of a digital model of the work object in the work by Reinhart et al. [49] . . . 25
36 The AR video of the simulation in the work by Reinhart et al. [49] . . . 25
37 Examples of the AR overlay on the video from the camera on the robot (left) and of the VR environment (right) in the method developed by Akan et al. [8] . . . 26
38 The hardware setup of the system used in the work by Wang et al. [54] . . . 26
39 An example of the AR interface in the method by Lambrecht et al. [36] . . . 27
40 The translation of human grips to VR robot grips discussed by Aleotti et al. [9] . . . 28
41 An example of the AR overlay displayed to a user in the work by Boulanger et al. [16] . . . 28
42 An overview of the AR system used in the work by Boulanger et al. [16] . . . 29
43 The prototype of the hand-held tool used to define robot targets by the user. The tool was comprised of a pre-defined marker used for tracking which was affixed on a vertical handle. . . . 31
44 A graph representing the perceived size of an object relative to its real size depending on the distance to the viewer. . . . 32
45 The Sixense Razor Hydra as seen in the hands of a user playing the modified Tuscany demo by Sixense. . . . 33
46 An overview of all the processes in the system and their communication with one another. . . . 34
47 The main loop task of the RobotStudio Add-in. . . . 35

49 The view that is sent to the Oculus once the add-in has been started. As mentioned before, the view of the right is shifted with the Inter-Pupillary Distance (IPD) of the user in order to gain the stereoscopic 3D effect. In the view of the left eye the target display window can be seen. . . . 37
50 The camera setup used for tracking the user's head and hand. . . . 38
51 A side view of camera 1 and the tracking area. . . . 38
52 Sample rates and frame-rates for 320x240 video capture resolution. . . . 43
53 Sample rates and frame-rates for 640x480 video capture resolution. . . . 43
54 Sample rates and frame-rates for 800x600 video capture resolution. . . . 44
55 The LIMIT 1060 laser distance meter used to calculate precise positions during testing. . . . 45
56 The metal fixture that held the head tracking marker during performance testing. Camera 1, which is mounted on the ceiling, can be seen in the top of the picture. . . . 46
57 A close-up of the small co-centric cylinder mentioned. . . . 46
58 Actual marker position (in mm) versus measured marker position (in mm) with 320x240 video resolution. . . . 47
59 Actual marker position (in mm) versus measured marker position (in mm) with 640x480 video resolution. . . . 48
60 Actual marker position (in mm) versus measured marker position (in mm) with 800x600 video resolution. . . . 48
61 The error between actual and measured position depending on total distance to the marker and the video resolution. The graph does not continue for the 320x240 resolution since the algorithm failed to locate the marker at further distances. . . . 49
62 The Epson picker industrial robot used when conducting tests that required a known path with constant length and speed. . . . 50
63 The set-up used for testing the tool marker tracking. Here the Epson robot can be seen to the top left, and the camera which is mounted to the metal fixture to the bottom right. . . . 51
64 Results obtained with 320x240 video resolution . . . 52
65 Results obtained with 640x480 video resolution . . . 52
66 Results obtained with 800x600 video resolution . . . 53
67 Results obtained with 320x240 video resolution . . . 54
68 Results obtained with 640x480 video resolution . . . 54
69 Results obtained with 800x600 video resolution . . . 55
70 Results obtained with 320x240 video resolution . . . 56
71 Results obtained with 640x480 video resolution . . . 56
72 Results obtained with 800x600 video resolution . . . 57
73 Results obtained with 320x240 video resolution . . . 58
74 Results obtained with 640x480 video resolution . . . 58
75 Results obtained with 800x600 video resolution . . . 59
76 Results obtained with 320x240 video resolution and a -20 degree rotation around the Y axis . . . 60
77 Results obtained with 320x240 video resolution and a -45 degree rotation around the Y axis . . . 60
78 Results obtained with 320x240 video resolution and a -70 degree rotation around the Y axis . . . 61
79 Results obtained with 640x480 video resolution and a -20 degree rotation around the Y axis . . . 61
80 Results obtained with 640x480 video resolution and a -45 degree rotation around the Y axis . . . 62
81 Results obtained with 640x480 video resolution and a -70 degree rotation around the Y axis . . . 62
82 Results obtained with 800x600 video resolution and a -20 degree rotation around the Y axis . . . 63
83 Results obtained with 800x600 video resolution and a -45 degree rotation around the Y axis . . . 63
84 Results obtained with 800x600 video resolution and a -70 degree rotation around the Y axis . . . 64
85 Results obtained with 320x240 video resolution and a -20 degree rotation around the Y axis . . . 65
86 Results obtained with 320x240 video resolution and a -45 degree rotation around the Y axis . . . 65
87 Results obtained with 320x240 video resolution and a -70 degree rotation around the Y axis . . . 66
88 Results obtained with 640x480 video resolution and a -20 degree rotation around the Y axis . . . 66
89 Results obtained with 640x480 video resolution and a -45 degree rotation around the Y axis . . . 67
90 Results obtained with 640x480 video resolution and a -70 degree rotation around the Y axis . . . 67
91 Results obtained with 800x600 video resolution and a -20 degree rotation around the Y axis . . . 68
92 Results obtained with 800x600 video resolution and a -45 degree rotation around the Y axis . . . 68
93 Results obtained with 800x600 video resolution and a -70 degree rotation around the Y axis . . . 69
94 Results obtained with 320x240 video resolution . . . 70
95 Results obtained with 640x480 video resolution . . . 71
96 Results obtained with 800x600 video resolution . . . 71
97 A graph showing the time taken by each participant to finish the task given to them by using both methods. . . . 74
98 A top-bottom and side view of the old camera set-up where the head tracking camera was in a corner near the ceiling of the testing room. . . . 76


Contents

Acronyms i

List of Figures iii

Introduction 2
  Scope of This Report . . . 3
  Thesis Outline . . . 4

Background 5
  Introduction to Stereoscopic 3D . . . 5
    Side-by-Side . . . 6
    Autostereogram . . . 7
    Polarized 3D glasses . . . 8
    Liquid Crystal Shutter Glasses . . . 9
    Anaglyph . . . 9
    Head Mounted Displays . . . 10
  The Oculus Rift Head Mounted Display . . . 11
  ABB RobotStudio and ABB RobotStudio API . . . 12
  ARToolkit . . . 13
    Finding Marker Coordinates . . . 14
    Limitations . . . 16

Related Work 18
  VR in Robot Simulation . . . 19
  VR in Robot Tele-Operation . . . 21
  VR and AR in Local Control . . . 24
  VR and AR in Robot Programming . . . 26
  VR and AR in Personnel Training . . . 28
  The Intended Contributions of This Work . . . 29

Method 30
  System Selection Process . . . 30
    Virtual World Display . . . 30
    User Input . . . 30
    User Position Tracking . . . 31
  System Overview . . . 34
  Interfacing with RobotStudio . . . 35

Results 42
  System SW/HW Performance . . . 42
    Sample Rate . . . 42
    Tracking Accuracy . . . 44
    System SW/HW Performance Summary . . . 72
  System Viability as HRI . . . 72
    System Viability as an HRI Summary . . . 74

Discussion 75

Future Work 78

Conclusions 79


Introduction

Ever since the introduction of robots into industry and academia, the problem of how best to create Human Robot Interfaces (HRIs) has existed. Although robots are a means of automating tasks without the need for human control, their configuration, programming and installation still require as efficient an HRI as possible.

Traditional forms of HRI include text-based command interfaces, 2-dimensional (2D) GUIs and Touch User Interfaces (TUIs), as well as voice commands. It is also possible to combine these interfaces to create multi-modal interfaces [8].

Text-based interfaces are often very time consuming and a rather cumbersome way to interface with the robot. The upside is that the form of input is often close to the "language" that the robot understands, meaning that these interfaces are often the easiest to implement and rarely result in misinterpretations by the robot. For example, with the RAPID programming language used to program ABB industrial robots, dozens of lines of code must be produced to generate a movement along a single path, and there is very little feedback to the user about the results of their actions.

An example of a GUI/TUI solution is the one presented on the Teach Pendant that connects to ABB industrial robots, which can be seen in figure 1. This, just as with any touch screen as well as the manipulator tool in RobotStudio (figure 2), provides the human partner with some visual stimuli in the interaction process. TUIs also give the user the medium of touch in the communication with the robot; however, touch is most often used only for selecting menu options, which in turn is not a very natural way of interaction for humans.

Figure 1: A Teach Pendant handheld device for monitoring and configuring ABB industrial robots.

Speech is also a medium of communication that has been incorporated in HRI for some time. Together with other interfaces such as GUIs [8] it can provide a broader multi-modal method of interacting with the robot. Users can give commands that require detail with the GUI, while giving non-ambiguous, perhaps urgent, commands such as stop, start or slow down with speech. This enables users to utilize more of the communication mediums available to them.


Still, these traditional interfaces are not necessarily the most intuitive way for humans to interact with robots or other machines. Current HRI methods focus more on anthropomorphic ways of interaction [46], meaning interfaces that mimic methods of communication already used by humans. That is why interest has been growing for some time in the development of a more intuitive interface method incorporating Virtual Reality (VR) and AR for HRI. With these technologies, users can give instructions and receive feedback in a completely immersive visual, and sometimes audible and haptic, plane, allowing for an authentic sensation of concrete inter-connectivity with the robot.

Figure 2: The ABB RobotStudio GUI displaying the rotational manipulation tool.

This method of interaction with robots could reduce the development time of automation solutions by offering faster, more intuitive ways to control both simulated and real-world robots. It could also eliminate the requirement of expertise in software and robot programming, enabling a larger user base to configure and interact with the robot. This would be an extremely beneficial development for Small and Medium-Sized Enterprises (SMEs), as it would greatly increase their capacity for automation. SMEs often face problems when financing automation solutions, as the installation, configuration and programming costs are high due to the extensive amount of labour time needed. They also often lack the expertise required [47] to configure and reprogram the utilized industrial robots.

However, as mentioned, these problems could be alleviated if not removed by more intuitive HRI methods incorporating VR and Augmented Reality (AR). The technologies could even be applicable in marketing, as they would enable more immersive and stimulating visual demonstrations when presenting a company's products. More importantly, they could provide such demonstrations faster, which is of great interest since time frames are often short between acquiring a client and starting production of robot cells.

Scope of This Report

As this paper will show, the use of VR in simulating, displaying and controlling robots is a well investigated subject. It has been shown that VR can be used to show robot simulation in more detail and to specify path movement in task programming. However, none of the works have taken the step further to create an engineering system prototype.

This paper will focus upon finding and evaluating intuitive ways of interacting with simulated robots using virtual reality. It should be possible to visualize, understand, program and modify simulated robot cells through a time-efficient, natural way of communication. After implementing these methods, they will be evaluated by being tested by real employees at an automation company. In order to establish a base reference, the standard method of manually programming and inserting coordinates will also be evaluated in the same way.

From these results, it should be possible to conclude whether the use of VR in robot simulation visualization and programming is more intuitive and practical than the standard manual method of data entry and visualization and, if so, how much of an advantage results from it.

Thesis Outline

The following section, Background, will give an overview of the basic theory behind stereoscopic 3-dimensional (3D) visualization and marker-based pose tracking. It will also discuss the hardware tool used, the Oculus Rift Head Mounted Display (HMD), and the software tools ABB RobotStudio, the ABB RobotStudio SDK, and ARToolkit. The Related Work section then discusses the methods used and results achieved in relevant state-of-the-art implementations. Following this, the Method section will discuss the software and hardware systems designed and implemented in order to create the HRI. In the Results section, measurements regarding the system's tracking performance and viability as an HRI are shown. The Discussion section discusses the consequences of the observed results and the system's capabilities, while possible future work and improvements of the methods explored can be found in the Future Work section. Conclusions that can be drawn from the work are stated in the Conclusions section, followed by any appendices.


Background

Introduction to Stereoscopic 3D

Stereoscopy is a method for creating artificial depth in a flat image [30], thereby causing the brain to believe there is three-dimensional data in two-dimensional images or films. The technique manipulates the binocular stereopsis that human eyesight would normally encounter when seeing a three-dimensional scene, which is an important physiological cue for giving the human brain a feeling of depth [51]. The following is a short explanation of the basics behind stereopsis and how it acts as an indication of depth for human vision.

In figure 3, it is possible to see how the anatomy of the human pair of eyes results in two slightly different images, one for each eye. Given that the pair of eyes is focusing on the same point, P, the image of P falls at the center of each eye's fovea. The image cast by a point Q, however, is projected not at the center of the fovea but α degrees away from it in one eye and β degrees away from it in the other eye. The binocular disparity, hereby referred to as η, is then (β − α) and is measured in degrees of visual angle. It is this binocular disparity, which has both a magnitude and a direction, that acts as a stimulus enabling the human brain to perceive depth. If we let the angular disparity caused by point Q be denoted η_Q, then

η_Q = β − α    (1)

Figure 3: An illustration of the geometry involved in stereopsis. Image acquired from the work by De Silva et al. [51].

It is possible to prove geometrically that this disparity is proportional to the distance d, which is the relative depth of point Q with respect to point P. It can also be proven [17] that it is inversely proportional to the square of v, the viewing distance.

η_Q ∝ d / v²    (2)
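The inverse-square relation in equation 2 can be made plausible with a small-angle sketch. The interocular distance, denoted I below, is an assumption introduced here for illustration and is not part of the derivation referenced in [17].

```latex
% Small-angle sketch of equation (2), assuming interocular distance I and d << v:
% the convergence angle on P is roughly I/v, on Q roughly I/(v+d),
% and eta_Q is their difference.
\eta_Q \;\approx\; \frac{I}{v} - \frac{I}{v+d}
       \;=\; \frac{I\,d}{v\,(v+d)}
       \;\approx\; \frac{I\,d}{v^{2}}
```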

Using the same logic as when determining η_Q, we get the binocular disparity for point R,


η_R = (−β) − α = −(β + α)    (3)

From equations 1 and 3, and by looking at figure 3, it can be seen that η_Q > 0 and η_R < 0.

The sign difference is interpreted by the human brain as the relative positioning of points Q and R with respect to point P. It is this stimulus that plays an important role in depth perception. If the binocular disparity is positive, the object is seen to be behind the point of focus, and likewise, if it is negative, the object is seen to be in front of the point of focus.

Stereoscopy uses this principle to create the illusion of depth in a flat image by providing these slightly different images, one to each eye. This can be done in a multitude of ways. The implementation described in this report utilizes HMDs to deliver the stereoscopic 3D effect, but other popular methods will be discussed in this section as well. Some require different types of tools while others can be viewed as-is with the naked eye, and they are practical in different situations.

Side-by-Side

This is a very simple method for experiencing the stereoscopic 3D effect. Two images are placed next to each other, where one image is the perspective that the left eye would have and the other image shows the perspective for the right eye. There are generally two ways of viewing a side-by-side stereogram, and neither requires any viewing tool. The method used depends on the position of the images.

Figure 4: A side-by-side stereogram that is to be viewed with the wall-eyed method. Image acquired from the report by Johansson [30].

If the image portraying the left eye's perspective is on the left, the images are viewed by keeping the eyes parallel, called the "wall-eyed" method. This can be achieved by trying to focus on a point behind the image. An example can be seen in figure 4. However, if the image showing the left eye's perspective is shown on the right, the images are viewed by crossing the eyes, thereby giving each eye the correct perspective to enable the brain to extract the depth information, as seen in figure 5. As can be expected, if the wrong viewing method is used when attempting to view the stereogram, the depth information is reversed.


Figure 5: A side-by-side stereogram that is to be viewed with the cross-eyed method. Image by Voklan Yuksel [57].

It is also possible to use a tool called a stereoscope, which can be seen in figure 6. It eliminates everything but the image that each eye should see, removing many of the distractions and non-essential stimuli to the brain and resulting in much less strain for the viewer.

Figure 6: The Holmes stereoscope, which utilizes prismatic lenses to direct and isolate the user's field of view for each eye so that the different perspectives reach each eye without strain. Image by Davepape [23].

Autostereogram

Autostereograms consist of only one image. They most often consist of repeated horizontal patterns and are then called "wallpaper" or "random dot" autostereograms. They use the same principle of fooling the brain's sense of depth perception using binocular disparity, causing the image content to be perceived on a plane behind the actual depth of the image. This is most often achieved with the wall-eyed technique, by letting the left eye focus on an element in the pattern and the right eye focus on a similar element in a repeated tile of the pattern. The depth at which the elements appear depends on the spacing between the patterns: objects that are repeated with a shorter spacing appear closer to the viewer, and objects that are repeated with a longer spacing appear further away, as can be seen in figure 7.

It is also possible to use computer programs to generate autostereograms from depth maps, which are greyscale depictions of the desired view depth of a model where lighter pixels are closer to the viewer and darker pixels have increased depth. The programs take the image that the model should appear in, often a horizontally repeating pattern of random dots, and shift the pixels according to the depth map to increase or decrease the spacing. This results in the appearance of the depth described by the depth map, and the model is thereby displayed to the viewer with depth information.
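As a minimal sketch of the depth-map approach just described, the following Python function generates a naive random-dot autostereogram. The pattern width and maximum shift are illustrative values only, and real generators also handle hidden-surface constraints.

```python
import numpy as np

def autostereogram(depth, pattern_width=64, max_shift=16, rng=None):
    """Naive random-dot autostereogram from a depth map with values in [0, 1].

    Brighter (closer) depth values shrink the repeat spacing, which makes
    those regions appear nearer to the viewer, as described above.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = depth.shape
    out = rng.integers(0, 2, size=(h, w), dtype=np.uint8) * 255  # random dots
    for y in range(h):
        for x in range(pattern_width, w):
            sep = pattern_width - int(depth[y, x] * max_shift)  # per-pixel spacing
            out[y, x] = out[y, x - sep]  # repeat the dot one period to the left
    return out
```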

(18)

Figure 7: An autostereogram showing a picture of a 3D mermaid if focused correctly. Image from http://cdn.acidcow.com.

There are also autostereoscopic displays that achieve the same effect. These displays, such as the one used in the Nintendo 3DS, limit the pixels that each eye can see in order to display separate images to each eye. This is often achieved with parallax barriers: layers of material, placed in front of the display, with precisely positioned slits that allow each eye to see a different set of pixels and thereby a different picture, as can be seen in figure 8.
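A simplified model of the pixel layout behind such a barrier is column interleaving of the two views; the sketch below only interleaves the images and does not model the barrier geometry itself.

```python
import numpy as np

def interleave_columns(left, right):
    """Column-interleave two equally sized views: even columns carry the
    left-eye image, odd columns the right-eye image."""
    assert left.shape == right.shape
    out = np.empty_like(left)
    out[:, 0::2] = left[:, 0::2]   # columns the barrier exposes to the left eye
    out[:, 1::2] = right[:, 1::2]  # columns the barrier exposes to the right eye
    return out
```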

Figure 8: A simplified visualization of the parallax barrier controlling which pixels are seen by what eye. Image from Wikimedia Commons [20].

Polarized 3D glasses

Polarized 3D stereoscopy requires the user to wear glasses with polarized lenses. The images to be viewed are projected by two separate projectors through separate filters onto a silver- or aluminum-coated screen, where the orientation of the electric field of the light projected from each projector is set by the choice of filter. There are two different types of polarization, linear and circular. With linear polarization, the electric field of the polarized light is directed vertically from one projector and horizontally from the other; with circular polarization, it is directed either clockwise or counter-clockwise, see figure 9. The glasses that the viewer wears have a matching filter for each eye that only allows light of one of the polarizations to pass through, effectively enabling each eye to see only one of the two projected images.


Figure 9: Clock-wise polarized light being turned into linearly polarized light by a quarter wave plate in the viewer's glasses. The linearly polarized light is polarized along the transmission axis of the linear polarizer filter so that the image is passed. Counter clock-wise polarized light would become polarized along an orthogonal axis and instead be blocked. Image provided by Wikimedia Commons [22].

Liquid Crystal Shutter Glasses

As the name implies, this method involves the viewer wearing glasses whose lenses act as shutters. The lenses contain liquid crystal and a polarizing filter and turn black when a voltage is applied to them, effectively blocking the view. The display device presents alternating picture streams, one for the left eye and one for the right eye, at a high frequency. The shutter glasses are synced with the display device so that each eye is blocked during the other eye's image stream and only sees its own. This way the stereoscopic 3D effect is achieved, since each eye only sees the corresponding perspective's image stream. It is important to keep the update frequency high, at around 60 Hz (meaning 30 Hz per alternating image stream), in order not to cause flickering or ghosting, which is when a certain amount of the image meant for one eye is seen by the other.

Figure 10: An example of Liquid Crystal Shutter Glasses. The necessary electronics are hidden under the frames. Image provided by Amidror1973 at en.wikipedia [11].

Anaglyph

Anaglyphic 3D incorporates the use of color filtering to separate what each eye sees. There is a multitude of different color pairs that can be used for filtering, but the most common is red-cyan [30]. Using this method, the image that is meant for the left eye is reduced to only the red color channel, while the image containing the right eye's perspective is reduced to only the green and blue (together cyan) color channels. The images are then superimposed on top of each other, which leads to an image that looks normal, except for red and cyan horizontal edges, when not wearing the glasses. The user, wearing red-cyan anaglyph glasses, only sees one of the superimposed images with each eye, however. The red filter allows only grey-scale color from the red image to pass to the left eye, while the cyan filter allows the colors from green to blue to pass to the right eye. Since these images are, as usual, slightly different to simulate the differing perspectives of the left and right eyes, the effect of depth vision is once again achieved via binocular disparity. The color of the image is, however, never fully reconstructed by the brain, as only the grey (black/white), green and blue color channels are finally perceived, leaving the red channel missing. This phenomenon occurs with all the different colors used for anaglyphic 3D: no matter what colors are chosen, the viewer will be rendered partially color blind in some sense [55].
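The channel separation described above is straightforward to express in code; a minimal sketch for HxWx3 RGB images:

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Red channel from the left-eye view, green and blue (cyan) from the
    right-eye view, superimposed into a single image."""
    out = np.zeros_like(left_rgb)
    out[..., 0] = left_rgb[..., 0]     # red from the left perspective
    out[..., 1:] = right_rgb[..., 1:]  # green + blue from the right perspective
    return out
```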

Figure 11: An example of plastic red-cyan anaglyph glasses.

Head Mounted Displays

The method used to achieve stereoscopic 3D with HMDs is fairly straightforward compared to the other methods discussed here. HMDs most often use two separate Light Emitting Diode (LED) or Organic Light Emitting Diode (OLED) screens, one for each eye, or in some cases, such as with the Oculus Rift [12], one screen that is split and used for both eyes, as can be seen in figure 12. Each eye only sees one image, as it is simply the only one displayed to it, so when each eye is presented with the image corresponding to its perspective the stereoscopic 3D effect is achieved. HMDs also often use lenses that magnify the presented image to give the user a wider view. A drawback with this method is that, since the displays are located so close to the viewer's eyes, the image can often appear pixelated. This problem seems to be disappearing, however, as HMDs have started to utilize displays with up to full HD resolution [45].
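Rendering for an HMD therefore comes down to drawing the scene twice, from two camera positions separated along the head's right axis. A minimal sketch, where the default of 0.064 m matches the fixed lens spacing quoted in the next section's specification list:

```python
import numpy as np

def stereo_eye_positions(head_pos, right_axis, ipd_m=0.064):
    """Offset the two virtual cameras by half the inter-pupillary distance
    each, one to the left and one to the right of the head position."""
    right_axis = right_axis / np.linalg.norm(right_axis)
    return (head_pos - right_axis * ipd_m / 2.0,   # left eye
            head_pos + right_axis * ipd_m / 2.0)   # right eye
```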

Figure 12: A demonstration of how two separate image feeds are displayed to each user with the Oculus Rift HMD. Image from www.venturebeat.com.


The Oculus Rift Head Mounted Display

The term HMD refers to a wearable display which is often integrated into some form of glasses or helmet [37]. HMDs come in either see-through or fully enclosed versions [40]. See-through HMDs can superimpose graphics onto lenses while allowing users to still see the environment around them, making them more suitable for AR applications. Fully enclosed HMDs completely cut off the user's vision of the outside world so that they can only see what the display(s) show them, making them more suitable for VR applications. HMDs have been a popular method for displaying VR for some time [25], [38], even more so now that modern light-weight HMDs such as the Zeiss Cinemizer [58], Sony HMZ T2 [52] and Z800 3DVisor [26] can offer head tracking and reasonable screen resolutions at acceptable prices. The Oculus Rift [45], however, provides significant advantages over these HMDs by offering greater screen resolution, a significantly higher Field of View (FOV), extremely responsive head tracking and a lower price. It provides a high-definition OLED screen and both horizontal and vertical head-tracking capabilities with the following specifications [12]:

Display Specifications
• 7 inch diagonal viewing area
• 1280 x 800 resolution (720p). This is split between both eyes, yielding 640 x 800 per eye.
• 90 degree horizontal FOV, 110 degree vertical FOV
• 64 mm fixed distance between lens centers
• 60 Hz LCD panel
• DVI-D Single Link
• HDMI 1.3+
• USB 2.0 Full Speed+

Tracker Specifications
• Up to 1000 Hz sampling rate
• Three-axis gyroscope, which senses angular velocity
• Three-axis magnetometer, which senses magnetic fields
• Three-axis accelerometer, which senses accelerations, including gravity

The Oculus Rift is for the time being sold as a development platform, so it is shipped with a Software Development Kit (SDK) allowing developers to integrate the technology into their games and applications. The SDK comes with an Application Programming Interface (API) to access the functions of the HMD, and some example programs with source code, giving insight into how the API can be used and what the Oculus Rift is capable of. Since the lenses included in the Oculus Rift distort the image, the SDK also provides code for barrel-distorting the image in software in order to negate the distortion caused by the lenses.


Figure 13: An example of how the image sent to the Oculus is distorted with barrel distortion to account for the lens distortion.
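The idea behind this warp can be sketched as a radial polynomial applied to normalized texture coordinates; the coefficients below are illustrative placeholders, not the SDK's calibrated values.

```python
import numpy as np

def barrel_distort(uv, k=(1.0, 0.22, 0.24, 0.0), center=(0.5, 0.5)):
    """Push each point away from the lens centre by a scale that is a
    polynomial in the squared radius, approximating a barrel pre-distortion
    that cancels the distortion of a magnifying lens."""
    uv = np.asarray(uv, dtype=float)
    c = np.asarray(center, dtype=float)
    d = uv - c
    r2 = np.sum(d * d, axis=-1, keepdims=True)      # squared distance to centre
    scale = k[0] + k[1] * r2 + k[2] * r2**2 + k[3] * r2**3
    return c + d * scale
```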

A consumer version of the Oculus Rift is also under development, which will include improvements such as reduced weight and full HD resolution [3]. The Oculus Rift was created by Palmer Luckey and reached its current state with the help of Kickstarter [33].

Figure 14: The Oculus Rift Developers Kit out of the box.

ABB RobotStudio and ABB RobotStudio API

ABB RobotStudio is a program used for simulation and offline programming of ABB industrial robots. It provides tools that enable users to train, program, and optimize simulated versions of their ABB industrial robots, based on an exact copy of the real software that runs on the robots in production. This results in very realistic simulations that allow the user to predict the behaviour of the real robots.

Figure 15: A screenshot of the RobotStudio environment.

An API exists for RobotStudio which allows developers to interface to the functionality of the program by writing their own plug-in programs for RobotStudio, called "Add-ins". Add-ins have the ability to manipulate a multitude of things, such as the simulated robot, the graphical view, the configuration files and the inputs/outputs, and to access the virtual controller that controls the robot. It is with the use of an add-in that the user in the proposed method can control the RobotStudio environment through the use of the Oculus Rift.

ARToolkit

Since the developer version of the Oculus Rift only supports head tracking in terms of rotation and not position, ARToolkit was used to track the user's position. ARToolkit is a framework that allows virtual graphics to be superimposed onto a live video feed of the real world using a marker-based approach. The principle is as follows (list from the ARToolkit Documentation page [13]):

• The camera captures video of the real world and sends it to the computer.
• Software on the computer searches through each video frame for any square shapes.
• If a square is found, the software uses some mathematics to calculate the position of the camera relative to the black square.
• Once the position of the camera is known, a computer graphics model is drawn from that same position.
• This model is drawn on top of the video of the real world and so appears stuck on the square marker.
• The final output is shown back in the handheld display, so when the user looks through the display they see graphics overlaid on the real world.
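For illustration, a comparable per-frame pipeline can be sketched with OpenCV primitives rather than ARToolkit's own C API. The marker side length, threshold value and corner ordering below are assumptions, and no template matching (marker identification) is performed.

```python
import cv2
import numpy as np

MARKER_SIDE = 0.08  # assumed marker side length in metres
# Marker corners in the marker coordinate frame, assumed to match the order
# in which the image corners are returned.
OBJ_PTS = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
                   dtype=np.float32) * (MARKER_SIDE / 2)

def find_square_pose(frame_gray, camera_matrix, dist_coeffs):
    """Threshold the frame, look for 4-vertex contours and estimate the
    marker pose in the camera frame, mirroring the steps listed above."""
    _, binary = cv2.threshold(frame_gray, 100, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        approx = cv2.approxPolyDP(contour, 0.03 * cv2.arcLength(contour, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) > 500:
            img_pts = approx.reshape(4, 2).astype(np.float32)
            ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, img_pts,
                                          camera_matrix, dist_coeffs)
            if ok:
                return rvec, tvec  # rotation and translation of the marker
    return None
```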


Figure 16: A summary of the method for marker tracking and display of virtual models that ARToolkit uses for AR. Image from the ARToolkit Documentation page [13].

Finding Marker Coordinates

The size-known markers used by ARToolkit enable it to calculate a transformation matrix between the marker coordinate system and the camera coordinate system (see figure 17). This is done by first applying image processing to the acquired image. The image is thresholded, and then regions whose outline contour can be described by four line segments are found. The equations of these lines and the coordinates of the four vertices of the resulting square are stored for use in later processing. A visual representation of the results of these steps can be seen in figure 18.

Figure 17: The relationship between the marker coordinate system and the camera coordinate system in ARToolkit. Image from the work by Kato and BillingHurst [31].


Figure 18: The results of the different image processing techniques utilized by ARToolkit. Image from the work by Kato and BillingHurst [31].

The regions are then normalized, and the sub-image within each region is compared to the patterns given to the system in order to identify markers using template matching. The normalization process uses a perspective transformation represented by the following equation:

\begin{equation}
\begin{bmatrix} h x_c \\ h y_c \\ h \end{bmatrix} =
\begin{bmatrix} N_{11} & N_{12} & N_{13} \\ N_{21} & N_{22} & N_{23} \\ N_{31} & N_{32} & N_{33} \end{bmatrix}
\begin{bmatrix} X_m \\ Y_m \\ 1 \end{bmatrix}
\tag{4}
\end{equation}

Each variable in the transformation matrix is found by substituting the screen coordinates and marker coordinates of the detected marker's four vertices for (x_c, y_c) and (X_m, Y_m) respectively. Following this, normalization can be done using equation 4.
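Four vertex correspondences determine the 3x3 matrix N of equation 4 up to scale; a minimal sketch with assumed example coordinates, using OpenCV's solver:

```python
import cv2
import numpy as np

# Assumed example data: marker vertices (X_m, Y_m) in millimetres and their
# detected screen positions (x_c, y_c) in pixels.
marker_pts = np.float32([[-40, 40], [40, 40], [40, -40], [-40, -40]])
screen_pts = np.float32([[210, 118], [395, 125], [388, 310], [203, 301]])

# N plays the role of the matrix in equation (4).
N = cv2.getPerspectiveTransform(marker_pts, screen_pts)

# Projecting a marker point through N reproduces equation (4).
hx, hy, h = N @ np.array([0.0, 0.0, 1.0])
xc, yc = hx / h, hy / h
```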

When two parallel sides of a square marker are projected onto the acquired image, the equations of the corresponding lines in camera screen coordinates are:

\begin{equation}
a_1 x + b_1 y + c_1 = 0, \qquad a_2 x + b_2 y + c_2 = 0
\tag{5}
\end{equation}

The values of these parameters have already been obtained for each marker during the line-fitting process discussed earlier. Equation 6 describes the perspective projection matrix P that results from the camera calibration process. By substituting x_c and y_c from equation 6 for x and y in equation 5, the equations of the planes that these two sides lie on can be represented in the camera coordinate frame as equation 7.

\begin{equation}
P =
\begin{bmatrix}
P_{11} & P_{12} & P_{13} & 0 \\
0      & P_{22} & P_{23} & 0 \\
0      & 0      & 1      & 0 \\
0      & 0      & 0      & 1
\end{bmatrix},
\qquad
\begin{bmatrix} h x_c \\ h y_c \\ h \\ 1 \end{bmatrix} =
P \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
\tag{6}
\end{equation}

\begin{equation}
\begin{aligned}
a_1 P_{11} X_c + (a_1 P_{12} + b_1 P_{22}) Y_c + (a_1 P_{13} + b_1 P_{23} + c_1) Z_c &= 0 \\
a_2 P_{11} X_c + (a_2 P_{12} + b_2 P_{22}) Y_c + (a_2 P_{13} + b_2 P_{23} + c_2) Z_c &= 0
\end{aligned}
\tag{7}
\end{equation}

Defining the normal vectors of these planes as n1 and n2 respectively, the direction vector of each pair of parallel sides of the square is found with the outer product n1 × n2. The two unit direction vectors u1 and u2 obtained in this way, one for each pair of parallel sides, should be perpendicular. Unfortunately, the vectors will not be exactly perpendicular due to image processing errors.

In order to compensate for this, two perpendicular unit direction vectors v1 and v2 are defined in the plane that u1 and u2 lie in, as seen in figure 19.

Figure 19: The perpendicular vector pair v1 and v2 calculated from u1 and u2. Image from the work by Kato and BillingHurst [31].

Since the unit direction vector perpendicular to both v1 and v2 is v3, the rotation component V_{3x3} of the transformation matrix from marker coordinates to camera coordinates, as specified in the following equation, is [V1^t V2^t V3^t]:

\begin{equation}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} =
\begin{bmatrix}
V_{11} & V_{12} & V_{13} & W_x \\
V_{21} & V_{22} & V_{23} & W_y \\
V_{31} & V_{32} & V_{33} & W_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X_m \\ Y_m \\ Z_m \\ 1 \end{bmatrix} =
\begin{bmatrix} V_{3\times3} & W_{3\times1} \\ 0\;0\;0 & 1 \end{bmatrix}
\begin{bmatrix} X_m \\ Y_m \\ Z_m \\ 1 \end{bmatrix} =
T_{cm}
\begin{bmatrix} X_m \\ Y_m \\ Z_m \\ 1 \end{bmatrix}
\tag{8}
\end{equation}

Since V_{3x3} in the transformation matrix is now known, equations 8 and 6, together with the coordinates of the marker's four vertices in the marker coordinate frame and in the camera screen coordinate frame, yield eight equations that include the translation components W_x, W_y and W_z, whose values can be obtained from these equations. There may, however, be some errors in the transformation matrix obtained this way. These can be reduced by using the transformation matrix to transform the vertex coordinates of the marker from the marker coordinate frame to the camera screen coordinate frame. The transformation matrix is then optimized by driving the sum of the differences between the transformed coordinates and the coordinates measured from the image toward a minimum. Although the transformation matrix includes six independent variables, only the rotation components are optimized, as dealing with all six independent variables would incur too much computational cost. The translation components are then re-estimated using the method mentioned above. By iterating this process, the transformation matrix is found more accurately.
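A minimal sketch of the error term this iterative refinement drives toward a minimum, assuming T_cm (equation 8) and P (equation 6) are given as 4x4 NumPy arrays:

```python
import numpy as np

def reprojection_error(T_cm, P, marker_pts, image_pts):
    """Sum of squared differences between the marker vertices projected
    through T_cm and P and the vertices measured in the image.
    marker_pts: 4x3 marker-frame coordinates, image_pts: 4x2 screen points."""
    err = 0.0
    for X_m, xy in zip(marker_pts, image_pts):
        cam = T_cm @ np.append(X_m, 1.0)   # marker frame -> camera frame
        h = P @ cam                        # camera frame -> homogeneous screen
        proj = h[:2] / h[2]                # recovered (x_c, y_c)
        err += np.sum((proj - xy) ** 2)
    return err
```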

Limitations

The performance of ARToolkit depends on several factors, such as the environment lighting, the performance of the camera being used, and the distance to, rotation of, and size of the marker being used. Below are graphs showing the performance of ARToolkit's marker tracking at different distances:


Figure 20: Tracking range of ARToolkit depending on marker size. Image from the ARToolkit homepage [1].

Figure 21: Tracking error of ARToolkit depending on distance to and rotation of the marker. Image from the ARToolkit homepage [1].


Related Work

Immersion into a simulated robot cage environment is a subject that has been of interest for some time. As far back as 1995, Natonek et al. [25] presented a method for intuitively viewing the workspace of a simulated industrial robot as well as programming paths for the robot by manipulating its movement. The user was presented the VR world through a 3D HMD and controlled the view angle by rotating their head, while controlling movement with a 3D mouse, as can be seen in figure 22. The VR environment was created by constructing a model of the robot and then using a camera on top of the robot to recognize objects in the working space and reproduce them in the VR world.

Figure 22: The system set-up used by Natonek et al. [25]

Luciano et al. produced similar results shortly after [38], using the VFX-1 HMD for head tracking and for displaying the VR environment, which was built with Criterion Renderware 1.4. Using this set-up the user could view the robot cell in simulated operation from any angle and even follow parts in motion. The user could, however, not manipulate the robot, only their own viewing angle of the environment.

Newer HMDs have of course been developed since, such as the Oculus Rift [45]. This HMD is mainly developed for gaming purposes, but a developer's kit has been made available which ships with a prototype Oculus Rift HMD unit and access to the SDK, which exposes the head tracking information and the stereo video feeds [15]. Other HMDs will be mentioned with their related works in this paper as well.


Figure 23: A picture of the Oculus Rift HMD system.

VR in Robot Simulation

Work by Bick et al. [14] is of great relevance and presents a method for displaying a simulated VR version of a robotic production plant. The VR world was built using Vega, where all the 3D models are imported, animated and manipulated. The 3D models were created using ModelGen II, a 3D modeling program similar to CAD. This VR environment is presented to the user with an alternating stereoscopic image projector and corresponding crystal eyes 3D glasses, allowing the user to be immersed in the VR environment. A 3D mouse also enables the user to traverse the VR world in 6 DoF. The authors mention how this can be used to do a walkthrough for customers, allowing them to see the production line in action without the actual physical real-world environment being present. The authors also mention how the system's modularity enables different scenes for different production plants to be built up quickly and easily.

Simulation of robots in a 3D VR environment was also presented in the work by Kopasci [34], who enabled users to view a VR-replicated version of a Fanuc industrial robot in operation, as seen in figure 24. The VR environment is built using VirCA [27], where the models were generated using Google Sketchup, 3DSMAX and Solidworks and then exported to the Ogre format, which is the graphics engine of VirCA. The authors state that the movement of the robot in the VR environment is procured from a file generated by the real robot when in operation, but mention that a serial communication link directly from the robot controller to VirCA is under development.


Kovács also utilized the VirCA engine in his work on monitoring Incremental Sheet Forming (ISF) through a VR environment [35]. The solution provides a real-time, continuously animated representation of the real-life robots, work objects and working environment. The author also discusses that visualizing the VR environment in 3D with 3D glasses is a possibility.

Martinez et al. [41] developed a method for displaying a virtual 3D model of a robot used for testing biological specimens, together with a proposal for the software and hardware architecture of the system. As can be seen in the software system described in figure 25, Martinez et al. used Matlab and Simulink to produce the simulated movements and actions of the robot. These commands were then forwarded to the 3D environment, created in the open-source Blender software, via Python structs sent over UDP. The user would see and interact with the real-time environment on the host computer, while all real-time calculations and commands were done on the target computer, which communicated with both the robot controller and the host computer, as can be seen in figure 26. The result was a virtual 3D environment, viewed on a 2D screen, where the user could see the simulation of the robot and see if any kinematic or boundary errors occurred.
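The Simulink-to-Blender link can be illustrated with a small UDP sender; the field layout, address and port below are assumptions for illustration, not values given by Martinez et al.

```python
import socket
import struct

PACKET_FMT = "<6d"               # hypothetical payload: x, y, z, roll, pitch, yaw
TARGET = ("192.168.0.10", 5005)  # hypothetical address of the target computer

def send_pose(sock, x, y, z, roll, pitch, yaw):
    """Pack a pose into a binary struct and send it over UDP."""
    sock.sendto(struct.pack(PACKET_FMT, x, y, z, roll, pitch, yaw), TARGET)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_pose(sock, 0.40, 0.05, 0.30, 0.0, 1.57, 0.0)
```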

Figure 25: An overview of the software system used in Martinez et al.

Figure 26: The hardware setup used in Martinez et al.

Another popular, if somewhat outdated, method of displaying simulated robots via VR is the use of the Virtual Reality Modeling Language (VRML), a language used for the compact description of 3D models and movements [48], [59], [42], [10]. This is often used in conjunction with a web-based interface, as is the case with Qiu et al. [48], Zhu et al. [59] and Mohamad et al. [42]. In the work done by Zhu et al., a web interface to a simulation of the CINCINNATI industrial robot is provided to users. The model of the robot is made in NX Unigraphics and then converted to VRML, which is the same method used by Qiu et al. A Java applet incorporating EAI interfaces the user's web browser and the VRML content, enabling the user to manipulate the robot by typing angular and linear movements into the user interface (figure 27), as well as to watch the simulated animation.


Figure 27: The interface provided to users for the VR robot in the work by Zhu et al. [59]

VR in Robot Tele-Operation

In work presented by Yuan et al. [56], VR was used by operators to navigate a mobile repair robot. The user was given the task of navigating a real-life repair robot to the location of a damaged container and performing repairs using the visual information given to them by a simulated repair robot in a VR environment. The 3D environment was built using the EON virtual reality interaction software as an engine, while the 3D models in the environment were made using 3DSMAX and rendered using OpenGL and Visual C++. This VR environment was displayed in stereoscopic 3D via an HMD worn by the user, who was able to control the robot's movements using a data glove. An overview of the hardware structure of the system can be seen in figure 28. Every command executed by the simulated robot in the VR environment was also executed by the real-life robot, resulting in the user being able to successfully navigate the physical robot to the damaged container by interacting with the simulated robot in its VR environment. The authors do, however, mention that discrepancies can occur between the virtual environment and the real-life environment, since the VR environment is not updated. This leads to the user having to rely on camera feeds on-site and on the robot to correct for these errors. In related work by Ding et al., which is based on the same platform [19], a stereo vision camera system is also mounted on the physical robot. However, this system seems to be used mostly for object recognition and safety features such as obstacle avoidance, which keep the user from damaging the robot, and not for calibrating or refreshing the VR world.


Figure 28: An overview of the system developed by Yuan et al.

The work of both Tang et al. [53] and Tao et al. [44] describes a method for the tele-operation of a construction robot using virtual reality. As in the work of Yuan et al., the user operates a real-life physical robot by interacting with a simulated one in VR. By donning a '3DVisor' HMD, the user is immersed in the VR environment, which is rendered with OpenGL, and is able to manipulate the robot with two joysticks that incorporate force feedback. The VR world used in this work has the advantage of being based on the 3D readings of a 'DigiClops' stereo vision camera mounted on the physical robot, meaning the VR world is constructed as a reflection of the real world in real time, eliminating the discrepancies that Yuan et al. had. The VR environment for Tang et al. [53] consisted of the robot and its working area, which can be seen in figure 29. In the work by Tao et al. the VR environment is expanded to also provide a background of the working area, including realistic textures (figure 30).

Figure 29: An example of the view available to the operator in the system discussed in the work by Tang et al. [53]

Figure 30: The working area together with the VR environment background available in the work by Tao et al. [44]

Also centering on the platform of tele-operated construction robots, the work of Moon et al. instead incorporated the use of AR to aid the tele-operator [29]. The construction robot had a camera mounted on it, but in order to widen the field of view accessible to the operator, a virtual model of the construction site was displayed around the camera image, as can be seen in figure 31. The AR environment was created and managed using NAVERlib, which is built upon tools such as OpenGL Performer, the Virtual Reality Peripheral Network (VRPN) and the Digital Video Transport System (DVTS). By receiving the video from the camera, together with the angle and position of the camera, the correct view of the AR environment could be shown to the user, who could also change the view using a joystick.

Figure 31: The view displayed to the tele-operator, as provided by the AR system by Moon et al.

Another interesting implementation of virtual reality in the tele-operation of robots is the work of Mollet et al. [43]. The authors describe a way of tele-operating groups of physical robots using their simulated VR versions in a Collaborative Virtual Environment (CVE). The CVE provides an abstraction of the remote world, which standardizes and simplifies it to reduce the cognitive load on the user. The VR world is supported by Microsoft Robot Studio, and the users are immersed in it with 'ARVision' 3D HMDs. Head tracking with 'Polhemus' magnetic tracking enables the users' camera angles to be rotated by rotating their heads, but in order to move the users must use a joystick. Through the Groups Manager Interface (GMI) the users can manipulate and follow the robots in three different ways. The first method provides the highest level of abstraction, where users can give arbitrary commands such as 'move here' or 'follow robot nr x' while looking at a helicopter view or user-specified view of the virtual environment and its members. The second method allows the users to take control of a virtual robot directly, which gives control of the robot's movements directly to the user while still offering assisted visual information and abstraction in the VR world, as well as navigational assistance such as obstacle avoidance before the commands are sent to the physical robot. The third and lowest abstraction level allows users to see exactly what the real-world robots see through their camera feeds and to directly control the real-world robots' movements. In this view the users can receive visual aides through AR, such as the position of the robot on a mini-map or superimposed markers on the ground representing objectives, as seen in figure 32.


Figure 32: The HMD used in the work by Mollet et al. [43] (left) and an example of the AR view in the 3rd level of control (right).

VR and AR in Local Control

In an approach by Cerlinca et al. [18], an HRI was developed for the intuitive control of an industrial robot used for creating simple or composite Integrated Circuit (IC)-based prototypes. The interface, which can be seen as part of figure 33, used hand gestures as the method of communication; these were observed using a Microsoft Kinect, filtered using a 3D Kalman filter, and classified using Dynamic Time Warping (DTW) and k-Nearest Neighbor (kNN) algorithms. The gestures, together with a 3D mouse, were used to interact with a VR version of the robot's working environment, which consisted of the components repository, the IC boards repository and the working area. When the user had built the simulated VR version of the product, the information was sent to the robot controller via a Hardware Independent Interface (HII) and the physical product was produced.
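The DTW step named above can be sketched as the classic quadratic-time recurrence between two one-dimensional gesture trajectories; this is an illustration only, not Cerlinca et al.'s actual implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])           # local mismatch
            D[i, j] = cost + min(D[i - 1, j],         # insertion
                                 D[i, j - 1],         # deletion
                                 D[i - 1, j - 1])     # match
    return D[n, m]
```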

Figure 33: A system overview of the HRI interface discussed in the method by Cerlinca et al. [18]


VR and AR were implemented in an interface to an industrial robot developed by Marin et al. [39]. Users were presented with a 3D virtual model of the real robot, in which they could change the camera angle, as well as two real-life video feeds. AR was used to superimpose graphical aids onto the video streams, such as highlighting the position of the gripper. The user interacted using the mouse and keyboard, and the 3D environment was displayed on a 2D monitor. If the real-life robot was busy or unavailable, it was also possible for users to simply interact with a simulated industrial robot. The industrial robot was trained to recognize objects and to move them around given user commands to grip, move, and release.

AR can also by itself be utilized in manipulating and simulating industrial robots, as shown by Reinhart et al. [49] in their projection-based interface for industrial robots. By using a video beamer, a laser projector, a hand-held stylus and a motion tracker, it was possible for the user to interact with the robot in several ways. One possibility was to give the robot new target positions by pointing with the stylus. Another was to give commands with the stylus via a menu that was projected onto a flat area near the work object. The user could also scan the work object using the stylus (figure 34) and see a superimposed color height-map projected on the object (figure 35), as well as being able to include the object in a simulation for the robot. The user could view this simulation as an AR overlay on a video feed of the real working station, being notified of any spatial errors as seen in figure 36.

Figure 34: The user scanning and creating a digital model of the work object in the work by Reinhart et al. [49]

Figure 35: The color height-map projected onto the digital model of the work object in the work by Reinhart et al. [49]

Figure 36: The AR video of the simulation in the work by Reinhart et al. [49]

Akan et al. [8] demonstrate a multimodal interaction method for interacting with robots which incorporated both AR and VR. The users could issue commands using both speech and mouse events on the AR-generated visuals displayed on the screen. The results of the user's input could then be seen prior to execution, both as a simulation in the AR superimposed view and in a VR environment view. In accordance with the other tele-operating methods, the program was only executed if the simulation could be completed without errors. Whether the view of the VR world, which was presented on a 2D screen, could be changed was not mentioned.

Figure 37: Examples of the AR overlay on the video from the camera on the robot (left) and of the VR environment (right) in the method developed by Akan et al. [8]

In research presented by Wang et al. [54] a method for human-computer interaction using bare hands is proposed. This interface was used to simulate and evaluate assembly designs and consisted of an HMD and a web camera (see figure 38). The system used marker-based tracking to superimpose virtual 3D models of the assembly parts, which were rendered from LTS-exported CAD files using OpenGL. A finger tracking algorithm was then used to create a virtual tool for manipulation of the AR objects.

Figure 38: The hardware setup of the system used in the work by Wang et al. [54]

VR and AR in Robot Programming

In work by Lambrecht et al. [36] the authors introduce spatial programming of an industrial robot using AR. The goal was to be able to program paths and tasks for industrial robots using gesture recognition and positional tracking. The method uses OpenGL ES to create the AR objects that are displayed on a Samsung Galaxy S2. The same handheld device's camera is used to recognize 2D hand gestures from its video stream using OpenCV 2.4.0 Java Bindings. This data is combined with the 3D position of the hand acquired through a Kinect in order to define robot poses and paths. The resulting program is displayed as an AR overlay on the screen of the Galaxy S2 and can finally be executed on the real-life robot via commands in the GUI on the cellphone.

Figure 39: An example of the AR interface in the method by Lambrecht et al. [36]

Kawasaki et al. [32] present a method for teaching gripping tasks to a humanoid robot arm. The user interacts with a virtual reality environment that is a representation of the actual real-world system. 3D cameras convey the position of the actual real-world object to the VR environment, as well as the position of the real-world user's hand. The gripping gesture is recognized by a data glove that enables force feedback, allowing the user to feel the result of gripping the virtual object in the VR space. By interacting with the virtual environment the user is freed from both the communication lag between the real world and the VR world and the pressure of causing a system failure, as the commands are only sent to the real-world robot if they pass the virtual simulation constraints. By recognizing and segmenting the user's gestures, the robot was taught to move objects by splitting the action into move, approach, grasp, translate, place and release tasks.

Similarly, Aleotti et al. [9] developed a method for teaching pre-grasp pathing and gripping motion to a robot arm using VR. The learning data used to teach the robot was extracted from the VR representation of a real-world human grasp, instead of from the physical grasp itself. The grasping movement was replicated in VR using a `Cybertouch' data glove and rendered using OpenGL, and the position of the user's hand was tracked using a `FasTrack' 3D motion tracking device, so that combined they allowed the position and type of grasp to be determined. The authors mention how the ability to get information about the contact normals in a virtual grasp gives an advantage over studying real-world grasps. An example of real-world grasps and their VR counterparts can be seen in figure 40.


Figure 40: The translation of human grips to VR robot grips discussed by Aleotti et al. [9]

VR and AR in Personnel Training

Another popular use of VR is personnel training. Since training with the real machines, materials and tools can be expensive [21], it is of great interest to be able to train personnel with methods incorporating VR, where the costs for physical tools, maintenance and materials are negated. This has inspired works such as those by Crison et al. [21] and Rossman et al. [50].

Figure 41: An example of the AR overlay displayed to a user in the work by Boulanger et al. [16]

Boulanger et al. [16] propose a method for Mobile Collaborative Augmented Reality (MCAR) which enables users to utilize computer-based tools in training scenarios in the real world. With MCAR, users receive instructions and tips via AR during training exercises. 3D models of different relevant objects were made using OpenGL. Relevant objects in the real world were then tagged with square markers encoding 4x4 matrices of bits. The user would then wear an HMD with a camera on it. Using the video feed from the camera, it was possible to determine the relative position and orientation of the HMD to the real-life object through video processing. This relative position was then used to create a transformation matrix that oriented the 3D model to align with the real-world position of the object. Finally, the 3D model was superimposed onto the real-world view of the user through the HMD (see figure 41). The whole system, which can be seen in figure 42, resulted in users receiving visual tips and superimposed models on their real-world views, aiding them in training exercises.


Figure 42: An overview of the AR system used in the work by Boulanger et al. [16]

In the work by Crison et al. [21] the focus is put on the interface to the VR, which is an ad-hoc haptic interface controller mimicking the handle of the milling machine that the personnel are to be trained on. The user interacts with the VR environment, which is displayed as a 3D model of the milling tool and the milled material. Events in the simulated training exercise are fed back to the user through force feedback in the tool and messages on the screen that displays the world. Rossman et al. [50] use the basis of a robotic simulation VR environment to simulate forest environments, which can be used for vehicle training exercises and fire control training. The forests were built from externally acquired data such as satellite pictures and laser readings from forest machinery. They were then displayed using the VeroSim VR software, where OpenGL was used for rendering. The authors note how a 2D screen, mouse and keyboard are enough to traverse the VR environment, but how 3D effects and controls such as data gloves can be added to give a better feeling of immersion.

The Intended Contributions of This Work

In order to take the applications of VR and AR one step further in industry, this work aims to propose a VR-based tool that can measurably aid the growth of industrial robotics and production automation. To expand upon the type of work done by Lambrecht et al. [36], this tool should not only be a new concept but also provide an increase in performance in some aspect of the production automation process. The proposed solution described in this work intends to do this by reducing the programming time of industrial robots as well as increasing awareness and spatial understanding of the automation solution. This gives potential customers a more concrete example and demonstration of what they are being offered, as well as reducing the competency requirements for programming industrial robots.


Method

System Selection Process

In order to provide an immersive and reactive virtual world for the user, several design choices had to be made, including:

• How should the VR world be displayed to the user?

• How should the user provide input?

• How should head positional tracking be implemented?

In order to answer these questions, different possibilities were assessed and system components were chosen to give the best possible system performance within the project's budget.

Virtual World Display

The choice of how to represent the virtual world of the robot cell to the user was in essence a choice of which HMD to use, as alternative 3D full view display methods such as the Cave Automatic Virtual Environment (CAVE) were far too expensive for the scope of this project. In the end, the Oculus Rift was chosen for its numerous advantages over other HMDs mentioned in the background section, including:

• Relatively low price of 400 USD

• 90 and 110 degree horizontal and vertical FOV respectively

• Fast head orientation tracking (1000 Hz sampling rate)

• Orientation tracking of all roll, pitch and yaw

• An SDK included upon purchase that enabled access to the necessary components

Despite the downsides of the Oculus' screen, namely its low resolution and low pixel switching rate, it was decided that the advantages of the Oculus outweighed its disadvantages, and it was chosen over the other HMDs as the hardware platform for displaying the VR world to the user.

User Input

The goal of the HRI was for the user to be able to interact with and manipulate the robot in an intuitive way. That meant that the method for defining or redefining robot targets in the robot's program should feel as natural as possible. Humans use their hands in everyday interactions with all types of objects, whether it be to relocate, operate, or sense them. This means that most people naturally express where objects should be placed and how they should be oriented with their hands. It was for this reason that it was decided that the robot targets were to be defined using a hand-held tool, as seen in figure 43. The user could position and orient this tool however they would like the robot to later position and orient the tool center point. This way, the will of the user can be expressed as naturally as possible and forwarded to the robot with all the necessary information. This of course led to the task of choosing how to track the position and orientation of this tool.

Figure 43: The prototype of the hand-held tool used by the user to define robot targets. The tool comprised a pre-defined marker used for tracking, which was affixed to a vertical handle.

It was decided that any other input, such as menu option selection, would simply be done with a wireless air mouse. It was also deemed practical for the user to be able to use the keyboard to move around the virtual environment, so that positioning oneself in the virtual reality by walking in the real world was a possibility and not a restriction. This gave the system several advantages: the tracking area could be smaller than the virtual world, the user could quickly move to the general area of the robot cell they wished to explore and then walk around within it, and the user could interact with the robot entirely without leaving the computer if they so desired.
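
One way to combine keyboard movement with physical walking, under the assumption that the two are composed additively, is to treat the keyboard as a coarse offset of the tracked area inside the virtual cell and add the tracked head position on top of it. The following is a sketch of that idea, not a description of the final implementation.

// 3D position in the virtual robot cell, in metres.
struct Vec3 {
    double x, y, z;
};

// The virtual camera position is the keyboard-controlled offset of the tracked
// area plus the user's physical position inside that area, which lets the
// physical tracking area be much smaller than the virtual robot cell.
Vec3 virtualCameraPosition(const Vec3& keyboardOffset, const Vec3& trackedHeadPosition) {
    return { keyboardOffset.x + trackedHeadPosition.x,
             keyboardOffset.y + trackedHeadPosition.y,
             keyboardOffset.z + trackedHeadPosition.z };
}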

User Position Tracking

To solve the question of how to track the user's position, three options were considered, consisting of two predominantly different technologies.

The first option was to use a vision algorithm based on blob detection. The Oculus Rift would be fitted with a ball-shaped object; the algorithm would then compare the perceived size of the object with its known size at a known distance to get the current distance to, and position of, the object. This method had the advantage of being less computationally demanding than marker-based tracking, and the Oculus could be rotated in any way up to ∼160 degrees without the tracking being lost. However, as the distance between the camera and the user grew, the accuracy of the system would suffer greatly, as the perceived size of an object depending on the distance can be described as:

H = a/d (9)

where H is the apparent height, a is the actual height, and d is the distance to the object as a multiple of the original distance at which the actual height was measured. This is a typical y = k/x function, and a visual representation with a = 0.05 m can be seen in figure 44. From the figure it can be seen that the increase in actual distance corresponding to a 1 pixel decrease in the perceived size of the ball in the image escalates very quickly. This means the resolution of such an algorithm would deteriorate to an unusable level after short distances, making it an impractical choice.

Figure 44: A graph representing the perceived size of an object relative to its real size depending on the distance to the viewer.
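
To make the resolution argument concrete, the following sketch evaluates the pinhole relation behind equation 9 numerically: the apparent size in pixels of a ball of diameter a at distance d is s = f·a/d, so the distance change corresponding to a one-pixel decrease in s grows rapidly with d. The ball diameter of 0.05 m matches figure 44, while the focal length of 800 pixels is an assumed value used for illustration only.

#include <cstdio>

// Illustrative resolution analysis for blob-based distance estimation.
// Assumed values (not from the thesis): a ball of diameter 0.05 m and a
// camera with an effective focal length of 800 pixels.
int main() {
    const double a = 0.05;   // actual ball diameter in metres
    const double f = 800.0;  // focal length in pixels (assumption)

    for (double d = 0.5; d <= 4.0; d += 0.5) {
        const double s = f * a / d;              // apparent size in pixels at distance d
        const double dNext = f * a / (s - 1.0);  // distance once s shrinks by one pixel
        std::printf("d = %.1f m: size = %5.1f px, 1 px less -> d = %.2f m (+%.3f m)\n",
                    d, s, dNext, dNext - d);
    }
    return 0;
}

With these assumed numbers, a one-pixel change corresponds to roughly half a centimetre at 0.5 m but almost half a metre at 4 m, which illustrates why the blob-based approach was judged impractical at anything beyond short distances.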

The second option was marker-based tracking using ARToolkit. ARToolkit is popularly used for AR applications, superimposing virtual models at the position and orientation of real-world markers. In order to do this, ARToolkit must find the position and orientation of the marker, which was the component relevant to this project. ARToolkit uses an approximation algorithm to estimate the pose and position of the marker, providing millimetre resolution although not always millimetre precision. By examining the tables shown in figure 20 in the Background section concerning ARToolkit, it was deemed that, using a marker of sufficient size, it would be possible to use ARToolkit for tracking the user's head.
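
As an illustration of the marker tracking this choice relies on, the sketch below uses the classic ARToolkit C API to detect a known marker in a captured video frame and extract its pose relative to the camera as a 3x4 transformation matrix. It is a minimal sketch of the general ARToolkit workflow rather than the code used in this project; it assumes that the camera parameters have already been loaded with arInitCparam and the pattern with arLoadPatt during initialization, and the threshold and marker width values are placeholders.

#include <AR/ar.h>

// Pose of a tracked marker: a 3x4 matrix holding the rotation and translation
// of the marker relative to the camera, in millimetres.
struct MarkerPose {
    bool   found;
    double trans[3][4];
};

// Detect one known marker pattern in a captured video frame and estimate its
// pose. pattId is the id returned by arLoadPatt(), pattWidth the physical
// marker width in millimetres and thresh the binarization threshold (0-255).
MarkerPose trackMarker(ARUint8* frame, int pattId, double pattWidth, int thresh) {
    MarkerPose result = { false, { { 0.0 } } };

    ARMarkerInfo* markerInfo = nullptr;
    int markerNum = 0;
    if (arDetectMarker(frame, thresh, &markerInfo, &markerNum) < 0) {
        return result;  // detection failed for this frame
    }

    // Pick the detection of our pattern with the highest confidence value.
    int best = -1;
    for (int i = 0; i < markerNum; ++i) {
        if (markerInfo[i].id == pattId &&
            (best < 0 || markerInfo[i].cf > markerInfo[best].cf)) {
            best = i;
        }
    }
    if (best < 0) {
        return result;  // marker not visible in this frame
    }

    // Estimate the camera-to-marker transformation for the best detection.
    double center[2] = { 0.0, 0.0 };
    arGetTransMat(&markerInfo[best], center, pattWidth, result.trans);
    result.found = true;
    return result;
}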

The ability to calculate the marker's orientation also became an important functionality when it was decided that the user would point out robot targets using the orientation and position of a hand-held tool. It was decided that it would be best to use two separate cameras in a configuration as shown in figure 50 and figure 51. One camera would be used for head tracking and the other would be mounted on the Oculus and used for tracking of the hand-held tool. This was due to several reasons, one being the fact that the marker on the hand-held tool should preferably be as small as possible, and since the accuracy of the tracking program becomes worse with increased distance, the distance from the camera to the hand-held tool marker should be minimized. Another reason was that the head tracking camera's view of the marker can be blocked by the user, depending on where the camera used for head tracking is situated. However, humans tend to look at what they are working with, so a camera mounted on the Oculus is likely to keep the hand-held tool's marker in view.
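
With this two-camera configuration, the pose of the hand-held tool in the robot cell's coordinate frame has to be obtained by chaining the measured transforms: the head marker pose from the fixed camera and the tool marker pose from the Oculus-mounted camera, together with two fixed calibration offsets. The sketch below illustrates that composition with plain 4x4 homogeneous matrices; the frame names and the two calibration transforms are assumptions of this sketch, not a description of the final implementation.

#include <array>

// Row-major 4x4 homogeneous transform (rotation and translation).
using Mat4 = std::array<std::array<double, 4>, 4>;

// Compose two transforms expressed as parent-from-child matrices: result = a * b.
Mat4 compose(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// worldFromRoomCam:  calibration of the fixed head-tracking camera in the cell.
// roomCamFromHead:   head marker pose measured by the fixed camera.
// headFromOculusCam: fixed offset from the head marker to the Oculus-mounted camera.
// oculusCamFromTool: tool marker pose measured by the Oculus-mounted camera.
// The returned transform expresses the hand-held tool in world coordinates,
// which is the pose needed to define a robot target.
Mat4 toolInWorld(const Mat4& worldFromRoomCam,
                 const Mat4& roomCamFromHead,
                 const Mat4& headFromOculusCam,
                 const Mat4& oculusCamFromTool) {
    return compose(compose(compose(worldFromRoomCam, roomCamFromHead),
                           headFromOculusCam),
                   oculusCamFromTool);
}

The orientation part of the resulting transform is what lets the user specify not only where the tool center point should go but also how it should be oriented.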

