
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master thesis, 30 ECTS | Datateknik

2018 | LIU-IDA/LITH-EX-A--18/031--SE

Improving AR visualization with Kalman filtering and horizon-based orientation

To prevent boats to run aground at sea

Förbättring av AR-visualisering med Kalmanfiltrering och horisontbaserad orientering – för att förhindra båtar att gå på grund

Pontus Hero Ek

Supervisor: Jonas Wallgren
Examiner: Ola Leifler


The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Pontus Hero Ek


Abstract

This thesis researched the possibility of improving the compass of smartphones, as the earth's magnetic field is weak and easily disturbed, either by the environment or by technology. The compass is used in Augmented Reality (AR) when the AR visualization should correspond to a position on earth. The issue lies in oscillating input values to the compass that reduce the AR experience.

To improve the AR experience without the use of external equipment, this work both filtered the incoming values with a Kalman filter and determined the direction by capturing and image-processing an image containing a horizon. The Kalman filter achieved a reduction in incoming disturbances, and the horizon was matched against a panorama image generated from 3D data. The thesis starts off with the requirements and contents of AR and goes through the different approaches, which begin with a LAS point cloud and end in matching horizons with normalized cross-correlation.

This thesis furthermore measures performance and battery drainage of the built application on three different smartphones, each released roughly a year apart. Drift was also measured, as it is a common issue when there is no earth-referenced orientation, such as the magnetometer, to correct against. This showed that these methods can be used on the OnePlus 2, Samsung Galaxy S7, and Samsung Galaxy S8, that there is a steady performance and efficiency increase in each generation, and that ARCore causes less drift. Furthermore, this thesis shows the difference between a compass and a local orientation with an offset.

The application that was made focused on use at sea, but it was also tested on buildings with good results. The application also underwent usability tests, which showed that the applied functionalities improved the AR experience. The conclusion is that it is possible to improve the orientation of smartphones. The heading can still occasionally be wrong, which is why this thesis also presents two ways to indicate that the heading is off.


Acknowledgments

I would first like to thank my supervisor Jonas Wallgren at Linköping University. The feedback, answers, and proofreading improved this thesis and steered me in the right direction. I would also like to thank Ola Leifler and Tobias Larsson for their support during the thesis.

Finally, I must express my sincere appreciation for the help from Jonas Ekskog, who provided valuable input on the application and the thesis.


Contents

Abstract . . . iii
Acknowledgments . . . v
Contents . . . vi
List of Figures . . . viii
List of Tables . . . ix
1 Introduction . . . 1
1.1 Motivation . . . 1
1.2 Aim . . . 2
1.3 Research questions . . . 3
1.4 Delimitations . . . 3
2 Background . . . 5
2.1 The application . . . 5
2.2 Geographical data . . . 5
2.3 Data from geodata . . . 6
2.4 Combitech . . . 6
3 Theory . . . 7
3.1 Augmented reality . . . 7
3.2 Horizon-based orientation . . . 10
3.3 Filter . . . 13
3.4 Sensors . . . 15
3.5 Usability . . . 16
4 Method . . . 21
4.1 Pre-study . . . 21
4.2 Data . . . 21
4.3 Graphical design . . . 22
4.4 Implementation . . . 22
4.5 Measurable effects with the implementations . . . 28
4.6 Usability tests . . . 29
5 Results . . . 31
5.1 Pre-study . . . 31
5.2 Data . . . 31
5.3 Design . . . 31
5.4 Implementation . . . 32
5.6 Usability tests . . . 41
6 Discussion . . . 43
6.1 Results . . . 43
6.2 Method . . . 46
6.3 The work in a wider context . . . 48
7 Conclusion . . . 49
7.1 Future work . . . 50
Bibliography . . . 51
A Appendix A . . . 57
A.1 Kalman filter . . . 57
A.2 Quaternions . . . 59
B Appendix B . . . 61
B.1 Test plan . . . 61
B.2 Questionnaire . . . 62


List of Figures

3.1 The architecture of AR . . . 8

3.2 Reality-Virtuality Continuum . . . 9

3.3 Example of a histogram from Otsu’s threshold method . . . 11

3.4 How prediction and correction works . . . 14

3.5 Safety classification and its subattributes . . . 17

3.6 Subjective satisfaction classification and its subattributes . . . 18

4.1 Extraction process from 3D data to a panorama image . . . 24

4.2 Rotations of the smartphone . . . 25

4.3 Kalman filter . . . 26

4.4 Understanding ARCore with its local perception . . . 27

5.1 View in Unity . . . 32

5.2 View from smartphone with a rendered horizon . . . 32

5.3 Original image . . . 33

5.4 Image after Otsu’s method . . . 33

5.5 Generated horizon curve . . . 34

5.6 Overlaid horizon detection . . . 34

5.7 Generated panorama from mesh . . . 34

5.8 More results from rooftop . . . 35

5.9 Results from boat images . . . 35

5.10 Panorama where the boat images were taken . . . 35

5.11 Image where the generated horizon was too high . . . 36

5.12 Image where the generated horizon was too low . . . 36

5.13 Overlaid horizon detection on buildings . . . 36

5.14 Panorama for house image 3 . . . 37

5.15 Kalman filter result . . . 37

5.16 How the filter handles disturbances . . . 38

5.17 Kalman filter on a low-pass filter input . . . 38

5.18 World orientation with vessel rotation and local orientations . . . 39


List of Tables

3.1 Average kernel filter . . . 10

3.2 A 1x5 kernel . . . 12

3.3 Edge detection kernel . . . 12

5.1 Frames per second for the application with different smartphones . . . 39

5.2 Allocated time for the horizon detection on Samsung Galaxy S7 with different resolutions . . . 40

5.3 Allocated time for the horizon detection on the three different smartphones with a height of 120 . . . 40

5.4 Battery consumption for different smartphones . . . 40

5.5 Average answer from the participants . . . 41


1 Introduction

This chapter presents the motivation and aim of this thesis, lists the research questions that will be answered to reach that aim, and comments on the restrictions of the work in the delimitations section.

1.1 Motivation

There is an ongoing breakthrough [43] in technologies that illustrate information. Two of these technologies are augmented reality (AR) and virtual reality (VR). Their new way of visualizing information may be a way to handle the ever-increasing amount and variety of information that is available. AR can be used in many industries and has recently been used mostly for prototyping [43]. AR is commonly thought of as an assistant that visualizes information in real time, making it easier to do something correctly. A fascinating idea is to use AR to save lives [47].

The idea of displaying a real-time augmented experience is not a recent discovery. However, as cell phones have improved in performance and increased the number and quality of their sensors, a new area of usage for AR has bloomed. Using AR in smartphones is quite new, as it only started to make sense when smartphones with cameras and high computational power arrived in 2007, such as the iPhone [23]. Smartphones have been improving in performance at a high rate, but is performance still improving that quickly today?

In the last couple of years there has been a decrease in demand for smartphone performance. The rate of smartphone improvement may receive an additional boost when mobile virtual reality, artificial intelligence, and 4K screens [52] have had their breakthrough. Some argue that we are in that breakthrough right now.

Combitech answers this by investing in AR-related projects, and a problem that has occurred in these projects is that the magnetometer is easily disturbed. This leads to a suffering AR experience where the AR visualization elements are displaced and stutter. It occurs when a correct heading of the smartphone is required, since a position in the smartphone needs to correlate to the positions of objects in the real world.

Combitech has previously collaborated with Thomas Porathe, which is why it was natural to follow his work [42] to explore AR in the marine environment. His work is based in Norway, which has a high boat density and sharply varying seafloor depths that lead to many accidents.

The statistics at sea show that there were around 32 deaths per year from 2007 to 2016 [6]. The same report shows that approximately 20% of the accidents with a deadly outcome come from boats running aground. Furthermore, statistics from Redningsselskapet, Norway's rescue association at sea, show that leisure boat accidents have increased in recent years: there were 249 leisure boat accidents in 2015, followed by 261 in 2016 and 330 in 2017 [5]. One cause for this was illustrated by a survey by Gjensidige, a Norwegian insurance company, which noted that 40% of leisure boat drivers at sea in a section in the middle of Norway could not understand nautical charts [45].

Therefore, AR at sea is a great area to experiment with for the purpose of reducing accidents. This thesis focuses on improving the placement of an AR view that illustrates a red no-go area [42] when the phone is directed at water shallower than three meters. The issue is that a smartphone's orientation is poor because its magnetometer is not accurate enough, which results in fluctuations of the AR visualization elements. The inaccuracy stems from the earth's magnetic field being weak, so the magnetometer is easily disturbed. The magnetometer gives poor results if, for instance, the smartphone is near equipment that disturbs the signal, such as a boat hull or a phone case with a small magnet to keep it closed.

This displacement of the no-go area reduces the user's trust in the application, as the disturbance causes the AR visualization on the screen to be offset. The problem with the magnetometer appears when better precision of the location is required, meaning that world coordinates in latitude and longitude must correspond to the same area in the phone. Other common AR applications therefore do not suffer from this problem, as their AR view does not correspond to a specific location.

1.2 Aim

This thesis aims to research and select an appropriate approach to increase the accuracy of AR. The research is meant to contribute to the field by further improving the accuracy of AR on a smartphone, particularly when it is hard to receive correct sensor input. Two main approaches will be evaluated separately, and the integration of the approaches will be evaluated on the smartphone's perceived orientation with a usability test.

There have been other studies in the AR field that estimate the orientation of the smartphone for AR applications. One of the most promising recent approaches to horizon-based orientation is from a joint study by NASA and Stanford University [8]. Their approach is believed to obtain a more correct horizon, but there is no indication of how long the horizon detection took. The difference in conditions further reduced the inclination to use their method, as they used a slow-moving vehicle with an independent graphics processing unit (GPU). There is no information about which GPU was used, so the processing resources were assumed not to be available on a handheld device.

Another approach to improving an AR visualization is to filter the orientation input from the sensors in a smartphone. The Kalman filter has been proposed in earlier studies to reduce the impact of disturbances in signals. A previous article tested how a Kalman filter estimated three rotations [61]. The Kalman filter in that article only followed a step function and reacted similarly to a low-pass filter, but it still illustrates a useful scenario of using a Kalman filter to lower the impact of disturbances on rotations.

This aim can specifically be traced back to a paper released in 2012 by Michael Gervautz and Dieter Schmalstieg [21]. When the potential of AR applications is discussed, it is observed that:

The most significant technical challenge on mobile devices is measuring the camera’s pose (that is, its position and orientation) in relation to relevant objects in the environment.

The more specific aim is to improve the accuracy of the application, and it has been decided that two main approaches will be researched:

• Improve the accuracy by the use of horizon-based orientation: extract the horizon with image processing techniques and match it against a horizon rendered from 3D data.

• Filter input data that seems faulty.

The thesis further includes a test to see the changes caused by the implemented algorithms. This test will compare the accuracy of the original approach with the modified approach in this thesis. The test will use a visualization of the mesh generated from the point cloud from Lantmäteriet [30], the Swedish authority for property formation and geographical information. The mesh illustrates the surface of the environment and is overlaid on the camera view to verify that the information about the current location is correct.

Furthermore, battery consumption, frames per second, and the time for the horizon matching will be recorded in order to see the cost of the implementations. The outcome of this thesis is an application that achieves high accuracy with the help of the approaches listed above. The added functionality will also be evaluated from a usability standpoint on how the application can more easily be trusted.

1.3 Research questions

This thesis answers four research questions:

1. Can horizon-based orientation be realized so that the generated horizon matches against a 3D mesh with the correct heading? If the heading from the matching is incorrect, how can the application and the user know?

2. How does the heading of the smartphone’s magnetometer compare to a heading that is altered with a correctly applied filter?

3. How will the implementations affect the battery life and the CPU usage of the smartphone?

4. How will the application, with all approaches integrated, compare in usability against the common orientation vector with magnetometer in Android Studio?

1.4 Delimitations

This thesis only looks at a standalone smartphone. No other equipment will be involved, such as a solution with an external compass.


2 Background

This chapter gives a background to the application and information regarding the basic elements of this thesis.

This thesis focuses on how to improve the accuracy when the AR in a smartphone is rendered at sea. The application is motivated by an increasing number of accidents at sea and by the potential to improve a smartphone's orientation.

2.1 The application

The application is a prototype of how a solution that uses AR could look in order to reduce casualties at sea. The focus of this thesis is specifically on how to improve the accuracy of the smartphone's rotation compared to the real world. The idea behind the application originates from Thomas Porathe's article 3-D Nautical Charts and Safe Navigation [42], which presents how the cognitive workload would be reduced with a navigation system that can display red no-go areas in real time.

This thesis will improve the visualization of the red no-go areas that mark where the sea is shallower than three meters. The augmented image on the smartphone should display the bottom of the sea correctly; how correctness is evaluated can be read in the method chapter.

2.1.1 Requirements

The application has a target group that will use it in situations where it is important that users feel secure and can trust the application. The correctness of the visualizations is therefore important for the sense of security it gives the user.

2.2 Geographical data

The coordinate system used is SWEREF 99 TM, together with the height system RH 2000, both from Lantmäteriet [30]. They are Sweden's official reference systems for the plane and for heights, respectively. The data has an accuracy of 0.1 m on a hard-surfaced area, and readings show that the average standard error in height on a hard-surfaced area is 0.05 m. The standard error in the plane is on average 0.25 m; however, the height accuracy of a single laser point can differ more if the terrain is inclined.

2.3 Data from geodata

The data from Lantmäteriet came as a LAS dataset, which had to be converted to the OBJ file format so that it could be accessed by Unity. LAS is a data format that contains a point cloud and is an industry-standard binary format for storing airborne lidar data [59]. Lidar stands for Light Detection and Ranging and is a technique for measuring the distance to the surface of a distant object [7]. Lidar measures the distance by illuminating the object with a light source and measuring the time between the emission and the detection of the reflection. Lidar is an optical remote sensing technique that uses the same type of approach as radar.
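The time-of-flight principle above can be expressed as a one-line calculation. This is a generic sketch of the physics, not code from the thesis: the pulse travels to the surface and back, so the one-way distance is half the round-trip time multiplied by the speed of light.

```python
# Lidar time-of-flight sketch: distance = c * t / 2 for a round trip.
SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum

def lidar_distance(round_trip_seconds):
    """One-way distance to the reflecting surface."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A reflection detected 2 microseconds after emission corresponds to
# a surface roughly 300 m away.
```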

2.4 Combitech

Combitech is a technical consulting firm that operates in Scandinavia and intends to be at the forefront of recent technologies. It therefore has a department called Reality Labs that researches and experiments with new technologies. The short projects experimenting with new technologies aim both to test the technology and to attract customers with it. AR is a very hot topic in this section, and they have often worked with AR in the last year.


3 Theory

A horizon in this thesis is found by first transforming an image to grayscale and blurring it. Afterwards, every pixel in the image is segmented into either black or white, where the black pixels are the ground and the white pixels are the sky. The horizon is then proposed to be where the white pixels meet the black pixels, starting from the top. The chapter is divided into five sections: augmented reality, horizon-based orientation, filter, sensors, and usability.
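The top-down search for the sky-to-ground transition can be sketched per image column. This is a minimal illustration of the idea, assuming a boolean image where True marks sky (white) and False marks ground (black); it is not the thesis implementation.

```python
import numpy as np

def horizon_curve(binary):
    """For each column, scan from the top for the first ground pixel;
    its row index is the proposed horizon height for that column.
    Columns with no ground pixel get the image height as a sentinel."""
    h, w = binary.shape
    curve = []
    for x in range(w):
        ground_rows = np.flatnonzero(~binary[:, x])  # rows that are ground
        curve.append(int(ground_rows[0]) if ground_rows.size else h)
    return curve
```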

3.1 Augmented reality

In previous decades augmented reality was something from science fiction, but it has now become known worldwide. Its popularity may be a result of science fiction enthusiasts. However, AR has some great features, such as visualization. With the help of a great visualization tool, the future looks bright for visualization in education [50], medicine [47], and industry [43].

Augmented reality was discussed decades ago and got its definition in 1994 [36]. The first prototype on a hand-held device comes from an article in 2003 [57]. That study showed that portable AR was possible prior to smartphones; it used a Hewlett-Packard Pocket PC and a 320x240 camera to create a road map application with a 3D compass that displayed where to go. This can be said to be the first mobile augmented reality (MAR), which is defined as augmented reality on mobile devices where you can access the AR experience anywhere you are [39]. MAR has done very well because smartphones have good location awareness thanks to their sensors, for instance the accelerometer, gyroscope, and GPS.

According to Singh and Singh, AR is defined as a view of the real, physical world that includes additional information that enhances the view [51]. The augmentations are the extra data incorporated in the view. The information can be of any data type, such as text, a 3D object, or video.

The AR architecture meant for the smartphone is illustrated in Figure 3.1. It originates from [51] but has been modified by the author.

First there is an observed reality: a scene is captured by a reality sensor (camera). The image is then passed into a trigger matcher (pattern) together with metadata such as geographical identification. The metadata is then passed along to an augmentation selector.


Figure 3.1: The architecture of AR

This augmentation selector also obtains any additional data that is required and creates the augmented image that should be output. Lastly, the augmented image and the original image are combined into the presented reality. In some contexts this procedure is simplified into: sensors to processing, and processing to display.
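The data flow of Figure 3.1 can be sketched as a small pipeline. All function names and data shapes below are illustrative stand-ins invented for this sketch, not from any real AR framework:

```python
# Hypothetical sketch of the AR pipeline: reality sensor -> trigger
# matcher -> augmentation selector -> combined presented reality.

def capture(scene):
    # Reality sensor (camera): the frame plus metadata such as location.
    return {"pixels": scene, "metadata": {"location": "59.3N,18.1E"}}

def match_trigger(frame):
    # Trigger matcher: pair the frame with geographic identification.
    return frame["metadata"]["location"]

def select_augmentation(trigger, extra_data):
    # Augmentation selector: fetch the overlay matching the trigger.
    return extra_data.get(trigger)

def present_reality(scene, extra_data):
    frame = capture(scene)
    overlay = select_augmentation(match_trigger(frame), extra_data)
    return {"image": frame["pixels"], "overlay": overlay}

view = present_reality("camera frame", {"59.3N,18.1E": "no-go area"})
```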

This architecture is simply the foundation that is required to present an AR view. What is even more important is an AR experience that is to one’s satisfaction. To achieve an adequate AR view, there are five characteristics that can be evaluated.

1. Firstly, there is a demand for high-quality sensors that interact with the environment; the sensors involved can therefore be looked into further.

2. Trigger matching and image augmentation are required for the scene to be understood, in order to apply the correct augmentation for the environment. This is often done with image processing (IP).

3. User interaction is also an important aspect, as there are several additions that can improve the AR experience further, for instance eye tracking, gestures, speech, and touch.

4. The fourth way is to improve the information infrastructure, such as using cloud services to obtain data.

5. AR needs high computational power and a great communication infrastructure to bind everything together.

One large factor in why AR is not pursued harder is the requirements placed on the environment that the AR should run in. For example, brightness can often worsen the AR experience because known features are not found. A user who gets a bad AR experience will not return to the service [39].

AR can be used and implemented on smartphones. The processing power of a smartphone (iPhone 6) is 120,000,000 times that of the best single computer that sent people to the moon and back [44]. The Samsung Galaxy S7, which was used in this thesis, has even better processing power than the iPhone 6 [15], giving access to a powerful device with a camera that can be used for AR. The AR view is applied directly to the camera view, enhancing it. There are many AR applications available on the market, where replacing facial features and showing a 3D model of an object are the most common. However, it is hard to distinguish what counts as AR, which is why it is described below.

AR can be split into two cases: see-through AR and monitor-based AR [36]. See-through displays interact with the real environment directly, with extensions applied on the environment. The problem see-through displays encounter is that they require accurate head tracking with great calibration and an acceptable field of vision. An example is the Microsoft HoloLens, a head-mounted see-through display that can be seen as a pair of bigger glasses that fit around the head.

The other case is monitor-based AR, where the visualization of the augmented image is overlaid on a video that is either real-time or prerecorded. This case is the focus of this thesis, as it involves a cell phone with a camera directed at the sea to visualize the shallows.

In Figure 3.2 one can see Milgram's Reality-Virtuality Continuum [36]. The figure illustrates how AR and VR are related by being some distance apart in the spectrum. The virtual environment is where VR is positioned and consists exclusively of computer-generated virtual objects. This computer-generated environment can be either a realistic surrounding or an abstract environment where physical laws no longer operate [39].

The real environment to the left and the virtual environment to the right are the opposites that are mentioned. The continuum displays the difference between AR and VR.

Figure 3.2: Reality-Virtuality Continuum

3.1.1 Complications of AR

There are some hindrances keeping AR from really breaking through. A list from Singh and Singh [51] shows that there are technical aspects to evaluate and other obstacles that need to be overcome. To fully experience an AR view, a see-through head-mounted display (HMD) is required. Integrating an HMD is hard, as illustrated by Google Glass, introduced in 2013 [38]. This technology was thought to be before its time in its features.

The obstacles that follow are not specifically directed at Google Glass; however, it is the most well-known HMD meant for commercial use at its release. The obstacles blocking the breakthrough of AR are:

1. A stigma that it looks weird and nerdy to go around with an HMD.

2. The HMD is too expensive.

3. It is hard to have the HMD on at times.

4. Some people experience simulation sickness [28], often coupled with rapid movements in the augmented image.


5. "Changing behavior is much more challenging than changing technology" [10], as commented by the chief executive and founder of a design company involved in designing wearable devices.

Some of these hindrances will diminish faster than others as technology continues to expand, which the majority of the U.S. population believes it will [53]. Time and continuous research will lead to smartphones and their components becoming sophisticated enough to deal with these hindrances. The hardest hindrance to reduce is simulation sickness, which requires eye tracking and haptic feedback such as vibrations: eye tracking helps the eyes focus on a particular point, and minor vibrations simulate physical experiences.

3.2 Horizon-based orientation

This section covers the extraction of the horizon in the image and the matching of it to the horizon generated from the 3D environment. The section is divided into two parts: image processing and matching.

3.2.1 Image processing

This part covers the theory of the Image Processing (IP) techniques that were used. Several methods were involved in extracting the horizon from the two images.

Preprocessing

Before using an image for processing, some preprocessing steps yield a better end result. The first step is translating the colored image to grayscale. The standard formula used to translate a colored pixel, consisting of red, green, and blue, to a gray pixel is [33]:

gray = 0.299 · red + 0.587 · green + 0.114 · blue

The other preprocessing step was to filter the image with an averaging kernel, the matrix shown in Table 3.1. Applying this kernel gives a blurring effect on the image [29], which reduces the impact of noise such as a bird or a tree.

For each pixel in the image, the kernel is placed over the pixel and multiplied with the pixel's adjacent pixels in order to obtain the targeted pixel's averaged value. In other words, the pixel at (100,100) gets an averaged value that takes into account all pixels between (99-101, 99-101).
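A minimal sketch of these two preprocessing steps, grayscale conversion with the weights above followed by a 3×3 averaging blur, assuming a NumPy RGB array (this is an illustration, not the thesis code, and it leaves border pixels untouched for simplicity):

```python
import numpy as np

def to_grayscale(rgb):
    """Luma conversion: gray = 0.299*R + 0.587*G + 0.114*B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def average_blur(gray):
    """3x3 averaging kernel (all weights 1/9): each interior pixel
    becomes the mean of itself and its eight neighbours."""
    out = gray.astype(float).copy()
    h, w = gray.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = gray[y - 1:y + 2, x - 1:x + 2].mean()
    return out
```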

Table 3.1: Average kernel filter

1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

Otsu's method

After the preprocessing was done on the image, the next step was to segment the image into black and white. For this, Otsu's threshold method was used [40]. Otsu's method builds a histogram: a diagram with the color values 0 to L on the x axis and the number of pixels with each color value on the y axis. L is often represented by 255 in the Color32 format. The purpose of Otsu's method is to choose a threshold value between two clusters of color values.

The theory behind Otsu's method is the following. Let the total number of pixels be $N = n_0 + n_1 + \dots + n_L$, where $n_i$ is the number of pixels with value $i$. The histogram can be regarded as a probability distribution calculated as:

$$p_i = \frac{n_i}{N} \qquad (3.1)$$

The whole probability distribution of (3.1) is normalized, satisfying:

$$\sum_{i=0}^{L} p_i = 1 \qquad (3.2)$$

The threshold $T$ that Otsu's method generates divides the pixels into two classes, $C_0$ and $C_1$. $C_0$ contains the pixels with a gray level in $[0, T]$, while $C_1$ contains the pixels with a gray level in $[T+1, L]$. With these two classes, one can calculate their mean values:

$$\bar{u}_0 = \sum_{i=0}^{T} \frac{i\, p_i}{w_0}, \qquad \bar{u}_1 = \sum_{i=T+1}^{L} \frac{i\, p_i}{w_1} \qquad (3.3)$$

where $w_0 = \sum_{i=0}^{T} p_i$ and $w_1 = \sum_{i=T+1}^{L} p_i$.

With these equations formulated, one can calculate the variance between the classes:

$$\sigma_b^2 = w_0(\bar{u}_0 - \bar{u}_T)^2 + w_1(\bar{u}_1 - \bar{u}_T)^2 \qquad (3.4)$$

where $\bar{u}_T$ is the total mean. The best threshold $T$ in $[0, L]$ is the one that maximizes the between-class variance $\sigma_b^2$ [25].
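A small sketch of Otsu's method following equations (3.1)–(3.4). It uses the algebraically equivalent form of the between-class variance, $\sigma_b^2 = w_0 w_1 (\bar{u}_0 - \bar{u}_1)^2$; this is an illustration, not the thesis implementation:

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Pick the threshold T in [0, L] that maximizes the
    between-class variance of equation (3.4)."""
    hist = np.bincount(gray.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()          # equation (3.1); sums to 1 as in (3.2)
    i = np.arange(levels)
    best_t, best_var = 0, -1.0
    for t in range(levels - 1):
        w0, w1 = p[:t + 1].sum(), p[t + 1:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty; variance undefined
        u0 = (i[:t + 1] * p[:t + 1]).sum() / w0   # class means, (3.3)
        u1 = (i[t + 1:] * p[t + 1:]).sum() / w1
        var = w0 * w1 * (u0 - u1) ** 2            # equivalent form of (3.4)
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```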

A histogram generated from the pixel values in the application can be seen in Figure 3.3, indicating that Otsu's method would select a value around 150 as the threshold with the highest between-class variance, whereas the lowest variance is found around 0 and 255.

Figure 3.3: Example of a histogram from Otsu’s threshold method

Extracting the curve

A kernel can also be used to analyze patterns, in this case to detect a pattern in an image, though kernels have several uses in other areas [48]. To understand a pattern is to understand the data's relations, regularities and/or structure. The pattern is the foundation for making predictions on where the pattern is supposed to be situated. An advantage of kernels is that they do not require coordinates to function, as they look at adjacent pixels. Another advantage is that kernel methods can be applied to a wide range of data. However, one should bear in mind that larger kernels increase the computation cost [31]. The kernel in Table 3.2 can be used to generate a match if the two pixels above and the two pixels below the searched pixel have opposite values; a match is generated if the outcome of the kernel is zero.

Table 3.2: A 1x5 kernel
1  1  0  -1  -1

An edge detection kernel was also used in this thesis, shown in Table 3.3.

Table 3.3: Edge detection kernel
-1 -1 -1
-1  8 -1
-1 -1 -1
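A minimal sketch of applying the 1x5 kernel of Table 3.2 down one image column (Python with illustrative names; the 3x3 kernel of Table 3.3 is applied analogously in two dimensions):

```python
def kernel_response(column, row, kernel=(1, 1, 0, -1, -1)):
    """Vertical response of the 1x5 kernel from Table 3.2 at column[row].
    Valid for rows with two pixels above and two below; with binarized
    values the response is zero in uniform regions and largest in
    magnitude where the pixels above differ from the pixels below."""
    half = len(kernel) // 2
    return sum(k * column[row + i - half] for i, k in enumerate(kernel))

sky_to_sea = [1, 1, 1, 0, 0, 0]   # binarized column: sky = 1, sea = 0
uniform = [1, 1, 1, 1, 1, 1]
print(kernel_response(sky_to_sea, 2))  # at the transition: 2
print(kernel_response(uniform, 2))     # uniform region: 0
```

Scanning each column for the row with the strongest response yields one horizon candidate per column, which together form the horizon curve.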

3.2.2 Matching with cross-correlation

The horizon curve that is extracted from the image is to be matched with the true horizon generated from the geographical data. This is required in order to know the heading of the smartphone. The matching of the two horizons is done with normalized cross-correlation (NCC). Cross-correlation provides a measure of similarity between two signals, and by normalizing the cross-correlation the similarity value falls in [-1, 1], where 1 corresponds to exactly the same curve and -1 to the exact opposite. The exact opposite of a curve is analogous to the exact opposite of a line between two points: a line that increases by one in height when it increases by one in width has as its exact opposite a line that decreases by one in height when it increases by one in width.

The formula for obtaining the cross-correlation in 1D is altered from [32]:

$$\sum_{i=0}^{N-1} x_i y_i$$

N is the length of the patterns x and y that are matched, and a pattern's value at point i is the height of the pixel. This correlation yields a similarity between two patterns, but the raw value is unbounded and therefore hard to interpret. To obtain a standardized value, the correlation is normalized with:

$$\frac{\sum_{i=0}^{N-1} x_i y_i}{\sqrt{\left(\sum_{i=0}^{N-1} x_i^2\right)\left(\sum_{i=0}^{N-1} y_i^2\right)}}$$

The final equation (3.5) that was used describes the case when pattern y is longer than pattern x:

$$\frac{\sum_{j=0}^{M-N+1} \sum_{i=0}^{N-1} x_i y_{i+j}}{\sqrt{\sum_{j=0}^{M-N+1} \sum_{i=0}^{N-1} x_i^2 \sum_{i=0}^{N-1} y_{i+j}^2}} \quad (3.5)$$

M is the length of pattern y, and M − N + 1 is the length difference plus one, so that the sum over j is evaluated at least once.
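A sketch of the matching with per-offset normalization (a common variant; equation (3.5) normalizes over all offsets at once), in Python with illustrative names:

```python
import math

def ncc_best_offset(x, y):
    """Slide the short pattern x over the longer pattern y and return the
    offset and score of the best normalized cross-correlation match,
    where the score lies in [-1, 1]."""
    n, m = len(x), len(y)
    ex = math.sqrt(sum(v * v for v in x))
    best_j, best_score = 0, -2.0
    for j in range(m - n + 1):
        window = y[j:j + n]
        ey = math.sqrt(sum(v * v for v in window))
        if ex == 0.0 or ey == 0.0:
            continue
        score = sum(a * b for a, b in zip(x, window)) / (ex * ey)
        if score > best_score:
            best_j, best_score = j, score
    return best_j, best_score

x = [1.0, 2.0, 3.0]
y = [5.0, 5.0, 1.0, 2.0, 3.0, 5.0]
# x matches y exactly at offset 2, giving a score close to 1.0
```

In the thesis setting, x would be the extracted horizon curve and y the panorama curve; the best offset translates directly into a heading angle.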



NCC can also be used directly on a 2D image, where it measures the correlation at each pixel. However, correlating a whole 2D image is computationally costly [24], and the mesh data that the image is matched against is not a 360-degree video of the real environment with textures.

3.3 Filter

This thesis used two types of filters, the Kalman filter and a low-pass filter. These two filters are illustrated and explained below.

3.3.1 Kalman filter

The Kalman filter was introduced in 1960 [26] and has since been one of the most used filters for predicting a forthcoming state. The Kalman filter algorithm consists of a loop that sequentially predicts the next state and, after the measurements have been updated, produces an output estimate. The Kalman filter has the advantage that it can be used live because of its small computational cost.

The Kalman filter handles uncertain noise with an approach similar to a Bayesian optimum solution [61]. Kalman filters are often seen in navigation systems and in computer vision, making them a great candidate for this thesis. The goal of the Kalman filter is to produce an output signal that is as close to the ground truth as possible. The input to the Kalman filter in this thesis is the heading rotation in degrees, one of the three rotations of a quaternion, which can be read about further in 3.2.3.

A general definition of the Kalman filter is explained together with Figure 3.4, where a read value is marked by a black cross. Each reading has an uncertainty, which is not displayed in the figure. The blue cross is the prediction of the next state, where each prediction is based on the previous value. It is possible to predict several states into the future; however, the uncertainty would then increase considerably for each step.

After the prediction is made, the measurement is obtained, and with these two values a correction is made. The corrected value is selected on the red line between the predicted and measured value. If the predicted value is close to the measurement, the uncertainty will decrease. Where on the red line the corrected value is selected is decided by the Kalman gain. The Kalman gain can be a constant or a variable that changes over time and decides which value to trust the most between the measurement and the prediction. The Kalman gain is a number between 0 and 1 which weights the distance between the measured and predicted value.
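The predict/correct loop described above can be sketched in one dimension (Python rather than the thesis's C#; the noise constants q and r are illustrative, not the thesis's tuning):

```python
def kalman_1d(measurements, q=0.01, r=0.5):
    """Minimal 1D Kalman filter with a constant-state model.
    q (process noise) and r (measurement noise) are illustrative values."""
    x = measurements[0]   # state estimate
    p = 1.0               # estimate uncertainty
    out = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + q
        # Correct: the gain in [0, 1] weights prediction against measurement.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        out.append(x)
    return out
```

With a noisy heading sequence such as [10.0, 10.4, 9.6, 10.2, 9.8] degrees, the output stays within the measured range but varies less, which is the smoothing effect sought for the AR heading.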

The mathematical foundation for the Kalman filter can be read about further in Appendix A.


Figure 3.4: How prediction and correction works

3.3.2 Low-pass filter

A low-pass filter allows signals of low frequency to pass through without much modification, while higher frequencies are altered significantly [22].

This thesis focused on change over time; therefore the filter took the previous value into consideration and altered the new value based on it. The formula for the low-pass filter is:

$$i_t = 0.75 \cdot i_t + 0.25 \cdot i_{t-1}$$

where i is the input from the sensors and t is the time step.
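Interpreting the previous value as the previously filtered output (one reading of the formula above; the text is ambiguous on this point), the filter can be sketched as:

```python
def low_pass(samples, alpha=0.75):
    """First-order low-pass filter: out_t = alpha * in_t + (1 - alpha) * out_{t-1}.
    The previous value is taken to be the previously filtered value."""
    out = [samples[0]]
    for s in samples[1:]:
        out.append(alpha * s + (1 - alpha) * out[-1])
    return out

print(low_pass([0, 0, 100, 100]))  # a step input is smoothed: [0, 0.0, 75.0, 93.75]
```

The step response shows the trade-off: a sudden 100-degree jump in sensor input reaches only 75 percent of its final value in one step, damping oscillations at the cost of a small lag.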

3.3.3 Quaternions

The quaternion number system was used often throughout the thesis and was primarily important to understand when there was a transformation around the y axis (pitch value) of the image. Rotations of 3D objects are done with quaternions. Furthermore, a quaternion is used in the filter when the data from the gyroscope, accelerometer, and magnetometer is input.

A quaternion is similar to a complex number except that a quaternion has three imaginary terms. The first term represents a scale (a in q) and the three imaginary terms represent a vector (b, c, and d in q) [56].

q=a+bi+cj+dk

The vector part is the one that the filter focused on, as the gyroscope, accelerometer, and magnetometer data use quaternions for rotation in the world.

There are two other popular ways to handle rotations in 3D: Euler angles [56] and 3x3 rotation matrices [56]. Euler angles have a problem with gimbal lock but use little memory, storing only 3 numbers. Gimbal lock arises when two rotation axes align with each other; the remaining rotation axis is then locked to rotate only about itself and cannot always produce the true angle [19]. A 3x3 rotation matrix stores 9 numbers but does not have the issue of gimbal lock. Lastly, a quaternion stores 4 numbers and does not suffer from gimbal lock, making it a balance between the two.
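A minimal sketch of quaternion rotation (Hamilton convention, plain Python rather than Unity's Quaternion type; all names are illustrative):

```python
import math

def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q via q * (0, v) * q^-1."""
    conj = (q[0], -q[1], -q[2], -q[3])
    w, x, y, z = q_mul(q_mul(q, (0.0,) + tuple(v)), conj)
    return (x, y, z)

# A 90 degree rotation about the z axis turns the x axis into the y axis.
half = math.radians(90.0) / 2.0
q = (math.cos(half), 0.0, 0.0, math.sin(half))
```

Because the rotation is encoded as cos and sin of half the angle, composing two rotations is a single quaternion product, with no gimbal lock at any orientation.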



3.4 Sensors

A smartphone has four important sensors that can be used in AR: the gyroscope, accelerometer, magnetometer, and the global positioning system, where the first three belong to the microelectromechanical system (MEMS) group [37]. A MEMS often consists of mechanical parts; however, not every MEMS has moving parts. A MEMS can be a few millimeters in size but can also be much smaller. The purpose of a MEMS is to observe something and react to it [37]. The reaction can for instance be to update a value at a fixed frequency or perform an operation if the observed value exceeds a specific limit.

3.4.1 Gyroscope

The gyroscope measures the smartphone's rotation around the three axes x, y, and z, from which an orientation is obtained quickly but with considerable drift. Optical gyroscopes, which use lasers in a loop of fiber optics, also exist [49].

3.4.2 Accelerometer

An accelerometer measures an object's change in velocity over time. Everyday life consists of many accelerations; therefore an accelerometer has a wide range of uses. The orientation derived from it is slow but more precise compared to the gyroscope. An accelerometer fused with a gyroscope is a good approach to measure the local orientation of the device.

The measurement of acceleration in an accelerometer can be understood by imagining a weight attached to a string from the ceiling of an airplane [41]. The weight will indicate the acceleration depending on its position, meaning that if the weight is exposed to a negative acceleration, it will move toward the front of the airplane. From the position of the weight, an angle can be obtained from the offset from its resting position. The angle can then be used to calculate the acceleration of the plane. Accelerometers in smartphones are instead designed to act on vibrations that occur with accelerations [18]. These vibrations are read by microscopic crystals, which in turn induce a voltage used for the measurements.

3.4.3 Magnetometer

A magnetometer measures the strength and direction of magnetic fields. Magnetometers have many daily uses that many are unaware of; the simplest magnetometer is known as a compass, which finds north based on a magnetic field [55]. For a smartphone to obtain a local direction toward magnetic north, it has three magnetometers, fixed perpendicular to each other.

3.4.4 Global Positioning System

GPS is formed by three components: the space segment (satellites), the user segment (receivers), and the control segment (control stations). The receiver obtains signal information from the satellites and applies trilateration [27].

A signal contains three types of information: a pseudorandom code, ephemeris data, and almanac data [46]. The pseudorandom code is the identity of the satellite that transmitted the information. The ephemeris data contain the satellite's health status and a timestamp. The almanac data contain the orbital information for the satellite and the other satellites in the system.

The receiver can then calculate the time it takes for a satellite's signal to travel to the receiver with the help of the signal information. Multiplying this time by the speed of light gives the distance, and with the help of two more satellites, trilateration can be done. Trilateration uses the signals from three satellites to identify the location of the phone [46]. However, an error of just one-thousandth of a second in the clock of the receiver will result in a position error of 320 kilometres. This error can be corrected by adding a fourth GPS signal, which results in a polygon whose center is where the receiver is.
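As an illustrative planar sketch (real GPS solves in 3D with a clock-bias term; the names and constants here are assumptions), subtracting the circle equations of three range measurements reduces trilateration to a 2x2 linear system:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def trilaterate_2d(anchors, distances):
    """Solve for (x, y) from three anchor points and measured distances by
    subtracting circle equations, a simplified planar stand-in for GPS
    trilateration."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

# A one-millisecond clock error alone translates into roughly 300 km of range.
print(round(C * 0.001 / 1000))
```

With anchors at (0, 0), (10, 0), (0, 10) and distances to the point (3, 4), the solver recovers (3, 4); the clock-error print illustrates why the fourth satellite is needed to estimate the receiver clock bias.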

3.5 Usability

AR risks showing too much information, which can overwhelm the user and reduce the efficiency of AR [51]. This can become a problem if several augmentations take place and the user cannot differentiate what is important and what is not. Furthermore, the purpose of this application is to help people at sea; therefore extra thought is needed on how to make it user-friendly. The main concern is how to transition easily to an executing state where the image processing is performed without disrupting the user's experience.

3.5.1 Requirements for mobile augmented reality

A user study on AR in shopping centres [39] identified crucial requirements on MAR devices. The requirements from that study relevant to this thesis are presented below.

Privacy was a major concern, and users wanted explicit, user-controlled restrictions on how information is handled. The thought of having information stolen was worrying, as the information reveals what the person has actually looked at. Not only the visual information must be secured, but also the location data with all of its metadata. There is furthermore an unease regarding the increased risk to the physical safety of users who are highly immersed in the AR environment, for instance colliding with nearby objects or falling overboard. Therefore, the service has to be aware of and reactive toward the environment, which leads to an augmented image that does not contain too much information. An additional requirement was that the content had to feel useful for the user, which is related to not displaying too much information.

The most important requirement for this thesis was the reliability of the information. People desired that the information be up-to-date and valid, i.e. trustworthy. An important aspect is to have a trustworthy source that can assure the user of its credibility.

3.5.2 Usability attributes

The scenario that this thesis focuses on is when the user picks up the smartphone and looks around the environment. The application has to be trusted by the user, which motivates a usability evaluation; one is therefore performed in chapter 5. The theory behind the evaluation is based on the article "Usability: Critical Analysis and Taxonomy" [2].

Usability is often characterized with imprecise and overly ambiguous terms, leading to different evaluations depending on the person [2]. Therefore, when usability is mentioned, a proper structuring of the term is required. The article's definition of usability is divided into six attributes: knowability, operability, efficiency, robustness, safety, and subjective satisfaction. These six attributes are the foundation on which the application is evaluated, where the last two were regarded as extra important in the implementation and therefore receive more focus. The attributes and their subattributes are:

Knowability

Knowability is defined as how the user can understand, learn, and remember how to use the application. Its four subattributes are: clarity, consistency, memorability, and helpfulness.



Operability

Operability is defined as the system's capacity to provide users with the necessary functionalities. Its four subattributes are: completeness, precision, universality, and flexibility.

Efficiency

Efficiency is defined as the system's ability to generate results compared to the resources that were used. Its four subattributes are: efficiency in human effort, efficiency in task execution time, efficiency in tied-up resources, and efficiency in economic costs.

Robustness

Robustness is defined as the ability to resist error and unfavourable situations. Its four subattributes are: robustness to internal error, robustness to improper use, robustness to third-party abuse, and robustness to environment problems.

Safety

Safety is defined as the system's capacity to prevent risk and damage. Its three subattributes are: user safety, environment safety, and third-party safety. Considering that the thesis focuses extra on this attribute, it is described further, and its subattributes can be seen in Figure 3.5. The figure is drawn by the author but the design is from [2].

Figure 3.5: Safety classification and its subattributes

• User safety, the ability to avoid risk and damage to the user while the system is in use. User safety can be divided further into: physical safety, legal safeguarding, confidentiality, and safety of user assets. These avoidance attributes are the same for third-party safety.


• Environmental safety, the ability to avoid risk and damage to the environment when used.

• Third-party safety, the ability to avoid risk and damage to other people while the system is used.

Subjective satisfaction

Subjective satisfaction is the system's ability to produce satisfaction in the user. Its two subattributes are: interest and aesthetics. A more extended illustration of all subattributes can be seen in Figure 3.6. The figure is drawn by the author but the design is from [2].

Figure 3.6: Subjective satisfaction classification and its subattributes

• Interest, the ability to catch and maintain the user's awareness and intellectual curiosity.

• Aesthetics, the ability to entertain the user in sensational terms; there are five subattributes that the sensation can be divided into.

3.5.3 Related work

No work was found in this area with an approach similar to this thesis. However, several parts have been picked from different articles.

There were many possible approaches to choose from for extracting the horizon, where most were used in unmanned aerial vehicles (UAVs) to stabilize and estimate the altitude of the UAV. One article detected the horizon with the use of pseudo-spectra images (PSI) [58] with acceptable results, but its purpose was to illustrate that analyzing an image's PSI works for identifying specific entities.

Another article from NASA in collaboration with Stanford University [8] both extracted a horizon and matched it with 3D data. The article focused on navigating on other planets with the use of the horizon in extreme environments where information and signals are restricted. The article showed great results, but there is no indication of how long the horizon detection took. The difference in conditions further reduced the motivation to use their method, as they used a slow-moving vehicle without any listed performance requirements.

The method that was picked came from an article named An Improved Algorithm for Horizon Detection Based on OTSU [25], which found a horizon with the same method that this thesis used. The method in the article was used as a foundation for a UAV's altitude estimation and stabilization and proved to be efficient. The difference is that this thesis also used sensors to improve the method further, which made it possible to rotate the image so the horizon was always parallel to the image. Furthermore, with the help of the x rotation, the sensor input could indicate whether the horizon was too low, in which case the previously generated pixels were used for the horizon curve instead. This was done to reduce glares caused by the sun reflecting on the sea.

This thesis differs from other popular AR experiences, such as those that place models that interact with the real world. This kind of AR visualization is growing fast, and a study by Eric Marchand et al. presents the underlying technique for pose estimation [34]. The study shows how the pose of the smartphone is estimated in relation to the environment and how to keep track of where an object is positioned. ARCore uses this kind of pose estimation to position objects locally. However, in this thesis ARCore is only used for its local orientation and not for placing objects locally.

Furthermore, this thesis differs from articles that detect a straight horizon line, as done for instance in [20], which presents five different methods to detect such a straight line in a marine environment. Such detection is often done so that the unit taking the image can stabilize itself and estimate its own altitude. Instead, this thesis generates a curve showing the contours of the horizon in order to match it against 3D data. A similar horizon line can be obtained with the use of sensors, which was done in this thesis.

There have also been previous studies of the Kalman filter; for example, in [61] the Kalman filter was applied to estimate three rotations. The Kalman filter in that article only followed a step function and reacted similarly to a low-pass filter, but it still illustrates a useful scenario for the Kalman filter. Another article showed that a Kalman filter can be used to estimate altitude change during a fall [54]. That article applied the Kalman filter to two different pressures and used the maximum of the difference to predict whether a fall occurred. The conclusion was that a Kalman filter was more responsive than a moving-average filter.

The matching was done with normalized cross-correlation, which is a common method for checking whether signals are similar, and no related work was found that uses it in a notably different way. Furthermore, some parts had no articles explaining the process, for instance how to make a panorama image from a 360-degree view.


4 Method

This chapter is divided into six sections: pre-study, data, design, implementation, usability tests, and measurable effects of the implementations. These six sections cover the six phases of this thesis and are presented in the order in which they occurred.

4.1 Pre-study

The pre-study comprised a literature study to gather the resources that were required. The gathering was performed in a structured way to find as many relevant resources as possible. The pre-study was carried out in EBSCO's [17] research database together with Google Scholar. EBSCO was the main site for research, and Google Scholar was used to find popular references in the Augmented Reality area.

The pre-study also took the requirements in the background chapter into consideration during the literature study. The requirements can be summarized as high visualization accuracy on a smartphone comparable to a Samsung S7.

Both the filtering of the input and the extraction of the horizon had the highest priority. The approaches from the pre-study that were determined to be best suited for this thesis were researched further. During the research, some articles were regularly referenced and these were looked into further. These articles mostly involved definitions and new ideas that were tested for the first time, and they were used as a foundation for the AR research in this thesis.

4.2 Data

The geographical data from Lantmäteriet came in squares with UTM coordinates, each with a side of 2.5 km. The data came as LAS files of 300 megabytes per square and was later converted to the .OBJ format in CloudCompare [13]. The file size was further reduced to 20 megabytes by subsampling the mesh to use vertices spaced approximately at least seven meters apart. The original data was very precise: without subsampling, the mesh allocated 2 gigabytes per square. Testing showed that for the mesh to work in Unity with the correct transparent material, the limit was somewhere between 80 and 40 megabytes. Furthermore, after the subsampling, some outlier vertices in the mesh remained and were removed in Meshlab [35].

The coordinate system had to be altered so that the center would be at the origin in Unity. Therefore a global shift was introduced in the mesh for it to be translated correctly in Unity. As a result, the mesh was centered and one unit length in Unity corresponded to one meter in the mesh. The center of the mesh had to be translated and known; therefore the UTM coordinates for the center of the mesh were transformed to latitude and longitude to correspond to the GPS input on the smartphone.

4.3 Graphical design

The graphical design concerns the prototype application that is the end product. The design was developed based on the theory, the pre-study, and the requirements. The final result of this thesis is a prototype; therefore guidance on designing prototypes was taken from [9]. The article raises two important questions that were taken into consideration during the implementation in this thesis.

• What is the essence of the existing user experience?

• What are essential factors that the design should preserve?

These questions are appropriate for a prototype that provides a proof of concept, realizing a theory into something practical. The first question was easy to work with, as there was a clear requirement for a better user experience where users could trust the application.

The second question is answered by the fact that the smartphone is the only component required for the application. Furthermore, usability should be highly considered when developing applications that use AR. Therefore some principles should be taken into account, as presented in the section of the theory chapter regarding the requirements for mobile augmented reality.

4.4 Implementation

The implementation can be divided into three smaller sections, one for each task that was evaluated. The subsections are ordered so that the first approach is the one that was started first and estimated to take the longest time. Most implementations were done in C# in Unity; if something was implemented elsewhere, it is stated in the text.

4.4.1 Extracting and matching the horizon

This subsection covers the extraction and matching of the horizon. To more easily understand the steps below, every image can be represented by a matrix where every pixel corresponds to a row and column. This section answers the first research question of whether horizon-based orientation can be realized so that the generated horizon can be matched with a correct heading.

A method chart of how the process works is given in the list below:

1. Render an image of the screen without augmentations.

2. Rotate the image so the horizon is parallel to the phone, based on the gyroscope and accelerometer.

3. Observe the x rotation for a cutoff height that the horizon cannot be under.

4. Extract the horizon curve from the image.

5. Generate the panorama image and its horizon curve based on the coordinates of the smartphone.

6. Match the two curves.

Preprocessing

First, the image was transformed to gray-scale because it is faster to write and read a single pixel value compared to the three values of a color pixel (red, green, and blue) [29]. Furthermore, the Otsu method presented below is easier to calculate in gray-scale.

After the conversion to gray-scale, an averaging filter was used, which has the effect of blurring the image [29]. The idea of a mean filter is to take the average of the adjacent pixels in order to reduce noise and obtain smoother transitions of the color values.
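A minimal sketch of such an averaging (box) filter, assuming the image is a 2D list of gray values (the thesis implementation used Unity's Texture2D in C#):

```python
def mean_filter(gray, k=1):
    """Box (averaging) blur on a 2D gray-scale image: each pixel becomes
    the mean of the (2k+1) x (2k+1) window around it, with the window
    clamped at the image borders."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rows = range(max(0, r - k), min(h, r + k + 1))
            cols = range(max(0, c - k), min(w, c + k + 1))
            vals = [gray[i][j] for i in rows for j in cols]
            out[r][c] = sum(vals) / len(vals)
    return out
```

A single bright pixel in a dark 3x3 image is spread out by the blur, which is exactly the noise suppression wanted before thresholding: isolated glints on the water no longer survive as hard edges.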

Otsu’s method

The gray-scaled image was segmented into either black or white depending on whether the pixel value was below or above the threshold value. When the decision to implement Otsu was taken, the plan was to use Bitmaps from the Bitmap class in System.Drawing. That decision was based on the numerous examples online and on it being a simple and effective way to extract pixel data, as the image is seen as a matrix of pixels. However, the Bitmap class was only available for Windows and not for Android, so Texture2D, a class in the UnityEngine, was used instead.

The threshold value from Otsu's method was calculated by searching the histogram from both sides until the largest cluster from the right and the largest cluster from the left were found. The middle of each cluster determines a mean point, and the two mean points are added together and divided by two to find the threshold with the highest variance.

Extracting the curve

The first step was to retrieve two types of images. The first image to be fetched was the real image of the environment. This image was to be matched against a mesh depicting the environment; therefore, it was necessary to obtain a panorama image from the mesh. This was done by extracting a small rectangle in the middle of the 360-degree view to avoid scaling issues. The rectangle was then added onto a wider rectangle that represented the panorama image, which can be seen in Figure 4.1. After each rectangle was extracted, the camera in the 360-degree view was rotated and another rectangle was added to the panorama image.

The panorama image was further extended by the width that the viewing angle of the image represented. This is required so the cross-correlation can also find a match in that space. The panorama image was also set so that the first pixel represented the true north direction on the map. Scaling was an important aspect of the cross-correlation; therefore much thought was put into the scaling of the image and the panorama.

The actual extraction of the curve was done with the help of a 1x5 kernel. There was a risk of extracting something else, so the kernel was used to reduce the noise [31]. The increase in computation cost compared to a smaller kernel was negligible, as the image is small.

Unfortunately, there was no actual test at sea for the application. Instead, the test was simulated on the computer using images taken for this purpose by someone on a boat. The images had location data from where they were taken, which made the simulation work the same way. The latitude and longitude of an image were used to place the camera at those coordinates, which then depicted the mesh's surroundings.

For the rotation and the filter, an updating variable that fused the gyroscope and accelerometer rotation vector was retrieved for use of the y rotation. The other components of the rotation vector were also put to use when extracting the horizon.


Figure 4.1: Extraction process from 3D data to a panorama image

In Figure 4.2 there are four images that illustrate the rotation of the smartphone. The first image in the upper left is the starting image, and to the right of it, the z rotation has been altered. The lower left image is where the x rotation has been altered, and the lower right image displays an adjustment of the y rotation.

The x rotation was used as a check that the extracted horizon was not under a certain line extracted from the rotation vector. This was thought useful to reduce the impact in images where the reflection of the sun was intense. If a generated pixel was under the estimated line from the rotation vector, the previous pixel value in the array was used instead. The z rotation was used to rotate every image so the horizon would be approximately parallel and in the middle of the screen at all times. The rotation further helps to better fit the panorama image.

There was also a test of how the application would perform on buildings. For this, a smaller block of 3D data was used to increase the accuracy of the data. Furthermore, the near limit for generating the 3D mesh was altered from 100 to 5 meters.

The curve of the horizon in the panorama image was easy to generate, as the image was generated by the application and the mesh had a different color value compared to the background. Therefore, it was only necessary to go through the image column-wise until a change in value was noticed. The pixel where the change in value occurred was then stored in an array, which defined the horizon for the panorama image. Both curves are stored in arrays where each cell holds the height value of the curve.
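The column-wise scan can be sketched as follows (assuming the rendered panorama is a 2D list of pixel values with a uniform background color; names are illustrative):

```python
def panorama_horizon(image, background):
    """Return, for each column, the row index of the first pixel whose
    value differs from the background color, i.e. the top of the mesh
    silhouette. Columns with no mesh get the image height."""
    h, w = len(image), len(image[0])
    curve = []
    for c in range(w):
        row = h
        for r in range(h):
            if image[r][c] != background:
                row = r
                break
        curve.append(row)
    return curve
```

The resulting array of heights is the panorama curve that is slid against the extracted image curve during the normalized cross-correlation matching.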

Matching with normalized cross-correlation

When both curves were each stored in an array, matching was done between the arrays to identify where the curves had the highest similarity. After normalization, the matching value ranged from -1 to 1 and measures the similarity between the curves. This value is used in the context of usability, for instance to display the certainty of the outcome.

Edge detection

Figure 4.2: Rotations of the smartphone

The edge detection kernel was applied to the image right after Otsu's method. The resulting pixel would be black if the current pixel multiplied by eight, minus the adjacent pixels, was higher than Otsu's threshold divided by 12. This edge detection was done as a test of how it would look and of how the approach could be improved. Using Otsu's threshold divided by 12 had no scientific reasoning; it was used instead of 0 because a threshold of 0 generated many more edges.

4.4.2 Filtering sensor values

The input from the sensors on the smartphone was filtered. Two types of filters were used: a simple low-pass filter and a Kalman filter with a heuristic modified for this problem. This part of the method answers the second research question of how the orientation from a magnetometer compares to a filtered orientation.

Kalman filter

This section describes how the readings from the gyroscope, accelerometer, and magnetometer were filtered. The sensor data arrives as a fused quaternion from the Android framework, updated every 0.06 s. The Kalman filter was required to take in a quaternion containing the readings of the magnetometer, gyroscope, and accelerometer.

A more detailed view of the Kalman filter can be seen in Figure 4.3 and is described below.

The Kalman filter approach has an initial state in Next state where it obtains the readings from the sensor. These readings are taken to the Prediction state where a prediction is made based on the uncertainty of the reading.

Afterwards, the prediction for the model is applied. The prediction is based on the previous state's derivative; for instance, a position can be predicted from the initial position, the velocity, and the time spent in the initial state. The more confident the model is in the next state, the smaller the predicted covariance becomes. The covariance matrix is updated by multiplying the previous covariance matrix with the state-transition matrix.

Then the difference between the predicted value and the measurement is calculated, which is the distance between the predicted center point and the actual measurement. The result is the prediction error, and the Kalman gain is what remains for the correction.

Figure 4.3: Kalman filter

The Kalman gain decides where on the line between the measured and the predicted point the corrected value will be. The Kalman gain K lies between 0 and 1 and decides which of the two points is most believable. If the measurements are thought to be exact, K will be small, and if we believe more in the prediction, the Kalman gain increases.

In the application, the filter observed the difference between the rotation in the quaternion that used the magnetometer and the one that did not. If there was a difference between them, the Kalman gain would decrease.

The Kalman gain in Figure 4.3 is 0.66, as the output estimate is closer to the predicted value, i.e., we believe more in the prediction. The Kalman gain is used to correct the state and obtain the output estimate. Then the next iteration of the loop starts; however, the Next state no longer performs any readings.
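In one dimension, the correction step with this gain convention can be sketched as follows (Python; the numeric values are illustrative, not measurements from the thesis):

```python
def correct(measured, predicted, gain):
    """Pick a point on the line between measurement and prediction.
    With the convention above, gain = 0 trusts the measurement fully
    and gain = 1 trusts the prediction fully."""
    return measured + gain * (predicted - measured)

# With K = 0.66 the estimate lands closer to the prediction,
# as in the Figure 4.3 example.
print(round(correct(10.0, 13.0, 0.66), 2))  # -> 11.98
```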

When the filter was implemented, there was one large factor that made it not perform as well as intended. If there was a disturbance of the magnetometer, the Kalman filter would still conform to the disturbance, which led to a disturbance anyway, even if a slightly smaller one. The resulting effect was similar to a low-pass filter, but achieved in a more hazardous way.

There was one specific addition to the Kalman filter that achieved the still effect, which was to alter the Kalman gain in another way: the gain was quickly reduced to 0 if the readings did not match. If the difference between the previous and the current reading of the rotation vector without the magnetometer was low, the smartphone was assumed not to have been moved. The real rotation vector should then mimic this behaviour, so a check can be made whether its difference is also low. If the check is true, the incoming rotation should be replaced by the previous rotation. However, in order to avoid a large displacement, linear interpolation between the previous and the current rotation can be used, where the previous rotation is highly prioritized.
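The stillness check and interpolation can be sketched in one dimension as follows (a Python sketch with illustrative threshold and weight values; the real implementation works on quaternions):

```python
def stabilize(prev_heading, new_heading, gyro_delta,
              still_eps=0.5, weight_prev=0.9):
    """If the magnetometer-free rotation barely changed, the phone is
    assumed to be still, so the (possibly disturbed) new heading is
    pulled toward the previous one by linear interpolation with the
    previous rotation highly prioritized."""
    if abs(gyro_delta) < still_eps:
        return weight_prev * prev_heading + (1 - weight_prev) * new_heading
    return new_heading

# Phone held still (gyro delta ~0) while a magnet pushes the compass
# reading 20 degrees away: the output stays close to 90.
print(round(stabilize(90.0, 110.0, 0.1), 1))  # -> 92.0
```

When the phone really rotates (large gyro delta), the new reading is passed through unchanged, so genuine movement is not suppressed.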

The first implementation of the Kalman filter was stand-alone and acted only on the rotation that involved the magnetometer. Afterwards, when the Kalman filter was integrated with the rest of the application, the rotation quaternion that used the magnetometer was used until the horizon had been matched to the environment. Thereafter, the rotation with the Kalman filter was used to check whether the rotation based only on the gyroscope and accelerometer had a value close to its own.

The testing of the filter was divided into two parts. The first was to check how the filter behaved for several orientations of the smartphone. The second was to see how it handled disturbances, in this case a magnet oscillating near the smartphone. During the course of this thesis, the built-in self-calibration on the smartphone was performed twice, to prevent the magnet disturbance test from confusing the sensors. The self-calibration can be accessed by dialing *#0*# in the call keypad and then navigating to Sensor, and further to Gyro Selftest and Selftest under Magnetic Sensor.


Low-pass filter

The low-pass filter was applied directly after the readings were received on the smartphone. It was implemented with a filter coefficient α of 0.3, which is a common choice when using the orientation sensors in smartphones [12]. This was done to reduce the noise that the magnetometer causes. The filtering was done directly, as the current reading depends on the earlier ones, and doing it in Android Studio was expected to give better performance.
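A sketch of such a low-pass filter with α = 0.3 (Python for illustration; in the application the same recurrence would be applied to each sensor axis):

```python
def low_pass(prev, current, alpha=0.3):
    """Exponential smoothing: the output moves only a fraction alpha
    toward each new reading, suppressing high-frequency noise."""
    return prev + alpha * (current - prev)

# A single noisy spike is damped rather than passed straight through:
out = 0.0
for reading in [0.0, 0.0, 10.0, 0.0, 0.0]:
    out = low_pass(out, reading)
print(round(out, 3))  # -> 1.47
```

Because each output depends on the previous one, the filter must be applied in reading order, which is why it was done directly when the values arrived.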

4.4.3 Stabilizing the view with ARCore

ARCore is a platform for building AR applications on Android [3]. It tracks and understands the world with a local position and rotation based on images and sensors. ARCore was used to compare how its orientation reinforced that a heading was correct and how much drift the sensors had. The drift of the accelerometer and gyroscope was compared against ARCore in order to see whether this could be a solution, so that a new matching would not be needed as often.

ARCore was used together with the rotation of the heading that the earlier positions provided. This implementation was justified by the need to decrease drift in the camera, which is an issue with gyroscopes [16].

This drift is caused by the integration of the signal from the gyroscope. Gyroscopes give readings at a high frequency and output angular velocity; therefore, in order to get the angle, the signal is integrated. However, this integration turns the noise in the signal into drift. This can be further explained by integrating a cosine signal: the integral is a sine divided by the frequency, which results in a smooth signal.

Figure 4.4 illustrates how ARCore can be used to improve the experience. If the local orientation can be perceived in combination with the heading of the speed vector, there is another source that supplies the orientation of the smartphone.

Figure 4.4: Understanding ARCore with its local perception

For the two coordinate systems to align, there has to be some sort of trigger. This thesis used a button that the user presses when the smartphone is pointed in the same direction that the boat is heading. This was tested by cycling with the smartphone in hand to obtain the results in the result section.
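The alignment at the button press can be sketched as storing a fixed yaw offset between the two coordinate systems at that moment (Python; the names and degree convention are illustrative, not from the thesis code):

```python
def make_aligner(boat_heading, arcore_yaw_at_press):
    """At the button press the phone points along the boat's heading,
    so the offset between the local (ARCore) and world coordinate
    systems is fixed from that moment on."""
    offset = (boat_heading - arcore_yaw_at_press) % 360

    def to_world_heading(arcore_yaw):
        return (arcore_yaw + offset) % 360
    return to_world_heading

# Button pressed while the boat heads 45 deg and ARCore reports
# a local yaw of 350 deg:
align = make_aligner(45.0, 350.0)
print(align(350.0), align(0.0))  # -> 45.0 55.0
```

After the press, every local ARCore yaw can be converted to a world heading without consulting the magnetometer again, until drift forces a new alignment.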

4.4.4 Integration of approaches

The approaches were first implemented by themselves in order to test them individually and to satisfy a structured approach which included iterative steps with sprints. The final scenario
