
Department of Science and Technology

Institutionen för teknik och naturvetenskap

Linköping University

Linköpings universitet

SE-601 74 Norrköping, Sweden

Development of an Open-Source

API for Augmented Reality for

the Android SDK

Patrik Arthursson

Yin Fai Chan


LiU-ITN-TEK-A--11/005--SE

Development of an Open-Source

API for Augmented Reality for

the Android SDK

Master's thesis in Media Technology, carried out at the Institute of Technology at Linköping University

Patrik Arthursson

Yin Fai Chan

Supervisor: Jimmy Jonasson

Supervisor: Mikael Karlsson

Examiner: Matt Cooper

Norrköping 2011-02-17


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Abstract

Augmented reality has in recent years become very popular in commercial areas such as entertainment and advertising. The fastest growing field right now is augmented reality for mobile devices. This is because rapidly increasing device performance can now manage the heavy operations that only desktop computers used to handle. Traditionally, standard markers have been used, but in this paper we examine a different technique: natural feature tracking. The final product is an open-source augmented reality API for Android, freely available online.


Contents

1 Introduction
  1.1 Combitech
  1.2 Purpose and goal
  1.3 Our task and expected outcome
  1.4 Timeplan
2 Background
  2.1 Augmented reality
    2.1.1 Definition
    2.1.2 Field of use
      2.1.2.1 Advertising
      2.1.2.2 Entertainment
      2.1.2.3 Information and navigation
    2.1.3 Tracking
      2.1.3.1 Marker Tracking
      2.1.3.2 Natural feature tracking
  2.2 Development environment
    2.2.1 Open source
    2.2.2 Android
      2.2.2.1 Android software development kit (SDK)
      2.2.2.2 Dalvik virtual machine (Dalvik VM)
      2.2.2.3 Native development kit (NDK)
    2.2.3 Development tools and software
      2.2.3.1 Eclipse IDE
      2.2.3.2 CMake
      2.2.3.3 Simplified wrapper and interface generator (SWIG)
3 Augmented reality on Android smartphones
  3.1 Hardware
    3.1.1 Restrictions of mobile devices
      3.1.1.1 Hardware restrictions
      3.1.1.2 OpenGL ES
  3.2 OpenCV
    3.2.1 OpenCV for Android
    3.2.2 Camera calibration with OpenCV
  3.3 Related work
    3.3.1 Studierstube
    3.3.2 Qualcomm AR (QCAR) SDK
    3.3.3 AndAR, Android augmented reality
    3.3.4 PTAM, Parallel tracking and mapping
    3.3.5 Uniqueness of ARmsk
4 Design of ARmsk
  4.1 Implementation of AR
    4.1.1 Feature detection
      4.1.1.1 Features from accelerated segment test (FAST)
      4.1.1.2 Speeded up robust features (SURF)
      4.1.1.3 Center surround extremas (CenSurE/STAR)
      4.1.1.4 Detector of choice
    4.1.2 Feature description
    4.1.3 Descriptor matching
    4.1.4 Pose estimation
      4.1.4.1 Homography
      4.1.4.2 Perspective-n-point (PnP)
    4.1.5 3D rendering
  4.2 API design
    4.2.1 ARmsk API
5 Building an application with ARmsk API
6 Marketing of ARmsk
  6.1 Naming the project
  6.2 Promotional website
    6.2.1 Site structure
    6.2.2 Wordpress
  6.3 Version control
  6.4 Social networks
7 Discussion
  7.1 The task
  7.2 OpenCV
    7.2.1 OpenCV for Python
    7.2.2 Building OpenCV
    7.2.3 Outdated examples
  7.3 Performance
    7.3.1 JNI export
8 Conclusion
  8.1 Future work

List of Figures

1.4.1 A time chart describing the estimated time consumption for different parts of the master thesis.
2.1.1 Reality-Virtuality Continuum proposed by Ronald Azuma.
2.1.2 Tracking of markers.
2.1.3 Tracking of natural features.
3.1.1 An HTC Desire.
4.1.1 The AR processing pipeline for ARmsk. Numbers in the right corner of each step are section numbers in this paper.
4.1.2 a) The marker. b) A sample frame from the camera.
4.1.3 Features detected on the (a) marker and in the (b) camera stream.
4.1.4 Matches between the marker and current frame.
4.1.5 a) The red circles are outliers removed by RANSAC. b) With the help of the homography matrix the orientation of the marker in the stream can be calculated.
4.1.6 The final rendering with the marker's pose estimation.
4.2.1 The structure of the ARmsk API.
5.0.1 Building order of ARmsk.
5.0.2 Application work flow.
6.2.1 http://www.armsk.org, the promotional website for ARmsk.

List of Tables

3.1 Comparison between AR projects.
4.1 Comparison of detectors in terms of rotation-, scale- and illumination-invariance.
4.2 Comparison of feature detectors in terms of repeatability and speed.

Chapter 1

Introduction

In this paper we present an open-source markerless Augmented Reality API for Android, named Augmented Reality Markerless Support Kit, or ARmsk. The thesis work was carried out at Combitech AB in Linköping, Sweden.

1.1 Combitech

Combitech AB is an independent consulting company within the Saab Group, with services in engineering, environment and security. Combitech has recently started a research department for visualization and simulation on handheld devices, named Reality Center. ARmsk is one of the first projects started under the new department.

1.2 Purpose and goal

At the time when the specification for this master thesis was written, there was no open-source toolkit or alternative that uses natural features as markers. There was AndAR¹, which did augmented reality for smartphones, but it was developed on top of ARToolKit², which cannot handle anything other than traditional black-and-white markers. What is more, before using these markers they need to be trained for the algorithms to recognize them. What people call markerless is natural feature recognition in pictures and video in real time, which is our ultimate goal, performance-wise. However, robustness and stability will be prioritized over speed and performance for this thesis. Natural features, markers, robustness, stability and speed are terms that will be described later on.

What we would like to accomplish is to make it easier for future developers to work with augmented reality on Android, and to present an API where the developer or user does not have to be immersed in computer vision theory and coding to be able to produce augmented reality with natural feature tracking.

¹ http://code.google.com/p/andar/
² http://hitl.washington.edu/artoolkit/

1.3 Our task and expected outcome

Due to the time constraint, producing a fully functional bug-free API is not possible, but what we want to deliver is at least an alpha version that is usable enough to help developers produce augmented reality applications for Android. Since the API is not even at version 1.0, there will be bugs, not all features will be there, and performance will be lacking.

We wish for ARmsk to continue to be developed, become more stable and feature-packed, and eventually reach an official release, all to contribute to the augmented-reality-on-Android community. For this to happen, ARmsk needs to be put out in the open source community in an appealing and well-promoted package, with a website where news, pictures, videos and updates will be published. Additionally, the idea is to include with the first version of the API an Android application that can be used for demonstrating the API. This application will be updated alongside the API as new features come along.

1.4 Timeplan

For this project we had an initial time plan in which we specified the workflow and time consumption. Parts of the project were carried out simultaneously.

• Research, 2 weeks: Study the Android SDK, augmented reality on Android and the structure of open-source projects.

• Implementation, 15 weeks: Includes development of the API, the application and the community.

  – API/Application, 12 of the 15 weeks: This part was divided into two elements: design and development.

  – Community, 3 of the 15 weeks: Create a webpage and publish project-related material online.

• Thesis & project diary, 3 weeks: Write the thesis during the last three weeks, and update the project diary throughout the project.

Figure 1.4.1: A time chart describing the estimated time consumption for different parts of the master thesis.

Chapter 2

Background

2.1 Augmented reality

2.1.1 Definition

Augmented Reality (AR) is a term for applications that add virtual information to real-life physical environments. There are in general two types of AR: projection-based systems, where transparent information is projected in front of the viewer, and camera-based systems, where the environment is captured with a camera, processed and augmented. This paper focuses on and discusses implementation of the latter on embedded systems.

There is no official definition of the term ‘Augmented Reality’, but it is common in the field of AR to use Ronald Azuma’s definition[1]. It states that an AR system must have the following characteristics:

1. Combines real and virtual
2. Interactive in real time
3. Registered in 3D

The first point states that there needs to be a mixture of both virtual reality and reality itself. Almost all movies nowadays use virtual information, or computer-generated imagery (CGI), and blend it with real-life footage for special effects. This is however not regarded as AR, since movies are not interactive, as the second point requires. The third and last point states that it has to be registered in 3D, as in some live sport broadcasts on TV. AR thus excludes e.g. weather forecasts, because only 2D planar effects are used.

Figure 2.1.1: Reality-Virtuality Continuum proposed by Ronald Azuma.

In Figure 2.1.1, on the far left side is the Real Environment and on the other side is the Virtual Environment. Between these extremes there is Mixed Reality (MR), which includes both AR and Augmented Virtuality (AV). AV is the opposite of AR: it merges real information into a virtual environment.

The term Augmented Reality is believed to have been coined in 1990 by Thomas Caudell, an employee at Boeing at that time. In 1992 L.B. Rosenberg developed one of the first functioning AR systems, called VIRTUAL FIXTURES, at the U.S. Air Force Armstrong Labs, and demonstrated its benefits on human performance. [2, 3]

2.1.2 Field of use

AR has formerly been difficult to apply for practical purposes on mobile devices due to AR technology being very computationally heavy. But with current technology, with faster and more powerful phones and smarter algorithmic solutions, AR can be applied in most areas.

2.1.2.1 Advertising

Advertising is undoubtedly the fastest growing field within AR. Since the mid-2000s, marketers have promoted products via interactive AR applications. The applications were mainly run on stationary computers with a recording camera. For example, at the 2008 LA Auto Show, Nissan unveiled the concept vehicle Cube and presented visitors with a brochure which, when held against a camera, showed several versions of the vehicle¹. Another example is Burger King, which launched a Flash-based web advertising campaign in 2009, where people could hold up a dollar bill in front of the camera and then watch the campaign offers without installing any software² beforehand.

The new and faster smartphones open the door to AR advertising on handheld devices. An example of this is an AR application which LG made available for the release of their new Android phone LG Ally³. The application is one of the first smartphone-based applications that use markerless tracking.

¹ http://nissan.t-immersion.com/
² http://cargocollective.com/jeffteicher#191861/BK-AR-Banner
³

2.1.2.2 Entertainment

AR can be found in the entertainment sector too. There are various applications that use AR technology, but there are not many commercialized products at the moment. However, in the gaming industry, games are being released that incorporate AR, such as Ghostwire⁴ for Nintendo DSi in 2010, and for Nintendo 3DS a couple of releases are already scheduled. There have also been a few applications developed during research, such as [4][5][6], which include a racing game, a tennis game and a train game.

2.1.2.3 Information and navigation

Head-up displays (HUDs) were one of the first fields where AR became fully integrated; they were used early on to display transparent information in front of jet pilots' eyes, with full interactivity, including eye pointing. HUDs are also used in new generations of cars, which project useful information onto the driver's windshield. Another type of interactive AR that has become popular in recent years is AR for smartphones. A great example is LayAR⁵, which is an AR information application for both Android and iPhone OS. LayAR uses the mobile phone's camera, compass, GPS and accelerometer to identify the user's location, orientation and field of view. It then retrieves data based on those geographical coordinates and overlays that data on the camera view. The overlaid data can also be customized to show specific information to the user's liking, such as nearby gas stations or restaurants.

2.1.3 Tracking

One thing that all AR applications must use is some kind of tracking technique to determine the pose and position of the camera relative to the virtual 3D information. There are in general two different methods to achieve good and stable results for AR: tracking with markers and tracking of natural features.

⁴ http://www.ghostwiregame.com/
⁵ http://www.layar.com/

Figure 2.1.2: Tracking of markers.

2.1.3.1 Marker Tracking

A classic marker is in general a black-and-white 2D image that is placed where the tracking is supposed to take place. This high-contrast type of marker is easy to find and track with basic image processing and does not require many operations. The camera position is determined by tracking the outer black border and the orientation by tracking the inner black figure. This technique is implemented and optimized in the commonly used ARToolKit library. Tracking of markers is however a bit outdated nowadays, since you actually need a printed black-and-white marker to produce AR. Now there are novel techniques that use regular images, such as logos and symbols, as tracking targets instead of markers. This will be described in the next section. Figure 2.1.2⁶ shows AR that uses marker tracking.

2.1.3.2 Natural feature tracking

'Feature' is a rather general concept in computer vision, and what is called a feature in an image depends highly on what is relevant information or a point of interest for the subsequent operations. Features can be edges, ridges, blobs or corners, and vary from a single point to a region of points. The features relevant for this project are most often found in the form of isolated points, continuous curves or connected regions in the image. Using a good algorithm for feature detection is highly important, since these points are most likely used as a starting point, and the following algorithms will only perform as well as the feature detector does. Good features are features with high repeatability, which in short means that the same point can be found in another image. Repeatability is an important concept and will be discussed more thoroughly in section 4.1.1. There are many different types of feature detectors that perform differently for different kinds of features. Below, a couple of common feature detectors and their classifications are listed:

• Sobel (edges)

• Harris & Stephens / Plessey (edges & corners)

• FAST (corners)

• Difference of Gaussian (corners & blobs)

• Determinant of Hessian (corners & blobs)

• MSER (blobs)

Natural feature tracking, or markerless tracking, uses key features detected in every frame from the camera and matches them with pre-specified features. This type of tracking is computationally heavy and has therefore not been used on embedded systems until recent years. Figure 2.1.3⁷ shows an example of how natural feature tracking can be used; in this case the fingertips are the tracking targets.

Figure 2.1.3: Tracking of natural features.

2.2 Development environment

2.2.1 Open source

ARmsk is open source software and is published under the GNU GPLv3⁸ license. Open source means that anyone has the freedom to read, use, modify and redistribute the software's source material. An open source software project can be published under many different licenses, which grant recipients the freedoms otherwise restricted by copyright law. This allows developers to alter the code until it suits their needs. There are many advantages to open source, since any developer may contribute to a project. When many developers join together, software development and progress are facilitated and many issues and bugs are found faster. As long as the communication works, open source software projects can be developed very quickly and efficiently. There are several sites and tools that can manage and support an open source software project, including maintaining communication and version control.

2.2.2 Android

The Android platform is an open source operating system developed and released by Google. It is mostly found in mobile devices, e.g. smartphones and tablets, and comes with a lot of complete features and functionality such as multitasking, mail, web and media management. Android is based on the Linux 2.6 kernel, which is used as the hardware abstraction layer because it has a proven driver model and a lot of existing drivers. It provides memory management, process management, a security model, networking and a lot of core operating system infrastructure that is robust and proven to work over time. The Android platform for mobile devices intends to be a complete stack that includes everything from the operating system through middleware and up to applications. Developing for Android is free, which has allowed Android to grow very fast since its release, both when it comes to great new applications and when it comes to smartphone manufacturers choosing Android as the operating system for their products. For developers, Google has provided tools that make it easier to develop for Android: the Android SDK and the Android NDK.

2.2.2.1 Android software development kit (SDK)

The Android SDK is a set of tools that are meant to aid developers when they are developing Android applications and the Android platform itself. Version 1.0 of the SDK⁹ was released in September 2008 and has since been updated together with each release of the Android operating system. Version 1.0 was the first stable release of the Android platform and allowed developers to prepare applications for commercially available smartphones. The SDK is available for Windows, Mac OS X and Linux and includes tools as well as an Android smartphone emulator to run, test and debug applications. The emulator is standalone and can be run in order to give users and developers a chance to interact with the operating system on Android handsets. The emulator is commanded through something called the Android Debug Bridge (ADB) via the terminal or command line. Android includes a debug monitor and runs the Dalvik virtual machine to execute Java, the main development language for Android applications.

⁸ http://www.gnu.org/licenses/gpl.html
⁹ http://developer.android.com/sdk/

2.2.2.2 Dalvik virtual machine (Dalvik VM)

The Dalvik Virtual Machine is developed especially for Android to meet the needs of running in an embedded environment where battery, memory and CPU are limited. The Dalvik VM uses registers as storage instead of stacks and runs DEX files, which are bytecode resulting from converting .class and .jar files. When these files are converted to .dex they become a much more efficient bytecode that can run very well on small processors. They use memory very efficiently and the data structures are designed to be shared across processes whenever possible. The Dalvik VM uses a highly optimized bytecode interpreter. The end result is that it is possible to have multiple instances of the Dalvik VM running on a device at the same time, one in each of several processes, efficiently.

2.2.2.3 Native development kit (NDK)

Java is the only supported programming language for creating Android applications. However, it is possible to combine Java with C/C++, the native language of Android smartphones, through JNI. The provided NDK¹⁰ contains the complete toolchain for cross compilation. It is based on the GNU Compiler Collection and GNU make. With those tools developers are able to create shared libraries in the Executable and Linkable Format (ELF) used by Linux. Currently there are only a few libraries that are officially supported. Among those are libc, libm, libz, liblog and the OpenGL ES 1.1 libraries.

2.2.3 Development tools and software

2.2.3.1 Eclipse IDE

The Eclipse IDE for Java developers¹¹ is used for developing and building Android projects, which is also the recommendation of the Android development team. The Android-specific functionality is enabled through the Android Development Tools (ADT) plug-in. It allows the creation of a mobile user interface via a visual editor. Managing and editing the C/C++ files and libraries, as well as makefiles for CMake and SWIG interface files, is also done in Eclipse. Throughout the project, Eclipse Galileo was used.

¹⁰ http://developer.android.com/sdk/ndk/
¹¹ http://www.eclipse.org

2.2.3.2 CMake

Building the native parts of ARmsk is done in the terminal or command line with CMake¹²; in this project version 2.8-2 was used. CMake reads makefiles that specify building details and generates executable files, and in this case libraries, from source code. Once built, CMake automatically figures out which of the files it needs to update, based on which source files have changed. It also automatically determines the proper order for updating files, in case one non-source file depends on another non-source file. This means that the whole program does not need to be recompiled when a few source files change. CMake is not limited to any specific language.

2.2.3.3 Simplified wrapper and interface generator (SWIG)

SWIG¹³ is a software development tool that connects programs or libraries written in C and C++ with different high-level programming languages. SWIG is used in this project to connect the native library of ARmsk, written in C/C++, with Java for Android. The functions to be wrapped are specified in a SWIG interface file, which needs to be added to the makefile to be included in the build. SWIG wrappers are generated during the build and are linked to the native files. Then it is possible to call native classes and native functions from Java, or whatever language the wrappers are specified for, with the same syntax as if the functions were written in that very language. The version used for this project is SWIG 2.0.0.

¹² http://www.cmake.org
¹³ http://www.swig.org

Chapter 3

Augmented reality on Android smartphones

3.1 Hardware

An HTC Desire¹ with Android 2.2 is used for testing during the development period (figure 3.1.1). The Desire is equipped with a 3.7 inch AMOLED display with 480 x 800 pixel resolution, a 1.0 GHz Snapdragon processor, 576 MB of RAM and a 5.0 megapixel color camera with video recording. The device is connected to a MacBook Pro via USB; it works faster than the emulator and is also far more convenient, being a portable camera.

Figure 3.1.1: An HTC Desire

¹ http://www.htc.com/www/product/desire/overview.html

3.1.1 Restrictions of mobile devices

3.1.1.1 Hardware restrictions

There is a fundamental difference between a desktop workstation and a mobile handheld device, and that is the hardware. In mobile devices, where size is a vital factor, there is just no way to have the same amount of processing power. Although in later years the hardware of mobile devices has developed at a rapid pace and is gradually catching up with desktop computers, the phone used for testing has 512 MB ROM (with a separate partition reserved for the OS) and its 1.0 GHz of processing power is still way behind a modern computer.

Most mobile phone CPUs do not have a unit that solely calculates floating-point numbers, a Floating Point Unit (FPU), in contrast to desktop CPUs. This forces the compiler to emulate floating-point calculations rather than calculate directly on hardware, which is approximately 40 times slower than the corresponding integer calculation[7]. According to [25], a well-written smartphone application runs around 10 times slower than on a normal computer. While most CPUs for mobile phones do not have parallel units that execute processes, there is the option to use multi-threading or interleaving for operation acceleration.

Due to the hardware limitations there are many techniques and operations that are totally infeasible to compute on current generation mobile phones. These might need to be rewritten, approached in a different way or simplified to suit the available resources.

3.1.1.2 OpenGL ES

Android supports OpenGL ES² 1.0, which is a 3D graphics library and a stripped-down version of the OpenGL 3D API for desktops. ES is short for embedded systems, and the library is specifically tailored for mobile devices. OpenGL ES 1.0 corresponds to version 1.3 of the original OpenGL library. That it is stripped down does not really mean that it lacks functionality; rather, it includes only the most used functions to minimize redundancy. OpenGL ES 1.x includes floating-point and fixed-point profiles; however, floating-point support is only applied at the API level, which means that the pipeline is merely defined to be fixed-point based. OpenGL ES 2.0 will support only floating-point. The biggest difference is the total exclusion of glBegin and glEnd in the API for embedded systems: OpenGL ES does not manage single vertices, and instead you have to send references to arrays of vertices.

Another restriction is that all textures must be square, with the size being a power of two. This restriction not only applies to the embedded version of OpenGL, but also to the standard edition until version 1.3, as stated by the API documentation [8, p. 1103]. However, this applies only to the function glTexImage2D, which is used to initially transfer the texture image data to the OpenGL driver. When updating the texture through the function glTexSubImage2D you may provide images of any size after all.

Other characteristics of the embedded edition of OpenGL are not relevant for this paper. They can be found in [8, p. 739] and [9].
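The texture handling described above can be illustrated with a short sketch. It is not taken from the ARmsk source; the texture size, pixel format and the helper function names are assumptions for illustration, only the two OpenGL ES calls discussed are the standard API.

#include <GLES/gl.h>

// Sketch: allocate a power-of-two texture once with glTexImage2D, then update it every
// frame with glTexSubImage2D, which accepts camera frames of arbitrary size as long as
// they fit inside the allocated texture.
GLuint cameraTexture;
const int TEX_SIZE = 512;   // power of two, as required by glTexImage2D

void initCameraTexture() {
    glGenTextures(1, &cameraTexture);
    glBindTexture(GL_TEXTURE_2D, cameraTexture);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // Allocate the full power-of-two texture once; no pixel data is uploaded yet.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, TEX_SIZE, TEX_SIZE, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, 0);
}

void updateCameraTexture(const unsigned char* frame, int width, int height) {
    glBindTexture(GL_TEXTURE_2D, cameraTexture);
    // Upload the (non power-of-two) camera frame into the lower-left corner of the texture.
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, frame);
}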

3.2 OpenCV

OpenCV³ is an open source computer vision library filled with programming functions that mainly aim at real-time processing. The library lies under the open source BSD⁴ license and is free to use for both academic and commercial purposes. It was developed by Intel and is now supported by Willow Garage. OpenCV was officially launched in 1999 as a research initiative of Intel to advance processor-intensive applications. Early goals of OpenCV were to improve computer vision research and to provide open and optimized code for this. It was also meant to spread knowledge of computer vision by providing a stable infrastructure that could be further built on and developed, all of this for free and available for commercial use, without requiring such applications to be open or free themselves. OpenCV reached its first version 1.0 release in 2006 and is now a wide and large library that contains more than 500 optimized functions.

The code is written in C and thus portable to a number of other platforms, such as DSPs. To make the code more approachable and to reach a wider audience, wrappers have been developed for more popular programming languages such as Java, Python, C# and Ruby. In OpenCV version 2.0, released in 2009, a C++ interface was introduced. It is backward compatible with C; however, all the new functions and algorithms are written for the new interface. We chose to use the OpenCV libraries primarily because most of the functionality needed for AR is already implemented. Not only that, there was also an OpenCV port for Android readily available.

3.2.1 OpenCV for Android

There are a couple of projects out there that port parts of OpenCV to Android. Recently, however, an Android port has officially been integrated into the OpenCV library and can now be found on the official OpenCV wiki⁵. This port was formerly known as android-opencv and is the port that has been used as the foundation for ARmsk. More about this, and exactly how OpenCV and the port for Android are used, will be discussed in detail in section 4.2.1.

³ http://opencv.willowgarage.com/
⁴ http://www.opensource.com/licenses/bsd-license/
⁵ http://opencv.willowgarage.com/wiki/Android/

3.2.2 Camera calibration with OpenCV

OpenCV has built-in methods for camera calibration, where it is possible to dynamically estimate the intrinsic camera parameters and lens distortion for the camera in use. The intrinsic parameters encompass the focal length, image format and principal point. These are expressed with a 3x3 camera matrix A:

A = \begin{pmatrix} \alpha_x & \gamma & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}

The parameters \alpha_x = f \cdot m_x and \alpha_y = f \cdot m_y represent the focal length in pixels, where m_x and m_y are scale factors. \gamma represents the skew coefficient between the x and the y axis. u_0 and v_0 are the coordinates of the principal point, which would ideally be at the center of the image.

The camera matrix can be estimated by taking multiple pictures from different angles of a calibration rig, an object with known geometry and easily detectable features. A common calibration rig is a black-and-white chessboard-like pattern, since it has very distinctive edges and corners.
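As a hedged illustration of this procedure, the sketch below calibrates from a set of chessboard images using OpenCV's C++ API. The board dimensions, square size and the idea of loading images from disk are assumptions made for the example, not values from the thesis.

#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <string>
#include <vector>

// Sketch: estimate the camera matrix A from several views of a chessboard calibration rig.
cv::Mat calibrateFromChessboards(const std::vector<std::string>& imagePaths) {
    const cv::Size boardSize(9, 6);     // inner corners of the rig (assumed)
    const float squareSize = 0.025f;    // square side length in meters (assumed)

    // 3D coordinates of the rig corners; identical for every view, lying in the z = 0 plane.
    std::vector<cv::Point3f> rigCorners;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            rigCorners.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.f));

    std::vector<std::vector<cv::Point3f> > objectPoints;
    std::vector<std::vector<cv::Point2f> > imagePoints;
    cv::Size imageSize;

    for (size_t i = 0; i < imagePaths.size(); ++i) {
        cv::Mat gray = cv::imread(imagePaths[i], 0);   // load as grayscale
        if (gray.empty())
            continue;
        imageSize = gray.size();
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(gray, boardSize, corners)) {
            objectPoints.push_back(rigCorners);
            imagePoints.push_back(corners);
        }
    }

    // calibrateCamera estimates the intrinsic matrix A and the lens distortion coefficients.
    cv::Mat A, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, imageSize, A, distCoeffs, rvecs, tvecs);
    return A;
}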

3.3 Related work

3.3.1 Studierstube

Studierstube⁶ is a software framework for the development of AR and VR applications. The framework was first created to develop the world's first collaborative AR application. Later the focus changed to support for mobile AR applications. Studierstube produces AR with natural feature tracking at a close to real-time framerate and can also detect and track multiple targets[20]. As the number of tracked targets increases to 5-6, including 3D rendering, the framerate drops slightly, but the performance is still smooth and very impressive. The framework is developed and written from scratch and embodies the last decade of research from the Graz University of Technology (TU Graz). Not long ago one of the main researchers behind Studierstube joined Qualcomm and has been working on a free AR SDK for Android, named Qualcomm AR. Studierstube for PC is freely available for download under the GPL, while Studierstube ES for mobile phones is available commercially. The variant for mobile phones builds on the scene graph library Coin3D and features the device management framework OpenTracker. Studierstube has during its research presented different useful techniques to achieve natural feature tracking on embedded devices in real time. One of them is PhonySIFT[24], a modified version of the commonly used SIFT[14] algorithm.

3.3.2 Qualcomm AR (QCAR) SDK

The QCAR SDK⁷ was developed by Qualcomm in 2010 and is free for developers to use in their AR applications. It is available for download on their site after registration, but it is closed source. QCAR allows developers to upload pictures online that will be used as tracking targets. These get processed and optimized to be used together with the SDK. The uploaded target gets rated depending on the number of features and how well the features are distributed over the image. The processed resources can then be downloaded as a .dat file and added to the AR project. Applications based on QCAR perform really well and are able to deliver AR in real time, despite doing markerless tracking. The SDK currently supports the Android 2.1 operating system and above. Qualcomm provides developers with tools and resources to create Android AR applications but does not provide a channel for commercial distribution. AR apps may instead be distributed in the Android Marketplace, subject to the AR SDK license terms.

3.3.3 AndAR, Android augmented reality

AndAR is an open source AR framework for Android and is released under the GNU General Public License. The framework is based on the open ARToolKit library and was developed by Tobias Domhan as his master's thesis. Since it uses ARToolKit, it is limited to tracking with markers. AndAR can be used as a foundation for AR projects and is capable of displaying 3D models on black-and-white AR markers.

3.3.4 PTAM, Parallel tracking and mapping

PTAM⁸ is a camera tracking system for AR. It requires no fiducial markers, pre-made maps, known templates or inertial sensors. It is an implementation of the method described in the paper [10]. This type of tracking for AR is called extensible tracking, which basically means a system that tracks in scenes without any prior map, using a calibrated hand-held camera. In a previously unknown scene, without any known objects or initialisation target, the system builds a 3D map of the environment. This is done by connecting features from two images which have been translated horizontally or vertically. Once a rudimentary map is built, it is used to insert virtual objects into the scene, and these should be accurately registered to real objects in the environment.

3.3.5 Uniqueness of ARmsk

ARmsk is closely related to all the projects in section 3.3 and shares a lot of their functionality. However, there are a few aspects that make ARmsk unique:

⁷ http://developer.qualcomm.com/ar
⁸ http://www.robots.ox.ac.uk/~gk/PTAM/

                 Free   Open source   Track natural features   Local marker training
Studierstube ES  No     No            Yes                      No
QCAR             Yes    No            Yes                      No
AndAR            Yes    Yes           No                       No
ARmsk            Yes    Yes           Yes                      Yes

Table 3.1: Comparison between AR projects.

PTAM is not included in the comparison since it produces AR with a completely different tracking technique: it does not track markers, but information in the environment instead. Local marker training means that the developer can define the marker, or tracking target, on the local device during run time. ARmsk is unique in the sense that it is a completely free open source project, it provides a solution for tracking of natural features, and it can train markers locally.

Chapter 4

Design of ARmsk

4.1 Implementation of AR

The following section describes the approach of implementing augmented reality for mobile devices using different known techniques. Robustness is prioritized over speed; therefore, methods that are too computationally heavy for a modern mobile device will be used. Too computationally heavy means that the mobile device is not likely to produce a real-time framerate. Five major steps make up the pipeline of our AR implementation, as shown in figure 4.1.1.

Figure 4.1.1: The AR processing pipeline for ARmsk. Numbers in the right corner of each step are section numbers in this paper.

The pipeline takes two images as input for each run. The first is a static marker image that is processed through steps 1 and 2 during initialization; in step 3, this image is used as the search target in the iterative matching process. The second is an incoming camera frame that runs through the pipeline and gets matched to the marker, pose estimated and then augmented with 3D information, which is done in steps 1 to 5. This is executed for every frame.


Figure 4.1.2: a) The marker. b) A sample frame from the camera.

4.1.1 Feature detection

The first step in the AR process is to detect useful features in the image. As there are several feature detection algorithms available, it is necessary to find one that suits the requirements of the project at hand. The implementation needs foremost to be robust; being fast is desirable but comes second. By looking at [18] it is possible to see how the detectors perform on a mobile device. However, all the detectors have parameters that can be varied to affect the result, and none of these are documented. Therefore these numbers can only be taken as an approximate measurement of how well the methods work on a mobile device. A more thorough tweak of all the detectors is needed to muster the highest repeatability and speed possible and to see which detector actually suits our problem best. Repeatability is a measure of how well a feature detector performs in finding the same feature in another image. It is determined by a combination of three invariances: rotation, scale and illumination. All of the following detectors are found in the OpenCV library.

4.1.1.1 Features from accelerated segment test (FAST)

FAST[11, 12] is an extremely fast corner detector with high illumination-invariance, which is highly desirable for mobile devices. Unfortunately FAST lacks rotation- and scale-invariance, resulting in low repeatability. There are known problems with noise for FAST, which can be reduced by setting an appropriate threshold for the detector, though the threshold needs to be set differently for each case.
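A minimal sketch of running FAST with an explicit threshold through OpenCV's C++ interface is shown below; the threshold value is illustrative, not a value used in the thesis.

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Sketch: FAST corner detection with an explicit threshold.
std::vector<cv::KeyPoint> detectFastCorners(const cv::Mat& grayImage) {
    std::vector<cv::KeyPoint> keypoints;
    // A higher threshold rejects more low-contrast corners (and thus more noise);
    // the last argument enables non-maximum suppression of adjacent corners.
    cv::FAST(grayImage, keypoints, 40, true);
    return keypoints;
}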

4.1.1.2 Speeded up robust features (SURF)

The SURF[13] detector, or Fast-Hessian detector, is based on the determinant of the Hessian matrix. It detects blob-like structures at locations where the determinant is at a maximum. The SURF detector is robust, provides good repeatability and is relatively fast thanks to the use of integral images[22]. SURF uses, just like SIFT, an image pyramid as scale space and sub-samples pixels between levels of the pyramid. This provides relatively good repeatability when the scale differs. SURF can also handle a great deal of luminance variance, which makes it stable even when the ambient lighting is uneven. SURF is rotation-invariant.

4.1.1.3 Center surround extremas (CenSurE/STAR)

The CenSurE[15] feature detector, also known as STAR, computes features at the extrema of center-surround filters over multiple scales. Unlike SIFT and SURF, STAR uses the original image resolution for each scale. This makes the STAR detector more exact in scale-invariance and raises the chances of feature repeatability. The features are an approximation to the scale-space Laplacian of Gaussian and can almost be computed in real time using integral images. The STAR detector is sensitive to changes in illumination, but not to rotations.

4.1.1.4 Detector of choice

Detector   Rotation-invariance   Scale-invariance   Illumination-invariance
FAST       Low                   Low                High
SURF       Medium                Medium             High
STAR       Medium                High               Low

Table 4.1: Comparison of detectors in terms of rotation-, scale- and illumination-invariance.

Detector   Repeatability   Speed
FAST       Low             High
SURF       High            Low
STAR       High            Medium

Table 4.2: Comparison of feature detectors in terms of repeatability and speed.

Table 4.1 and Table 4.2 summarize the comparison of the discussed feature detectors, fetched from [18]; STAR is the highest rated among the contenders.

FAST is tremendously fast and finds a huge number of features. It was our initial selection of feature detector, but there are repeatability issues with this detector, and in the later stages of our implementation these issues cause matching instability. Although it is the detector that finds the most feature points, and does so very fast, it picked up a considerable amount of noise as features even after tweaking the threshold parameter.

The SURF feature detector has very good performance in terms of repeatability, but even though SURF is faster than SIFT, it is the slowest of all the feature detectors tested. It is very stable, but there are other detectors that are equally stable yet faster, such as the STAR feature detector. SURF would be our detector of choice if stability alone mattered.

STAR is very good and fast but lacks illumination-invariance. This can be troublesome when trying to do AR where the markers are subjected to strong lighting and where reflections are prone to occur, for instance in an outdoor environment. In spite of that, after examining the performance, STAR is regarded as the top choice.
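The sketch below shows step 1 of the pipeline with OpenCV's STAR detector; the parameter values are the OpenCV defaults and the function name is ours, not code from the thesis.

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Sketch: detect STAR (CenSurE) features on the marker image and on the current camera frame.
void detectStarFeatures(const cv::Mat& marker, const cv::Mat& frame,
                        std::vector<cv::KeyPoint>& markerKeypoints,
                        std::vector<cv::KeyPoint>& frameKeypoints) {
    // maxSize, responseThreshold, lineThresholdProjected, lineThresholdBinarized, suppressNonmaxSize
    cv::StarDetector star(45, 30, 10, 8, 5);
    star(marker, markerKeypoints);   // done once, on the static marker
    star(frame, frameKeypoints);     // done for every incoming camera frame
}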


Figure 4.1.3: Features detected on the (a) marker and in the (b) camera stream.

4.1.2 Feature description

The second step is to create image descriptors for the image. Image descriptors describe the visual information in sections of the image, in our case around each detected feature. The descriptors are used during the matching process to find corresponding features in other images. Since features that can differ in pose, angle and lighting conditions need to be found, it is necessary to use an algorithm that generates descriptors that are scale-, rotation- and illumination-invariant. In OpenCV there are only two descriptor algorithms implemented: SIFT and SURF. The newer SURF descriptor outperforms SIFT in almost every aspect[13] and is an obvious choice for us to implement.

The SURF descriptor extracts a 20x20 pixel patch for each feature and calculates the gradient grid for each patch. Descriptors can then be calculated using the gradient grid with the following procedure:

• Each 20x20 grid is divided into subregions of 5x5 gradients, resulting in a total of 16 subregions.

• For each subregion, a four-dimensional descriptor vector is calculated, where every set of four elements consists of:

  – The absolute sum of differences (SAD) in the x-direction.
  – The SAD in the y-direction.
  – The sum of magnitudes in the x-direction.
  – The sum of magnitudes in the y-direction.

This means that each descriptor, with 16 subregions and their associated vectors, has a total length of 64 elements. There is also an extended version of the SURF descriptor with 128 elements, which additionally calculates the sums of differences in the positive and negative x- and y-directions, resulting in an eight-dimensional vector per subregion and making the descriptor 128 elements wide. The extended version provides higher accuracy, but due to the higher dimensionality the matching is slower[28]. Since robustness is prioritized, the extended version is implemented.
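As a sketch of this step (not the ARmsk source), the extended 128-element SURF descriptors can be computed for already detected keypoints as below. The constructor arguments follow the OpenCV 2.2-era SurfDescriptorExtractor (nOctaves, nOctaveLayers, extended); later OpenCV versions moved SURF to the nonfree module with a different signature.

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// Sketch: step 2 of the pipeline, extended (128-element) SURF descriptors for the
// keypoints found by the STAR detector.
cv::Mat describeKeypoints(const cv::Mat& grayImage,
                          std::vector<cv::KeyPoint>& keypoints) {
    cv::SurfDescriptorExtractor extractor(4, 2, true);   // extended = true -> 128 elements
    cv::Mat descriptors;                                  // one row per keypoint
    extractor.compute(grayImage, keypoints, descriptors);
    return descriptors;
}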

4.1.3 Descriptor matching

After successfully extracting the image descriptors from the images, we need to match them with the ones extracted from the marker during initialization to find corresponding features. One way of doing this is by using kd-trees[16], a space-partitioning data structure. The kd-tree is a binary tree in which every node is a k-dimensional point, or in our case a descriptor. By randomizing the kd-tree structure we improve the effectiveness of the representation in higher-order dimensions[23].

A set of four randomized kd-trees, also called a forest, is created with the descriptors from the marker image. The forest is then searched using the k-nearest neighbor algorithm (k-NN)[26] for each descriptor in the camera stream image. By constructing a forest, the precision of the nearest neighbors is improved[27]. If the search is limited to 64 leaves per tree, computational power is saved while still getting acceptable matches. The k-NN search algorithm is already implemented in OpenCV, and the results from the search are pairs of nearest-neighbor descriptors. The distances between nearest neighbors are checked to see if they are small enough to actually be neighbors and thus be labeled as a match. If the distances are too large, then the points are most likely not a match.
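The following sketch shows one way to express this step with OpenCV's FLANN bindings: a forest of four randomized kd-trees over the marker descriptors, a nearest-neighbour search limited to 64 checks, and an absolute distance threshold on the result. The threshold value and helper names are assumptions; the thesis does not give them.

#include <opencv2/core/core.hpp>
#include <opencv2/flann/flann.hpp>
#include <utility>
#include <vector>

// Sketch: step 3 of the pipeline, matching frame descriptors against the marker descriptors.
std::vector<std::pair<int, int> > matchDescriptors(const cv::Mat& markerDescriptors,
                                                   const cv::Mat& frameDescriptors) {
    // Build a forest of 4 randomized kd-trees over the marker descriptors (done once).
    cv::flann::Index forest(markerDescriptors, cv::flann::KDTreeIndexParams(4));

    // For every frame descriptor, find its nearest marker descriptor, checking at most 64 leaves.
    cv::Mat indices(frameDescriptors.rows, 1, CV_32S);
    cv::Mat dists(frameDescriptors.rows, 1, CV_32F);
    forest.knnSearch(frameDescriptors, indices, dists, 1, cv::flann::SearchParams(64));

    // Keep only pairs whose distance is small enough to be labeled a match.
    const float maxDistance = 0.25f;   // assumed threshold, not a value from the thesis
    std::vector<std::pair<int, int> > matches;   // (frame keypoint index, marker keypoint index)
    for (int i = 0; i < frameDescriptors.rows; ++i)
        if (dists.at<float>(i, 0) < maxDistance)
            matches.push_back(std::make_pair(i, indices.at<int>(i, 0)));
    return matches;
}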


Figure 4.1.4: Matches between the marker and current frame.

4.1.4 Pose estimation

4.1.4.1 Homography

Our matching implementation does not provide perfect matches. Instead, it generates a few outliers, i.e. points that are matched incorrectly. These outliers must not be included in the homography calculation. To remove them, an iterative method called Random Sample Consensus (RANSAC)[17] is used to filter out the features that do not fit the model. This does not remove all outliers entirely, but most of them.

When the matches between the incoming image and the marker mostly consist of inliers, the perspective transformation, also known as the homography, can be calculated. In other words, the homography describes how the marker has changed its orientation when seen by the camera. This means that for any given point in the marker image, it is possible to find its position in the camera stream image by transforming it with the calculated homography matrix.
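A sketch of this step with OpenCV's C++ API is given below; the RANSAC reprojection threshold and the helper names are assumptions, not values or code from the thesis.

#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Sketch: step 4a of the pipeline. RANSAC discards outlier matches while the homography
// between the marker and the camera frame is estimated.
cv::Mat estimateMarkerHomography(const std::vector<cv::Point2f>& markerPoints,
                                 const std::vector<cv::Point2f>& framePoints) {
    // Matches deviating more than 3 pixels from the fitted model are treated as outliers.
    return cv::findHomography(cv::Mat(markerPoints), cv::Mat(framePoints), CV_RANSAC, 3);
}

// With H known, any marker point can be located in the camera frame, e.g. the four corners.
std::vector<cv::Point2f> projectMarkerCorners(const cv::Mat& H, const cv::Size& markerSize) {
    std::vector<cv::Point2f> corners(4), projected(4);
    corners[0] = cv::Point2f(0.f, 0.f);
    corners[1] = cv::Point2f((float)markerSize.width, 0.f);
    corners[2] = cv::Point2f((float)markerSize.width, (float)markerSize.height);
    corners[3] = cv::Point2f(0.f, (float)markerSize.height);
    cv::Mat projectedMat;
    cv::perspectiveTransform(cv::Mat(corners), projectedMat, H);
    for (int i = 0; i < 4; ++i)
        projected[i] = projectedMat.at<cv::Point2f>(i, 0);
    return projected;
}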


Figure 4.1.5: a) The red circles are outliers removed by RANSAC. b) With the help of the homography matrix the orientation of the marker in the stream can be calculated.

4.1.4.2 Perspective-n-point (PnP)

Next a square plane is created in 3D space and oriented at the center of the marker. The sides of the plane are equal to the height of the marker. The reason a square is used is that it simplifies the calculations for the OpenGL transformations later on. The x and y values are transformed with the homography matrix to find the plane's position in the camera stream image. We have the camera model:

z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \, [R \; T] \begin{pmatrix} x_w \\ y_w \\ z_w \\ 1 \end{pmatrix}

where [u \; v \; 1]^T represents a 2D point in pixel coordinates and [x_w \; y_w \; z_w \; 1]^T a 3D point in world coordinates. A is the camera matrix (see section 3.2.2), R is the rotation and T is the translation. R and T are the extrinsic parameters, which denote the coordinate system transformation from 3D world coordinates to 3D camera coordinates. Since the 3D-to-2D point correspondences are known, together with the camera matrix, the extrinsic parameters can be found; they describe the change of position of the 3D plane. This is called solving the PnP problem, and the algorithm for it is also implemented in OpenCV. The result of solving the problem is the transformation matrix.
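A sketch of this step using OpenCV's solvePnP is shown below. The corner layout of the square plane, the assumption of zero lens distortion and the function name are illustrative choices, not taken from the thesis code.

#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Sketch: step 4b of the pipeline. The corners of the square 3D plane (side length equal
// to the marker height, lying in the z = 0 plane) are paired with their positions in the
// camera frame obtained through the homography, and solvePnP recovers R and T.
void estimatePose(const std::vector<cv::Point2f>& cornersInFrame,   // from the homography
                  float markerHeight,
                  const cv::Mat& cameraMatrixA,                     // from section 3.2.2
                  cv::Mat& R, cv::Mat& T) {
    float h = markerHeight / 2.f;
    std::vector<cv::Point3f> plane(4);
    plane[0] = cv::Point3f(-h, -h, 0.f);
    plane[1] = cv::Point3f( h, -h, 0.f);
    plane[2] = cv::Point3f( h,  h, 0.f);
    plane[3] = cv::Point3f(-h,  h, 0.f);

    cv::Mat rvec, tvec;
    cv::solvePnP(cv::Mat(plane), cv::Mat(cornersInFrame), cameraMatrixA,
                 cv::Mat(), rvec, tvec);        // empty Mat: no lens distortion assumed
    cv::Rodrigues(rvec, R);                     // rotation vector -> 3x3 rotation matrix
    T = tvec;
}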

4.1.5 3D rendering

An augmented reality application is a combination of 2D (camera stream) and 3D graphics. In an Android application there is no direct way to combine them, even though Android has APIs for both. Only by drawing the camera stream images as an OpenGL texture with the size of the screen is it possible to draw 3D onto the 2D stream. In the end, practically only the 3D API is used, because the camera API is not designed to just provide a raw stream of byte arrays. The camera API always has to be connected to a surface on which it can directly draw the stream. However, it is not possible to have the OpenGL surface as a preview surface for the camera, as that would cause conflicts, with both threads (camera and OpenGL) trying to access the surface simultaneously.

Furthermore, the preview surface must be visible on the screen, otherwise the preview callback will not be invoked. The way around this is to put an OpenGL surface on top of the preview surface. The preview surface is there and would be visible if it were not covered by the OpenGL surface. The camera stream is thus drawn both on a surface that is not visible to the user and on an OpenGL texture. Compatibility is more important than avoiding this overhead, because there is no other way to circumvent this API design decision.

So lastly, after solving the PnP problem, the resulting 4x4 transformation matrix is sent back to Java through JNI as a float array. The reason for returning a float array is that the matrix loading function in OpenGL only accepts matrices in the form of a one-dimensional array. The array is used to set the orientation of the 3D object, which is rendered on an OpenGL layer on top of the camera stream.
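The JNI hand-off can be sketched as below. The package, class and method names, and the global array holding the matrix, are assumptions for illustration; only the JNI calls themselves are the standard API.

#include <jni.h>

// Sketch: returning the 4x4 transformation matrix to Java as a one-dimensional float array,
// in the column-major order that OpenGL's matrix loading expects.
extern float g_transformation[16];   // assumed to be filled from R and T after solvePnP

extern "C"
JNIEXPORT jfloatArray JNICALL
Java_org_armsk_ARmsk_getTransformationMatrix(JNIEnv* env, jclass /*clazz*/) {
    jfloatArray result = env->NewFloatArray(16);
    if (result != 0)
        env->SetFloatArrayRegion(result, 0, 16, g_transformation);
    return result;   // loaded into OpenGL ES on the Java side to orient the 3D object
}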

Figure 4.1.6: The final rendering with the marker's pose estimation.

4.2 API design

API design is hard, but there is an abundance of information about theories and guidelines for it. The ARmsk API is not going to be big, so most of the guidelines suit it quite well. The primary guidelines for the API design are that the API must be, in prioritized order:

1. Absolutely correct, no errors anywhere
2. Easy to use
3. Easy to learn
4. Fast enough
5. Small enough

This can be summed up as correctness, simplicity and efficiency. It is important to design for evolution and prepare for future work as well. An API should be minimalistic, i.e. it should contain the functionality that is really needed and used. When an API is small there are fewer things to keep track of and it becomes easier to learn. At the same time, this API should include the main AR features and capabilities, so that developers can get started easily. It was decided that the API should include marker management, marker detection and pose estimation. As for packages, a small project with fewer than 30 classes should be housed in a single package, and should one completely rewrite the API it is a good idea to choose an entirely different package name.

4.2.1 ARmsk API

ARmsk is designed to be an API for markerless AR applications for Android. The idea is to produce AR with only the Java API and let all the native calls be made without ever being exposed. Figure 4.2.1 shows a visual of the ARmsk architecture.

ARmsk is largely built upon an OpenCV port for Android as its foundation; the port is called android-opencv (AO), mentioned earlier in section 3.2.1. It has now been announced as the official OpenCV port for Android. In the sample applications provided by AO, SWIG and CMake are introduced for connecting native C++ with Java and Android. AO takes care of the camera as well as image acquisition and turning the images into workable formats for OpenCV, such as converting them into the correct colorspace, into grayscale and into OpenGL 2D textures. The port provides an image pool that could work like a buffer, but can for now only house a single image. Every frame is placed in this image pool to be converted into the right workable format, and can then be accessed later for calculations, operations and drawing. Lastly the frame is passed on to be used as an OpenGL texture applied to the view, whether it has been operated on or not. This is how the camera preview works, and it is also the reason why the preview lags slightly compared to an ordinary camera, as it actually is an OpenGL 3D layer on top of the preview surface. This is the solution to the constraint mentioned in section 4.1.5. The preview runs on a native processor which works much like a thread, continuously running in the background, capturing frames. This processor is a native class of AO and has also been implemented in ARmsk for running the processing pipeline.

An Android application uses parts of AO through ARmsk to access the camera. Through JNI, the ARmsk library is called to get the transformation matrix needed to set the orientation and render 3D objects with OpenGL ES in Java.

Chapter 5

Building an application with ARmsk API

ARmsk needs to be built (compiled) before anything else, as it mostly contains OpenCV functions written in C++. CMake is needed to read the enclosed makefiles and for the build commands to work. Building ARmsk means building twice: first the OpenCV library and then the ARmsk native processing files. Building the OpenCV library takes quite some time because the whole library is included in ARmsk, but fortunately it only needs to be done once. Once built, the ARmsk shared library is ready to be used; in Eclipse, ARmsk is imported as a library project. Then it is time to build the native processing files of ARmsk. This is done in the same manner, but it is important to have SWIG installed, since SWIG interface files are placed together with the ARmsk native files, one for each C++ file. These interface files are included in the build instructions of the makefile. Compiled native files are placed in a JNI folder, while the SWIG wrappers of those files are automatically generated and placed in a folder inside the JNI folder. Lastly, the build is imported as an Android application project in Eclipse and linked to ARmsk/OpenCV as the library.


Figure 5.0.1: Building order of ARmsk.

ARmsk is a markerless AR API for Android and will provide an object-oriented Java API. The cornerstones of the API are providing native functionality for camera preview, marker management, marker matching and access to a transformation matrix which is used to set the orientation and position of a 3D rendering. User interface and graphics are up to each developer building an application on top of ARmsk, as no such things are included. However, the API does include everything else needed to start up AR; all of it is built inside the library as native code, hidden from the ordinary developer. As long as an Android project has everything included and is linked to the ARmsk library correctly, it is easy to call every native function of the API from Java. For ARmsk and AR to work on Android, at least two layers need to be put on the application surface: a camera layer to get frames and an OpenGL layer for OpenCV operations and rendering. A good idea is to split the second layer into an operation layer and a rendering layer for better structure, three layers altogether. The camera and texturing layers are included as native classes provided by ARmsk and need to be instantiated and added in Java as views. The third, rendering layer can simply be a Java-instantiated OpenGL ES view. With the layers set up, all the preparations for AR are done and what is left is to prepare a marker and start the ARmsk process.


Figure 5.0.2: Application work flow.

The marker management’s main functionality is to set a marker, and in ARmsk there are two ways to do this. The first is to set a marker with the camera: take a picture and call a native marker-setting function. The second is to choose an already existing picture and send it to ARmsk to set as marker. Setting a marker basically means finding the marker’s features, extracting feature descriptors and saving them for matching and tracking when processing AR.
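A minimal sketch of the second approach could look like the following, where setMarkerFromBitmap is a hypothetical name for the native marker-setting function on the hypothetical ARmskNative binding sketched earlier.

    import android.content.res.Resources;
    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;

    final class MarkerHelper {

        // Decode a bundled image and hand its pixels to the native side, which
        // finds the features, extracts descriptors and stores them for matching.
        static void setMarkerFromResource(Resources res, int drawableId) {
            Bitmap marker = BitmapFactory.decodeResource(res, drawableId);
            int[] pixels = new int[marker.getWidth() * marker.getHeight()];
            marker.getPixels(pixels, 0, marker.getWidth(), 0, 0,
                             marker.getWidth(), marker.getHeight());
            // Hypothetical native call; the real ARmsk function may take another form.
            ARmskNative.setMarkerFromBitmap(pixels, marker.getWidth(), marker.getHeight());
        }
    }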

Marker matching as well as updating the transformation matrix are functions in ARmsk’s main class and are initiated on a native processor which needs to be implemented into a class in Java. Running them in a processor will constantly update the matching and transformation matrix values. The matrix is loaded into OpenGL to transform a 3D object rendered on the rendering layer.
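The sketch below shows how the rendering layer could use the matrix in a fixed-function OpenGL ES 1.x renderer, again assuming the hypothetical ARmskNative binding; the projection setup and the actual model drawing are omitted.

    import javax.microedition.khronos.egl.EGLConfig;
    import javax.microedition.khronos.opengles.GL10;

    import android.opengl.GLSurfaceView;

    public class ModelRenderer implements GLSurfaceView.Renderer {

        @Override
        public void onSurfaceCreated(GL10 gl, EGLConfig config) {
            // Transparent clear colour so the camera layers stay visible underneath.
            gl.glClearColor(0f, 0f, 0f, 0f);
        }

        @Override
        public void onSurfaceChanged(GL10 gl, int width, int height) {
            gl.glViewport(0, 0, width, height);
            // Projection matrix setup (matching the camera intrinsics) is omitted here.
        }

        @Override
        public void onDrawFrame(GL10 gl) {
            gl.glClear(GL10.GL_COLOR_BUFFER_BIT | GL10.GL_DEPTH_BUFFER_BIT);

            // Fetch the latest pose estimated by the native processor and use it
            // as the model-view matrix for the rendered object.
            gl.glMatrixMode(GL10.GL_MODELVIEW);
            gl.glLoadMatrixf(ARmskNative.getTransformationMatrix(), 0);

            // drawModel(gl);  // the developer's own 3D object goes here
        }
    }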


Chapter 6

Marketing of ARmsk

6.1 Naming the project

A good project name can really help a project grow and be adopted faster than it would with a poorer name. According to Karl Fogel [21, p. 21], a ‘good’ open source project name should:

• Give some idea of what the project does, or at least be related to it in an obvious way.

• Be easy to remember.

• Be unique.

• If possible, be available as a domain name in the .com, .net and .org top-level domains.

With this in mind we started brainstorming and came up with the name ‘ARmsk’. ARmsk is short for Augmented Reality Markerless Support Kit and it meets all the above criteria quite well. It includes the letters ‘AR’, which relates the project to the augmented reality field. Since the name is only five letters long it should be easy enough to remember, and there are no other projects with a similar name (that we could find).

6.2 Promotional website

The ARmsk promotional website’s main function is to present a clear and welcoming overview of the project, and to bind together the vital tools for online open source development, such as version control, a bug tracker and discussion forums. The website also provides all the documentation and examples newcomers need to get started using and further developing ARmsk.


Figure 6.2.1: http://www.armsk.org, the promotional website for ARmsk.

6.2.1 Site structure

In short, the site contains seven sections:

• ARmsk: The presentation page where visitors arrive when they visit armsk.org, .net or .com. Presents a brief summary and the mission statement of the project.

• Blog: The project blog, which updates users and developers on upcoming events, releases and other project-related information.

• Features: Lists the application and API features, and displays how far the project has come in the implementation work.

• Documentation: An archive of documents containing examples, step-by-step guides and other project-related documentation.

• Community: Describes how developers can contribute to the project and points them to the different OSS development resources (such as the SVN repository and issue tracker).

• Download: Presents how and where to download the source code and other project-related items.

• Credits: Lists all the people and companies that helped and contributed to the project during the development period.

6.2.2 Wordpress

The promotional page uses Wordpress version 3, an open source Content Management System (CMS) powered by PHP and MySQL. Wordpress is often used as a blog publishing application and is used by major sites all over the world. Since Wordpress provides all the basic services we need, it is a great, fast and simple online solution.

6.3 Version control

Google Code was chosen for hosting the ARmsk open source software project. It is widely used for open source software and its popularity makes it a good place to start growing the project. Hosting is free and the source code is easily accessible to anyone interested, either through a Subversion checkout or by browsing the web. Adding informative wikis, mailing lists and discussion sections where developers can bring up issues and bugs is also possible.

6.4 Social networks

To promote ARmsk and increase the number of visitors to the project site, affiliations with social networks such as Facebook and Twitter have been made. These are good ways of keeping users and developers updated about events, releases and other project-related news. Links to these social networks are integrated into the promotional website.


Chapter 7

Discussion

The first problem with the project was defining and constraining the task considering the time limitation of the thesis. Design and implementation take time, and doing them for something entirely new makes estimating time hard. The project was therefore constrained to a set amount of functionality for the API, enough to be called an alpha release.

7.1 The task

Understanding how AR is produced was the main task of our project, but breakdowns of the technology, despite its popularity, are really scarce. The most comprehensive description of AR found, and one proven to work, is presented in [24]. The processing pipeline of ARmsk is based on this model. The model is also quite new and has since been further developed in terms of speed and stability, making it even more suitable for ARmsk. However, the methods used in [24] are not implemented in OpenCV, and a course of action had to be decided. Time was spent searching for and reading about the different methods, among many others, that might be used and implemented for AR. After weighing the time constraint against the effort needed to create a stable implementation of the methods mentioned in [24], it was decided that using OpenCV, with its already heavily optimized functions, was the better choice. The decision was therefore to implement the methods available in OpenCV even though they are not quite as suitable for embedded devices. Even with the basic structure ready, it was fairly hard to know which functions in OpenCV would be useful. Most helpful was reading the documentation and searching for and reading examples that use OpenCV; this would often give a push in the right direction. The documentation for the over 500 functions is categorized by purpose, and locating a function that almost does the desired operation might lead to finding one that actually does.


7.2 OpenCV

7.2.1 OpenCV for Python

OpenCV was very good with its vast library that fortunately contained almost every function needed to create AR, but it had its downsides. The version of OpenCV used was the C++ interface, which was very hard to debug; there were problems understanding the errors throughout the whole project. Although probably slower, OpenCV is also provided in Python, which is supported by SWIG and might be easier to work with than C++. Python has better code readability and smaller source files, resulting in faster development. The reason ARmsk uses the C++ version is that the AO project has good samples written in C++ and it was the language that had been worked with more prior to the project. Additionally, Python is slower than C++ in most cases and consumes more RAM, which makes it less preferable for embedded systems.

7.2.2 Building OpenCV

Initially there was trouble getting the OpenCV libraries built together with android-opencv. For the build to work, both SWIG and CMake need to be installed on the system. Then came understanding how to write makefiles that include both Android and SWIG build specifications. Studying the samples provided in AO proved to be very useful, and their makefiles served as templates for the ARmsk makefiles as well.

7.2.3 Outdated examples

Many of the existing examples of OpenCV functions available online are based on the older version of OpenCV, which made implementation rather cumbersome. Due to the big changes in version 2.0, the examples were not of much use with the new interface. They simply did not match the functionality that was sought after, and trying to combine new and old OpenCV versions did not work up to expectations, especially the conversion between new classes and older, outdated ones.

7.3 Performance

The alpha version of the ARmsk API works, although with performance issues and a few bugs that make the demonstration application crash occasionally. The reason for the crashes is not entirely clear, but they happen mostly either right when the AR process is started or when the marker is completely out of sight, so they might have to do with invalid feature values. The performance issue is more substantial and is caused by different factors in every step of the AR process. So far, most of the functions used in ARmsk are from OpenCV and they are heavily optimized. To gain even more performance, one would have to switch them out entirely or modify them to be optimal for their tasks. Also, every OpenCV release introduces new techniques that are faster and more stable, which could be used. There are many great methods not included in OpenCV that could substitute the ones used by ARmsk and perhaps perform better, but including them is not possible within the scope of this project as the time limitation would not allow it.

7.3.1 JNI export

The goal of the open source project is to make things easier for future developers, i.e. they should not need to bother with digging into native code. Every part of ARmsk is coded and put together into a native library, except for the last part, the rendering. This part could also have been incorporated into the native library, but the decision was made to expose it to Java, so that the ordinary developer can focus solely on Java programming with the Android SDK and create his or her own 3D rendering. This actually became a problem, since there was trouble exporting float arrays from native code to Java through JNI. In the current implementation, each element of the transformation array is fetched one by one from Java. Crossing between Java and native code creates a performance overhead, so going back and forth is not desirable at all. It would be interesting to see the performance gain from sending over the whole transformation array instead of its 16 elements one by one.
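To make the trade-off concrete, the hypothetical sketch below contrasts the two strategies on the Java side; neither method name is part of the actual ARmsk API, and the native counterparts are only hinted at in the comments.

    // Hypothetical sketch; not the actual ARmsk API.
    public final class TransformBridge {

        static {
            System.loadLibrary("armsk");
        }

        // Current style: one JNI crossing per matrix element.
        public static native float getTransformElement(int index);

        static float[] fetchElementWise() {
            float[] m = new float[16];
            for (int i = 0; i < 16; i++) {
                m[i] = getTransformElement(i);   // JNI overhead paid 16 times per frame
            }
            return m;
        }

        // Alternative: a single crossing returning all 16 floats at once. On the
        // C++ side this would typically be built with NewFloatArray and
        // SetFloatArrayRegion.
        public static native float[] getTransformMatrix();

        private TransformBridge() {}
    }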


Chapter 8

Conclusion

This paper has investigated the possibilities of augmented reality on Android smartphones as well as the technology behind AR. The Android platform has been introduced and its capabilities have been presented. Furthermore, the restrictions encountered on mobile devices have been documented.

An AR API, called ARmsk, has been developed during the work of this paper, and an approach for natural feature tracking that allows robust pose estimation from planar targets on smartphones has been presented. The API is built on top of the commonly used computer vision library OpenCV and is one of few APIs that provide markerless AR on Android. Additionally, a demonstration application using ARmsk, capable of rendering basic 3D models on top of a chosen image marker, was developed. This application shows what ARmsk can do and that today’s smartphones are capable of being used for AR purposes.

The API is released under the GNU GPL and is publicly available on the project’s website. This allows developers to use the API and software as a foundation for their very own markerless AR applications. As we write this report the ARmsk project is by no means over. We hope for the project to continue to grow and become a much more polished and powerful API. The main goal of the project is to provide a free open source augmented reality API that performs in real-time and requires the user to input only a minimal amount of information to get an AR application up and running on Android. The implementation is far from optimized and future work will be to improve the API’s performance and stability.

8.1 Future work

The following is a list of what future work will include:

• To achieve a more stable and higher frame rate, a dynamically set threshold for the detector can be implemented, to limit the number of feature points taken into the calculations. In the current implementation

References
