
MASTER'S THESIS

PanCam Bundle Adjustment

Čeněk Albl

Master of Science

Space Engineering - Space Master

Luleå University of Technology


FACULTY OF ELECTRICAL ENGINEERING

MASTER’S THESIS

Čeněk Albl

PanCam Bundle Adjustment

Department of Cybernetics

Supervisor: Ing. Tomáš Pajdla, Ph.D.


This thesis investigated the method of Bundle Adjustment with a special focus on constrained systems, such as panoramic cameras. The SBA algorithm from Lourakis and Argyros was implemented in Matlab and its functionality was validated. A new algorithm was proposed and implemented based on the original one, incorporating the constraints suited for PanCam-like cameras. This algorithm was evaluated on both synthetic and real datasets. Its performance was thoroughly analyzed and compared to the original method in terms of accuracy of the reconstruction. For synthetic datasets, significant improvements were observed and expressed quantitatively. On real datasets, improvements in visual appearance were observed in particular cases. The results were elaborated and possibilities for further improvement were proposed.


This thesis deals with the method of bundle adjustment, with special focus on constraints specific to panoramic cameras. The algorithm developed by Lourakis and Argyros was implemented in Matlab and its functionality was verified. Based on this algorithm, a new one was proposed and implemented, capable of enforcing during the process such parameter constraints as are suitable for PanCam and similar camera types. The algorithm was thoroughly tested on both synthetically generated and real data. The main focus was the quality of the resulting reconstruction, compared quantitatively in the case of simulated data and qualitatively for real data. The synthetic tests demonstrated a considerable improvement in the accuracy of the reconstruction. On real data, a visually better result was observed in certain cases. All results were evaluated and possibilities for further improvement were proposed.


I declare that I have written my diploma thesis independently and that I have used only the materials (literature, projects, software, etc.) listed in the enclosed list.


I want to thank my supervisor Ing. Tomáš Pajdla, Ph.D. for his countless advice, tremendous support and patient guidance.

I would also like to thank Ing. Michal Havlena for the assistance with experiments, RNDr. Michal Jančošek for providing the dense 3D reconstruction and Gerhard Paar for introducing me to the ProViScout project.

Special thanks go to my family, my girlfriend and my friends who supported me through my studies.


Contents

1. Introduction
   1.1. Previous work
   1.2. Thesis outline
2. Capturing the world
   2.1. Projective cameras
   2.2. PanCam
        2.2.1. MER Pancam
        2.2.2. ExoMars PanCam
3. Bundle Adjustment
   3.1. Parametrization
        3.1.1. Different parametrizations and their drawbacks
   3.2. Bundle adjustment as a non-linear least squares problem
   3.3. Newton's method
   3.4. Gauss-Newton method
   3.5. Levenberg-Marquardt
   3.6. Gradient descent
   3.7. Conjugate gradient
   3.8. Lourakis' adaptation of Sparse Bundle Adjustment
4. PanCam bundle adjustment
   4.1. Requirements
   4.2. Parameters and projection functions
        4.2.1. P - using the camera matrix
        4.2.2. KQT - Calibration, rotation, translation
        4.2.3. PanCam - spinning around
   4.3. Sharing parameters
   4.4. Method of alternation
5. Implementation
   5.1. Obtaining the parameters
   5.2. Normalizing the parameters
   5.3. Projection functions and their jacobians
   5.4. Parameter mask
   5.5. The algorithm
6. Experimental results
   6.1. Synthetic data
        6.1.1. Validating the method
        6.1.2. One cluster
               P parametrization
               KQT parametrization
               PanCam parametrization 1
               PanCam parametrization 2
               Comparison
        6.1.3. Three clusters
               P parametrization
               KQT parametrization
               PanCam parametrization 1
               PanCam parametrization 2
               PanCam parametrization 3
               Comparison
        6.1.4. Summary
   6.2. Real data
        6.2.1. The CMP building experiment
               Dataset 1
               Dataset 2
               Dataset 3
               Dataset 4
        6.2.2. The Endurance Crater on Mars
               Single panorama
               Two panoramas
        6.2.3. Summary
7. Conclusions
A. Contents of the enclosed CD
Bibliography
List of Figures

Abbreviations

3D 3-dimensional

BA Bundle Adjustment

CCD Charge-coupled device

CG Conjugate Gradient

CMP Center for Machine Perception

ESA European Space Agency

FOV Field Of View

HRC High Resolution Camera

LM Levenberg-Marquardt

MAP Maximum A Posteriori

MER Mars Exploration Rover

MLE Maximum Likelihood Estimator

NASA National Aeronautics and Space Administration

SBA Sparse Bundle Adjustment

SfM Structure from Motion

TOF Time-of-flight


1. Introduction

Perceiving the world around us is no longer a domain dedicated solely to the eyes of living beings. Machines have stepped into this realm by creating their own models of the environment in order to navigate themselves, recognize objects or simply capture reality for us. Unlike human vision, computer vision lacks robustness and the ability to easily derive correct object properties based on experience; however, it dominates in terms of accuracy when it comes to determining exact measurements such as distances and positions. 3D reconstruction is one of the disciplines in computer vision which focuses on delivering precise information about the shapes and appearances of real objects.

One of the widely used approaches in 3D reconstruction is image-based reconstruction, which involves capturing multiple images of an object by cameras and then using geometric relations to determine the positions of individual 3D points belonging to this object. In this scenario, the cameras (one camera which captures images at different positions is actually considered a set of individual cameras), image projections and reconstructed 3D points together form a system connected by geometrical equations [6].

This is where Bundle Adjustment [22] steps in as a post-processing method aiming to improve the precision of the system mentioned above. That includes the parameters of cameras (position, orientation, calibration) and the coordinates of reconstructed 3D points. Taking the captured images as the “true state”, the cameras and reconstructed points are adjusted until an optimal solution is found. This refining process creates 3D models of significantly higher quality, but that is not the only application of a bundler. Bundling also plays its role in navigation [19], since it provides better estimates of the positions of cameras. That can be used to correct or improve the position taken by GPS or other sensors. Small scale bundling can even serve as a visual odometry mechanism [21]. The structure of the algorithms used in modern Bundle Adjustment allows it to be very flexible and incorporate many problem-specific parameters.

Bundle Adjustment has for many decades remained the dominant structure refinement technique for real applications. It was mainly the issue of computational complexity that prevented its widespread deployment in the past. Bundle adjustment comprises a large scale system of non-linear equations which can in general be solved using one of the many existing non-linear least squares algorithms. It is, however, very important to choose an algorithm that exploits the problem structure and its sparsity. With the use of better suited algorithms and with the rapidly increasing power of computers, we are now able to use Bundle Adjustment more frequently and in applications which were impossible before.

In general, the solution to the adjustment problem is to iteratively approximate the non-linear cost function by a local linear or quadratic model around the current parameter estimate and find a change in parameters that decreases the cost. Depending on whether we choose a first or second order approximation, the approach will vary both in terms of implementation and results. Second order Newton type methods [14] use the Hessian and its factorization to find the Newton step which minimizes the error. These methods seek the exact solution to the so called normal equations and they have quadratic asymptotic convergence, thus reaching the optimal state quickly. This of course depends on many factors, e.g. the accuracy of the approximation.


Calculating the Hessian is highly computationally demanding and also difficult to implement; therefore approximations to the Hessian are often used. One of the most successfully used methods is Levenberg-Marquardt, which incorporates an active damping strategy that controls the size of the step taken in each iteration.

Normal equations appearing in the process can contain extremely large matrices, and solving them directly without any optimization would take an excessive amount of time. Instead, special features of the matrices (independence of parameters, sparsity, block structure) are exploited, and techniques such as separating the camera and 3D feature parameters, using the Schur complement and finding a reduced camera system are utilized. Despite all this effort, factorization of the Hessian remains the main bottleneck in terms of speed.

An alternative to the second order methods can be found in the first order methods, of which mainly the Conjugate Gradient (CG) method has proven to be interesting. The conjugate gradient method avoids the complex computation of the Hessian and its factorization and offers an approximate solution to the normal equations, therefore belonging to the inexact Newton methods. It can be argued that the exact solution is not that important, since the normal equations are themselves an approximation of the real cost function. Since the second order terms are not used, the resulting steps are only linear in descent, thus converging more slowly than the second order methods. The improvement in the speed of the step computation, gained by avoiding expensive matrix-matrix multiplication and factorization, is weighed against the larger number of iterations needed to converge. The performance difference between first and second order methods is thus strongly dependent on the particular application.

1.1. Previous work

This work will be based on the state-of-the-art Bundle Adjustment methods. I will elaborate on one of the widely used Sparse Bundle Adjustment algorithms [13], introduced by Lourakis and Argyros, which is freely available and used in many applications. This algorithm implements the Levenberg-Marquardt algorithm and makes excellent use of the sparsity of the matrix system when seeking the Gauss-Newton step. The software package is available on the Internet and is provided with documentation on the implementation as well. An alternative approach, using the conjugate gradient method, is presented in [11], where the two proposed algorithms demonstrate improvements especially in terms of speed. The improvements are achieved using several adjustments, such as the use of a block-sparse preconditioned conjugate gradient and embedded point iterations. Conjugate gradient bundle adjustment is elaborated in [4], where experiments have been carried out on different structures of the camera system. A thorough investigation of the topic of bundle adjustment has been presented in [22], and some basic principles along with the mathematical background can be found in [6].

PanCam [12] is a panoramic camera designed to provide stereo and 3D imagery of the Martian surface, searching for traces of microorganisms and giving an insight into the geological characteristics of the planet. It is composed of two Wide Angle Cameras (WAC) used for panoramic imaging and one High Resolution Camera (HRC) for high resolution color imaging. These are mounted on an optical bench on top of the ExoMars rover mast assembly. This structure imposes several features on the optical system. The cameras all share the same baseline, angle and also pan and tilt. Moreover, when rotating while the rover remains stationary, the camera centers at the points where the images are taken lie on a circle centered on the rover's mast. This system is being developed


by ESA and it is intended to be employed in the future ExoMars mission in 2018. A similar device [8], called Pancam, has already been operating on the Mars Exploration Rovers since their landing on the surface of Mars in 2004. As in the previous case, two cameras forming a stereo pair are mounted on a rotating shaft. The images obtained by this device have been released by NASA for scientific purposes and they will serve as a source of real data for experiments.

1.2. Thesis outline

My first task will be to comprehend one of the investigated Bundle Adjustment methods in detail and to implement it in Matlab, preserving the original functionality. To represent and compare the results of the bundling process, a simple Matlab interface to display systems with points and cameras shall be developed. After the first implementation is completed, a new algorithm shall be proposed and implemented. The new method shall be able to address the specific properties and constraints placed on special camera types, in particular the PanCam camera system mounted on the ExoMars rover. The goal of my thesis is to implement these constraints into the bundle adjustment algorithm in order to provide more accurate results. The bundler should also be able to adapt to a ladybug camera type and be easily adjusted for other ones. In the first section I will introduce the basics of camera projections and describe the properties of PanCam-like cameras. The second section will focus on the details of bundle adjustment methods, their mathematical background and specific properties. The third section will describe the newly proposed algorithm, first stating the requirements arising from the PanCam structure and the camera parametrizations which follow from those requirements, and then explaining the approach which was chosen to achieve them. The fourth section contains the details of the implementation and describes the algorithm itself. The adjusted method will be compared with the original one and the results will be shown in section five. Section six concludes the important results of this work.

2. Capturing the world

It is the prime task of computer vision to somehow capture the world around us. The perception of the surrounding environment is the basis for 3D reconstruction, mobile robot navigation, mapping and other activities. There are many ways to gather visual information about the objects around, such as laser scanners, sonars or 3D TOF range cameras [1]. Of all the available sensors, the closest to our view of the world is the classic camera. Originally developed to simulate the human eye and capture precious moments for us, cameras have evolved into many forms with different properties, serving various purposes. In this work, the widely used classical camera sensor will be elaborated, since it is the most relevant to the topic of Bundle Adjustment.

2.1. Projective cameras

A projective camera maps world points with coordinates X into image coordinates x according to the equation

\[ x = PX \tag{2.1} \]

where the matrix P is the 3 × 4 projective camera matrix with rank 3 and 11 degrees of freedom [6], X is a homogeneous 4 × 1 vector representing a 3D world point and x is a 2D image point represented by a 3 × 1 vector. The matrix P carries a lot of interesting information about different camera properties. To show them, it can be further decomposed into

\[ P = KR[I \mid -C] = K[R \mid -RC] \tag{2.2} \]

where K is the calibration matrix, R is a rotation matrix that defines the orientation of the camera and the vector C = [X, Y, Z]^T comprises the world coordinates of the camera center. Each element of K corresponds to some physical property of the camera. For example, considering the pinhole camera model as depicted in Figure 2.2, one can derive a simple mapping between a 3D point and the resulting 2D image point

\[ (X, Y, Z)^T \mapsto (fX/Z,\; fY/Z)^T \tag{2.3} \]

which comes from the similarity of triangles. Using homogeneous coordinates, the mapping described by equation (2.1) takes the form

\[
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto
\begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{2.4}
\]

which can be expressed as [K | 0]X. In this case the first two diagonal elements of K are the focal length of the camera. If the resulting coordinates are to be expressed in pixels, then f in (2.4) must be multiplied by a factor m, which is the number of pixels per unit distance in image coordinates. However, today's CCD cameras sometimes don't have exactly square pixels, but rather rectangular ones. This nature can be incorporated into K


Figure 2.1. Projection of a 3D cube onto a 2D image plane by a projective camera.

by using two factors m_x and m_y for the number of pixels per unit distance in the directions x and y respectively. The mapping changes to
\[
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto
\begin{pmatrix} m_x fX \\ m_y fY \\ Z \end{pmatrix} =
\begin{bmatrix} m_x f & 0 & 0 & 0 \\ 0 & m_y f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{2.5}
\]

In (2.4) it was assumed that the principal point (see Figure 2.2) is located in the corner of the image, at position (0, 0). This is mostly not true, because cameras have their principal axis going through the approximate center of the image. That can be easily compensated for by introducing x_0 = m_x p_x and y_0 = m_y p_y as the true principal point coordinates in terms of pixel dimensions, such that
\[
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto
\begin{pmatrix} f_x X + x_0 Z \\ f_y Y + y_0 Z \\ Z \end{pmatrix} =
\begin{bmatrix} f_x & 0 & x_0 & 0 \\ 0 & f_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \tag{2.6}
\]
where f_x = m_x f and f_y = m_y f. That nearly completes the calibration matrix for most cameras used in practice. The last term, which is missing, is the skew factor s. It is non-zero only if the x and y axes of the image are not perpendicular, for example due to a poor design of the CCD array. That is usually not the case and s is in general zero. It can, however, be present when taking a picture of a picture, or scanning a picture [6].
\[
K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2.7}
\]

The camera matrix can be computed from the required camera parameters and vice versa. For an arbitrary camera matrix, one can find all camera parameters by first



Figure 2.2. a) Visualizing the camera projection using the pinhole camera model. Point X is projected to a point x in the image plane[23]. b) Section along the X-Y plane. The y coordinate of the projected point is clearly defined by the similarity of triangles CXZ and CPX. P denotes the principal point, which is the intersection of the camera principal axis with the image plane. Distance between C and P is the focal length.

separating
\[ P = [M \mid -MC] \tag{2.8} \]
and using M^{-1} to obtain C. The matrices K and R can then be found from M using the RQ decomposition. For the purpose of this work, it is important to complete the derivation of how the Euclidean pixel coordinates [x, y]^T in the resulting image are obtained from the world point coordinates [X, Y, Z]^T. Consider the following decomposition of the camera matrix P into its rows
\[ P = \begin{bmatrix} P_1 \\ P_2 \\ P_3 \end{bmatrix} \tag{2.9} \]

Now using equation (2.1), one obtains the homogeneous coordinates
\[ x = \begin{bmatrix} P_1 \\ P_2 \\ P_3 \end{bmatrix} X = \begin{bmatrix} P_1 X \\ P_2 X \\ P_3 X \end{bmatrix} \tag{2.10} \]

To transform these into Euclidean coordinates, they must first be normalized according to [ωx, ωy, ω]^T ↦ [x, y, 1]^T, i.e. divided by ω, and the last coordinate discarded [6]. The final projection function that projects the homogeneous 4-vector X to the image coordinate 2-vector x then takes the following form
\[ f(X) = \begin{bmatrix} P_1 X / P_3 X \\ P_2 X / P_3 X \end{bmatrix} \tag{2.11} \]
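As a concrete illustration of (2.10) and (2.11), the following Matlab sketch evaluates the projection for a set of points given in homogeneous coordinates; the function name projectP is illustrative, not part of the thesis implementation:

function x = projectP(P, X)
% Project homogeneous world points to Euclidean pixel coordinates (eq. 2.11).
% P ... 3x4 camera matrix, X ... 4xN homogeneous world points, x ... 2xN pixels
    h = P * X;                 % homogeneous image coordinates (eq. 2.10)
    x = h(1:2, :) ./ h(3, :);  % divide by the third coordinate and drop it
end

For a camera assembled as P = K[R | −RC] from (2.2), this reproduces exactly the mapping derived above.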

2.2. PanCam

This section will explain the concept of Panoramatic Camera systems, which have been developed to work on planetary rovers, during their exploration missions. As for now, two devices which fall in this category are known - one is being developed for the future ESA’s ExoMars rover mission and the other has already been operational on the NASA’s Mars Exploration Rover (MER), shown in Figure 2.3. Both devices are different in more than the name - ESA’s PanCam versus NASA’s Pancam, but in general they share most of the properties.


Figure 2.3. Mars Exploration Rover (MER) in a computer-generated model [8]

Figure 2.4. Visualization of the MER Pancam setup - the cameras are mounted 30 cm apart (the baseline) and are aligned towards each other by a toe-in angle of 1°. Note that the picture is not to scale.

2.2.1. MER Pancam

Pancam is a device developed for NASA's MER mission, which was launched in 2003 and has been operating successfully ever since. The purpose of the device is twofold. On the scientific side, Pancam should capture the morphology, topography and geologic context of the MER landing site and also provide color images of the surface in order to determine the mineralogic, photometric and physical properties of the surface. The second objective is to provide mission support, including sun-finding, object resolving, location of interesting sampling targets and, last but not least, providing stereo imagery suitable for generating a digital terrain model to help long-term guidance and refinement of the rover's traverses. To achieve that, Pancam is composed of two CCD cameras that share the same baseline and are mounted 30 cm apart on an optical bench, forming a stereoscopic imaging system. The optical bench is attached to a rotating mast at a height of 1.54 m above the surface, allowing 360° rotation and ±90° elevation of the cameras. The toe-in angle (see Figure 2.4) between the cameras is 1°, providing


an appropriate stereo parallax [8]. Optical parameters of both imaging devices can be found in Table 2.1.

Table 2.1. Optical parameters of the MER Pancam cameras. Values taken from [8].

Parameter                 Value
Focal length [mm]         43
Field of view [°]         16
CCD imaging area [pix]    1024 × 1024
Operating wavelengths     400 - 1100 nm

Rotating the mast in successive steps while taking pictures provides a panoramic image of the surroundings, such as the one in Figure 2.5. The advantage is that many images taken by MER's Pancam are publicly available, along with some data about the rover positions, therefore serving as valuable material for experiments.

Figure 2.5. Panorama of the 'Santa Maria' crater taken by the MER Opportunity on Dec. 18 and 19, 2010, during the 7th anniversary of its landing on Mars. The image spans 225° and the colors are adjusted to emphasize the differences among materials in rocks and soils [15].

2.2.2. ExoMars PanCam

Figure 2.6. ExoMars PanCam optical bench [5] - the two Wide Angle stereo Cameras (50 mm baseline), the High Resolution Camera and the PanCam Interface Unit.


Table 2.2. Optical parameters of the ExoMars PanCam instruments. Values taken from [10, 5].

Parameter                 WAC           HRC
Focal length [mm]         22            180
Field of view [°]         34            5
CCD imaging area [pix]    1024 × 1024   1024 × 1024

The ExoMars PanCam is an instrument being developed under the wings of ESA. It is supposed to be carried by the ExoMars Rover mission, scheduled to launch in 2018. The instrument is still in the development stage; however, practical tests have already been carried out in simulated Martian conditions [5]. The tasks of PanCam during the mission, quite similar to NASA's, are to

• assist with locating the landing site with respect to the surroundings
• provide the geological context of the explored sites
• support the selection of interesting scientific targets
• study the properties of the atmosphere and other variable phenomena

In contrast to NASA’ version, ESA’s PanCam comprises of two Wide Angle Cameras

(WAC) with 34◦

FOV and one High Resolution Camera (HRC) with 5◦

FOV. All three cameras are mounted on an optical bench (see Figure 2.6), which attached to a turning rover mast in the height of 1.8 m. Optical parameters of the system can be found in Table (2.2).

3. Bundle Adjustment

Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal 3D structure and viewing parameter estimates. Jointly means that both the structure and the viewing parameters (camera pose and/or calibration) are adjusted simultaneously. Optimality is ensured by minimizing some cost function which quantifies the error between the measured projections and the projections predicted by our model equations [22].

3.1. Parametrization

At the input of a Bundle Adjustment process, we are given a set of parameters and measurements. The parameters describe properties of 3D objects and cameras in a variety of ways. 3D objects can be described by coordinates of simple features such as points, lines, curves, planes or surface patches, but more complicated representations can also be used, including geometric constraints, dynamics, etc. In the same manner, many different camera models involving various parameters can be used, e.g. perspective projective, affine, orthographic or more exotic models such as push-broom and rational polynomial cameras. This ability to incorporate many different models, equations and constraints which predict the values of known measurements is one of the greatest strengths of Bundle Adjustment and makes it very interesting in computer vision. The main purpose of this work is to implement different parametrization models in order to produce results better suited to specific types of reconstructed scenes.

3.1.1. Different parametrizations and their drawbacks

The freedom in the choice of parametrization is redeemed by the complexity of the effects it has on the result of the optimization. Since the parameter vector lies in a high dimensional, non-linear space with possible non-linear constraints, complications such as singularities, internal constraints and unwanted degrees of freedom may arise. As an example we can take 3D rotation with its multiple representations. Euler angles can appear in many variants of rotation sequences [16] which have different behavior and can cause numerical problems if one does not avoid their singularities. The use of quaternions solves those problems, but we must ensure that they are normalized after each adjustment to satisfy the condition ‖q‖₂ = 1. Other options are also feasible, such as the Rodrigues formula, local Euler angles and other well behaved small rotation approximations.
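As a minimal sketch of the normalization constraint mentioned above, a quaternion used as a rotation parametrization can be renormalized after each additive update; the variable names q and dq are illustrative:

% after an update step has modified the four quaternion parameters
q = q + dq;       % an additive update generally leaves the quaternion unnormalized
q = q / norm(q);  % project back onto the unit sphere so that ||q||_2 = 1 holds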

3.2. Bundle adjustment as a non-linear least squares problem

Once a proper parametrization is chosen, the corresponding functional relation
\[ \hat{x} = f(p) \tag{3.1} \]
that maps the parameter vector p to a vector of estimated projections x̂ can be found.


The task is then to find the parameter values that minimize the reprojection error, described by a cost function. Some literature mentions the use of the Maximum Likelihood Estimator (MLE) or its Bayesian regularization, the Maximum A Posteriori (MAP) estimation. The traditional method of choice is non-linear least squares, with the cost function defined as

\[ \epsilon = \tfrac{1}{2} r^T r = \tfrac{1}{2}\|r\|^2 \tag{3.2} \]

where r = x − x̂ = x − f(p) is the residual, i.e. the distance between the measured projections x and the estimated projections x̂. The factor 1/2 is present for the ease of further computations and the symbol ‖·‖ denotes the L2 norm.

In some cases, where we have information about the covariance of the measurements, we can use it in the form of a weight matrix W such that W^{-1} is the covariance matrix of the vector x. The cost function then takes the following form
\[ \epsilon = \tfrac{1}{2} r^T W r = \tfrac{1}{2}(x - f(p))^T W (x - f(p)) = \|x - f(p)\|^2_\Sigma \tag{3.3} \]

where ‖·‖_Σ is the Mahalanobis distance. Both (3.2) and (3.3) are non-linear functions, and the task of Bundle Adjustment is to use a suitable numerical algorithm to find the parameters p̂ that minimize the function. Most available algorithms choose an initial parameter estimate p₀ and iteratively seek updates to this vector to find the minimal solution. In order to move in the parameter space to find a better estimate, it is common practice to approximate the cost function by some low-order function or to find its gradient [6]. The most widely used methods are presented in the following subsections.
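As a sketch, the cost functions (3.2) and (3.3) can be evaluated in Matlab as follows, assuming a projection function handle f, measurements x, parameters p and an optional weight matrix W (all names are illustrative):

r = x - f(p);                % residual: measured minus predicted projections
cost  = 0.5 * (r' * r);      % unweighted cost, eq. (3.2)
costW = 0.5 * (r' * W * r);  % weighted cost, eq. (3.3); W is the inverse covariance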

3.3. Newton’s method

An arbitrary non-linear function can be approximated around a vector of parameters x by a second order Taylor series
\[ f(x + \delta_x) = f(x) + g^T \delta_x + \tfrac{1}{2}\delta_x^T H \delta_x \tag{3.4} \]

where g = df/dx(x₀) is the gradient vector and H = d²f/dx²(x₀) is the Hessian matrix. The δ_x that minimizes this function can be found by differentiating with respect to δ_x and then setting the derivative equal to zero

\[ g + H\delta_x = 0 \tag{3.5} \]

to find the stationary point. Rearranging this equation to the form

\[ H\delta_x = -g \tag{3.6} \]

yields a system of linear equations which can be solved to obtain δ_x, also called the Newton step. Iterating and repeatedly finding the Newton step until the global minimum is found is called the Newton method. It is a second order method and its asymptotic convergence is quadratic. If the second order approximation describes the function well around the current estimate, which for smooth cost functions happens near the minimum, then the decrease in error is approximately squared at each iteration. The rapid convergence is the main advantage of second order methods; however, they have several disadvantages which prohibit their common use in Bundle Adjustment applications. Calculating second derivatives is very complicated for complex cost functions, such


Figure 3.1. Newton's method in practice: the cost function is approximated at the initial parameter estimate P0 by a second order Taylor expansion and a suitable δP is found by locating its minimum. The example cost function is ǫ = ½‖y − f(x)‖² where f(x) = sin(px).

as camera projection functions. Not only are the computational demands of creating the Hessian matrix excessive, but seeking the analytical form of the second derivatives can also be very difficult. Another problem is the amount of time needed to solve the n × n system of equations, which is O(n³) and thus prohibitive for large systems. This can fortunately be avoided by carefully examining the structure of the Hessian matrix and utilizing its sparseness.
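To make the Newton step concrete, the following Matlab sketch runs a few Newton iterations on the one-dimensional example of Figure 3.1, the cost ǫ(p) = ½‖y − sin(px)‖²; the toy data and all names are illustrative:

x = linspace(0.1, 2, 20)';  y = sin(x);  % toy data from y = sin(p*x), true p = 1
p = 1.3;                                 % initial estimate P0
for it = 1:5
    r   = y - sin(p*x);                  % residual
    dr  = -x .* cos(p*x);                % dr/dp
    d2r =  x.^2 .* sin(p*x);             % d2r/dp2
    g = dr' * r;                         % gradient of the cost
    H = dr' * dr + r' * d2r;             % full second order Hessian, cf. eq. (3.8)
    p = p - H \ g;                       % Newton step, eq. (3.6)
end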

3.4. Gauss-Newton method

An extension of Newton's method is the Gauss-Newton method, which avoids the direct computation of second order derivatives by introducing an approximation to the Hessian matrix. Its principle is used in almost every Bundle Adjustment problem today. The Hessian is defined as
\[ H = \frac{d^2 f}{dx^2} \tag{3.7} \]
which for the cost function ǫ = ½‖r‖² becomes
\[ H = \frac{d^2\epsilon}{dp^2} = \frac{d}{dp}\left( r^T \frac{dr}{dp} \right) = \left( \frac{dr}{dp} \right)^T \frac{dr}{dp} + r^T \frac{d^2 r}{dp^2} \tag{3.8} \]
where dr/dp = J. At the core of this method it is assumed that the residual r is nearly linear in the parameters, so the second order term can be dropped to obtain
\[ H = \left( \frac{dr}{dp} \right)^T \frac{dr}{dp} = J^T J \tag{3.9} \]

as the approximation to the Hessian of ǫ. Since the gradient of ǫ is obviously
\[ g = \frac{d\epsilon}{dp} = \left( \frac{dr}{dp} \right)^T r = -J^T r \tag{3.10} \]


Figure 3.2. Comparison between a second order Newton and a Gauss-Newton iteration. If the function is locally nearly quadratic or the prediction error r is very small, then the Gauss-Newton approximation performs rather well and with significantly lower computational demands.

substituting into eq. (3.6) yields the equation
\[ J^T J \delta_p = J^T r \tag{3.11} \]

called the normal equation. Another way of arriving at this solution is to assume that the projection function f(p) is locally linear, so that it can be approximated around p as
\[ f(p + \delta_p) = f(p) + J\delta_p \tag{3.12} \]
where J is the Jacobian of f evaluated at p. Minimizing the cost function ǫ = ½‖x − f(p)‖² is then the task of finding parameters p = p₀ + δ_p such that
\[ \epsilon = \tfrac{1}{2}\|r\|^2 = \tfrac{1}{2}\|x - f(p_0 + \delta_p)\|^2 \approx \tfrac{1}{2}\|x - f(p_0) - J\delta_p\|^2 = \tfrac{1}{2}\|r - J\delta_p\|^2 \tag{3.13} \]

is minimized over all δ_p. This is a linear least squares problem, which has a minimal solution when
\[ \frac{d}{d\delta_p}\left( \|r - J\delta_p\|^2 \right) = J^T(r - J\delta_p) = 0 \tag{3.14} \]

obtaining again the same normal equation J^T Jδ_p = J^T r. The Gauss-Newton method seeks the minimum of (3.13) by starting at a given parameter estimate p = p₀ and then iteratively solving the normal equation to find a suitable update p = p + δ_p which moves the cost function towards the global minimum. Unfortunately, the convergence of the algorithm is not guaranteed and it is possible to end up in a local minimum or not to converge at all. Whether the algorithm converges is mostly influenced by the quality of the initial estimate.


Figure 3.3. Example of a minimum overshoot caused by an inappropriately long step based on the Gauss-Newton approximation. The minimum of the second order approximation points towards a parameter estimate that increases the cost. Levenberg-Marquardt solves this by iteratively increasing the damping and reducing the step size: the damping term µ is increased at each iteration until a solution with a lower cost is found. In this case it took five iterations.

3.5. Levenberg-Marquardt

The rate at which the Gauss-Newton algorithm converges is not controlled. The second order approximation of the cost function does not always correspond well with the actual shape, and taking a step which is too large can lead to worse results instead of an improvement. Moreover, at a point close to the minimum there is a danger of overshooting the minimal cost. The step δ_p taken at each iteration must have a correct length in order to avoid these problems. A method that is able to control the step size adaptively and effectively is the Levenberg-Marquardt (LM) method, and it is this ability that made it very popular in solving least squares problems and thus in modern Bundle Adjustment. It is built upon the standard Gauss-Newton method of iteratively solving the normal equations; however, it introduces a new variable called the damping term µ, which is incorporated in the normal equation to create the augmented normal equation
\[ (J^T J + \mu I)\delta_p = J^T r \tag{3.15} \]
which allows controlling the solution in terms of step size and direction. When the damping term is very high, then (J^T J + µI) is nearly diagonal, making δ_p approximately equal to J^T r, which is the direction of gradient descent. Furthermore, the larger the value of µ, the smaller the steps δ_p taken, thus controlling the rate of convergence. The strategy of the LM algorithm is the following: at each step, if δ_p leads to a reduction in the cost function, then the step is accepted and the damping term is decreased. If, however, the step leads to an increase in the error, the damping is increased and the augmented normal equations are solved again, until a suitable step that decreases the cost function is found. This inner loop of finding acceptable parameter deltas is considered as one iteration.


In practice, the LM algorithm continuously decreases the damping term as long as the updates are accepted, meaning that the step size is increased if the steps are being taken in the correct direction. This behavior is appreciated, since it accelerates the convergence. When approaching the minimum, the step size may exceed the desired distance to the stationary point and the update leads to an increase in the cost. In that case, iteratively reducing the step size is the way to keep the algorithm descending towards the minimum. If the cost function at the current estimate is ill-approximated, then increasing the damping term ensures that steps are taken in the direction of gradient descent. LM thus alternates between the Gauss-Newton method (small damping terms) and gradient descent (large damping terms), taking the advantages of both. Numerous conditions exist for terminating the LM algorithm, ensuring that the iterations stop when convergence occurs.
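The damping strategy can be sketched in Matlab as follows; resfun returns the residual x − f(p), jacfun the Jacobian of f, and all names and update factors are illustrative placeholders rather than the actual SBA code, which controls the damping more carefully:

% one simplified LM iteration: adjust mu until an acceptable step is found
r = resfun(p);  J = jacfun(p);
g = J' * r;  cost = 0.5 * (r' * r);
accepted = false;
while ~accepted
    dp = (J' * J + mu * eye(numel(p))) \ g;  % augmented normal equation (3.15)
    rNew = resfun(p + dp);
    if 0.5 * (rNew' * rNew) < cost           % error decreased: accept the step
        p = p + dp;  mu = mu / 3;            % relax the damping
        accepted = true;
    else
        mu = mu * 2;                         % error increased: damp more, retry
    end
end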

3.6. Gradient descent

This is the simplest first order method to find the minimum of a non-linear function. Its principle is to follow the gradient of the function, assuming that H in equation (3.6) is close to the identity, and therefore
\[ \delta_x = -g \tag{3.16} \]
This approach works efficiently only if the Hessian actually is close to a multiple of the identity, which can be achieved by a technique called preconditioning. Knowing the direction of the gradient is, however, not enough to compute a δ_x that lowers the cost function; an appropriate step size must also be decided. To find a proper scale for δ_x a line search can be used, which solves the slightly adjusted equation
\[ \delta_x = -\alpha g \tag{3.17} \]
where α controls the step size.

Gradient descent by itself is not recommended due to its slow convergence, but it can be combined, for example, with the Gauss-Newton method to produce the LM algorithm described above.

3.7. Conjugate gradient

An alternative approach to solving a system of linear equations
\[ Ax = b \tag{3.18} \]
such as the normal equations, is the Conjugate Gradient (CG) method introduced by Hestenes and Stiefel. It is based on the method of conjugate directions, which seeks the minimizer by driving the residual r = Ax − b to zero along a sequence of orthogonal search directions, taking in each of those directions the step that minimizes the function. If the minimization is done over a parameter space R^N, convergence is guaranteed in at most N steps, as illustrated in Figure 3.4.

The conjugate directions method is an iterative process, starting at an initial parameter estimate x₀ and taking steps in directions d₀, d₁, d₂, ..., d_N according to
\[ x_{i+1} = x_i + \alpha_i d_i \tag{3.19} \]

Figure 3.4. A conjugate direction method in a 2-dimensional parameter space - the convergence occurs after 2 steps [18].

At each step, e_i = x_i − x is the distance of our parameter estimate from the solution. The task is to find the adequate step size α_i that minimizes the residual error
\[ r_i = Ax_{i+1} - b \tag{3.20} \]
in the direction d_i. That occurs when d_i and e_{i+1} are orthogonal, meaning that we do not need to optimize in this direction anymore, because in this direction the parameters are already optimal. That yields α_i as the solution to
\[ d_i^T e_{i+1} = 0 \tag{3.21} \]
\[ d_i^T (e_i + \alpha_i d_i) = 0 \tag{3.22} \]
\[ \alpha_i = -\frac{d_i^T e_i}{d_i^T d_i} \tag{3.23} \]

However, to solve this, e_i would have to be known, which would mean the system is already solved. Instead of the orthogonality of the search directions d_i, the conjugacy (A-orthogonality) is therefore required. Two vectors d_i and d_j are conjugate if
\[ d_i^T A d_j = 0 \tag{3.24} \]
The condition for minimizing r_i then changes to d_i^T A e_{i+1} = 0, leading to
\[ \alpha_i = -\frac{d_i^T r_i}{d_i^T A d_i} \tag{3.25} \]

The next task is to obtain the set of conjugate directions d_i to be taken. That was originally done by a conjugate Gram-Schmidt process, which required all previous steps to be stored in memory and O(n³) operations to compute the whole set of directions. This problem was solved by the conjugate gradient method, which uses conjugation of the residuals to construct a new search direction. This is very useful, since the residual has the helpful property of being orthogonal to the previous search directions, which implies that it is also orthogonal to all previous residuals. Each new residual is just a


linear combination of the previous ones, forming a Krylov subspace – a subspace created by repeatedly applying a matrix to a vector [18].

The resulting iterative Conjugate Gradient algorithm looks as follows:
\[ d_0 = r_0 = b - Ax_0 \]
\[ \alpha_i = \frac{r_i^T r_i}{d_i^T A d_i} \]
\[ x_{i+1} = x_i + \alpha_i d_i \]
\[ r_{i+1} = r_i - \alpha_i A d_i \]
\[ \beta_{i+1} = \frac{r_{i+1}^T r_{i+1}}{r_i^T r_i} \]
\[ d_{i+1} = r_{i+1} + \beta_{i+1} d_i \]
Iterating over i, the exact solution x is found after at most N steps. The CG algorithm can be generalized to minimize an arbitrary function, forming the non-linear conjugate gradient algorithm.

This implies two possible employments of the CG algorithm for the bundle adjustment problem. Either CG is applied to the normal equations within one iteration of the Gauss-Newton algorithm, or the non-linear CG is used directly to solve the bundle adjustment problem. Using CG to solve the normal equations raises the question whether it is necessary to find the exact solution. Since the step taken in a Gauss-Newton iteration is itself only an approximation, it has been shown [11, 4] that it is worthwhile to stop the CG iteration cycle early, before reaching N iterations. This approach is considered an inexact Newton method. A number of criteria have been developed to decide when to terminate the CG cycle in order to get a substantial performance improvement while not losing accuracy.

What is interesting from the BA point of view is the substantial speed improvement which can be achieved using an adjusted version of CG. The matrix A in the case of the normal equations has the form J^T J, which is an expensive matrix-matrix multiplication. This can be avoided using a simple trick of reforming the equations
\[ \alpha_i = \frac{r_i^T r_i}{d_i^T A d_i} = \frac{r_i^T r_i}{d_i^T J^T J d_i} = \frac{r_i^T r_i}{(J d_i)^T (J d_i)} \tag{3.26} \]
\[ r_{i+1} = r_i - \alpha_i J^T (J d_i) \tag{3.27} \]
and carrying out only the two matrix-vector multiplications ω_i = J d_i and J^T ω_i. The larger the BA problem, the more computation time is saved. This makes the CG method very popular for solving systems with a large J [11], but in [4] it has been shown that CG exhibits very slow convergence in the case of cameras lying on a circle, pointing outwards and overlooking the scene. Such a setup is, however, typical for panoramic cameras.
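A Matlab sketch of CG applied to the normal equations, using the reformulation (3.26)–(3.27) so that J^T J is never formed explicitly; the function name and the stopping rule are illustrative assumptions:

function dp = cgNormalEq(J, g, tol, maxIt)
% Solve (J'*J)*dp = g by conjugate gradients without forming J'*J.
    dp = zeros(size(J, 2), 1);          % start from dp = 0
    r = g;  d = r;                      % initial residual and search direction
    for i = 1:maxIt
        w = J * d;                      % omega_i = J*d_i, eq. (3.26)
        alpha = (r' * r) / (w' * w);
        dp = dp + alpha * d;
        rNew = r - alpha * (J' * w);    % eq. (3.27)
        if norm(rNew) < tol, break; end
        beta = (rNew' * rNew) / (r' * r);
        d = rNew + beta * d;
        r = rNew;
    end
end

Calling dp = cgNormalEq(J, J'*r, tol, k) with k < N then yields the inexact Gauss-Newton step discussed above.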

3.8. Lourakis' adaptation of Sparse Bundle Adjustment

A complete BA method using state of the art techniques was introduced by Lourakis and Argyros in the article SBA: A Software Package for Generic Sparse Bundle Adjustment, and it has become very successful and widely used in modern bundle adjustment applications. It comes not only with a comprehensive description of the algorithm and implementation details, but also with complete source code and a Matlab interface.


The core of the SBA algorithm is the Levenberg-Marquardt iterative method described above. To utilize the sparsity that appears in the Jacobian of BA problems for the sake of better performance, techniques of reordering the matrices and then solving separately for camera and point parameters are used. Assuming no shared parameters between cameras and points, the matrix J used in equation (3.15) can be arranged very efficiently. An example of a properly arranged matrix J for a system with 3 cameras and 4 points is given in equation (3.28). Clearly the matrix has a very sparse structure.
\[
J = \begin{bmatrix}
A & 0 & 0 & B & 0 & 0 & 0 \\
0 & A & 0 & B & 0 & 0 & 0 \\
0 & 0 & A & B & 0 & 0 & 0 \\
A & 0 & 0 & 0 & B & 0 & 0 \\
0 & A & 0 & 0 & B & 0 & 0 \\
0 & 0 & A & 0 & B & 0 & 0 \\
A & 0 & 0 & 0 & 0 & B & 0 \\
0 & A & 0 & 0 & 0 & B & 0 \\
0 & 0 & A & 0 & 0 & B & 0 \\
A & 0 & 0 & 0 & 0 & 0 & B \\
0 & A & 0 & 0 & 0 & 0 & B \\
0 & 0 & A & 0 & 0 & 0 & B
\end{bmatrix} \tag{3.28}
\]

The elements A and B denote the derivatives of the projection function f(a_i, b_j) over the m cameras and n points, the former with respect to the camera parameters a_j and the latter with respect to the point parameters b_i. Using this form of the Jacobian and a corresponding rearrangement of the parameter vector and the gradient, the augmented normal equations then take the following form

correspond-ing rearrangement of the parameter vector and the gradient, the augmented normal equations then take following form

\[
\begin{bmatrix}
U & 0 & 0 & W & W & W & W \\
0 & U & 0 & W & W & W & W \\
0 & 0 & U & W & W & W & W \\
W & W & W & V & 0 & 0 & 0 \\
W & W & W & 0 & V & 0 & 0 \\
W & W & W & 0 & 0 & V & 0 \\
W & W & W & 0 & 0 & 0 & V
\end{bmatrix}
\begin{bmatrix}
\delta_{a_1} \\ \delta_{a_2} \\ \delta_{a_3} \\ \delta_{b_1} \\ \delta_{b_2} \\ \delta_{b_3} \\ \delta_{b_4}
\end{bmatrix}
=
\begin{bmatrix}
\epsilon_{a_1} \\ \epsilon_{a_2} \\ \epsilon_{a_3} \\ \epsilon_{b_1} \\ \epsilon_{b_2} \\ \epsilon_{b_3} \\ \epsilon_{b_4}
\end{bmatrix} \tag{3.29}
\]

Normal equations formed this way show a block sparsity and a distinct separation of variables, suggesting that they can be computed in an efficient manner. The blocks U, V and W are formed as
\[ U_j = \sum_{i=1}^{n} A_{ij}^T \Sigma_{x_{ij}}^{-1} A_{ij} + \mu I \tag{3.30} \]
\[ V_i = \sum_{j=1}^{m} B_{ij}^T \Sigma_{x_{ij}}^{-1} B_{ij} + \mu I \tag{3.31} \]
\[ W_{ij} = A_{ij}^T \Sigma_{x_{ij}}^{-1} B_{ij} \tag{3.32} \]

The right hand side containing the gradient is composed as
\[ \epsilon_{a_j} = \sum_{i=1}^{n} A_{ij}^T \Sigma_{x_{ij}}^{-1} \epsilon_{ij} \tag{3.33} \]
\[ \epsilon_{b_i} = \sum_{j=1}^{m} B_{ij}^T \Sigma_{x_{ij}}^{-1} \epsilon_{ij} \tag{3.34} \]


where
\[ \epsilon_{ij} = x_{ij} - \hat{x}_{ij} \tag{3.35} \]
with x_ij being the measured and x̂_ij the predicted vector of image point coordinates. Σ^{-1}_{x_ij} is the inverse covariance matrix for the projection of point i into camera j, coming from the weighted least squares cost function in equation (3.3). If there is no prior knowledge about the covariance of the measurements, the matrix Σ^{-1}_{x_ij} can be replaced by an identity matrix. The damping term µ is added to U and V since they are the diagonal blocks of J^T J. Having this block form of the augmented normal equations, it

can be further rewritten as
\[ \begin{bmatrix} U & W \\ W^T & V \end{bmatrix} \begin{bmatrix} \delta_a \\ \delta_b \end{bmatrix} = \begin{bmatrix} \epsilon_a \\ \epsilon_b \end{bmatrix} \tag{3.36} \]
to treat it in a more compact way. Here comes the trick of utilizing the special structure of the BA normal equation matrix. Solving this system directly, using the inverse of the matrix J^T J, would be too expensive. An intelligent solution is to calculate the inverse only for certain blocks, preferably the ones which are diagonal. Let us consider the two equations that are described by (3.36)
\[ U\delta_a + W\delta_b = \epsilon_a \tag{3.37} \]
\[ W^T\delta_a + V\delta_b = \epsilon_b \tag{3.38} \]
Expressing δ_b from the second equation yields
\[ \delta_b = V^{-1}(\epsilon_b - W^T\delta_a) \]
and substituting into the first equation we obtain
\[ U\delta_a + WV^{-1}(\epsilon_b - W^T\delta_a) = \epsilon_a \tag{3.39} \]
\[ (U - WV^{-1}W^T)\delta_a = \epsilon_a - WV^{-1}\epsilon_b \tag{3.40} \]
which can be expressed by a new matrix equation
\[ \begin{bmatrix} U - WV^{-1}W^T & 0 \\ W^T & V \end{bmatrix} \begin{bmatrix} \delta_a \\ \delta_b \end{bmatrix} = \begin{bmatrix} \epsilon_a - WV^{-1}\epsilon_b \\ \epsilon_b \end{bmatrix} \]

These reformed equations have a major advantage over the previous ones – the dependency of the camera parameters on the structure parameters has been eliminated. The system can thus be solved first for the camera parameters, which is generally much more efficient because the number of parameters for the cameras is much smaller than for the structure. Computing the inverse of the dense matrix U − WV^{-1}W^T will therefore be significantly faster than it would be in the case of solving for the structure parameters. The matrix U − WV^{-1}W^T is called the Schur complement of V. Since the matrix V is positive definite, the Schur complement must also be positive definite [17], which allows efficient solving using the Cholesky factorization. Once solved for δ_a, the solution can be back-substituted into equation (3.38) to get
\[ \delta_b = V^{-1}(\epsilon_b - W^T\delta_a) \tag{3.41} \]


and solve for δ_b. The inverse matrix V^{-1} was already computed in the previous step and its calculation is easy, since V only contains blocks on the diagonal.
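A minimal Matlab sketch of this two-stage solve, for blocks U, V, W and right-hand sides ea, eb assembled as in (3.30)–(3.34); in the actual SBA code V is inverted block by block, which this dense illustration glosses over:

% solve [U W; W' V] * [da; db] = [ea; eb] via the Schur complement of V
Vinv = inv(V);                  % cheap in SBA, since V is block-diagonal
S  = U - W * Vinv * W';         % Schur complement: the reduced camera system
da = S \ (ea - W * Vinv * eb);  % solve for the camera updates first
db = Vinv * (eb - W' * da);     % back-substitution, eq. (3.41)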

Finding the deltas of the camera and structure parameters is done in every iteration of the LM cycle, and if it leads to a satisfying decrease in the error, the step is accepted and the parameters are updated to their new values. The criterion for a satisfying error decrease is defined by the gain ratio
\[ \rho = \frac{\|\epsilon_p\|^2 - \|x - f(p_{new})\|^2}{\delta_p^T(\mu\delta_p + g)} \tag{3.42} \]
i.e. the ratio of the actual reduction in the error ‖ǫ_p‖² after taking a step δ_p and the reduction predicted for δ_p by the linear model f(p + δ_p) ≈ f(p) + Jδ_p. The sign of the gain ratio determines whether the step will be accepted (ρ > 0) or declined (ρ < 0). The magnitude of the gain ratio then controls the amount by which the damping term is reduced if the step is accepted, according to the following formula
\[ \mu = \mu \cdot \max\!\left(\tfrac{1}{3},\, 1 - (2\rho - 1)^3\right) \tag{3.43} \]

When the step is refused, the damping term is increased by a term v and v itself is multiplied by 2, so the damping term increases exponentially until an acceptable step is found. Once it is found, v is reset back to 2. To terminate the BA, various thresholds, parameters and conditions have been introduced. The whole LM algorithm runs until one of the following criteria is met:

1. The magnitude of the gradient drops below a threshold ε1
2. The relative magnitude of δp drops below a threshold involving ε2
3. The magnitude of the residual ǫ drops below a threshold ε3
4. The relative reduction in the magnitude of the residual ǫ drops below a threshold ε4
5. The maximum number of iterations has been reached

The SBA algorithm has been tested on various scenes and it has proven its qualities by reaching execution times orders of magnitude faster than a naive implementation of a dense, general purpose LM algorithm [13]. Its performance on large scale image sets and also its free public availability were the reasons for choosing to build the proposed PanCam Bundle Adjustment on this algorithm.


4. PanCam bundle adjustment

In this section a new bundle adjustment method, further referred to as PanCam BA, will be proposed. After investigating the possibilities, the SBA from Lourakis and Argyros was chosen as the base method. Other methods could have been used as well, but the advantages they bring, mostly in terms of speed, were not crucial for the PanCam BA. The SBA from Lourakis and Argyros provides solid ground to build on, since it has been widely used in real applications and has very thorough documentation. Also, the availability of this algorithm in the CMP Web Service makes it possible to compare the results and validate the implementation.

The proposed method imposes constraints on the camera parameters through the use of special projection functions, tailored for each particular camera system. Some parameters of those functions are shared by certain groups of cameras. That leads to a different structure of the matrix J in (3.15) and raises the question of how to handle the new system of equations.

4.1. Requirements

Lourakis' bundler provides a standard method for the general bundle adjustment requirements of today. In its generality, it is able to consider any camera positions, orientations and calibrations. However, in some cases it would be useful and maybe even necessary to impose certain constraints on the system. Such a situation arises when we have certain knowledge about the structure of the system, for example a geometric relation between camera centers, or knowing that the calibration parameters of all cameras are the same. Fixing some parameters at known values or sharing them between cameras could lead to improved quality of the result, as it is supported by the original properties of the system. When implementing mathematical constraints for a specific camera type (or setup of cameras), it is necessary to correctly analyze the physical nature of the system. For example, when considering a rotating PanCam platform creating a panoramic picture of the surroundings (see Figure 4.1), several constraints arise. Naturally, the camera centers for each individual picture around the rotation lie on a circle. Thus the X, Y, Z coordinates of the camera centers are not free, but are bound by a specific rotation center, axis and diameter. The angle by which the camera turned around the axis determines the position of the camera center. Ladybug camera types also have the distribution of cameras around the center fixed, and the cameras form a rigid system. We can determine the camera centers around the axis once and then use this information in Bundle Adjustment for all other measurements where the camera system was in a different position. By the nature of the problem, better results in terms of pixel error are not guaranteed, but the resulting system should correspond better with reality.

4.2. Parameters and projection functions

Choosing the parameters is mainly a matter of finding possible interpretations of the projection function (2.1). Since bundle adjustment deals with sets of n points observed by m cameras, the parameters are divided into vectors a_j and



Figure 4.1. A typical operation of the ExoMars Rover. The PanCam image network is created at scheduled stops, when PanCam creates a full panoramic image of the surroundings. The 3D model created from those images is a basis for precise localization of the rover and for surface mapping.

b_i for the i-th point and j-th camera. For b_i the parametrization is simple: each point is described by its homogeneous coordinates
\[ b_i = [X_1, X_2, X_3, X_4]^T \tag{4.1} \]

For cameras, there is a variety of options for how to select the parameters. Based on the previous investigation, three different parametrization methods were chosen. While all of them describe the same projection and can be easily converted from one to another, the individual parameters express different properties of the system, which is necessary in order to introduce constraints. Furthermore, the behavior of BA is different for each type of parametrization. The subspace of parameter values can have a significant influence on the result of BA, which was proven later in the experiments.

4.2.1. P - using the camera matrix

The first and the most obvious parametrization comes directly from the projection function (2.11). Denoting the camera matrix elements as
\[ \begin{bmatrix} p_1 & p_2 & p_3 & p_4 \\ p_5 & p_6 & p_7 & p_8 \\ p_9 & p_{10} & p_{11} & p_{12} \end{bmatrix} \tag{4.2} \]

and the world point coordinates being [X_1, X_2, X_3, X_4]^T, the projection function then takes the following form
\[ f(X) = \begin{bmatrix} \dfrac{p_1 X_1 + p_2 X_2 + p_3 X_3 + p_4 X_4}{p_9 X_1 + p_{10} X_2 + p_{11} X_3 + p_{12} X_4} \\[2ex] \dfrac{p_5 X_1 + p_6 X_2 + p_7 X_3 + p_8 X_4}{p_9 X_1 + p_{10} X_2 + p_{11} X_3 + p_{12} X_4} \end{bmatrix} \tag{4.3} \]
revealing all the parameters needed to describe the projection. In this relation, the camera is described by a set of 12 parameters [p_1 ... p_12], corresponding to the elements of P. This parametrization is the easiest to implement, since the parameter vector can be formed just by taking the elements of the projection matrix, and the projection function


Figure 4.2. Visualization of the camera parameters in world coordinates. The vector C describes the displacement of the camera center, R represents the rotation of the camera's principal axis, f_x and f_y describe the focal length in terms of pixel x and y dimensions respectively, and (x_0, y_0) are the coordinates of the image center. The skew factor s is not displayed, since here, as in the majority of applications, it is zero.

has a rather neat form. Such a representation, however, brings certain disadvantages. The parameters do not really correspond to any meaningful physical property of the camera, as these are all hidden inside the resulting camera matrix. For this reason, no special requirements, such as constraints or fixation of parameters, can be imposed on the Bundle Adjustment process. An advantage, however, is the simple calculation of the projection function and its jacobian. This parametrization cannot efficiently utilize the panoramic nature of the camera, but it serves as a baseline method for comparing the results. The camera parameter vector has the following form
\[ a_j = [p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8, p_9, p_{10}, p_{11}, p_{12}] \tag{4.4} \]

4.2.2. KQT - Calibration, rotation, translation

It would be convenient to grasp the physical properties of the camera and control their values during bundle adjustment directly. Looking back at Section 2.1, it is clear that this can be achieved by decomposing the camera matrix P according to equation (2.2). Forming the projection function from the components of P rather than the matrix itself gives a hold of the physically meaningful properties of the camera - the calibration given by the matrix K, the rotation given by R and the position of the camera center included in the vector C. The parameters defining the calibration matrix are shown in equation (2.7), the camera center is defined by a triplet [X, Y, Z], and the only thing left to be determined are the parameters describing the rotation matrix. At least 3 parameters are required to uniquely define a rotation matrix. A possible representation would be either the Euler angles, quaternions or some well behaved small rotation approximation, such as the Rodrigues formula, local Euler angles, etc. According to [22], Euler angles should be avoided in favor of numerically more stable representations. For this work, quaternions were chosen as the parameters describing rotations.


The camera parameter vector for this version looks like
\[ a_j = [f_x, f_y, x_0, y_0, s, q_1, q_2, q_3, q_4, c_1, c_2, c_3] \tag{4.5} \]

and its elements are visualized in Figure 4.2. To employ these parameters in bundle adjustment, it is necessary to express the projection function in terms of them. Calculating a projection using this particular parametrization requires computing the camera matrix P from the elements of a_j according to equation (2.2). The matrix R is needed to proceed with the computation, therefore it must be formed from the 4 quaternion parameters. This parametrization is an alternative to (4.4) and it has some important advantages. Having direct, separate access to the individual camera properties, it is easy to control them as desired. It allows designing a method which imposes constraints on those parameters, such as fixing certain parameters or sharing them among cameras. One can, for example, choose to keep the camera calibration fixed by not adjusting the calibration parameters during BA, or instead adjust the calibration parameters, but in the same way for all the cameras. This approach is useful, for example, when it is known that all pictures were taken by the same camera. A fixed calibration is useful especially when the intrinsic parameters of the camera have been found precisely beforehand. When no special treatment of the parameters is enforced, this parametrization should deliver the same results as the previous one, up to numerical stability.
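A sketch of this composition in Matlab, assembling P from the parameter vector of (4.5); the function name kqt2P and the quaternion convention q = [q1 q2 q3 q4] = [w x y z] are assumptions of this illustration, not the thesis implementation:

function P = kqt2P(a)
% a = [fx, fy, x0, y0, s, q1, q2, q3, q4, c1, c2, c3], cf. eq. (4.5)
    K = [a(1) a(5) a(3);
         0    a(2) a(4);
         0    0    1   ];              % calibration matrix, eq. (2.7)
    q = a(6:9) / norm(a(6:9));         % enforce ||q|| = 1
    w = q(1); x = q(2); y = q(3); z = q(4);
    R = [1-2*(y^2+z^2)  2*(x*y-w*z)    2*(x*z+w*y);
         2*(x*y+w*z)    1-2*(x^2+z^2)  2*(y*z-w*x);
         2*(x*z-w*y)    2*(y*z+w*x)    1-2*(x^2+y^2)];  % quaternion to rotation
    C = reshape(a(10:12), 3, 1);       % camera center
    P = K * [R, -R*C];                 % eq. (2.2)
end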

4.2.3. PanCam - spinning around

Although a satisfactory parametrization for most traditional cases has already been found, the task of the proposed BA technique is to incorporate constraints specific to PanCam-like cameras. As mentioned in Section 4.1., the requirement for a general panoramic camera is that the centers of the individual cameras rotate around the same axis with a fixed radius. Such a circle is described in the 3D world frame by its center $C_c$, the orientation of its normal vector $R_c$ and a radius $\rho$. Each camera center lying on this circle is then defined by a single parameter $\varphi$, specifying the angle by which the camera has spun around the axis with respect to some reference point. The parameters $c_1, c_2, c_3$ from (4.5) become unnecessary and are replaced by the aforementioned circle parameters, from which the camera position can be expressed using a functional relation. One such function defining a camera position on a circular trajectory in 3D space can be constructed as follows. First, consider a circle in a 2D coordinate frame, or, from a different point of view, a circle whose points all have a Z coordinate of 0. Also assume that the center of the circle is at the origin. A camera lying on this circle has the following coordinates

$$C = \begin{bmatrix} \rho \cos\varphi \\ \rho \sin\varphi \\ 0 \end{bmatrix} \qquad (4.6)$$

Now, going back to the 3D world, the orientation of this circle in space is defined by its normal vector. In this case, the unit normal vector of the circle is $[0, 0, 1]^T$, thus it has the same direction as the Z-axis. If the circle is supposed to take a different orientation in space, this can be achieved by rotating its normal vector.


Figure 4.3. How to put a camera on an arbitrary circle in space. a) First, the camera center C′′ lies on a circle in the X-Y plane and its coordinates are determined by a radius r and an angle φ as in equation (4.6). b) The circle can be arbitrarily rotated by a rotation R, visualized here by two arrows representing two successive rotations. Any such rotation can be described by a quaternion. The corresponding camera center C′ is obtained by applying this rotation to C′′. c) If the whole circle is also shifted in space, the resulting camera center coordinates C are calculated by adding the circle center vector $C_c$ to C′.

Such a rotation can again be described by a quaternion, and applying it to the circle moves a camera center lying on the circle in the same way. The new camera center is then

$$C = R \begin{bmatrix} \rho \cos\varphi \\ \rho \sin\varphi \\ 0 \end{bmatrix} \qquad (4.7)$$

where R is the rotation matrix created from the quaternion $(q_{c1}, q_{c2}, q_{c3}, q_{c4})$. Finally, the circle certainly does not have to be centered at the origin, but can be arbitrarily shifted by a vector $C_c = [C_{cx}, C_{cy}, C_{cz}]^T$. This results in a complete functional description of a camera center lying on some circle in space

$$C = R \begin{bmatrix} \rho \cos\varphi \\ \rho \sin\varphi \\ 0 \end{bmatrix} + \begin{bmatrix} C_{cx} \\ C_{cy} \\ C_{cz} \end{bmatrix} \qquad (4.8)$$

The whole transformation process is shown in Figure 4.3. Every camera can be described by its position on some circle. Obviously, a single camera can be said to lie on an infinite number of different circles at the same time; for three cameras, however, the circle is already uniquely defined. This representation is thus feasible for setups where more than two pictures were taken by cameras spinning along the same trajectory, such as in Figure 4.4, which works perfectly in the case of a PanCam and similar types of cameras. After including the parameters of the circle and dropping the now unused parameters of the single camera center, the parameter vector takes the following form

$a_j = [f_x, f_y, x_0, y_0, s, q_1, q_2, q_3, q_4, q_{c1}, q_{c2}, q_{c3}, q_{c4}, C_{cx}, C_{cy}, C_{cz}, \rho, \varphi]$ (4.9)
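For illustration, equation (4.8) translates directly into a few lines of MATLAB (a hypothetical sketch; center_on_circle is an illustrative name, and quat2rot is the quaternion-to-rotation helper sketched earlier):

function C = center_on_circle(qc, Cc, rho, phi)
    % Camera center on a circle in space, equation (4.8):
    % qc  - quaternion giving the orientation of the circle
    % Cc  - 3x1 center of the circle
    % rho - radius, phi - per-camera spin angle
    R = quat2rot(qc / norm(qc));
    C = R * [rho*cos(phi); rho*sin(phi); 0] + Cc(:);
end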


Figure 4.4. An example of 5 cameras sharing the same circle, capturing several points in space.

4.3. Sharing parameters

As mentioned before, introducing complex projection functions such as (4.8) makes each camera rely not only on its own parameters, but also on parameters common to a whole camera group. This creates parameters which influence whole groups of cameras at once. For example, the estimated coordinates of the center of rotation influence all cameras that have been rotated around this center. These shared parameters can be expressed in a vector s. The whole parameter vector is then separated as p = (s, a, b), denoting the shared, camera and point parameters respectively. This causes the Jacobians of the projection functions to be non-zero with respect to the shared parameters, and the matrix J from (3.28) then appears as

$$J = \begin{bmatrix}
S & A & 0 & 0 & B & 0 & 0 & 0\\
S & 0 & A & 0 & B & 0 & 0 & 0\\
S & 0 & 0 & A & B & 0 & 0 & 0\\
S & A & 0 & 0 & 0 & B & 0 & 0\\
S & 0 & A & 0 & 0 & B & 0 & 0\\
S & 0 & 0 & A & 0 & B & 0 & 0\\
S & A & 0 & 0 & 0 & 0 & B & 0\\
S & 0 & A & 0 & 0 & 0 & B & 0\\
S & 0 & 0 & A & 0 & 0 & B & 0\\
S & A & 0 & 0 & 0 & 0 & 0 & B\\
S & 0 & A & 0 & 0 & 0 & 0 & B\\
S & 0 & 0 & A & 0 & 0 & 0 & B
\end{bmatrix} \qquad (4.10)$$

where the S blocks denote the derivatives of the projection functions with respect to the shared parameters, resulting in the matrix $J^T J$ in (3.29) losing some of its sparse structure. Two methods of handling this situation were considered. One was to use the same approach as in Section 3.5 and calculate the Schur complement for the matrices containing also S. The other was to remove the shared parameters from (4.10) and calculate them separately, after first updating the regular parameters. The first approach is more straightforward, but would produce a larger, non-sparse matrix U, which would now also contain the shared camera parameters. More calculations would also be needed to form $J^T J$ itself. In order not to raise the number of necessary computations, the second method was chosen.

The augmented normal equations (3.29) remain in the same form, only the size of the blocks A is reduced by excluding the shared parameters and calculating the derivatives only with respect to the regular parameters individual to each camera. A separate equation is solved for the shared parameters s in the form

$$(J_s^T J_s + \mu_s I)\,\delta_s = J_s^T \epsilon_s \qquad (4.11)$$

where $\mu_s$ is the damping parameter for the shared parameters, and the matrix $J_s$ is formed in the following fashion

$$J_s = \begin{bmatrix}
S_{11} & 0 & S_{31}\\
S_{12} & 0 & S_{32}\\
S_{13} & 0 & S_{33}\\
S_{14} & 0 & S_{34}\\
0 & S_{25} & S_{35}\\
0 & S_{26} & S_{36}\\
0 & S_{27} & S_{37}
\end{bmatrix} \qquad (4.12)$$

The blocks $S_{ij}$ are the derivatives of all projection functions of camera j with respect to shared parameter i. The horizontal dimension of the matrix is therefore the total number of parameters that are shared among some cameras, and the vertical dimension is given by the number of projections from all cameras that share parameters. As can be observed in (4.12), the blocks S appear only in places where a shared parameter influences the projections of a specific camera. This structure is very flexible and almost any combination of parameters can be incorporated into the Jacobian. For example, in (4.12) the first parameter is shared between cameras 1-4, the second parameter between cameras 5-7, and the third parameter is shared by all the cameras. This composition can easily incorporate the parameters of multiple panoramic cameras. The vector $\delta_s$ is composed accordingly,

$$\delta_s = \begin{bmatrix} \delta_{s_1} \\ \delta_{s_2} \\ \vdots \\ \delta_{s_n} \end{bmatrix} \qquad (4.13)$$

and $\epsilon_s$ is the vector of all projection errors, whose elements are formed as in (3.35). The deltas of the shared parameters are then found by solving (4.11) as a dense system, which is acceptable, since the size of $J_s^T J_s$ depends only on the number of shared parameters. For example, even such a complicated setup as 10 different panoramas, each cluster sharing its circular trajectory parameters according to Section 4.2.3, would result only in an 80 × 80 matrix (eight circle parameters per panorama). The original SBA requires solving for the camera parameters using a dense inverse as well, and the first approach, which would incorporate the shared parameters into the Jacobian directly, would require computing the inverse of a larger dense matrix in (3.40).
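A possible way of assembling $J_s$ and solving (4.11) is sketched below, under the assumption that a logical mask share(i,j) records which shared parameter influences which camera and that Sblock(i,j) returns the stacked derivatives of all projections of camera j with respect to shared parameter i (both are illustrative names, not the thesis code):

function ds = shared_update(share, Sblock, nproj, eps_s, mu_s)
    % Assemble the shared-parameter Jacobian of (4.12) and solve (4.11).
    [nshared, ncam] = size(share);
    rows = [0; cumsum(2*nproj(:))];       % two residuals (x,y) per projection
    Js = zeros(rows(end), nshared);
    for j = 1:ncam
        for i = 1:nshared
            if share(i,j)                 % S blocks only where parameter i
                Js(rows(j)+1:rows(j+1), i) = Sblock(i,j);  % affects camera j
            end
        end
    end
    % Damped dense solve of (4.11); the system is only nshared-by-nshared.
    ds = (Js'*Js + mu_s*eye(nshared)) \ (Js'*eps_s);
end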

Considering 10 panoramas, each consisting of 18 pictures, the size of $(U - WV^{-1}W^T)$ in (3.40) would be 1880 × 1880. Choosing the second approach instead requires calculating the inverses of two matrices, one 1800 × 1800 and the other 80 × 80. This might not seem like a huge advantage, but consider, for example, a solution using Gauss-Jordan elimination, whose cost grows with the cube of the matrix size.


Figure 4.5. A visualization of the alternation method minimizing a function of two variables, starting at the initial estimate $P_0$ and ending up at $P$. Two successive steps in the parameters, $\delta P_r$ and $\delta P_s$, are taken for the regular and shared parameters respectively using the alternation method. The standard method would make only one step $\delta P$ towards the minimum. It is important to realize that the two approaches can end up at different places in the parameter space.

The second approach then requires approximately $1800^3 + 80^3 \approx 5.83 \times 10^9$ computations, whereas the first method requires $1880^3 \approx 6.64 \times 10^9$. That is roughly a 12% improvement in terms of speed.
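As a quick check of this arithmetic under the cubic cost model (illustrative only):

% Operation counts for the two approaches under the O(n^3) model.
first  = 1880^3;           % shared parameters folded into one dense solve
second = 1800^3 + 80^3;    % alternation: two smaller dense solves
fprintf('%.2e vs %.2e: %.0f%% fewer operations\n', ...
        first, second, 100*(1 - second/first));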

4.4. Method of alternation

Separating the shared and the regular parameters into two groups and solving for each of them separately is called the alternation method. The process is split into two phases, in which one group of parameters is updated while the other is held fixed. A comparison to the regular approach is visualized in Figure 4.5.

Two damping parameters, as well as two sets of stopping conditions, are used, one for each update phase. This allows better control of the convergence. When the update of either set of parameters would cause an increase in the cost function, the corresponding damping term is increased in order to shorten the step, until proper parameter deltas are found. If no error-reducing update is found for one set of parameters, the other set can still be updated freely, which leaves more flexibility for the algorithm. For example, reaching a local minimum with respect to the shared parameters will not cause the process to stop, but only keeps the shared parameters as they are while the regular ones are updated. The damping term for one set can also be decreased if the other set has been successfully updated, which avoids blocking one set of parameters forever due to the damping being too high and the resulting deltas therefore too small. The order of the parameter updates can be chosen arbitrarily; the current implementation updates the regular parameters first.
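One iteration of this scheme might be sketched as follows (a minimal, hypothetical structure, not the thesis code; cost, solve_regular and solve_shared stand in for the error function, the augmented normal equations (3.29) and the dense system (4.11)):

% One iteration of the alternation method with separate damping terms.
% Phase 1: regular (camera and point) parameters, shared ones fixed.
dp = solve_regular(p, s, mu_r);
if cost(p + dp, s) < cost(p, s)
    p = p + dp;  mu_r = mu_r / 10;   % step accepted, relax the damping
else
    mu_r = mu_r * 10;                % step rejected, shorten the next one
end
% Phase 2: shared parameters, regular ones fixed.
ds = solve_shared(p, s, mu_s);
if cost(p, s + ds) < cost(p, s)
    s = s + ds;  mu_s = mu_s / 10;
else
    mu_s = mu_s * 10;
end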


5. Implementation

This section describes the details of the implementation of the proposed PanCam Bundle Adjustment method. First, the core algorithm for sparse bundle adjustment was created, based on the SBA from Lourakis and Argyros [13]. This algorithm was tested and evaluated in order to ensure that it provides the same results as the original SBA. After confirming that the core algorithm works as intended, thorough changes were made in order to provide the extended functionality for PanCam-like cameras. Further improvements were made to support large-scale operation setups, where multiple panoramic cameras are used. This resulted in a very flexible and general bundle adjustment method, which is able to incorporate constrained, fixed and shared parameters of multiple devices in one bundle adjustment process. The major part of the program was written in MATLAB; only the most computationally demanding parts were written in C and used through the MEX interface in MATLAB. To describe the algorithm, this section goes step by step through the designed bundle adjustment process.

5.1. Obtaining the parameters

In order to start the bundle adjustment, correct parameter vectors must be provided. Three functions for extracting the parameter vector, shown in Table 5.1,

Table 5.1. Functions creating the parameter vector.

Function                 Product
get_parameters           parameters according to (4.4)
get_parameters_kqt       parameters according to (4.5)
get_parameters_pancam    parameters according to (4.9)

were developed to accommodate all three parametrizations mentioned in Section 4.2. All of the functions produce the parameter vector in the form (a, b), where a and b are the parameters of all cameras and all points respectively. The procedure for forming b is the same in each function: it simply takes the homogeneous coordinates of all points, arranging them in a matrix

$$\begin{bmatrix} b_{1x} & b_{2x} & \cdots & b_{nx}\\ b_{1y} & b_{2y} & \cdots & b_{ny}\\ b_{1z} & b_{2z} & \cdots & b_{nz}\\ 1 & 1 & \cdots & 1 \end{bmatrix} \qquad (5.1)$$

where n is the total number of points and b therefore has dimensions 4 × n.
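Assuming the point coordinates are available as an n × 3 array X, this packing is a one-liner in MATLAB (illustrative):

b = [X'; ones(1, size(X,1))];   % 4-by-n homogeneous point coordinates (5.1)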

