Canonical Representation of sidescan sonar images for robotics application
HONGSHENG CHANG
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Master in Systems, Control and Robotics
Date: February 25, 2021
Supervisor: John Folkesson
Examiner: Patric Jensfelt
Abstract
Sidescan sonar (SSS) is crucial in underwater investigation as it is cheaper than alternatives and provides a wider detection range. Sidescan sonars are typically mounted on Autonomous Underwater Vehicles (AUVs) or vessels. However, the pixels and their intensities in an SSS image do not correspond perfectly to the 3D seabed, which causes problems when using SSS images for navigation and mapping.
This thesis aims to transform the SSS image into a canonical image. The approaches considered in this thesis include fitting a model of the effects of range, incidence angle and beam angle on the intensity in the direction across the AUV's travel. Furthermore, deconvolution in the direction along the AUV's travel is applied to remove the blur due to the beam width.
Sammanfattning
Sidescan sonar (SSS) is often used in the exploration of underwater environments, since it is cheaper and can scan wide areas. SSS are often mounted on autonomous underwater vehicles and vessels. However, the pixels and their intensities in an SSS image do not always correspond perfectly to the three-dimensional seabed, which can lead to problems when SSS is used for navigation or mapping.
This report represents SSS images as canonical images. The method described here fits a model in range, incidence angle and transmitted beam angle to adjust the intensity of the image. Deconvolution is also used to increase the sharpness of the image, which is blurred due to the width of the sonar beam.
Contents

1 Introduction
1.1 Motivation
1.2 Research Question
1.3 Social and Ethical Aspects
1.4 Thesis Outline
2 Background
2.1 Sonar
2.2 Sidescan acoustic theory
2.2.1 Production
2.2.2 Propagation
2.2.3 Scattering
2.2.4 Imagery
2.3 Computer Vision
2.3.1 Sharpness
2.3.2 Deconvolution
2.4 Model fitting
2.4.1 Least squares
2.4.2 Newton's method
2.4.3 Gauss-Newton method
2.4.4 Levenberg-Marquardt method
2.4.5 Gradient Descent
2.4.6 Comparison
2.5 AUV Lib
3 Related works
3.1 Time Varying Gain
3.2 Shape from Shading
3.2.1 Reflection model
3.2.2 Approaches
4 Experiment Setup
5 Methods
5.1 Across-track Calibration
5.1.1 Model
5.1.2 Discrete model fitting
5.2 Along-track Deconvolution
5.2.1 Kernel estimate
5.2.2 R-L deconvolution
5.2.3 Sharpness
6 Results and Discussion
6.1 Across-track Calibration
6.1.1 Altitude
6.1.2 Range
6.1.3 Beam angle
6.1.4 Incidence angle and Lambert model
6.1.5 Corrected Example
6.2 Along-track Deconvolution
6.2.1 Deconvoluted example
6.2.2 Exploration in parameters
7 Conclusions
7.1 Future Work
Bibliography
Introduction
Sidescan sonar (SSS) is a relatively simple, compact and inexpensive type of sensor. It provides a 2D projection of the 3D seabed [1] by sending acoustic waves that are wide across track and narrow along track [2]. Thus, it has great potential in underwater mapping and navigation [3], particularly simultaneous localization and mapping (SLAM) [4].
In a sidescan image formed using a generic formula, the individual pixels do not correspond to equal-size grid cells on the seafloor. Furthermore, the intensity of the return is related to the geometry as well as the altitude, beam pattern and range [5] in the direction across the Autonomous Underwater Vehicle's (AUV's) travel. This makes two images of the same seafloor seen from different ranges and angles look very different.
In addition, although the beam is quite narrow along the direction of the AUV's travel, the beam width at large range still cannot be ignored [6]. This leads to a blurred detection in which the geometry is convolved with a kernel related to the beam pattern along the direction of the AUV's travel.
Using data collected at the Swedish Maritime Robotics Centre (SMaRC) [7], we want to model the effects of range, incidence angle and beam angle on the intensity. Further, we should be able to deconvolve the variable, low-resolution image in the longitudinal (along the AUV track) direction to a higher and uniform resolution.
1.1 Motivation
In this project, it should be possible to transform sidescan images from a known area of the sea floor to a canonical image that has all these distorting effects removed. Features in this canonical image will then be much easier to match, for example, to other canonical images of the same area. With matched pixels in two sidescan images, one can localize the AUV against a prior map.
1.2 Research Question
Can we model the effects of the transducer beam pattern and the incidence angle of reflection on the sidescan image in order to remove some of those effects?
1.3 Social and Ethical Aspects
Society can benefit from cheaper, more capable AUVs. They can replace humans for dangerous underwater work. Sidescan sonars are smaller and cheaper compared with other sonars such as multibeam sonar. If they can be widely used for AUV navigation and simultaneous localization and mapping (SLAM), cost and time efficiency will improve.
Furthermore, underwater investigation is important for sustainability. There are many valuable mineral resources [8] [9] located on the seafloor. SSS images represented with better precision will benefit the measurement of the seafloor and underwater objects.
However, while carrying out ocean research, test objects such as cubes may be placed on the seafloor which, if left there, would contribute to an increase in ocean pollution. During the research, these potential sources of pollution should be noted and prevented.
1.4 Thesis Outline
The thesis is structured as follows:
• Chapter 1 Introduction gives a brief overview of the project itself and the motivation.
• Chapter 2 Background explains the basics of sonar acoustic theory, computer vision and model fitting.
• Chapter 3 Related works introduces the previous research.
• Chapter 4 Experiment Setup explains the platform used to collect data and the research area.
• Chapter 5 Methods introduces the methods for fitting the Lambertian calibration model and for R-L deconvolution.
• Chapter 6 Results and Discussion presents the results of the experiments and discusses them further.
• Chapter 7 Conclusions summarises the work in this project and discusses possible directions for further research.
Background
In this section, some basic knowledge about sonar theory is introduced. Some theory from computer vision is presented as well, as some of the ideas can be used in sonar image processing. Furthermore, methods for fitting models and a pre-built library for processing sonar data are introduced.
2.1 Sonar
The ocean, which covers more than 70% of the Earth, is less explored than the Moon or Mars. Detection of the seafloor differs from that of the land surface, where cameras and radar can obtain remote-sensing results from airplanes or satellites. Those remote sensors use electromagnetic waves, including lasers and visible light, which attenuate rapidly in water. Acoustic waves, however, attenuate very little in water and can propagate over longer distances [10]. By emitting pulses of sound (pings) and listening for echoes, sound navigation and ranging (sonar) is used to recognize a vessel or map the geometry of the seafloor [2].
With development over several decades, different kinds of sonars have emerged, including single-beam echo-sounders, multibeam echo-sounders, and sidescan echo-sounders [2]. As Figure 2.1 shows, single-beam sonars use a single beam to detect the depth under the sensor. They are convenient to set up and widely used to estimate the depth of the seafloor, but only the depth directly under the sensor can be measured; with low efficiency and sparse depth information, they are suitable only for simple purposes. Multibeam sonars can provide higher-resolution bathymetry over a larger range than single-beam sonars by transmitting several narrow beams to the seafloor, but they are much more expensive and larger. Sidescan sonars can provide imagery measurements of the seafloor with high resolution and a much larger range than the sonars mentioned above. These sonar images are formed by two beams, one on each side, scanning with one angular dimension of freedom [2]. However, sidescan is not always used for bathymetry detection because of the effects of geometric factors. More detailed theory about these effects is explained in the following part.
Figure 2.1: Comparison between sonars, from [11]
2.2 Sidescan acoustic theory
The detection of sidescan sonar using acoustic waves can be separated into four processes: Production, where acoustic waves are transmitted; Propagation, where waves propagate in the water; Scattering, where acoustic waves hit the seafloor; and Imagery, where echoes are received by the sensor to form an image. Figure 2.2 shows the physical model and definitions of sidescan acoustics.
Sidescan sonars transmit one beam on each side. These beams are narrow along-track to give higher resolution and wide across-track to cover as much range as possible [2]. The distance from the sensor to the seafloor below is called the platform altitude. The distance from the sonar to the seafloor in the direction of the peak of the beam pattern is called the slant range. The incidence angle is the angle between a beam incident on the seafloor and the normal; the same definition applies to the reflected beam.
2.2.1 Production
At the beginning of a measurement, a pulse of sound (ping) is emitted. The amplitude (or intensity) of the sound wave is not the same in all directions, so a beam pattern is used to describe this difference. The typical beam pattern of a sidescan sonar has the lobed shape shown in the top-left part of Figure 2.2. The main lobe is located in the center of the beam angle range, and there are two side lobes on either side of the main lobe [2]. The side lobes can lead to unwanted effects which decrease the quality of the detection. The shape of the beam pattern shows the transducer sensitivity, or the contour of acoustic amplitude in different directions [12].
Figure 2.2: Sidescan sonar beam model, from [2]
2.2.2 Propagation
The intensity of the pings attenuates during propagation, both on the way to the seafloor and on the way back to the sensor. Several factors affect this process, including the distance from the sonar to the seafloor (range) and the character of the seawater. The decrease of intensity due to range R is proportional to 1/R² [2]. We regard the effect of the seawater as a constant.
2.2.3 Scattering
When the acoustic waves hit the surface of the seafloor, most of the energy is reflected in the specular direction (the direction of the reflection angle). Only a small proportion of the energy, several orders of magnitude lower in intensity, is returned to the sonar; this is called backscatter [2]. Several theories have been proposed to describe the backscatter. The simplest model is the Lambertian model [13], which uses the cosine of the incidence angle as the proportionality model:
I_b = cos(φ) I_i  (2.1)

where I_i is the intensity of the incident wave, I_b is the intensity of the backscatter, and φ is the incidence angle.
2.2.4 Imagery
With the same propagation speed in water and different ranges, the time of flight (TOF) of the backscatter differs. As Figure 2.3 shows, the intensity of the backscatter is a function of time [2]. The peaks and valleys of sound pressure are transformed into a voltage signal and then into the gray level of pixels. Rows of these pixels from different pings eventually constitute the sonar image in Figure 2.4.
Figure 2.3: Acoustic backscatter based on time, from [2]
2.3 Computer Vision
2.3.1 Sharpness
Sharpness is one of the most common standards for evaluating the quality of images. It describes the amount of detail in an image. As shown in Figure 2.5, the blurred image contains less detail than the original on the left.
Figure 2.4: Sidescan sonar image
(a) (b)
Figure 2.5: (a) Original Image (b) Blurry Image
Figure 2.6: 10-90% rise distance on blurry and sharp edges
Sharpness can be measured by the rise distance (the pixel distance between 10% and 90% intensity) of edges in images. However, this method is not widely used because edges have different shapes and directions. Instead, there are two main directions for measuring the rise distance indirectly.
The first direction uses the gradient of the image to evaluate sharpness. The gradient of discrete pixel values can be represented as the difference of neighboring pixels. Pixel values near sharp edges always change dramatically, so the gradient near these areas is high. The most common gradient functions, including Energy of image gradient, Roberts, Tenengrad, Brenner, Variance and Laplace, are introduced in [14] and [15].
For an M × N image, with I_{m,n} the intensity of the pixel in the m-th column and n-th row, the sharpness evaluation functions are defined as follows:

• Energy of image gradient

S_EG = Σ_M Σ_N (I_{m+1,n} − I_{m,n})² + (I_{m,n+1} − I_{m,n})²  (2.2)

which uses discrete image gradients in the row and column directions to evaluate sharpness.
• Roberts

S_RBT = Σ_M Σ_N (I_{m+1,n+1} − I_{m,n})² + (I_{m+1,n} − I_{m,n+1})²  (2.3)

which is similar to the Energy of image gradient, but uses the differences of diagonal pixels as the gradient.
• Tenengrad

S_TNG = Σ_M Σ_N ∇I(m, n)²  (2.4)

where ∇I(m, n) = sqrt(d_m² + d_n²), in which d_m = I_{m,n} ∗ g_r and d_n = I_{m,n} ∗ g_c are gradients obtained using the Sobel operators in the row and column directions:

g_r = [−1 0 1; −2 0 2; −1 0 1],  g_c = [−1 −2 −1; 0 0 0; 1 2 1]
• Brenner

S_BEN = Σ_M Σ_N (I_{m+2,n} − I_{m,n})²  (2.5)

Compared with the functions above, Brenner only uses the second difference in the column direction to evaluate sharpness, which saves computation.
• Variance

S_VAR = Σ_M Σ_N (I_{m,n} − µ)²  (2.6)

where µ = (1/MN) Σ_M Σ_N I_{m,n} is the mean intensity of the image.
• Laplace

S_LAP = Σ_M Σ_N (I_{m,n} ∗ L)  (2.7)

where L = [0 1 0; 1 −4 1; 0 1 0] is the Laplacian operator.
The other direction is based on the Discrete Fourier Transform (DFT): images are transformed to the frequency domain. The DFT is a standard tool in signal and image processing [16] [17] [18]. Since sharp edges contain more high-frequency content after the Fourier transform, sharpness can be evaluated by the proportion of high-frequency signal. This method has been improved by replacing the DFT with the Discrete Cosine Transform (DCT) to save computation in the complex-number domain, since the DCT has a greater energy-compaction property than the DFT, meaning that most of the image information tends to be concentrated in a few low-frequency DCT coefficients [19].
• Discrete Fourier transform

S_DFT = (1/MN) Σ_{µ=0}^{M−1} Σ_{υ=0}^{N−1} sqrt(µ² + υ²) P(µ, υ)  (2.8)

where P(µ, υ) is the magnitude of the image Fourier transform.
• Discrete cosine transform

S_DCT = (1/MN) Σ_{µ=0}^{M−1} Σ_{υ=0}^{N−1} (µ + υ) C(µ, υ)  (2.9)

where C(µ, υ) is the magnitude of the image cosine transform.
Other than these two methods, the Information Entropy function [20] and Vollath's function [21] are also used to evaluate the sharpness of images.
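The frequency-domain metric of eq. 2.8 can be sketched with NumPy's FFT as follows (the function name and test images are our own):

```python
import numpy as np

def dft_sharpness(img):
    """Eq. 2.8: frequency-weighted mean of the DFT magnitude spectrum."""
    img = np.asarray(img, dtype=float)
    m, n = img.shape
    p = np.abs(np.fft.fft2(img))                  # magnitude spectrum P(mu, nu)
    mu, nu = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    weight = np.sqrt(mu ** 2 + nu ** 2)           # larger weight on high freq.
    return np.sum(weight * p) / (m * n)

# Adding high-frequency texture raises the score relative to a flat image,
# whose spectrum is concentrated at the zero-weighted DC bin.
flat = np.ones((8, 8))
textured = flat + 0.5 * (-1.0) ** (np.arange(8)[:, None] + np.arange(8))
assert dft_sharpness(textured) > dft_sharpness(flat)
```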
2.3.2 Deconvolution
One cause of blurred images is camera shake during exposure. The blurred image can be regarded as the convolution of a clear image with a kernel determined by the camera motion.
I = k ∗ s + n (2.10)
where k is the blur kernel, usually called the point spread function (PSF), I is the blurred image, s is the underlying sharp latent image, n is noise and ∗ is the convolution operator. There are two main directions in blurred-image deconvolution: blind and non-blind. The main difference is whether the PSF is known. In blind deconvolution, the sharp image s is estimated from the blurry image I and the current kernel k, while k is in turn inferred from the newly estimated s [22].
Figure 2.7: Blurred image and kernel from [23]
Non-blind deconvolution
Various non-blind deconvolution methods have been proposed in recent years. Classical methods include inverse filtering, Wiener filtering, Kalman filtering and constrained least-squares filtering [24] [25]. The main idea of these methods is optimization with minimum mean square error. More recent methods have attempted to improve on them and use them as a foundation [26].
Some works focus on probabilistic methods such as maximum a posteriori (MAP) estimation [27], Markov random fields [28] and hierarchical Bayesian approaches [29]. Another direction focuses on iterative optimization, including maximum-entropy-based methods [30], Richardson-Lucy iterative approaches [31] [32] and conjugate gradient methods [33].
Another interesting direction is using neural networks for image deconvolution. Neural networks are especially suitable for this task since they can fit the nature of the problem without assumptions about the parameters. Some successful work can be found in [34] [35] [36].
Blind deconvolution
A comprehensive review of blind deconvolution before the 1990s can be found in [22]. Some classical methods [37] [38] only allow one degree of freedom in the kernel. More recent work [39] uses the image itself with variational Bayesian inference to estimate a precise kernel matrix. There are also other works [40] [41] focusing on spatially variant kernel estimation.
2.4 Model fitting
2.4.1 Least squares
The least squares method was first proposed by Legendre [42] in 1805 for computing the orbits of comets. The main idea is to minimize the difference between prediction and measurement [43]. The error is defined as the square of the difference between prediction and measurement.
For a linear model y = f(a, x), where a = [a_1, a_2, ..., a_M]^T are the parameters and (x_i, y_i), i = 1, ..., N are the data points,

E(a) = (1/2) Σ_{i=1}^{N} (f(a, x_i) − y_i)² = (1/2) Σ_{i=1}^{N} (f_i(a) − y_i)²  (2.11)
∇E(a) = ∂E(a)/∂a = [∂E(a)/∂a_1, ..., ∂E(a)/∂a_M]^T  (2.12)
If we set the derivative of this error with respect to the parameters to 0, we can compute the parameters of the best model. However, the least squares method can only fit linear models this way; for non-linear models, these equations may not have closed-form solutions for a.
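As a small illustration of the linear case (the line model and sample data are our own example), minimizing eq. 2.11 reduces to solving a linear system, which NumPy's least-squares solver does directly:

```python
import numpy as np

# Fit y = a1 * x + a2 to noisy samples of the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0 + np.array([0.01, -0.02, 0.0, 0.02, -0.01])

# Design matrix: each row is [x_i, 1]. Setting the derivative of E(a)
# in eq. 2.11 to zero yields the normal equations solved by lstsq.
A = np.stack([x, np.ones_like(x)], axis=1)
a, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(a)  # close to [2.0, 1.0]
```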
2.4.2 Newton’s method
Newton's method is an iterative optimization method using a Taylor expansion approximation of the non-linear model f(a, x). The minimum point of the approximate model is regarded as the start of the next iteration, and the iteration stops when a certain accuracy is reached [44].
We use the following iteration to find the optimal parameters:

a_{n+1} = a_n + ∆a  (2.13)
Considering the Taylor expansion of eq. 2.11 at a_n, the following steps are used to update the parameters until convergence:

E(a_{n+1}) = E(a_n) + ∆a^T ∇E(a) + O(∆a²)  (2.14)

∂E(a_n + ∆a)/∂a ≈ ∇E(a) + H∆a  (2.15)

∆a = −H^{−1} ∇E(a)  (2.16)

where H is the second-derivative matrix, or Hessian matrix, with entries H_{kj} = ∂²E/∂a_k∂a_j.
2.4.3 Gauss-Newton method
As the Hessian matrix is still computationally expensive, the Gauss-Newton algorithm was proposed as a specialization of Newton's method for least-squares problems. If we differentiate the error function eq. 2.11 w.r.t. a_j:

∂E(a)/∂a_j = Σ_{i=1}^{N} (f_i(a) − y_i) ∂f_i(a)/∂a_j = Σ_{i=1}^{N} r_i J_{ij}  (2.17)
where J is the N × M first-derivative matrix, or Jacobian matrix, with entries J_{ij} = ∂f_i(a)/∂a_j, and r = [f_1(a) − y_1, ..., f_N(a) − y_N]^T is the residual vector.  (2.18)
Then eq. 2.17 can be written in vector form:

∇E(a) = J^T r  (2.19)
Then the entry H_{kj} of the Hessian matrix can be written as:

∂²E(a)/∂a_k∂a_j = ∂/∂a_k ( Σ_{i=1}^{N} (f_i(a) − y_i) ∂f_i(a)/∂a_j )
= Σ_{i=1}^{N} [ (∂f_i(a)/∂a_k)(∂f_i(a)/∂a_j) + (f_i(a) − y_i) ∂²f_i(a)/∂a_k∂a_j ]
= Σ_{i=1}^{N} (J_{ik} J_{ij} + S_{kj})

H = J^T J + S  (2.20)
Considering the equation above and eq. 2.16, and noting that S can be ignored,

∆a = −H^{−1} ∇E(a)  (2.21)
   = −(J^T J + S)^{−1} J^T r
   ≈ −(J^T J)^{−1} J^T r  (2.22)
2.4.4 Levenberg-Marquardt method
The Levenberg-Marquardt (L-M) algorithm is a damped Gauss-Newton algorithm. It combines the advantages of Gauss-Newton and gradient descent. Although all of these methods can minimize the objective function, the L-M method is more robust: even if the initial guess is far from the solution, the iteration can still converge toward it [45]. This is realized by a damping term in the update ∆a. When the guess is far from the solution, a larger λ is chosen and the method behaves more like gradient descent; when the guess is close, λ is reduced and it behaves more like Gauss-Newton.
∆a = −(J^T J + λI)^{−1} J^T r  (2.23)
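A compact sketch of the damped update in eq. 2.23, here fitting an exponential decay y = a_1 exp(−a_2 x); the model, data and λ-adaptation schedule are our own illustrative choices, not the thesis implementation:

```python
import numpy as np

def levenberg_marquardt(f, jac, a0, x, y, lam=1e-2, iters=100):
    """Minimize the sum of squared residuals with the update of eq. 2.23."""
    a = np.asarray(a0, dtype=float)
    for _ in range(iters):
        r = f(a, x) - y                      # residual vector r
        J = jac(a, x)                        # N x M Jacobian
        # Damped normal equations: (J^T J + lam*I) da = -J^T r
        da = np.linalg.solve(J.T @ J + lam * np.eye(a.size), -J.T @ r)
        if np.sum((f(a + da, x) - y) ** 2) < np.sum(r ** 2):
            a, lam = a + da, lam * 0.5       # step accepted: trust model more
        else:
            lam *= 2.0                       # step rejected: damp harder
    return a

# Model y = a1 * exp(-a2 * x) and its Jacobian w.r.t. (a1, a2).
f = lambda a, x: a[0] * np.exp(-a[1] * x)
jac = lambda a, x: np.stack([np.exp(-a[1] * x),
                             -a[0] * x * np.exp(-a[1] * x)], axis=1)

x = np.linspace(0.0, 4.0, 30)
y = 3.0 * np.exp(-0.7 * x)
a_hat = levenberg_marquardt(f, jac, [1.0, 0.1], x, y)
print(a_hat)  # approaches [3.0, 0.7]
```

The accept/reject test on the squared error is what makes the method robust to a poor initial guess, as described above.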
2.4.5 Gradient Descent
Gradient descent is an iterative optimization algorithm for finding a local minimum, originally proposed by Cauchy in 1847 [46]. The basic idea is to search in the direction opposite to the gradient with a certain step length until the gradient is close to 0. Similar to the iterative methods above, the update step is formed from the gradient vector and a learning rate α:

∆a = −α · ∇E(a)  (2.24)
In machine learning, the data set is sometimes too big to compute with at once, so not all the data is used in the error function eq. 2.11. If n = 1 and the sample is chosen randomly, it is called stochastic gradient descent. If 1 < n < N, it is called mini-batch gradient descent. If the whole data set is used, it is called (batch) gradient descent. Since not all data are used in the error function, the update direction of stochastic and mini-batch gradient descent does not point exactly toward the global optimum, but it stays close to it and accelerates the computation [47]. As these methods are suitable for large-scale training samples, gradient descent is widely used in machine learning and deep learning.
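The batch, mini-batch and stochastic variants described above can be sketched on the least-squares error of eq. 2.11 (the learning rate, epoch count and batch sizes are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_linear(x, y, batch_size, lr=0.1, epochs=300):
    """Fit y = a1*x + a2 by mini-batch gradient descent on eq. 2.11.

    batch_size=1 gives stochastic GD, batch_size=len(x) gives batch GD.
    """
    a = np.zeros(2)
    n = len(x)
    for _ in range(epochs):
        idx = rng.permutation(n)             # shuffle samples each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            r = a[0] * x[b] + a[1] - y[b]    # residuals on this batch
            grad = np.array([np.mean(r * x[b]), np.mean(r)])
            a -= lr * grad                   # delta_a = -alpha * grad (eq. 2.24)
    return a

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0
print(sgd_linear(x, y, batch_size=50))  # batch GD: close to [2.0, 1.0]
print(sgd_linear(x, y, batch_size=8))   # mini-batch GD: also close
```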
2.4.6 Comparison
All of these methods can fit models properly, but they need different numbers of iteration steps. As shown in Figure 2.8, Newton's method needs fewer steps than gradient descent to converge. This is because Newton's method uses a quadratic surface to fit the surface around the current guess, whereas gradient descent uses a flat one.
Figure 2.8: A comparison of gradient descent (green) and Newton’s method (red) from [48]
All of these methods have their limitations. Newton's method is hard to converge if the initial guess is far from the solution, and the Hessian is computationally expensive; these disadvantages are addressed by the Gauss-Newton and L-M methods.
Gradient descent is very sensitive to the learning rate. Iterations close to the solution can be slow and zig-zag during the descent process. Tricks have been proposed to avoid these problems, such as cyclical learning rates [49] and adaptive learning rates [50].
2.5 AUV Lib
AUV Lib (AuvLib) is a library of tools for reading, processing and visualizing sonar data [51]. One important function in AuvLib is draping.
Since detection from sidescan sonar does not include geometry information, pixels in sidescan sonar images need to be matched with their corresponding area on the seafloor, or "hit point". Then more geometric information can be gathered for further computation.
The matching is based on the position of the AUV and the geometry. As introduced in section 2.2.4, the pixels are arranged by the time the acoustic waves reach the sensor, the TOF. The distance from the sensor to the hit point can then be represented as the product of the TOF and the sound velocity, which is measured before the survey and saved as a sound velocity profile (SVP), the speed of the acoustic wave at different depths. With this distance, the 3D points in the scan plane at the same distance can be found and matched to the corresponding pixel. This process is called draping.
Figure 2.9: Example of draping in AuvLib from [51]
Related works
3.1 Time Varying Gain
As sidescan sonar is such a simple, compact and inexpensive type of sensor, much work has been done to model the relation between the sonar scattering intensities and the geometry of the seafloor. The model can then be used for other purposes, including 3D reconstruction and image pre-processing for classification.
3D reconstruction is one of the important parts of navigation and augmented reality, and some research has focused on it. In [13], backscatter intensity is used to calculate the incidence angle based on a reflection model derived from Lambert's law. The seafloor contour is then reconstructed using the sensor position, incidence angle, and range. Since these parameters are not enough to determine a definite patch, global minimization is used to improve the smoothness. Although the 3D map of the seafloor is not very accurate, it is considered to include enough altitude information for navigation.
A time varying gain (TVG) is used to properly correct the range and beam angle effects in [52]; the incidence angle is then the only factor that causes the echo intensity to vary. Based on this assumption and Lambert's law, a scattering model was built relating the sonar intensity to the seafloor reflectivity, beam pattern and altitude. With these parameters, a 3D geometry map can be reconstructed, and least mean squares was applied to optimize the map by iteratively minimizing the error between the real image intensity and the intensity calculated from the model and the map parameters.
Besides 3D reconstruction, sidescan sonars are also used in image pre-processing for classification. In [53], a flat-seafloor assumption was made, so the remaining geometric information can be derived from the sensor altitude and range, and the other parameters affecting sonar intensities (beam angle and incidence angle) are treated as range-dependent in the proposed model.
Based on that work, [5] [54] proposed a model with separate correction factors for altitude, range and beam angle, and a least squares quadratic fit correction was applied to make the spread of pixel intensities in the sonar swath more stable.
Building on this research, applications of underwater sonar have been explored. [55] proposed a method using a Markov Random Field (MRF) model to classify sidescan images for mine detection. [56] uses sidescan correction and classification techniques to create and fuse classified SSS mosaics.
3.2 Shape from Shading
Shape from shading (SFS) is a classical computer vision technique for estimating the 3D description of an object's surface based on the gradual variation of shading in one or more images of that object [57]. Figure 3.1 shows the 3D model of a human face based on a shaded image. 3D reconstruction from shading is essentially the same for camera images and sonar images.
Figure 3.1: Shape from shading, (a) A real face image. (b) Surface recovered from (a) by the generic SFS algorithm from [58]
3.2.1 Reflection model
The only information available is the gray level of the shaded image, so a model has to be built to generate 3D information from the gray level of each pixel. The most general model is the Lambertian model. When an image is taken with a known light source, the gray level at a single pixel depends only on the incidence angle, the angle between the light source direction and the surface normal [57]. So the gray level of the pixel can be modeled as follows:
I_L = Aρ cos θ  (3.1)

where A is the strength of the light source, ρ is the albedo of the surface and θ is the incidence angle. If we use the normal vector N = (N_x, N_y, N_z) and the source direction vector S = (S_x, S_y, S_z), the formula can be rewritten as:

I_L = Aρ N · S  (3.2)
3.2.2 Approaches
Based on the Lambertian model, the surface normal or surface gradient can be derived. However, there is no unique solution for the surface geometry: if the surface shape is described in terms of the surface normal, we have a linear equation with three unknowns, and if it is described in terms of the surface gradient, we have a nonlinear equation with two unknowns [57]. Therefore, different SFS algorithms have been proposed.
SFS was first introduced by Horn [59] [60], where five constraints are introduced to solve the nonlinear gradient equation.
Experiment Setup
The data used in this work was collected with a HUGIN Autonomous Underwater Vehicle (AUV) from Kongsberg Maritime, as shown in Figure 4.1. The AUV is equipped with an aided navigation system including an inertial measurement unit (IMU) and GPS. Various sensors are also integrated on the AUV; those relevant to this thesis include an EM2040 multibeam sonar and an EdgeTech 2205 sidescan sonar.
Figure 4.1: The HUGIN AUV used to collect data
The bathymetry map of the survey area, at about -80 m depth, is shown in Figure 4.2. The area was measured by the multibeam sonar along the moving track shown in Figure 4.3. Lighter color indicates shallower depth; as the terrain of the area is flat with some slopes, the colour change is not very obvious.
The sidescan data was collected at three different depths in this area. A typical survey line of the sidescan sonar is shown in Figure 4.4. The gray level of the pixels represents the intensity of each bin in the backscatter. The horizontal axis (x-axis) corresponds to the TOF of the acoustic waves, and the vertical axis represents the pings at different timestamps during the survey. There are
Figure 4.2: The bathymetry map of surveying area
Figure 4.3: The AUV moving track in 3D and 2D
some pixels missing in the center of the image; this is the nadir area, which the beams cannot reach.
The draping process can match SSS pings to the bathymetry map. A visualization of draping can be found in Figure 4.5. The pixels of the SSS image are projected onto the bathymetry map, and we can get geometric data for each pixel in the SSS image, including the 3D position and the normal vector of the scattering point.
Figure 4.4: Example of SSS images
Figure 4.5: Example of draping process
Methods
In this chapter, the methods for calibrating side-scan data within a single ping and for deconvolution across multiple pings are presented. In the first part, we focus on fitting a Lambertian calibration model to remove the influence of altitude, range, beam angle and incidence angle in the across-track direction. Next, we apply R-L deconvolution to reduce the blur caused by the beam width in the along-track direction. In addition, the sharpness evaluation method is introduced.
5.1 Across-track Calibration
5.1.1 Model
A ping of sidescan sonar is formed by a certain number of bins; in our survey this number is 14212 across the port and starboard sides (left and right). For the i-th bin in Figure 5.1, the time of flight (TOF) is i∆t, where ∆t is the minimum timestamp difference between bins. Thus, we can get the range r_i from the TOF and the sound velocity profile (SVP). Denoting the position of the hit point as (x_i, y_i, z_i) and the position of the sonar as (x_s, y_s, z_s), the altitude is defined as a = z_i − z_s, and the beam angle is defined as arcsin(a_i/r_i). The incidence angle φ is defined as the angle between the vector (x_s − x_i, y_s − y_i, z_s − z_i) and the projection of the normal vector N in the scan plane.
As introduced in section 2.2, the intensity of the backscatter is related to:
• Altitude a
Figure 5.1: Parameters in correction model
• Range r
• Beam angle θ
• Incidence angle φ
We add the altitude a as a correction factor since the density of the water varies with depth.
Thus, the correction model is defined as

I'(a, r, θ, φ) = C(a)C(r)C(θ)C(φ) I(a, r, θ, φ),  (5.1)

where I(a, r, θ, φ) is the intensity before calibration, I'(a, r, θ, φ) represents the intensity after calibration, and C(a), C(r), C(θ), C(φ) are the correction functions that we are going to fit in the next part.
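Applying eq. 5.1 with discrete correction factors stored as lookup tables can be sketched as follows (the binning helper, table shapes and toy values are our own assumptions, not the thesis implementation):

```python
import numpy as np

def apply_correction(intensity, a_idx, r_idx, th_idx, ph_idx,
                     C_a, C_r, C_th, C_ph):
    """Eq. 5.1: multiply each bin intensity by the four correction factors.

    *_idx are integer bin indices (altitude, range, beam angle, incidence
    angle) for each sample; C_* are 1D lookup tables of discrete corrections.
    """
    return (C_a[a_idx] * C_r[r_idx] * C_th[th_idx] * C_ph[ph_idx]) * intensity

# Toy example: two bins, identity corrections except in the range dimension.
I = np.array([0.8, 0.4])
C_a = np.ones(4); C_th = np.ones(4); C_ph = np.ones(4)
C_r = np.array([1.0, 2.0, 1.0, 1.0])   # boost intensities in range bin 1
out = apply_correction(I, [0, 0], [0, 1], [0, 0], [0, 0],
                       C_a, C_r, C_th, C_ph)
print(out)  # both bins become 0.8
```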
5.1.2 Discrete model fitting
According to Capus [5], the altitude calibration can be modeled as a polynomial. We start by modelling the altitude correction C(a) as a 6th-degree polynomial. The loss function applied in gradient-descent model fitting can be expressed as:

L_1 = (N_{a,r,θ,φ}/2) Σ_{a,r,θ,φ} (C(a)I(a, r, θ, φ) − Ī(r, θ, φ))²  (5.2)

where N_{a,r,θ,φ} is the total number of intensities I with different a, r, θ, φ, and Ī(r, θ, φ) is the intensity averaged over the a dimension.
Then we can correct the altitude a with the C(a) obtained from gradient descent by

J(r_0, θ_0, φ_0) = ( Σ_a C(a) I(a, r_0, θ_0, φ_0) ) / N_a  (5.3)
For the correction of r we could use the same gradient descent as for the altitude dimension. But the polynomial model has limitations and could be far from a reasonable curve. We choose instead to use a discrete C(r) for the range dimension, which can be estimated from

C(r) = ( Σ_{θ,φ} J(r, θ, φ) J̄(θ, φ) ) / ( Σ_{θ,φ} J(r, θ, φ)² )  (5.4)
The calibration for the range dimension is analogous, and we apply the same correction procedure to the beam angle and incidence angle dimensions. The discretization intervals for range, beam angle and incidence angle are 1 m, 0.01 rad and 0.01 rad, respectively.
5.2 Along-track Deconvolution
5.2.1 Kernel estimate
Gaussian kernel
According to [6], the beam pattern in the direction of travel can be modeled as a simple Gaussian function

h = exp(−0.5(x − µ)²/σ²),  (5.5)

where h is the Gaussian kernel and µ and σ² are the mean and variance of the Gaussian distribution.
As shown in Figure 5.2, the beam width (in m) w at different ranges can be expressed as
w = 2r tan(γ/2), (5.6)
where r is the slant range from sonar to hit point and γ is the beam angular
width.
Figure 5.2: Parameters in deconvolution model
In our case, the waterfall image is discrete, so the beam width in meters, w, has to be converted to a beam width in pixels, n, for further computation. Each pixel along a column of the image represents a discretization of the along-track distance by the ping interval in meters, m. We have
n = w/m = w/(v · t),  (5.7)

where v is the speed of the AUV (in m/s) and t is the ping interval (in s).
If we define the edge of the beam width as the half-power point, we can solve the Gaussian function for h = 0.5. The variance σ² can be expressed as

σ² = (0.5 · n)² / (−2 · ln(0.5))  (5.8)

Then the kernel can be represented as a Gaussian with zero mean and the variance given by the equation above.
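The kernel construction in eqs. 5.6-5.8 can be sketched as follows (the kernel support and the normalization to unit sum are our own additions, as deconvolution kernels are usually normalized):

```python
import numpy as np

def gaussian_beam_kernel(slant_range, beam_width_rad, speed, ping_interval):
    """Build a discrete along-track Gaussian kernel from beam geometry.

    Eq. 5.6: beam width in meters  w = 2 r tan(gamma / 2)
    Eq. 5.7: beam width in pixels  n = w / (v * t)
    Eq. 5.8: variance so that the kernel reaches half power at +-n/2
    """
    w = 2.0 * slant_range * np.tan(beam_width_rad / 2.0)
    n = w / (speed * ping_interval)
    sigma2 = (0.5 * n) ** 2 / (-2.0 * np.log(0.5))
    half = int(np.ceil(n))                   # kernel support (assumption)
    x = np.arange(-half, half + 1, dtype=float)
    h = np.exp(-0.5 * x ** 2 / sigma2)
    return h / h.sum()                       # normalize to unit sum

k = gaussian_beam_kernel(slant_range=50.0, beam_width_rad=0.01,
                         speed=2.0, ping_interval=0.5)
print(len(k), round(k.sum(), 6))
```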
Cosine and cosine-squared kernel

Another choice of model for the beam pattern, or kernel shape, is the cosine or cosine squared:

h = cos(a · n/2)  (5.9)

Solving this equation for h = 0.5, with the same assumption about the half-power point, we get the scaling parameter a for the cosine and cosine-squared models:

a = 2π/(3n)  (5.10)
5.2.2 R-L deconvolution
In our moving-sonar case, we only need to consider one dimension, the along-track direction. Considering one column in an SSS image, again using Figure 5.2 as an example, the pixel intensity in the i-th row is I_i and the "true" signal is s_i. If we assume the convolution involves 2n + 1 rows, the convolution process can be represented as

I_i = Σ_{k=−n}^{n} a_k s_{i+k} + η,  (5.11)

where a_k is the convolution kernel.
The Richardson-Lucy method is an iterative optimization method. The update step in the t-th iteration can be expressed as

s_i^{t+1} = s_i^t Σ_{k=−n}^{n} a_k I_{i+k} / c_{i+k}  (5.12)

c_i = Σ_{k=−n}^{n} a_k s_{i−k}^t  (5.13)
5.2.3 Sharpness
Sharpness is one of the methods used to evaluate the result of the deconvolution. Since the deconvolution is applied only in the along-track direction, the sharpness only needs to be evaluated in that direction. Here we use the Brenner function to evaluate the sharpness of the SSS image:

S_BEN = Σ_{k=0}^{N−2}