Data-driven Methods for Sonar Imaging

Lovisa Nilsson
LiTH-ISY-EX--21/5381--SE

Supervisors: Andreas Gällström, Saab Dynamics
             Louise Rixon Fuchs, Saab Dynamics
             Johan Edstedt, ISY, Linköpings universitet
Examiner: Per-Erik Forssén, ISY, Linköpings universitet

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2021 Lovisa Nilsson

Abstract

The Learned Primal-Dual Reconstruction method, which combines a data-driven and a model-based approach, is used to investigate the use of data-driven methods for reconstruction within sonar imaging. The method uses primal and dual variables inspired by classical optimization methods, where parts are replaced by convolutional neural networks to iteratively find a solution to the reconstruction problem. The network is trained and validated with synthetic data on eight models with different architectures and training parameters. The models are evaluated on measurement data and the results are compared with those from a purely model-based method. Reconstructions performed on synthetic data, where a ground truth image is available, show that it is possible to achieve reconstructions with the data-driven method that have less leakage than reconstructions from the model-based method. For reconstructions performed on measurement data, where no ground truth is available, some variants of the learned model achieve a good result with less leakage.


Acknowledgments

First, I would like to thank my supervisors at Saab Dynamics, Andreas Gällström and Louise Rixon Fuchs. I have appreciated your many ideas, motivating words and our fruitful discussions! Further, I would like to thank my examiner Per-Erik Forssén and my supervisor Johan Edstedt at Linköping University. Per-Erik, thank you for your interest in my work and the feedback regarding the thesis. Johan, thank you for always answering my questions so fast and for the comments and review of my thesis.

I am also grateful to Ozan Öktem for the very interesting discussion we had and the insights you gave me regarding the work.

Thank you to everyone I met at Saab for welcoming me and for the opportunity to write my thesis on this exciting topic. A warm thanks to the other students at Saab for the fun lunch breaks and the insights into your topics.

Last but not least, I would like to thank my friends and family for your cheering words and support and for helping me to relax during the work with this thesis!

Linköping, May 2021
Lovisa Nilsson

Contents

Notation

1 Introduction
  1.1 Motivation
  1.2 Research Questions
  1.3 Delimitations
  1.4 Thesis Outline

2 Theory
  2.1 Sonar
    2.1.1 Signal Collection
    2.1.2 Image Reconstruction
    2.1.3 Leakage
    2.1.4 Signal Processing
  2.2 Inverse Problems
    2.2.1 General Inverse Problems
    2.2.2 Inverse Sonar Problem
  2.3 Deep Learning for Inverse Problems
    2.3.1 Fundamentals of Deep Learning
    2.3.2 Deep Learning and Inverse Problems
  2.4 Learned Primal-Dual Reconstruction
    2.4.1 Operator Discretization Library
    2.4.2 Network Architecture

3 Method
  3.1 Datasets
    3.1.1 Synthetic Data
    3.1.2 Measurement Data
  3.2 Signal Processing
  3.3 Deep Learning Model
    3.3.1 Operators in ODL
    3.3.2 Learned Primal-Dual
  3.4 Training
  3.5 Validation
  3.6 Evaluation

4 Results
  4.1 Models
  4.2 Test Data
  4.3 Measurement Data
    4.3.1 Scene with Tripod
    4.3.2 Scene with Rope
    4.3.3 Scene with Paddle Wheel

5 Discussion
  5.1 Results
  5.2 Method
    5.2.1 Signal Processing
    5.2.2 Adjoint Operator
    5.2.3 Synthetic Data
  5.3 Future Work

6 Conclusion

Bibliography

Notation

Abbreviations

Abbreviation   Meaning
CNN            Convolutional Neural Network
ODL            Operator Discretization Library
PDHG           Primal Dual Hybrid Gradient
PReLU          Parametric Rectified Linear Unit
PSNR           Peak Signal-to-Noise Ratio
RVG            Range Varying Gain
SNR            Signal-to-Noise Ratio
SSIM           Structural Similarity Index Measure
STFT           Short-Time Fourier Transform
TVG            Time Varying Gain

1 Introduction

Sonar stands for SOund Navigation And Ranging and is a technique that uses sound waves to detect objects underwater. Sonar can be either passive or active: a passive sonar listens for sound without transmitting itself, whereas an active sonar first transmits an acoustic pulse into the water and then receives the echoes reflected from objects on the seafloor [23, p. 1]. From the received signals, reconstructions of the seabed can be created. Reconstruction of images from measured signals can be seen as solving an inverse problem.

Inverse problems can in general be stated as y = A(x) + e, where y is the measured data, A is an operator, x the sought data and e noise. In inverse problems, the operator A is called the forward operator and maps from the space of x to the space of y. For the inverse sonar problem, the forward operator handles the physics of wave propagation and maps a scene to the corresponding received signals. The goal is to reconstruct x from the measured data, but the problem might be ill-posed. In such cases, a small change in the measured data can have a large effect on the reconstruction. An ill-posed problem could also be a problem where several reconstructions are possible from the same measured data. Solving inverse problems is classically approached with model-based methods, which build on domain knowledge and analytical models for the problem [5], [7].

In contrast to model-based methods, there are data-driven methods, which in several cases include deep learning. A deep learning network contains multiple connected layers, which enables a computer to learn complex structures by breaking them down into simpler ones. Deep learning, as a part of machine learning, has gained more interest in recent years due to the use of more powerful computers and training algorithms [13, p. 5, 26].

Data-driven methods do, however, often require large amounts of training data for a robust solution, which may not be available for all types of problems. To overcome this problem, recent research has started to combine model-based and data-driven methods for inverse problems [5].

One promising method that combines model-based and data-driven methods is the Learned Primal-Dual Reconstruction presented by Adler and Öktem [3]. The presented solution is a method for solving inverse problems for tomographic data. The method is inspired by an analytical method for inverse problems and makes use of deep learning in the form of Convolutional Neural Networks (CNN). Further, the method incorporates the forward operator from the inverse problem into the architecture of the network, by using the Python library Operator Discretization Library (ODL) [1], [3]. ODL is a library for numerical methods and inverse problems, and it enables easier use of concepts like operators for real problems, by the use of defined structures.

1.1 Motivation

This thesis is conducted at Saab Dynamics in Linköping, where high-technology solutions in fields like underwater systems, including sonar systems, are developed. The image reconstruction of the seabed from the received sonar signals is today done with a model-based method, which introduces angular and range leakage into the reconstruction.

Data-driven methods have successfully been applied to other reconstruction problems in imaging, where the performed reconstructions improve in quality compared to model-based methods. A strong motivation for this thesis is to investigate whether data-driven methods can be implemented for the sonar domain as well. A data-driven reconstruction method that can reduce or suppress leakage in the reconstructed sonar images would be very promising and an important step in the improvement of sonar reconstruction methods.

The purpose of this thesis is therefore to investigate if and how well the sonar reconstruction problem can be solved with a data-driven method using deep learning compared to the model-based method.

1.2 Research Questions

From the introduction and motivation, the following research questions are formed:

• How can data-driven methods be used for image reconstruction from sonar signals?

• How can the Learned Primal-Dual Reconstruction be implemented for sonar data and how well does it perform?

1.3 Delimitations

A delimitation is that only sonar data provided by Saab Dynamics, taken from the freshwater lake Vättern, will be evaluated in this thesis. Measurement data will only be used for evaluation. The focus lies on inverse problems, in particular the sonar image reconstruction problem, and no comparison with model-based methods, other than the currently used one, will be performed. Multiple data-driven methods for solving inverse problems are also available, but only the method presented in Adler and Öktem [3] will be investigated further in this thesis.

In the chosen method from Adler and Öktem [3], the Python library ODL is used to incorporate the forward operator into the deep learning network. Therefore, ODL is chosen for the implementation of the sonar forward operator in this thesis as well.

1.4 Thesis Outline

Chapter 2 describes the theory for this thesis: the sonar system is described, inverse problems are introduced, and the deep learning network is explained. Chapter 3 describes the method for this thesis, presenting the implementations and decisions that lead to the results. Chapter 4 presents the results from the eight different trained networks for test data and measurement data. Chapter 5 discusses the results and puts them in a wider context; the method is also discussed and ideas for future work are presented. Chapter 6 summarizes the work in this thesis and discusses the research questions in context.

2 Theory

This chapter describes relevant theory for the method and results of the thesis.

2.1 Sonar

This section describes the theory regarding the sonar technique. First, the setting in which the sonar signals are collected will be described. Thereafter, the currently used model-based method for image reconstruction will be discussed.

2.1.1 Signal Collection

The basic principle of sonar is to have an array with one transmitter and multiple sensor elements placed under water to send and receive acoustic waves to explore the surroundings. The transmitted signals are reflected at different objects at the seabed and later received at the sensor elements in the array. The received signals are measurements over time and can be used to reconstruct an image of the seafloor.

To understand how the sonar system works and how the signals are received, a simplified example with one transmitter, one receiving sensor and one reflecting point scatterer at the seafloor will be considered. Later, the example will be extended to include an array with multiple sensors and a more complex reflecting scene.

The theory regarding the sonar setting considered for this thesis comes from discussions with Gällström [15]. Details about other sonar systems and information about the general principles of sonar can be found in [10], [16] and [23].

Basic Setting

Consider an array placed under water containing one transmitter and one receiver. On a scene at the seafloor, one point scatterer is placed, meaning that only that point will reflect the sound wave. The remaining points in the scene will not reflect the incoming signal.

The signal collection setting can briefly be described as follows, where the variables will be discussed further on: the transmitter sends out a signal p(t) that propagates in the water to the reflecting point in the scene, where it is reflected according to a reflection coefficient γ, travels back to the array and is received with a time delay τ relative to the transmitted signal. The raw received signal e(t) is processed in various ways to improve and prepare it for further use. Figure 2.1 shows the geometry setting with an array under water including transmitter and sensor elements. The scene on the seafloor with the point scatterer can also be seen, as well as the vectors r_t and r_1 pointing between the transmitter and reflector position and between the reflector and receiver position.

Figure 2.1: Geometry for the collection of sonar signals.

First, the signal is transmitted from the array. In this thesis, the transmitted signal p(t) is a linear chirp with pulse length T and frequencies in the band B. In the baseband, the transmitted signal is defined as:

$$p(t) = \begin{cases} A\, e^{i 2\pi\left(-\frac{B}{2} t + \frac{B}{2T} t^2\right)}, & \text{if } 0 \le t < T \\ 0, & \text{otherwise,} \end{cases} \quad (2.1)$$

where T is the pulse length and B the bandwidth.

The STFT of the signal described in Equation 2.1 is shown in Figure 2.2. For the pulse length T, it can be seen that the transmitted pulse has frequencies in the range −B/2 to B/2 kHz.

Figure 2.2: Transmitted signal shown as a spectrogram.

After transmission, the signal propagates in the sea and reaches the seafloor, where it is reflected. The reflection at the point scatterer is associated with a reflection coefficient γ, indicating how much of the incoming energy is reflected towards the receiver. γ can be seen as a complex number, where the magnitude is the intensity and the phase can be seen as a delay for the reflection at that point. The reflected wave travels on to the sensor element. The signal has then traveled a distance from the transmitter via the reflecting point scatterer in the scene to the receiving sensor, which can be denoted d = ||r_t|| + ||r_1||, where r_t is the vector between the position of the transmitter and the position of the point scatterer and r_1 is the vector between the position of the point scatterer and the sensor element.

The corresponding travel time is called the time delay, τ, and can be written as:

$$\tau = \frac{d}{c} = \frac{\|r_t\| + \|r_1\|}{c}, \quad (2.2)$$

where c is the sound velocity in water.

An expression for the raw received signal e(t) can be written as:

$$e(t) = \frac{\gamma}{(\tau c)^2}\, p(t - \tau), \quad (2.3)$$

where a decreasing factor (τc)² can be seen.

The spreading of the wave from the transmitter is assumed to be spherical from an isotropic point. Since the total signal power is constant but spread to a larger spherical area, the signal power in an area element will decrease with the square of the range [23, p. 100–101]. If the initial signal power is P₀, the signal intensity for an area element at the distance τc is P₀ / (4π(τc)²), where the denominator states the area of a sphere.

The spreading loss is normally compensated for with either a Time Varying Gain (TVG) or a Range Varying Gain (RVG) applied to the received signal [10]. This means that signals that have travelled a longer distance are multiplied with the travelled distance, or the corresponding time, to compensate for the loss.

Figure 2.3 shows a transmitted chirp pulse in blue and a received and delayed pulse in red.

Figure 2.3: Transmitted pulse in blue and received pulse in red.

The raw signal e(t) is processed with pulse compression to improve the Signal-to-Noise Ratio (SNR) and resolution of the system. This is often done with a matched filter that performs cross-correlation between the received raw signal e(t) and the transmitted pulse p(t) to calculate the signal s(t) [10, chapter 2.3]. As an equation, it can be written as:

$$s(t) = e(t) \star p(t) = e(t) * p(-t), \quad (2.4)$$

where ⋆ and ∗ denote cross-correlation and convolution, respectively. Since the transmitted and the raw received signals are chirp signals, the result s(t) of the cross-correlation between them gives the final received signal as a sinc signal [9, chapter II.2]. Since the transmitted and received signals are limited in time to a pulse length T, the result of the pulse compression is an approximate sinc, which can be seen in Figure 2.4. The approximate sinc will have its peak at the time of arrival of the transmitted pulse p(t), which is the same as the time delay τ.
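To make the pulse-compression step concrete, the following sketch simulates a delayed echo of a baseband chirp and recovers the delay with a matched filter. It is a hedged illustration only: the pulse length, bandwidth, sampling rate, delay and reflectivity are assumed values, not those of the thesis system.

```python
import numpy as np

# Hedged sketch of pulse compression (Equations 2.1-2.4). All parameter
# values are assumptions for illustration, not the thesis system's values.
T, B, fs = 1e-3, 40e3, 400e3                # pulse length, bandwidth, sampling rate
t = np.arange(0, T, 1 / fs)

# Baseband linear chirp as in Equation 2.1, with amplitude A = 1.
p = np.exp(1j * 2 * np.pi * (-B / 2 * t + B / (2 * T) * t**2))

# Simulated raw echo e(t): the pulse delayed by tau, scaled as in Equation 2.3.
c, tau, gamma = 1500.0, 2e-3, 0.8           # sound speed, time delay, reflectivity
e = np.zeros(int(5e-3 * fs), dtype=complex) # 5 ms observation window
i0 = int(tau * fs)
e[i0:i0 + len(p)] = gamma / (tau * c) ** 2 * p

# Matched filter (Equation 2.4): np.correlate conjugates its second argument,
# so this is the cross-correlation of e with the transmitted pulse p.
s = np.correlate(e, p, mode="full")[len(p) - 1:]
print("estimated delay:", np.argmax(np.abs(s)) / fs)  # approx. 2e-3 s
```

The magnitude of the pulse-compressed signal peaks at the sample corresponding to the true delay, which is exactly the approximate sinc behaviour described above.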

The result of the pulse compression between the transmitted and received raw signals in Figure 2.3 can be seen in Figure 2.4, where it is shown that the received signal after processing is a sinc wave.

Figure 2.4: Received signal after pulse compression: (a) received signal, (b) received signal, zoomed.

Extended Array

The array with one transmitter and one sensor element can be extended to include several sensor elements. The scene at the seafloor still only has one point scatterer, which reflects an incoming acoustic wave to the array. The reflected wave will be received by all sensors in the array, but at different times.

The setting in Figure 2.1 is extended with multiple sensors to a longer array. The array used in this thesis contains multiple sensor elements and one transmitter. For simplicity, only four sensor elements are shown in Figure 2.5, where r_i is the vector pointing from the point scatterer to the i:th sensor element in the array.

Figure 2.5: Extended geometry for the collection of sonar signals.

Since the receiving sensors are at different positions, the reflected signal will be received at different times by the sensors. The equations in section 2.1.1 can be updated to:

$$\tau_i = \frac{d_i}{c} = \frac{\|r_t\| + \|r_i\|}{c} \quad (2.5)$$

$$e_i(t) = \frac{\gamma}{(\tau_i c)^2}\, p(t - \tau_i) \quad (2.6)$$

$$s_i(t) = e_i(t) \star p(t) = e_i(t) * p(-t), \quad (2.7)$$

where the subscript i denotes the i:th sensor.

Extended Reflection Scene

The reflecting scene on the seafloor considered in the previous examples only contains one reflecting point scatterer. The scene at a fixed depth z on the seafloor is now extended to a more realistic scene where all points have a reflection coefficient γ(x, y, z). This means that a transmitted acoustic wave propagates in the water to the scene and is reflected at all positions. The reflected waves from all the positions in the scene travel to the array and are received by all sensor elements. The measurements over time for one sensor element are a sum of the reflected waves from different positions arriving at different time delays. The equations become:

$$\tau_i(x, y) = \frac{d_i(x, y)}{c} = \frac{\|r_t(x, y)\| + \|r_i(x, y)\|}{c} \quad (2.8)$$

$$e_i(t) = \iint \frac{\gamma(x, y)}{(\tau_i(x, y)\, c)^2}\, p(t - \tau_i(x, y)) \, dx \, dy \quad (2.9)$$

$$s_i(t) = e_i(t) \star p(t) = e_i(t) * p(-t), \quad (2.10)$$

where τ_i(x, y) is the time delay for a signal received at sensor i reflected at position (x, y) in the scene, and e_i(t) and s_i(t) are the raw and the processed received signals, respectively, for sensor i.

The sonar setting now contains an array with a transmitter and multiple receivers, and a scene on the seafloor with a reflection value in each point. The geometry and the defined variables can be seen in Figure 2.6. It can be seen that the earlier examples are special cases of this setting.

Figure 2.6: The geometry and variables for the sonar setting.

2.1.2 Image Reconstruction

To better understand the seabed and what it contains, the received signals are reconstructed into an image. The seabed can contain objects in 3D, but for this thesis the reconstructed image is a 2D image. It can be seen as if the reconstructed image is a 3D scene shown as a 2D image.

Since the signals are collected from a scene to the side of the array, shadows may appear when objects block the propagation path of the acoustic wave. A real scene has a different reflection value in every continuous point, but when reconstructing an image, not all of the points can be considered. The reconstructed images can therefore be seen as discrete samples of the estimated reflection values in the sample points, which at coordinate (x, y) in a scene at a fixed depth z on the seafloor can be written γ̂(x, y, z).

Delay-and-sum Reconstruction

There are multiple model-based reconstruction methods available, but a common method is delay-and-sum [10]. This method is also currently used in the sonar system for this thesis.

The delay-and-sum method can also be called back projection, since it takes the received signals and propagates them back to positions in the depicted scene [16]. Figure 2.7 shows an overview of how the method works, and as the name states, the delays are used to sum up values from all the sensors. The reconstruction of a scene is performed by reconstructing each point in the scene individually.

Figure 2.7: Overview of the delay-and-sum method. The incoming wave will reach the sensor elements at different times. The values at the times of the delays from all sensors are summed to a point value.

Consider a reflected wave from a point propagating in the water. It will reach the sensor elements at different time delays since the distance to the elements varies. The positions of the transmitter, the sensor elements, and the point for reconstruction are known, and the time delays can be calculated from them. Since the reflected wave was received at the time of the delay, the amplitude value from the received signal at the time of the calculated delay is taken for all sensors and summed together. This value is then mapped to the point in the reconstructed image. Simplified, this can be written as:

$$\hat{\gamma}(x, y, z) = \sum_i s_i(\tau_i(x, y)), \quad (2.11)$$

where s_i is the processed signal for sensor i and τ_i(x, y) is the time delay for the reconstructed point; since z is the same for all points in the scene and does not vary, it is left out.

It may be that the calculated time delay does not exactly correspond to a sample in the received signal. Therefore, interpolation is performed to get a value for the reflection at the time of the delay, even if no sample is available at that exact time [16].

This process is done for all points in the scene that should be reconstructed, and the result is a reconstructed image of the seafloor.
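As an illustration of Equation 2.11, a minimal delay-and-sum reconstruction could look as follows. The function name and geometry handling are illustrative assumptions; real-valued, already pulse-compressed signals and a constant sound velocity are assumed, and the TVG factor mentioned earlier is omitted.

```python
import numpy as np

# Hedged sketch of delay-and-sum (Equation 2.11). Assumes real-valued,
# pulse-compressed signals and known transmitter/sensor/point positions.
def delay_and_sum(signals, t, tx_pos, rx_pos, points, c=1500.0):
    """signals: (n_sensors, n_samples), t: (n_samples,) time axis,
    tx_pos: (3,), rx_pos: (n_sensors, 3), points: (n_points, 3)."""
    image = np.zeros(len(points))
    for rx, s in zip(rx_pos, signals):
        # Travel distance transmitter -> point -> sensor (cf. Equation 2.5).
        d = (np.linalg.norm(points - tx_pos, axis=1)
             + np.linalg.norm(points - rx, axis=1))
        # Interpolate the signal at each time delay and sum over sensors.
        image += np.interp(d / c, t, s)
    return image
```

The np.interp call plays the role of the interpolation step described above, evaluating the received signal between its samples at the calculated delays.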

Figure 2.8 shows an example of a received signal for one sensor and the corresponding reconstruction done with the delay-and-sum method. The reconstruction shows a tripod as if it were seen from above. Reconstructed sonar images will be further discussed in chapter 4.

Figure 2.8: Example of a received signal for one sensor (a) and the corresponding model-driven reconstruction from an array (b).

2.1.3 Leakage

The setup for signal collection described in subsection 2.1.1 will give leakage in angle and range. The phenomenon can be seen when reconstructing an image and the reasons for it will be further discussed in this section.

An example of leakage in angle and range can be seen in Figure 2.9. The reconstructed scene contains just one point scatterer located in the center of the scene and the signal was collected with an array of multiple sensor elements.

Figure 2.9: Example of a reconstructed scene with leakage.

Angular Leakage

The angular leakage can be described in terms of diffraction, which is a phenomenon that appears when a wave propagates through a slit or aperture and spreads from its path. The diffraction pattern is a consequence of interference between the spreading waves and results in varying intensity of the wave observed at the aperture. The diffraction pattern can also be called a beam pattern.

For the sonar case, the reflection at the seabed can be seen as a source at a distance from the sensor elements, which later will receive the reflection. The aperture with sensor elements can in its turn be seen as a rectangular slit in one direction, as pictured in Figure 2.10. If the distance between the reflection source and the aperture is large, so that the reflected waves can be seen as an incoming plane wave, then the aperture is in the far-field region. When the far-field condition is fulfilled, calculations are simplified and a diffraction pattern called the Fraunhofer pattern can be calculated [10, chapter 2.5], [14, p. 74]. The Fraunhofer diffraction pattern is the intensity pattern of an incoming wave in the far-field region observed at an aperture. The far-field region can be described as the range from the aperture to the source starting from 2L²/λ, where L is the length of the aperture and λ the wavelength [14, p. 74]. For this system, the aperture length is L ≈ 50 m.

If the far-field condition holds, the observed pattern, except for a phase factor, is the Fourier transform of the aperture itself [14, p. 74]. The aperture used for the data collection in this thesis can be written as rect(y/L), which is a rectangular window in the y-direction with width L. The Fraunhofer diffraction pattern which will be seen is then the Fourier transform of the aperture, written as:

$$\mathcal{F}\{\text{rect}(y/L)\} = L\,\text{sinc}(f L), \quad (2.12)$$

where L is the aperture length.

Figure 2.10: Sketch of how the array can be seen as a slit in the Fraunhofer domain.

This means that the diffraction pattern from a source far away will be seen as a sinc wave at the aperture, and that the intensity in the pattern will vary as a sinc wave with the distance from the center of the aperture. The sensor elements in the array perform spatial sampling of the pattern, i.e. the sinc, and the angular leakage seen is therefore approximately a sinc wave.
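This relation can be checked numerically: sampling a rectangular aperture and taking its discrete Fourier transform reproduces the L sinc(fL) pattern of Equation 2.12. The aperture length and sampling values below are assumptions for illustration.

```python
import numpy as np

# Hedged numerical check of Equation 2.12: the Fourier transform of a
# rectangular aperture is L * sinc(f * L). Sampling values are assumed.
L, n, dy = 50.0, 4096, 0.05                 # aperture length [m], FFT size, step [m]
y = (np.arange(n) - n / 2) * dy
aperture = np.where(np.abs(y) <= L / 2, 1.0, 0.0)      # rect(y / L)

# Approximate the continuous Fourier transform with an FFT (Riemann sum).
pattern = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(aperture))).real * dy
f = np.fft.fftshift(np.fft.fftfreq(n, d=dy))           # spatial frequency axis

# np.sinc(x) is sin(pi x) / (pi x), matching the normalized sinc in (2.12).
print(np.max(np.abs(pattern - L * np.sinc(f * L))))    # small numerical error
```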

Range Leakage

The leakage in range appears as a result of the pulse compression described in subsection 2.1.1. The reflected wave is a chirp wave propagating from the reflecting point towards the receiving sensors. The received signal from a position in the scene is, after cross-correlation with the transmitted signal, a sinc wave centred around the time τ, which corresponds to the travel time for the wave from the transmitter to the receiver via that reflecting point. Values earlier or later in time than τ correspond to positions closer or farther away in range from the reflection point.

When multiple receiving sensors are active, the leakage in range for some areas cancels out between the received signals from different sensors, due to interference. The leakage can still be seen in the area directly from the reflecting point to the receiving array, because no negative interference between the received sinc waves is possible there, since the travelled distances are almost equal.

The leakage in range can be visualized as in Figure 2.11, where the sinc wave resulting from the pulse compression is placed along the range from the reflecting point to the array.

Figure 2.11: Visualization of how the leakage in range appears.

2.1.4 Signal Processing

The received signal is represented in the complex baseband, which is defined as a signal that has the majority of its energy close to zero in the frequency spectrum [8, p. 5–7]. A signal in baseband can be represented with fewer samples and is hence computationally better than its passband equivalent. Most neural networks are, however, not defined for complex values, so the signal has to be processed to enable training with the neural network.

One solution is to convert the complex baseband signal s(t) to a real-valued passband signal s̃(t), which has its energy spectrum centered around ± some non-zero center frequency f_c. One way to write this relation is [8]:

$$\tilde{s}(t) = \text{Re}\left\{ s(t)\, e^{i 2\pi f_c t} \right\}, \quad (2.13)$$

where a constant factor is left out. f_c should be chosen large enough so that the whole bandwidth of the signal after conversion is on the positive side of the frequency axis.

Before conversion from baseband to passband, the received signals may need to be upsampled to avoid aliasing. This can be done through interpolation between existing samples to create new samples. To avoid aliasing, the new sampling frequency must be at least twice the highest frequency of the converted signal.
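The following sketch shows one way such an upsampling and baseband-to-passband conversion (Equation 2.13) could be written; the sampling rate, carrier frequency and upsampling factor are assumed values, not those of the thesis system.

```python
import numpy as np
from scipy.signal import resample

# Hedged sketch of subsection 2.1.4: upsample a complex baseband signal and
# convert it to a real-valued passband signal (Equation 2.13). The rates,
# the carrier fc and the factor are illustrative assumptions.
fs, fc, factor = 40e3, 60e3, 7
s = np.random.randn(1024) + 1j * np.random.randn(1024)   # stand-in baseband signal

s_up = resample(s, factor * len(s))     # FFT-based interpolation to factor * fs
fs_new = factor * fs                    # should satisfy fs_new >= 2 * (fc + B / 2)
t = np.arange(len(s_up)) / fs_new

s_pass = np.real(s_up * np.exp(1j * 2 * np.pi * fc * t))  # Equation 2.13
```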

2.2 Inverse Problems

This section introduces the theory regarding inverse problems, first in the general setting and then in detail for the case of sonar.

2.2.1 General Inverse Problems

Inverse problems are found in many different areas, such as geophysics, mechanical engineering, and imaging. An example from image processing is to restore a blurred image to its original. Another example, from medical imaging, is to reconstruct an image of an area of the body from, for example, X-ray signals [6].

Inverse problems are formed as

$$y = A(x) + e, \quad (2.14)$$

where y is the given measured data, A is the forward operator, x the sought data and e noise [5]. The forward operator maps data between two spaces and can be described as A : X → Y, where X is the domain and Y the range of the forward operator. The measured data y ∈ Y is given from the sought data x ∈ X by the forward operator A, which may be either linear or nonlinear [5].

The forward operator differs depending on the given inverse problem, but for imaging tasks it could, for example, be a convolution, blurring, or sampling operator [20]. Ongie et al. [20] present multiple examples of different forward operators with descriptions and connections to applications where they are used.

Without going into technicalities, a linear operator A∗, defined as A∗ : Y → X, is called the adjoint of the operator A : X → Y if W_X(A∗ y, x) = W_Y(y, A x) is fulfilled, where W(·, ·) denotes the scalar product in the respective spaces [22, p. 242]. The solution of an inverse problem is to reconstruct x from the given measured data y. There are many possible methods available for this, both model-based and data-driven. One approach is to maximize the likelihood of the measured data [3]; that is, from the observations y, find the most likely sought data x. Maximization of the likelihood can be rewritten as minimization of the negative log-likelihood L, which is stated as

$$\min_{x \in X} L(A(x), y). \quad (2.15)$$

As earlier mentioned, this problem can be ill-posed, which leads to an unstable solution and also possibly over-fitting of the data. By introducing a regularizing term that incorporates knowledge of the data x, the problems with over-fitting are avoided. Hence, Equation 2.15 can be written as

$$\min_{x \in X} \left[ L(A(x), y) + \lambda S(x) \right], \quad (2.16)$$

where S is the regularization term and λ is a parameter controlling the amount of regularization [3]. Arridge et al. [5] further describe model-based approaches to solving inverse problems and the mathematical theory behind them. The focus in this thesis will, however, be on data-driven methods rather than model-based ones.
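As a minimal illustration of Equation 2.16, the sketch below solves a small linear inverse problem with a squared-error data term and Tikhonov regularization S(x) = ||x||² by plain gradient descent. The operator, data and parameter values are illustrative assumptions and have nothing to do with the sonar problem.

```python
import numpy as np

# Hedged sketch of Equation 2.16 with L(A(x), y) = ||A x - y||^2 and
# Tikhonov regularization S(x) = ||x||^2, minimized by gradient descent.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))      # ill-posed: more unknowns than data
x_true = rng.standard_normal(100)
y = A @ x_true + 0.01 * rng.standard_normal(50)

lam, step = 0.1, 1e-3                   # regularization weight, step size
x = np.zeros(100)
for _ in range(2000):
    # Gradient of ||A x - y||^2 + lam * ||x||^2 with respect to x.
    x -= step * (2 * A.T @ (A @ x - y) + 2 * lam * x)
```

Without the λ term, the iteration would fit the noise of this underdetermined system; the regularizer is what stabilizes the solution, as the text describes.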

2.2.2 Inverse Sonar Problem

In the sonar setting, x corresponds to a 2D scene on the seafloor. It can be illustrated as an image where each pixel represents the reflection value at that point. The forward operator can in this setting be described as a mapping between image space and signal space; it handles the physics of sound wave propagation from transmitter to receiver via reflection at the seafloor. y is the received time signals for every sensor element, where a higher amplitude means that a stronger reflected signal was received at that time.

By comparing Equation 2.9 and Equation 2.14, it can be seen that the sought image x in Equation 2.14 is the same as the reflection image γ in Equation 2.9. Hence, the forward operator for the sonar problem is the integral over the scene on the seafloor with reflection coefficient γ and the time-delayed transmitted pulse p(t), as stated in Equation 2.9. The forward operator can be defined once with the positions of the array and the transmitted pulse, and can then be described as an operator that takes an input scene γ and outputs the corresponding received time signals for every sensor.

The inverse problem is to reconstruct the scene from the received signals. The reconstruction task in the sonar setting is different from, for example, the task of deblurring an image, since deblurring is an operation between two images in the same space, whereas sonar reconstruction operates between two different spaces, a signal space and an image space, just like many medical reconstruction problems. The signal space contains in this case sampled time measurements and the image space contains a 2D image.

2.3 Deep Learning for Inverse Problems

This section first describes deep learning in general and continues with deep learning methods for inverse problems. Lastly, the chosen deep learning method for image reconstruction in this thesis is presented further.

2.3.1 Fundamentals of Deep Learning

Deep neural networks can be seen as a chain of functions f connected to calculate an output y from a given input x, as in y = f(x) = f^(N)(...(f^(2)(f^(1)(x)))...). The different steps in the chain are called layers, and the first layer is the input layer. The last layer is the output layer, and there often exist multiple layers between the input and the output layer, which are called hidden layers. The number of layers in a network is also called the depth of the network, and the term deep neural network refers to a network with a large depth [13, p. 164–165].

During training, a loss function compares the target mapping f∗ with the learned mapping f and outputs a value for their difference. The error between the output and the target is used to compute the gradients of the layers in the network by an algorithm called back-propagation. From the gradients, an optimizing function such as stochastic gradient descent can be used to update the parameters in the network [13, p. 164–165, 200]. This process is repeated and the parameters are learned iteratively.
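A minimal sketch of this training loop is shown below, written with TensorFlow since that is the framework used later in the thesis; the network, data and hyperparameters are illustrative assumptions.

```python
import tensorflow as tf

# Hedged sketch of subsection 2.3.1: a network as a chain of layers
# y = f3(f2(f1(x))), trained with back-propagation and gradient descent.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),   # f1, hidden layer
    tf.keras.layers.Dense(32, activation="relu"),   # f2, hidden layer
    tf.keras.layers.Dense(1),                       # f3, output layer
])
opt = tf.keras.optimizers.SGD(learning_rate=1e-2)

x = tf.random.normal((64, 10))                      # a batch of inputs
target = tf.random.normal((64, 1))                  # matching targets

with tf.GradientTape() as tape:
    loss = tf.reduce_mean((model(x) - target) ** 2)  # squared-error loss
grads = tape.gradient(loss, model.trainable_variables)  # back-propagation
opt.apply_gradients(zip(grads, model.trainable_variables))
```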

2.3.2 Deep Learning and Inverse Problems

In the inverse problem setting, the goal of deep learning is to learn a mapping A†_θ : Y → X, which can be seen as a pseudo-inverse operator that should fulfil A†_θ(y) ≈ x, given the inverse problem in Equation 2.14. The learning is done through training, where the parameters θ should be learned, as described in subsection 2.3.1.

Deep learning for solving inverse problems can be approached in several different ways. One approach is to directly learn the inversion between the two spaces with a neural network. This fully learned reconstruction method requires a lot of data for training, since the network needs to learn the physics behind the forward operator. Not using the known information about the system in the forward operator can be unwise, since the training dataset then has to be large and rich enough to capture the underlying physics [2].

Another method is post-processing with a neural network. First, an approximate inverse of the forward operator is used to map the signals back to the image space, i.e. perform an initial reconstruction, and then a neural network is applied. The neural network is used for removing artefacts and noise from the initial reconstruction and in that way improves the reconstruction [20]. If an approximate method is used for the initial reconstruction, some information will be lost due to the approximation, and the neural network may not be able to recover that information [2].

A third approach is the unrolling methods. Here, a part of the optimization is often learned with a neural network and then iterated to a solution [20]. These methods can be seen as a combination of model-based and data-driven methods, since they often originate from classical optimization methods but replace or incorporate deep learning networks into the method [19].

Ongie et al. [20] categorize and present existing deep learning methods according to two criteria: knowledge of the forward operator and available data. They argue that the knowledge about the system, in the form of the forward operator, should be used and incorporated into the neural network, so that the neural network does not have to learn the physics of the system. Since the forward operator is known for the sonar problem, this is an interesting statement. Regarding the data for this thesis, it can be simulated so that matched pairs of data are available, which means that methods including supervised training can be used. Measured signal data is also available, but without a corresponding ground truth image of the seafloor. More information regarding the used data can be found in section 3.1.

With a known forward operator and data available for supervised training, Ongie et al. [20] suggest some methods for solving the inverse problem. For this thesis, methods like [4], [11], [12] and [3], which use the unrolling approach, were the most interesting, since they make use of the physics in the form of the forward operator and do this in a deeper way than just post-processing an initial image reconstruction. The Learned Primal-Dual Reconstruction method presented in Adler and Öktem [3] was chosen for implementation in this thesis, since it not only accounts for the forward operator in the algorithm, but has also shown promising results in the reconstruction of medical images between two different spaces, signal and image space, exactly as in sonar reconstruction.

2.4 Learned Primal-Dual Reconstruction

The deep learning network used in this thesis is the network presented in Learned Primal-Dual Reconstruction [3]. The paper presents a method for the reconstruction of tomographic images directly from raw data. The method incorporates the knowledge of the forward operator and its adjoint into the network and derives from a proximal primal-dual optimization method, where parts of the optimization have been replaced with a deep learning network.

2.4.1 Operator Discretization Library

The forward operator is incorporated into the deep learning network with the Python library ODL [1], [3]. ODL enables easy use of concepts like operators for solving inverse problems by providing a framework for them, which can be adjusted to different problems.

This can be illustrated as implementing a convolution operator that is defined for a specific kernel but can be applied to different input images to output convolved images. The adjoint of the operator is also implemented as an ODL operator. Both operators are later converted to Tensorflow layers to be used in the deep learning network.
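A hedged sketch of the ODL operator interface is shown below: a toy linear operator with an explicit adjoint, defined on a discretized image space. The space, operator and names are illustrative, not the thesis's actual sonar operators; ODL's TensorFlow bridge (the odl.contrib.tensorflow module) can then wrap such operators as layers.

```python
import odl

# Hedged sketch of the ODL operator interface, not the thesis's sonar
# operators: a toy linear scaling operator with an explicit adjoint.
space = odl.uniform_discr(min_pt=[0, 0], max_pt=[1, 1], shape=(167, 185))

class Scale(odl.Operator):
    """A(x) = a * x for a real scalar a; self-adjoint."""
    def __init__(self, a):
        self.a = a
        super().__init__(domain=space, range=space, linear=True)

    def _call(self, x):
        return self.a * x

    @property
    def adjoint(self):
        return Scale(self.a)            # self-adjoint since a is real

A = Scale(2.0)
y = A(space.one())                      # apply the forward operator
x_back = A.adjoint(y)                   # apply the adjoint operator
```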

2.4.2 Network Architecture

The network is inspired by the Primal Dual Hybrid Gradient (PDHG) method, which is a primal-dual scheme for minimization problems with a structure like the minimization problem introduced in Equation 2.16. The primal-dual algorithm uses an auxiliary dual variable, defined in the same space as the range of the forward operator, together with a primal variable, defined in the domain space of the forward operator, and iteratively updates these variables towards a solution. The final reconstruction is the primal variable after the last iteration.

The PDHG method uses proximal operators, which minimize a function G, to update the variables. The use of a proximal operator is analogous to how a step is taken to update variables in the gradient descent method, but the update is instead given by the proximal operator:

$$\text{prox}_{\kappa G}(x) = \arg\min_{x' \in X} \left[ G(x') + \frac{1}{2\kappa} \|x' - x\|_X^2 \right], \quad (2.17)$$

where κ is the step size of the update [3].

The presented deep learning network substitutes the proximal operator in Equation 2.17 with a CNN that acts as a learned operator. The result is a method that iterates between two variables, primal g and dual h, where each variable is updated with a CNN to minimize a loss function, as described in subsection 2.3.1. To further improve the network and not just learn the proximal operators, the network learns, for example, how to combine the previous and the next update in the best way, not only the step size for the update. The learned proximal operators are also allowed to differ between the iterations, something that improves the reconstruction quality and increases the number of parameters in the network [3].

Since the primal and dual variables are defined in different spaces, the domain and the range of the forward operator respectively, the forward operator and its adjoint are needed to transform the variables into the other space. This is an important step to enable an algorithm that iterates between the primal and dual spaces towards a solution.

The structure of one primal-dual iteration of the network can be seen in Figure 2.12. The larger blue box corresponds to a dual iterate and the red box corresponds to a primal iterate. The yellow boxes highlight the CNNs, where an orange arrow means a 3 × 3 convolution with a Parametric Rectified Linear Unit (PReLU) and a yellow arrow means just a 3 × 3 convolution. The numbers in the CNN denote the number of channels in the network. The number of channels for the primal and dual variables is set to five to build some memory into the system, by letting more data persist between the iterates than needed. The green boxes illustrate the use of the forward operator A and its adjoint A∗. In most cases, multiple iterations are performed, meaning that pairs of primal and dual iterates are added up to the desired number of iterations. The output from the primal (red) iterate is used as input to another dual (blue) iterate, which in its turn outputs to the primal iterate to finish another iteration.

The algorithm starts with a dual iterate, where the zero-initialized primal variable g and dual variable h are given as input together with the measurement data y. Since the primal variable lies within the domain of the forward operator and is forwarded to the dual iterate, the forward operator is applied to the primal variable to transform it into the correct space. The three parameters are concatenated and given to a CNN with three layers, each given by a 3 × 3 kernel. The activation function PReLU is used for the first two layers [3] and is defined as:

$$a_{c_j}(x) = \begin{cases} x, & \text{if } x \ge 0 \\ c_j x, & \text{otherwise,} \end{cases} \quad (2.18)$$

where c_j is a coefficient that determines the slope for negative input values [17].

Figure 2.12: Network architecture for one primal-dual iteration.

The parameters in the convolutions are initialized with the Xavier scheme and the biases are all initialized to zero. The output from the CNN, shown as a yellow box in Figure 2.12, is added to the input dual variable to update it. The update of the dual variable finishes the dual iterate [3].

The primal iterate takes the dual variable, i.e. the output from the dual iterate, and applies the adjoint operator to it to map it to the image space. The primal variable is also given as input to the primal iterate, and after concatenation they are forwarded to a 3-layer CNN with kernel size 3 × 3. The output of the CNN is added to the primal variable given as input to update the primal variable, and the primal iterate is done [3].

When both the dual and the primal iterate are completed, one primal-dual iteration has been performed. The presented solution in Adler and Öktem [3] uses ten iterations, and the primal variable after the tenth iteration is then the reconstruction from the signal, which can be written A†_θ(y).

The reconstruction is compared with the ground truth image, and the empirical loss, defined as

$$\hat{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left[ \| A^\dagger_\theta(y_i) - x_i \|_X^2 \right],$$

is calculated. An Adam optimizer with values β₁ = 0.9 and β₂ = 0.9 is used to update the network, and for that to be possible the gradients of all components in the network are needed. The gradient of the learned proximal is calculated with Tensorflow, whereas ODL is used for deriving the gradients of the forward operator and its adjoint, since the operators are defined with ODL. The calculated gradients are also clipped to a norm of 1 to increase stability during training [3].

The network in Adler and Öktem [3] is trained with data that is randomly generated every tenth iteration. In every iteration, the data is propagated through the network to update the parameters in it. Every tenth iteration, testing is performed with other data, which is the same every session. A cosine decay is used for shrinking the initial learning rate η₀ with the iteration step t over the total number of iterations t_max, as

$$\eta_t = \frac{\eta_0}{2}\left(1 + \cos\left(\pi \frac{t}{t_{\max}}\right)\right),$$

where η_t is the learning rate in step t.
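In TensorFlow, this optimizer setup could be sketched as follows; Keras' CosineDecay schedule reduces to η_t = η₀ (1 + cos(π t / t_max)) / 2 when its alpha parameter is zero, and the concrete values below are assumptions.

```python
import tensorflow as tf

# Hedged sketch of the optimizer setup in [3]: Adam with beta1 = beta2 = 0.9
# and a cosine decay of the learning rate. Values are illustrative.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3, decay_steps=100_000)
optimizer = tf.keras.optimizers.Adam(
    learning_rate=schedule, beta_1=0.9, beta_2=0.9)
```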

3 Method

This chapter describes the used data and the different steps in the implementation of the thesis work.

3.1 Datasets

This section describes the data used in this thesis.

3.1.1 Synthetic Data

Since the forward operator for the sonar problem is known, it is possible to generate synthetic signals from a scene by applying the forward operator to it. The scene itself can also be generated, since it is just a matrix with values for the reflection at different positions. By creating a scene and generating the corresponding received signals with the forward operator, matched data is created. Synthetic data generated this way was used for supervised training and validation in this thesis. There was no fixed dataset used for training, but an example of a training scene can be seen in Figure 3.1.

Different geometrical shapes were placed at random positions and with random sizes in the training scene. The number of shapes was chosen randomly within a range, as was the type of shape itself. In the scene, a random number of point scatterers, which can be seen as smaller dots in Figure 3.1, were also placed at random positions. The reflection intensities of the geometrical shapes and the point scatterers were, for each object, randomly picked in a range between -5 and 5. The background was set to zero.
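A hedged sketch of this kind of scene generation is given below; the shape repertoire is reduced to filled circles and the object counts and radii are assumptions, while the 167 × 185 grid and the intensity range [-5, 5] follow the text.

```python
import numpy as np

# Hedged sketch of random training-scene generation: point scatterers and
# filled circles (a stand-in for the thesis's geometrical shapes) with
# intensities in [-5, 5] on a 167 x 185 grid. Counts/radii are assumptions.
rng = np.random.default_rng()

def random_scene(h=167, w=185):
    scene = np.zeros((h, w))
    yy, xx = np.mgrid[:h, :w]
    for _ in range(rng.integers(1, 4)):                 # a few larger shapes
        cy, cx, r = rng.integers(h), rng.integers(w), rng.integers(3, 15)
        scene[(yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2] = rng.uniform(-5, 5)
    for _ in range(rng.integers(5, 20)):                # point scatterers
        scene[rng.integers(h), rng.integers(w)] = rng.uniform(-5, 5)
    return scene
```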

Point scatterers were included in the training data since measurement data often has smaller objects, like sand and stones, as part of it. Geometrical shapes were used since larger objects that are part of the measurement data often can be seen as different shapes: it can be a rope formed as a circle lying on the seabed, or the edges of a wreck that form a triangle.

The scenes for training and validation were 5 × 6 m large and contained 167 × 185 sampled reflection points. The locations of the training and test scenes on the seabed relative to the array were constant for all training sessions. The scene was positioned at a range distance from the array of 51.5 to 56.5 m. In the cross-range direction, the mean position over all pairs of the transmitter and the sensors was used as the center of the array, and with this consideration the scene was placed from approximately -3 to 2.5 m around the center of the array.

Figure 3.1: Example of a scene for training.

For testing during training, the same scene was used for all training sessions; it can be seen in Figure 3.2. It includes a triangle, a circle and some point scatterers. It also has two areas with multiple point scatterers close to each other. The intensity for the test scene was also set to be in the range of -5 to 5. This scene was used to calculate the loss throughout the training session. For validation of the models after finished training, ten different scenes like the one in Figure 3.2 were used.

As with the training scenes, the test scenes include geometrical shapes and point scatterers that represent objects likely to be seen in measurement data. Areas with several point scatterers close to each other were included, since it is interesting to see if a reconstruction can capture them individually.

Since the training data varies, the test data was decided to be constant, so that the calculated loss would not vary depending on whether a scene happens to be easier or harder to reconstruct. It is also important to be able to see improvements over time as the network learns the reconstruction, and with the same test scene this can be observed.

Figure 3.2: Example of a scene for testing.

One approximation used when generating the synthetic data is that no noise is added to it, so it is assumed to be noise-free. The synthetic data is also sparse compared to measurement data, meaning that fewer points in the scene reflect an incoming wave than in the measurement data, where all points have a reflection. Another approximation is that the generated scenes are all 2D scenes, whereas the measured signals can come from signals reflected at 3D objects in the scene.

3.1.2 Measurement Data

Data from measurements in the freshwater lake Vättern, Sweden, collected by Saab Dynamics, is available. There is data from around 800 transmitted pulses, where each pulse corresponds to a received signal at each sensor element in the array. Images can be reconstructed from these measurements, and they depict an area of 120 000 m² in Vättern with a resolution of 4 × 4 cm. Depending on the chosen size of each reconstructed image, a different number of images can be created from the depicted area.

For the measurement data, no ground truth is available, since the task of the inverse sonar problem is to reconstruct an image from the received signals. Therefore, the measurement data was only used for evaluating the learned network.

3.2 Signal Processing

To be able to train and use the deep learning network on measurement signals, several processing steps need to be performed; they are described in this section.

The received sonar signals are complex-valued baseband signals, and reconstruction performed on these signals with the delay-and-sum method will lead to a complex-valued image. Most neural networks are not defined for complex values, and one way to solve this issue is to convert the complex baseband signal to a real-valued passband signal, as described in subsection 2.1.4. Another solution would have been to split the real and imaginary parts into two different channels and supply them to the network. This solution was not preferred, since the complex value in the received signal has a physical meaning as phase and magnitude, which would be lost by dividing it into real and imaginary parts.

First, upsampling is performed on the baseband signal. The upsampling is performed with interpolation between the existing samples, so that more samples over the same time are available. As described in subsection 2.1.4, distortion can be avoided if the sampling frequency satisfies f_s ≥ 2 f_max Hz, where the largest frequency can be written as f_max = f_c + B/2 Hz. The original sampling frequency is upsampled with a factor so that the new sampling frequency is large enough to fulfil the above expressions according to the chosen center frequency f_c and bandwidth B. This upsampling ensures aliasing-free conversion to a passband signal.

The upsampled signal is then converted to a passband signal as stated in Equation 2.13. The passband signals have seven times more samples, and it turned out that the real-valued passband signal was too large compared to the baseband signal and caused memory issues when training the network.

The sonar signals contain reflected signals from a large area of the seafloor, but often a reconstruction is done on a smaller scene. This means that not the whole signal is of interest when reconstructing a smaller image, and the upsampled passband signal can be shortened to enable (faster) training in the deep learning network.

The shortening is done with the use of the calculated time delays. Since it is known in what time range signals from the scene to be reconstructed arrived, the signal is shortened to only include that interval.

The result of the signal processing step is a real-valued passband signal which is short enough to enable training with the deep learning network. Delay-and-sum reconstruction performed on a real-valued signal will result in a real-valued image, but it will differ slightly from a delay-and-sum reconstruction performed on a complex baseband signal. This is because a passband signal contains the same information as a baseband signal but modulated onto a carrier wave, which in this case has a higher frequency. Examples of reconstructions performed with both signals can be seen in Figure 3.3.

Figure 3.3: Delay-and-sum reconstruction performed on (a) complex-valued baseband signals and (b) real-valued passband signals.

3.3 Deep Learning Model

The deep learning model used in this thesis is a modified version of the Learned Primal-Dual Reconstruction presented in section 2.4. Here, the changes made in comparison with the original deep learning network are described.

3.3.1 Operators in ODL

Since the data and problem setting for sonar differ from the medical problem in Adler and Öktem [3], the forward operator and its adjoint were changed to suit the sonar problem. ODL was chosen for the implementation since it provides a way to incorporate the operators as Tensorflow layers, which simplifies the implementation of the primal-dual method.

The forward operator should, from an input scene on the seafloor with reflection coefficients, retrieve the corresponding time signals that the sensor elements receive. The adjoint of the forward operator, i.e., the operator that maps the signals to an image, was implemented as the delay-and-sum method. The delay-and-sum method is not an adjoint operator of the forward operator, but it is one approximate way to map from signal to image. The implemented approximate adjoint operator for the sonar problem will be referred to as the backward operator.

Forward Operator

As described in subsection 2.1.1, the received signals are approximately sinc waves, and the arrival time, i.e. the time delay τ, depends on the travel distance from the transmitter to a receiver via the reflection point in the scene. The positions of the array are known, as well as the positions of the desired reconstruction scene, so the time delays for all sensors over every position in the scene can be calculated in the initialization of the forward operator and do not need to be computed when using the operator.

The simulation of the received signals from a seafloor scene was done in the frequency domain and with passband signals, but still following the principle of how measurement signals arise, as described in subsection 2.1.1 and particularly as stated in Equation 2.9. The Fourier transform of a sinc wave is a rectangular window, and since the bandwidth B Hz and center frequency f_c Hz of the received and processed passband signal s̃(t) are known, its Fourier transform will be a rectangular window over the range [f_c − B/2, f_c + B/2] Hz.

For a position in the scene, the rectangular window signal is generated and multiplied with the reflection coefficient γ for that position. Thereafter, the signal is shifted according to the time delay τ, and since the shift is in the time domain, it corresponds to multiplication with a complex exponential in the frequency domain, as in:

$$\mathcal{F}\{\tilde{s}(t - \tau)\} = \tilde{S}(f)\, e^{-i 2\pi f \tau}, \quad (3.1)$$

where s̃(t) and S̃(f) denote the received pulse-compressed signal in passband and its Fourier transform, respectively.

The signal generation is done for all positions in the scene for every sensor, so that the signal for a sensor is a sum of reflected waves from all positions. To make sure the signal is real, the frequency spectrum over [f_c − B/2, f_c + B/2] is conjugated and mirrored to [−f_c − B/2, −f_c + B/2]. As the last step, the inverse Fourier transform is calculated, and the result is a signal in the time domain which contains the received reflections for all sensors from the given input scene.
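A hedged sketch of this frequency-domain simulation, for one sensor and a handful of point scatterers, is given below. Building the rectangular band symmetrically over positive and negative frequencies is equivalent to the conjugate-mirroring step described above, and all parameter values are assumptions.

```python
import numpy as np

# Hedged sketch of the frequency-domain forward simulation for one sensor:
# a rectangular band spectrum per scatterer, phase-shifted by its delay
# (Equation 3.1) and summed. Values are illustrative assumptions.
fs, n = 400e3, 4096
f = np.fft.fftfreq(n, d=1 / fs)                  # frequency axis [Hz]
fc, B = 60e3, 40e3                               # center frequency, bandwidth

# Rectangular window over [fc - B/2, fc + B/2] and its mirror image; the
# symmetric (Hermitian) spectrum makes the time signal real.
band = np.abs(np.abs(f) - fc) <= B / 2

taus = np.array([2.0e-3, 2.3e-3])                # time delays tau_i
gammas = np.array([1.0, 0.5])                    # real reflection coefficients

S = np.zeros(n, dtype=complex)
for tau, gamma in zip(taus, gammas):
    S += gamma * band * np.exp(-1j * 2 * np.pi * f * tau)

s = np.fft.ifft(S).real                          # received time-domain signal
```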

The described implementation of the forward operator uses some assumptions and approximations. One is that the received and pulse-compressed signal is a sinc. The transmitted signal is a pulse and not infinitely long, which means that the received and pulse-compressed signal is only an approximate sinc.

Furthermore, the forward operator does not consider any multipath propagation of the waves, meaning that the model only accounts for waves that are reflected once at the seafloor and then received by the sensor elements.

The sound velocity is also considered to be constant over the location of the array and the scene for the signal collection. The real sound velocity can vary with depth, season and geographical location [23, p. 111].

Backward Operator

The operator mapping from the signal to the image domain, i.e. in the opposite direction of the forward operator, was implemented as the delay-and-sum method described in subsection 2.1.2.

As input, the operator takes passband signals and retrieves a reconstruction of the scene defined when initializing the operator.

The time delays for every position and every sensor are calculated in the initialization of the operator, and these are used in the backward operator. The calculated time delays are used as input to an interpolation function between the time vector and the received signal; the output is the signal intensity at the time of the delay. This value is multiplied with a TVG factor, as described in section 2.1.1. For each point of the scene to be reconstructed, the output values from all sensor elements are added together. The final result is an image reconstructed from the input signals.

3.3.2 Learned Primal-Dual

In general, the network architecture with primal and dual iterates including CNNs, as described in Figure 2.12, was not changed from the original paper when applied to the sonar problem. The number of primal-dual iterations was, however, varied between training sessions to see how it affected the learning.

Adler and Öktem [3] used norm clipping of the gradients before minimizing the loss. This was not used in this thesis, since no stable solution could be found with it. For the sonar case, the Adam optimizer minimizes the loss without norm clipping of the gradients.


The learning rate was decreased to $10^{-4}$, since the higher learning rate $10^{-3}$ used in the original paper resulted, for some training sessions, in unstable training where the loss increased. In most cases, the Adam optimizer in combination with cosine decay was used as in Adler and Öktem [3], but for some training sessions no decay was used.
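A minimal sketch of this optimizer setup is shown below, assuming TensorFlow/Keras; the framework is not restated here, so the schedule and optimizer objects are illustrative.

```python
import tensorflow as tf

total_iterations = 10_000  # training iterations used for most sessions

# Cosine decay from the start learning rate 1e-4 over the whole training.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-4, decay_steps=total_iterations)

# Adam without the gradient norm clipping used in the original paper.
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```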

The total number of training iterations was 10 000 for the majority of the training sessions, where the original paper used 100 000.

3.4 Training

The presented deep learning architecture is trained with randomly generated scenes, further described in section 3.1. In every iteration, the data is propagated through the network to update its weights and biases. The primal variable output from the last primal-dual iteration is the reconstruction computed from the input signals. The reconstruction is used together with the ground truth to calculate the loss, as in the original network.

New training data is generated every tenth iteration, and the network is tested with the test data to calculate a loss between the reconstructed and the ground truth image.
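The loop can be sketched as below, assuming TensorFlow; generate_scenes (random scenes with simulated signals) and network (the learned primal-dual model) are hypothetical helpers, and a squared-error loss is assumed here as in the original paper.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-4)  # see the schedule sketch above

for iteration in range(10_000):
    if iteration % 10 == 0:
        # New training data is generated every tenth iteration.
        signals, ground_truth = generate_scenes(batch_size=1)  # hypothetical
    with tf.GradientTape() as tape:
        reconstruction = network(signals)  # last primal iterate (hypothetical)
        loss = tf.reduce_mean(tf.square(reconstruction - ground_truth))
    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))
```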

The primal and dual variables have the dimensions [N, W, H, C], where N is the batch size, W the width, H the height and C the number of channels. For the primal variable, the width and height come from the generated images and are 167 and 185, respectively. For the dual variable, the width corresponds to the number of sensor elements in the array and the height to the number of samples in the signal. The number of channels is set to 5 for both variables as in Adler and Öktem [3], which means that five primal and dual variables are kept after each iterate. This can be seen as the network having some memory, where data is allowed to persist between the primal and dual iterates.

When the primal or dual variable is used as input to the opposite iterate, the forward respectively backward operator is used to transfer the variable to the correct space, and only one of the five channels is used. For the primal variable, the second channel is used as input to the forward operator, and for the dual variable, the first channel is used as input to the backward operator.

The input to the first convolutional layer in the primal iterate is a concatenation of the primal variable and the output from the backward operator, which gives a dimension of [N, 167, 185, 6]. The dual iterate has, besides the dual variable and the output from the forward operator, also the signals as input, so the number of channels into the first CNN layer is 7.

The CNNs in the primal and dual iterates are identical except for the input dimension as described above. The first two convolutional layers have 32 filters and a kernel size of 3. The third and last convolutional layer has five filters, also with a kernel size of 3.
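A minimal Keras sketch of this CNN and one primal update is given below, matching the layer sizes stated above. The ReLU activations are an assumption, and the residual form of the update follows the learned primal-dual scheme rather than being taken from the thesis code.

```python
import tensorflow as tf

def iterate_cnn():
    # Two 32-filter layers and one 5-filter layer, all with kernel size 3.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(5, 3, padding="same"),  # back to 5 channels
    ])

def primal_update(cnn, primal, backprojected):
    # Concatenate the 5-channel primal with the 1-channel backprojection
    # -> (N, 167, 185, 6), then update the primal variable residually.
    net_in = tf.concat([primal, backprojected], axis=-1)
    return primal + cnn(net_in)
```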

The training was done on the same type of data through all sessions, but the number of primal-dual iterations was changed, as well as the batch size, learning rate and the use of a decay.


3.5 Validation

To validate the trained models, two image quality measurements are used: the Structural Similarity Index Measure (SSIM) and the Peak Signal-to-Noise Ratio (PSNR). Both measurements use a reference image to calculate the metric. PSNR is easy to calculate but does not capture the quality of an image as humans perceive it. SSIM instead focuses on structural information in the comparison of images rather than on error measurements [24].

Ten different scenes with corresponding signals were synthetically generated to validate the trained models. Reconstructions are calculated for each model and scene, and since ground truth images are available, they can be used as reference images to calculate SSIM and PSNR. For each model, the mean values of SSIM and PSNR are calculated over the test scenes.
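A minimal sketch of this validation step using scikit-image is given below; the per-scene data_range handling and the helper name are assumptions.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def mean_metrics(reconstructions, ground_truths):
    """Mean SSIM and PSNR over the test scenes, ground truth as reference."""
    ssims, psnrs = [], []
    for rec, gt in zip(reconstructions, ground_truths):
        rng = float(gt.max() - gt.min())  # assumed data range per scene
        ssims.append(structural_similarity(gt, rec, data_range=rng))
        psnrs.append(peak_signal_noise_ratio(gt, rec, data_range=rng))
    return np.mean(ssims), np.mean(psnrs)
```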

3.6 Evaluation

After training, measurement data was used for the evaluation of the network. Since the inverse problem is to reconstruct an image from measurement signals without a ground truth image, the evaluation is challenging. The measurement signals are given as input to the trained network, and the output is a reconstruction of the seabed from those signals.

Measurement signals from three scenes at different positions on the seabed were chosen for reconstruction. These were chosen because it is known from the delay-and-sum method that the seafloor in these scenes contains interesting objects. The reconstructions from the different scenes and trained models are shown and discussed without image quality assessment.


4 Results

This chapter presents the results from the trained models with different parameters. First, the trained models are presented, then the results on synthetic data, where ground truth images are available. Thereafter, reconstructions made from measurement data without ground truth are presented.

4.1 Models

Eight models were trained, where the batch size, the number of primal-dual iterations, the initial learning rate and the use of a decay for the learning rate were varied. All models were trained for 10 000 iterations, and the hyperparameters used for each model can be seen in Table 4.1. Most models were trained with a cosine decay for the learning rate, but for model H no decay was used. Model F was trained with a batch size of three instead of one as for the other models. For model G, the learning rate in the first iteration was $10^{-3}$ instead of $10^{-4}$ as for the other models. Across models A to E, different numbers of primal-dual iterations are used; hence the architecture of the network changes, and more parameters are learned with more primal-dual iterations.

4.2 Test Data

The trained models are used to reconstruct the test images from the corresponding signals. Table 4.2 presents some quantitative measures for the trained models. The presented loss is the test loss in the last iterate of training. The image quality assessments, SSIM and PSNR, for the reconstructed test images are stated as mean values over all test images for each model. The results for the delay-and-sum method are also shown as a comparison.


Model | Batch size | Primal-dual iterations | Learning rate | Decay
A     | 1          | 20                     | $10^{-4}$     | Cosine
B     | 1          | 5                      | $10^{-4}$     | Cosine
C     | 1          | 10                     | $10^{-4}$     | Cosine
D     | 1          | 15                     | $10^{-4}$     | Cosine
E     | 1          | 40                     | $10^{-4}$     | Cosine
F     | 3          | 20                     | $10^{-4}$     | Cosine
G     | 1          | 20                     | $10^{-3}$     | Cosine
H     | 1          | 20                     | $10^{-4}$     | Not used

Table 4.1: Trained models with different parameters.

Model         | Loss  | SSIM  | PSNR
A             | 0.228 | 0.813 | 26.95
B             | 0.226 | 0.834 | 26.91
C             | 0.226 | 0.846 | 27.05
D             | 0.207 | 0.856 | 27.45
E             | 0.202 | 0.866 | 27.75
F             | 0.196 | 0.858 | 27.93
G             | 0.193 | 0.879 | 27.85
H             | 0.188 | 0.738 | 27.54
Delay-and-sum | -     | 0.833 | 25.62

Table 4.2: Image quality metrics for the trained models and delay-and-sum method calculated on test data. The best value for each metric is shown in bold.

Reconstructed images from the different models for one test image are shown in the thesis, since the results for the different test images are very similar. The highest SSIM is achieved by model G, which was trained with a batch size of 1, 20 primal-dual iterations and a cosine decay for the learning rate starting at $10^{-3}$. The highest PSNR is found for model F, which was trained with a larger batch size than the other models. Model H, trained without a cosine decay for the learning rate, had the lowest test loss in the last iterate, but also the worst SSIM. The delay-and-sum method has the worst PSNR.

The color map used for the image reconstructions is inferno. All images show the absolute value of the original image raised to the power of 0.33. This was done to better show the differences between the reconstructions.
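A minimal matplotlib sketch of this display transform is given below; the function name is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_reconstruction(image, title=None):
    # Absolute value raised to the power 0.33, shown with the inferno map.
    plt.imshow(np.abs(image) ** 0.33, cmap="inferno")
    plt.title(title)
    plt.axis("off")
    plt.show()
```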

The ground truth test image and the delay-and-sum reconstruction in this representation can be seen in Figure 4.1 and Figure 4.2. The delay-and-sum reconstruction contains a lot of smearing. There are no distinct edges on the objects, but rather ringing artifacts. The point scatterers are not visible in the reconstruction.


Figure 4.1: Ground truth test scene

Figure 4.2: Delay-and-sum reconstruction

Figure 4.3 shows the reconstructed test scene from model A. Some ringing artifacts can be seen in the image, and leakage is visible in the middle range of the image. In the upper part of the image, the background is noisy.


Figure 4.3: Reconstruction of test scene for model A

Figure 4.4: Reconstruction of test scene for model B

The reconstructions of the test scene from models B to E can be seen in Figure 4.4 to Figure 4.7. The reconstructions are very similar, and they all show ringing artifacts and some smearing. The point scatterers can be seen, and the background is not very noisy.


Figure 4.5: Reconstruction of test scene for model C

Figure 4.6: Reconstruction of test scene for model D

Figure 4.7: Reconstruction of test scene for model E

Figure 4.8 and Figure 4.9 show the reconstructions of the test image for models F and G. Neither reconstruction shows many ringing artifacts. The edges of the triangle and the circle can be seen, as well as the individual point scatterers.

Figure 4.8: Reconstruction of test scene for model F

Figure 4.9: Reconstruction of test scene for model G

In Figure 4.10, showing the reconstruction of the test image from model H, the edges of the objects and the point scatterers can be seen clearly. There is, however, visible noise in the background.

Figure 4.10: Reconstruction of test scene for model H

4.3 Measurement Data

Reconstructions performed on signals from three different scenes are presented in this section. The scenes contain different objects standing on the seabed.

4.3.1 Scene with Tripod

This scene contains a tripod standing on the seafloor at a depth of 12 m, around 54 m away from the array, which is located 4 m below the water surface. From that angle, the reconstructed image shows the tripod almost as if it were depicted from above. The three legs can be seen as the corners of a triangle with the tripod's mounting head in the middle.

The delay-and-sum reconstruction is shown in Figure 4.11. The three legs and the head of the tripod can be seen, but the image has leakage in angle; this can be seen at the tripod's legs, which are smeared.

Figure 4.12 shows the reconstruction of the tripod for model A, with the tripod legs visible. The difference in intensity between the background and the tripod is low.

Figure 4.11: Delay-and-sum reconstruction of tripod

Figure 4.12: Reconstruction of tripod for model A

Figure 4.13 to Figure 4.15 show the tripod reconstruction for models B to D. All reconstructions have speckles, mainly in the middle of the image. Figure 4.13 and Figure 4.14 have a background with low intensity, emphasizing the tripod.

Figure 4.13: Reconstruction of tripod for model B

Figure 4.14: Reconstruction of tripod for model C

Figure 4.15: Reconstruction of tripod for model D

In Figure 4.16, showing the reconstruction for model E, two of the tripod’s legs are hard to distinguish, but the reconstruction does not have speckles.


Figure 4.16: Reconstruction of tripod for model E

Figure 4.17, showing the reconstruction from model F, has small speckles visible across the whole reconstruction. The legs of the tripod are hard to distinguish from the background.

Figure 4.17: Reconstruction of tripod for model F
