
Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2021

Deep Learning for Positioning with MUSIC

Glädje Karl Olsson

Master of Science Thesis in Electrical Engineering
Deep Learning for Positioning with MUSIC

Glädje Karl Olsson
LiTH-ISY-EX–21/5366–SE

Supervisor: Zakir Hussain Shaik, ISY, Linköpings universitet
Per Brännström, FOI, Totalförsvarets Forskningsinstitut
Niclas Granström, FOI, Totalförsvarets Forskningsinstitut

Examiner: Zheng Chen, ISY, Linköpings universitet

Division of Communication Systems
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2021 Glädje Karl Olsson

Abstract

Estimating an object's position can be of great interest in several applications, and there exist many different methods to do so. One approach is to use Direction of Arrival (DOA) measurements from receivers together with the triangulation technique to estimate the position of one or more transmitters. One algorithm which can find the DOA measurements from several transmitters is the MUltiple SIgnal Classification (MUSIC) algorithm. However, this still leaves an ambiguity problem which gives false solutions, so-called ghost points, if the number of receivers is not sufficient. In this report, solving this problem with the help of deep learning is studied. The thesis's main objective is to investigate whether it is possible to perform positioning with measurements from the MUSIC algorithm using deep learning and image processing methods.

A deep neural network is built in TensorFlow and trained and tested using data generated in MATLAB. The setup in this thesis consists of two receivers, which are used to locate two transmitters. The network uses two MUSIC spectra from the two receivers, and returns a probability distribution of where the transmitters are located. The results are compared with a traditional method and analysed. The results presented in this thesis show that it is possible to perform positioning using deep learning methods. However, there is a lot of room for improvement in accuracy, which can be an important future research direction to explore.

Sammanfattning

Estimating an object's position can be of great interest in many fields, and there are many different ways to do so. One method is to use Direction of Arrival (DOA) measurements from receivers to triangulate the position of one or more transmitters. One algorithm that can find DOA measurements from several transmitters is the MUltiple SIgnal Classification (MUSIC) algorithm. This can still give rise to ambiguity with false solutions, so-called ghost points, if the number of receivers is not sufficient. In this report, the possibility of solving this problem with the help of deep learning is studied. The thesis's main goal is to investigate whether it is possible to perform positioning with measurements from the MUSIC algorithm using deep learning and image processing methods.

A neural network is constructed in TensorFlow and trained and tested with data generated in MATLAB. The setup in this work consists of two receivers used to locate two transmitters. The network uses two MUSIC spectra from the two receivers and returns a probability distribution over where the transmitters are located. The result is compared with a traditional method and analysed. The result presented in this report shows that it is possible to perform positioning with the help of deep learning. However, there is much potential for improvement in accuracy, which can be a suitable direction for future studies.


Acknowledgments

I would like to thank the thesis's examiner Zheng Chen and supervisor Zakir Hussain Shaik at ISY, Linköping University, for constructive feedback and guidance. Special thanks to Per Brännström and Niclas Granström, my supervisors at FOI, for good discussions and excellent feedback. Furthermore, I want to thank my family and friends for their love and support, making these years at Linköping University so incredibly fun. Last but not least, a big thanks to my friend William Thelin, who helped me rebuild his old computer, making it possible for me to train my neural networks while working from home.

Linköping, April 2021 Glädje Kalle Olsson


Contents

1 Introduction
    1.1 Background
    1.2 Motivation

2 Theory
    2.1 MUSIC
    2.2 Traditional Method
    2.3 Deep learning
        2.3.1 Activation Functions
        2.3.2 Convolutional Neural Networks
        2.3.3 Training the network
        2.3.4 Batch normalization

3 Deep Learning based Positioning Methodology
    3.1 System Model
    3.2 Generating MUSIC Spectrum
    3.3 Data Generation
    3.4 Traditional Method
    3.5 Network Construction
        3.5.1 Direction of Arrival
        3.5.2 Image Generation
        3.5.3 Location Estimation

4 Results and Discussion
    4.1 The Combined Network
        4.1.1 Accuracy
        4.1.2 Output
        4.1.3 Discussion on Combined Network
    4.2 DOA Network
        4.2.1 Accuracy
        4.2.2 Discussion on DOA Network
    4.3 Position Network
        4.3.1 Accuracy
        4.3.2 Output
        4.3.3 Discussion on Position Network

5 Conclusion
    5.1 Future Work

1 Introduction

1.1 Background

Knowing an object's position in space can be of significant interest in several applications: the exact location of a cell phone in a cellular network, guiding a car on the roads using a navigation assistant, or getting notifications about running speed from a sports watch. However, developing such advanced technologies involves years of research and technical progress.

Various technologies exist to determine an object's position, like using a sonar, optical instruments, GPS etc. These technologies utilize various measurements, like Time Of Arrival (TOA), Time Difference of Arrival (TDOA) or Direction Of Arrival (DOA) measurements, to give a few examples, all with their own drawbacks and advantages.

The first method mentioned uses TOA measurements. This gives highly accurate position estimations that are not affected by the distance between the transmitter and receiver. However, this method has one major drawback: the receiver needs to be precisely synchronized with the transmitters to know when the signal was sent and, in turn, to calculate how far the signal has travelled. With three TOA measurements from three separate transmitters, it is possible to trilaterate one receiver’s position. GPS and similar systems use this method.

The second method mentioned uses TDOA measurements. In this method, after two different receivers receive the signal, the difference of arrival time is utilized to calculate the transmitter’s distance using the known signal speed. Unlike the TOA method, the transmitter does not need to be synchronized with the receivers in this method. Instead one extra receiver is required to position the transmitter.


The third method uses DOA measurements. Knowing the DOA of a signal at two different receivers, it is possible to triangulate the transmitter's position without having to synchronize receivers and transmitters. In triangulation, the sought point is found as the third vertex of a triangle, when two angles and the positions of the other two vertices are known. One way of finding the DOAs is by using the MUltiple SIgnal Classification (MUSIC) algorithm. With the MUSIC algorithm, it is possible to find several DOA measurements simultaneously from multiple transmitters with a single receiver, with an accuracy depending on the geometry and number of receiver antenna elements. However, with too many transmitters present, the triangulation may give false solutions if the number of receivers is not sufficient, as shown in Figure 1.1. This ambiguity in triangulation, which can lead to false solutions, so-called ghost points, is a hard problem to solve without changing the setup, but might be simplified with the help of deep learning.

Figure 1.1: Experiment setup that results in false solutions, i.e., ghost points. Black dots are receivers, solid circles are transmitters and dotted circles are ghost points.

1.2 Motivation

Deep learning is a type of machine learning where a network can 'learn' to solve complicated problems without knowing the explicit structure of the problem. Deep learning is finding applications in various domains, ranging from software to biomedical applications. Especially in computer vision, specialized networks are used to identify and locate objects in images by exploiting probability theory while learning and predicting. A similar approach is used in this thesis, where a deep neural network uses MUSIC measurements as input and returns a probability distribution of where the transmitters are most likely to be. This leads to the problem formulation:

• Is it possible to perform positioning with measurements from the MUSIC-algorithm using deep learning and image processing methods?

This thesis does not primarily aim to outperform the traditional ways of positioning, but to see if this is a feasible approach to the problem.

2 Theory

2.1 MUSIC

MUSIC (MUltiple SIgnal Classification) is an algorithm used for determining the DOA of signals using multiple receiver antenna elements. What separates MUSIC from other, simpler algorithms (e.g. interferometry) is its ability to handle multiple signals simultaneously. It was first proposed by Ralph Schmidt and published in [12]. MUSIC is a subspace method [11] which uses the noise subspace of the cross-covariance matrix of the receiver antenna array. There are some required conditions on the transmitted signals, and on the geometry and number of antenna elements of the receiver: the transmitted signals need to be non-coherent and fewer than the number of antenna elements, and the maximum distance between two adjacent antenna elements must be less than half the signal wavelength to avoid ambiguity.

The number of transmitters present is D. The receiver is assumed to be a circular antenna array with M > D elements. The distance between the antenna elements is considered small compared to the distance to the transmitters, so the incoming wavefront is assumed to be planar. The D transmitters simultaneously transmit an individual signal each, where the signals are assumed to be non-coherent with a small bandwidth compared to the center frequency. The incoming wavefronts are then given in the M-dimensional signal vector:

$$X = AF + W \tag{2.1}$$

where $A = [a(\theta_1)\ a(\theta_2)\ \dots\ a(\theta_D)]$ is an $M \times D$ complex-valued matrix containing the steering vectors as columns. For a circular array, the steering vector for the $d$th incoming signal is given by:

$$a(\theta_d) = \exp\left(j\frac{2\pi r}{\lambda}\cos(\gamma - \theta_d)\right) \tag{2.2}$$

where $\gamma = [\gamma_1\ \gamma_2\ \dots\ \gamma_M]^T$ is the vector of angular positions of the antenna elements, $\theta_d$ is the angle of arrival of the $d$th signal, measured counter-clockwise relative to the x-axis, $r$ is the array radius and $\lambda$ is the carrier wavelength. For a setup with two transmitters and one circular array with four elements, the matrix $A$ is a $4 \times 2$ matrix, $A = [a(\theta_1)\ a(\theta_2)]$, with $a(\theta_d)$ given by:

$$a(\theta_d) = \begin{bmatrix} \exp\left(j\frac{2\pi r}{\lambda}\cos(\gamma_1 - \theta_d)\right) \\ \exp\left(j\frac{2\pi r}{\lambda}\cos(\gamma_2 - \theta_d)\right) \\ \exp\left(j\frac{2\pi r}{\lambda}\cos(\gamma_3 - \theta_d)\right) \\ \exp\left(j\frac{2\pi r}{\lambda}\cos(\gamma_4 - \theta_d)\right) \end{bmatrix} \tag{2.3}$$

The D-dimensional vector $F$ is the complex-valued signal vector, and $W$ is the complex-valued M-dimensional noise vector. The sample covariance matrix of the vector $X$ is given by the $M \times M$ matrix:

$$S = \overline{XX^*} = APA^* + \sigma^2 I \tag{2.4}$$

where $P = \overline{FF^*}$ is the $D \times D$ signal covariance matrix, with the overline used to indicate the sample mean. If the signals are uncorrelated, or partly correlated, $P$ is positive definite (and therefore has full rank D) [12]. If there are more antenna elements than incoming signals, then $A$ has rank D, and $APA^*$ is a positive semi-definite $M \times M$ matrix with rank D. The matrix $APA^*$ then has D non-zero, and $M - D$ zero, eigenvalues. The term $\sigma^2 I$ is the noise correlation matrix when white Gaussian noise is present.

From (2.4), $S$ can be shown to be a Hermitian matrix, which means it has real-valued eigenvalues. With the added noise $\sigma^2 I$ (which is positive definite), and since $APA^*$ is positive semi-definite, $S$ is positive definite (since the sum of a positive definite and a positive semi-definite matrix is also positive definite), with $M$ positive eigenvalues $\lambda_i$, $i = 1, 2, \dots, M$, where the $M - D$ smallest values equal $\sigma^2$. The corresponding $M$ eigenvectors $e_i$, $i = 1, 2, \dots, M$, must satisfy:

$$S e_i = \lambda_i e_i \tag{2.5}$$

and with $S = APA^* + \sigma^2 I$:

$$(APA^* + \sigma^2 I)e_i = \lambda_i e_i \implies APA^* e_i = (\lambda_i - \sigma^2) e_i \tag{2.6}$$

The eigenvectors corresponding to the smallest eigenvalues, $\sigma^2$, give $APA^* e_i = 0$ and are therefore orthogonal to the signal space spanned by the columns of $A$. That is, the eigenvectors corresponding to the $M - D$ smallest eigenvalues span the noise subspace, $E_N$. With this subspace, the MUSIC spectrum from one receiver can be estimated with the function

$$P_{MU}(\theta) = \frac{1}{a^*(\theta) E_N E_N^* a(\theta)} \tag{2.7}$$

where $a(\theta)$ are candidate steering vectors. The angles $\theta$ which result in high peaks are candidates for the angle of arrival. If the antenna array consists of four antenna elements with a radius small enough that the distance between the elements is smaller than half the wavelength, two DOAs can be found without ambiguity.
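To make the estimation procedure concrete, the following is a minimal NumPy sketch of the eigen-decomposition and spectrum evaluation in (2.4)-(2.7). It is not the thesis's MATLAB implementation; the array geometry, angle grid and helper names (steering, music_spectrum) are illustrative assumptions.

```python
import numpy as np

M, D = 4, 2                            # antenna elements, transmitters
r, lam = 0.33, 1.0                     # array radius and carrier wavelength
gamma = 2 * np.pi * np.arange(M) / M   # angular positions of the elements

def steering(theta):
    """Steering vector a(theta) for the circular array, cf. (2.2)."""
    return np.exp(1j * 2 * np.pi * r / lam * np.cos(gamma - theta))

def music_spectrum(X, n_angles=720):
    """MUSIC spectrum (2.7) from a snapshot matrix X of shape (M, N)."""
    S = X @ X.conj().T / X.shape[1]    # sample covariance, cf. (2.4)
    _, eigvec = np.linalg.eigh(S)      # eigenvalues in ascending order
    E_N = eigvec[:, :M - D]            # noise subspace: M - D smallest
    thetas = np.linspace(-np.pi, np.pi, n_angles, endpoint=False)
    P = np.array([1.0 / np.abs(steering(t).conj() @ E_N @ E_N.conj().T
                               @ steering(t)) for t in thetas])
    return thetas, P
```

A peak search over the returned spectrum P then yields the DOA candidates.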

2.2 Traditional Method

The traditional method used to perform the positioning is based on linear equations. Given the angle $\alpha$ and the position of receiver $m$, $(x_m, y_m)$, all points where the transmitter can be located lie on the line given by the linear equation $y = y_m + \tan(\alpha)(x - x_m)$. Given two receivers with two DOA measurements each, two pairs of different linear systems are obtained:

$$\begin{pmatrix} -\tan(\alpha_1) & 1 & 0 & 0 \\ -\tan(\beta_1) & 1 & 0 & 0 \\ 0 & 0 & -\tan(\alpha_2) & 1 \\ 0 & 0 & -\tan(\beta_2) & 1 \end{pmatrix} \begin{pmatrix} X_1 \\ Y_1 \\ X_2 \\ Y_2 \end{pmatrix} = \begin{pmatrix} y_1 - \tan(\alpha_1)x_1 \\ y_2 - \tan(\beta_1)x_2 \\ y_1 - \tan(\alpha_2)x_1 \\ y_2 - \tan(\beta_2)x_2 \end{pmatrix} \tag{2.8a}$$

$$\begin{pmatrix} -\tan(\alpha_1) & 1 & 0 & 0 \\ -\tan(\beta_2) & 1 & 0 & 0 \\ 0 & 0 & -\tan(\alpha_2) & 1 \\ 0 & 0 & -\tan(\beta_1) & 1 \end{pmatrix} \begin{pmatrix} X_1 \\ Y_1 \\ X_2 \\ Y_2 \end{pmatrix} = \begin{pmatrix} y_1 - \tan(\alpha_1)x_1 \\ y_2 - \tan(\beta_2)x_2 \\ y_1 - \tan(\alpha_2)x_1 \\ y_2 - \tan(\beta_1)x_2 \end{pmatrix} \tag{2.8b}$$

Here $\alpha_m$ is the bearing of transmitter $m$ from receiver 1, and $\beta_m$ is the bearing of transmitter $m$ from receiver 2, measured counter-clockwise relative to the x-axis; $(x_n, y_n)$ is the location of receiver $n$ and $(X_m, Y_m)$ is the location of transmitter $m$. Equation (2.8a) is obtained if angle $\alpha_1$ is paired with angle $\beta_1$, and (2.8b) is obtained if $\alpha_1$ is paired with $\beta_2$. Since each angle can only be used once to form a pair, only one of the two solutions for $(X_1, Y_1, X_2, Y_2)^T$ is valid. It would, for instance, not be valid to pair $\alpha_1$ with both $\beta_1$ and $\beta_2$, since then $\alpha_2$ would not be included in any pair. One way of eliminating one solution is to check which half-plane the points lie in. A negative angle should correspond to a point in the lower half-plane, i.e. a negative Y-value, and vice versa. If an angle pairing produces a point in the 'wrong' half-plane, that angle combination can be dismissed, thus removing some false solutions. This still leaves some cases where both solutions seem valid, e.g. when both transmitters are in between the receivers, so that all four points lie in the 'correct' half-plane, as in the case shown in Figure 2.1.
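A hedged NumPy sketch of one such pairing system follows; the function name solve_pairing is an illustrative assumption. Calling it with the bearings paired as $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ builds (2.8a); swapping the order of the two $\beta$ angles builds (2.8b).

```python
import numpy as np

def solve_pairing(rx1, rx2, a, b):
    """Solve (2.8) for (X1, Y1, X2, Y2), given receiver positions rx1, rx2
    and bearings a = (a1, a2) at receiver 1, b = (b1, b2) at receiver 2."""
    (x1, y1), (x2, y2) = rx1, rx2
    A = np.array([[-np.tan(a[0]), 1, 0, 0],
                  [-np.tan(b[0]), 1, 0, 0],
                  [0, 0, -np.tan(a[1]), 1],
                  [0, 0, -np.tan(b[1]), 1]])
    rhs = np.array([y1 - np.tan(a[0]) * x1,
                    y2 - np.tan(b[0]) * x2,
                    y1 - np.tan(a[1]) * x1,
                    y2 - np.tan(b[1]) * x2])
    return np.linalg.solve(A, rhs)      # the two candidate positions
```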

Figure 2.1: An illustration depicting a transmitter placement that gives all solutions in the half-plane that the DOAs are pointing in. Circles are receivers, triangles are correct transmitter positions and crosses are false positions. Solid lines represent the DOAs, while dashed lines are extended behind the receivers.

2.3 Deep learning

In [7], the authors describe deep learning as a method in machine learning where "the computer can learn complicated concepts by building them out of simpler ones". The term Deep Learning comes from the visualisation of how the concepts are connected to each other in a network with more than one layer. This visualisation would then be layers of nodes connected to each other, building up a deep network of connected nodes. This can be seen in Figure 2.2.

Many deep learning algorithms are a type of Artificial Neural Network (ANN), which takes inspiration from the brain's neurons, where a node and its connections represent the cell body and the axon respectively. A node in a network gets one or more inputs from which it calculates an output that is transmitted to one or more nodes in the following layer. A node can be connected to only a few nodes in the next layer, or it can be fully connected, i.e. connected to every node in the following layer.

The first layer is the input layer, which is the layer that receives the external data. Following the input layer comes a number of hidden layers, named hidden since they are not directly exposed to the input. Between the layers, every connection is assigned a weight. A large weight means that the connection is important and transmits a large value, and vice versa. It is these weights which are altered during training to make the model fit the problem better. Read more about training in Section 2.3.3.

Finally, the last layer, the output layer, gives the result of the model. The result can vary depending on the application. It can, for instance, be a probability distribution of what kind of object is present in an image (classification), or a real-valued position of where something is located (regression).

Figure 2.2: Example of an Artificial Neural Network (ANN) with an input layer with two values, two fully connected hidden layers and an output layer with a single value.

2.3.1 Activation Functions

In a layer, weighted values from the previous layer are summed up in each neuron. This means that the network can only work properly on linear problems. In order to perform better on non-linear problems, an activation function is applied to the output from each neuron. An activation function is a function that maps the weighted sum from the neuron to an output value, with different mappings used for different applications. The function can be linear or non-linear and can be applied to all hidden layers and the output layer. Different activation functions can be used for different parts of the network.

One commonly used activation function is ReLU (REctified Linear Unit), defined as $f(z) = \max\{0, z\}$ [7, p. 170]. ReLU is important since it reduces the risk of encountering the vanishing gradient problem, which occurs when an activation function saturates and the gradient gets very close to zero.

Other non-linear activation functions include the logistic sigmoid function, $f(z) = \frac{1}{1 + \exp(-z)}$, useful in binary classification; the softmax function, $f(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, $i = 1, \dots, K$, useful for classification with $K$ classes; and the hyperbolic tangent activation function, $f(z) = \tanh(z)$, useful when the values should lie between -1 and 1.
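As a quick illustration, minimal NumPy versions of these activation functions are sketched below; the function names are illustrative.

```python
import numpy as np

def relu(z):        # f(z) = max{0, z}
    return np.maximum(0.0, z)

def sigmoid(z):     # logistic sigmoid, for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):     # K-class probabilities summing to one
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

# tanh, with values in [-1, 1], is available directly as np.tanh
```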

2.3.2 Convolutional Neural Networks

One special variant of neural networks is the Convolutional Neural Network (CNN). A CNN consists of one or more convolutional layers, which use the convolution operator instead of matrix multiplications to find the important features and connections between layers. The main idea is to let a number of kernels of a desired size "sweep" over the input data and calculate the output from a smaller number of inputs than would be used in a traditional network. Instead of letting every input affect every output, like in a fully connected layer, only a local region of the inputs influences each output node, resulting in a decrease in the number of parameters and calculations needed. This can be seen in Figure 2.3. The output from the convolutional layer is often run through an activation function [7, p. 326-335].

Figure 2.3: Example of a convolutional operation with a kernel of size 3. The grey node in the lower layer is only affected by the three grey nodes in the upper layer.

In a CNN, a convolutional layer is often combined with a pooling layer to output a summary of the statistics in a neighbourhood of nodes. The pooling layer can have different functions applied, but one often used is max pooling, which returns the maximum value of the nearby outputs. Much like for the convolutional layer, the main idea with pooling is to "sweep" a window and calculate the output from a smaller number of inputs. Pooling layers are very useful for downsampling the representation size and reducing the computational cost and memory usage, by letting the window "sweep" with a desired stride over the input. In Figure 2.4, a max pooling operation with a certain stride can be seen. Pooling layers also make the representation invariant to small translations, which is good when the presence of a feature is sought but its exact position is not necessary [7, p. 335-339].

Figure 2.4: Example of a max pooling with a window size 3 and stride 2. This results in a downsampling by 2.

2.3.3 Training the network

In order to train the network by tuning the weights, some way of evaluating how the training goes is needed. This is done with a loss function, $J(\theta)$. The loss function gives a value of how close, according to some measure, the predicted output is to the real answer. To get as good performance as possible, we want to minimize the loss function by changing the weights. This can be done by calculating the derivative with respect to the weights $\theta$ and moving in small steps in the direction opposite to the sign of the derivative. This is called the gradient descent algorithm, which gives us the optimal weights for the network once the minimum of the loss function has been found.

In order to use a gradient descent method to find the optimum, we first need to find the gradient. This is done efficiently with back-propagation [7, p. 200-220]. Back-propagation uses the computed cost $J(\theta)$ and then works its way backwards through the network to obtain the gradient $\nabla_\theta J(\theta)$. The calculated gradient can then be used in gradient descent methods to optimize the network. Some examples of loss functions are:

• Mean Squared Error: $E_{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$

• Mean Absolute Error: $E_{MAE} = \frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|$

• Binary cross-entropy: $E_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$

Here $N$ is the number of predictions, $y_i$ are the true values and $\hat{y}_i$ are the predicted values. Mean squared error and mean absolute error are useful in regression problems, where $y_i$ and $\hat{y}_i$ take real values. Binary cross-entropy is useful in classification problems, when the output can belong to multiple classes. Then $y$ is a vector, with as many elements as classes, containing zeros and ones indicating which classes are correct; $\hat{y}$ has the same length as $y$ and contains values between zero and one.
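The three losses can be written directly in NumPy; this is a hedged sketch with illustrative function names, and the clipping constant in the cross-entropy is an assumption added for numerical safety.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def binary_cross_entropy(y, y_hat, eps=1e-7):
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```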


2.3.4 Batch normalization

Batch normalization is a recently evolving technique in the field of deep learning [9]. During training, the gradient tells how to change each weight to minimize the loss, assuming that no other layer changes. In practice, however, all layers are updated simultaneously, which may give strange results. With the help of batch normalization, this can be avoided while still allowing fast learning rates and effective training.

Batch normalization reparametrizes the output of a layer to reduce the internal covariance shift within the batch, which is a number of samples run through the network before it is updated. The normalization is made by calculating the batch mean and variance. If $X = [x^{(1)} \dots x^{(d)}]$ are $d$ layer inputs, with a batch of $m$ samples, the reparametrized input for input $k$ is

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mu}{\sqrt{\sigma^2 + \epsilon}} \tag{2.9}$$

where $\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$ is the batch mean, $\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu)^2$ is the batch variance and $\epsilon$ is a small value for numerical stability. The reparametrized input is then scaled and shifted with the trainable parameters $\gamma^{(k)}$ and $\beta^{(k)}$ to get the final output.
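A minimal NumPy sketch of the training-time transformation in (2.9) follows; the epsilon default and function name are illustrative assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x has shape (m, d): m samples in the batch, d features."""
    mu = x.mean(axis=0)                     # batch mean
    var = x.var(axis=0)                     # batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # reparametrized input (2.9)
    return gamma * x_hat + beta             # scaled and shifted output
```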

3 Deep Learning based Positioning Methodology

3.1 System Model

The system model in this work considers two receivers and two transmitters. Each receiver consists of a circular array with four antenna elements and a radius of 0.33 metres, making the distance between adjacent elements less than half the signal wavelength and ensuring that the DOAs are unambiguous, as stated in Section 2.1. A signal wavelength of 1 metre is considered. The antenna setup is pictorially presented in Figure 3.1a. The two receivers are placed 20 units of length from each other in the x-direction, at the centre of a square with sides of 400 units of length. The receiver placement is pictorially presented in Figure 3.1b. The transmitters must be placed inside the square.

3.2 Generating MUSIC Spectrum

The MUSIC algorithm is implemented using MATLAB. The transmitted signals are modelled as complex-valued functions $\cos(\omega t) + j\sin(\omega t)$ with different lengths of zero padding and independent white Gaussian noise, to differentiate the two signals from each other.

The received signal at the antenna array is calculated as

$$X = AF + W = \begin{bmatrix} a(\theta_1) & a(\theta_2) \end{bmatrix} \begin{pmatrix} F_1 \\ F_2 \end{pmatrix} + W = a(\theta_1)F_1 + a(\theta_2)F_2 + W \tag{3.1}$$

where $F$ is the signal vector, with $F_1$ and $F_2$ being the signals transmitted from the two transmitters, $a(\theta_i)$ is the steering vector described in (2.2), with $\theta_1$ and $\theta_2$ being the DOAs from the two transmitters, and $W$ is additive white noise with zero mean and a variance that gives an SNR of 30 dB.

Figure 3.1: Antenna setup and the receiver setup. (a) Antenna setup, with the radius r = 0.33λ expressed in metres. (b) Receiver setup, with the distances (l = 20 between the receivers, sides of l = 400) expressed in units of length.

Using the received signal, we compute the MUSIC spectrum as described in Section 2.1. The candidate angles used in (2.7) lie in the interval θ ∈ [−180°, 180°), with a step size of 0.5°. An example of the spectrum obtained is shown in Figure 3.2.

Figure 3.2: Example of a MUSIC spectrum. Note the log scale on the y-axis.
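As an illustration of the signal model, the following sketch simulates snapshots according to (3.1), reusing the steering() helper from the sketch in Section 2.1. The thesis differentiates the signals with zero padding; here two slightly different frequencies are used instead as a simplifying assumption, and the SNR handling is approximate.

```python
import numpy as np

def simulate_snapshots(theta1, theta2, n_snap=200, snr_db=30.0):
    """Simulate X = AF + W for two transmitters at DOAs theta1, theta2."""
    t = np.arange(n_snap)
    F = np.vstack([np.exp(1j * 0.10 * t),    # signal from transmitter 1
                   np.exp(1j * 0.13 * t)])   # signal from transmitter 2
    A = np.column_stack([steering(theta1), steering(theta2)])
    sigma = 10 ** (-snr_db / 20)             # noise std for unit-power signals
    W = sigma / np.sqrt(2) * (np.random.randn(A.shape[0], n_snap)
                              + 1j * np.random.randn(A.shape[0], n_snap))
    return A @ F + W                         # cf. (3.1)

# Usage: thetas, P = music_spectrum(simulate_snapshots(0.3, 1.2))
```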

3.3 Data Generation

To train the network, a large amount of training data is needed. The two transmitters are placed at uniformly randomized points around the receivers. The placement is restricted to the square described in Section 3.1 and shown in Figure 3.1b. For every deployment of transmitters, the MUSIC spectrum is obtained as described in Section 3.2. The spectrum is then saved along with the actual transmitter positions. To ensure that the transmitters do not occlude each other and cause only one peak in the MUSIC spectrum, the angle at the receiver, spanned by the receiver and the two transmitters, must exceed a threshold angle set to 10°. An example of an invalid placement and the resulting spectra can be seen in Figure 3.3. Also, to ensure that the traditional method, described in Section 3.4, does not end up with unsolvable linear equations, the placement must be such that the DOAs from the two receivers are not parallel with each other. The data generation is done in MATLAB.

Figure 3.3: (a) A transmitter placement that gives an occlusion, and (b) the resulting MUSIC spectra. In Figure 3.3a, the receivers are represented by circles and the transmitters as triangles. The lines represent the DOA measurements. In the corresponding spectra in Figure 3.3b, only one peak can be considered correct even if two peaks can be seen, one at -150 degrees and one at 90 degrees.

3.4 Traditional Method

The traditional method is implemented in Python using the NumPy and SciPy libraries. From the two MUSIC spectra, the two highest peaks are found and extracted as the bearings using scipy.signal.find_peaks [3]. These DOAs are plugged into equations (2.8a) and (2.8b), and the linear equations are solved with numpy.linalg.solve [1]. The two solutions, with two positions each, are checked to see if any solution gives a position in the half-plane opposite to the one the DOA is pointing in, and if so, the solution is ignored. A setup with a solution in the 'wrong' half-plane can be seen in Figure 3.4a. If all solutions seem reasonable, like in the setup seen in Figure 3.4b, the positions given by pairing the largest angle from each receiver, measured counter-clockwise relative to the x-axis, are chosen, since the output should only be two positions. The largest angles are grouped since this gives the correct solution in most cases; only when the placement is such as in Figure 3.4c can the result be incorrect.
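A hedged sketch of the bearing extraction step follows; the use of a height threshold of zero is an assumption, and wrap-around peaks at the interval edges are not handled.

```python
import numpy as np
from scipy.signal import find_peaks

def two_highest_peaks(P):
    """Return the two strongest bearings (degrees) from a MUSIC spectrum P
    sampled at 0.5-degree steps over [-180, 180)."""
    thetas = np.arange(-180.0, 180.0, 0.5)
    idx, props = find_peaks(P, height=0)              # all peaks and heights
    order = np.argsort(props["peak_heights"])[::-1]   # strongest first
    return thetas[idx[order[:2]]]
```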

Figure 3.4: Images showing different transmitter placements: (a) a transmitter placement that gives a solution in the half-plane opposite to the one the DOA is pointing in, (b) a transmitter placement that gives all solutions in the correct half-plane, and (c) a transmitter placement that gives four valid solutions, where pairing the largest angles gives the wrong solution. Circles are receivers, triangles are correct transmitter positions and crosses are false positions. Solid lines represent the DOAs, while dashed lines are extended behind the receivers.

3.5 Network Construction

The network is divided into two main blocks. The first block takes the two MUSIC spectra, $P_{MU,1}(\theta)$ and $P_{MU,2}(\theta)$, from the receivers as input and returns the DOAs at the receivers, $\alpha_m$ and $\beta_m$, much like the function find_peaks does. These DOAs are used to generate an image which is the input to the second block, which then estimates the locations of the transmitters, $(X_1, Y_1)$ and $(X_2, Y_2)$, relative to the receivers. Both blocks are evaluated independently to ensure that they perform sufficiently well before being connected to each other to form the combined network. The network is implemented in Python using the TensorFlow library.


DOA Network Method A

    Layer           Output dimension
    Input           1 × 720 × 1
    Conv2D/ReLU     1 × 718 × 32
    MaxPool2D       1 × 239 × 32
    Conv2D/ReLU     1 × 237 × 64
    MaxPool2D       1 × 79 × 64
    Conv2D/ReLU     1 × 77 × 128
    Flatten         9856
    Dense/Sigmoid   720

Table 3.1: Network architecture of Method A

3.5.1 Direction of Arrival

For the first block, two different methods are considered. The first method (Method A) treats the problem as a classification problem, where the DOAs are chosen from classes representing discrete angles. The second method (Method B) solves the task as a regression problem, where the network estimates two vectors pointing from the receiver in the direction of the two DOAs.

Method A: Angle Classification

Method A classifies which two angles out of 720 discrete values, one for each 0.5° in the interval θ ∈ [−180°, 180°), are most probable given the spectrum. The network is a CNN with max pooling, which ends with a sigmoid activation function. The output consists of the probabilities of 720 different classes. The architecture of the network is shown in Table 3.1.

When trained individually, a weighted binary cross-entropy loss function is used. This loss function is chosen so that the network does not guess false on all classes, since only two classes out of the total of 720 are considered true. The weight ensures that false negatives increase the loss more than false positives.

Method B: Vector Estimation

In Method B, two unit vectors are estimated, one for each DOA. The network is a CNN with max pooling and batch normalization, which takes the spectrum as input and returns the two vectors. The input spectrum is wrapped so that peaks close to ±180° do not get halved at the edge. The hyperbolic tangent function, tanh, is used as the activation function since the unit vectors only take values between -1 and 1. The architecture of the network is shown in Table 3.2.

For the individual training, the mean squared error is taken as the loss function. Here the true value is given as $y = [\cos(\theta), \sin(\theta)]^T$, where $\theta$ is the true angle.


DOA Network Method B

    Layer                    Output dimension
    Input                    1 × 730 × 1
    Conv2D/BatchNorm/ReLU    1 × 721 × 16
    Conv2D/BatchNorm/ReLU    1 × 356 × 32
    MaxPool2D                1 × 178 × 32
    Conv2D/BatchNorm/ReLU    1 × 89 × 64
    MaxPool2D                1 × 45 × 64
    Conv2D/BatchNorm/ReLU    1 × 23 × 128
    MaxPool2D                1 × 12 × 128
    Conv2D/BatchNorm/ReLU    1 × 6 × 64
    MaxPool2D                1 × 3 × 64
    Conv2D/BatchNorm/ReLU    1 × 2 × 32
    MaxPool2D                1 × 1 × 32
    Conv2D/BatchNorm/ReLU    1 × 1 × 32
    Dense/Tanh               1 × 1 × 4
    Reshape                  2 × 2

Table 3.2: Network architecture of Method B

3.5.2 Image Generation

Using the angles $\theta_i$, an image is created. The image has four channels, with each channel representing a pair of receiver and angle. Each pixel is assigned a value given by the dot product between the estimated vector and the normalized vector spanned between the receiver and the pixel. One channel of the image can look like the one shown in Figure 3.5a. This four-channel image is the input to the second block.
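A sketch of how one such channel could be computed follows; the grid size, world extent and coordinate convention are assumptions, since the thesis does not state them explicitly.

```python
import numpy as np

def doa_channel(receiver_xy, doa_vec, size=112, extent=400.0):
    """One image channel: dot product between the unit DOA vector and the
    normalized vector from the receiver to each pixel."""
    xs = np.linspace(-extent / 2, extent / 2, size)
    gx, gy = np.meshgrid(xs, xs)                 # pixel world coordinates
    dx, dy = gx - receiver_xy[0], gy - receiver_xy[1]
    norm = np.hypot(dx, dy) + 1e-9               # avoid division by zero
    return (dx * doa_vec[0] + dy * doa_vec[1]) / norm
```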

3.5.3 Location Estimation

The second block is implemented with inspiration from the network CenterNet, presented in [14]. The network is an hourglass-shaped CNN with batch normalization, which takes the four-channel image mentioned above as input and outputs an image with the same resolution as the input but only one channel. The output image contains a distribution of where the two transmitters are located. On the output layer, a sigmoid activation function is used. The architecture of the second block is shown in Table 3.3. In Table 3.4, the architecture of the combined network is shown.

Weighted binary cross-entropy is used as the loss function, to ensure that the network does not output only zeros. This loss is used both when the block is trained individually and in the combined network. As the ground truth image used during training, an image where the true positions are marked as smoothed dots is used. An example is illustrated in Figure 3.5b. When training individually, the input images are generated from the true angles.

Positioning Network

    Layer                             Output dimension
    Input                             112 × 112 × 4
    Conv2D/BatchNorm/ReLU             112 × 112 × 16
    MaxPool2D                         56 × 56 × 16
    Conv2D/BatchNorm/ReLU             56 × 56 × 32
    MaxPool2D                         28 × 28 × 32
    Conv2D/BatchNorm/ReLU             28 × 28 × 64
    MaxPool2D                         14 × 14 × 64
    Conv2D/BatchNorm/ReLU             14 × 14 × 128
    MaxPool2D                         7 × 7 × 128
    Conv2DTranspose/BatchNorm/ReLU    14 × 14 × 128
    Conv2DTranspose/BatchNorm/ReLU    28 × 28 × 64
    Conv2DTranspose/BatchNorm/ReLU    56 × 56 × 32
    Conv2DTranspose/BatchNorm/ReLU    112 × 112 × 16
    Conv2D/Sigmoid                    112 × 112 × 1

Table 3.3: Network architecture for the location estimation.

Combined Network

    Layer or network                   Output dimension
    Input                              1 × 730 × 2
    2 parallel Slice                   2 parallel 1 × 730 × 1
    2 parallel DOA Network Method B    2 parallel 2 × 2
    4 parallel Slice                   4 parallel 2 × 1 × 1
    4 parallel Tile                    4 parallel 2 × 112 × 112
    4 parallel Multiply                4 parallel 2 × 112 × 112
    4 parallel Reduce_Sum              4 parallel 112 × 112
    Stack                              112 × 112 × 4
    Positioning Network                112 × 112 × 1

Table 3.4: Network architecture for the combined network. DOA Network Method B and Positioning Network are the networks described in Tables 3.2 and 3.3. The Slice layers extract a slice from the tensor: in the first case they divide the two MUSIC spectra from the Input layer into two individual spectra, and in the second case they divide the 2 × 2 output from DOA Network Method B into two separate vectors. Tile is a layer that repeats the input in a given shape, and is used together with the Multiply and Reduce_Sum layers to perform the dot product described in Section 3.5.2.

Figure 3.5: Two different kinds of images used in the second block of the network. (a) Example of an image generated to be used as input to the second block; a dark shade is a low value and a bright shade is a high value, given by the dot product described in Section 3.5.2. (b) Example of an image used as the ground truth during training of the second block.

Extracting the Position

The predicted positions are set to the most probable points in the output image given by the network in Section 3.5.3. Extracting the two highest peaks in the image might give two points very close together, in the same cluster. Also, a point should not be a single high point in the middle of nowhere, so the peak should not be too small. To remove smaller peaks and noise, a morphological opening [8] is performed on the image, using the function skimage.morphology.opening [5] from the scikit-image library. All local maxima in the image are then found using the function skimage.feature.peak_local_max [6]. With that function it is possible to find local peaks with a minimal allowed distance separating the peaks. The opening makes the peaks bigger, so to make sure that the two highest peaks are not in the same cluster, the function scipy.ndimage.label [4] gives each peak a label. These labels are used by the function scipy.ndimage.measurements.center_of_mass [2], which returns the mid point of each peak. Of these mid points, the two highest are returned as the estimated positions.
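A hedged sketch of this pipeline with the cited scikit-image and SciPy functions follows; the structuring element, the min_distance value and the exact ordering of the steps are assumptions.

```python
import numpy as np
from skimage.morphology import opening, disk
from skimage.feature import peak_local_max
from scipy.ndimage import label, center_of_mass

def extract_positions(prob_img):
    """Return the two strongest peak centres from the network output."""
    cleaned = opening(prob_img, disk(2))             # remove small peaks/noise
    peaks = peak_local_max(cleaned, min_distance=5)  # local maxima coordinates
    mask = np.zeros_like(cleaned, dtype=bool)
    mask[tuple(peaks.T)] = True
    labels, n = label(mask)                          # one label per peak
    centers = center_of_mass(cleaned, labels, range(1, n + 1))
    heights = [cleaned[tuple(np.round(c).astype(int))] for c in centers]
    top2 = np.argsort(heights)[::-1][:2]             # two highest mid points
    return [centers[i] for i in top2]
```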

4 Results and Discussion

In this chapter, the results of the proposed deep learning approach are presented. Discussion about the performance when compared to the traditional method is also provided. First, the results of the combined network are presented, followed by the individual networks.

4.1 The Combined Network

The network is trained with 9000 data samples, and 1000 data samples are used as validation data. A stochastic gradient descent (SGD) optimizer with a learning rate of 0.01 and Nesterov [13] accelerated gradient with a momentum of 0.9 is used. A weighted binary cross-entropy with weight 10 is used as the loss function. The model is trained until the validation loss has not decreased for 10 epochs. The final training and validation losses are 0.0551 and 0.0564, respectively.

4.1.1 Accuracy

The network and the traditional method are evaluated by measuring the Euclidean distance between the predicted position and the true position of the transmitter. If the distance is smaller than a given threshold, the prediction is considered a hit. Two cases are measured, as illustrated in the sketch following the list:

• Single: the two transmitters are evaluated individually

• Pair: the two transmitters are evaluated together, being considered a hit only when both positions are within the chosen distance.
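A hedged sketch of this evaluation follows; the matching of predictions to transmitters (here: the assignment with the smaller total distance) is an assumption, since the thesis does not state how predictions are paired with the ground truth.

```python
import numpy as np

def hit_rates(pred, true, threshold):
    """pred, true: arrays of shape (N, 2, 2) - N samples, 2 transmitters,
    (x, y) coordinates. Returns (single rate, pair rate)."""
    single_hits, pair_hits = 0, 0
    for p, t in zip(pred, true):
        # try both assignments of predictions to transmitters
        d_a = [np.linalg.norm(p[0] - t[0]), np.linalg.norm(p[1] - t[1])]
        d_b = [np.linalg.norm(p[0] - t[1]), np.linalg.norm(p[1] - t[0])]
        d = d_a if sum(d_a) <= sum(d_b) else d_b
        single_hits += sum(di < threshold for di in d)
        pair_hits += all(di < threshold for di in d)
    n = len(pred)
    return single_hits / (2 * n), pair_hits / n
```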

Tables 4.1 and 4.2 present the accuracy of the deep learning and the traditional method, respectively. Both methods are evaluated using 10000 samples previously unseen by the network, i.e., these samples were not part of the training or validation data.

    Case     Dist. = 10   Dist. = 20   Dist. = 40
    Single   29.95%       54.04%       79.96%
    Pair     8.81%        29.0%        63.72%

Table 4.1: Results of the deep learning method. The threshold distance (Dist.) is expressed in units of length.

    Case     Dist. = 10   Dist. = 20   Dist. = 40
    Single   88.42%       97.24%       99.14%
    Pair     79.81%       95.06%       98.34%

Table 4.2: Results of the traditional method. The threshold distance (Dist.) is expressed in units of length.

4.1.2 Output

The following figures show output from each of the two methods, i.e., the deep learning method and the traditional method. For the deep learning method, a probability distribution over where the transmitters are estimated to be is shown. In the distribution, a darker shade of grey represents a higher probability of a transmitter being present in that location, and a lighter shade represents a lower probability. From this probability distribution, the prediction is extracted using the method described in Section 3.5.3 and plotted along with the true positions. For the traditional method, the predicted positions are plotted along with the true positions. Figure 4.1 shows outputs from cases where the deep learning method performed well, predicting both positions correctly within a radius of 10 units of length. Figure 4.2 shows outputs from cases where the deep learning method did not perform well. Figure 4.3 shows cases where the traditional method did not perform well. In the following figures, the x- and y-axes show pixel coordinates.

Figure 4.1: Samples that the deep learning method performs well on (samples 937, 3912 and 7320). Figures 4.1a to 4.1c show the results of the deep learning method and Figures 4.1d to 4.1f show the results of the traditional method. A circle represents a true position of a transmitter and a cross represents a predicted position.

Figure 4.2: Samples that the deep learning method does not perform well on (samples 3858, 4112 and 9971). Figures 4.2a to 4.2c show the results of the deep learning method and Figures 4.2d to 4.2f show the results of the traditional method. A circle represents a true position of a transmitter and a cross represents a predicted position.

Figure 4.3: Samples that the traditional method does not perform well on (samples 3330, 5282 and 5929). Figures 4.3a to 4.3c show the results of the deep learning method and Figures 4.3d to 4.3f show the results of the traditional method. A circle represents a true position of a transmitter and a cross represents a predicted position.


4.1.3 Discussion on Combined Network

As observed from Tables 4.1 and 4.2, the deep learning method did not perform well when compared to the traditional method. However, Figures 4.1, 4.2 and 4.3 show that even though the predictions are far from the actual position, the network often indicates a probability in the correct position's neighbourhood. Even though the setup is simplified, with only two transmitters and some restrictions on placement, the result should be reliable enough to draw conclusions regarding this method's potential. Two transmitters are enough to give ambiguity in the triangulation when only two receivers are present, and the amount of data used for the evaluation should be enough to give a statistically valid result.

The deep learning method seems to work best when the transmitters are well separated. Other than that, it is hard to find a pattern in when the network performs well. The cases where the traditional method did not perform well seem to be when the transmitters are located far away from the receivers, as in Figures 4.3e and 4.3f, or when the ghost points are chosen, as in Figure 4.3d, as described in Section 3.4. The deep learning method does not seem to perform any differently when the transmitters are located far from the receivers than when they are close, other than producing slightly more elongated clusters. In the cases with ghost points, the deep learning method seems to be slightly affected by them, but still predicts the positions closer to the true positions compared to the traditional method.

One special case where the network does not perform well is when the two transmitters are positioned close to each other, as shown in Figures 4.2c and 4.3c. This is because the probability distribution becomes one big cluster instead of two individual ones, which is hard to separate for the method described in Section 3.5.3 that extracts positions from the probability distribution. Using another method to extract the predicted positions from the output, or altering the current method, may give better predictions. However, morphological operations are expected to reduce noise in an image, and early testing gave better results with morphological operations than without.

4.2 DOA Network

The two methods of finding the DOA are both trained using 100 000 data samples, with 80 000 samples used for training and 20 000 as validation data. Method A uses an Adam [10] optimizer with learning rate 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-7}$, and is trained for 10 epochs. Method B uses an SGD optimizer with learning rate 0.01 and Nesterov accelerated gradient with a momentum of 0.9. Method B is trained until the validation loss has not decreased for 10 epochs.


4.2.1 Accuracy

The networks are evaluated in a similar way as described in Section 4.1.1, but the absolute error between the predicted and the true angle is measured instead. The deep learning methods are compared to a traditional method of finding the peaks in the spectra. The results can be seen in Tables 4.3 to 4.5. All methods are evaluated using 100 000 samples previously unseen by the networks.

    Case     Angle = 0.5   Angle = 1.0   Angle = 2.0
    Single   94.95%        99.28%        99.87%
    Pair     90.25%        98.82%        99.78%

Table 4.3: Results of Method A. The threshold angle (Angle) is expressed in degrees.

    Case     Angle = 0.5   Angle = 1.0   Angle = 2.0
    Single   29.99%        55.15%        84.41%
    Pair     9.56%         31.91%        72.83%

Table 4.4: Results of Method B. The threshold angle (Angle) is expressed in degrees.

    Case     Angle = 0.5   Angle = 1.0   Angle = 2.0
    Single   96.09%        99.63%        99.99%
    Pair     94.0%         99.42%        99.99%

Table 4.5: Results of the traditional method. The threshold angle (Angle) is expressed in degrees.

4.2.2 Discussion on DOA Network

As observed in Tables 4.3 and 4.4, Method A performs a lot better than Method B, accurately predicting the pairs with 90.25% accuracy, compared with 9.56%. This accuracy might affect the final result, since the accuracies in Tables 4.1 and 4.4 have similar magnitudes. The accuracy of Method A suggests that 100 000 training samples are enough to get good accuracy on DOA estimation. Since the custom loss function used when training Method B is not used when training the combined network, it is not that loss function that causes the low accuracy of Method B. This suggests that there is a fault in the architecture that lowers the accuracy.


The reason Method B was chosen for the combined network is that it showed promising results in early testing, and that it is not necessarily constrained to the 0.5-degree resolution like Method A. Another consideration is that Method B has far fewer trainable parameters, which gives a smaller model and faster training. In hindsight, Method A would probably have performed better, since Table 4.2 shows that the traditional method performs well even though it is constrained to the 0.5-degree resolution, and Table 4.3 shows that Method A performs better than Method B.

4.3 Position Network

The second block is trained with 10 000 data samples of four-channel images, like the one seen in Figure 3.5a, created from the true angles. 9000 samples are used as training data and 1000 as validation data. The optimizer used is an SGD optimizer with learning rate 0.01 and Nesterov accelerated gradient with a momentum of 0.9. The model is trained until the validation loss has not decreased for 10 epochs.

4.3.1 Accuracy

The network is evaluated in the same way as described in Section 4.1.1 and compared to the traditional method using the true angles as input. The methods are evaluated using 10 000 samples previously unseen by the network. The results can be seen in Tables 4.6 and 4.7.

    Case     Dist. = 10   Dist. = 20   Dist. = 40
    Single   86.3%        96.16%       97.77%
    Pair     75.58%       93.04%       95.72%

Table 4.6: Results of the deep learning method. The threshold distance (Dist.) is expressed in units of length.

    Case     Dist. = 10   Dist. = 20   Dist. = 40
    Single   99.25%       99.38%       99.44%
    Pair     98.78%       98.8%        98.88%

Table 4.7: Results of the traditional method. The threshold distance (Dist.) is expressed in units of length.

4.3.2 Output

The following figures compare the output from the second block of the network with the traditional method, similar to the analysis in Section 4.1.2. Figure 4.4 shows three sample cases where the network performs well, with both predictions within a radius of 10 units of length. Figure 4.5 shows three sample cases where the network does not perform well, with at least one prediction more than 40 units of length away from the true position. Figure 4.6 shows three sample cases where the traditional method predicts wrongly.

Figure 4.4: Samples that the deep learning method performs well on (samples 2546, 3071 and 5661). Figures 4.4a to 4.4c show the results of the deep learning method and Figures 4.4d to 4.4f show the results of the traditional method. A circle represents a true position of a transmitter and a cross represents a predicted position.

Figure 4.5: Samples that the deep learning method does not perform well on (samples 1531, 2912 and 7859). Figures 4.5a to 4.5c show the results of the deep learning method and Figures 4.5d to 4.5f show the results of the traditional method. A circle represents a true position of a transmitter and a cross represents a predicted position.

Figure 4.6: Samples that the traditional method does not perform well on (samples 403, 722 and 3302). Figures 4.6a to 4.6c show the results of the deep learning method and Figures 4.6d to 4.6f show the results of the traditional method. A circle represents a true position of a transmitter and a cross represents a predicted position.


4.3.3 Discussion on Position Network

We observe from Table 4.6 that, considering only the second block of the network with correct angles as input, more than 75% of the predicted pairs are within a radius of 10 units of length. This result also shows that 10 000 training samples are a sufficient amount to give good accuracy for the position estimation.

Similar to the combined network, the most difficult case for the position estimation block is when the two transmitters are positioned close to each other, as discussed in Section 4.1.3. This is shown in Figures 4.5a, 4.5b and 4.5c. When it comes to ghost points, the second block seems to be somewhat affected by them, but predicts the positions better than the traditional method, as is the case with the combined network. This can be seen in Figure 4.6. This may be because the network has learned the structure of the dataset, and how the ghost points are usually placed relative to the real positions.

5 Conclusion

This thesis's primary purpose was to examine whether it is possible to perform positioning using deep learning with MUSIC spectra as input. The results and discussion presented in Chapter 4 make the proposed approach look promising. Still, the deep neural network in its current form needs to be modified to give a more acceptable result compared to traditional methods. This thesis shows that estimating transmitters' positions using deep learning methods is an exciting research direction in which to explore and leverage the benefits of deep learning techniques.

5.1 Future Work

A few aspects which need to be improved, or fields which require further investigation, are:

• Better DOA estimation: From Section 4.2, one can observe that the angle estimation method used in the final network did not perform well. A possible solution to improve the performance could be using the first angle estimation method at the beginning of the network.

• Testing the robustness: The robustness of the network can be analysed by observing the system performance at low SNR, when the transmitters are occluded by each other, or with a larger experimental setup than the current 400 by 400 square units, and then trying to improve the system to perform well under these conditions.

• Extend to tracking: One can extend the proposed approach to track the motion of the transmitters, besides static positions. The tracking methods presented in [15], where tracking of objects is studied, can be utilized (or new algorithms with similar motives can be developed).


• Optimize the network: Different factors that affect a deep neural network's performance are kernel sizes, number of layers, and hyperparameters. A few examples of hyperparameters include the learning rate, momentum, and the weight for the weighted binary cross-entropy loss function. These parameters can be tuned and varied to select the most optimal values, which this thesis did not investigate in detail.

Bibliography

[1] numpy.linalg.solve. https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html. Accessed: 2020-02-23.

[2] scipy.ndimage.center_of_mass. https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.center_of_mass.html. Accessed: 2020-02-24.

[3] scipy.signal.find_peaks. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html. Accessed: 2020-02-23.

[4] scipy.ndimage.label. https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.label.html. Accessed: 2020-02-24.

[5] skimage.morphology.opening. https://scikit-image.org/docs/dev/api/skimage.morphology.html#skimage.morphology.opening. Accessed: 2020-02-24.

[6] skimage.feature.peak_local_max. https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.peak_local_max. Accessed: 2020-02-24.

[7] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[8] R. M. Haralick, S. R. Sternberg, and X. Zhuang. Image analysis using mathematical morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(4):532–550, 1987. doi: 10.1109/TPAMI.1987.4767941.

[9] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.

[10] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2014.


[11] H. Krim and M. Viberg. Two decades of array signal processing research: the parametric approach. IEEE Signal Processing Magazine, 13(4):67–94, 1996. doi: 10.1109/79.526899.

[12] R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3):276–280, 1986.

[13] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1139–1147, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR. URL http://proceedings.mlr.press/v28/sutskever13.html.

[14] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points, 2019.

[15] Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Tracking objects as points, 2020.
