
AUTONOMOUS OBJECT TRACKING WITH DRONES

by

ERIC VELAZQUEZ MARIN

Information and communication technologies engineering, Universitat Politecnica de Catalunya, 2017

A thesis submitted to the Graduate Faculty of the

University of Colorado Colorado Springs

in partial fulfillment of the

requirements for the degree of Master of Science


© 2019

ERIC VELAZQUEZ MARIN


This thesis for the degree of Master of Science by

Eric Velazquez Marin

has been approved for the

Department of Computer Science

by

Sudhanshu Semwal, Chair

Albert Chamillard


Velazquez Marin, Eric (M.S., Computer Science)

Autonomous object tracking with drones

Thesis directed by Professor Sudhanshu Semwal

ABSTRACT

We propose an extension of recent work on convolutional neural networks and drones, Learning to Fly by Driving (DroNet)[1], which can safely drive a drone autonomously. In other words, we propose a model that extends this work in order to safely track any object with a drone. The combination of (i) the DroNet architecture and weights, applied as a CNN to avoid crashes; (ii) the DLIB tracker, a correlation tracker implemented from Danelljan et al.'s paper [2]; (iii) the extraction of descriptors using Speeded Up Robust Features[3]; and (iv) the Fast Library for Approximate Nearest Neighbors[4] for feature matching, leads a drone to track any object and avoid crashes autonomously without any prior information about the object.


ACKNOWLEDGEMENTS

The author wishes to sincerely thank the following individuals:

• My advisor, Dr. Semwal, for helping and supporting me.
• My parents, sister and girlfriend for all the support.
• My friends for the good moments.


CONTENTS

CHAPTER

I. INTRODUCTION . . . 1
Hypotheses . . . 1
Research questions . . . 1
Proposed method . . . 2

II. Related work . . . 3

Learning to fly by driving . . . 3

Learning to fly by crashing . . . 4

III. DRONET . . . 7
Neural Networks . . . 7
Layers . . . 9
Convolutional . . . 9
Fully connected . . . 10
Max pool . . . 10
Dropout . . . 10
ReLu . . . 10


RES block . . . 11

Learning approach and Datasets . . . 12

Drone control . . . 13

Results . . . 13

IV. DLIB tracker . . . 15

DLIB correlation tracker . . . 15

Results . . . 17

Advantages and disadvantages . . . 17

V. Descriptors . . . 19

Feature extraction: SURF . . . 19

Feature matching: FLANN by OpenCv docs . . . 22

Homography . . . 22

VI. Test cases . . . 24

Test cases . . . 24

Human walking no interference . . . 25

Human walking with interference . . . 27

Human walking probability of collision . . . 28

Human walking disappear/appear . . . 28

Human walking sharp turns . . . 28


Light on/off . . . 28

VII. Implementation . . . 29

How we use DroNet . . . 29

How we use DLIB tracker . . . 30

How we use descriptors . . . 31

How we handle edge cases . . . 35

General edge case. . . 36

No light . . . 38
Main algorithm . . . 38
Hardware . . . 41
Software . . . 44
Drone Control . . . 44

VIII. Results . . . 49

Human walking no interference . . . 49

Human walking probability of collision . . . 53

Human walking with interference . . . 59

Human walking disappear/appear . . . 65

Human walking sharp turns . . . 71


Difficulties . . . 86

IX. Conclusion . . . 90

REFERENCES . . . 91

APPENDICES A. Main Thread code . . . 93

B. Video Get code . . . 98

C. Descriptors code . . . 99

D. Probability of collision code . . . 102


LIST OF FIGURES

FIGURE

3.1. DroNet architecture . . . 11

3.2. RES Block architecture . . . 12

4.1. Dlib results[2] . . . 17

5.1. LoG approximation with Box filter . . . 20

5.2. SURF orientation by OpenCv docs . . . 21

5.3. FLANN feature matching by OpenCv docs . . . 22

6.1. Drone following human . . . 25

6.2. Drone following human Drone View. . . 26

6.3. Drone following human . . . 27

7.1. Storing initial descriptors . . . 32

7.2. Storing all other descriptors . . . 33

7.3. Replacing old descriptors . . . 33


7.7. Pitch direction . . . 45

7.8. Yaw direction . . . 45

7.9. Roll direction . . . 46

7.10. Drone control frame . . . 47

8.1. Simple walk result 1 . . . 50

8.2. Simple walk result 2 . . . 50

8.3. Simple walk result 3 . . . 51

8.4. Simple walk result 4 . . . 51

8.5. Simple walk result 5 . . . 52

8.6. Simple walk result 6 . . . 52

8.7. Simple walk result 7 . . . 53

8.8. Probability of collision 1 . . . 54
8.9. Probability of collision 2 . . . 54
8.10. Probability of collision 3 . . . 55
8.11. Probability of collision 4 . . . 55
8.12. Probability of collision 5 . . . 56
8.13. Probability of collision 6 . . . 56
8.14. Probability of collision 7 . . . 57
8.15. Probability of collision 8 . . . 57
8.16. Probability of collision 9 . . . 58


8.17. Probability of collision 10 . . . 58

8.18. Walking with interference 1 . . . 59

8.19. Walking with interference 2 . . . 60

8.20. Walking with interference 3 . . . 60

8.21. Walking with interference 4 . . . 61

8.22. Walking with interference 5 . . . 61

8.23. Walking with interference 6 . . . 62

8.24. Walking with interference 7 . . . 62

8.25. Walking with interference 8 . . . 63

8.26. Walking with interference 9 . . . 63

8.27. Walking with interference 10 . . . 64

8.28. Walking with interference 11 . . . 64

8.29. Disappear and appear 1 . . . 65

8.30. Disappear and appear 2 . . . 66

8.31. Disappear and appear 3 . . . 66

8.32. Disappear and appear 4 . . . 67

8.33. Disappear and appear 5 . . . 67

8.34. Disappear and appear 6 . . . 68


8.37. Disappear and appear 9 . . . 69

8.38. Disappear and appear 10 . . . 70

8.39. Disappear and appear 11 . . . 70

8.40. Sharp turns 1 . . . 71
8.41. Sharp turns 2 . . . 72
8.42. Sharp turns 3 . . . 72
8.43. Sharp turns 4 . . . 73
8.44. Sharp turns 5 . . . 73
8.45. Sharp turns 6 . . . 74
8.46. Sharp turns 7 . . . 74
8.47. Sharp turns 8 . . . 75
8.48. Sharp turns 9 . . . 75
8.49. Sharp turns 10 . . . 76
8.50. Sharp turns 11 . . . 76
8.51. Sharp turns 12 . . . 77
8.52. Low light 1 . . . 78
8.53. Low light 2 . . . 78
8.54. Low light 3 . . . 79
8.55. Low light 4 . . . 79
8.56. Low light 5 . . . 80


8.57. Low light 6 . . . 80
8.58. Low light 7 . . . 81
8.59. Light on/off 1 . . . 82
8.60. Light on/off 2 . . . 82
8.61. Light on/off 3 . . . 83
8.62. Light on/off 4 . . . 83
8.63. Light on/off 5 . . . 84
8.64. Light on/off 6 . . . 84
8.65. Light on/off 7 . . . 85
8.66. Light on/off 8 . . . 85

8.67. DJI Tello broken by some of the difficulties explained before . . . 87


LIST OF TABLES

TABLE

3.1. DroNet results in different environments. . . 14

6.1. Test cases. . . 24


ABBREVIATIONS

UCCS University of Colorado Colorado Springs

UAVs Unmanned Aerial Vehicles

UASes Unmanned Aircraft Systems

SURF Speeded Up Robust Features

SIFT Scale Invariant Feature Transform

LoG Laplacian Of Gaussian

FLANN Fast Library for Approximate Nearest Neighbor

DoF Degrees Of Freedom

ROI Region Of Interest


CHAPTER I

INTRODUCTION

A drone, in a technological context, is an unmanned aircraft, formally known as an unmanned aerial vehicle (UAV) or unmanned aircraft system (UAS). Essentially, a drone is a flying robot. These flying robots can be controlled remotely or can fly autonomously, for example using neural networks working in conjunction with on-board sensors, cameras or GPS.

During the last few years, the use of drones has increased tremendously. Nowadays there is a huge variety of drones on the market: some drones are for fun and others are focused on research. Our focus is to use Artificial Neural Networks (ANNs) to control the functionality of drones.

State-of-the-art approaches on this topic have already provided models and algorithms to autonomously fly a drone indoors or outdoors while avoiding crashes. While such approaches are able to fly drones safely, they do not provide any application other than just flying the drone. By combining flying and drone navigation with object tracking, we are creating a new application for drones in this thesis.

1.1 Hypotheses

Our Hypotheses and research questions are the following:

1.1.1 Research questions


• Can we combine DroNet with the Tensorflow object detection API and/or a computer vision algorithm such as DLIB?

• Can we implement an algorithm to follow an object and, at the same time, avoid crashes?

• Can we identify a new application of object detection using drones?

1.1.2 Proposed method

• We will need to change the DroNet algorithm in order to track and follow any object.

• There will be a trade-off between the DroNet NN and the Tensorflow API / computer vision algorithm such as DLIB in order to navigate the drone autonomously.


CHAPTER II

RELATED WORK

In this section, we summarize relevant papers for our work.

2.1 Learning to fly by driving

This paper[1] explains a method to train a drone to fly autonomously through the streets of a city, collecting data about cars and bicycles. The system is called DroNet. DroNet is a convolutional neural network (CNN) whose purpose is to reliably drive an autonomous drone through the streets of a city. Trained with data collected by cars and bicycles, the system learns from the collected data to follow basic traffic rules, e.g., do not go off the road, and to safely avoid pedestrians and other obstacles. Surprisingly, the policy learned by DroNet generalizes well: for example, the training extends well enough to fly a drone even in indoor corridors and parking lots.

The learning approach predicts a steering angle and a probability of collision from the drone's on-board forward-looking camera. These are later converted into control/flying commands which enable a UAV to safely navigate while avoiding obstacles. Since the goal is to reduce the image processing time, the paper advocates a single convolutional neural network (CNN) with a relatively small size. The resulting network is called DroNet. The architecture is partially shared by the two tasks to reduce the network's complexity and processing time, but is then separated into two branches at the very end: steering prediction is a regression problem, which means that it requires the prediction of a quantity, while collision prediction is addressed as a binary classification problem. Due to their different nature and output range, the network's last fully-connected layer is separated.

During the training procedure, only imagery recorded by manned vehicles is used. Steering angles are learned from images captured from a car, while probability of collision is learned from images captured from a bicycle.

The DroNet system for autonomous navigation is tested on a number of different urban trails including straight paths and sharp curves. Moreover, to test the generalization capabilities of the learned policy, they also performed experiments in indoor environments. They compare the approach against two baselines: a straight-line policy and a minimize-probability-of-collision policy.

One of the metrics is the average distance travelled before stopping or colliding. Results indicate that DroNet is able to drive a UAV for a long time on almost all the selected testing scenarios. The main strengths of the policy learned by DroNet are twofold:

• the platform smoothly follows the road lane while avoiding static obstacles;

• the drone is never driven into a collision, even in the presence of dynamic obstacles, like pedestrians or bicycles, occasionally occluding its path.

2.2 Learning to fly by crashing

In this paper[5], a drone whose goal is to crash into objects is built: it samples random trajectories to crash into random objects. They crash their drone 11,500 times to create one of the biggest UAV crash-related data-sets. This negative dataset provides scenarios of the different ways in which a UAV can crash; it also represents the policy of how a UAV should NOT fly. They use all this negative data in conjunction with positive data sampled from the same trajectories to learn a simple yet surprisingly powerful policy for UAV navigation.


A Parrot AR-Drone 2.0 is used. This Parrot drone is often flown outdoors as a hobby drone. The research is very interesting because no additional sensors or cameras are attached in the flying space during the data collection process.

A two-step procedure for data collection is implemented. First, naive trajectories are sampled that lead to collisions with different kinds of objects. Then an annotation procedure for the collected trajectories is applied. The trajectories are first segmented automatically using the accelerometer data; this step restricts each trajectory up to the time of collision. Finally, the trajectories are segmented into positive and negative data for classification.

For the neural network portion, an AlexNet architecture is used with ImageNet-pretrained weights as initialization. The network learns a simple classification task: given an input image, predict whether the drone should move forward in a straight line or not. Based on the right-cropped image, the complete image and the left-cropped image, the network predicts the probability of moving in the right, straight and left direction, respectively. If the straight prediction (P(S)) is greater than alpha, the drone moves forward with a yaw proportional to the difference between the right prediction (P(R)) and the left prediction (P(L)). Intuitively, based on the confidence predictions for left and right, the robot is turned left or right while moving forward. The model's performance is compared against a straight-line policy, a depth-prediction-based policy and a human-controlled policy. To show the generalizability of the method, it is tested on 6 complex indoor environments: Glass Door, NSH 4th Floor, NSH Entrance, Wean Hall, Hallway and Hallway with Chairs.
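As an illustration of this decision rule, here is a minimal Python sketch. The crop widths, the alpha threshold and the `predict_safe_prob` wrapper around the classifier are assumptions made for the example, not taken from the paper's code.

```python
ALPHA = 0.5      # illustrative threshold; the paper's exact value is not stated here
YAW_GAIN = 1.0   # illustrative proportionality constant

def navigation_command(frame, predict_safe_prob):
    """Compute a (move_forward, yaw) command from one camera frame.

    `frame` is a NumPy image (H x W x 3); `predict_safe_prob(image)` is assumed
    to wrap the AlexNet-based binary classifier described above and to return
    the probability that moving toward `image` is safe.
    """
    h, w = frame.shape[:2]
    p_left = predict_safe_prob(frame[:, : w // 2])       # left crop  -> P(L)
    p_straight = predict_safe_prob(frame)                 # full image -> P(S)
    p_right = predict_safe_prob(frame[:, w // 2 :])       # right crop -> P(R)

    if p_straight > ALPHA:
        # Move forward, with yaw proportional to the left/right difference.
        return True, YAW_GAIN * (p_right - p_left)
    # Not confident about going straight: turn in place toward the safer side.
    return False, YAW_GAIN * (1.0 if p_right > p_left else -1.0)
```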

To evaluate the performance of the different baselines, the average distance and the average time of flight without collisions are used as evaluation metrics. Flight runs are also terminated when the drone takes small loops (spinning on the spot).

On every environment/setting, this method performed better than the depth baseline. The best straight baseline provides an estimate of how difficult the environments are. The human-controlled baselines have better results than their method for most environments. However, for some environments like Hallway with Chairs, the presence of cluttered objects makes it difficult for the participants to navigate through narrow spaces, and in this case the drone method surpasses human-level control.


CHAPTER III

DRONET

In this chapter we explain in detail what a neural network is, which kinds of layers we can find in the DroNet network, and the complete functionality of this network in its original project. DroNet is a very important part of this thesis, as it helps the drone avoid crashes while tracking objects.

3.1 Neural Networks

A neural network involves a large number of processors operating in parallel and arranged in tiers. The first layer receives the raw input information, and each successive layer receives the output from the layer preceding it rather than from the raw input. The last layer produces the output of the system. Each processing node has its own small sphere of knowledge, including what it has seen and any rules it was originally programmed with or developed for itself. The layers are highly interconnected, which means each node in layer n is connected to many nodes in layer n-1 (its inputs), and the output of layer n is the input to layer n+1. There may be one or multiple nodes in the output layer, from which the answer the network produces can be read.

Neural networks are notable for being adaptive, which means they modify themselves as they learn from initial training, and subsequent runs provide more information about the world. The most basic learning model is centered on weighting the input streams, which is how each node weights the importance of the input from each of its predecessors. Inputs that contribute to getting right answers are weighted higher.

Typically, a neural network is initially trained, or fed large amounts of data. Training consists of providing input and telling the network what the output should be. For example, to build a network to identify the faces of actors, initial training might be a series of pictures of actors, non-actors, masks, statuary, animal faces and so on. Each input is accompanied by the matching identification, such as actors’ names, ”not actor” or ”not human” information. Providing the answers allows the model to adjust its internal weightings to learn how to do its job better.

In defining the rules and making determinations (that is, each node decides what to send on to the next tier based on its own inputs from the previous tier), neural networks use several principles. These include gradient-based training, fuzzy logic, genetic algorithms and Bayesian methods. They may be given some basic rules about object relationships in the space being modeled. For example, a facial recognition system might be instructed, "Eyebrows are found above eyes," or "Moustaches are below a nose. Moustaches are above and/or beside a mouth." Preloading rules can make training faster and make the model more powerful sooner, but it also builds in assumptions about the nature of the problem space, which may prove to be either irrelevant and unhelpful or incorrect and counterproductive. The decision about what rules, if any, to build in is therefore very important.

Neural networks are sometimes described in terms of their depth, including how many layers they have between input and output, or the model’s so-called hidden layers. They can also be described by the number of hidden nodes the model has or in terms of how many inputs and outputs each node has. Variations on the classic neural-network design allow various forms of forward and backward propagation of information among tiers.

Artificial neural networks were first created as part of the broader research effort around artificial intelligence, and they continue to be important in that space.


3.2 Layers

Layer is a general term that applies to a collection of nodes operating together at a specific depth within a neural network. We can divide the layers into three types:

• The input layer contains your input data.

• The hidden layer(s) are where the "black magic" happens in neural networks. Each layer tries to learn different aspects of the data by minimizing an error/cost function. The most intuitive way to understand these layers is in the context of image recognition, such as recognizing a face: the first layer may learn edge detection, the second may detect eyes, the third a nose, etc. The idea is to break the problem up into components at different levels of abstraction that can be pieced together, much like our own brains work. Each layer can work either on the original image or on the output of the previous layers.

• The output layer is the simplest, usually consisting of a single output for classification problems.

There are many different kinds of hidden layers and, as said before, each layer tries to learn different aspects of the data. DroNet and AlexNet use five different types: convolutional, fully connected, max pool, dropout and ReLU.

3.2.1 Convolutional

There are some main parameters that we can change to modify the behavior of each layer:

• The filter sizes, for example a x by x window where x is the number of pixels (if the input is an image).

• The number of filters is the number of neurons, since each neuron performs a different convolution on the input to the layer (more precisely, the neurons’ input weights form convolution kernels).

• The stride is the amount by which the filter shifts.

• The padding is used to preserve as much information as possible about the original input volume so that we can extract those low-level features. Zero padding pads the input volume with zeros around the border. A minimal code sketch of these parameters is shown below.
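As a minimal Keras sketch of these four parameters (the specific values are illustrative, not DroNet's):

```python
from tensorflow.keras import layers

# 32 filters (neurons), each a 3x3 window, shifted with a stride of 2 pixels,
# and "same" zero padding so border information of the input volume is kept.
conv = layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(2, 2),
                     padding="same", activation="relu")
```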

3.2.2 Fully connected

The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features. Essentially the convolutional layers are providing a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer is learning a (possibly non-linear) function in that space.

3.2.3 Max pool

This basically takes a filter and a stride of the same length. It then applies it to the input volume and outputs the maximum number in every sub region that the filter convolves around.

3.2.4 Dropout

This layer drops out a random set of activations in that layer by setting them to zero. It forces the network to be redundant: the network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out. It makes sure that the network isn't getting too fitted to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only used during training, and not at test time.


3.3 DroNet Architecture

DroNet[1] is a convolutional neural network which is able to predict the probability of collision and the steering angle in real time, taking as input each frame of the drone's camera. It was trained using data collected by cars and bicycles driving through different cities. It can guide a drone along a road following the traffic signs, avoiding obstacles and without going off the road.

To reduce the image processing time, the DroNet architecture is shared by the two tasks (probability of collision and steering angle) but is then separated into two branches at the end. The steering angle is a regression problem, which means that it requires the prediction of a quantity, and the probability of collision is a binary classification problem.

DroNet uses the ResNet-8 architecture plus a dropout layer of 0.5 and a ReLU non-linearity. The residual blocks of ResNet were proposed by He et al.[6]. After the ReLU layer, the two tasks don't share any more parameters and the architecture splits into two different fully-connected layers. The first output is the steering angle and the second one is the probability of collision.

The input of this convolutional neural network is a 200x200 frame in gray-scale.

Figure 3.1: DroNet architecture

3.3.1 RES block

The RES Block can be interpreted as the following formulas:

• (Continuous line) F(x) = Batch Normalization + ReLU + 3x3 Conv + Batch Normalization + ReLU + 3x3 Conv

• (Dotted line) x = 1x1 Conv

• (Output) H(x) = F(x) + x

Figure 3.2: RES Block architecture
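The RES block above can be sketched in Keras roughly as follows. This is a generic reconstruction of such a residual block, not the DroNet authors' code; the filter count and stride arguments are illustrative.

```python
from tensorflow.keras import layers

def res_block(x, filters, stride=2):
    """H(x) = F(x) + shortcut(x), following the two paths described above."""
    # Continuous line: F(x) = BN + ReLU + 3x3 conv, applied twice.
    f = layers.BatchNormalization()(x)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, strides=stride, padding="same")(f)
    f = layers.BatchNormalization()(f)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, strides=1, padding="same")(f)
    # Dotted line: 1x1 convolution on the shortcut so the shapes match.
    shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    # Output: H(x) = F(x) + x
    return layers.Add()([f, shortcut])
```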

3.3.2 Learning approach and Datasets

The paper uses mean-squared error (MSE) to train the steering angle prediction and binary cross-entropy (BCE) to train the probability of collision. They used the Adam optimizer with a starting learning rate of 0.001 and an exponential per-step decay equal to 10^-5. They used hard negative mining to focus on those samples which are the most difficult to learn.
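As a rough, minimal sketch of this two-headed training setup in Keras (the tiny backbone below is a placeholder, not DroNet's ResNet-8, and the per-step learning-rate decay is omitted for brevity):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# A placeholder two-output model: one regression head ("steering") trained
# with MSE and one binary classification head ("collision") trained with BCE.
inputs = layers.Input(shape=(200, 200, 1))
x = layers.Conv2D(32, 5, strides=2, activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)
steering = layers.Dense(1, name="steering")(x)
collision = layers.Dense(1, activation="sigmoid", name="collision")(x)
model = Model(inputs, [steering, collision])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss={"steering": "mse", "collision": "binary_crossentropy"})
```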

(30)

To learn the probability of collision they collected their own dataset by mounting a Go Pro camera on a bicycle and driving through different areas of a city. They started recording when they were far away from the obstacle and stopped once they were about to crash with it. This dataset consists of 32,000 images and they manually annotated the probability of collision as 0 for no collision and 1 for frames very close to obstacles.

3.3.3 Drone control

To control the drone, the paper uses the outputs of DroNet: the probability of collision to modulate the forward velocity, and the predicted steering angle to calculate the yaw angle of the drone.

For the forward velocity, the drone flies at maximal speed when the probability of collision is zero and stops when the probability of collision is close to one. A low-pass filtered version of the modulated velocity is used to drive the drone smoothly.

v_k = (1 - α) v_{k-1} + α (1 - p_t) V_max        (3.1)

For the steering angle, the [-1, 1] range is mapped to a desired yaw angle in the range [-π/2, π/2] and low-pass filtered using:

ω_k = (1 - β) ω_{k-1} + β π s_k / 2        (3.2)

Depending on the environment, these values can be changed as well.
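A minimal numeric sketch of one step of these two filters, assuming placeholder values for α, β and V_max (the real values depend on the environment):

```python
import math

ALPHA, BETA, V_MAX = 0.7, 0.5, 1.0   # placeholder filter constants and max speed

def update_commands(v_prev, omega_prev, p_collision, steering):
    """One step of the low-pass filters in Eqs. (3.1) and (3.2).

    `p_collision` is DroNet's collision probability in [0, 1] and `steering`
    its steering prediction in [-1, 1].
    """
    v = (1 - ALPHA) * v_prev + ALPHA * (1 - p_collision) * V_MAX
    omega = (1 - BETA) * omega_prev + BETA * math.pi * steering / 2
    return v, omega
```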

3.3.4 Results

Their system was tested on several different urban trails that included straight paths and sharp curves. The main strengths of their system are:

• The drone is able to follow the road lane while avoiding obstacles

• The drone never crashes on the scenes which were tested, even in the presence of dynamic obstacles.


In Table 3.1, we can see the average travelled distance before stopping in different environments.

Outdoor 1 Outdoor 2 Outdoor 3 High Altitude (5 m) Corridor Garage

52 m 68 m 245 m 45 m 27 m 50 m

Table 3.1: DroNet results in different environments.

One of the benefits of using DroNet is that it works perfectly in many indoor and outdoor scenes that contain line features, but it fails in environments such as forests because of the lack of these features in the data set.


CHAPTER IV

DLIB TRACKER

In this chapter, the details of the tracker are explained: how it works, and its main advantages and disadvantages.

4.1 DLIB correlation tracker

DLIB tracker is a correlation implemented tracker based on Danelljan et al.’s paper, Accurate Scale Estimation for Robust Visual Tracking[2].

Their work is based on MOSSE tracker from Bolme et al.’s work[7], Visual Object Tracking using Adaptive Correlation Filters. The MOSSE tracker works well for objects that are translated but it does not work properly for objects that change in scale.

The DLIB tracker uses a scale pyramid to accurately estimate the scale of an object after the optimal translation is found. This helps us to track objects that change not only in translation but also in scale throughout a video stream, and to do so in real time.

The approach works by learning discriminative correlation filters based on a scale pyramid representation. They learn separate filters for translation and scale estimation. This scale estimation approach is generic, and it can be incorporated into any tracking method with no inherent scale estimation.

The input of their algorithm at time step t is the image I_t, the previous target position p_{t-1} and scale s_{t-1}, the translation model A^{trans}_{t-1}, B^{trans}_{t-1} and the scale model A^{scale}_{t-1}, B^{scale}_{t-1}. The output is the estimated target position p_t and scale s_t, the updated translation model A^{trans}_t, B^{trans}_t and the updated scale model A^{scale}_t, B^{scale}_t.

The translation estimation consists of three steps:

• Extract a translation sample z_{trans} from I_t at p_{t-1} and s_{t-1}.
• Compute the translation correlation y_{trans} using z_{trans}, A^{trans}_{t-1} and B^{trans}_{t-1}.
• Set p_t to the target position that maximizes y_{trans}.

The steps for the scale estimation are the following:

• Extract a scale sample z_{scale} from I_t at p_t and s_{t-1}.
• Compute the scale correlation y_{scale} using z_{scale}, A^{scale}_{t-1} and B^{scale}_{t-1}.
• Set s_t to the target scale that maximizes y_{scale}.

And finally, the steps for the model update are the following:

• Extract samples f_{trans} and f_{scale} from I_t at p_t and s_t.
• Update the translation model A^{trans}_t and B^{trans}_t.
• Update the scale model A^{scale}_t and B^{scale}_t.

Incorporating scale estimation makes the computational cost much higher, and their goal was to have a scale estimation approach that is accurate and robust and at the same time computationally efficient. Their method proposes a fast scale estimation approach by learning separate filters for translation and scale. This restricts the search area to smaller parts of the scale space.

In order to estimate the scale of the target in an image, they use a separate one-dimensional filter. The training example f for updating the scale filter is computed by extracting features using variable patch sizes centered around the target.


4.2 Results

The results of their work can be seen in the following image:

Figure 4.1: Dlib results[2]

Their method is shown in red; the other curves correspond to the ASLA tracker[8], the SCM tracker[9] (blue), Struck[10] (yellow) and the LSHT tracker[11] (orange).

As we can see, their method handles scaling and translation really well while the other trackers do not. It is also faster: 2.5 times faster than Struck, 25 times faster than ASLA and 250 times faster than SCM in median FPS.

4.3 Advantages and disadvantages

Obviously, the main advantage of using this correlation tracker in our thesis is that it tracks our selected target and computes its translation and scale. This allows us to know whether the object is moving left or right, and furthermore, by computing the area of the tracked object, whether it is moving closer to or farther from the drone.

While this algorithm helps us in a very prominent and efficient way, it has one big disadvantage which could make our algorithm fail. This disadvantage occurs when the object we are following completely disappears from the camera's field of view and then appears again. As this tracker is based on the correlation between frames, it will not be able to detect and track the same object again.

To address this problem, our project uses descriptors and key points of the tracked object so that when the object we are tracking disappears for a few frames, the algorithm will be able to find it again. This feature will be explained and discussed in the next chapter.


CHAPTER V

DESCRIPTORS

In this fifth chapter we address the main problem we faced when combining our algorithm with the DLIB tracker explained in the previous chapter: the loss of the tracking when the object being tracked disappeared from the camera and then appeared again.

To handle this problem, we use SURF feature extraction to extract the main features of the object we are tracking while it is in the frame, and then we use FLANN feature matching to re-detect the object.

5.1 Feature extraction: SURF

Speeded Up Robust Features (SURF) is a feature extraction algorithm presented in 2006 by Bay et al.[3]. It is a faster version of the SIFT algorithm[12].

While Scale Invariant Feature Transform (SIFT) approximates the Laplacian of Gaussian with a Difference of Gaussian for finding the scale-space, SURF approximates the LoG with a Box Filter. The main advantage is that convolution with a box filter can be easily calculated with the help of integral images, and it can be done in parallel for different scales. SURF relies on the determinant of the Hessian matrix for both scale and location.

Figure 5.1: LoG approximation with Box filter

For orientation, SURF uses wavelet responses in horizontal and vertical direction for a neighborhood of size 6s where s is the scale at which the point of interest was detected. Gaussian weights are also applied to it. The dominant orientation is estimated by calculating the sum of all responses within a sliding orientation window of angle 60 degrees.


Figure 5.2: SURF orientation by OpenCv docs

For feature description, the algorithm uses wavelet responses in the horizontal and vertical directions. A neighborhood of size 20s x 20s is taken around the key-point, where s is the scale. It is divided into 4x4 sub-regions. For each sub-region, horizontal and vertical wavelet responses are taken and a vector is formed. Represented as a vector, this gives the SURF feature descriptor with 64 dimensions in total. The lower the dimension, the higher the speed of computation and matching, while still providing good distinctiveness of features.

Another important improvement is the use of sign of Laplacian (trace of Hessian Matrix) for underlying interest point. It adds no computation cost since it is already computed during detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse situation. In the matching stage, we only compare features if they have the same type of contrast. This minimal information allows for faster matching, without reducing the descriptor’s performance.

SURF adds a lot of features to improve the speed in every step. Analysis shows it is 3 times faster than SIFT while performance is comparable to SIFT. SURF is good at handling images with blurring and rotation, but not good at handling viewpoint change and illumination change.


5.2 Feature matching: FLANN by OpenCv docs

Fast Library for Approximate Nearest Neighbors (FLANN) contains a collection of algorithms optimized for fast nearest neighbor search in large datasets, and for high dimensional features.

FLANN needs two dictionaries which specify the algorithm to be used and its related parameters. For the SURF algorithm, we need to specify the number of times the trees in the index should be recursively traversed. Higher values give better precision, but also take more time.

Figure 5.3: FLANN feature matching by OpenCv docs

5.3 Homography

The perspective transformation of any plane which is a re-projection of that plane but from a different point of view (with different position and/or orientation) is called homography.

Homography relates the transformation between two planes (up to a scale factor). The homography matrix is a 3x3 matrix with 8 DoF (degrees of freedom), as it is estimated up to a scale:

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
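Putting the three pieces together, a short OpenCV sketch along the lines of the OpenCV documentation is shown below. SURF requires an opencv-contrib build with the nonfree modules enabled, and the thresholds used here are illustrative.

```python
import cv2
import numpy as np

def find_object(template_gray, frame_gray, min_matches=10):
    """Locate `template_gray` inside `frame_gray` with SURF + FLANN + homography."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # needs opencv-contrib
    kp1, des1 = surf.detectAndCompute(template_gray, None)
    kp2, des2 = surf.detectAndCompute(frame_gray, None)
    if des1 is None or des2 is None:
        return None

    # FLANN with a KD-tree index; `checks` controls how often trees are traversed.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.7 * m[1].distance]  # ratio test
    if len(good) < min_matches:
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # 3x3 matrix, 8 DoF
    return H  # maps template coordinates into the frame
```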


CHAPTER VI

TEST CASES

In this chapter, we list the test cases that our algorithm is going to face and classify them by their difficulty. These test cases are not just trivial cases; in fact, we take our system to the limit in order to find out what its limitations are.

6.1 Test cases

The test cases are designed around following three different things/objects: a human, a bike and a car. Each test case involves a different situation whose difficulty ranges between easy, medium, hard and extreme, as shown in the following table:

Situation                          Difficulty
Human walking no interference      Easy
Human walking prob collision       Hard
Human walking with interference    Medium
Human walking disappear/appear     Extreme
Human walking sharp turns          Hard
Low light                          Medium
Light on/off                       Hard

Table 6.1: Test cases.

6.2 Human walking no interference

This is the most basic test case. We want our drone to follow a human while he is walking and there is no interference in the tracking. By that we mean that there are no objects crossing between the drone and the human.


Figure 6.3: Drone following human

6.3 Human walking with interference

This test case is the same as the previous one but with objects or persons crossing between the drone and the human. This interference will force our algorithm to re-find the human if the obstacles occlude him.


6.4 Human walking probability of collision

We want to test whether our drone stops moving when the probability of collision is very high, and whether it starts tracking again when this probability of collision becomes lower.

6.5 Human walking disappear/appear

When the tracked object disappears and appears again, we want to make sure we recognize it as the object we were following before. We want to test how our descriptors algorithm handles this situation.

6.6 Human walking sharp turns

Sharp turns are really difficult to overcome. The sharper the turn, the more challenging it is to follow the tracked object.

6.7 Low light

We also want to test our algorithm against adverse conditions, and one of them is low ambient light. The tracker will need to track the human with low light, and this could cause some problems.

6.8 Light on/off

Turning the light off could cause a lot of problems calculating the probability of collision, tracking the object and saving descriptors. We want to test our system against these difficulties.


CHAPTER VII

IMPLEMENTATION

This chapter explains how we have built our system by joining all its parts: DroNet, the DLIB tracker and the descriptors. We also explain how we provide a novel solution to some edge cases, and which hardware and software we have used.

7.1 How we use DroNet

As explained in Chapter III, DroNet is a neural network which is able to predict the probability of collision and the steering angle in real time from each frame of the video camera. The most useful DroNet output for our purposes is the probability of collision, because the steering angle can produce bad predictions for our task: DroNet is trained to follow roads, not objects, so the predicted steering angle follows the road direction and will not follow the object if it tries to go off-road. Knowing that, we use the probability of collision as the main DroNet feature in our project. We propose that if the probability of collision is higher than a threshold, the drone should stop immediately. If the probability of collision is lower, then the velocity of the drone is not calculated from that probability but from the velocity of the object we are following. This is very different from how DroNet uses these outputs, but since our objectives are very different, this is the best way to incorporate their work into our project.

Algorithm 1: DroNet usage pseudo code

WHILE TRUE {
    frame = drone_camera()
    probability_of_collision = DroNet(frame)
    WHILE probability_of_collision > THRESHOLD_PROB {
        STOP drone movement
        frame = drone_camera()
        probability_of_collision = DroNet(frame)
    }
    DRIVE drone
}

While the probability of collision is higher than THRESHOLD_PROB, the drone stays stopped at the same position. This is because our goal is to follow anything, for example a person. If we are behind this person, following him, we will not have any collision as long as he has no collision either, but we could have a collision if something crosses between the drone and the person. In that case the drone stops, and once the object has crossed, the drone resumes following the person.
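A runnable Python form of Algorithm 1 is sketched below, assuming a `dronet_collision_prob(frame)` wrapper around the network and a minimal `drone` object with `get_frame()`, `hover()` and `follow()` methods; these names are placeholders, not the actual APIs used in this thesis.

```python
THRESHOLD_PROB = 0.7   # placeholder collision threshold

def dronet_loop(drone, dronet_collision_prob):
    while True:
        frame = drone.get_frame()
        p_collision = dronet_collision_prob(frame)
        while p_collision > THRESHOLD_PROB:
            drone.hover()                        # stop all movement
            frame = drone.get_frame()
            p_collision = dronet_collision_prob(frame)
        drone.follow()                           # resume tracking-based control
```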

7.2 How we use DLIB tracker

The DLIB tracker, as explained in Chapter IV, is a correlation algorithm, so it does not know what exactly it is following and it has no memory. This means that if, for example, the light goes off and on, the tracker will not know what to follow and it will get lost.


If the confidence of the tracker is lower than a threshold, then we will mark our tracking object as lost and we will start trying to find it again. If the confidence of the tracker is high enough, then we will move the drone as the object is moving.

Algorithm 2: DLIB tracker usage pseudo code

WHILE TRUE {
    frame = drone_camera()
    confidence = DLIB(frame)
    IF confidence < THRESHOLD_TRACK {
        tracking_object = Lost
        DRONE tries to RE-FIND object
    } ELSE {
        tracking_object = Found
        DRONE MOVES FOLLOWING tracking_object
    }
}

As we are going to follow every object from approximately the same distance, we know the distance that separates the drone from the tracked object. We can also easily know, before losing the object, the direction it was moving in (right or left). Knowing that, we could re-find our object if our tracker knew what object we were following. As our tracker does not know it, we use descriptors and key points to be able to re-find the lost object.
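For reference, the basic dlib calls behind this tracking loop look like the sketch below; the confidence threshold is a placeholder, since the real one is tuned experimentally.

```python
import dlib

THRESHOLD_TRACK = 7.0   # placeholder; update() returns a peak-to-side-lobe score

def track(frames, roi):
    """`frames` is an iterator of RGB images; `roi` is (x1, y1, x2, y2)."""
    tracker = dlib.correlation_tracker()
    first = next(frames)
    tracker.start_track(first, dlib.rectangle(*roi))
    for frame in frames:
        confidence = tracker.update(frame)
        if confidence < THRESHOLD_TRACK:
            yield None                      # object lost: trigger the re-find logic
        else:
            pos = tracker.get_position()    # rectangle with the new box
            yield (int(pos.left()), int(pos.top()),
                   int(pos.right()), int(pos.bottom()))
```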

7.3 How we use descriptors

We extract features, called descriptors, of the object we are tracking using the algorithms explained in Chapter V. We use these descriptors to re-find the object we were tracking. The basic idea is that we create a buffer of size N where we store these descriptors while we are tracking the object and the confidence of the tracker is high. Once the object has been lost, we use this buffer to check, descriptor by descriptor, whether one of them matches what the drone is capturing.

The main point of using a buffer instead of a single variable is that we want to store several descriptors of the object we are following, in case the object changes its appearance during the following process (for example by moving faster or sitting down). Imagine we are following a person from the back and then this person turns 180 degrees to face the drone; we then want the descriptors of the person facing the drone and not of his back. To be able to do that, our buffer has a fixed size N and every position holds a different descriptor value. These values are overwritten once the buffer is full. The only position we do not overwrite is the first position of the buffer, where we store the descriptors of the Region of Interest (ROI) of the first frame we used to track, because the first ROI is guaranteed to contain the descriptors of the object we want to follow without any kind of occlusion.

Imagine the size of this buffer is 5; then the workflow is the following:

1. First, we select the ROI we want to start tracking and we save its descriptors in the first position of the buffer:

Figure 7.1: Storing initial descriptors

2. Once we start tracking, we keep saving the descriptors of the ROI until the buffer is full:


Figure 7.2: Storing all other descriptors

3. Once the buffer is full, we start replacing the oldest descriptors with the newest ones:

Figure 7.3: Replacing old descriptors

This way, the buffer is always the same size. We always have the newest descriptors available and keep the initial descriptors because they are expected to be the most accurate ones. As the drone sends up to 30 frames per second (fps) and the object we are pursuing will rarely move or change its form that fast, we store just 2 descriptors per second. Using this technique, our algorithm becomes more efficient and we need less space to store the buffer. The algorithm for storing descriptors is the following:

Algorithm 3: Saving descriptors pseudo code

buffer_size = N
frame = drone_camera()
Once the ROI is selected from the frame -> save its descriptors at position 0
POS = 1
WHILE tracking {
    frame = drone_camera()
    confidence = DLIB(frame)
    IF confidence > THRESHOLD_TRACK {
        Save descriptors of the ROI at position POS of the buffer
        IF POS == N { POS = 1 } ELSE { POS = POS + 1 }
    }
}
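Algorithm 3 can be expressed as a compact Python sketch; the buffer size, the 0.5-second save interval (2 descriptors per second) and the reserved slot 0 are as described above, but the class itself is only an illustration, not the thesis implementation.

```python
import time

class DescriptorBuffer:
    """Fixed-size buffer: slot 0 keeps the initial ROI descriptors forever,
    the remaining slots are overwritten circularly at roughly 2 saves/second."""

    def __init__(self, size, initial_descriptors):
        self.slots = [initial_descriptors] + [None] * (size - 1)
        self.pos = 1
        self.last_save = 0.0

    def maybe_save(self, descriptors):
        now = time.time()
        if now - self.last_save < 0.5:        # throttle to 2 descriptors per second
            return
        self.slots[self.pos] = descriptors
        self.pos = self.pos + 1 if self.pos < len(self.slots) - 1 else 1
        self.last_save = now

    def candidates(self):
        # Try the initial descriptors first, then the most recent ones.
        return [d for d in self.slots if d is not None]
```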

Saving these descriptors in a good and efficient way is really important in order to be able to re-find our object once it has been lost. Once the object is lost, our algorithm tries to re-find it by using the descriptors stored in the buffer, position by position, starting at position 0, which is where the initial descriptors were stored. If the object is found, we start tracking it again using the DLIB tracker.

Algorithm 4: Using descriptors to re-find the object pseudo code

frame = drone_camera()
actual_descriptor = descriptor(frame)
found = False
FOR stored_descriptor IN buffer {
    IF actual_descriptor matches stored_descriptor {
        found = True
        BREAK
    }
}
IF found == True {
    homography = H(actual_descriptor, stored_descriptor)
    object_in_frame = homography(frame)
    Start tracking DLIB(object_in_frame)
}

How we handle the situation in which we cannot find the object again is explained in the next section, as it depends on the edge cases.

7.4 How we handle edge cases

The most common edge cases our system could face are:

• No light
• Occlusions
• Very sharp turns

7.4.1 General edge case

As we cannot distinguish between these cases while we are flying, we have implemented a general algorithm to re-find the object. Since we already know the distance between the object and the drone and also the last direction in which the object moved, the RE-FIND object algorithm looks like this:

Algorithm 5: General edge case pseudo code

distance_from_object = D (meters)
last_direction = Right/Left
WHILE DRONE MOVES D (meters) straight {
    IF object found {
        Start tracking again
        BREAK
    }
}
DRONE TURNS last_direction
IF object found {
    Start tracking again
} ELSE {
    Keep rotating until object found or 360 degrees completed
    IF object found {
        Start tracking again
    } ELSE {
        LAND drone
        STOP
    }
}

Using this algorithm, the drone is expected to first move straight for D meters and then turn either left or right, based on the last direction the object moved, in order to find it again. If the object has not been found, the drone then makes a 360-degree rotation to try to find the object. If the object has still not been found, the drone lands and the program stops.

7.4.2 No light

When there is no light at all, our tracking algorithm confidence value is NaN (Not a Number). Therefore, our implementation is to set the probability of collision to 1 when the tracking algorithm value is NaN. Using this approach, our drone will stop when the light is off and then re-start the tracking when the light is on.

Algorithm 6: No light pseudo code

WHILE TRUE {
    frame = drone_camera()
    confidence = DLIB(frame)
    IF confidence == NaN {
        STOP drone
        SET probability_of_collision = 1
    } ELSE {
        Start tracking again
    }
}

7.5 Main algorithm

To control the drone, we have to combine all the algorithms explained before into one single algorithm. The most relevant parameter we have to check is the probability of collision, which stops the drone when it exceeds its threshold. The second parameter to take into consideration is the DLIB tracker confidence. If this confidence is higher than the threshold, we follow the object while we save the corresponding descriptors. If it is lower than the threshold, we stop saving descriptors and start trying to find the object again. If it is found, we resume the tracking movements; if it is not found after completing the edge-case movements, we land the drone and stop the program.

Algorithm 7: Drone control pseudo code

distance_from_object = D (meters)
last_direction = Right/Left
WHILE TRUE {
    frame = drone_camera()
    probability_of_collision = DroNet(frame)
    IF probability_of_collision > THRESHOLD_PROB {
        STOP drone movement
    } ELSE {
        confidence = DLIB(frame)
        IF confidence == NaN {
            STOP drone
            SET probability_of_collision = 1
        } ELIF confidence < THRESHOLD_TRACK {
            tracking_object = Lost
            WHILE tracking_object == Lost {
                WHILE DRONE MOVES D (meters) straight {
                    find_descriptors()
                    IF object found { tracking_object = Found; BREAK }
                }
                DRONE TURNS last_direction
                find_descriptors()
                IF object found {
                    tracking_object = Found
                } ELSE {
                    Keep rotating until object found or 360 degrees completed
                    find_descriptors()
                    IF object found { tracking_object = Found }
                    ELSE { LAND drone; STOP }
                }
            }
        } ELSE {
            tracking_object = Found
            DRONE MOVES FOLLOWING tracking_object
            last_direction = Right/Left
        }
    }
}

There are also some parameters we would like to change when we follow different objects; it is not the same to follow a human as a bike or even a car. These parameters are initialized by the user once he knows what he wants to follow. The parameters are the following:

• Distance between the drone and the object.
• Height of the drone.
• Velocity of the drone.

7.6 Hardware

In this project, we have used two different drones: the Parrot AR 2.0 and the DJI Tello. The choice of these drones was based on which drones were available at the university at the time of the research, and on the Python APIs available for them.

The Parrot AR 2.0 is relatively cheap, of good quality and robust. It is from one of the most popular drone brands, Parrot, and it is one of the most used drones for research. The most important specifications of the Parrot AR 2.0 are:

• Weight: 4 lbs.

• Dimensions: 23 x 0.5 x 23 inches. • Max flight time: 12 minutes. • Battery: 1500 mAh LiPo.


Figure 7.4: AR Parrot 2.0

The DJI Tello is a newer and smaller drone but it is also cheap and robust. As it is newer, the implementation of the python API is more efficient and more reliable. The most important specifications for the DJI Tello drones are:

• Weight: 0.2 lbs.

• Dimensions: 3.86 x 1.61 x 3.64 inches. • Max flight time: 13 minutes.

• Battery: 1100 mAh LiPo.


Figure 7.6: DJI Tello 2

7.7 Software

We have programmed everything using Python and PyCharm as the IDE. The libraries we have used for this project are the following:

• PS Drone: AR Parrot 2.0 Python API.
• DJITelloPy: DJI Tello Python API.
• OpenCV: used for detecting descriptors.
• DLIB: main tracker API.

• Tensorflow: used to import DroNet into our project.

7.8 Drone Control

The drone has three axes to control: pitch (Y axis), yaw (Z axis) and roll (X axis). If the aircraft rotates around the pitch axis, it will move in the X axis direction.

If the pitch value is positive, the drone will move forward and if it is negative the drone will move backwards as we can see in the following image.

Figure 7.7: Pitch direction

If the yaw value is positive, the drone will rotate in clockwise direction on itself and if the yaw is negative, the drone will rotate counter clockwise on itself.

Figure 7.8: Yaw direction

If the roll value is positive, the drone will move right and if it is negative the drone will move left as we can see in the following image.


Figure 7.9: Roll direction

There is another drone variable which has to be controlled: the throttle. This controls the aircraft's average thrust from its propulsion system. When the aircraft is level, adjusting the throttle will move the aircraft up or down, as all the thrust is in the vertical direction. However, when the aircraft is not level (has non-zero pitch or roll), the thrust will have a horizontal component, and therefore the aircraft will also move somewhat horizontally. A larger pitch or roll angle will result in more horizontal thrust and therefore faster horizontal movement.

To control these axes of the drone autonomously, what we want is to always keep the tracked object in the middle of the frame.


Figure 7.10: Drone control frame

If the object we are tracking is in the green zone of the frame, we do not move the drone. If the object is in a blue zone, we set the throttle of the drone to go either up or down. We always set the roll angle to 0, as we do not want the drone to translate right or left; instead, if the object is in a white zone, we set the yaw to rotate the drone in the correct direction. To set the drone's pitch, which moves the drone forward or backward, we calculate the area of the tracked object at the beginning of the tracking: if this area increases, we move the drone backwards, because it means that the object is approaching the drone, and if this area decreases, we move the drone forward, because the tracked object is moving away.
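A hedged sketch of this zone logic is given below; the zone boundaries, the area tolerance and the returned command values are placeholders, since the actual values used in the thesis were tuned per target.

```python
def control_from_bbox(bbox, frame_w, frame_h, initial_area):
    """Map the tracked bounding box to (yaw, throttle, pitch); roll stays 0."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    area = (x2 - x1) * (y2 - y1)

    yaw = 0
    if cx < 0.35 * frame_w:       # object in the left white zone -> rotate left
        yaw = -1
    elif cx > 0.65 * frame_w:     # right white zone -> rotate right
        yaw = 1

    throttle = 0
    if cy < 0.35 * frame_h:       # upper blue zone -> climb
        throttle = 1
    elif cy > 0.65 * frame_h:     # lower blue zone -> descend
        throttle = -1

    pitch = 0
    if area > 1.2 * initial_area:    # object got closer -> move backwards
        pitch = -1
    elif area < 0.8 * initial_area:  # object moved away -> move forward
        pitch = 1

    return yaw, throttle, pitch
```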

The distance between the drone and the object has to be defined by the user in order to be able to run the re-finding algorithm when the object is lost.


CHAPTER VIII

RESULTS

In the following table we can see each test case and its result.

Situation                          Result
Human walking no interference      Passed
Human walking prob collision       Passed
Human walking with interference    Passed
Human walking disappear/appear     Passed
Human walking sharp turns          Passed
Low light                          Not tested with drone but with PC web-cam
Light on/off                       Not tested with drone but with PC web-cam

Table 8.1: Test cases results

8.1 Human walking no interference

When following a walking human without interference, our system worked as expected and followed the human moving forward/backwards, right/left or up/down. As it is the simplest case we tested, there were no extra difficulties.


Figure 8.1: Simple walk result 1


Figure 8.3: Simple walk result 3


Figure 8.5: Simple walk result 5


Figure 8.7: Simple walk result 7

As we can see in the images, the drone tries to always keep the person it is following in the middle of the frame. In this test case, the descriptors did not do anything because the object was never lost.

8.2 Human walking probability of collision

When the probability of collision is higher than the threshold, the drone stops completely until this probability of collision becomes lower.


Figure 8.8: Probability of collision 1


Figure 8.10: Probability of collision 3


Figure 8.12: Probability of collision 5


Figure 8.14: Probability of collision 7


Figure 8.16: Probability of collision 9


Once the probability of collision decreases, the drone finds the person again using the previously stored descriptors, and once the person is found, it starts following and tracking again.

8.3 Human walking with interference

When we add interference to our tests, we have to distinguish between two types of interference: partial occlusion and complete occlusion.

If the human that the drone is following is partially occluded, the tracking algorithm will work as expected and the behaviour will be the same as if there were no interference. On the other hand, if the human is totally occluded, the tracking algorithm will lose the target and the drone will start doing the General edge case algorithm. Once the complete occlusion has finished, the tracking and descriptor algorithms will find the target again.


Figure 8.19: Walking with interference 2


Figure 8.21: Walking with interference 4


Figure 8.23: Walking with interference 6


Figure 8.25: Walking with interference 8


Figure 8.27: Walking with interference 10


8.4 Human walking disappear/appear

When the human disappears, the tracking algorithm loses the target and the drone starts executing the General edge case algorithm. If the human appears again before the drone has completed the General edge case movements, the tracking and descriptor algorithms find the target again; otherwise, the drone lands and the program stops.


Figure 8.30: Disappear and appear 2


Figure 8.32: Disappear and appear 4


Figure 8.34: Disappear and appear 6


Figure 8.36: Disappear and appear 8


Figure 8.38: Disappear and appear 10


8.5 Human walking sharp turns

A sharp turn makes our drone lose the tracked object for a moment, and it starts executing the General edge case algorithm. Once the drone turns in the last direction the person turned, it re-finds the person and starts tracking again.


Figure 8.41: Sharp turns 2


Figure 8.43: Sharp turns 4


Figure 8.45: Sharp turns 6


Figure 8.47: Sharp turns 8


Figure 8.49: Sharp turns 10


Figure 8.51: Sharp turns 12

As we can see in frames 5, 6, 7, 8 and 9, the drone is executing the General edge case algorithm. It moves forward by the distance between the person and the drone and then turns in the last direction the person moved. Once it finds the person again using descriptors, it starts tracking and following him again.

8.6 Low light

One of the strengths of our system is that we keep saving the newest descriptors at regular intervals. This makes our algorithm adapt to different light conditions. As we are in a low-light environment, the descriptors are saved in that environment, and therefore our target is not easily lost; the algorithm is more robust than it would be without this mechanism.

Due to some problems with the drones, explained later in this chapter, we were not able to test this feature with a drone; we tested this case using the laptop camera, which has specifications similar to the drone's camera.


Figure 8.52: Low light 1


Figure 8.54: Low light 3


Figure 8.56: Low light 5


Figure 8.58: Low light 7

As we can see, the ambient light is really low. In frame 1 we select the object we want to follow, in this case a phone case. In frames 2, 3 and 4 we can see how our algorithm is able to track the object correctly. In frame 5, we hide the object from the camera and the tracking square disappears as well. In frame 6, we put the object in front of the camera again and the descriptors start matching, and in frame 7 the object is detected and tracked again.

8.7 Light on/off

This was one of the most difficult test cases to handle. The tracking algorithm returns a NaN value when the light is off. Therefore, following our implementation of the No light algorithm, we set the probability of collision to 1 whenever the tracker value is NaN. With this approach the drone stops while the light is off and restarts tracking when the light comes back on.
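A minimal sketch of this guard is shown below, assuming the tracker update returns a floating-point confidence; the names dronet_probability and tracker_confidence are illustrative, not the identifiers used in our implementation.

import math

def collision_probability(dronet_probability, tracker_confidence):
    # When the frame is completely dark the tracker returns NaN;
    # treating that as a certain collision keeps the drone stopped.
    if math.isnan(tracker_confidence):
        return 1.0
    return dronet_probability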

Due to some problems with the drones, explained in the next section, we were not able to test this feature with a drone; instead we tested this case using the laptop camera, which has specifications similar to the drone's camera.

Figure 8.59: Light on/off 1

Figure 8.61: Light on/off 3

Figure 8.63: Light on/off 5

Figure 8.65: Light on/off 7

Figure 8.66: Light on/off 8

In frame 1 we select the object that needs to be followed, in this case a phone case. In frames 2, 3 and 4 we track the object with the light on. In frame 5 we turn off the light, and we turn it on again in frame 6. In frame 6 the descriptor algorithm starts trying to re-find the object, and as we can see in frames 7 and 8, the object is found and tracked again.

8.8 Difficulties

While implementing the project we ran into several difficulties that made it very hard to test our work or even implement some algorithms. These difficulties were external to us and there was nothing we could do to solve them. The main ones are listed below:

• Colorado's weather. This thesis was implemented and tested in Colorado during winter, and the snow and wind conditions made it impossible to test our work outdoors. Luckily our focus was on indoor tracking using drones.

• Wind. If the weather was even slightly windy, the drone reacted by entering a "safe mode" condition in which it counteracted the wind to hold its position. In this state the drone did not respond to the commands sent by the computer to follow the tracked object.

• Flying indoors was the best solution to the last two points, but it had limitations too. We could not test the drone anywhere we wanted; we needed a large, high-ceilinged space. There are many restrictions on flying drones indoors at UCCS and a lot of paperwork to complete before testing. An IRB study was not in the scope of this thesis.

• The APIs do not work perfectly. We had a huge variety of problems with them. The most common one was that a command sent by the computer had no effect because the drone did not receive it.

• The official DJI Tello API was released when we were halfway through our work. When we tried to adopt it, the drone's odd behaviour persisted and, furthermore, the official API introduced a large lag as each frame was processed. We decided to keep working with the previous API, which had worked well for us.

• When testing with the official API, the drone sometimes took off far too high (around 70 feet). This made the drone hit the ceiling several times and, as a consequence, some of the propellers broke.

Figure 8.67: DJI Tello broken as a result of some of the difficulties described above

• DJI Tello error 203 (Figure 8.68). This error did not even allow us to fly the drone, and it delayed our testing; as a result there are some test cases we have not been able to run.

Figure 8.68: DJI Tello error 203

• Low battery. When the drone's battery drops to 40/100 or below, it starts to behave erratically and makes random movements.

• Because the DLIB tracker is a correlation tracker, a sudden movement of the tracked object can leave the tracker with false "objects" to track. This causes our system to get lost: the confidence remains high enough, but the object being tracked is no longer the one selected by the user.

Due to some problems with the DJI Tello, we had to send the drone to the DJI factory. While they were repairing the drone, we tested our algorithm with the AR Parrot 2.0 and we noticed that this drone had two main issues that could break our algorithm.

When setting any of the drone's velocities (throttle, pitch or yaw) to 0, the drone did not stop moving in that direction (as the DJI Tello did). Setting a velocity to 0 made the drone stop accelerating, but it kept gliding in the same direction until air friction stopped it.

In addition, the stop function provided by the API halts every direction at once: if the drone is moving forward and to the right at the same time and we want it to stop going right but keep moving forward, this function stops both directions.

Our approach to this problem was to give the drone a negative impulse every time we wanted it to stop moving in a given direction. For instance, if the drone was moving forward and to the right and we only wanted to stop the rightward movement, we sent an impulse to the left while the drone kept moving forward. Using this technique, the drone could stop any movement without affecting motion in the other directions.
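A simplified version of this counter-impulse workaround is sketched below. drone.set_velocity, the impulse magnitude and the impulse duration are illustrative assumptions, not the actual AR Parrot API calls or values we used.

import time

IMPULSE = 0.2           # fraction of full speed (assumed value)
IMPULSE_DURATION = 0.3  # seconds the counter-impulse is applied (assumed value)

def stop_rightward_motion(drone, current_pitch):
    """Cancel the rightward glide while keeping the forward (pitch) motion."""
    # A brief push to the left cancels the rightward glide...
    drone.set_velocity(roll=-IMPULSE, pitch=current_pitch, yaw=0, throttle=0)
    time.sleep(IMPULSE_DURATION)
    # ...then the lateral command returns to zero while the drone keeps moving forward.
    drone.set_velocity(roll=0, pitch=current_pitch, yaw=0, throttle=0)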

The second issue was the 360-degree turn in the Edge case algorithm. The DJI Tello rotated 360 degrees around its own axis, but this drone did not: when making a 360-degree turn it swept around a radius instead. As a consequence, in some cases the drone could not re-find the object when it was lost, as it could with the DJI Tello.


CHAPTER IX

CONCLUSION

In this thesis we proposed a new way to track and follow objects in real time with no prior information about the object: the user selects the object to track in real time. Since real-time operation and minimal delay between what the user sees and what the drone does were among our priority goals, we used a multi-threading approach to implement our algorithm.

We found a new application for DroNet and its probability of collision. We could not use DroNet's other output, the steering angle, because it did not work as expected in most of the cases in which we tried the existing algorithm.

We have also introduced a new technique for using feature descriptors. Storing descriptors in a buffer and then using this information when the object is lost is a new and efficient way to re-find lost objects.

There has been a significant trade-off between the tracker and the probability of collision. Since our primary requirement was not to crash the drone under any circumstances, the probability of collision dominates the control of the drone, and this decision can lead to losing the tracked object in some cases.

As we have tested our application with three different drones, each priced around 100 dollars, we can affirm that these drones, and of course their Python APIs, need to improve in order to make further research on drones easier.

