Mälardalen University, Västerås, Sweden
DVA502 Thesis for the Degree of Master of Science in Engineering - Robotics 30.0 credits

AUTONOMOUS VEHICLE WITH OBSTACLE IDENTIFICATION AND DETECTION

Weronica Kovala
wrg10002@student.mdh.se

Examiner: Baran Curuklu
Mälardalen University, Västerås, Sweden

Supervisors: Martin Ekström
Mälardalen University, Västerås, Sweden

Company supervisor: Martin Rydberg, ÅF Technology, Västerås, Sweden

Abstract

In order to allow hospital staff to spend a bigger part of their time with actual patients, this thesis has focused on how to implement an autonomous robot to do some of their work. The robot is made to navigate in a hospital environment while pulling a hospital bed as well as avoiding obstacles and humans. A navigation algorithm has been implemented together with human detection. The prototype is able to navigate in the test environment, an office landscape, in a direction decided by the user whilst avoiding different obstacles. Since a low cost would make it easier to implement the solution in an actual hospital environment, keeping the cost down has been an aim of this work. The solution this thesis presents is a multi-sensor system which is able to navigate through an unknown environment without hitting any obstacles whilst maintaining a certain heading. Analog distance sensors, encoders for motor position and a camera for human detection have been implemented. The prototype also contains a first attempt at human detection software, which could not be finished during the time of the thesis work. The results show that both the sensor fusion for navigation and the navigation algorithm are fast enough to work on this system. However, at this point, the complete software system with the human detection is not able to run on an embedded system at a satisfactory speed.

Sammanfattning

To allow hospital staff to spend more time with their patients, this thesis has focused on how an autonomous robot could carry out part of their work. The robot is built to navigate in a hospital environment while pulling a hospital bed and avoiding obstacles and people. A navigation algorithm has been implemented together with human detection. The prototype is able to navigate in the test environment, an office landscape, in a direction chosen by the user while avoiding obstacles. Throughout the work, a low final cost has been a goal, since a low cost would make it feasible to deploy the system in a real hospital. The solution presented here is a multi-sensor system that can navigate through an unknown environment without colliding with any obstacles while keeping its heading. Analog distance sensors, motor encoders for positioning and a camera for human detection have been implemented. The prototype contains a first attempt at human detection software, which was not completed within the project time. The results show that running only the navigation and the sensor-data processing it requires works at present, but the complete software cannot currently run on the embedded system because it becomes too slow.

Acronyms

IR Infrared
HOG Histogram of Oriented Gradients
CENTRIST CENsus TRansform hISTogram
CT Census Transform
FL Fuzzy Logic
PWM Pulse Width Modulation
fps frames per second
FPGA Field-Programmable Gate Array
FIS Fuzzy Inference System
SFI Strategies Fusion Index
TP Summed Area Tables for Parallelogram
HIK Histogram Intersection Kernel
SVM Support Vector Machine

Contents

1 Introduction
  1.1 Problem Formulation
    1.1.1 Hypothesis
    1.1.2 Problem formulation
2 Background
  2.1 Motivation
  2.2 Sensors
    2.2.1 Identifying humans
    2.2.2 Identifying fixed objects
  2.3 Human detection
    2.3.1 The C4 method
    2.3.2 Parallelogram Haar-like features
  2.4 Navigation
    2.4.1 Hybrid methods
    2.4.2 Local methods
  2.5 Adjustable Autonomy
3 Method
  3.1 Sensors
    3.1.1 Identifying human
    3.1.2 Identifying other fixed objects
    3.1.3 Motor encoders
  3.2 Human detection
  3.3 Navigation
  3.4 Software system design
  3.5 Adjustable Autonomy
4 Hardware
  4.1 Platform
    4.1.1 Kinematics
  4.2 Power supply
    4.2.1 Voltage regulation
    4.2.2 Motor supply
  4.3 Sensors
    4.3.1 USB-camera
    4.3.2 Distance sensors
    4.3.3 Motor encoders
5 Software
  5.1 Navigation
    5.1.1 Fuzzy Navigation
    5.1.2 Wheel rotation calculation
  5.2 Human detection
    5.2.1 Census Transform
    5.2.2 Scanning the image
6 Results
  6.1 Research question 1
  6.2 Research question 2
    6.2.1 Human detection
    6.2.2 Stationary obstacles
  6.3 Research question 3
    6.3.1 Human detection
    6.3.2 Navigation
7 Discussions
  7.1 Research question 1
  7.2 Research question 2
  7.3 Research question 3
8 Future work
9 Conclusion
10 Acknowledgments
References

1 Introduction

There are a number of tasks in a hospital environment that could be simplified by an autonomous vehicle that can both detect humans and avoid obstacles. One example of this is bed transports in health care facilities where two nurses, or other members of staff, are required in order to move one patient. There are also other domains which could benefit from this kind of product. For example a robot of this sort could do postal deliveries in an office environment.

The goal for this thesis is to create a multi-sensor system that can detect both persons and obstacles and determine whether it is possible to aid nurses using an autonomous robot with object detection. The aim is to create a cost efficient, lightweight robot that is able to identify humans and objects. To do this the differences between detecting humans and other obstacles will have to be identified. It is also desirable for the robot to have some pulling power in order for it to pull some of the weight of the hospital bed.

This work includes researching appropriate sensors, building the accompanying electronics as well as researching and implementing appropriate methods for object detection in software. The expected outcome is a prototype, which should be able to move around in an office environment. The aim is for that prototype to be able to detect humans while avoiding fixed obstacles.

To answer the questions stated in 1.1.2, the robot's ability to detect humans and obstacles will be evaluated. This will be done by testing the prototype in an environment which contains both. The sensors and the sensor fusion will also be evaluated to test their reliability.

1.1 Problem Formulation

A hypothesis and a set of research questions have been formulated to focus the scope of the thesis work.

1.1.1 Hypothesis

The hypothesis is as follows:

Can an autonomous robot make it possible to only use one nurse when doing a bed transportation in a hospital?

The use of two nurses for every bed transport requires a lot of resources which could have been better used in aiding other patients with their problems. The aim is not to completely take away the nurses from bed transports since their human abilities are wanted, both to make the transport safe for the patient and to make sure the transport will go as planned.

1.1.2 Problem formulation

To be able to test the hypothesis, the following research questions have been formulated.

RQ1 What are the differences in detecting humans and fixed obstacles?

To make sure that humans are detected and identified as something other than obstacles, the differences between humans and other obstacles have to be identified, so that the robot can interact properly with humans and not mistake obstacles for humans.

RQ2 What sensors are needed to identify humans and other obstacles?

For an autonomous system one has to be sure that the system will be able to identify all obstacles to avoid collisions. The identification of humans is also crucial since interaction with humans is important if humans are to trust the new way of transportation around the hospital.

RQ3 Is it possible to do the sensor fusion completely on an embedded system which can be placed on the robot?

Since hospital environments are sensitive to electromagnetic interference, it is desirable to have a system where all the computations can be done locally. This also reduces the risk of failure due to communication problems; a system which can be run completely on site would be safer. To get the robot out on the market, it is desirable to produce it at as low a cost as possible, so the embedded system should be as cost efficient as possible. To make sure no humans remain undetected, the human detection must run at least five times a second, given that the robot should be able to move at a reasonable speed.

2 Background

This section is divided into five parts. The first describes the motivation for this work and thereby the different areas of implementation, followed by a part describing the different sensors that have been researched. The third part describes different methods for implementing human detection. The fourth part describes different navigational algorithms. The fifth and final part describes the question of adjustable autonomy.

2.1 Motivation

The idea for this work came from hospitals, which are constantly understaffed with nurses. There have been different attempts to aid nurses in their work. Some have been successful, like patient lifts, which have reduced the need for heavy lifting. There are different patient lifts for different situations, from lifting patients into showers to holding up body parts that need to be elevated [1]. Using this kind of equipment reduces the risk of injuries among the staff, but there is still more to be done to improve hospital staff's work conditions. Studies have shown that only 30% of Swedish hospital staff's work involves actual patient-related tasks. A bed transport in any health care facility today requires two members of staff. As of today, the average hospital nurse spends 7% of their workday on transportation, which in a 40 hour work week adds up to 2.8 hours [2]. An improvement in transportation efficiency would increase the time spent on actual patient care, since it would reduce the amount of time spent on moving staff. Since a completely automated transport would possibly risk patient safety, it is important to point out that only one of the two staff members would be replaced in this kind of implementation, which would still leave the patient with a human to keep them company and make sure that the transport is safe. Attempts have been made to do this [3], but they have yet to be implemented in greater numbers. It is possible that one of the reasons is the high investment cost. The idea for this project is therefore to make a less expensive solution, which would make it easier for hospitals to invest in this kind of technology.

Since there is nothing in this prototype that specializes it for use with a hospital bed, the use of this product could be extended to other domains where things need moving or simple tasks need executing. For example, this robot would be able to carry things between rooms or do postal deliveries in an office.

2.2 Sensors

In order for the robot to orient itself through its surroundings it needs a number of sensors. It will need some sort of sensor to detect humans as well as some sort of sensor to detect other obstacles. The following sections discuss different options for how to detect humans and how to identify other, fixed, obstacles.

2.2.1 Identifying humans

In order to identify humans, one of two different sensors can be used - either a regular camera or a thermal, infrared camera. When using a thermal infrared camera it is possible to assume that all living things are warmer than their background [4], which makes calculations easier. There are also other differences between a human and their surroundings that can be used for detection. Wu, Geyer and Rehg have developed a method using the contour of the human body as its detection cue [5]. Dalal, Triggs and Schmid conclude that there are certain movements that are very characteristic for humans; however, to study movement, one has to analyse several images [6]. Furthermore, using texture and color information might also help in detecting human presence in an image [7].

Using a regular camera requires more computational power; however, a regular camera is more cost efficient, and there are more options for how to process the data from a regular camera than from a thermal, infrared camera. The different options will be described in further detail in section 2.3.


2.2.2 Identifying fixed objects

To identify fixed objects, a sensor that measures distance is enough, since there is no need to know more about an obstacle than the distance to it. A number of sensors are available for this. Both photoelectric sensors [8] and Infrared (IR) sensors [9] use a light beam that is reflected off the object to which the distance is to be measured. In this case it is the energy of the received light that is measured. There are, however, a couple of problems in using this type of sensor. White objects can be detected at a greater distance than darker objects and, foremost, shiny objects will not be detected unless they are parallel to the lens of the sensor [8].

Ultrasonic sensors are widely used since they do not have any problems with different surface textures or particles in the air, e.g. shiny objects and dust. The idea of an ultrasonic sensor is to output an ultrasonic pulse and wait for the echo. The time of flight is measured and the distance can be calculated [10].

The most common sensor for distance measurements in mobile robotics is the laser distance sensor, from which it is easy to retrieve data. Two kinds of laser distance sensors are available: the first calculates the distance to an object just like the ultrasonic sensor, using time of flight, while the other uses triangulation to determine the distance to an object [11]. Object identification could also be done with a regular camera. Using just one camera, however, makes it harder to determine the distance to the object. This would therefore mean using a stereo vision system, which can triangulate the distance from the baseline between the cameras and the displacement of the object between the two images [12].

2.3 Human detection

A number of different methods to identify humans in pictures have been developed over the years. A few of them are described in this section. When it comes to thermal infrared cameras, the same methods can be applied to these pictures as to regular pictures from a regular camera [13]. Most methods that have been researched are used to detect pedestrians [5, 7, 14, 15].

2.3.1 The C4 method

The method described here is a human detection based on contour information using a cascade classifier and CENTRIST (C4). As the name suggests, it uses a cascade of classifiers together with the CENsus TRansform hISTogram (CENTRIST) to identify humans by their contours [5].

It can be seen in earlier work that Histograms of Oriented Gradients (HOGs) are commonly used in different human detection applications [6, 15, 7]. One of the more recent methods uses contour cues as its identification cue. The C4 method uses the CENTRIST visual descriptor and a cascade classifier to emphasize the human contour in order to detect humans in images [5]. The method was created on the basis of two hypotheses, the first being "For pedestrian detection the most important thing is to encode the contour, and this is the information that HOG is mostly focusing on." and the second "Signs of comparisons among neighboring pixels are key to encode the contour." For real-time applications the advantage is that C4, and particularly the CENTRIST descriptor, requires neither pre- nor post-processing, and it is useful when detecting humans since it finds the most important sign information in an effective way [5].

The CENTRIST descriptor is a histogram of the Census Transform (CT) values for a picture [16]. The CT takes the intensity value of one pixel and compares it to the eight surrounding pixels. For every pixel with an intensity value lower than or equal to the analysed pixel, a 1 is rendered. For all higher values, CT renders a 0. These are then put together, in any order as long as it is done the same way every time, and converted into a base-10 number [5].

For an image I with a certain CENTRIST h, it is expected that images with matching CENTRIST descriptors will be similar to the image I. Therefore the C4 algorithm divides each image into smaller blocks, and for each block, I, the aim is to find a known, matching block with the same histogram and CENTRIST descriptor as I [5].

The example in Real-Time Human Detection Using Contour Cues uses detection windows of size 108-by-36 pixels, which are divided into 9 x 4 blocks (containing 108 pixels each); a combination of 2 x 2 adjacent blocks is called a super-block. A CENTRIST descriptor is extracted from each super-block.


Figure 1: Some examples of Haar features. The first row represents edge features in images, the second row represents line features and the third row represents center-surround features.

This results in 24 super-blocks. Since each CT value can take 256 different values, the feature vector for a candidate image patch contains 24 · 256 = 6144 dimensions [5].

To identify humans in the image, the image is classified as containing a human if equation 1 is satisfied, where $w \in \mathbb{R}^{6144}$ is an already trained linear classifier and $A$ is an auxiliary image computed from the CT values (described further in section 5.2.2). The term $w$ can be separated into pieces corresponding to the super-blocks, $w_{i,j} \in \mathbb{R}^{256}$ with $1 \le i \le 8$, $1 \le j \le 3$, and $f_{i,j}$ is the feature vector $f$ divided in the same fashion as $w$.

$$w^T f = \sum_{x=2}^{2h_s-1} \sum_{y=2}^{2w_s-1} A(t + x,\ l + y) \qquad (1)$$

Elaborating the equation in this way makes it possible for the computer to process the picture more efficiently, which is what makes the frame rate higher than for other similar methods [5].

2.3.2 Parallelogram Haar-like features

Different methods using Haar-like features have been developed for some time. However, they are restricted to finding features within a rectangular region. This yields problems when it comes to different backgrounds, angles and differences in human poses. To improve the human detection performance, parallelogram Haar-like features are introduced [14].

The original features are simple, rectangular features of an image (Fig. 1). The simple design of the Haar features makes it easy to scale them to different sizes, which allows them to be used to detect objects of different sizes [17].

The Haar-like features described by Van-Dung Hoang, Andrey Vavilin and Kang-Hyun Jo in Fast Human Detection Based on Parallelogram Haar-like Features [14] are based on the same principle but are built upon parallelograms instead of rectangles (Fig. 2).

The most important part of Haar-like feature calculation is computing the sum of intensities for the analysed region, an operation that has to be done several times. To speed up the computation, Summed Area Tables for Parallelogram (TP) are made for the four different types of features, named 1, 2, 3 and 4 in Figure 2. For group 1, the TP is calculated as in equation 2 [14].

$$TP^{(1)}(x, y) = TP^{(1)}(x-1,\, y) + TP^{(1)}(x+1,\, y-1) - TP^{(1)}(x,\, y-1) + I(x, y) \qquad (2)$$

Based on the TP, the sum of intensities within a parallelogram region (SP) can be calculated.


Figure 2: Parallelogram Haar-like features, here divided into four groups to speed up computational times.

For group 1, this is shown in equation 3 [14].

$$SP^{(1)}(x, y, w, h) = TP^{(1)}(x+w-h,\, y+h-1) + TP^{(1)}(x-h,\, y+h-1) - TP^{(1)}(x+w,\, y-1) - TP^{(1)}(x,\, y-1) \qquad (3)$$

The classifiers then have to be trained using some method. For detection, using a cascade structure classifier can further speed up computation, since a number of background samples can be discarded using only a small number of features [14].
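To make the recurrence above concrete, the following is a minimal sketch in Python of how a group-1 parallelogram summed-area table could be built from equation 2, treating out-of-range entries as zero. The indexing convention (x as column, y as row) and the helper name tp are assumptions made here for illustration, not taken from [14].

    def build_tp_group1(I):
        # I is a grayscale image indexed as I[y][x] (row, column).
        # Equation 2: TP(x, y) = TP(x-1, y) + TP(x+1, y-1) - TP(x, y-1) + I(x, y),
        # with entries outside the image treated as zero.
        h, w = len(I), len(I[0])
        TP = [[0] * w for _ in range(h)]

        def tp(x, y):
            return TP[y][x] if 0 <= x < w and 0 <= y < h else 0

        for y in range(h):
            for x in range(w):
                TP[y][x] = tp(x - 1, y) + tp(x + 1, y - 1) - tp(x, y - 1) + I[y][x]
        return TP

With the table in place, the sum of intensities inside a group-1 parallelogram can then be read out with the four look-ups of equation 3 instead of summing the region pixel by pixel.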

2.4 Navigation

There are two main categories of navigation methods. In the first, the environment is static and known; this is called global navigation. The second has an unknown environment where sensors are needed to determine the position of different obstacles; this is called local navigation [18]. For global navigation methods, the goal is to optimise the path which the robot will take to its target. However, real environments tend not to be completely known. For a local navigation method, on the other hand, some sort of sensors will have to be used to navigate through the environment. Here the main problem turns out to be U-shaped obstacles, also known as dead-end environments.

2.4.1 Hybrid methods

For a final product, parts of the environment will be known. In this case a method which combines parts of global methods with parts of local methods is useful. In many cases the structure of the building is known, whilst some things in the building might be moving. This is where the hybrid navigation methods come in handy. When the environment is known, a larger part of the computing can be done before the robot actually starts moving. This reduces the need for fast computations during the navigation. However, most indoor environments contain objects that are movable, which makes it impossible to completely know the environment before starting the navigation. For these methods there are two extremes and a range of possibilities in between them: either the real scene looks exactly like the memorised map, or it looks nothing like it. If the scene matches the memorised map, the path will be the one decided by the global part of the system. If, on the other hand, there is no resemblance between the two, the navigation will be completely local [19]. However, most cases are somewhere in between.


Figure 3: Example of a visibility graph. From each point, all points which can be connected by a straight line without intersecting any of the grey obstacles have a line drawn between them and count as a possible path for travel.

Method 1

Two methods of this type have been researched. The first, described by H. Maaref and C. Barret in Sensor-based navigation of a mobile platform in an indoor environment, is actually made to aid disabled people, and the method therefore prioritises safety over optimality [19]. The idea for this method is that a local and a global method are run in parallel. Their respective outputs are then fused together based on a comparison between the memorized and real scene.

In this case the method uses a visibility graph to determine all the possible paths between the starting point and the goal without intersecting any of the obstacles (Fig. 3). From each point, connections to all visible points are made, resulting in a graph with all possible paths [19].

To determine which of the possible paths in the visibility graph is the optimal one, this method uses an A* algorithm [20]. The A* algorithm divides the space into squares and gives every square three different scores depending on its position, given by equation 4, starting at the starting point and aiming towards the desired goal.

$$f(n) = g(n) + h(n) \qquad (4)$$

Here g(n) is the cost to move to that particular point from the starting point and h(n) is the estimated distance to the goal. h(n) can be estimated using different methods, for example the shortest (straight-line) physical distance between the two points. For every iteration, the route that has the lowest f(n) is chosen as the next step and the cost for its surrounding blocks is calculated (Fig. 4). When the start and goal are connected, the only thing left to do is to trace the shortest way back, using the smallest f(n) value in every step [20]. In this method the lines in the visibility graph are used as the cells in Figure 4. The local navigation of this algorithm is inspired by human behaviour, trying to reach free space while searching for the goal. This part of the method uses a Fuzzy Inference System (FIS) to navigate. This automatically avoids some local minima issues, but the problem with concave obstacles still remains. To avoid this kind of problem, the first part of the navigation is combined with a wall following algorithm which follows the wall to a created sub-goal. This part of the method is similar to the fuzzy logic navigation described in section 2.4.2 [19].
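As an illustration of equation 4, a minimal grid-based A* could look like the sketch below. It uses the straight-line distance as h(n) and a unit step cost for g(n); the grid representation and function name are assumptions made here, and the visibility-graph variant used in [19] applies the same scoring to graph edges instead of grid cells.

    import heapq
    import math

    def astar(grid, start, goal):
        # grid[r][c] is True for free cells and False for obstacles;
        # start and goal are (row, col) tuples.
        def h(n):
            return math.hypot(n[0] - goal[0], n[1] - goal[1])  # straight-line estimate

        rows, cols = len(grid), len(grid[0])
        g_score = {start: 0.0}
        parent = {start: None}
        open_set = [(h(start), start)]  # heap entries are (f(n), n) with f = g + h
        closed = set()
        while open_set:
            _, node = heapq.heappop(open_set)
            if node in closed:
                continue
            closed.add(node)
            if node == goal:  # trace the shortest way back through the parents
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nbr = (node[0] + dr, node[1] + dc)
                if (0 <= nbr[0] < rows and 0 <= nbr[1] < cols
                        and grid[nbr[0]][nbr[1]] and nbr not in closed):
                    ng = g_score[node] + 1.0  # g(n): cost from the starting point
                    if ng < g_score.get(nbr, float("inf")):
                        g_score[nbr] = ng
                        parent[nbr] = node
                        heapq.heappush(open_set, (ng + h(nbr), nbr))
        return None  # no path exists between start and goal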

To determine the actual output to the robot, the global and the local decisions are fused together based on a comparison between the memorised map and the real scene. This comparison generates a Strategies Fusion Index (SFI) which indicates which strategy is best to apply. The SFI value is determined using fuzzy decision making, and it reflects the actual situation with respect to the memorised map [19].

Figure 4: Example of an A* calculation. Purple indicates the starting point, pink a wall, yellow the target and green the suggested way. The numbers in each cell are f(n), g(n), h(n).

Method 2

The second method studied here is the method described by Lim Chee Wang, Lim Ser Yong and Marcelo H. Ang Jr. in Hybrid of Global Path Planning and Local Navigation Implemented on a Mobile Robot in Indoor Environment [21]. This method implements a potential field method as its local navigation, and a distance transform is used as the global path planner.

The local, potential field navigation creates a virtual field of potential forces attracting the robot towards the goal and repelling it from all obstacles. The attractive potential is described in equation 5, where q is the current position of the robot, q_desired is the goal position and K_attractive is the attraction factor.

$$V_{attractive}(q) = \frac{1}{2} K_{attractive} (q - q_{desired})^2 \qquad (5)$$

The force on the robot can be described by the negative gradient of the potential function as seen in equation 6

$$F(q) = -\nabla V(q) \qquad (6)$$

this means that the attractive force can be described as seen in equation 7 [21].

$$F_{attractive}(q) = -K_{attractive} (q - q_{desired}) \qquad (7)$$

The repulsive forces from the obstacles are the opposite of the attractive forces. Just as the attractive force changes with the distance to the goal, the repulsive forces change with the distance to the obstacle: the closer the robot gets to the obstacle, the higher the repulsive force gets. Obstacles that are too far away should not add any force, since they do not pose a threat at that point. Thus, a maximum effective distance d0 between the obstacle and the robot is needed; obstacles beyond this distance will not add to the forces on the robot. Similar to the attractive potential, the repulsive potential is described by equation 8 [21].

$$V_{repulsive}(q) = \begin{cases} \frac{1}{2} K_{repulsive} \left( \frac{1}{d} - \frac{1}{d_0} \right)^2 & \text{if } d < d_0 \\ 0 & \text{if } d \ge d_0 \end{cases} \qquad (8)$$

which gives equation 9 to describe the repulsive force.

$$F_{repulsive}(q) = \begin{cases} -K_{repulsive} \left( \frac{1}{d} - \frac{1}{d_0} \right) \frac{1}{d^2} & \text{if } d < d_0 \\ 0 & \text{if } d \ge d_0 \end{cases} \qquad (9)$$


Figure 5: An illustration of the distance transform, where the goal is marked as 0 and the black box is an obstacle.

The resulting force is the sum of the attractive and the repulsive forces, as seen in equation 10 [21].

$$F_{result}(q) = F_{attractive}(q) + F_{repulsive}(q) \qquad (10)$$

The article also states that this method does not offer a solution to the local minima problem, which occurs when the attractive force is equal and opposite to the repulsive force. A simulated robot then stops completely, and a real-life robot will oscillate around the local minimum point [21].
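A minimal sketch of equations 7, 9 and 10 in Python is given below. The gain values k_att and k_rep and the effective distance d0 are illustrative placeholders, not values from [21], and the sign convention simply pushes the robot away from each nearby obstacle.

    import math

    def resulting_force(q, q_goal, obstacles, k_att=1.0, k_rep=100.0, d0=1.0):
        # Attractive force towards the goal (equation 7).
        fx = -k_att * (q[0] - q_goal[0])
        fy = -k_att * (q[1] - q_goal[1])
        # Repulsive force from every obstacle closer than d0 (equations 8-9);
        # obstacles at d >= d0 contribute nothing.
        for ox, oy in obstacles:
            dx, dy = q[0] - ox, q[1] - oy
            d = math.hypot(dx, dy)
            if 0.0 < d < d0:
                mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
                fx += mag * dx / d  # unit vector pointing away from the obstacle
                fy += mag * dy / d
        return fx, fy  # equation 10: sum of attractive and repulsive contributions

At a local minimum the two contributions cancel, which is exactly the situation described above.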

The global path planning uses a distance transform to determine the best route towards the goal. To implement this, a map, [x, y], is created, where the x-coordinates range from 0 to xMax+1 and the y-coordinates range from 0 to yMax+1. The distance value of each point will be put into the cells of the map. To ensure that the robot is kept inside the map, the edges are assumed to contain obstacles. The goal cell of the map should contain the lowest value, normally 0. All cells containing obstacles should contain as high a value as possible. The remaining cells should contain some large value; the article uses the product of xMax and yMax [21].

When filling the map with distance values, the goal and obstacles remain unchanged. The rest of the cells will get a value depending on their distance to the goal. Each cell will get a value according to equation 11 (Fig. 5) [21].

$$cell[x, y]_{idt} = \min(cell[x+1, y] + 1,\ cell[x, y+1] + 1,\ cell[x-1, y] + 1,\ cell[x, y-1] + 1,\ cell[x+1, y-1] + 2,\ cell[x+1, y+1] + 2,\ cell[x-1, y-1] + 2,\ cell[x-1, y+1] + 2) \qquad (11)$$

When this is done, the final part is to start at any point and simply "walk down" the cells towards the lower numbers, and the goal will be found [21]. To combine the global and local parts, a virtual circle, with some appropriate radius, is created around the robot. The intersection of the circle and the planned path is set as a sub-goal towards which the robot should move. In the case of more than one intersection, the robot targets the one with the lowest distance value. The navigation to the sub-goal is done using the potential field method. When the sub-goal is reached, a new sub-goal is set and the process is repeated until the goal has been reached [21].
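The map construction and the relaxation of equation 11 can be sketched as follows; the function name, the choice of obstacle value and the stopping criterion (repeat until no cell changes) are assumptions made here for illustration.

    def distance_transform(x_max, y_max, goal, obstacles):
        # cell[x][y] holds the distance value; the border is treated as obstacles.
        BIG = x_max * y_max          # "some large value" for the remaining cells
        OBST = 10 * BIG              # as high a value as possible for obstacles
        cell = [[BIG] * (y_max + 2) for _ in range(x_max + 2)]
        for x in range(x_max + 2):
            cell[x][0] = cell[x][y_max + 1] = OBST
        for y in range(y_max + 2):
            cell[0][y] = cell[x_max + 1][y] = OBST
        for x, y in obstacles:
            cell[x][y] = OBST
        cell[goal[0]][goal[1]] = 0   # the goal cell holds the lowest value

        changed = True
        while changed:               # relax with equation 11 until nothing changes
            changed = False
            for x in range(1, x_max + 1):
                for y in range(1, y_max + 1):
                    if (x, y) == goal or cell[x][y] == OBST:
                        continue
                    best = min(cell[x + 1][y] + 1, cell[x][y + 1] + 1,
                               cell[x - 1][y] + 1, cell[x][y - 1] + 1,
                               cell[x + 1][y - 1] + 2, cell[x + 1][y + 1] + 2,
                               cell[x - 1][y - 1] + 2, cell[x - 1][y + 1] + 2)
                    if best < cell[x][y]:
                        cell[x][y] = best
                        changed = True
        return cell

Following the planned path then amounts to "walking down" from the current cell to a neighbour with a lower value until the goal is reached.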

2.4.2 Local methods

Since it is not always possible to acquire a map of the environment the robot should operate in, a pair of local navigation methods has been studied. Most hybrid methods can be combined with any kind of local method, which makes it possible to keep the implementation of the local navigation once a map is acquired and a hybrid method becomes useful.

Fuzzy Logic navigation

One of the newer methods is described by Shayestegan, M. and Marhaban, M.H. in Mobile Robot Safe Navigation in Unknown Environment [22]. This method uses Fuzzy Logic (FL) to navigate

through the environment. The authors use a two-wheeled robot platform as their base and have divided the surroundings into three different parts (left, right and front). Based on this, 11 rules are formulated for the FL. The small number of rules makes it possible to read the sensor values more often [22].

To solve the dead-end environment problem, an escape strategy is triggered when the robot detects an obstacle both on either side and in front of it. When the obstacles in front and to the side are discovered, the current angle towards the goal is memorized. Using a virtual target as its first goal, the robot then follows the wall until it is at or near the memorized angle again, at which point the original goal is once again set as the goal [22].

Behavior-based Hierarchical navigation

Another new method for local navigation of mobile robots uses a similar FL technique. Here the robot instead has four basic behaviours, which it uses depending on the sensor values, the distance to the goal and the steering angle of the robot. The different behaviors are goal seeking, obstacle avoidance, wall following and deadlock disarming [23].

The robot used in the example given by Wang Dongshu et al. has six sensors, two placed towards the front and two placed on each side. In goal seeking mode the robot travels in the direction of the goal. Obstacle avoidance is activated when an obstacle is found in front of the robot, which causes the robot to steer around the obstacle. If the robot senses an object with at least five of the six sensors, the wall following algorithm is used, and the robot follows the wall until it can continue on its way towards the goal. If the robot happens to end up inside a U-shaped obstacle, it is often hard to trigger five of the sensors. If this happens, a path remembering algorithm is triggered, reducing the number of iterations needed before the robot can escape the obstacle [23].
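A minimal sketch of that behaviour selection, with the thresholds taken from the description above and the function name chosen here for illustration, could look as follows.

    def choose_behavior(sensor_hits, in_dead_end):
        # sensor_hits is a list of six booleans, one per sensor, with the two
        # front-facing sensors first; in_dead_end is set by the path-remembering
        # logic when the robot is stuck inside a U-shaped obstacle.
        if in_dead_end:
            return "deadlock disarming"
        if sum(sensor_hits) >= 5:              # at least five of six sensors triggered
            return "wall following"
        if sensor_hits[0] or sensor_hits[1]:   # something detected in front
            return "obstacle avoidance"
        return "goal seeking"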

2.5 Adjustable Autonomy

The question of how to handle the detection of a human requires an analysis of how the human will interact with the robot, as well as an understanding of the different hazards that arise when robots and humans interact. Normally the level of autonomy is decided when the system is first made, and many systems offer the user the opportunity to choose between running the system autonomously or manually [24].

It has been stated that the purpose of adjustable autonomy is to find the balance between convenience and comfort. In this case the convenience would be to delegate everything to the autonomous system whilst comfort requires some actions to be performed by a human since the autonomous system can not be trusted to perform adequately [25].

This could be implemented as a set of autonomy levels. One example states five different levels: fully autonomous, autonomous with goal biases, waypoint methods, intelligent teleoperation and dormant, where fully autonomous and dormant are, in some way, self explanatory. Goal-biased autonomy, in this case, lets a human specify a region of interest or a region of risk, while the robot's movement towards the goal position is left to the robot to decide. A waypoint method lets the user set a certain set of points, which lets the robot do a more complicated task than otherwise possible. Intelligent teleoperation lets the human control the robot remotely via a joystick and video feedback [26].

3 Method

The work of this thesis was conducted using the engineering design process [27]. First, the problem was defined, which was followed by a background study where existing methods were researched. From the research it was possible to specify some requirements which a product needs to meet in order for it to be usable in a hospital. The next step was to choose which solutions to develop, develop a prototype using these solutions and test it. The process of the last three steps was then repeated for the duration of the thesis (Fig. 6).

To start the implementation, a platform for the mechanics had to be chosen as well as a platform for the embedded system. For the embedded system, the National Instruments myRIO [28] was chosen due to its relatively high performance compared to its cost. It also allows an implementation in LabVIEW [29], something that simplifies the implementation since it allows for a quick start and does not require any time to set up the environment other than installing and starting LabVIEW. Since the myRIO is limited in its computing power, two myRIOs were used. One was used only to do the human detection; that means taking images using a web camera, processing and analysing these images and returning a true or false to the rest of the system when a picture has been analysed. The other myRIO was used to do the rest of the calculations, such as navigation and sensor fusion. Another benefit of using the myRIO is that it has two CPU cores, which can be used to do several things in the navigation at once. It also has a Field-Programmable Gate Array (FPGA) which could be used to do some of the human detection [28].

The platform was bought as an almost complete kit, containing the mechanical platform, motors with encoders and motor drivers [30]. The robot platform was chosen for two main reasons. It has mecanum wheels, which were desired since they allow for free movement in the plane. This allows the robot to more easily avoid different obstacles. Of the different mecanum-wheeled options found, this platform was chosen due to its size, loading capacity and speed.

Another reason that this platform was chosen is the hall sensor encoders, which sense the number of rotations of the motors, allowing for easier calculation of the traveled distance than using, for example, an accelerometer to determine the current position.

3.1 Sensors

The choices of sensors were made due to limitations in the system and the task at hand, aiming at solving the task at hand at an as low computational cost as possible. Three different types of sensors had to be used, one to identify humans, one to identify other obstacles and finally some kind of feedback from the motors to determine the speed and position of the robot.

3.1.1 Identifying human

To identify humans, there were two main strategies: either a thermal infrared camera or a regular camera could be used. While the implementation of a thermal infrared camera for human detection might be simpler, thermal infrared cameras are expensive. A regular camera is cost efficient, and there are more existing methods for human identification using this approach.

3.1.2 Identifying other fixed objects

When it comes to identifying other fixed objects, it was decided that the objects do not have to be identified as to what they are, but rather where they are. This means that the distance is the only relevant factor for this implementation. It also means that using a stereo vision system would be unnecessary, since the distance to objects identified as humans can be measured with the distance sensors. This also decreases the need for computational power, since no image processing is required to identify fixed objects.

Even after ruling out stereo vision, a number of sensors were still available. IR sensors and ultrasonic sensors are the easiest to come by. The IR sensor has a faster response time since light travels faster than sound; however, the ultrasonic sensor will work on any material. In, for example, a hospital environment, the risk is that there will be glass walls, which could become a problem for the IR sensor. Since IR sensors are often analog they are easy to connect and implement, and therefore the final decision was to use IR sensors to determine the distance to obstacles around the robot.


Figure 6: The general engineering method

Due to the square-shaped platform, it was decided that three sensors would be sufficient to cover the different sides of the robot (Fig. 7). Since the robot is equipped with mecanum wheels, it is able to move freely in the xy-plane. A rotating platform will be used to hold the front viewing sensor and the USB camera so that they have a good view in the direction of movement.

3.1.3 Motor encoders

To receive feedback from the motors, a set of encoders were installed. These make it possible to calculate the robot’s position and to set a desired goal for the robot’s movement.

The encoders came with the robot kit. The encoders have two channels, which make it possible to determine the speed, how far the wheel has rotated, as well as the direction of rotation [31, 32]. Together with a set of kinematic equations, this made it possible to determine the robot's position and its angle towards the goal.

3.2 Human detection

The main problem with different methods for object detection is that they require fast processors and a lot of RAM. The estimate is that the system will have to be able to process between 5 and 10 frames per second (fps) to properly be able to interact with humans.

Of the researched methods, the C4 method has the highest frame rate compared to its hardware performance. It is able to process 20 fps, using one processor core on a 2.8 GHz CPU [5]. Given this information it was decided that this method was the most appropriate to implement to get an as high frame rate as possible.

3.3 Navigation

In this stage of the implementation it was decided that the navigation was the most important part of the software implementation. Without the ability to decide where the robot will go the ability to detect obstacles and/or humans is unnecessary.


Figure 7: Simple drawing of the sensor placement on the robot, the front viewing sensor (marked as 1. in the figure) together with the camera used for human detection will be on a rotating platform in order for it to view obstacles and humans in the direction of motion.

Of the different methods that were researched, the method described as Method 1 in Section 2.4.1 was found to be the most suited for this implementation. This method uses FL for the local navigation and an A* algorithm for the global navigation. In contrast, the method described as Method 2 in the same section does not offer any deadlock escape, which is a common problem with local navigation that has to be avoided.

No maps or schematics were available for the test environment, and a local method was therefore implemented. Since the aim is to implement the method described above, it was decided that a regular FL method would be implemented since this would simplify the implementation of the hybrid method in a later stage of the process. Given the different tools that LabVIEW offers, an implementation using only FL, such as in the Fuzzy Logic Navigation method described in Section 2.4.2, will be easy to implement.

3.4 Software system design

A general software overview has been made (Fig. 8). There are three main parts to this overview: inputs, processing and outputs. The inputs are the data from the distance sensors, the camera and the hall sensors on the motors, as well as the target position, which is the goal of the robot's mission. In the processing part, the image from the camera is processed, the distances to the different objects are calculated, and all of this is sent to the navigation algorithm to decide the best direction of travel. This is then sent to the output called movement. However, if the system detects a human, the robot is to stop in order to make sure no collision occurs; the output in this case is for the robot to stop.

Furthermore, an overview of the human detection algorithm has been made (Fig. 9). The input to the human detection is an image taken using the USB camera on the system, and the output is a boolean to the complete software system telling it whether or not there is a human in the image. An overview of the navigation algorithm has also been made (Fig. 10), which uses inputs from the three distance sensors to determine the presence of obstacles, the encoders on the motors to determine


Figure 8: The general software overview with its different inputs, the different steps of processing and the different output options for the system.

the traveled distances, and the target position. The outputs from these are the duty cycles for the motors' Pulse Width Modulation (PWM) signals and their direction of motion. This output is also the output that is listed as movement in the general software overview.

3.5 Adjustable Autonomy

Since the question for this thesis was whether it was possible to implement both human detection and all of the other sensor fusion on a cost efficient, small platform the question of adjustable autonomy was decided to be out of scope for this thesis work.

Figure 9: An overview of the human detection algorithm. The system's first step is a Sobel filter, after that follows the CENTRIST descriptor, and the last step is the actual classification. [5]

Figure 10: An overview of the navigation algorithm. This algorithm uses the sensor values, the target position and the motion of the motors to make decisions using a fuzzy system. The output of this algorithm is a set of PWM signals to run the motors at different speeds.

4 Hardware

As the main computer for this system, the myRIO from National Instruments was chosen. It was chosen due to its low cost compared to its computing power. It is also easy to get up and running. As on any other system, there is a limited number of analog inputs on the myRIO. There are 10 analog inputs on each myRIO [33], which made it impossible to connect all desired devices. Each distance sensor requires one input, and connecting both hall sensors on each encoder would mean an additional eight inputs. Therefore it was decided that only one of the encoders should be attached, since this also limits the amount of wiring.

Compared to the hardware used for human detection in other works, the myRIO has less memory and a slower processor at 667 MHz [5, 33].

4.1 Platform

The platform which the robot is built upon uses mecanum wheels, which makes for easier calculation of movement since the robot is free to rotate around its own axis and to move in any direction in the xy-plane. The platform has four wheels, each driven by its own motor [30]. The motors were switched from the original ones to a model which comes with encoders, to simplify calculating the traveled distance.

In order to understand the movement of the robot, one must understand the mecanum wheels. The most important attribute of the mecanum wheel is the angle of the rollers on the wheel. To get complete freedom of movement in the xy-plane on a four-wheeled robot, there have to be two wheels with a positive roller angle and two wheels with a negative roller angle [34]. For this implementation, wheels with +45° and −45° roller angles have been used. Wheels that have a positive angle have rollers leaning to the right and are therefore called right wheels, whilst the ones with a negative angle are called left wheels for the same reason. The wheels can be placed either so that the rollers form an x or so that they form a diamond. On this robot, the wheels have been placed so that the rollers form an x (Fig. 11).

4.1.1 Kinematics

One of the main problems with any navigation is the kinematic model, which is needed to know the position and direction of the robot. The platform was chosen due to its wheels and its ability to move freely in the xy-plane. Kinematics Modelling of Mecanum Wheeled Mobile Platform presents a kinematic model similar to the one used for this implementation [34]. The report states a set of kinematic equations which work for this implementation as well. The equations have been formed based on the kinematic model of the robot (Fig. 11). Equation 12 shows the basic calculation of the velocities in x and y relative to the robot's position, along with the angular velocity around the z axis.

$$\begin{bmatrix} v_x \\ v_y \\ \omega_z \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ -\frac{1}{L+1} & \frac{1}{L+1} & -\frac{1}{L+1} & \frac{1}{L+1} \end{bmatrix} \cdot \begin{bmatrix} R_w \dot{\theta}_1 \\ R_w \dot{\theta}_2 \\ R_w \dot{\theta}_3 \\ R_w \dot{\theta}_4 \end{bmatrix} \qquad (12)$$

For each iteration, a velocity in both the x and y directions is calculated using equation 12, which is then converted into an angle relative to the last position of the robot using equation 13.

$$\beta = \tan^{-1}\left(\frac{v_y}{v_x}\right) \qquad (13)$$

The resulting angle β is relative to the starting pose of each calculation, i.e. the angle towards the goal has to be updated for every iteration of the program. This means that the angle towards the desired direction has to be summed together with the angle from each calculation. When passing π (or −π) radians, the angle has to wrap around in order for the robot to still remember its target direction.
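A sketch of equations 12 and 13 in Python is shown below. The geometry term is passed in as a single parameter (written as 1/(L+1) in equation 12), and atan2 is used in place of tan^-1(v_y/v_x) so that the quadrant is handled correctly; both function names are chosen here for illustration.

    import math

    def body_velocity(theta_dot, r_wheel, geom):
        # theta_dot holds the angular velocities of motors 1-4; r_wheel is Rw
        # and geom is the geometry factor from the third row of equation 12.
        w1, w2, w3, w4 = (r_wheel * td for td in theta_dot)
        vx = w1 + w2 + w3 + w4
        vy = -w1 + w2 + w3 - w4
        wz = geom * (-w1 + w2 - w3 + w4)
        return vx, vy, wz

    def heading_increment(vx, vy):
        # Equation 13: the angle moved relative to the previous pose.
        return math.atan2(vy, vx)

The increment is accumulated every iteration and wrapped at plus or minus pi, as described above.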

Figure 11: Drawing of the kinematics of a mecanum wheeled platform. This platform uses wheels with a negative roller angle on motors 1 and 4, and a positive roller angle on motors 2 and 3.

Figure 12: Illustration of the schematics for the voltage regulator.

4.2 Power supply

The power supply of this system is divided into two main parts: the first supplies the myRIO with power and the other is used to drive the motors. This section is divided in a similar fashion. The energy source on this platform is a Li-Ion battery rated 22.2 V and 3 Ah [35].

4.2.1 Voltage regulation

The robot’s motors are rated 24 V, and the myRIO needs a power supply between 6 and 16 V. Therefor some kind of voltage regulation was needed. The final decision was a voltage regulator. The LT1085CT-12 was used for this. It has a nominal output voltage of 12 V and an output current of 3 A which is enough to supply the maximum 14 W the myRIO consumes [36].

Two bypass capacitors are connected to the voltage regulator to stabilize the output voltage: a 100 nF capacitor between Vin and ground, and a 220 nF capacitor between Vout and ground (Fig. 12). These differ from the capacitors suggested in the datasheet [36], but were chosen since they were easy to access. The myRIO has a wide supply range, which also allows for some noise from the power supply without any greater problems.

When reducing the voltage from 22 V to 12 V, a reduction of 10 V, there is a lot of energy that has to be dissipated. As seen in equations 14 and 15, around 15 W of heat is produced at a current draw of 1.5 A.

$$V \cdot I = P \qquad (14)$$

$$10\,\text{V} \cdot 1.5\,\text{A} = 15\,\text{W} \qquad (15)$$

To reduce this, a heat sink has been used, which is able to keep the temperature at a reasonable level.

4.2.2 Motor supply

The motors are regular DC motors, rated for 24 V [37]. The motors are driven using off-the-shelf PWM motor controllers [38]. The controllers use an LMD18200 H-bridge to control the output; the LMD18200 contains a full H-bridge [39]. The required input signals for the controller are two digital inputs, for direction and braking, and a PWM signal to determine the speed of the motor. The logic of this module can be found in table 1.

PWM Dir Brk Output

H H L Clockwise rotation

H L L Counter clockwise rotation

L X L Zero drive

H H H Zero drive

H L H Zero drive

L X H None

Table 1: Logic truth table of LMD18200
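As a sketch of how the signals in table 1 could be derived from a signed speed command, the mapping below is illustrative; the actual pin handling is done from LabVIEW on the myRIO and is not shown here.

    def motor_command(speed):
        # speed is a signed value in [-1, 1]; the magnitude becomes the PWM duty
        # cycle and the sign selects the direction input (Dir = H for clockwise).
        duty = min(abs(speed), 1.0)
        direction = speed >= 0
        brake = False            # Brk stays low; duty 0 already gives zero drive
        return duty, direction, brake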

4.3 Sensors

The sensors in this project include distance sensors to determine the distance to different stationary obstacles, as well as encoders to determine the traveled distance and rotation.

4.3.1 USB-camera

There is a USB port on the myRIO, which makes it easy to connect any regular USB web camera. Since there are no requirements for this camera to produce high quality images, the aim was to find one that was cheap and easy to attach. The cheapest camera found with a good attachment mechanism was the Andersson Webcam WBC 1.0 [40], which was used for this implementation.

4.3.2 Distance sensors

Two different types of analogue IR sensors were used. These have the advantage that they output a voltage that is a function of the distance, without requiring any input other than the supply voltage. The downside is that the IR sensors are not able to detect glass or other shiny surfaces, which makes, for example, glass walls impossible to identify.

The sensor in front is a Sharp GP2Y0A710K0F [41], which has a specified range between 100 and 550 cm. When measuring, this particular sensor proved to work as close as 70 cm, and this has been the premise of this work. To convert the analog input signal to an actual distance for the fuzzy system to work with, equation 16 was used.

$$\text{distance in cm} = \frac{1}{0.007 \cdot V - 0.0075} \qquad (16)$$

The two sensors to the sides are Sharp GP2D12 [42]; these have a specified range from 10 up to 80 cm, and empirical testing proved this to be accurate for the two sensors used in this implementation.

The left and right sensors differ from the front viewing sensor and therefore have a different conversion from voltage to distance, which can be found in equation 17.

$$\text{distance in cm} = \frac{26}{V - 0.42} \qquad (17)$$

No filtering has been applied to the sensors; however, a mean value of 50 samples is used to get more accurate results. Since the pulses from the motor encoders had to be measured over a period of time anyway, it was considered a good solution to measure the sensor values in the same way.
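A small sketch of the conversions in equations 16 and 17, together with the 50-sample averaging, is given below; the function names are chosen here and the formulas follow the reconstructed equations above.

    def averaged(samples):
        # The implementation averages 50 raw readings before converting.
        return sum(samples) / len(samples)

    def front_distance_cm(voltage):
        # Equation 16, for the Sharp GP2Y0A710K0F front sensor.
        return 1.0 / (0.007 * voltage - 0.0075)

    def side_distance_cm(voltage):
        # Equation 17, for the Sharp GP2D12 left and right sensors.
        return 26.0 / (voltage - 0.42)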

4.3.3 Motor encoders

The encoders were already attached to the motors when they were bought. The encoders have two hall sensors each, which makes it possible to detect the number of rotations and the direction. Each encoder has six pins: -MOTOR and +MOTOR, which did not need connecting since they are already connected directly to the motors; a ground and a VCC pin to supply the hall sensors in the encoders with power; and two additional pins for the outputs from the hall sensors. As suggested in the data sheet, a pull-up resistor of 1.8 kΩ between each signal and VCC was used [32].

5 Software

The software implementation is divided into two main parts, navigation and human detection. The navigation contains the sensor fusion and the logic for determining desired movement and the output to the motors. The human detection on the other hand uses only a camera as its input and analyses images to detect humans in front of the robot and outputs a boolean to the rest of the system which should determine a relevant solution for each case.

5.1 Navigation

The navigation uses a set of fuzzy rules to navigate. The target for the navigation algorithm is to follow a direction set by the user whilst avoiding obstacles. The basis of the navigation algorithm is the kinematic equations, which can be found in section 4.1.1. Even though the end goal is to implement a hybrid navigation, no schematics of the test environment were available at this point in time, and therefore only the local, fuzzy navigation has been implemented.

5.1.1 Fuzzy Navigation

In LabVIEW there is a tool for modeling fuzzy systems called Fuzzy System Designer. This allows the user to set up rules and variables as well as testing the system.

There are eight variables in the system: one for each sensor, one for the angle towards the desired direction and four for the different outputs to the motors. The sensors to the sides do not have the same distance range as the front viewing sensor and therefore have different membership functions than the front viewing one (Fig. 13).

In the same way, the four different motors are modeled with membership functions that range from -1 to 1 (Fig. 14): negative numbers for negative rotation and positive for positive rotation. The value equals the percentage of the PWM duty cycle used to control the speed.

As in the method described under Fuzzy Logic Navigation in section 2.4.2, the output to the motors is based on 11 fuzzy rules, which can be seen in table 2. The aim of the fuzzy system is to maintain a certain heading, and the output is the direction and speed of each motor.

5.1.2 Wheel rotation calculation

Since there was a shortage of analog inputs, the desire was to not have to connect all four encoders. To do this, the duty cycle of each motor had to be converted into the number of steps moved by that motor. Equation 18 shows the calculation of each motor's movement, where s_x is the number of pulses from a motor and dc_x is its duty cycle.

$$s_i = \frac{s_1}{dc_1} \cdot dc_i \qquad (18)$$

The steps from each motor are then converted into the rotated angle. Each rotation of the wheel corresponds to 191 rotations of the motor. For every rotation of the motor, the hall sensor on the motor encoder gives out a pulse, which means that each step equals approximately 1.8° of wheel rotation. The LabVIEW functions

Figure 13: The membership functions for (a) the front viewing sensor, (b) the left and right viewing sensors and (c) the angle towards the aimed direction.

Figure 14: The membership function is the same for all four motors.

Rule no. Sl Sf Sr Target direction Motor 1 Motor 2 Motor 3 Motor 4

1 Far Far Far NEG NEG POS NEG POS

2 Far Far Far Z POS POS POS POS

3 Far Far Far POS POS NEG POS NEG

4 Far Near Far NOT POS NEG POS NEG POS

5 Far Near Far POS POS NEG POS NEG

6 Near Far Far NOT POS Z Z Z Z

7 Near Far Far POS POS NEG POS NEG

8 Far Far Near NEG NEG POS NEG POS

9 Far Far Near NOT NEG Z Z Z Z

10 Near Near Far ANY POS NEG NEG POS

11 Far Near Near ANY NEG POS POS NEG

Table 2: The fuzzy system contains 11 fuzzy rules.

for trigonometry require the angles to be given in radians. The rotated angle of each motor is therefore given by equation 19.

$$\theta_i = \frac{1.8 \cdot s_i}{180} \cdot \pi \qquad (19)$$

The rotated angle of each motor is then used with equation 12 to calculate vxand vy, which are then put into equation 13 which gives β, the robot’s angle compared to the last iteration. This angle is then added to the last value of the angle - resulting in an angle towards the desired direction.
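Equations 18 and 19 can be sketched as below; the function name and the zero-duty-cycle guard are additions made here for illustration.

    import math

    def wheel_angles(pulses_motor1, duty_cycles):
        # Scale the pulses counted on motor 1 by the duty-cycle ratio to estimate
        # the pulses of all four motors (equation 18), then convert each pulse
        # count to a rotated wheel angle in radians, at roughly 1.8 degrees of
        # wheel rotation per motor revolution (equation 19).
        if duty_cycles[0] == 0:
            return [0.0] * len(duty_cycles)
        steps = [pulses_motor1 / duty_cycles[0] * dc for dc in duty_cycles]
        return [1.8 * s * math.pi / 180.0 for s in steps]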

5.2 Human detection

Of the different methods researched, the C4 method is the fastest; it was therefore implemented with some minor changes.

The acquired images are not saved to disk since there is no need to view the images after the program is ended. Instead one image is taken, analysed and then replaced with a new image in the same place. This procedure saves time and prevents the memory from being full.

In many previous implementations, OpenCV has been used. OpenCV is an open source computer vision library that contains over 2500 optimized algorithms [43]. OpenCV was not used in this implementation since it was deemed too time consuming to get it to work with the myRIO, and no relevant references were found on how to implement programs with OpenCV on the myRIO.

5.2.1 Census Transform

The method builds a histogram of the CT values of every pixel in an already Sobel-filtered image; this histogram is called a CENTRIST descriptor. The Sobel filter increases the intensity of contours in an image, and it is the contours the proposed method uses to detect humans. CT is used to compare the intensity between neighboring pixels. In CT, the current pixel is compared to


its eight neighbors, as seen in equation 20. In this implementation a top-down, left-right order has been used; any order will work as long as it is used consistently throughout the program.

\[
\begin{pmatrix} 12 & 24 & 64 \\ 24 & 36 & 55 \\ 64 & 24 & 12 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 1 & 1 & 0 \\ 1 &  & 0 \\ 0 & 1 & 1 \end{pmatrix}
\Rightarrow (1, 1, 0, 1, 1, 0, 0, 1)_2 \Rightarrow CT = 217 \qquad (20)
\]

The image is divided into 9 × 4 blocks and any adjacent 2 × 2 group of blocks is treated as a super-block. A CENTRIST descriptor is calculated for every super-block by adding all the CT values of the blocks' pixels into a histogram.
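To make the transform concrete, the following is a minimal, hypothetical C++ sketch of the CT computation for one pixel. The comparison convention used here (a bit is set when the neighbour is not brighter than the centre) and the top-down, left-right bit order are assumptions, so the printed value may differ from the example value in equation 20 if a different ordering convention is applied.

// Minimal sketch of the Census Transform of a single pixel (hypothetical C++).
#include <cstdint>
#include <cstdio>

// Returns the 8-bit CT value of the pixel at (x, y) in a grayscale image.
std::uint8_t censusTransform(const std::uint8_t* img, int width, int x, int y) {
    std::uint8_t centre = img[y * width + x];
    std::uint8_t ct = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;          // skip the centre pixel
            ct = static_cast<std::uint8_t>(ct << 1);
            if (img[(y + dy) * width + (x + dx)] <= centre)
                ct |= 1;                               // neighbour darker or equal
        }
    }
    return ct;
}

int main() {
    // The 3x3 neighbourhood used in the example of equation 20.
    std::uint8_t patch[9] = {12, 24, 64,
                             24, 36, 55,
                             64, 24, 12};
    std::printf("CT = %u\n", static_cast<unsigned>(censusTransform(patch, 3, 1, 1)));
    return 0;
}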

5.2.2 Scanning the image

One auxiliary image is then created using equation 21, where $C$ is the CT image, $h_s$ is the block height, $w_s$ is the block width, and $n_x = 8$ and $n_y = 3$ correspond to the number of super-blocks in the image. $\omega^k_{i,j}$ is the $k$-th component of the linear classifier $\omega_{i,j}$.

\[ A(x, y) = \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} \omega_{i,j}^{C((i-1)h_s + x,\,(j-1)w_s + y)} \qquad (21) \]

After calculating the auxiliary image, the human detection is done using equation 22. In this equation, ω is the linear classifier and f is the feature vector.

\[ \omega^T f = \sum_{x=2}^{2h_s - 1} \sum_{y=2}^{w_s - 1} A(t + x,\, l + y) \qquad (22) \]
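To make the role of the auxiliary image concrete, the following is a simplified, hypothetical C++ sketch of the idea behind equations 21 and 22: every pixel is replaced by the classifier weight of its CT value, so that the score of a detection window reduces to a sum over that window. The window size, block geometry and classifier weights below are placeholders, and the indexing is simplified compared to the equations above.

// Simplified sketch of the auxiliary-image scoring idea in equations 21-22
// (hypothetical C++). Window size, block geometry and weights are placeholders.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int width = 36, height = 108;    // one detection window (assumed size)
    const int hs = 12, ws = 9;             // block height and width (assumed)
    const int ny = width / ws;             // blocks per row

    // CT image of the window; all zeros here just to keep the sketch short.
    std::vector<std::uint8_t> ct(width * height, 0);

    // w[(i*ny + j)*256 + k]: weight of CT value k in block (i, j) (dummy values).
    std::vector<double> w(static_cast<std::size_t>(height / hs) * ny * 256, 0.01);

    // Eq. 21 (simplified): replace every pixel by the classifier weight that
    // its CT value contributes to the block containing it.
    std::vector<double> A(width * height);
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col) {
            int i = row / hs, j = col / ws;
            A[row * width + col] = w[(i * ny + j) * 256 + ct[row * width + col]];
        }

    // Eq. 22 (simplified): the window score is the sum of A over the window;
    // the full method evaluates such sums quickly with integral images.
    double score = 0.0;
    for (double a : A) score += a;

    std::printf("window score: %.2f (score > 0 would indicate a human)\n", score);
    return 0;
}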

Since the detection window used in this implementation has the same size as the one in the original implementation of C4, the classifiers from the original implementation have been used [44].

The C4 method calls for a cascade of two classifiers: a linear Support Vector Machine (SVM) classifier followed by a Histogram Intersection Kernel (HIK) SVM classifier. The stated reason is that the linear SVM is faster, which ensures a fast boot-strapping process, whilst the HIK SVM achieves higher detection accuracy [5]. For this implementation, the already trained classifiers from the original implementation of C4 were used [44]. To test this implementation, the INRIA data set [45] was used, which is a set of images with positive examples containing upright persons as well as negative examples that do not contain humans. While testing this system, the frame rate ended up at one image every 5 seconds and the system indicated human presence in every image analysed. In a debugging attempt, the HIK SVM was removed from the classification process. This gave a small increase in frame rate, from approximately 5 seconds per image to 4.2 seconds per image, but the problem with the faulty detection remained. Further investigations have to be made to determine where this problem is located.
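The cascade itself can be summarised in a few lines. The sketch below (hypothetical C++) only illustrates the control flow; the two score functions are dummy stand-ins for the trained linear SVM and HIK SVM classifiers, not the classifiers from [44].

// Minimal sketch of the two-stage classifier cascade (hypothetical C++).
// The score functions are dummy stand-ins; only the cascade structure matters.
#include <numeric>
#include <vector>

// Stand-in for the fast linear SVM decision value w^T f + b.
double linearScore(const std::vector<double>& f) {
    return std::accumulate(f.begin(), f.end(), -1.0);   // dummy bias b = -1
}

// Stand-in for the slower but more accurate HIK SVM decision value.
double hikScore(const std::vector<double>& f) {
    return std::accumulate(f.begin(), f.end(), -2.0);   // dummy decision value
}

bool isHuman(const std::vector<double>& features) {
    // Stage 1: the cheap linear SVM rejects most windows immediately.
    if (linearScore(features) < 0.0)
        return false;
    // Stage 2: only the surviving candidate windows pay for the HIK SVM.
    return hikScore(features) > 0.0;
}

int main() {
    std::vector<double> features(256, 0.02);   // dummy CENTRIST feature vector
    return isHuman(features) ? 0 : 1;
}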


6 Results

This section describes the results of this thesis. The results are presented following RQ1-3, as stated in section 1.1.2.

The final prototype of the robot can be seen in figures 15 and 16. The prototype includes electronics, the embedded system, software and mechanics.

Figure 15: The final prototype from the front. Visible in this image is the USB-camera as well as the front viewing distance sensor.

Figure 16: The final prototype from the side. Visible in this image is the USB-camera as well as the front and left distance sensors.

6.1 Research question 1

The first research question is formulated as: What are the differences in detecting humans and fixed obstacles? The main difference found between humans and stationary obstacles is body temperature, which most often differs from the surroundings. Contour, certain movement patterns and the human face are other distinguishing characteristics of humans compared to stationary obstacles.

6.2 Research question 2

The second research question is formulated as: What sensors are needed to identify humans and other obstacles? This section is therefore divided into two parts, one for human detection


and one for stationary obstacles.

6.2.1 Human detection

To identify humans, either a regular camera or a thermal infrared camera could be used. Depending on the detection method applied to the images, the required resolution varies; detecting a contour requires a lower resolution than detecting the structure of a face. Since the myRIO offers a USB-port, a USB-camera was used in this implementation.

6.2.2 Stationary obstacles

For detecting stationary obstacles, the first question was what properties of these obstacles were relevant. Only the distance was found to be relevant. To determine distance, a number of different sensors are available to use. IR and ultrasonic sensors were found to be the best suited for this implementation. IR distance sensors are implemented in this prototype.

6.3 Research question 3

The final question is formulated as: Is it possible to do the sensor fusion completely on an embedded system which can be placed on the robot? It is not possible to do all of the sensor fusion on this embedded system. The navigation and the sensor fusion of distance sensors and motor encoders run completely on the embedded system as of now. However, the implementation of human detection cannot run fast enough to ensure safe detection of humans while the robot is moving.

6.3.1 Human detection

In its current state there is a problem with the human detection which makes it detect humans in every image that is analysed. Since the frame rate was too slow, the work on getting the detection working was stopped in favour of other parts of the project. The myRIO also offers an FPGA which would have improved the processing speed of the system, but due to time limitations this could not be implemented during the work of this thesis.

Even though a method which had been proven fast compared to other methods was chosen for implementation in this project, it was not fast enough to run on this system. Even with a resolution of 120 × 160 pixels in each image, it was not possible to get the frame rate above one image every 4.2 seconds. The minimum acceptable frame rate was set to 5 fps.

6.3.2 Navigation

When run on its own, the navigational algorithm has an average execution time of 4 ms per iteration, which allows the system to update sensor values and outputs to motors often enough to ensure that there will not be any collisions because of too few measurements.

The navigation still lacks the global part of a hybrid navigation method. The local navigation has been implemented in such a way that the suggested global method can be added without changing the parts that are already made.


7 Discussions

This section starts with a general discussion regarding the results of this thesis, followed by sections regarding the different research questions. Last is a section about the future work needed to finish this project.

The work of this thesis has created a starting platform to continue work on in order to get it into an actual hospital. The navigation is working and the user can decide a direction which the robot is able to follow. The robot is also able to successfully avoid different obstacles along this path and pass around them.

As it is today, the human detection indicates the presence of humans in every image, even the ones with no humans in them. This problem has been studied and no possible solution has been found. The most likely fault is a calculation error in the implementation of the classification algorithm.

Having a simulation environment would have made the implementation phase easier, since testing would have been simpler. Unfortunately, due to time limitations, this never got past the planning stage of this work.

It is also possible that the final result of this thesis would have been more complete if the focus of the work had been narrowed down to a smaller implementation. For example, implementing only the navigational algorithms would have made it possible to extend them further and to implement a part-global, part-local navigation method, which could have made it possible to conduct some initial tests in a hospital or another health care facility.

7.1 Research question 1

The first research question is formulated as: What are the differences in detecting humans and fixed obstacles? It was concluded that there are some major differences between humans and other obstacles, the biggest being temperature. Few things in a hospital have the same temperature as a human, so temperature would have been an ideal cue for detecting humans in this kind of environment. To measure it, something like a thermal infrared camera would be needed. Such cameras are expensive, and it was therefore decided that temperature was not a viable option for human detection in the context of this thesis. Other differences are contour, certain movement patterns and the human face; these can all be used to identify humans. In a hospital environment, the humans that pose the greatest challenge for the robot in this implementation are the upright, standing humans.

Since the chosen method detects humans by their contour, sitting humans will not be detected, which could possibly be a problem. This is also why there was a limit on how few images per second were acceptable. Another way of solving this would be to use a detection method which does not identify humans by their body contour; facial recognition would be one way of doing this. Such a method would have one large problem: people walking with their backs towards the robot would remain undetected. Since these are one of the largest dangers for this robot, this was deemed a bigger issue than the risk of sitting humans remaining undetected, which is another reason why this type of method was chosen.

One suggestion is to use two detection algorithms: one which identifies sitting persons, for example an algorithm which uses faces to detect the presence of humans, combined with another which is able to detect humans from behind, for example the suggested C4 method, which uses the human contour as its detection cue.

7.2 Research question 2

The second research question is formulated as: What sensors are needed to identify humans and other obstacles? It was decided that a regular camera would be sufficient for the robot to identify humans. Since the resolution of the image is directly related to its processing time, a low resolution is preferable for fast image processing. To decrease the need for high resolution, a method using the human contour was chosen instead of a method where, for example, face recognition is used. In the end, a resolution of 120 × 160 pixels was enough to still get a clear contour of a human in the image.


However, a wider field of view on the camera would have been preferable, since this would have made it possible for the robot to detect humans closer to itself. The biggest reason why this is necessary is the low mounting position of the camera; on a taller robot this would not have been such a big problem.

To detect obstacles, the biggest question has been whether knowing what object is detected is relevant or if it is enough knowing that something is there. Since there is no difference in avoiding a wall, a chair or a lamp, it was decided that the distance was the only relevant property of the detected obstacle. Because of this, distance sensors were used to detect the different obstacles.

An additional sensor on each side would have made the robot more robust when it comes to obstacle detection. Narrow objects are sometimes missed in this implementation, which occasionally leads to the robot bumping into obstacles. These additional sensors could also have a longer range than the ones currently used. At the moment, the side-viewing sensors have trouble when the distance exceeds their maximum range of 40 cm, which happens on many occasions and makes them output random values that are sometimes perceived as negative distances. Most of these hardware quirks have been handled in software.
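As an example of how such a faulty reading could be handled in software, the following is a hypothetical C++ sketch; the 40 cm limit is the side-sensor range mentioned above, while the exact filtering used in the LabVIEW code may differ.

// Minimal sketch of sanitising an out-of-range side-sensor reading
// (hypothetical C++; the LabVIEW implementation may handle this differently).
#include <cstdio>

double sanitiseSideDistance(double measuredCm) {
    const double maxRangeCm = 40.0;   // maximum range of the side IR sensors
    // Negative or implausibly large readings mean nothing is within range,
    // so report the maximum distance ("far") instead of a garbage value.
    if (measuredCm < 0.0 || measuredCm > maxRangeCm)
        return maxRangeCm;
    return measuredCm;
}

int main() {
    std::printf("%.1f %.1f %.1f\n",
                sanitiseSideDistance(25.0),    // valid reading, passed through
                sanitiseSideDistance(-3.0),    // "negative distance" -> far
                sanitiseSideDistance(112.0));  // beyond range        -> far
    return 0;
}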

As the system is designed today, the robot only moves with a forward motion and a rotational motion. It was therefore concluded that there was no need to place the camera and the front-viewing sensor on a rotating platform, since the robot's motions do not require it to see in any direction other than straight forward.

7.3 Research question 3

The final question is formulated as: Is it possible to do the sensor fusion completely on an embedded system which can be placed on the robot? It was found that it is possible to run the sensor fusion for the navigation, along with the actual navigation algorithms, completely on the chosen embedded system. It is hard to determine whether the human detection would run at the desired speed on this platform since that part of the system is not working. However, the frame rate would need to be sped up roughly 20 times to achieve this, and it is questionable whether that kind of improvement is possible. There was never any time to try to implement part of the human detection on the FPGA, even though doing so would speed up the detection. Alternatively, a higher performing system would be a viable option to increase the frame rate of the human detection.

The navigational algorithm is fast enough to run on this system. At this point in the development, the navigation is deemed the most important part of the implementation since the obstacle detection will still indicate when the robot gets too close to humans.


8 Future work

In order to test the product in a hospital environment the human detection has to be fixed. This requires a different, more powerful hardware system which would make it possible to reach a higher frame rate. Getting the detection to work will also require further investigation to identify and correct the current problems in the human detection.

For initial testing this is the only thing missing. The initial testing in a hospital should consist of the robot moving safely through a corridor without bumping into anyone or anything.

Further testing would involve a new navigational algorithm where the floor plan is known to the robot and the travel route is decided from it. These tests would include letting the robot move from one place in the hospital to another. As described earlier, a combination of the now implemented fuzzy navigation algorithm with the A* algorithm for global path planning, from the method described under Method 1 in section 2.4.1, is suggested, since this would mean that the navigation as it is today can be kept and the new parts simply added to it. An overview of the suggested method is shown in Fig. 17.
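As a sketch of how the suggested combination could be structured in code, the loop below (hypothetical C++) lets a global planner hand waypoints to the existing local fuzzy controller. The planner here is a trivial stand-in rather than a real A* search, and all names, thresholds and the dummy motion update are assumptions made for illustration only.

// Sketch of the suggested hybrid navigation loop (hypothetical C++).
// planGlobalPath() is a stand-in for an A* search over the known floor plan,
// and the call to the fuzzy controller is left as a comment.
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x, y; };

// Stand-in global planner: returns the goal as the only waypoint. A real
// implementation would run A* on a grid built from the floor plan.
std::vector<Point> planGlobalPath(Point /*start*/, Point goal) {
    return {goal};
}

int main() {
    Point pose{0.0, 0.0};                       // from the motor-encoder odometry
    std::vector<Point> path = planGlobalPath(pose, Point{10.0, 5.0});
    std::size_t next = 0;

    while (next < path.size()) {
        double dx = path[next].x - pose.x;
        double dy = path[next].y - pose.y;

        if (std::hypot(dx, dy) < 0.3) {         // waypoint reached (0.3 m, assumed)
            ++next;                             // hand the next waypoint to the
            continue;                           // local controller
        }
        double targetHeading = std::atan2(dy, dx);
        // fuzzyStep(sLeft, sFront, sRight, targetHeading - currentHeading);
        // ...update pose from the encoders as in section 5.1.2...
        (void)targetHeading;
        pose.x += 0.1 * dx;                     // dummy motion so the sketch ends
        pose.y += 0.1 * dy;
    }
    std::printf("goal reached\n");
    return 0;
}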

Since the question of adjustable autonomy has yet to be studied, further investigations will have to be made to determine how to address this issue. This will have to include some sort of study where caregivers and patients are allowed to give their input on autonomous transportation, to better determine the benefits and drawbacks as well as how to avoid unpleasant encounters.

Figure 17: The suggested design for the navigation algorithm.

During the later parts of this thesis the question of ethics was raised: is it ethically acceptable to use cameras to navigate a hospital environment, given that patients may be among the people being detected? In this case no images will be saved, nor will they be seen by any humans; however, the cameras are visible, and how will this affect the people the robot interacts with? These are all questions that have to be answered before this product can be launched.


References
