Sampling and predicting geographic areas using participatory sensing


IT 15083

Degree project, 30 credits

December 2015


Wei Wang

Department of Information Technology


Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, House 4, Floor 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Fax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student

Abstract


Participatory sensing is the concept that people use sensors to retrieve information from the environment independently and contribute it to build a whole body of knowledge. With the popularity of mobile devices such as smart phones, which have multiple sensors and wireless interfaces, “participatory sensing” has become feasible on a large scale. Spatial sampling is a technique that uses a limited number of geographical samples to achieve high credibility in measurement and then predicts data values for unsampled areas. In this paper, participatory sensing is combined with spatial sampling and prediction, and evaluated under various scenarios.

An approach is designed that samples spatial data through participatory sensing, predicts values for the unsampled areas, and evaluates the prediction results. A Java system prototype is implemented based on the design. Perlin noise and the ONE simulator are used to simulate spatial sampling with participatory sensing. Three prediction algorithms are applied: Voronoi diagram, Delaunay triangulation with gradient, and ordinary Kriging. The predictions are evaluated by the pixel-wise root-mean-square error (RMSE) between the true map and the predicted map. The results of the experiments indicate that the Voronoi diagram generally has a larger error than Delaunay triangulation with gradient when only a few samples are available. Ordinary Kriging produces the most accurate results, but it has the highest time complexity and requires a large number of samples to achieve high accuracy. In addition, more evenly distributed samples lead to more accurate predictions. Given proper guidance, participants in participatory sensing can substantially improve the quality of spatial sampling.

Printed by: Reprocentralen ITC

Examiner: Edith Ngai


Contents

Chapter 1. Introduction

1.1 Background and motivation

1.2 Problem description

1.3 Thesis structure

Chapter 2. Relevant research

2.1 Spatial sampling

2.2 Participatory Sensing

2.3 Participatory sensing in the context of disasters

Chapter 3. Experiment design

3.1 Experiment scenario

3.2 Experiment design

3.3 Technology roadmap

Chapter 4. Techniques of sampling spatial data

4.1 Perlin noise

4.1.1 Why Perlin noise?

4.1.2 Noise

4.1.3 What is Perlin noise?

4.2 The ONE simulator

4.2.1 What is the ONE simulator?

4.2.2 Why the ONE simulator?

4.2.3 The ONE simulator in system prototype

Chapter 5. Prediction Methods

5.1 Voronoi diagram

5.1.1 What is a Voronoi diagram?

5.1.2 Algorithm of Voronoi diagram

5.2 Delaunay triangulation with gradient

5.2.1 Delaunay triangulation

5.2.2 Barycentric coordinates

5.2.3 Implementation of barycentric coordinates

5.2.4 Delaunay triangulation with gradient

5.3 Kriging

5.3.1 Regionalized variables

5.3.2 Variogram function

5.3.3 Kriging

Chapter 6. Evaluation of prediction methods

6.1 Grayscale image

6.2 RMSE

Chapter 7. System prototype

7.1.1 System prototype development environment

7.1.2 System prototype structure

7.1.3 System prototype process design

7.2 Input

7.2.1 Default settings of simulation

7.2.2 Simulation time

7.2.3 Update interval

7.2.4 Number of hosts

7.2.5 Moving speed

7.2.6 Perlin noise scale

7.3 Output

Chapter 8. Data collection and analysis

8.1 Results analysis - simulation time

8.2 Results analysis - update interval

8.3 Results analysis - number of hosts

8.4 Results analysis - Moving speed

8.5 Results analysis - Scale of Perlin noise

Chapter 9. Conclusion and future work

9.1 Conclusion

9.2 Future work

Reference


Chapter 1. Introduction

1.1 Background and motivation

With the popularity of mobile devices, especially smart phones, more and more functions have been developed for this handheld equipment. With the help of the mobile internet, smart phones can capture, transmit and store texts, images, locations and other kinds of data, interactively and autonomously [1]. Wireless devices usually have relatively low battery and memory capacity, but they can interface with infrastructure easily. With integrated sensors, smart phones can act as sensor nodes in a wireless network. Participatory sensing is the concept that people use sensors to retrieve information from the environment independently and contribute it to build a whole body of knowledge [2]. Thanks to the widespread adoption of mobile devices, constructing a large-scale participatory sensing network has become possible [3].

Natural disasters have long been one of the most hotly debated subjects among scientists all over the world, not only because it is difficult to predict precisely when they will occur, but also because the resulting destruction brings huge economic losses, leaves countless people homeless, and can even cause millions of deaths. According to the “Annual Disaster Statistical Review 2011”, natural disasters killed 30,773 people and caused 244.7 million victims worldwide in 2011. The estimated economic losses from natural disasters were US$ 366.1 billion [4]. These figures cover only registered disasters and are appalling even before the many smaller, unregistered disasters are counted. How to obtain up-to-date and useful information from a disaster area so that lives can be rescued in time is therefore one of the most meaningful research topics. In addition, monitoring the status of disaster areas is important for disaster rehabilitation work.

Participatory sensing leverages the wider public, with each participant collecting a small, separate piece of data. The purpose is to integrate the separate datasets into overall information about an area. Usually, the whole body of knowledge offers much greater accuracy and scope than any single piece of it [2]. For example, an earthquake could destroy roads, bridges and buildings in an area, while planning a safe escape or rescue route requires an up-to-date traffic map of the whole disaster area. If people could use smart phones to report the traffic status at their locations and so build an up-to-date traffic map, more lives could be saved. Using a participatory sensing network to help people in disaster areas is therefore meaningful and worthwhile.

contamination. Local residents who have radiation sensors embedded in their smart phones act as moving nodes in a participatory sensing network. The smart phones collect radiation data periodically and store it locally. The data can be uploaded to a base station once per day or transferred in real time over a wireless network. Applying a suitable prediction method to the collected data generates a predicted radioactive contamination map.

The research leverages the public to collect radioactive contamination data, as local residents are likely to be willing to participate for the sake of their own safety. It helps scientists gather a large amount of first-hand data with less time and cost. Furthermore, the predicted radioactive contamination maps help to predict how the contamination is spreading and support rehabilitation work.

1.2 Problem description

This paper focuses on using a few samples, collected through participatory sensing, to predict spatial data for a whole area. An approach is designed and implemented for this purpose, which could also benefit other research on geographic areas. Different sample distributions are generated by applying different values to the parameters of the participatory sensing process, which simulates different data sampling situations. The analyzed results can be used to guide participants to collect more valuable pieces of data and to make participatory sensing more efficient in time and cost.

The thesis addresses the following questions:

1. Within a certain area, which factors of the sampling process affect the sample distribution?

2. To what extent do these factors affect the sample distribution?

3. With different numbers and distributions of samples, which prediction method is more suitable?

4. How can participants be guided to collect more valuable data?


Figure 1.1 Procedures for predicting a radiation map

In this paper, different values for the participatory sensing parameters are applied to generate different distributions of samples. With the same set of samples, three prediction methods are applied to produce predicted maps: Voronoi diagram, Delaunay triangulation with gradient, and ordinary Kriging. Conclusions are drawn from the prediction results from different aspects.

1.3 Thesis structure

Chapter 2 describes research related to spatial data and participatory sensing. In this paper, spatial data is the research content, and participatory sensing is the research background and approach. Section 2.1 introduces the concepts of spatial data and spatial sampling. Section 2.2 describes what participatory sensing is, the differences between a wireless sensor network and a participatory sensing network, and the premise that makes participatory sensing feasible on a large scale.

Chapter 3 introduces the experiment scenario and experiment design. The problem to be solved is measuring the radioactive contamination level in Fukushima and places nearby. In this chapter, an approach is designed that simulates participatory sensing to sample and predict the radioactive contamination level in Fukushima and places nearby, and the related technology roadmap is proposed.

Chapter 4 discusses the techniques applied in sampling spatial data. The following questions are discussed in Chapter 4: (a) What is Perlin noise? (b) Why use Perlin noise instead of true data? (c) What is the ONE simulator? (d) Why use the ONE simulator to simulate the sampling process instead of real sampling?

In Chapter 5, the three prediction methods mentioned in Section 1.2 are introduced. To evaluate the performance of the different prediction methods, the system prototype uses grayscale images and the root-mean-square error (RMSE) to compare the true and predicted spatial data maps. They are introduced in Chapter 6.

Chapter 7 introduces the design and implementation of the system prototype and the inputs of the experiments.

Chapter 8 presents the results of the experiments according to the inputs introduced in Section 7.2. Each section in this chapter shows the results of varying one variable while keeping the other variables fixed, and analyzes the three prediction algorithms from different aspects.


Chapter 2. Relevant research

The popularization of mobile devices prompts people to develop more and more functions for them. With the help of the mobile internet, smart phones can capture, transmit and store texts, images, locations and other kinds of data, interactively and autonomously [1]. These features make it possible to construct a large-scale participatory sensing network.

Radioactive contamination is a kind of data that needs to be identified by geographic location and stored as coordinates or topology. Such data are called spatial data [6]. Spatial data are used to describe the properties of objects. A related concept is spatial sampling, which is the theoretical foundation for predicting spatial data from only a few samples [7]. Research on spatial data has long been a much-discussed topic in geostatistics [8]. As early as 1951, Danie G. Krige tried to solve mine valuation problems with statistical methods, which became the basis of Kriging [9].

In this paper, spatial data is the research content, and participatory sensing is the research background and approach. Section 2.1 introduces the concepts of spatial data and spatial sampling. Section 2.2 describes what participatory sensing is, the differences between a wireless sensor network and a participatory sensing network, and the premise that makes participatory sensing feasible on a large scale.

2.1 Spatial sampling

As mentioned above, all the data and geographic information discussed in this paper are spatial data. Spatial data are used to describe the location, shape, size, distribution and other properties of objects in the real world [6]. For example, radioactive contamination, humidity and temperature are all spatial data. Each item of spatial data has a unique location in a spatial coordinate system. Nowadays, spatial data are widely used in daily life, for example in transportation, urban planning, information communication, aerospace and satellite positioning [10].

In this paper, we start with the story of radioactive contamination in Fukushima, which is an example of spatial data. On March 11, 2011, a magnitude-9 earthquake and the resulting tsunami struck Fukushima. It crippled the Fukushima Daiichi nuclear plant, which led to a massive release of radiation into the atmosphere and the ocean. Long exposure to radiation is known to be harmful to living creatures, and the damage is irreversible [13]. Daily monitoring of the radiation level in Fukushima and places nearby is therefore very important to local residents.

2.2 Participatory Sensing

With the popularity of mobile devices such as smart phones, which have multiple sensors and wireless interfaces, “participatory sensing” has become feasible on a large scale. Participatory sensing is the concept that people use sensors to retrieve information from the environment independently and contribute it to build a whole body of knowledge. Wireless devices usually have relatively low battery and memory capacity, but they can interface with infrastructure easily. Because smart phones are location-aware, they can act as spatial sensor nodes in a wireless network [1].

Wireless sensor network research, motivated by military, industrial and scientific applications, has investigated integrating sensing, uploading and computation in sensors to collect data. The difference between a wireless sensor network (WSN) and a participatory sensing network (PSN) is that the sensors in a PSN are generally controlled individually by users [2]. To be precise, the sensors in a WSN are pre-deployed and controllable devices, whereas the sensors in a PSN are controlled by the participants themselves and behave more like free-moving, always-on sensors. Usually, the number of sensors in a PSN is much larger than in a WSN, and the purpose of data collection is aimed more at the public sphere. In short, participatory sensing leverages the public to build an interactive and participatory network.

Usually, organizations distribute sensors to members to sense information from the environment. To arouse public interest in taking part, participants are usually paid or benefit from the sensing results [2]. For example, the radioactive contamination around Fukushima is life-threatening and closely bound up with local residents' daily life. The results of sensing radioactive contamination are important to the public, which makes large-scale participatory sensing possible.

In summary, participatory sensing is a concept that involves civic engagement, data collection, computational thinking, math and science. It leverages the public to collect data and contribute to the whole body of knowledge.

2.3 Participatory sensing in the context of disasters

humans who experience or fall victim to a disaster often react more quickly than any government or organization [14][15]. In addition, they can provide information on the extent of the damage and the evolution of the disaster. With the help of smart phones and cloud services, messages and images related to the disaster can easily be spread to the outside world. This dynamic, real-time data is critical for building information about shelter locations, family tracing and missing people [14]. Combining pieces of information into a real-time situation map gives people more awareness of the current situation and contributes to better decision-making. In the context of disasters, a thorough combination of spatial sampling and participatory sensing networks is a promising and meaningful solution.

One research direction concentrates on spatial sampling [15], for example on improving the accuracy of spatial sampling. Another direction concentrates on the network [14]: how to make sure that data are transferred intact and on time, how to transfer different types of data in an ad-hoc network, and how to avoid traffic congestion in the ad-hoc network. Other work focuses on disaster management [14][15]. Its topics involve civic engagement and collaboration between the different roles in a disaster: how to use social media, such as smart phone applications, to build a bridge between on-site human sensors and the responders of the related organizations and governments; how to collect useful information effectively and build a complete map for rescue during a disaster and rehabilitation after it; how to deliver useful messages rapidly to the corresponding decision makers and responders; and how to use current information to reach more reliable decisions.

This thesis focuses on the integration of spatial sampling and participatory sensing in order to draw conclusions about which aspects affect the prediction results and how to guide people in participatory sensing to generate more meaningful information. In addition, a system architecture is designed and implemented as a proof of concept.


Chapter 3. Experiment design

3.1 Experiment scenario

A magnitude-9 earthquake and the resulting tsunami struck Fukushima on March 11, 2011. It crippled the Fukushima Daiichi nuclear plant, which led to a massive release of radiation into the atmosphere and the ocean. Monitoring the radiation level in Fukushima and places nearby became a regular task for the Japanese government and the public [16]. Long exposure to radiation is known to be harmful to living creatures, and the damage is irreversible. In this case, scientists tried to predict the radiation level for the whole area by collecting only a few samples.

This method is clearly easier than sampling everywhere in the area in terms of safety, efficiency and economy. Besides, sampling everywhere is impossible under some circumstances, for example when the area is extremely large or some spots are out of reach. In conclusion, using a few samples to predict the corresponding spatial data for a whole area is preferable.

Figure 3.1 shows the procedure for obtaining a predicted radiation level map for places near Fukushima from a few samples. First, samples are collected from the contaminated area. The grayscale image (A) is a simulated radiation map of the contaminated area. The gray levels, varying from black at the weakest intensity to white at the strongest, represent the different levels of radioactive contamination [17]. The red spots on image (B) are samples (radioactive contamination data) collected at those places. Second, the radioactive contamination level for the other places in this area is predicted using only the samples gathered in the first step. Image (C) is a predicted radioactive contamination map, generated by a prediction method from the samples in image (B).


3.2 Experiment design

In this experiment, our approach is designed to predict spatial data based on participatory sensing and to calculate the root-mean-square error (RMSE) to evaluate the performance of different algorithms [18]. The approach is applicable to sampling and predicting different types of scalar spatial data, for example temperature, humidity, traffic conditions and environmental pollution. To evaluate the algorithms precisely, the Perlin noise technique is used to generate a “ground truth” map. Perlin noise is a procedural texture generation technique [19]. The ONE (Opportunistic Network Environment) simulator is used to simulate the sampling procedure [20].

In this paper, the experiment area is 500 square kilometers, which is approximately the size of a medium or large city. Perlin noise generations of 500*500 pixels are produced to simulate the radioactive contamination levels in this area. Random waypoint is chosen as the mobility model of the participants, since it is one of the most fundamental and widely used movement models. In the experiment, the number of participants, the simulation time, the update interval and the participants' moving speed are varied.

Figure 3.2 shows the true map of the radiation level (A), the true map with samples (B) and the predicted map (C). The experiment is divided into three procedures: sampling (1), prediction (2) and algorithm evaluation (3). The evaluation in procedure (3) is obtained by comparing the true map (A) with the predicted map (C).

Figure 3.2 Experiment design: sampling, prediction and evaluation

3.3 Technology roadmap

prediction and evaluation. The techniques mentioned in Figure 3.3 are introduced in Chapters 4, 5 and 6.

Figure 3.3 Technology roadmap

To carry out all the procedures shown in Figure 3.2, a prototype system is implemented in Java. The prototype system is designed to perform the following tasks:

1. Produce Perlin noise generations.

2. Integrate with the ONE simulator, sampling data according to input parameters.

3. Implement three different prediction algorithms, Voronoi diagram, Delaunay triangulation with gradient, and Ordinary Kriging.

4. Compare the ground truth with the predicted map using the RMSE to understand how well the prediction works.

5. Provide a batch mode that produces results for different inputs.
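Task 4 in the list above compares the ground truth with the predicted map pixel by pixel. A minimal sketch of that comparison follows. It is written in Python for brevity and is not the Java code of the prototype; the function name `rmse` and the representation of a map as a list of rows are assumptions made for this illustration.

```python
import math


def rmse(true_map, predicted_map):
    """Pixel-wise root-mean-square error between two equally sized 2D maps."""
    if len(true_map) != len(predicted_map):
        raise ValueError("maps must have the same number of rows")
    squared_error = 0.0
    count = 0
    for true_row, predicted_row in zip(true_map, predicted_map):
        if len(true_row) != len(predicted_row):
            raise ValueError("maps must have the same number of columns")
        for t, p in zip(true_row, predicted_row):
            squared_error += (t - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    # Mean of the squared pixel differences, then the square root.
    return math.sqrt(squared_error / count)
```

A perfect prediction gives an RMSE of 0, and larger values indicate a larger average deviation from the ground truth.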


Chapter 4. Techniques of sampling spatial data

In this paper, the approach is based on a participatory sensing network: Perlin noise generations simulate spatial data, which are then sampled and predicted. Prediction techniques are designed, implemented and evaluated. The approach is shown in Figure 3.2. The grayscale image (A) is a true map of one kind of spatial data for a square area, and the grayscale value of each pixel represents the spatial data value at that point. The red points in image (B) are samples. Image (C) is the prediction based only on the samples gathered in image (B). The evaluation compares the true map (A) with the predicted map (C) pixel by pixel. The paper follows Figure 3.2 in introducing the related techniques and the implemented experiment.

According to Figure 3.2, the project is divided into three stages: sampling, prediction and evaluation. Chapter 4 discusses the techniques applied in sampling spatial data; this process is marked as (1) in Figure 3.2. The following questions are discussed in Chapter 4: (a) What is Perlin noise? (b) Why use Perlin noise instead of true data? (c) What is the ONE simulator? (d) Why use the ONE simulator to simulate the sampling process instead of real sampling?

4.1 Perlin noise

Perlin noise is a procedural texture generation technique [19]. It is a type of noise that appears smooth and natural, and it is widely used in computer graphics. In real-time computer games in particular, Perlin noise is used to trade time for space [21]. In this paper, Perlin noise is used to simulate spatial data. The reasons for using Perlin noise instead of real spatial data are stated in Section 4.1.1. Section 4.1.2 introduces noise functions in general, and Section 4.1.3 introduces Perlin noise and how its generations are produced.

4.1.1 Why Perlin noise?

unavailable, because of the huge investment of time and resources required. In some special cases, sampling an area is impracticable. In contrast, Perlin noise is produced by a computer program. It is quick, cheap and easily customized by varying variables [23]. Besides, Perlin noise generations are repeatable. This “pseudo-random” feature makes Perlin noise a very suitable subject for assessing prediction methods [24]. Finally, thousands of Perlin noise generations can be produced under the same conditions by varying the value of the random seed. To be precise, the experiments are run on multiple Perlin noise generations that differ in appearance but share the same variable settings. Averaging over the repeated experiments gives a more reliable result.

4.1.2 Noise

Noise is a primitive texture. It can be used to create a wide variety of natural-looking textures. Combining noise in different mathematical expressions generates procedural textures, and the noise itself is described mathematically as a noise function [23]. A noise function has three features, which are shared by Perlin noise.

Pseudo-random

Images produced by a noise function appear random and irregular, but they are not truly random: given the same input, a noise function always produces the same output [23]. This feature makes experiments based on Perlin noise repeatable.

R^n to R

Noise is a mapping from R^n to R, where 'n' is the dimension of the space. Given an n-dimensional real coordinate in the space, the noise function returns a real value. The most commonly used dimensions are n=1, n=2 and n=3. Every coordinate in the space therefore has a corresponding value. This feature is shared with spatial data, and it is the theoretical foundation for using Perlin noise to simulate spatial data.

Band-limited

If noise is regarded as a signal, almost all of its energy is concentrated in a small part of the frequency spectrum. In other words, the very high and very low frequencies contribute very little energy [23]. This feature is shared with many natural phenomena, as most things in nature follow a normal distribution. Going one step further, this feature is the theoretical basis for the natural appearance of Perlin noise.

4.1.3 What is Perlin noise?

Perlin noise is a procedural texture technique. It was developed by Ken Perlin, who received a Technical Achievement Award from the Academy of Motion Picture Arts and Sciences in 1997 [24]. Perlin noise is a type of gradient noise and is used to make computer graphics look more realistic. The function generates a pseudo-random appearance, as mentioned in Section 4.1.2: given the same input, the Perlin noise function produces the same output. This property makes it easy to control, and the results are repeatable. Multiple scaled copies of Perlin noise can be combined in mathematical expressions to generate a variety of procedural textures [24]. Images generated from Perlin noise simulate natural appearances with high quality. Most textures of natural objects, such as smoke, cloth, fire and marble, can be produced from Perlin noise. Figure 4.1 shows computer graphics generated from Perlin noise.

Figure 4.1 Computer graphics generation based on Perlin noise


Figure 4.2 Perlin noise generations in scale 1 (A) and scale 4 (B)

With the same scale, different appearances of Perlin noise are produced by varying the random seed. In the project, the seeds 1, 10, 100, 1000 and 10000 are chosen to reduce the impact of any specific Perlin noise generation and to give relatively reliable results. Perlin noise generations in scale 4 with seeds 1 and 10 are shown in Figure 4.3.
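The roles of scale and random seed can be illustrated with a small two-dimensional gradient noise generator. The sketch below is a textbook-style Perlin noise written in Python for brevity, not the generator used in the Java prototype. The function name `perlin_map`, the interpretation of `scale` as the number of lattice cells along each axis, and the normalisation of values to [0, 1] are all assumptions made for this illustration.

```python
import math
import random


def perlin_map(width, height, scale, seed):
    """Return a height x width grid of 2D Perlin noise values in [0, 1].

    Assumption of this sketch: `scale` is the number of lattice cells along
    each axis, so a larger scale gives finer detail.
    """
    if scale < 1:
        raise ValueError("scale must be at least 1")
    rng = random.Random(seed)  # same seed -> same gradients -> same map

    # One pseudo-random unit gradient vector per lattice corner.
    gradients = []
    for _ in range(scale + 1):
        row = []
        for _ in range(scale + 1):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            row.append((math.cos(angle), math.sin(angle)))
        gradients.append(row)

    def fade(t):
        # Perlin's smoothing curve 6t^5 - 15t^4 + 10t^3.
        return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)

    def lerp(a, b, t):
        return a + t * (b - a)

    def corner(ix, iy, x, y):
        # Dot product of the corner gradient with the offset to (x, y).
        gx, gy = gradients[iy][ix]
        return gx * (x - ix) + gy * (y - iy)

    def noise(x, y):
        x0 = min(int(x), scale - 1)
        y0 = min(int(y), scale - 1)
        u, v = fade(x - x0), fade(y - y0)
        top = lerp(corner(x0, y0, x, y), corner(x0 + 1, y0, x, y), u)
        bottom = lerp(corner(x0, y0 + 1, x, y), corner(x0 + 1, y0 + 1, x, y), u)
        return lerp(top, bottom, v)

    grid = []
    for py in range(height):
        row = []
        for px in range(width):
            # Map the pixel centre into lattice coordinates in (0, scale).
            value = noise((px + 0.5) * scale / width, (py + 0.5) * scale / height)
            # Rescale from roughly [-sqrt(2)/2, sqrt(2)/2] to [0, 1] and clamp.
            row.append(min(1.0, max(0.0, value / math.sqrt(2.0) + 0.5)))
        grid.append(row)
    return grid
```

Under these assumptions, `perlin_map(500, 500, 4, 1)` and `perlin_map(500, 500, 4, 10)` would give two different 500*500 maps at the same scale, while repeating a call with the same seed reproduces the same map. Multiplying the values by 255 gives grayscale levels.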

Figure 4.3 Perlin noise generations in scale 4 with different random seed values

4.2 The ONE simulator

In this project, the ONE (Opportunistic Network Environment) simulator was chosen to simulate the participatory sensing process. Nodes in the ONE simulator act as participants (data collectors). The ONE simulator provides multiple movement models for sensor nodes. In addition, some parameters of the project (the number of sensor nodes, the moving speed and the total simulation time) are set in the ONE simulator.

noise, the ONE simulator can simulate participants moving as preset and periodically reading and uploading the spatial data at their current positions.

4.2.1 What is the ONE simulator?

The ONE simulator is the Opportunistic Network Environment simulator. It was developed by researchers at Helsinki University of Technology in Finland [25]. Opportunistic networking is a subclass of delay-tolerant networking in which network contacts are intermittent or link performance is highly variable. In this case, there is no end-to-end path between source and destination for most of the time. Moreover, the path can be highly unstable and may change or break frequently. To make communication possible in opportunistic networks, the intermediate nodes use different protocols to ferry messages [26].

The ONE simulator can generate node movement according to different movement models. It also routes messages between nodes with different routing algorithms and various sender-receiver types. In addition, users can visualize node movement and message delivery in real time in the graphical user interface of the ONE. It also provides multiple ways to visualize data and results. Figure 4.4 shows the components of the ONE simulator [20].

Figure 4.4 Components of the ONE simulator [20]

4.2.2 Why the ONE simulator?

ONE simulator provides a batch mode for running simulations, which is suitable for a large number of experiments. Fourthly, the ONE simulator is developed in Java, and the system prototype is also based on Java, so the ONE simulator can be integrated into the system prototype seamlessly.

4.2.3 The ONE simulator in system prototype

Perlin noise generations are used to simulate spatial data maps, and the ONE simulator is used to simulate sensing and sampling the spatial data. Combining them makes it possible to model participatory sensing. Random waypoint is chosen as the nodes' movement model in the simulations, for two main reasons. Firstly, it is the most commonly used general-purpose movement model [27]. Secondly, the movement of people is relatively random compared to pre-deployed sensors. Simulation conditions are set by varying the values of parameters in the ONE simulator. The parameters that need to be set are as follows:

• (X,Y): The size of the Perlin noise generation, as well as the size of the movement area for sensors.

• Simulation Time (s): The total time of the simulation.

• Update Interval (s): The time interval at which participants (nodes) read and upload data.

• Number of Hosts: The number of participants (nodes).

• Moving Speed (m/s): The moving speed of participants (nodes).

• Mobility Model: The movement model for participants (nodes); it is set to random waypoint.

• Mobility Random Seed: The value of the movement random seed.
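How the parameters listed above interact can be sketched with a simplified random waypoint sampler. The code below is written in Python for brevity and does not use the ONE simulator or reproduce its implementation. The function `random_waypoint_samples`, its parameter names, the constant speed, the absence of pause times and the treatment of one pixel as one metre are all assumptions made for this illustration.

```python
import math
import random


def random_waypoint_samples(true_map, sim_time, update_interval,
                            num_hosts, speed, seed):
    """Move hosts by random waypoint and sample `true_map` at each update.

    Returns a dict mapping (x, y) pixel coordinates to the sampled value.
    """
    height, width = len(true_map), len(true_map[0])
    if width * height < 2:
        raise ValueError("map must contain at least two pixels")
    if update_interval <= 0 or speed < 0:
        raise ValueError("update_interval must be > 0 and speed >= 0")
    rng = random.Random(seed)  # mobility random seed

    def random_point():
        return (rng.uniform(0, width - 1), rng.uniform(0, height - 1))

    samples = {}
    for _ in range(num_hosts):
        position = random_point()
        waypoint = random_point()
        elapsed = 0.0
        while elapsed <= sim_time:
            # Read the map at the host's current (rounded) position.
            px, py = int(round(position[0])), int(round(position[1]))
            samples[(px, py)] = true_map[py][px]
            # Travel towards the waypoint for one update interval.
            remaining = speed * update_interval
            while remaining > 0:
                dx = waypoint[0] - position[0]
                dy = waypoint[1] - position[1]
                distance = math.hypot(dx, dy)
                if distance <= remaining:
                    # Waypoint reached: head for a new random waypoint.
                    position = waypoint
                    remaining -= distance
                    waypoint = random_point()
                else:
                    step = remaining / distance
                    position = (position[0] + dx * step,
                                position[1] + dy * step)
                    remaining = 0
            elapsed += update_interval
    return samples
```

In this sketch, a longer simulation time, a shorter update interval or more hosts each increase the number of readings, while the moving speed changes how far apart consecutive readings of one host are.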


Chapter 5. Prediction Methods

In this chapter, three prediction methods for spatial data are introduced: Voronoi diagram, Delaunay triangulation with gradient, and Kriging. The Voronoi diagram is a classic nearest-neighbor search structure that is widely used in computational geometry [28]. It is introduced first because its idea is simple and it is easy to implement: it is natural to use the value of a sample to predict the area near it. Delaunay triangulation is the dual of the Voronoi diagram. Delaunay triangulation with gradient combines Delaunay triangulation [29] with barycentric coordinates [30]. Kriging is a geostatistical estimator, an optimal interpolation based on observed values and weights derived from spatial covariance values [31]. The three prediction methods are presented in order of increasing implementation complexity.

5.1 Voronoi diagram

5.1.1 What is Voronoi diagram

The Voronoi diagram is a fundamental geometric data structure, also called the Dirichlet tessellation. It was first proposed by Dirichlet in 1850 [32]. A Russian mathematician, Georgy Fedoseevich Voronoi, then gave a further explanation in 1907 [33]. A Voronoi diagram is a way to divide a space into a number of regions. Let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of specified points in a space, which are called seeds. For each seed there is a corresponding region consisting of all points that are closer to this seed than to any other. These regions are called "Voronoi cells", and also Thiessen polygons. A Thiessen polygon has the following properties:

1. Every Thiessen polygon contains exactly one seed.

2. Every point inside a Thiessen polygon has exactly one closest seed, which is the seed of that polygon.

3. The points on the edges of a Thiessen polygon are at equal Euclidean distance from two adjacent seeds. In other words, the edges of a Thiessen polygon lie on the perpendicular bisectors of the line segments connecting adjacent seeds.


Figure 5.1 Voronoi diagram

The Voronoi diagram supports nearest-neighbor searching [28]. It is chosen as the first prediction method in my project because of its simple idea and straightforward realization. It is widely used in data analysis, data mining and data prediction, and in fields such as geometry, crystallography, geography and meteorology.

5.1.2 Algorithm of Voronoi diagram

The idea of the Voronoi diagram is easy to understand. First, divide the area into pieces such that each piece contains exactly one sample. Then, the predicted value of every point in a piece is the value of the sample in that piece.

The implementation of the Voronoi diagram is the easiest of the three prediction methods. The key is to find the Thiessen polygons for the given samples. The procedure for producing the Voronoi diagram can be decomposed into the following steps [34].

1. Input the samples' coordinates and values.

2. Traverse all the points on the plane. For each point, do the following operations:

   1) Calculate the Euclidean distance to each sample.

   2) Count the number of nearest samples.

   3) Classify the point according to the number of nearest samples. A point with only one nearest seed is inside a Thiessen polygon. A point with more than one nearest seed is on an edge of a Thiessen polygon.

   4) Assign a point inside a Thiessen polygon the value of the sample in the same polygon. Assign a point on an edge of a Thiessen polygon the mean of the values of its closest samples.

   5) Generate a grayscale image according to the assigned values.
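The per-pixel steps above can be sketched as a brute-force nearest-sample search. This is a minimal illustration rather than the prototype's actual code: the class name `VoronoiPredictor` and the array layout (`samples[i] = {x, y, value}`, result indexed as `[y][x]`) are assumptions made for the example, and the grayscale rendering step is left out.

```java
/** Hypothetical sketch of Voronoi (nearest-sample) prediction on a pixel grid. */
final class VoronoiPredictor {

    /**
     * samples[i] = {x, y, value}; returns the predicted map indexed as [y][x].
     * Pixels with several equally near samples get the mean of those samples.
     */
    static double[][] predict(double[][] samples, int width, int height) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double[][] predicted = new double[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double best = Double.POSITIVE_INFINITY; // smallest squared distance so far
                double sum = 0.0;                       // sum of values of the nearest samples
                int count = 0;                          // number of nearest samples
                for (double[] s : samples) {
                    double dx = s[0] - x, dy = s[1] - y;
                    // Squared distance gives the same ordering as the Euclidean distance.
                    double d2 = dx * dx + dy * dy;
                    if (d2 < best) {
                        best = d2; sum = s[2]; count = 1;
                    } else if (d2 == best) {
                        sum += s[2]; count++;           // pixel lies on a cell edge
                    }
                }
                predicted[y][x] = sum / count;
            }
        }
        return predicted;
    }
}
```

The sketch costs one distance computation per pixel and sample, which is acceptable for maps and sample counts of the size used in the figures but is not an optimized Voronoi construction.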


Figure 5.2 True map of Perlin noise in scale 1 (A) and predicted map with Voronoi diagram by using 2000 samples (B)

Figure 5.3 True map of Perlin noise in scale 4 (A) and predicted map with Voronoi diagram by using 2000 samples (B)

5.2 Delaunay triangulation with gradient


5.2.1 Delaunay triangulation

The idea of Delaunay triangulation is to divide an area into a triangle mesh. In this paper, all the vertices of the triangles are samples. The rules of Delaunay triangulation constrain the triangle mesh to be unique and as regular as possible.

In order to illustrate the Delaunay triangulation, we define Delaunay edges first. Given a point set P, the edge set E consists of edges whose two endpoints are points from P. Let e be an edge in E with endpoints a and b. If e satisfies the "empty circle" property, it is a Delaunay edge. The "empty circle" property holds if and only if there is a circle passing through a and b that does not contain any other point from P. If a triangulation of P contains only Delaunay edges, then it is a Delaunay triangulation [36]. Delaunay triangulation is named after Boris Delaunay for his work on this topic from 1934. Figure 5.4 shows a Delaunay triangulation.

Delaunay triangulation has some good properties:

1. Uniqueness

No matter where the construction of the triangle network starts, the final triangle network is always the same for the same input set of discrete points (assuming that no four points lie on a common circle).

2. Most regular triangle network

Delaunay triangulation maximizes the minimum angle of all the triangles in the triangulation, which yields the most regular triangle network.

3. Regionality

Adding, deleting or moving a vertex of the triangle network only affects the triangles nearby.

Figure 5.4 Delaunay triangulation

5.2.2 Barycentric coordinates

A point in a triangle can be written as a weight of the three vertices $(\lambda_1, \lambda_2, \lambda_3)$ with the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$ [38]. The three parameters of the barycentric coordinates represent the proportional influence of the three vertices. In other words, a vertex closer to the point has a greater effect than the other two vertices, and the corresponding parameter of that vertex is larger than the others. Let the values at the three vertices of a triangle be $r_1$, $r_2$ and $r_3$, and let $r_p$ be the value at a point P in this triangle. Then $r_p$ can be calculated by the formula $r_p = \lambda_1 r_1 + \lambda_2 r_2 + \lambda_3 r_3$.

5.2.3 Implementation of barycentric coordinates

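The weights of the three vertices for a point P in $\triangle ABC$ are the same as the barycentric coordinates of P. Let the vertices A, B and C of the triangle be written as $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, and let the barycentric coordinate of P be $(\lambda_1, \lambda_2, \lambda_3)$ with the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$. Then $\lambda_1 = S_{\triangle PBC} / S_{\triangle ABC}$, $\lambda_2 = S_{\triangle PCA} / S_{\triangle ABC}$ and $\lambda_3 = S_{\triangle PAB} / S_{\triangle ABC}$, where $S$ denotes the area of a triangle, which is shown in Figure 5.5.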

Figure 5.5 Barycentric coordinate

The area of a triangle in the formula can be calculated by Heron's formula [39], which is defined as (5.1). The edges of a triangle are written as $a$, $b$, $c$, and the area of the triangle is written as $S$.

$$S = \sqrt{p(p-a)(p-b)(p-c)}, \quad p = \frac{a+b+c}{2} \qquad (5.1)$$

Given the coordinates of the three vertices A, B, C of $\triangle ABC$ in two-dimensional space, which are $(x_a, y_a)$, $(x_b, y_b)$ and $(x_c, y_c)$, the lengths of the three edges of $\triangle ABC$ are $AB = \sqrt{(x_b - x_a)^2 + (y_b - y_a)^2}$, $BC = \sqrt{(x_c - x_b)^2 + (y_c - y_b)^2}$ and $CA = \sqrt{(x_a - x_c)^2 + (y_a - y_c)^2}$. If a point is on an edge of the triangle, Heron's formula is not suitable anymore. According to the "affection theory", the point is only affected by the two vertices on the same edge. If point P is on the edge AB of $\triangle ABC$, then the …


5.2.4 Delaunay triangulation with gradient

The combination of Delaunay triangulation and barycentric coordinates generates smoothly changing values within the convex hull of the Delaunay triangulation. A point outside the convex hull is predicted with the value of the nearest predicted point. Figure 5.6 shows the true map and the map predicted by Delaunay triangulation with gradient for Perlin noise in scale 1, and Figure 5.7 shows the same for Perlin noise in scale 4.
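The interpolation inside a single triangle, as described in Sections 5.2.2 and 5.2.3, can be sketched as follows. The class `BarycentricInterpolation` and its method names are hypothetical, the triangle is assumed to be non-degenerate, and the point is assumed to lie inside or on the triangle. Clamping the Heron product at zero is a choice made for this sketch so that flat sub-triangles get area 0; it is not necessarily how the prototype handles points on an edge. Building the Delaunay triangulation itself is out of scope here.

```java
/** Hypothetical sketch of barycentric interpolation inside one triangle. */
final class BarycentricInterpolation {

    /** Euclidean distance between two points given as {x, y}. */
    static double dist(double[] u, double[] v) {
        return Math.hypot(u[0] - v[0], u[1] - v[1]);
    }

    /** Triangle area from its three edge lengths by Heron's formula. */
    static double heron(double a, double b, double c) {
        double p = (a + b + c) / 2.0;
        // Rounding can make the product slightly negative for flat triangles, so clamp at 0.
        return Math.sqrt(Math.max(0.0, p * (p - a) * (p - b) * (p - c)));
    }

    static double area(double[] u, double[] v, double[] w) {
        return heron(dist(u, v), dist(v, w), dist(w, u));
    }

    /** Weights (lambda1, lambda2, lambda3) of P with respect to the vertices A, B, C. */
    static double[] weights(double[] a, double[] b, double[] c, double[] p) {
        double total = area(a, b, c);
        return new double[] {
            area(p, b, c) / total,  // sub-triangle opposite A
            area(p, c, a) / total,  // sub-triangle opposite B
            area(p, a, b) / total   // sub-triangle opposite C
        };
    }

    /** Value at P from the vertex values ra, rb, rc: r_p = l1*ra + l2*rb + l3*rc. */
    static double interpolate(double[] a, double[] b, double[] c,
                              double ra, double rb, double rc, double[] p) {
        double[] l = weights(a, b, c, p);
        return l[0] * ra + l[1] * rb + l[2] * rc;
    }
}
```

For the triangle A(0, 0), B(4, 0), C(0, 4) with vertex values 1, 2 and 3, the point P(1, 1) has the weights (0.5, 0.25, 0.25), which gives the value 0.5 · 1 + 0.25 · 2 + 0.25 · 3 = 1.75.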

Figure 5.6 True map of Perlin noise in scale 1 (A) and predicted map with Delaunay triangulation with gradient by using 2000 samples (B)

Figure 5.7 True map of Perlin noise in scale 4 (A) and predicted map with Delaunay triangulation with gradient by using 2000 samples (B)

5.3 Kriging

… perspective, Kriging is an unbiased optimal estimation that is based on the correlation and variability of variables and targets the prediction of regionalized variables. From the differential analysis perspective, Kriging is a method to generate a linear, optimal, unbiased interpolation estimate for spatial data [40]. Kriging is suitable for spatially correlated regionalized variables. In order to understand Kriging, the concepts of regionalized variables and the variogram function are introduced first, in Sections 5.3.1 and 5.3.2 respectively. Kriging itself is introduced in Section 5.3.3.

5.3.1 Regionalized variables

Regionalized variables are variables that are distributed in space; such a variable itself reflects a property of the spatial distribution. Mineral, meteorological and ecological quantities, temperature, humidity, concentration and so on are all regionalized variables. Without the description of their regional properties, these values are meaningless [41]. A regionalized variable has the following features:

1. The regionalized variable Z(X) is a random function. It is a random value before observation.

2. Regionalized variables have a general structure. That is to say, the random variables Z(X) and Z(X + h) at points X and X + h are autocorrelated to some extent. In addition, the autocorrelation depends on h, the distance between the two points, and on the structural characteristics of the regionalized variable.

5.3.2 Variogram function

The variogram function is a basic tool specific to geostatistical analysis and is frequently used in the estimation process. It describes the spatial dependency of a spatial random field [42]; that is, it characterizes spatial correlation. In one-dimensional space, the variogram function is defined as follows. As the point x moves along the X axis, Z(x) and Z(x + h) are the regionalized variables at point x and point x + h. The variogram function for Z(x) on the X axis is denoted $\gamma(h)$, and its expression is shown as (5.2).

$$\gamma(h) = \frac{1}{2} E\left[\big(Z(x) - Z(x+h)\big)^2\right] \qquad (5.2)$$

5.3.3 Kriging

Simply speaking, Kriging interpolation is a method that weights the values of the measurement points around a point to be predicted in order to generate the value of the predicted point. The Kriging interpolation algorithm is similar to the inverse distance weighting method; the key for both of them is to calculate the weights. The commonly used formula for Kriging interpolation is shown as expression (5.3), in which $Z(s_i)$ represents the observed value at position i, $\lambda_i$ is the unknown weight at position i, $s_0$ is the predicted position, and N is the total number of measured positions [42].

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i) \qquad (5.3)$$

In Kriging interpolation, the weights of the measurement points are related not only to the distances between measurement points, but also to the overall spatial distribution of the measurement points. That is to say, the spatial autocorrelation needs to be quantified. Therefore, Kriging interpolation usually has two steps: 1. Create the variogram function and covariance function to estimate the spatial autocorrelation; 2. According to the Kriging interpolation formula, predict the values of unknown points.

First of all, pair all the measurement points and calculate the distance between the points of each pair. Then determine the "lag" according to the shortest distance and the longest distance. For example, Figure 5.8 is the scatter plot obtained with a lag of 10 meters, in which the Y-axis represents the average value of the semivariogram and the X-axis represents the lag.

Figure 5.8 Scatter plot of semivariogram (y-axis: average value of semivariogram, x-axis: lag)
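The pairing and binning step behind such a scatter plot can be sketched as follows. The class `ExperimentalSemivariogram`, the array layout (`samples[i] = {x, y, value}`) and the choice of half-open lag bins `[k * lag, (k + 1) * lag)` are assumptions made for this illustration; the thesis does not state how its bins are delimited.

```java
/** Hypothetical sketch of an experimental semivariogram with fixed-width lag bins. */
final class ExperimentalSemivariogram {

    /**
     * samples[i] = {x, y, value}. Returns one average semivariance per lag bin,
     * where bin k covers distances in [k * lag, (k + 1) * lag).
     * Bins without any pair are reported as NaN.
     */
    static double[] compute(double[][] samples, double lag, int bins) {
        double[] squaredDiffSum = new double[bins];
        int[] pairs = new int[bins];
        for (int i = 0; i < samples.length; i++) {
            for (int j = i + 1; j < samples.length; j++) {     // every pair once
                double h = Math.hypot(samples[i][0] - samples[j][0],
                                      samples[i][1] - samples[j][1]);
                int k = (int) (h / lag);
                if (k >= bins) continue;                        // pair is farther than the last bin
                double diff = samples[i][2] - samples[j][2];
                squaredDiffSum[k] += diff * diff;
                pairs[k]++;
            }
        }
        double[] gamma = new double[bins];
        for (int k = 0; k < bins; k++) {
            // Semivariance: half of the mean squared difference in the bin.
            gamma[k] = pairs[k] == 0 ? Double.NaN : squaredDiffSum[k] / (2.0 * pairs[k]);
        }
        return gamma;
    }
}
```

For three samples on a line with values 0, 1 and 3 at x = 0, 1 and 2 and a lag of 1, the bin for distance 1 averages the squared differences 1 and 4 to (1 + 4) / (2 · 2) = 1.25, and the bin for distance 2 gives 9 / 2 = 4.5.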

The next step is fitting a model to the scatter plot of the experimental semivariogram function. Theoretically, it is similar to regression analysis: it is a process of constructing a curve that has the best fit to the data points. Kriging can use many models for modeling the experimental semivariogram function: the linear model, exponential model, Gaussian model, spherical model, and circular model. In this paper, the exponential model is chosen as the fitting model, since it is the most widely used model. The exponential model is shown in Figure 5.9. The fitting result is shown in Figure 5.10.
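For reference, one common parameterization of the exponential semivariogram model is given below. The symbols $c_0$ (nugget), $c$ (partial sill) and $a$ (range parameter) are assumptions introduced here; the exact form used for Figure 5.9 may differ, for example by a factor of 3 in the exponent.

$$\gamma(h) = c_0 + c\left(1 - e^{-h/a}\right) \text{ for } h > 0, \qquad \gamma(0) = 0$$

Fitting the model then amounts to choosing $c_0$, $c$ and $a$ so that the curve is as close as possible to the points of the experimental semivariogram.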


Figure 5.9 The exponential model

Figure 5.10 Fitting result of exponential model

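How is the value of an unknown point obtained from the semivariogram function and the measurement points? The related points are found, and the value of the unknown point is calculated from the weights of the related points with respect to the unknown point. Here is an example to illustrate. There are three points 1, 2, 3, and a point 0 needs to be predicted. We need to calculate the value of point 0 according to the following equations while ensuring the minimum estimation error. In the expressions, $\gamma(h_{ij})$ represents the semivariance of point i and point j, and $\lambda$ represents the Lagrange multiplier. Expressions (5.4) show how to calculate the weights.

$$\begin{aligned}
w_1\gamma(h_{11}) + w_2\gamma(h_{12}) + w_3\gamma(h_{13}) + \lambda &= \gamma(h_{10}) \\
w_1\gamma(h_{21}) + w_2\gamma(h_{22}) + w_3\gamma(h_{23}) + \lambda &= \gamma(h_{20}) \\
w_1\gamma(h_{31}) + w_2\gamma(h_{32}) + w_3\gamma(h_{33}) + \lambda &= \gamma(h_{30}) \\
w_1 + w_2 + w_3 &= 1
\end{aligned} \qquad (5.4)$$

The weights of points 1, 2, 3 with respect to point 0, denoted $w_1$, $w_2$, $w_3$, are obtained from expressions (5.4). The value of point 0 can be calculated by substituting $w_1$, $w_2$, $w_3$ into formula (5.5).

$$Z_0 = w_1 Z_1 + w_2 Z_2 + w_3 Z_3 \qquad (5.5)$$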

There are several types of Kriging: Ordinary Kriging, Simple Kriging, Universal Kriging, Co-Kriging, Logistic Normal Kriging, Indicator Kriging, Probability Kriging, Disjunctive Kriging and so on. This paper uses the ordinary Kriging method, since it is the most widely used Kriging method. Ordinary Kriging assumes that the mean value is constant and unknown. If there is no scientific basis against this assumption, then it is reasonable [43]. Other Kriging methods are not discussed in this paper; interested readers are referred to the related citations in the references. In this paper, ordinary Kriging is performed with open-source code implemented in Matlab by Wolfgang Schwanghart. The predictions of Kriging for Perlin noise in scale 1 and scale 4 are shown in Figure 5.11 and Figure 5.12 respectively.
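To make the weight calculation of ordinary Kriging concrete, the sketch below sets up the linear system with the semivariances and the Lagrange multiplier for N measurement points and solves it by Gaussian elimination. It is written in Java only for consistency with the prototype and is not the Matlab code used in this paper. The class `OrdinaryKrigingSketch`, the array layout (`pts[i] = {x, y, value}`) and the `gamma` callback (a fitted semivariogram model with `gamma(0) = 0`) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of ordinary Kriging for one target point. */
final class OrdinaryKrigingSketch {

    /** Weights w_1..w_N of the points pts[i] = {x, y, value} for the target (x0, y0). */
    static double[] weights(double[][] pts, double x0, double y0, DoubleUnaryOperator gamma) {
        int n = pts.length;
        double[][] a = new double[n + 1][n + 2];            // augmented matrix [A | b]
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double h = Math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]);
                a[i][j] = gamma.applyAsDouble(h);           // semivariance of points i and j
            }
            a[i][n] = 1.0;                                  // column of the Lagrange multiplier
            a[i][n + 1] = gamma.applyAsDouble(Math.hypot(pts[i][0] - x0, pts[i][1] - y0));
            a[n][i] = 1.0;                                  // constraint: weights sum to 1
        }
        a[n][n + 1] = 1.0;
        return Arrays.copyOf(solve(a), n);                  // drop the multiplier
    }

    /** Gaussian elimination with partial pivoting on an augmented m x (m + 1) matrix. */
    static double[] solve(double[][] a) {
        int m = a.length;
        for (int col = 0; col < m; col++) {
            int pivot = col;
            for (int r = col + 1; r < m; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("singular Kriging system");
            }
            for (int r = col + 1; r < m; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= m; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] x = new double[m];
        for (int r = m - 1; r >= 0; r--) {                  // back substitution
            double s = a[r][m];
            for (int c = r + 1; c < m; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    /** Predicted value at (x0, y0): the weighted sum of the measured values. */
    static double predict(double[][] pts, double x0, double y0, DoubleUnaryOperator gamma) {
        double[] w = weights(pts, x0, y0, gamma);
        double z = 0.0;
        for (int i = 0; i < pts.length; i++) z += w[i] * pts[i][2];
        return z;
    }
}
```

Two properties make the sketch easy to check: the weights always sum to 1, and predicting at the position of a measurement point returns the measured value. Solving one dense system per predicted pixel is what makes Kriging the most expensive of the three methods.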


Figure 5.12 Prediction of Ordinary Kriging for Perlin noise in scale 4


Chapter 6. Evaluation of prediction methods

In order to evaluate the performance of the different prediction methods, the system prototype uses grayscale images and the root-mean-square error (RMSE) to compare the true and predicted spatial data maps. Together, the two methods evaluate the prediction methods from a rough visual comparison to a precise quantitative analysis.

6.1 Grayscale image

In computing, a grayscale image is an image in which each pixel has only one scalar value. This type of image shows colors from black to white with gradient greys between them; the color and brightness of objects are all expressed through different grey levels. A grayscale image is different from a black-and-white image, which has only the two colors black and white. Besides black and white, a grayscale image has different degrees of grey, which are represented by the grayscale value. The grayscale value ranges from 0 to 255, where 0 represents black and 255 represents white. From black to white, there are 256 grey levels [44].

In order to visually, quickly, and roughly compare prediction methods, the simulation generates grayscale images of the true Perlin noise generation and the predicted one. The values of pixels in Perlin noise generations are real numbers in the range [-1, 1], whereas the values of pixels in grayscale images are integers in the range [0, 255]. In order to show a Perlin noise image in grayscale, the values of Perlin noise are converted into the required range: given the Perlin noise value r of a pixel, the grayscale value of this pixel is the rounded-up value of (r + 1) * 127.5. Figure 6.1 shows grayscale images of Perlin noise in scale 1 and scale 4 respectively.
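The conversion can be sketched in a few lines. The sketch reads "rounded up" as the mathematical ceiling, which is an interpretation rather than something the text states explicitly, and it clamps the result to [0, 255] in case a predicted value falls slightly outside [-1, 1]. The class name `GrayscaleMapper` is hypothetical.

```java
/** Hypothetical sketch of the Perlin-noise-to-grayscale conversion. */
final class GrayscaleMapper {

    /** Maps a value r in [-1, 1] to a grey level in [0, 255]. */
    static int toGray(double r) {
        int grey = (int) Math.ceil((r + 1.0) * 127.5);  // "round up" read as ceiling
        return Math.max(0, Math.min(255, grey));        // clamp values outside [-1, 1]
    }

    /** Converts a whole map indexed as [y][x]. */
    static int[][] toGrayMap(double[][] map) {
        int[][] grey = new int[map.length][];
        for (int y = 0; y < map.length; y++) {
            grey[y] = new int[map[y].length];
            for (int x = 0; x < map[y].length; x++) {
                grey[y][x] = toGray(map[y][x]);
            }
        }
        return grey;
    }
}
```

Under this reading, r = -1 maps to 0 (black), r = 1 maps to 255 (white), and r = 0 maps to 128 because 127.5 is rounded up.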


6.2 RMSE

Root mean square error (RMSE) is a measure of the error between predicted values and actually observed values [45]. RMSE is sensitive to very large or very small errors in a set of measurements, so it reflects the precision of the measurement well [46]. RMSE is very suitable for evaluating the results of different prediction models with the same parameter settings over multiple experiments. However, it is not suitable for comparing the same model with different parameters. Formula (6.1) defines RMSE, in which $\hat{y}_j$ is the predicted value, $y_j$ is the observed value, and n is the number of experiments.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(\hat{y}_j - y_j\right)^2} \qquad (6.1)$$

In the simulation, the error is calculated for each pixel of the true map and the predicted map, so the n in formula (6.1) is the number of pixels. Since the value range of each pixel of Perlin noise is [-1, 1], the range of the RMSE between the true map and the predicted map is [0, 2]. An RMSE of 0 means that the true Perlin noise map and the predicted one are exactly the same. The smaller the RMSE, the better the accuracy achieved.
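The pixel-wise RMSE between the two maps can be sketched as follows. The class `MapRmse` and the `[y][x]` array layout are assumptions made for the illustration; both maps are assumed to have the same dimensions.

```java
/** Hypothetical sketch of the pixel-wise RMSE between a true and a predicted map. */
final class MapRmse {

    /** Both maps are indexed as [y][x] and must have identical dimensions. */
    static double rmse(double[][] truth, double[][] predicted) {
        double squaredErrorSum = 0.0;
        long n = 0;                                   // number of pixels
        for (int y = 0; y < truth.length; y++) {
            for (int x = 0; x < truth[y].length; x++) {
                double error = predicted[y][x] - truth[y][x];
                squaredErrorSum += error * error;
                n++;
            }
        }
        if (n == 0) throw new IllegalArgumentException("maps must not be empty");
        return Math.sqrt(squaredErrorSum / n);
    }
}
```

The upper bound of 2 is reached only when every pixel is off by the full value range, for example a true map of {1, -1} against a predicted map of {-1, 1}.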


Chapter 7. System prototype

7.1 System prototype

The experiment is divided into three procedures: sampling, prediction and algorithm evaluation. The sampling procedure applies the ONE simulator to simulate manual data collection and uses Perlin noise generations to simulate the spatial data map to be sampled. In this way, participatory sensing data are obtained according to the configured parameters. In the prediction procedure, the three prediction methods described in Chapter 5 are implemented to generate the whole predicted map of Perlin noise from the discrete samples. The algorithm evaluation procedure evaluates the three prediction methods based on the RMSE between the ground-truth Perlin noise generation and the predicted Perlin noise generation.

This paper implemented a Java-based prototype system that samples the data, generates predicted results and evaluates prediction methods on a computer. In addition, the system prototype runs different parameter settings in batch mode. Once the parameters are input, the predicted results and the RMSE are generated automatically, as shown in Figure 7.1.

Figure 7.1 System prototype design

7.1.1 System prototype development environment

Java is a simple, object-oriented and type-safe programming language that is general-purpose, efficient, and portable. Java provides powerful frameworks and libraries to support system development. Besides that, this paper uses the ONE opportunistic network simulator, an open-source tool based on Java; thus, Java is the ideal choice for developing the system.


7.1.2 System prototype structure

Our system prototype is composed of five modules: sampling, prediction, algorithm evaluation, user interface and main frame.

The sampling and prediction modules are two completely independent system prototypes. The sampling module uses an opportunistic network to simulate manual sampling of data from the true map (Perlin noise generation) and saves the samples' coordinates and spatial data values into a text file.

The prediction module reads the coordinates and spatial data values from the text file and executes the selected prediction method(s) to generate the predicted values of the whole geographic area.

The evaluation module calculates the RMSE of the whole area and generates the grayscale predicted map according to the true map and the predicted map.

To configure all input parameters and choose different algorithms at one time, the user interface integrates the parameters that are required from the different modules.

Finally, the prototype system configures the ONE simulator and the Perlin noise diagram to generate prediction results in batch, and stores the predicted results according to the different types of input parameters. The main frame is the bridge among the other four modules and is responsible for data transfer.

The reason why the system prototype contains two independent sub-systems is to avoid mixing up the sampled data and the predicted values, and to ensure that the data are trustworthy. Besides, the two independent system prototypes promote the reusability of the system.


Figure 7.3 User interface of system prototype

7.1.3 System prototype process design
